Python fh.seek() oddity

Fri Aug 14 14:02:27 EDT 2009

Seems one of my roles in life is to find all the oddities in Python.
Hints from truly experienced Pythonistas welcome.

I am finding that after I do

  fh.seek(pos)
  buf = fh.read(pagesz)
  match = regexobj.search(buf)

the next fh.seek() always seems to land at _at least_ the end of the
match. I can no longer fh.seek(pos) or fh.seek(pos+1): the call succeeds,
but the next read() _always_ starts at the end of the last match, even
though the code never seeks to that offset.
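
For comparison, here is a minimal sketch of the behaviour one would
normally expect. It assumes a file opened in binary mode; an in-memory
io.BytesIO stands in for the large file, and read_page() is a
hypothetical helper, not something taken from the attached script. Note
that match.end() is an offset into buf, not into the file, so mixing up
the two is one possible (unconfirmed) cause of reads that appear to
start at the end of the last match.

```python
import io
import re


def read_page(fh, pos, pagesz):
    """Seek to an absolute file offset and read one page from there."""
    fh.seek(pos)
    return fh.read(pagesz)


# In-memory stand-in for a large file (assumption: the real case is a file on disk).
fh = io.BytesIO(b"x" * 100 + b"needle" + b"y" * 100)
regexobj = re.compile(b"needle")

pos, pagesz = 50, 128
buf = read_page(fh, pos, pagesz)
match = regexobj.search(buf)

# match.end() is relative to buf; add pos to get the absolute file offset.
match_end_abs = pos + match.end()

# Searching buf does not touch the file position, so seeking back to pos
# and reading again should return exactly the same bytes.
again = read_page(fh, pos, pagesz)
```

If a check like buf == again fails against the real file, that would
point at the file object; if it passes, the offset arithmetic in the
calling code is the more likely suspect.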

Is this expected? Known? Normal when reading the docs under the
influence of powerful drugs?

Sample script is attached for the truly curious: try it on a large
CJSON file, providing a separate file with dict keys to search for.
Once you get past the first 'page', and the code has to re-seek after
it has found a match, you'll see the problem.

I suspect it's related to Python possibly having the file mmap()ed
behind the scenes, without telling me.

(What I am writing is actually a grep over very large files, for cases
where Python's mmap is not available; say, for instance, our wonky
initrd.)
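
The general shape of such a page-wise grep, without mmap, could be
sketched roughly as below. This is not the attached grepforit.py;
grep_pages(), pagesz and overlap are hypothetical names, and the sketch
assumes that no match is longer than overlap bytes and that the pattern
behaves the same on a window as on the whole file (true for simple
literal keys, not for every regex).

```python
import io
import re


def grep_pages(fh, regexobj, pagesz=4096, overlap=256):
    """Yield absolute (start, end) offsets of matches, one window at a time.

    Each window is pagesz + overlap bytes long, so a match that straddles
    a page boundary is still seen whole. A match is reported only by the
    window in which it starts, so it is never reported twice.
    """
    pos = 0
    while True:
        fh.seek(pos)
        buf = fh.read(pagesz + overlap)
        if not buf:
            break
        for match in regexobj.finditer(buf):
            if match.start() >= pagesz:
                break  # starts in the overlap: the next window reports it
            # Offsets in match are relative to buf; add pos to make them absolute.
            yield pos + match.start(), pos + match.end()
        if len(buf) <= pagesz:
            break  # short read: end of file reached
        pos += pagesz


# Tiny demonstration with a match ("key1") that crosses the 8-byte page boundary.
data = b"xxxxxxkey1xxkey2xx"
hits = list(grep_pages(io.BytesIO(data), re.compile(rb"key\d"), pagesz=8, overlap=4))
```

Keeping every offset absolute at the point where it leaves the function
is a deliberate choice here: the caller never has to remember which
window a match came from.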

cheers,

m
-- 
 martin.langhoff at gmail.com
 martin at laptop.org -- School Server Architect
 - ask interesting questions
 - don't get distracted with shiny stuff
 - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff
-------------- next part --------------
A non-text attachment was scrubbed...
Name: grepforit.py
Type: text/x-python
Size: 3157 bytes
Desc: not available
URL: <http://lists.laptop.org/pipermail/devel/attachments/20090814/d04316ef/attachment.py>