poor man's mmap "sliding window" on Python 2.5.x

Martin Langhoff martin.langhoff at gmail.com
Fri Jul 3 12:14:46 EDT 2009


Still working on reading and validating Canonical JSON files that are
larger than available memory.

Along the way, I found that Python 2.5.x doesn't support an offset
argument to mmap.mmap(), which at first blush makes re-mapping with a
sliding window problematic. Well, almost. If you close() the map,
re-create it over the whole file and start reading at an offset
(m[myoffset]), Python knows how to DTRT: only the pages you actually
touch get loaded.
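
A minimal sketch of the close-and-re-mmap trick, in case it helps. The
class name, the remap_every parameter and the read() helper are all made
up for illustration; this is not the code I'm running, just the shape of
it. It maps the whole file with length 0 and no offset argument, and
assumes the file is not empty.

```python
import mmap


class RemappingReader(object):
    """Read slices of a file via mmap, re-creating the map periodically.

    Hypothetical helper: the names here are illustrative only.
    """

    def __init__(self, path, remap_every=1024):
        self._fh = open(path, "rb")
        self._remap_every = remap_every  # reads allowed between re-maps
        self._reads = 0
        self._map = self._new_map()

    def _new_map(self):
        # Length 0 maps the whole file; no offset argument is passed.
        return mmap.mmap(self._fh.fileno(), 0, access=mmap.ACCESS_READ)

    def read(self, offset, length):
        if self._reads >= self._remap_every:
            # Dropping the old map lets go of the pages touched so far.
            self._map.close()
            self._map = self._new_map()
            self._reads = 0
        self._reads += 1
        # Slicing at an arbitrary offset only faults in the pages needed.
        return self._map[offset:offset + length]

    def close(self):
        self._map.close()
        self._fh.close()
```

Usage would be along the lines of r = RemappingReader("big.json"),
then r.read(myoffset, 64) as often as needed, and r.close() at the end.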

So every N reads (random or linear), close and re-mmap the fh. If the
reads are short, the memory used by N reads will be roughly

   N * mmap.PAGESIZE

where PAGESIZE is usually 4KB. So re-mapping every 4MB, for example,
keeps the whole process under 6MB while working through a file that is
183MB.
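
The arithmetic behind that estimate, as a rough sketch. Both function
names are hypothetical, and the model assumes each short read touches
exactly one fresh page (a read that straddles a page boundary touches
two, so treat the result as approximate).

```python
import mmap


def reads_per_remap(budget_bytes, page_size=mmap.PAGESIZE):
    """How many short reads fit in the budget before re-mapping."""
    return max(1, budget_bytes // page_size)


def touched_bytes(n_reads, page_size=mmap.PAGESIZE):
    """Rough memory touched by n_reads short reads: N * PAGESIZE."""
    return n_reads * page_size


# With an assumed 4KB page, a 4MB budget works out to 1024 reads.
n = reads_per_remap(4 * 1024 * 1024, page_size=4096)
```

The rest of the process (interpreter, parser state) comes on top of
that figure.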

On the XO-1, it's the difference between "churning through it" and
slowing the whole OS to a crawl, then inching towards a big OOM zap.

cheers,



martin
-- 
 martin.langhoff at gmail.com
 martin at laptop.org -- School Server Architect
 - ask interesting questions
 - don't get distracted with shiny stuff
 - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff


