[OLPC-devel] Re: pygtk performance issue

Ian Bicking ianb at colorstudy.com
Wed Sep 6 12:44:03 EDT 2006


Mitch Bradley wrote:
> As an example of what one might do:  The search path resolution 
> mechanism could notice, while looking for file X in a list of 
> directories, that some of those directories don't exist.  It could then 
> prune those entries from the path list.

If you want to go down this route, the trimming should happen in site.py
(which, in turn, is probably where the superfluous entries were added).
Python's site.py is a mess IMHO, and a custom OLPC version seems quite
reasonable.
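
A minimal sketch of the pruning idea, written as a standalone helper
rather than as an actual patch to site.py. The function name
prune_missing_entries is made up for illustration, and the decision to
keep the empty-string entry (which means "current directory") is an
assumption about what a custom site.py would want.

```python
import os
import sys


def prune_missing_entries(paths):
    """Return the entries of `paths` that exist on disk, preserving order."""
    kept = []
    for entry in paths:
        # "" on sys.path means the current directory, so keep it as-is.
        # os.path.exists is also true for zip archives, which are valid
        # sys.path entries, so those survive the pruning too.
        if entry == "" or os.path.exists(entry):
            kept.append(entry)
    return kept


if __name__ == "__main__":
    # Slice assignment mutates the existing list object, so any code
    # holding a reference to sys.path sees the trimmed version.
    sys.path[:] = prune_missing_entries(sys.path)
```

Doing this once at startup means each later import skips the failed
lookups in directories that were never there.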

Some people on some platforms have reported improvements when using zip 
files instead of directories of modules, because the available files are 
listed all in one place.  Also, Python doesn't expect a zip file to 
change, but will rescan directories frequently during subsequent 
imports.  There's a memory overhead for zip files, in part just because 
the file listing is kept in memory, and I doubt (though I don't know 
for sure) that .so files can be imported out of a zip file.
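
For reference, importing pure-Python modules from a zip file only
requires putting the archive itself on sys.path. The sketch below is
written for a current Python 3 (importlib did not exist in 2.4), and
the names bundle.zip and zipped_hello are placeholders invented for the
example.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_module_zip(zip_path, modules):
    """Write a {module_name: source} mapping into a zip of .py files."""
    with zipfile.ZipFile(zip_path, "w") as archive:
        for name, source in modules.items():
            archive.writestr(name + ".py", source)


# Build a throwaway archive holding one tiny module.
workdir = tempfile.mkdtemp()
bundle = os.path.join(workdir, "bundle.zip")
build_module_zip(bundle, {"zipped_hello": "GREETING = 'hello from a zip'\n"})

# The archive path, not a directory, goes on sys.path.
sys.path.insert(0, bundle)
importlib.invalidate_caches()
zipped_hello = importlib.import_module("zipped_hello")
```

The module's __file__ then points inside the archive, which is one way
to confirm where it was loaded from.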

Also, packages and relative imports cause some overhead, because Python 
must always search for nearby modules before looking for global modules. 
But there's not much to be done about this; in Python 2.5 you could 
turn on absolute imports (from __future__ import absolute_import) and 
use explicit relative imports, but that has to be done 
module-by-module, and the result is not Python 2.4 compatible.  But I 
think the Python 2.5 stat caching improvements might help here (even 
without explicit relative imports), where simply trimming sys.path won't.
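
To make the module-by-module opt-in concrete, here is a small sketch
that writes a throwaway package to disk and imports it. The package
name demo_pkg and its contents are invented for the example, and the
driver code assumes a current Python 3, where the __future__ line is
accepted but no longer changes anything.

```python
import importlib
import os
import sys
import tempfile

# Source of the module that opts in. In Python 2.5 the __future__ line
# makes plain "import x" skip the package-local search entirely.
MAIN_SOURCE = '''\
from __future__ import absolute_import

from . import helper   # explicit relative import of a sibling module
import string          # resolved from sys.path, never from the package

def describe():
    return helper.VALUE
'''


def write_demo_package(root):
    """Create the hypothetical demo_pkg package under `root`."""
    pkg_dir = os.path.join(root, "demo_pkg")
    os.mkdir(pkg_dir)
    files = {
        "__init__.py": "",
        "helper.py": "VALUE = 'package-local helper'\n",
        "main.py": MAIN_SOURCE,
    }
    for filename, source in files.items():
        with open(os.path.join(pkg_dir, filename), "w") as handle:
            handle.write(source)
    return pkg_dir


root = tempfile.mkdtemp()
write_demo_package(root)
sys.path.insert(0, root)
importlib.invalidate_caches()
demo_main = importlib.import_module("demo_pkg.main")
```

Every module in the package needs its own __future__ line, which is
why the change cannot be made in one place.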

There was also the issue that Mike Hearn brought up about the linking 
overhead of importing _gtk.so in particular.  Here I must profess that I 
am largely ignorant of the issues, but I'll speculate anyway.  Perhaps 
just splitting up _gtk.so into separate modules would be helpful, 
particularly if there are really different kinds of functionality so 
that some might never be needed by an application (just switching to 
lazy imports only stretches out the performance problem, and I'm not 
sure that's a good solution here).  But even then, the memory overhead 
is quite substantial (8 MB?) so his suggestion to fork an 
already-gtk-initialized Python process might be the only way to address 
both issues reliably.  (My own intuition is that PyGTK is representing 
every GTK object with a corresponding Python object, which causes a 
duplication that may not be necessary, but even if that is true it's 
part of the basic architecture of PyGTK so it's not going to be easy to 
change.)
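
The fork idea can be sketched roughly as follows, assuming a Unix
system where os.fork is available. The json module stands in for gtk
here so the example runs anywhere, and run_in_forked_child and
child_task are hypothetical names, not part of any existing launcher.

```python
import os

# Stand-in for the expensive import; in the scenario discussed above
# this would be gtk, loaded once in the long-lived parent process.
import json as heavy_module

PARENT_PID = os.getpid()


def run_in_forked_child(task):
    """Fork the already-warm process, run task() in the child, and
    return the string the child produced."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the imported module is already mapped, shared
        # copy-on-write with the parent, so nothing is re-imported.
        status = 1
        try:
            os.close(read_fd)
            with os.fdopen(write_fd, "w") as out:
                out.write(task())
            status = 0
        finally:
            # Leave without running the parent's cleanup handlers.
            os._exit(status)
    # Parent: collect the child's output, then reap it.
    os.close(write_fd)
    with os.fdopen(read_fd) as inp:
        result = inp.read()
    os.waitpid(pid, 0)
    return result


def child_task():
    # Uses the pre-imported module without paying for the import again.
    return heavy_module.dumps({"pid_differs": os.getpid() != PARENT_PID})
```

Each child starts with the module's pages already in memory, so both
the import time and most of the per-process memory are paid only once.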


-- 
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org


