[sugar] Re: [OLPC-devel] Re: pygtk performance issue

Wed Sep 6 17:02:20 EDT 2006

Ian Bicking wrote:
> Mitch Bradley wrote:
>> As an example of what one might do:  The search path resolution
>> mechanism could notice, while looking for file X in a list of
>> directories, that some of those directories don't exist.  It could
>> then prune those entries from the path list.

This is generally not slow; it takes only 10-20 ms per file, as measured
on the OLPC itself.

Python 2.5 includes a patch that caches the lookups; it is unclear,
however, whether this will give any performance improvement.

> If you want to go down this route, the trimming should happen in site.py
> (which, in turn, is probably where the superfluous entries were added).
> Python's site.py is a mess IMHO, and a custom OLPC version seems quite
> reasonable.

It might be a good idea to investigate this, but I doubt it will give
any substantial benefit, perhaps a couple of ms.
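
For reference, the pruning idea could be sketched roughly like this in
plain Python. This is only an illustration, not what site.py does today;
the helper name prune_missing_entries is made up for the example.

```python
import os
import sys


def prune_missing_entries(path_list):
    """Return a copy of path_list without entries that do not exist on disk."""
    kept = []
    for entry in path_list:
        # An empty string means "current directory" to the import system,
        # so it is kept even though os.path.exists("") is False.
        if entry == "" or os.path.exists(entry):
            kept.append(entry)
    return kept


if __name__ == "__main__":
    before = len(sys.path)
    # Slice assignment keeps the same list object that the importer uses.
    sys.path[:] = prune_missing_entries(sys.path)
    print("removed %d of %d entries" % (before - len(sys.path), before))
```

Something like this would run once at startup, so its cost is a single
existence check per path entry.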

> Some people on some platforms have reported improvements when using zip
> files instead of directories of modules, because the available files are
> listed all in one place.  Also, Python doesn't expect a zip file to
> change, but will rescan directories frequently during subsequent
> imports.  There's a memory overhead for zip files, in part just because
> the file listing is kept in memory, and I don't know (I would doubt)
> that .so files can be imported out of a zip file.

It's correct that .so files cannot be imported from a zip file; they
need to stay in the normal file system.
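
A small sketch of the zip approach for pure-Python modules, written
against current Python (importlib) rather than the 2.4/2.5 interpreters
discussed here. The archive name and the module zipdemo_hello are
invented for the example.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_demo_zip(directory):
    """Write a zip archive containing one tiny pure-Python module."""
    zip_path = os.path.join(directory, "demo_modules.zip")
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr(
            "zipdemo_hello.py",
            "def greet():\n    return 'hello from zip'\n",
        )
    return zip_path


def import_from_zip(zip_path, module_name):
    """Import module_name with the zip archive temporarily on sys.path."""
    sys.path.insert(0, zip_path)
    try:
        # Make sure the import system notices the new path entry.
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
    finally:
        sys.path.remove(zip_path)


if __name__ == "__main__":
    archive = build_demo_zip(tempfile.mkdtemp())
    module = import_from_zip(archive, "zipdemo_hello")
    print(module.greet())
```

Only .py (and compiled .pyc) files can go into such an archive;
extension modules like _gtk.so would still have to be installed as
ordinary files.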

> There was also the issue that Mike Heam brought up about the linking
> overhead of importing _gtk.so in particular.  Here I must profess that I
> am largely ignorant of the issues, but I'll speculate anyway.  Perhaps
> just splitting up _gtk.so into separate modules would be helpful,

Splitting _gtk.so into several pieces would be beneficial, especially
combined with the dynamic namespace patch[1], which is currently disabled
(see the bug for more information).
A problem is that it can only really be split into two pieces, one for
gtk and one for gdk; splitting out the others does not provide any benefit.
The gtk<->gdk dependencies inside PyGTK are a little troublesome, which
makes the split non-trivial.

Most of the gains are going to be made by:

1) Not registering the classes for /all/ types, enums and flags until
they're actually created; applies to the gtk & gdk modules.
2) Creating function wrappers on demand; might be done for methods too.
3) Importing modules on demand; applies to atk, pango, gdk and cairo.

All of these are mostly solved, but they depend on splitting gtk & gdk
for maximum benefit.
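
To illustrate point 3 at the Python level, on-demand importing can be
sketched with a small proxy object. The real work in PyGTK happens in
the C extension, so this is only an analogy; the LazyModule class is
hypothetical and uses the standard json module as a stand-in.

```python
import importlib


class LazyModule(object):
    """Stand-in that imports the real module on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def _load(self):
        # Import only once; later calls reuse the cached module object.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return self._module

    def __getattr__(self, attr):
        # __getattr__ is only called for names not found on the proxy
        # itself, so _name, _module and loaded never end up here.
        return getattr(self._load(), attr)

    @property
    def loaded(self):
        return self._module is not None


if __name__ == "__main__":
    json = LazyModule("json")
    print(json.loaded)          # nothing imported yet
    print(json.dumps([1, 2]))   # first use triggers the import
    print(json.loaded)
```

An application that never touches the proxied module never pays for
importing it, which is the effect wanted for atk, pango and cairo.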

> particularly if there are really different kinds of functionality so
> that some might never be needed by an application (just switching to
> lazy imports only stretches out the performance problem, and I'm not
> sure that's a good solution here).  But even then, the memory overhead
> is quite substantial (8Mb?) so his suggestion to fork an

I wonder where this 8M number comes from.
PyGTK uses around 5-6M including GTK+ and Python.

> already-gtk-initialized Python process might be the only way to address
> both issues reliably.  (My own intuition is that PyGTK is representing
> every GTK object with a corresponding Python object, which causes a
> duplication that may not be necessary, but even if that is true it's
> part of the basic architecture of PyGTK so it's not going to be easy to
> change).

Yes, that is correct; otherwise you would not be able to use the GObjects
from Python at all. It's not possible to avoid this duplication.
All GObject primitives (objects, types, enums, interfaces, etc.) require
a Python-side wrapper.

[1]: bugzilla.gnome.org/show_bug.cgi?id=346946

Johan