mwlib: reworking re2c files to use ctypes

Martin Langhoff martin at laptop.org
Fri Jan 28 07:55:45 EST 2011


Hi Ralf, Volker,

I'm writing to you as you seem to be the active maintainers of mwlib
and its re2c files.

OLPC ships an early version of mwlib in its WikiBrowse (aka
Wikiserver) activity, and it's a tool of major importance. (Thanks for
your code! Having a nice wikislice on the many XOs that have little or
no connectivity makes a huge impact out there.)

The compiled .so files are currently a bit of a problem for us. We
ship "activities" (user-installable program bundles) that are usually
pure python, and (if prepared carefully) can be installed in several
releases of our OSs, which in turn are based on various Fedora
releases.

Binaries are not recommended inside of those bundles, but if they link
to generic libs with stable API/ABI, things are generally ok.

The re2c binaries from mwlib, unfortunately, interface with Python
using SWIG, which means that they end up linking directly to
libpython. We use Python extensively, so we update somewhat
aggressively to the latest version in Fedora. As a result, those .so
files end up tied to specific Python versions.
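One quick way to see which compiled modules are tied to a particular Python release is to look for a libpython entry in their dynamic dependencies. A minimal sketch, assuming a Linux system with `ldd` on the PATH; the helper names `libpython_deps` and `check_module` are hypothetical and not part of mwlib:

```python
import re
import subprocess

# Matches dependency names such as "libpython2.7.so.1.0" at the start of
# an ldd output line (after leading whitespace).
_LIBPYTHON = re.compile(r"^\s*(libpython[\w.]*\.so[\w.]*)", re.MULTILINE)


def libpython_deps(ldd_output):
    """Return the libpython sonames listed in a chunk of ldd output."""
    return _LIBPYTHON.findall(ldd_output)


def check_module(path):
    """Run ldd on a compiled module and return any libpython dependencies.

    Assumes Linux with ldd installed; raises CalledProcessError if ldd fails.
    """
    out = subprocess.check_output(["ldd", path]).decode("utf-8", "replace")
    return libpython_deps(out)
```

A non-empty result for a SWIG-built module means it will need rebuilding whenever the interpreter's soname changes.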

There is a different, better way to do this: create standalone .so
files and use them from Python via ctypes. That way, we can
distribute precompiled .so files that are significantly more portable
(they are still specific to the CPU architecture and the glibc ABI).
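On the ctypes side no compiled glue is needed: the library only has to export plain C functions, and Python declares their prototypes at load time. A minimal sketch, using the C library's `strlen` as a stand-in for a function exported by a standalone `_expander.so` (the real exported names and signatures would depend on the patch):

```python
import ctypes
import ctypes.util

# Locate the C library; if find_library() returns None, CDLL(None) on POSIX
# falls back to the symbols already loaded into the process, including libc.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C prototype, size_t strlen(const char *), so ctypes converts
# the argument and return value correctly instead of assuming int.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data):
    """Call strlen() through ctypes; data must be a bytes object."""
    return libc.strlen(data)
```

For example, `c_strlen(b"foo")` returns 3. Because the loaded library exposes a plain C ABI and does not link against libpython, the same shared object keeps working across interpreter upgrades as long as the architecture and glibc ABI match.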

Would that be of interest to you? Has anyone thought about this, or
worked on this?

If so, I have done some initial hacking on this that you might be
interested in. I have attached a WIP patch against an earlier version
of your _expander.re; it drops a lot of the glue code:

 mwlib/Makefile     |    4 +-
 mwlib/_expander.re |   75 ++++++++++-----------------------------------------
 2 files changed, 17 insertions(+), 62 deletions(-)

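It is not finished; it is definitely a work in progress. Once it
works, you can just use

 import os
 import ctypes
 # Load the library from the bundle directory; a bare name such as
 # '_expander.so' is only searched for in the system library paths.
 _expander = ctypes.CDLL(os.path.join(os.path.dirname(__file__), '_expander.so'))
 _expander.scan(b'foo')  # pass bytes: the C side receives a char *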

And the same goes for _uscan.re.

I now see that your latest code no longer uses _expander.re. How does
the Python-based tokenizer perform compared to the re2c tokenizer? We
care a lot about keeping things fast.
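One way to answer the performance question is to time both tokenizers on the same sample of wikitext. A minimal sketch using `timeit`; `regex_tokenize` is a hypothetical stand-in for a Python-based tokenizer (not mwlib's actual one), and a ctypes-loaded re2c scanner could be passed to `compare` the same way once it builds:

```python
import re
import timeit

# Hypothetical tokenizer: template/link delimiters, pipes, and text runs.
_TOKEN = re.compile(r"\{\{|\}\}|\[\[|\]\]|\||[^{}\[\]|]+|.", re.DOTALL)


def regex_tokenize(text):
    """Return a list of tokens (hypothetical stand-in, not mwlib's tokenizer)."""
    return _TOKEN.findall(text)


def compare(tokenizers, text, number=100):
    """Time each tokenizer on the same input; return {name: best seconds per call}."""
    results = {}
    for name, func in tokenizers.items():
        # repeat() collects several samples; the minimum is the least noisy.
        times = timeit.repeat(lambda: func(text), number=number, repeat=3)
        results[name] = min(times) / number
    return results
```

Feeding a few real articles from the wikislice, rather than synthetic text, would give numbers that reflect actual WikiBrowse workloads.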

cheers,



m
-- 
 martin at laptop.org -- Software Architect - OLPC
 - ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff
-------------- next part --------------
A non-text attachment was scrubbed...
Name: expander_ctypes.patch
Type: text/x-patch
Size: 3137 bytes
Desc: not available
URL: <http://lists.laptop.org/pipermail/devel/attachments/20110128/ed47ba89/attachment.bin>


More information about the Devel mailing list