Patches for a faster wikibrowse

Martin Langhoff martin.langhoff at gmail.com
Mon Nov 29 17:44:30 EST 2010


Hi Chris,

in between things I've applied a few changes to Wikibrowse that make
it quite a bit faster. Feel free to review/comment/pull from
http://dev.laptop.org/git/users/martin/wikiserver/

= Batch inline templates =

Turns out most templates are only used once or twice. If we blindly
inline all templates, page load times drop dramatically and the bzip
file shrinks. So there's a mergetemplates.py in tools. It also sorts
all pages (which tends to put redirects next to their target too).
With those two changes combined, on an XO-1.5:

 - es_PE 'processed' file shrinks from 84MB to 81MB

 - Argentina goes from 31s to 12s (it's a 315KB page -- tons of parsing)

 - Andorra goes from 22s to 2.6s (a 71KB page; a bit less parsing
effort, but it had lots of nested templates)

 - Lógica goes from 6s to 3s
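
For anyone who hasn't opened the tool yet, here is a rough Python sketch of the inline-then-sort idea. It is not the code in mergetemplates.py: the function names, the regex, and the choice to simply drop template arguments are assumptions made to keep it short (real templates take parameters, which a proper inliner has to substitute).

```python
import re

# Matches {{Name}} or {{Name|args}}. Assumption: the arguments contain no
# nested braces, so inner calls get expanded first on an earlier pass.
TEMPLATE_CALL = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\|[^{}]*)?\}\}")


def inline_templates(text, templates, max_depth=10):
    """Replace each template call with its body, repeating for nested calls.

    `templates` maps a template name to its wikitext body. Arguments are
    dropped (a simplification) and unknown templates are left as-is.
    `max_depth` caps the number of passes so recursive templates terminate.
    """
    def substitute(match):
        return templates.get(match.group(1), match.group(0))

    for _ in range(max_depth):
        new_text = TEMPLATE_CALL.sub(substitute, text)
        if new_text == text:  # nothing left that we know how to expand
            break
        text = new_text
    return text


def merge_pages(pages, templates):
    """Inline templates into every (title, body) pair, sorted by title.

    Sorting by title tends to put pages with similar titles, such as
    redirects and their targets, next to each other in the output file.
    """
    return [(title, inline_templates(body, templates))
            for title, body in sorted(pages)]
```

Doing the expansion once, offline, is what moves the parsing cost out of page-load time; the sort is a cheap extra pass over the same data.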

I tried various other strategies with templates. Given that most
templates are single use, and the frequent-use templates have a tiny
'payload', inlining them all makes the most sense.
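
The "most templates are single use" observation is easy to check against a dump. A minimal sketch, again in Python and not taken from the repo: it counts template calls across page bodies with a simple regex (an assumption; it does not normalise name case or underscores and will also count parser functions like #if) and splits templates at a hypothetical `threshold`.

```python
import re
from collections import Counter

# Captures the name part of {{Name}} or {{Name|...}}.
TEMPLATE_NAME = re.compile(r"\{\{\s*([^|{}]+?)\s*[|}]")


def template_usage(pages):
    """Count how many times each template is called across all page bodies."""
    counts = Counter()
    for body in pages:
        counts.update(TEMPLATE_NAME.findall(body))
    return counts


def split_by_use(counts, threshold=2):
    """Split template names into rarely used (<= threshold) and frequent ones."""
    rare = {name for name, n in counts.items() if n <= threshold}
    frequent = set(counts) - rare
    return rare, frequent
```

Comparing the size of `rare` with `frequent`, and the body length of the frequent ones, is the kind of measurement that backs the "inline everything" decision.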

The batch inliner takes 7 hours on my (slowish) laptop(!).

= Various fixes that improve resiliency =

 - without them, quite a few pages failed to process

= pylru cache =

A bit less important now that we're inlining templates, but still a
big win when re-visiting pages. On its own, it'd give a good 20%
improvement even with a cold cache, because many pages use tiny
templates a zillion times.
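
To illustrate why a cache helps even on a cold start: a small template used many times on one page is rendered once and then served from memory. This is a minimal least-recently-used cache built on collections.OrderedDict, standing in for the LRU module the branch actually uses (whose API is not reproduced here); `cached_render` and its `render` callback are hypothetical names.

```python
from collections import OrderedDict

_MISSING = object()  # sentinel, so cached None values still count as hits


class LRUCache:
    """A minimal least-recently-used cache built on OrderedDict."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used

    def __len__(self):
        return len(self._data)


def cached_render(cache, name, render):
    """Return the rendered template `name`, calling `render` only on a miss."""
    result = cache.get(name, _MISSING)
    if result is _MISSING:
        result = render(name)
        cache.put(name, result)
    return result
```

The capacity bound matters on an XO: it keeps memory use fixed while still catching the templates a page calls over and over.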


= What's next =

I'll probably prepare a release of at least the Spanish and English
Wikipedias with these changes and a few content edits. Perhaps with
your help?

cheers,



m
-- 
 martin.langhoff at gmail.com
 martin at laptop.org -- School Server Architect
 - ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff


