Patches for a faster wikibrowse
Martin Langhoff
martin.langhoff at gmail.com
Mon Nov 29 17:44:30 EST 2010
Hi Chris,
in between things I've applied a few changes to Wikibrowse that make
it quite a bit faster. Feel free to review/comment/pull from
http://dev.laptop.org/git/users/martin/wikiserver/
= Batch inline templates =
Turns out most templates are only used once or twice. If we blindly
inline all templates, page load times drop dramatically and the bzip
file shrinks. So there's a mergetemplates.py in tools. It also sorts
all pages (which tends to put redirects next to their target too).
With those things combined, on XO-1.5:
- es_PE 'processed' file shrinks from 84MB to 81MB
- Argentina goes from 31s to 12s (it's a 315KB page -- tons of parsing)
- Andorra goes from 22s to 2.6s (71KB page -- a bit less parsing
effort, but had lots of nested templates)
- Lógica goes from 6s to 3s
I tried various other strategies with templates. Given that most
templates are single use, and the frequent-use templates have a tiny
'payload', inlining them all makes the most sense.
The batch inliner takes 7 hours on my (slowish) laptop(!).
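To make the idea concrete, here is a minimal sketch of batch inlining followed by sorting. It is not the actual mergetemplates.py: the names inline_templates and merge_and_sort are hypothetical, pages and templates are assumed to be plain dicts of title to wikitext, and only parameterless {{Name}} calls are handled.

```python
import re

# Matches only the simplest transclusion form, {{Name}}, with no parameters.
# Real MediaWiki templates take arguments and parser functions; this sketch
# leaves those untouched.
TEMPLATE_CALL = re.compile(r"\{\{([^{}|]+)\}\}")


def inline_templates(text, templates, max_depth=10):
    """Replace {{Name}} calls with template bodies, repeating for nested calls."""
    def expand(match):
        name = match.group(1).strip()
        # Unknown templates are left exactly as written.
        return templates.get(name, match.group(0))

    for _ in range(max_depth):
        new_text = TEMPLATE_CALL.sub(expand, text)
        if new_text == text:
            break  # nothing left to expand
        text = new_text
    return text


def merge_and_sort(pages, templates):
    """Inline templates in every page and return (title, text) pairs sorted by title."""
    return sorted(
        (title, inline_templates(body, templates))
        for title, body in pages.items()
    )
```

The max_depth cap is an assumption of this sketch; it stops a template that includes itself from expanding forever.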
= Various fixes that improve resiliency =
Before these fixes, we couldn't process quite a few pages.
= pylru cache =
A bit less important now that we're using inline templates, but still
a big win when revisiting pages. On its own, it'd give a good 20%
improvement "cold cache" because many pages use tiny templates a
zillion times.
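A rough sketch of why an LRU cache helps when the same tiny template is expanded many times on one page. This uses Python 3's functools.lru_cache as a stand-in, not the cache library wikiserver actually uses, and expand_template is a hypothetical placeholder for the real (expensive) expansion step.

```python
from functools import lru_cache


def expand_template(name, args):
    # Hypothetical placeholder: the real expansion would parse wikitext.
    return "<%s:%s>" % (name, ",".join(args))


@lru_cache(maxsize=1024)
def cached_expand(name, args):
    # args must be hashable, so callers pass a tuple rather than a list.
    return expand_template(name, args)
```

Repeated calls with the same name and args are served from memory, and the least recently used entries are dropped once maxsize is reached. The maxsize of 1024 is an arbitrary choice for illustration.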
= What's next =
I'll probably prepare a release of Wikipedia in Spanish and English, at
least, with these and a few content edits -- perhaps with your help?
cheers,
m
--
martin.langhoff at gmail.com
martin at laptop.org -- School Server Architect
- ask interesting questions
- don't get distracted with shiny stuff - working code first
- http://wiki.laptop.org/go/User:Martinlanghoff