[sugar] [Wikireader] english wikireaders and 0.7
Chris Ball
cjb at laptop.org
Sun Sep 7 19:22:55 EDT 2008
Hi,
> Where is the code for this? Lede-detection code is a priority for
> me, and I'd like to work on it. It should be easy to sense the
> start of the first H2 and drop the rest of the article.
There is no code for lead detection. You'd have to write it from
scratch: take the enwiki.xml.bz2 from ¹, run it through your script,
and output a new enwiki.xml.bz2 in which each article is replaced by
its lead, unless the article is listed in ².
¹: http://download.wikimedia.org/enwiki/20080724/enwiki-20080724-pages-meta-current.xml.bz2
²: http://dev.laptop.org/~cjb/enwiki/en-g1g1-8k.txt
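
As a rough starting point, here is a minimal sketch of the trimming step
described above. Nothing like this exists yet; the names extract_lead,
load_titles and trim_article are hypothetical, and I'm assuming that the
list in ² is one article title per line, spelled the same way as the
titles in the dump. Reading and rewriting the XML dump itself is left
out.

```python
import re

# A level-2 heading in wikitext is a line like "== History ==".
# Requiring a non-"=" character after the opening "==" stops
# "=== Subsection ===" (H3 and deeper) from matching.
H2_RE = re.compile(r"^==[^=\n].*?==[ \t]*$", re.MULTILINE)


def extract_lead(wikitext):
    """Return everything before the first level-2 heading.

    If the article has no level-2 heading, it is returned unchanged.
    """
    match = H2_RE.search(wikitext)
    if match is None:
        return wikitext
    return wikitext[:match.start()].rstrip() + "\n"


def load_titles(path):
    """Read the keep-in-full list (assumed: one title per line)."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def trim_article(title, wikitext, keep_titles):
    """Keep listed articles whole; cut every other article to its lead."""
    if title in keep_titles:
        return wikitext
    return extract_lead(wikitext)
```

Something like trim_article(title, text, load_titles("en-g1g1-8k.txt"))
would then be called once per page while streaming through the dump.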
> Is there some way to estimate the size impact on the whole of
> adding one template (given how often it is referenced)? If we
> could rank templates by their footprint, it would be easier to
> "fill up" a space allocation for them, as we do for images.
This sounds complicated, and we don't try to do it for pages. Templates
are probably all roughly the same size (or rather, they're all small
enough that the difference is not very meaningful); find out the size
of an average-looking one and how much disk space we have left, and
take as many as roughly fit, I guess. I think, like some of the other
things you're proposing, the answer here is "yes, we could write a
program to do that, but we don't really have time to".
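
The back-of-the-envelope version is just a division. A minimal sketch,
where the function name and the numbers are made up for illustration
(they are not measurements of our actual free space or template sizes):

```python
def templates_that_fit(free_bytes, sample_template_bytes):
    """Estimate how many templates fit, assuming each one is about
    the size of a single average-looking sample."""
    if sample_template_bytes <= 0:
        raise ValueError("sample_template_bytes must be positive")
    return free_bytes // sample_template_bytes


# Hypothetical figures: 2 MiB of space left, a 1500-byte sample template.
estimate = templates_that_fit(2 * 1024 * 1024, 1500)
```

This ignores compression and how often each template is referenced,
which is the point: it is a rough capacity estimate, not a ranking.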
- Chris.
--
Chris Ball <cjb at laptop.org>