Edit/audit wikipedia activity
Benjamin M. Schwartz
bmschwar at fas.harvard.edu
Thu Oct 21 13:35:33 EDT 2010
On 10/21/2010 12:06 PM, Martin Langhoff wrote:
> Unfortunately, there is a clear need to organise a facility to
> audit/edit the wikipedia snapshots we have and "repack" the archive.
> Do we have any easy way to do this?
I'm the wrong person to answer this question, but the activity's archive
production system already supports an article blacklist (and
indeed many articles were excluded from the current bundles). I don't
know who is in possession of this list, or exactly who took responsibility
for producing the most recent version. Nonetheless, excluding articles is
Actually editing article text is not something we have attempted AFAIK.
Ideally, I think, we would fix textual problems upstream as they are
discovered. The most recent available snapshots for English and Spanish
are 10-14 days old, so this strategy does create a delay, during which
time things can continue to change.
In general, I believe that auditing Wikipedia is a fool's errand. There
are 3.5 million articles in English Wikipedia, growing by over a thousand
a day. Spanish Wikipedia has more than 650,000 articles. If people want to
create snapshots containing only whitelisted articles, that's fine, but
many of the links will be broken and the amount of information will be
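To make the broken-link concern concrete, here is a minimal sketch of whitelist filtering. It is not the activity's actual archive production code. It assumes the snapshot is a plain dict mapping article title to raw wikitext and that links use the usual [[Target]] or [[Target|label]] syntax. The names `filter_snapshot` and `LINK_RE` are hypothetical. A blacklist works the same way: pass `set(articles) - blacklist` as the whitelist.

```python
import re

# Hypothetical pattern: captures the target of [[Target]], [[Target|label]]
# or [[Target#Section]] wikilinks. Real MediaWiki title normalization
# (first-letter case, underscores vs. spaces, namespaces) is ignored here.
LINK_RE = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def filter_snapshot(articles, whitelist):
    """Keep only whitelisted articles and report links that would dangle.

    articles:  dict of title -> raw wikitext (assumed snapshot format)
    whitelist: set of titles to keep
    Returns (kept, broken), where broken maps each kept title to the
    list of link targets that are no longer in the snapshot.
    """
    kept = {title: text for title, text in articles.items() if title in whitelist}
    broken = {}
    for title, text in kept.items():
        # Every link whose target was filtered out becomes a broken link.
        missing = [t.strip() for t in LINK_RE.findall(text) if t.strip() not in kept]
        if missing:
            broken[title] = missing
    return kept, broken


if __name__ == "__main__":
    articles = {
        "Apple": "An [[Orange|orange]] and a [[Pear]].",
        "Orange": "See [[Apple]].",
        "Pear": "Fruit.",
    }
    kept, broken = filter_snapshot(articles, {"Apple", "Orange"})
    print(sorted(kept), broken)
```

Counting entries in `broken` against the total number of links gives a rough measure of how much a given whitelist degrades navigation before committing to a repacked bundle.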
More information about the Devel