#10842 NORM Not Tri: Wikiserver parser fails on full enwiki dump
Zarro Boogs per Child
bugtracker at laptop.org
Thu Apr 28 05:42:47 EDT 2011
#10842: Wikiserver parser fails on full enwiki dump
------------------------+---------------------------------------------------
Reporter: rmo | Owner: cjb
Type: defect | Status: new
Priority: normal | Milestone: Not Triaged
Component: wikiserver | Version: not specified
Keywords: | Next_action: never set
Verified: 0 | Deployment_affected:
Blockedby: | Blocking:
------------------------+---------------------------------------------------
I stumbled on a few strange parser bugs while testing ''server.py'' with a
complete, recent, English Wikipedia dump.
The first problem was that Python's maximum recursion depth was reached
on just about every article of moderate complexity, which I worked around
by blacklisting a few templates that often came up before the overflow [1].
It soon became clear, however, that the problem went deeper. Here's a
fragment of output demonstrating an unexpected template
expansion (to the non-existent ''Template:Main other''):
{{{
{{#switch:
<!--If no or empty "demospace" parameter then detect namespace-->
{{#if:{{{demospace|}}}
| {{lc: {{{demospace}}} }} <!--Use lower case "demospace"-->
| {{#ifeq:{{NAMESPACE}}|{{ns:0}}
| main
| other
}}
}}
| main = {{{1|}}}
| other
| #default = {{{2|}}}
}}<noinclude>
{{documentation}}
<!-- Add categories and interwikis to the /doc subpage, not here! -->
</noinclude>
expander.info >> parsing template "u'Template:Main other'"
expander.warn >> using dummy resolver for fullpagename
expander.warn >> using dummy resolver for fullpagename
expander.warn >> using dummy resolver for fullpagename
expander.warn >> using dummy resolver for fullpagename
...
}}}
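The recursion overflow mentioned earlier could, in principle, be contained
by an explicit depth guard in addition to the blacklist. The following is
only a toy sketch of that idea, not the mwlib or wikiserver expander: the
`TEMPLATES` store, the `expand` function, and the `max_depth` parameter are
all hypothetical names, and only argument-free calls such as `{{Name}}` are
handled.

```python
import re

# Hypothetical toy store; the real wikiserver reads templates from the dump.
TEMPLATES = {
    "Template:A": "a[{{B}}]",
    "Template:B": "b[{{A}}]",  # A and B include each other without end
    "Template:Leaf": "leaf",
}

# Matches only innermost, argument-free calls such as {{Name}}.
CALL = re.compile(r"\{\{([^{}|]+)\}\}")


def expand(text, blacklist=frozenset(), max_depth=20, _depth=0):
    """Expand {{Name}} calls, skipping blacklisted names and capping depth."""

    def substitute(match):
        title = "Template:" + match.group(1).strip()
        if title in blacklist or title not in TEMPLATES:
            return ""  # blacklisted or unknown: drop the call
        if _depth >= max_depth:
            return "<!-- expansion depth limit reached -->"
        return expand(TEMPLATES[title], blacklist, max_depth, _depth + 1)

    return CALL.sub(substitute, text)
```

With a guard like this, a runaway pair of templates yields a visible marker
in the page instead of an exception, which also makes the offending
templates easier to spot than a blacklist built by trial and error.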
It seems the omission of the exclamation point article from
`en_US_g1g1' is fortuitous; when present, it is frequently and
absurdly embedded in other articles (e.g. wp:Cloud) that nest the common
{{!}} template.
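On Wikipedia, {{!}} conventionally stands for a literal pipe character, so
one possible guard is to special-case it before any lookup and to resolve
every other name strictly inside the template namespace. This is a sketch
under assumptions: `resolve_template`, `LITERAL_TEMPLATES`, and the `lookup`
callable are hypothetical, and "Template:" is only an assumed value for the
`templateprefix` setting that server.py reads from its configuration.

```python
# Hypothetical table of templates that reduce to a fixed literal.
LITERAL_TEMPLATES = {"!": "|"}


def resolve_template(name, lookup, templateprefix="Template:"):
    """Return the text for a template call, never falling back to an article.

    `lookup` is any callable mapping a full title to wikitext (or None).
    """
    name = name.strip()
    if name in LITERAL_TEMPLATES:
        return LITERAL_TEMPLATES[name]
    # Always prepend the prefix so that "!" can never match the article "!".
    return lookup(templateprefix + name)
```

For example, with a dict of pages, `resolve_template("!", pages.get)` gives
the pipe character even when an article titled "!" is present in `pages`.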
This strange behavior revealed another series of bugs: (1)
there's no way, even with the edit directory enabled, to delete
an article; (2) removing all of the text from an article fails
silently; and (3) changes to an article are not reflected when
it is transcluded through a template.
I wonder how much this situation would improve with upstream changes to
mwlib?
[1] I blacklisted ''Template:#time'', ''Template:Cite book'', and
''Template:Key press/core'', but note that the blacklist isn't
currently loaded from the configuration; a small patch fixes that:
{{{
diff --git a/server.py b/server.py
index 02d01a9..84287c1 100755
--- a/server.py
+++ b/server.py
@@ -467,7 +467,7 @@ class WikiRequestHandler(SimpleHTTPRequestHandler):
         self.lang = conf['lang']
         self.flang = conf['flang']
         self.templateprefix = conf['templateprefix']
-        self.templateblacklist = set()
+        self.templateblacklist = conf['templateblacklist']
         self.imgbasepath = self.flang + '/images/'
         self.wpheader = conf['wpheader']
         self.wpfooter = conf['wpfooter']
}}}
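The patch assigns conf['templateblacklist'] directly where the old code
used set(). If the configuration happens to store the blacklist as a list,
or omits the key, a small normalizing helper keeps the set semantics. This
is a sketch only: `load_template_blacklist` and `is_blacklisted` are
hypothetical names, and it assumes the setting is an iterable of full
template titles.

```python
def load_template_blacklist(conf):
    """Return the configured blacklist as a set; empty if the key is absent."""
    # Assumes an iterable of titles; a bare string would be split into
    # characters by set(), so it is wrapped in a one-element tuple first.
    value = conf.get('templateblacklist', ())
    if isinstance(value, str):
        value = (value,)
    return set(value)


def is_blacklisted(title, blacklist):
    """Case-sensitive match on the full title, e.g. 'Template:Cite book'."""
    return title in blacklist
```

Usage would look like `self.templateblacklist = load_template_blacklist(conf)`
in place of the patched line.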
--
Ticket URL: <http://dev.laptop.org/ticket/10842>
One Laptop Per Child <http://laptop.org/>
OLPC bug tracking system