Page MenuHomePhabricator

WikibasePage.templatesWithParams fails without mwparserfromhell
Closed, ResolvedPublic

Description

Following on from bug 66120, we now get a TypeError during save (and other Page write methods). The test added in the last changeset is identifying the issue:

ERROR: test_page_methods (__main__.TestPageMethods)

Test ItemPage methods inherited from superclass Page.

Traceback (most recent call last):

File "tests/wikibase_tests.py", line 418, in test_page_methods
  self.assertRaises(pywikibot.PageNotSaved, self.wdp.save)
File "/usr/local/lib/python2.6/site-packages/unittest2/case.py", line 475, in assertRaises
  callableObj(*args, **kwargs)
File ".../pywikibot/__init__.py", line 476, in wrapper
  return method(*__args, **__kw)
File ".../pywikibot/page.py", line 954, in save
  if not force and not self.botMayEdit():
File ".../pywikibot/page.py", line 883, in botMayEdit
  templates = self.templatesWithParams()
File ".../pywikibot/__init__.py", line 476, in wrapper
  return method(*__args, **__kw)
File ".../pywikibot/page.py", line 1257, in templatesWithParams
  templates = textlib.extract_templates_and_params(self.text)
File ".../pywikibot/textlib.py", line 923, in extract_templates_and_params
  return extract_templates_and_params_regex(text)
File ".../pywikibot/textlib.py", line 943, in extract_templates_and_params_regex
  thistxt = removeDisabledParts(text)
File ".../pywikibot/textlib.py", line 293, in removeDisabledParts
  return toRemoveR.sub('', text)

TypeError: expected string or buffer

That test hasnt caused an error for a while, but this test is against test.wikidata.org which has had a few other bugs needing fixing, and the server config appears to be changing frequently (getting better).

The error we see now is:

  1. Page.save method calls botMayEdit
  2. which calls templatesWithParams to look for {{nobots}}
  3. which loads 'self.text' and
  4. asks textlib.extract_templates_and_params() to extract all the templates

Page.text calls Page.get, which is supposed to return a string.

On a WikibasePage, .get returns a dict instead of a string.


Version: core-(2.0)
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=70978
https://bugzilla.wikimedia.org/show_bug.cgi?id=55150

Event Timeline

bzimport raised the priority of this task from to Needs Triage.Nov 22 2014, 3:37 AM
bzimport set Reference to bz69664.
  • Bug 70241 has been marked as a duplicate of this bug. ***

Currently I see there two “easy” fixes: One way to fix it would be in the second step to look either if text is a str/unicode.

The second would be to check if page is not a WikibasePage. But that one would break as soon as another Page subclass is introduced which also hasn't a text. Maybe it doesn't make sense that Wikibase is a subclass of Page, because the value stored internally are very different (but that now a bit late, because it's in the API now).

I'm not sure how redirects work actually. Would they return a string?

The same problem is happening on py3 win32

  • Bug 70978 has been marked as a duplicate of this bug. ***

Okay that didn't go so well: What I meant to write was, that when mwparserfromhell is not installed it uses the regex which get's a dict in all versions of Python. I was able to reproduce the same error in Python 2.7.8.

But mwparserfromhell uses the string representation of the text so it quietly tries to interpret the dictionary, and as long as there is no 'template' code in any of the data it won't return any templates.

I might have a fix ready (at least in theory).

Duh, I should have figured out it was mwparserfromhell which was the commonality here. I've re-added py3 compatibility, as installing mwparserfromhell on windows is not easy on py3, but has been solved on py2 with a packaged prebuilt dll.

Change 161201 had a related patch set uploaded by XZise:
[FIX] WikibasePage: Allow saving wikibase pages without mwpfh

https://gerrit.wikimedia.org/r/161201

Change 161201 merged by jenkins-bot:
[FIX] WikibasePage: Allow saving wikibase pages without mwpfh

https://gerrit.wikimedia.org/r/161201

jayvdb renamed this task from WikibasePage.save fails without mwparserfromhell to WikibasePage.templatesWithParams fails without mwparserfromhell.Dec 11 2014, 7:29 PM
jayvdb reopened this task as Open.
jayvdb set Security to None.

Change 179177 had a related patch set uploaded (by John Vandenberg):
Fix WikibasePage.templatesWithParams without mwpfh

https://gerrit.wikimedia.org/r/179177

Patch-For-Review

A significant cause of this problem is that WikibasePage.text is a dict rather than a string as is expected (and quite logical given the name of the property). This causes _regex to fail badly, and _mwpfh is more graceful.

Another significant issue not yet solved is that textlib.extract_templates_and_params_regex is only able to support 'wikitext' -like text, specifically syntax which uses '{{' and '}}', and we dont have a use-case for extract_templates_and_params other than wikitext templates. The MW 'templates'/'transclusions' concept also includes Lua modules and possibly other extensions, but is there any benefit in extracting parameters from lua invocations? Probably not. And I doubt that mwpfh is likely to support Lua 'transclusions'.

So even if WikibasePage.text returned the raw JSON as a string, calling extract_templates_and_params is simply wasted computation.

So, I propose we move templatesWithParams from BasePage into Page.

I don't agree that the mwpfh method is handling it more gracefully. Yes it doesn't throw an exception but simply casting it into a string and working on that isn't really what we want. I prefer the exception _regex is throwing, because if it's not a string we don't know what we can do with it and this is exactly what exceptions are supposed to do: Throw up the arms and stop working on it.

Moving it into Page seems sensible, because we don't know how templates are going to look like in Wikibase as afaik there is no similar system there.

Change 179439 had a related patch set uploaded (by John Vandenberg):
Move templatesWithParams from BasePage into Page

https://gerrit.wikimedia.org/r/179439

Patch-For-Review

Change 179439 merged by jenkins-bot:
Move templatesWithParams from BasePage into Page

https://gerrit.wikimedia.org/r/179439

It looks like we have fixed the instances of this problem within pywikibot, and there are some tests which cover the differences of the two implementations (which is T78378 ). Anything left to do here?

The main component (tests) of that changeset was done in 05a5406e. Now only a edge case performance improvement remains.