
redirectRegex throws type error
Closed, Declined, Public

Description

Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1201/
Reported by: dnessett
Created on: 2010-06-24 16:31:01
Subject: redirectRegex throws type error
Assigned to: xqt
Original description:
On MW 1.13.2, the following command throws a TypeError:

$ python add_text.py -cat:Pages_with_too_many_expensive_parser_function_calls -text:" " -summary:"Test edit:Category jog for [[:Category:Pages with too many expensive parser function calls|Pages with too many expensive parser function calls]]"

The result is:

Getting [[Category:Pages with too many expensive parser function calls]]...
Loading 2009 White House Forum on Health Reform/Related Articles...
Do you want to accept these changes? ([y]es, [N]o, [a]ll) a
Updating page [[2009 White House Forum on Health Reform/Related Articles]] via API
Loading 2010 United Kingdom general election/Related Articles...
Traceback (most recent call last):
  File "add_text.py", line 417, in <module>
    main()
  File "add_text.py", line 413, in main
    create=talkPage)
  File "add_text.py", line 201, in add_text
    text = page.get()
  File "/usr/local/src/python/pywikipedia/local_sites/wikipedia.py", line 619, in get
    self._contents = self._getEditPage(get_redirect = get_redirect, throttle = throttle, sysop = sysop)
  File "/usr/local/src/python/pywikipedia/local_sites/wikipedia.py", line 727, in _getEditPage
    m = self.site().redirectRegex().match(pagetext)
  File "/usr/local/src/python/pywikipedia/local_sites/wikipedia.py", line 6644, in redirectRegex
    pattern = r'(?:' + '|'.join(keywords) + ')'
TypeError

version.py output is:

$ python version.py
Pywikipedia [http] trunk/pywikipedia (r8311, 2010/06/22, 13:20:10)
Python 2.5.2 (r252:60911, Jan 20 2010, 21:48:48)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)]
config-settings:
use_api = True
use_api_login = True

The error is caused by a bug in the following code fragment, which starts at line 6642:

try:
    keywords = self.getmagicwords('redirect')
    pattern = r'(?:' + '|'.join(keywords) + ')'
except KeyError:
    # no localized keyword for redirects
    pattern = r'#%s' % default

getmagicwords is a one-line method that simply calls siteinfo (line 5480) with the key 'magicwords'. At line 5518, siteinfo calls getData to obtain the site data. When looking up magicwords, it executes "for entry in data[key]" at line 5527. For certain versions of MW, magicwords are not returned as part of the site data, so data[key] returns no result. Eventually, this leads to the KeyError exception at line 5538.

The bug arises because siteinfo catches the KeyError exception and returns None. By the time the call unwinds back to line 6643, the except KeyError clause at line 6645 is vacuous: the KeyError has already been caught by siteinfo.

Consequently, the statement at line 6644 executes, and it raises a TypeError because the keywords argument passed to .join() is None.
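A minimal Python 3 sketch of the failure mode described above (the report itself runs Python 2.5, but the behavior is the same). `get_magic_words` and `build_redirect_pattern` are hypothetical stand-ins, not pywikipedia code: the helper hands back None as siteinfo is described as doing, and the `except KeyError` clause cannot intercept the result, because `'|'.join(None)` raises TypeError rather than KeyError.

```python
def get_magic_words(name):
    # Hypothetical stand-in for getmagicwords: returns None instead of
    # raising KeyError, as the report describes for siteinfo.
    return None


def build_redirect_pattern(default="REDIRECT"):
    try:
        keywords = get_magic_words("redirect")
        # join() needs an iterable of strings; None is not iterable,
        # so this line raises TypeError, not KeyError.
        pattern = r"(?:" + "|".join(keywords) + ")"
    except KeyError:
        # Never reached here: nothing in the try body raises KeyError.
        pattern = r"#%s" % default
    return pattern


try:
    build_redirect_pattern()
except TypeError as exc:
    print("TypeError escaped the except KeyError clause:", exc)
```

An `except KeyError` clause only handles a KeyError raised inside its `try` body; a None return value is not an exception, so the fallback branch never runs.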


Version: compat-(1.0)
Severity: normal
See Also:
https://sourceforge.net/p/pywikipediabot/bugs/1201

Details

Reference
bz55272

Event Timeline

bzimport raised the priority of this task to Low (Nov 22 2014, 2:31 AM).
bzimport set Reference to bz55272.
bzimport added a subscriber: Unknown Object (????).

Thanks a lot for the analysis and the details. I'll fix it tomorrow.

  • assigned_to: nobody --> xqt

The bug fix in r8329 doesn't correct the problem, perhaps because I misanalyzed it. In fact, the try ... except block in siteinfo accomplishes nothing, since the KeyError occurs outside its scope. What really happens is that the exception occurs and propagates. However, the value returned on an exception is None, and that value propagates through getmagicwords to redirectRegex. For some reason I don't understand, it is not caught by the except clause there before the pattern statement executes (causing the TypeError).
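The distinction the comment above runs into can be sketched in Python 3: an exception raised inside a helper reaches the caller's `except KeyError` only if the helper lets it propagate. Once the helper catches it and returns None, the caller receives an ordinary value and there is nothing left to catch. The `site_data` dict and the functions `lookup_propagating`, `lookup_swallowing`, and `caller` are hypothetical illustrations, not the compat code.

```python
# Hypothetical site data with no 'magicwords' entry, as with older MediaWiki.
site_data = {"general": {"generator": "MediaWiki 1.13.2"}}


def lookup_propagating(key):
    # Missing key: the KeyError propagates to the caller.
    return site_data[key]


def lookup_swallowing(key):
    # Missing key: the KeyError is caught here and None is returned instead.
    try:
        return site_data[key]
    except KeyError:
        return None


def caller(lookup):
    try:
        keywords = lookup("magicwords")
    except KeyError:
        return "caught KeyError in caller"
    # Reached only when lookup returned normally (possibly with None).
    return "caller received %r" % (keywords,)


print(caller(lookup_propagating))  # caught KeyError in caller
print(caller(lookup_swallowing))   # caller received None
```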

The solution (which I have tested) is to put a try ... except block in getmagicwords and return None when a KeyError occurs. This consumes the KeyError exception and allows the r8333 change to redirectRegex to work properly. In addition, the try ... except block in siteinfo makes no sense, since a KeyError cannot occur as the result of either of its two return statements.
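A sketch of the proposed approach in Python 3, under loud assumptions: the attached patch and the r8333 change to redirectRegex are not reproduced in this report, so the `Site` class below is a hypothetical, heavily simplified stand-in, and the fallback check in `redirectRegex` is an assumption about how None is handled. The siteinfo data shape (a list of entries with `name` and `aliases`) follows the MediaWiki API `siprop=magicwords` output; unlike the report, the loop over entries lives in `getmagicwords` here for brevity.

```python
import re


class Site:
    """Hypothetical, simplified stand-in for the compat Site class."""

    def __init__(self, siteinfo_data):
        # Mimics the siteinfo API result; older wikis may lack 'magicwords'.
        self._data = siteinfo_data

    def siteinfo(self, key):
        # Plain lookup with no try/except: a missing key raises KeyError.
        return self._data[key]

    def getmagicwords(self, word):
        # Proposed fix: consume the KeyError here and return None.
        try:
            entries = self.siteinfo("magicwords")
        except KeyError:
            return None
        for entry in entries:
            if entry["name"] == word:
                return entry["aliases"]
        return None

    def redirectRegex(self):
        default = "REDIRECT"
        keywords = self.getmagicwords("redirect")
        if keywords:
            pattern = r"(?:" + "|".join(keywords) + ")"
        else:
            # no localized keyword for redirects (assumed fallback)
            pattern = r"#%s" % default
        # Simplified: the real method also matches the redirect target.
        return re.compile(pattern, re.IGNORECASE)


old_wiki = Site({})  # no 'magicwords' key, as on MW 1.13.2
print(old_wiki.redirectRegex().pattern)  # #REDIRECT
```

With a wiki whose siteinfo does list a `redirect` magic word, the aliases are joined into an alternation instead; on an older wiki the lookup degrades to the default keyword rather than raising TypeError.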

I will attach a patch against r8333 that fixes the problem.

patch against r8333 to fix the bug

The patch hasn't been applied yet, so the bug is probably still reproducible (I haven't checked, though). Since it's a compat bug, I'm marking it as low priority.

Aklapper lowered the priority of this task from Low to Lowest (Jun 5 2015, 1:41 PM).
Aklapper subscribed.

Pywikibot has two versions: Compat and Core. This task was filed about the older version, called Pywikibot-compat, which is not under active development anymore. Hence I'm lowering the priority of this task to reflect the reality. Unfortunately, the Pywikibot team does not have the manpower to retest every single bug report / feature request against the (maintained) Pywikibot code base. Furthermore, the code base of Pywikibot-compat has changed a lot compared to the code base of Pywikibot-core, so there is a chance that the problem described in this task might not exist anymore. Please help: if you have time and interest in Pywikibot, please upgrade to Pywikibot-core and add a comment to this task if the problem in this task still happens in Pywikibot-core (or directly edit the task by removing the Pywikibot-compat project and adding the Pywikibot project to this task). To learn more about Pywikibot and to get involved in its development, please check out https://www.mediawiki.org/wiki/Manual:Pywikibot/Development Thank you for your understanding.

Won't fix in compat