
Support optional capturing groups in replaceExcept
Closed, ResolvedPublic

Description

Originally from: http://sourceforge.net/p/pywikipediabot/patches/555/
Reported by: eranroz
Created on: 2012-07-03 18:35:29
Subject: Bugfix for optional capturing group
Original description:
Patch for pywikibot/textlib.py that makes the replace function (replaceExcept) support empty/optional capturing groups.
This is a bugfix for a crash that occurs when using replace.py with a regex containing an optional capturing group (e.g. AAA in the regex "bla(AAA)?bla").


Version: unspecified
Severity: normal
See Also:
https://sourceforge.net/p/pywikipediabot/patches/555

Details

Reference
bz54562

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 2:11 AM
bzimport set Reference to bz54562.

support for empty capturing group

Is this patch for bug #3539444?

See my comment at the corresponding bug tracker. It may be OK to accept this patch; in any case, I've asked for a third opinion on this matter.

I don't understand this bug. What is the traceback before this patch is applied? And what should replaceExcept() do in your special case? Could you give me a full example? You may exclude this group with "bla(?:AAA)?bla"; would this help?
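For reference, the grouping forms under discussion can be compared directly with Python's re module. This is only a minimal sketch on the sample text from this task, not code from pywikibot. The third variant, "(a?)", is an assumption added for comparison and is not proposed anywhere in the thread.

```python
import re

text = "ADMA poria"  # sample text with no "a" before "poria"

# Optional capturing group: the group does not participate, so it is None.
m_optional = re.search(r"ADMA (a)?poria", text)
print(m_optional.group(1))

# Non-capturing group: the pattern matches, but there is no group 1 to refer to.
m_noncapturing = re.search(r"ADMA (?:a)?poria", text)
print(m_noncapturing.groups())

# Optional content inside the group: the group always participates
# and captures an empty string when "a" is absent.
m_inner = re.search(r"ADMA (a?)poria", text)
print(repr(m_inner.group(1)))
```

The non-capturing form avoids the None value, but it also removes the group that a replacement such as "\1" would refer to.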

Yes, this is a bugfix for 3539444.
In short:
when running the replacement "ADMA (a)?poria" => "ADMA \1porya"
on text containing ADMA poria (with no a before poria), it crashes with the following error:
doReplacements
res = replace.ReplaceRobot.doReplacements(self,original_text)
File "D:\myBot\python\pywikipedia-nightly\replace.py", line 390, in doReplacements
allowoverlap=self.allowoverlap)
File "D:\myBot\python\pywikipedia-nightly\pywikibot\textlib.py", line 179, in replaceExcept
match.group(groupID) + \
TypeError: coercing to Unicode: need string or buffer, NoneType found

You may suggest rewriting the specific regex, and that would probably work, but it is just a workaround: a regex with an optional capturing group is valid and should work properly.
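One way to make such an expansion tolerate an unmatched group is to treat None as an empty string before concatenating. The sketch below only illustrates that idea. It is neither the submitted patch nor the actual textlib.py code, and the names GROUP_REF and expand_replacement are hypothetical.

```python
import re

# Hypothetical helper pattern: a backslash followed by a group number, e.g. \1
GROUP_REF = re.compile(r"\\(\d+)")


def expand_replacement(match, template):
    """Expand numbered group references, treating unmatched groups as ''."""
    def substitute(ref):
        value = match.group(int(ref.group(1)))
        # An optional group that did not participate yields None.
        return value if value is not None else ""
    return GROUP_REF.sub(substitute, template)


match = re.search(r"ADMA (a)?poria", "ADMA poria")
print(expand_replacement(match, r"ADMA \1porya"))  # ADMA porya
```

With text that does contain the optional "a", the same call yields "ADMA aporya", so regexes whose groups always match are unaffected.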
Longer story :) :
In Hebrew Wikipedia there is a list of regexes that are used for replacements in (almost) all articles, which is here:
http://he.wikipedia.org/wiki/%D7%95%D7%A7:%D7%A8%D7%94
The columns in the table there are:
ID | old | new | exceptText
The list is used by a C# bot implementation, which isn't active, and by a JS userscript implementation, which is used for replacements on specific pages.
I have ported it to work with replace.py, but it fails when it gets to a replacement with an optional capturing group. After my fix (locally) I ran it for 250 test edits and it worked properly without crashes.

@XZise might have fixed this recently as part of T99032.

Yes, ba6b6714 should fix this one too. There is actually a test using (x)? (which would actually fail with re.sub).
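Whether re.sub fails here depends on the interpreter version. According to the Python documentation, since version 3.5 unmatched groups in the replacement are replaced with an empty string. Earlier versions raise an "unmatched group" error instead. A small check, using the regex from this task rather than the test mentioned above:

```python
import re

pattern = r"ADMA (a)?poria"
replacement = r"ADMA \1porya"

try:
    result = re.sub(pattern, replacement, "ADMA poria")
except re.error as exc:
    # Interpreters older than Python 3.5 reject the unmatched group.
    result = None
    print("re.sub failed:", exc)
else:
    print(result)
```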

jayvdb assigned this task to XZise.
jayvdb set Security to None.