
interwiki problems in km wikipedia
Closed, Duplicate (Public)

Description

Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1382/
Reported by: Anonymous user
Created on: 2011-11-27 12:57:44
Subject: interwiki problems in km wikipedia
Original description:
It seems that interwiki bots running different Python versions read Khmer text differently. Please see http://en.wikipedia.org/w/index.php?title=Angelina_Jolie&action=history. A bot running Python 2.7.1 adds a link to km and a bot running Python 2.5.1 removes it, but the link that remains after the removal in fact points to nothing. Is there any way to fix the problem?


Version: unspecified
Severity: normal
See Also:
https://sourceforge.net/p/pywikipediabot/bugs/1382
https://bugzilla.wikimedia.org/show_bug.cgi?id=27446

Details

Reference
bz55227

Event Timeline

bzimport raised the priority of this task from to Needs Triage. (Nov 22 2014, 2:28 AM)
bzimport set Reference to bz55227.
bzimport added a subscriber: Unknown Object (????).

It seems Unicode bug #3081100 is back.

Interwiki bots running under Python 2.7.1 should just be blocked indefinitely for not paying attention to the pwb mailing list and console warnings.

I guess it is the other way round. The Python 2.5.1 bot is the one that fails, even though its Unicode test looks OK [1]. I checked these links and found that the last 3 characters are missing in the links written by the 2.5.1 bot.

[1]: http://ru.wikipedia.org/wiki/%D0%9E%D0%B1%D1%81%D1%83%D0%B6%D0%B4%D0%B5%D0%BD%D0%B8%D0%B5_%D1%83%D1%87%D0%B0%D1%81%D1%82%D0%BD%D0%B8%D0%BA%D0%B0:Volkov#Khmer_wikilinks

That means we should drop Python 2.5 support for running pywikipedia bots. This would make things easier.

Duplicate of #55256 - the cause is a buggy page name ( km:អែន​ជេ​លីណា ចូលី​ ends in \u200b zero width space ):
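A trailing zero width space is invisible when rendered, so it can help to inspect the last code point of a title directly. The following is a minimal sketch for a current Python 3 interpreter; `describe_last_char` is a hypothetical helper, and the sample title is a stand-in, not the actual km page name.

```python
import unicodedata
from urllib.parse import quote


def describe_last_char(title):
    """Hypothetical helper: describe the final character of a page title."""
    last = title[-1]
    return {
        "codepoint": "U+%04X" % ord(last),
        "name": unicodedata.name(last, "<unnamed>"),
        "category": unicodedata.category(last),
        # Percent-encoded UTF-8 form, as it would appear in a URL.
        "url_encoded": quote(last),
    }


# Stand-in title (assumption): any text followed by U+200B.
sample = "Example title\u200b"
info = describe_last_char(sample)
print(info)
```

U+200B is encoded as three bytes in UTF-8, so a title that loses it is three percent-encoded bytes shorter in a URL.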

Not #3081100, but related. This history entry shows most clearly what is happening:

(cur | prev) 00:09, 12 November 2012 ElphiBot (talk | contribs) m . . (95,243 bytes) (+10) . . (r2.7.1) (Robot: Modifying km:អែនជេលីណា ចូលី to km:អែន​ជេ​លីណា ចូលី​)

This is combined with a change in behavior -- to cite myself:

To clarify; the pywikipedia bug was caused by calling .strip() on the page
title. When working with Unicode < 4.0, this will strip the U+200B character
(python < 2.7), with Unicode > 4.0, this will *not* strip the U+200B character
(python >= 2.7).
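The difference described above can be sketched as follows. This is only an illustration for a current Python 3 interpreter, where `strip()` leaves U+200B in place; `strip_treating_zwsp_as_space` is a hypothetical function that emulates the older behavior and is not pywikipedia code, and the title is a stand-in.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE


def strip_treating_zwsp_as_space(title):
    """Hypothetical emulation of a strip() that counts U+200B as whitespace."""
    previous = None
    # Repeat until stable, so mixed runs of spaces and U+200B are removed.
    while previous != title:
        previous = title
        title = title.strip().strip(ZWSP)
    return title


# Stand-in title (assumption): a page name that really ends in U+200B.
title = "Example title" + ZWSP

kept = title.strip()                             # U+200B survives
dropped = strip_treating_zwsp_as_space(title)    # U+200B is removed

print(repr(kept))
print(repr(dropped))
```

When the page name on the wiki really ends in U+200B, as in this report, the stripped form no longer matches any page, so the resulting interwiki link points to nothing.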

*** This bug has been marked as a duplicate of bug 55256 ***
*** This bug has been marked as a duplicate of bug 55246 ***