
Run cleanupTitles.php and cleanupImages.php on any wikis with unicode whitespace in page/file names
Closed, Resolved
Public

Assigned To
None
Authored By
bzimport
Sep 19 2009, 3:57 PM
Referenced Files
F6414: bad_titles.txt
Nov 21 2014, 10:57 PM
F6415: bad_images.txt
Nov 21 2014, 10:57 PM
F6411: results.txt
Nov 21 2014, 10:57 PM
F6413: imageresults.txt
Nov 21 2014, 10:57 PM

Description

Author: herd

Description:
Per r55382 (see bug 15248, which introduced the change), many pages and files whose names contain these characters are now inaccessible on some projects.

Example: http://toolserver.org/~nikola/grep.php?pattern=%E3%80%80&lang=ja&wiki=wikipedia&ns=6

http://ja.wikipedia.org/wiki/%E3%83%95%E3%82%A1%E3%82%A4%E3%83%AB:110%E3%80%80PICT0001.JPG
becomes
http://ja.wikipedia.org/wiki/%E3%83%95%E3%82%A1%E3%82%A4%E3%83%AB:110_PICT0001.JPG
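The rewrite in the example above can be reproduced with a minimal Python sketch. It assumes that title normalization collapses each run of whitespace into a single underscore. The character set used here (ASCII space, underscore, U+00A0, U+3000) is an illustrative subset covering only the characters seen in this report, not MediaWiki's actual list, and `normalize_title` is a hypothetical helper name.

```python
import re
from urllib.parse import quote, unquote

# Illustrative subset only (assumption): ASCII space, underscore,
# U+00A0 NO-BREAK SPACE and U+3000 IDEOGRAPHIC SPACE.
WHITESPACE_RUN = re.compile("[ _\u00a0\u3000]+")


def normalize_title(title: str) -> str:
    """Collapse each run of the listed whitespace characters into one underscore."""
    return WHITESPACE_RUN.sub("_", title)


# Path component of the first example URL; %E3%80%80 is U+3000.
old_path = "%E3%83%95%E3%82%A1%E3%82%A4%E3%83%AB:110%E3%80%80PICT0001.JPG"

old_title = unquote(old_path)              # decode percent-encoding to text
new_title = normalize_title(old_title)     # U+3000 becomes "_"
new_path = quote(new_title, safe=":")      # re-encode, keeping ":" literal

print(new_path)
# prints %E3%83%95%E3%82%A1%E3%82%A4%E3%83%AB:110_PICT0001.JPG
```

A row stored under the old title no longer matches any title that can be requested, which is why the page becomes unreachable until the stored title is cleaned up.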


Version: unspecified
Severity: critical
URL: http://svn.wikimedia.org/viewvc/mediawiki?view=rev&revision=55382

Details

Reference
bz20741

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:56 PM
bzimport set Reference to bz20741.

herd wrote:

*** Bug 20738 has been marked as a duplicate of this bug. ***

Changing to critical, since data has been made inaccessible.

*** Bug 20746 has been marked as a duplicate of this bug. ***

Upping to BLOCKER. Lots of pages on frwikinews are unavailable: http://toolserver.org/~nikola/grep.php?pattern=%C2%A0&lang=fr&wiki=wikinews&ns=0

Adding Brion as CC so he can delegate someone for this.

herd wrote:

*** Bug 20747 has been marked as a duplicate of this bug. ***

Wiki.Melancholie wrote:

*** Bug 20703 has been marked as a duplicate of this bug. ***

The script is running in a screen on zwinger.

Done, invalid titles can be found with Special:PrefixIndex/Broken/

(In reply to comment #10)

> Done, invalid titles can be found with Special:PrefixIndex/Broken/

No, invalid titles can't be found with Special:PrefixIndex/Broken/

See fr-wikinews: five hundred pages are still unavailable; 10% of the project is lost.

The histories are lost, too! This is not a bug but a disaster.

Looks like the script is stopping after fixing one title. Looking...

(In reply to comment #13)

> Looks like the script is stopping after fixing one title. Looking...

Cf bug 17479. Maybe a problem with TableCleanup itself?

Pending fixes to cleanupTitles/namespaceDupes, per Tim.

On French wikisource this page [[Discussion:auteur:Charles Baudelaire]] has disappeared: [http://fr.wikisource.org/w/index.php?title=Discussion:Charles_Baudelaire&action=history the history gives this].

The cache in Google is http://209.85.229.132/search?q=cache:92baBBRt9lsJ:fr.wikisource.org/wiki/Discussion:Broken/Auteur%255Cx3aCharles_Baudelaire+wikisource+discussion+auteur+baudelaire&cd=1&hl=en&ct=clnk&client=firefox-a

When I ask for this: http://toolserver.org/~nikola/grep.php?pattern=Charles+Baudelaire&lang=fr&wiki=wikisource&ns=1

I get an answer

Auteur:Charles Baudelaire

but the link to Talk:Auteur:Charles Baudelaire does not work; I get this message:

Bad title

The requested page title is invalid, empty, or an incorrectly linked inter-language or inter-project title. It may contain one or more characters that cannot be used in titles.

This page represented many hours of work, is it possible to have it back?

Zeph

The same thing happened to another author talk page, Émile Zola, but I was luckier this time because I was able to rename the broken page like this:

15 October 2009 at 00:20 Zyephyrus (talk | contributions | block) m (5,265 bytes) (Discussion:Broken/Auteur\x3a\xc3\x89mile Zola moved to Discussion Auteur:Émile Zola) (rollback | undo)

Unfortunately, I can no longer find the earlier broken page, Discussion auteur Charles Baudelaire. Is there some way to find it?

Zeph
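For anyone trying to match a Broken/ page back to its original name, the escaped title in the log entry above can be decoded with a short Python sketch. It assumes that each `\xHH` sequence stands for one byte of the original UTF-8 title and that the `Broken/` prefix is simply prepended. Both are inferred from the single example in this thread, and `decode_broken_title` is a hypothetical helper, not part of cleanupTitles.php.

```python
import re

# Matches a literal backslash, "x", and exactly two hex digits.
HEX_ESCAPE = re.compile(rb"\\x([0-9A-Fa-f]{2})")


def decode_broken_title(title: str, prefix: str = "Broken/") -> str:
    """Strip the prefix and turn \\xHH escapes back into UTF-8 text."""
    if title.startswith(prefix):
        title = title[len(prefix):]
    # Work on bytes so that multi-byte characters (\xc3\x89) reassemble correctly.
    raw = HEX_ESCAPE.sub(
        lambda m: bytes([int(m.group(1), 16)]),
        title.encode("utf-8"),
    )
    return raw.decode("utf-8")


print(decode_broken_title(r"Broken/Auteur\x3a\xc3\x89mile Zola"))
# prints Auteur:Émile Zola
```

Decoding shows that `\x3a` is the colon and `\xc3\x89` is É, so the broken Charles Baudelaire page would presumably sit under a name of the form `Broken/Auteur\x3aCharles Baudelaire` in the talk namespace, which agrees with the Google cache URL quoted earlier.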

I just ran cleanupTitles on frwikisource and it fixed nothing, and the linked list is empty. Please REOPEN if you still find issues.

As Natalie pointed out on IRC, this bug requests that cleanupTitles be run against all wikis, which I obviously didn't do yet. This should probably happen during office hours.

Pdhanda volunteered to handle this one.

pdhanda wrote:

Dry run results from maintenance/cleanupTitles.php

This took a few hours to run. So I'll start the actual run early morning PDT on 04/14.

Attached:

pdhanda wrote:

Dry run results from maintenance/cleanupImages.php

Attached:

pdhanda wrote:

Results after running maintenance/cleanupTitles on all wikis in all.dblist

Attached:

pdhanda wrote:

Results after running maintenance/cleanupImages on all wikis in all.dblist

Attached:

pdhanda wrote:

Done for all wikis in all.dblist
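The procedure followed above (a dry run first, then the real run, for every wiki listed in all.dblist) can be sketched in Python as follows. This only builds the command lines and does not execute anything. The `--wiki` and `--dry-run` options and the one-database-name-per-line dblist format are assumptions about the maintenance scripts and should be checked against the installed version; `read_dblist` and `build_commands` are hypothetical helpers.

```python
from typing import Iterable, List


def read_dblist(lines: Iterable[str]) -> List[str]:
    """Return wiki database names, skipping blank lines and '#' comments."""
    stripped = (line.strip() for line in lines)
    return [name for name in stripped if name and not name.startswith("#")]


def build_commands(wikis: List[str], script: str, dry_run: bool) -> List[List[str]]:
    """Build one command line per wiki; the flags are assumptions, see above."""
    commands = []
    for wiki in wikis:
        cmd = ["php", f"maintenance/{script}", f"--wiki={wiki}"]
        if dry_run:
            cmd.append("--dry-run")
        commands.append(cmd)
    return commands


# Stand-in for the contents of all.dblist; only two wikis named in this thread.
sample_dblist = ["frwikinews\n", "\n", "frwikisource\n"]
wikis = read_dblist(sample_dblist)

for script in ("cleanupTitles.php", "cleanupImages.php"):
    for cmd in build_commands(wikis, script, dry_run=True):
        print(" ".join(cmd))
```

Running the dry pass over every wiki before the real pass keeps the two sets of results separate, so the changes actually made can be compared with what the dry run predicted.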