
mwdumper doesn't set page.is_redirect for borderline #redirect syntax
Closed, Resolved, Public

Description

Author: bugs.wikimedia

Description:
In at least the 20060915 frwiki dump, the is_redirect column of the page table is set to 0 for a wide variety of articles where it should be 1 (the first of them is page_id=204, then 758, 917, ...).

These articles all have in common that they existed as regular articles before being turned into redirects; when they were, the is_redirect field was apparently not updated to reflect the new state.


Version: unspecified
Severity: normal

Details

Reference
bz7497

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 9:28 PM
bzimport set Reference to bz7497.
  1. The three given examples are all missing spaces:

#redirect[[Écrivains de langue française, par ordre chronologique]]
#redirect[[calcul parasitaire]]
#REDIRECT[[Période Chosŏn]]

  2. You don't specify whether you're looking at the 'page' SQL table dump or the result of some kind of import from an XML dump.

All three pages have page_is_redirect set to 1 in the live page table, so they should also be set to 1 in the SQL dump of the page table.
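
For illustration, a redirect check that tolerates the missing space could look like the sketch below. This is not mwdumper's actual code: the class name RedirectSyntax and the exact pattern are assumptions, and it only covers the English keyword.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of mwdumper.
final class RedirectSyntax {
    // "#redirect" in any letter case at the very start of the text,
    // then optional whitespace (possibly none), then the opening "[[".
    private static final Pattern REDIRECT =
        Pattern.compile("^#redirect\\s*\\[\\[", Pattern.CASE_INSENSITIVE);

    // Returns true when the wikitext starts with a redirect directive.
    static boolean isRedirect(String wikitext) {
        return wikitext != null && REDIRECT.matcher(wikitext).find();
    }
}
```

With this pattern, both "#REDIRECT [[Foo]]" and the space-less "#redirect[[calcul parasitaire]]" are treated as redirects, while text that merely mentions "#redirect" later in the page is not.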

If you are looking at the results of an XML import, please specify:

a) exactly which file you're importing
b) exactly how you're importing it
c) exact version of MediaWiki

bugzilla.wikipedia.org wrote:

Hi Brion

  1. Yes, I've seen the space problem in many other examples; it might be the root cause.
  2. Here are more details: I'm using MediaWiki 1.8-svn. I have imported the 060915-pages-articles XML dump (after an mwdumper -> sql 1.5 translation with no strange options) and all .sql dumps *except* page.sql, of course. Mwdumper was from svn too.

Update: you're right, mwdumper is the culprit.
I've just translated frwiki-20060929-pages-articles.xml (this time using the precompiled mwdumper.jar at http://download.wikimedia.org/tools/) with the command line
java -server -jar mwdumper.jar --progress=50000 --output=file:frwiki-20060929-pages-articles.sql --format=sql:1.5 frwiki-20060929-pages-articles.xml

and I can read in the generated SQL:
INSERT INTO page (...) (204,0,'Auteurs_par_ordre_chronologique','',0,0,0,RAND(),DATE_ADD('1970-01-01', INTERVAL UNIX_TIMESTAMP() SECOND),2334654,69) (...)

Assigning to brion. Probably created before new mwdumper issues were auto-assigned.

Created attachment 7262
use redirect-tag to set page_is_redirect field

Since r53271 the XML export has an extra tag.

The attached patch uses that tag to set the page_is_redirect field of the page table.
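
The idea can be sketched as follows. This is not the attached patch: the class RedirectTagReader is hypothetical, and it assumes the export marks redirect pages with a redirect element nested inside the page element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not taken from mwdumper or the patch.
final class RedirectTagReader {
    // Returns one flag per <page>, in document order:
    // true when the page contains a <redirect> element.
    static List<Boolean> redirectFlags(String xml) {
        List<Boolean> flags = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inPage;
            private boolean sawRedirect;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("page".equals(qName)) {
                    inPage = true;
                    sawRedirect = false;
                } else if (inPage && "redirect".equals(qName)) {
                    sawRedirect = true;
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("page".equals(qName)) {
                    flags.add(sawRedirect);
                    inPage = false;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse export XML", e);
        }
        return flags;
    }
}
```

Relying on the tag instead of the revision text means the flag does not depend on how the #redirect line is spelled or spaced.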

attachment bug7497.patch ignored as obsolete

*** Bug 31906 has been marked as a duplicate of this bug. ***

*** Bug 38919 has been marked as a duplicate of this bug. ***

From bug [[bug:38919]] (not sure whether the same bug or not, since this bug is from 2006)

The problem was in the code:

https://github.com/bcollier/mwdumper/blob/master/src/org/mediawiki/importer/Revision.java

It only looks for the English word "Redirect", while in non-English wikis it might be localized ("Alih" in Indonesian, for example).
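
One way to handle localized keywords when checking the text is sketched below. The class LocalizedRedirectMatcher is hypothetical and not part of mwdumper; the keyword list is assumed to be supplied by the caller.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of mwdumper.
final class LocalizedRedirectMatcher {
    private final Pattern pattern;

    // keywords: redirect words without the leading '#', e.g. "redirect", "alih".
    LocalizedRedirectMatcher(List<String> keywords) {
        // Quote each keyword so it is matched literally, then join as alternatives.
        String alternation = keywords.stream()
            .map(Pattern::quote)
            .collect(Collectors.joining("|"));
        this.pattern = Pattern.compile(
            "^#(?:" + alternation + ")\\s*\\[\\[",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    // Returns true when the wikitext starts with any configured redirect keyword.
    boolean isRedirect(String wikitext) {
        return wikitext != null && pattern.matcher(wikitext).find();
    }
}
```

A matcher built with only "redirect" would still miss "#ALIH [[...]]", which is the behaviour described above; adding "alih" to the list makes it match.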

Same patch, minor changes

Minor changes, in hopes of progress.

Attached:

(In reply to comment #7)
Same problem; this patch resolves it too, because it sets page_is_redirect differently, based on the redirect tag.