
all-titles file doesn't include namespace prefix
Closed, Resolved, Public

Description

The file *-all-titles.gz does not seem to include the namespace prefix,
for example:

http://dumps.wikimedia.org/commonswiki/20131121/commonswiki-20131121-all-titles.gz

It seems that the current process (currently: http://git.wikimedia.org/blob/operations%2Fdumps.git/11e9b23b4bc76bf3d89e1fb32348c7a11079bd55/xmldumps-backup%2Fworker.py#L4043 )
just runs a simple query:
query="select page_title from page;"

and the namespace is not part of page_title.

This makes the file nearly useless, as one cannot tell a title in the main namespace from a title in another namespace, or tell apart titles from two different non-main namespaces.
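
To illustrate the ambiguity, here is a minimal sketch using an in-memory SQLite table as a stand-in for the MediaWiki page table. The sample rows, the helper names fetch_titles and demo, and the order by clauses are assumptions made for the example; they are not taken from worker.py. Namespace 0 is the main namespace, 6 is File and 14 is Category.

```python
import sqlite3


def fetch_titles(conn, with_namespace):
    """Return rows from a page table, with or without the namespace column."""
    if with_namespace:
        # Selecting page_namespace as well makes every row unambiguous.
        query = ("select page_namespace, page_title from page "
                 "order by page_namespace, page_title;")
    else:
        # Same shape as the query quoted above (plus an ordering for the demo).
        query = "select page_title from page order by page_title;"
    return conn.execute(query).fetchall()


def demo():
    """Build a tiny stand-in page table and run both queries against it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table page (page_namespace integer, page_title text)")
    conn.executemany(
        "insert into page values (?, ?)",
        [(0, "Paris"), (14, "Paris"), (6, "Paris.jpg")],  # made-up sample rows
    )
    return fetch_titles(conn, False), fetch_titles(conn, True)


if __name__ == "__main__":
    plain, with_ns = demo()
    print(plain)    # "Paris" appears twice, with no way to tell the rows apart
    print(with_ns)  # each row carries its namespace number
```

With only page_title selected, the article "Paris" and the category "Paris" collapse into two identical lines; adding page_namespace keeps them distinct.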


Version: unspecified
Severity: normal

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 2:30 AM
bzimport set Reference to bz57739.

A workaround is to use stub-meta-current, which contains the prefixed titles, but that file is considerably larger because it also carries extra information.
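
A rough sketch of that workaround, assuming the stub file is gzip-compressed XML in which each page has a title element holding the prefixed title. The function name iter_prefixed_titles is made up for this example, and the exact layout of the stub file should be checked against a real dump before relying on it.

```python
import gzip
import xml.etree.ElementTree as ET


def iter_prefixed_titles(path_or_file):
    """Yield the text of every <title> element in a gzip-compressed XML dump.

    Accepts a file name or an already opened binary file object.
    """
    with gzip.open(path_or_file, "rb") as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            # The dump root declares an XML namespace, so tags arrive as
            # "{uri}title"; strip that part before comparing.
            local_name = elem.tag.rsplit("}", 1)[-1]
            if local_name == "title":
                yield elem.text
            elif local_name == "page":
                # Release the finished page's children to keep memory low.
                elem.clear()
```

Usage would look like `for title in iter_prefixed_titles("commonswiki-20131121-stub-meta-current.xml.gz"): print(title)`, where the file name is only a guess at the naming pattern. Streaming with iterparse avoids loading the whole stub file into memory, which matters given how much larger it is than all-titles.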

Change 318901 had a related patch set uploaded (by ArielGlenn):
dump namespace along with page title for allpagetitle dump

https://gerrit.wikimedia.org/r/318901

Change 318901 merged by ArielGlenn:
dump namespace along with page title for allpagetitle dump

https://gerrit.wikimedia.org/r/318901

Fix deployed and should take effect for the next dump run, i.e. tomorrow.

Files are being generated properly, closing.