Export: add namespace to page
Closed, Resolved, Public

Description

Please add the namespace of a page to the export output.

It is very hard to determine the namespace from a title alone. It would be better if MediaWiki gave the correct namespace number inside the export/dump.

The openPage function in Export.php has the full row of the page table, so the namespace information is available there. Please add that information to the output.

Thanks.


Version: unspecified
Severity: enhancement

Details

Reference
bz25542

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 21 2014, 11:20 PM)
bzimport set Reference to bz25542.
bzimport added a subscriber: Unknown Object (MLST).

Created attachment 7740
add ns to title-tag

The attached patch adds a "ns" attribute to the title tag.

The dump preamble includes the full list of content namespace prefixes in the <siteinfo> section, which makes it a pretty straightforward string comparison. Adding a namespace attribute on the <page>'s <title> wouldn't hurt terribly, but would be redundant and could cause confusion if xml dump files are edited in some way such that the namespaces no longer match.

Would require update to the .xsd schema file as well.

May require updates to other dump processing infrastructure to maintain it through bulk data dumps, otherwise it's not of much use.
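For readers following the thread, here is a minimal sketch of reading the namespace list from the dump preamble, as the comment above describes. The sample preamble is hand-written and abbreviated for illustration; real dumps declare an XML namespace (xmlns) on the root element and list many more entries, and the function name read_namespace_prefixes is hypothetical, not part of MediaWiki.

```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated preamble (an assumption for illustration only).
# Real dumps carry an xmlns on <mediawiki>, which would require qualified
# tag names when searching.
SAMPLE_PREAMBLE = """
<mediawiki>
  <siteinfo>
    <namespaces>
      <namespace key="-1">Special</namespace>
      <namespace key="0" />
      <namespace key="1">Talk</namespace>
      <namespace key="10">Template</namespace>
    </namespaces>
  </siteinfo>
</mediawiki>
"""


def read_namespace_prefixes(xml_text):
    """Return a dict mapping namespace prefix -> numeric namespace key."""
    root = ET.fromstring(xml_text.strip())
    prefixes = {}
    for node in root.iter("namespace"):
        # The main namespace has no prefix, so its element text is empty.
        name = node.text or ""
        prefixes[name] = int(node.get("key"))
    return prefixes


PREFIXES = read_namespace_prefixes(SAMPLE_PREAMBLE)
```

Building the table once per dump keeps the per-page work to a single lookup.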

(In reply to comment #2)
> The dump preamble includes the full list of content namespace prefixes in the
> <siteinfo> section, which makes it a pretty straightforward string comparison.
> Adding a namespace attribute on the <page>'s <title> wouldn't hurt terribly,
> but would be redundant and could cause confusion if xml dump files are edited
> in some way such that the namespaces no longer match.
> Would require update to the .xsd schema file as well.
> May require updates to other dump processing infrastructure to maintain it
> through bulk data dumps, otherwise it's not of much use.

String comparison can be expensive (depending on the language), and you also have to check for a colon after the namespace prefix to avoid a false match (a page named "Template Test" is not a template).

If you find that this is redundant and that each dump user can determine the namespace themselves, then WONTFIX this; it is only an idea to make working with the dump easier.
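
The colon check mentioned in this reply can be sketched as follows. This is only an illustration under stated assumptions: the PREFIXES table is hypothetical (in practice it would come from the <siteinfo> section of the dump), namespace_of is a made-up helper name, and namespace aliases, case normalization, and underscore/space handling are ignored.

```python
# Hypothetical prefix table, as it might be read from <siteinfo>.
PREFIXES = {"Talk": 1, "User": 2, "Template": 10}


def namespace_of(title, prefixes=PREFIXES):
    """Return (namespace key, title without prefix) for a full page title."""
    # Split at the first colon only; a prefix counts only if a colon follows it.
    prefix, colon, rest = title.partition(":")
    if colon and prefix in prefixes:
        return prefixes[prefix], rest
    # No colon, or an unknown prefix: treat the page as main namespace (0).
    return 0, title
```

With this table, "Template:Test" resolves to namespace 10, while "Template Test" and "Foo:Bar" both stay in the main namespace. Splitting once and doing a dictionary lookup avoids comparing the title against every prefix in turn.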