Unique identity constraints for XML dump format schema
Closed, Resolved, Public

Description

Author: elvstone

Description:
The XML Schema for the XML dump format used by MediaWiki has no constraints on
the page and revision identifiers. This can be easily fixed with the attached
patch. Enforcing uniqueness in the XSD makes sense, since I think that some
parsers capable of Schema validation can work more efficiently when the
constraints are present. Another reason is that (however unlikely) other
software that outputs files in this format is not obliged to keep the IDs
unique, according to the XSD in its current form.
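
To make concrete what such a constraint would reject, here is a minimal Python sketch that reports duplicate page and revision ids in a dump. It is not the attached patch, which is not reproduced on this page. The function name `duplicate_ids`, the `SAMPLE` document, and the choice to treat revision ids as unique across the whole file (rather than per page) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local(tag):
    """Strip any '{namespace}' prefix so the check ignores the schema version."""
    return tag.rsplit("}", 1)[-1]


def children(elem, name):
    """Direct children of elem whose local tag name equals name."""
    return [c for c in elem if local(c.tag) == name]


def duplicate_ids(xml_text):
    """Return (duplicate page ids, duplicate revision ids) as sorted lists."""
    root = ET.fromstring(xml_text)
    page_ids, rev_ids = Counter(), Counter()
    for page in children(root, "page"):
        for id_elem in children(page, "id"):
            page_ids[(id_elem.text or "").strip()] += 1
        for rev in children(page, "revision"):
            for id_elem in children(rev, "id"):
                # Assumption: revision ids are counted across the whole document.
                rev_ids[(id_elem.text or "").strip()] += 1

    def dups(counts):
        return sorted(k for k, n in counts.items() if n > 1)

    return dups(page_ids), dups(rev_ids)


# Hypothetical dump: page id 1 and revision id 11 each appear twice.
SAMPLE = """<mediawiki>
  <page><title>A</title><id>1</id>
    <revision><id>10</id></revision>
    <revision><id>11</id></revision>
  </page>
  <page><title>B</title><id>1</id>
    <revision><id>11</id></revision>
  </page>
</mediawiki>"""

print(duplicate_ids(SAMPLE))  # (['1'], ['11'])
```

A schema-validating parser would report the same kind of violation at validation time, without a separate pass like this one.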


Version: unspecified
Severity: normal

Details

Reference
bz4220

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 8:59 PM
bzimport set Reference to bz4220.
bzimport added a subscriber: Unknown Object (MLST).

elvstone wrote:

Adds unique identity constraints for page/id and page/revision/id

Attached:

elvstone wrote:

Wow, activity on my over 3.5 year old bug. I even changed my real name in the meantime ;) Is the bug still applicable?

elvstone wrote:

Heh. I just checked. Yes patch still applies. Not that I care too much about this bug anymore, but could someone apply it?

sumanah wrote:

Elvis, I'm sorry for the very, very late response. I'm asking developers to look at your patch soon.

I've looked at it and it looks good to me. Should this apply to only version 0.6 of the XSD or should it apply to all versions of XSD?

elvstone wrote:

Heh, better late than never.

Diederik: I'm not sure and I'm on the train atm, but I guess it would make sense to enforce it in all versions. But 5 years is a long time, can't remember which version I made the patch against. Will check when I get home.

Cheers.

This patch looks good to me also. Might as well apply it as far back as we can; if someone is producing old schema dumps that violate these constraints they have bigger problems on their hands than this enforcement change.

Hmm with one exception I guess, if someone produces XML files with multiple entries for a given pageid (but each entry contains different revision ids), that could be a problem.
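
The exception described above can be sketched in Python as follows. It finds page ids that occur in more than one `<page>` element and lists the revision ids under each occurrence. The function name `split_pages` and the `SPLIT` document are hypothetical, and the sketch assumes the uniqueness constraint on page ids applies across the whole file.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def split_pages(xml_text):
    """Map each page id used by several <page> elements to the revision ids
    found under each of those elements, in document order."""
    root = ET.fromstring(xml_text)
    seen = defaultdict(list)
    for page in root:
        if local(page.tag) != "page":
            continue
        page_id = None
        rev_ids = []
        for child in page:
            if local(child.tag) == "id":
                page_id = (child.text or "").strip()
            elif local(child.tag) == "revision":
                for sub in child:
                    if local(sub.tag) == "id":
                        rev_ids.append((sub.text or "").strip())
        if page_id is not None:
            seen[page_id].append(rev_ids)
    # Keep only the page ids that were split across multiple entries.
    return {pid: revs for pid, revs in seen.items() if len(revs) > 1}


# Hypothetical dump: page 5 appears twice, each time with a different revision.
SPLIT = """<mediawiki>
  <page><title>A</title><id>5</id><revision><id>50</id></revision></page>
  <page><title>A</title><id>5</id><revision><id>51</id></revision></page>
</mediawiki>"""

print(split_pages(SPLIT))  # {'5': [['50'], ['51']]}
```

In a file like this every revision id is distinct, so only the constraint on page ids would be violated.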

sumanah wrote:

Elvis, can you respond to Ariel's suggestion? And did you have a chance to check what version(s) it should apply to?

sumanah wrote:

Elvis: Thanks again for the patch. Are you interested in getting developer access so you can submit it directly to our Git source control system?

https://www.mediawiki.org/wiki/Developer_access

I have submitted the change in Gerrit for review, see https://gerrit.wikimedia.org/r/8889

Patch credited to Elvis Stansvik.

Once reviewed and merged in master, we will have to update the publicly facing URL at http://www.mediawiki.org/xml/export-0.7/ . This is covered by bug 37111.

Change merged. Will deploy export file.

elvstone wrote:

Thanks! If I need a change in MW by Christmas 2018 I'll let you know. (Just kidding!) :)