
Mediawiki XML file version 0.4 does not validate against its own DTD file
Closed, Resolved, Public

Description

Author: rodrigosprimo

Description:
Patch to fix issues on Mediawiki DTD

Hi,

I'm trying to validate a MediaWiki XML export against its schema (XSD) file, but the validation fails. I'm using PHP's DOMDocument and haven't tried other tools, so I can't be sure whether the problem lies in PHP or in the MediaWiki schema, but I suspect MediaWiki is the more likely cause.

I'm testing with the attached script (testMediawikiXml.php). When I try to validate the XML from http://en.wikipedia.org/wiki/Special:Export/Train I get the following error:

Element '{http://www.w3.org/2001/XMLSchema}element': The attribute 'name' is required but missing

This error can be fixed by commenting out line 119 of http://www.mediawiki.org/xml/export-0.4.xsd. That line reads:

<element minOccurs="0" maxOccurs="1" type="mw:DiscussionThreadingInfo" />

I guess the best solution is to add the "name" attribute, but I haven't investigated further and I don't know enough about XML Schema to say what its value should be.
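The rule behind this first error is that an element declaration nested inside a type needs either a "name" or a "ref" attribute. A minimal sketch of that check follows, written in Python with only the standard library rather than the PHP used in the attached script. The fragment is a hypothetical, cut-down stand-in for the real schema: only the unnamed declaration is copied from the report, and `PageType`, `title`, and the `unnamed_elements` helper are illustrative assumptions. It does not perform schema validation; it only lists the declarations a validator would reject for this reason.

```python
import xml.etree.ElementTree as ET

# Namespace of XML Schema itself, in ElementTree's {uri}tag notation.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical, cut-down stand-in for export-0.4.xsd. Only the second
# <element> line is taken from the report; the rest is illustrative.
XSD_FRAGMENT = """\
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:mw="http://www.mediawiki.org/xml/export-0.4/">
  <complexType name="PageType">
    <sequence>
      <element name="title" type="string" />
      <element minOccurs="0" maxOccurs="1" type="mw:DiscussionThreadingInfo" />
    </sequence>
  </complexType>
</schema>
"""


def unnamed_elements(xsd_text):
    """Return the attributes of <element> declarations lacking both name and ref."""
    root = ET.fromstring(xsd_text)
    return [
        dict(el.attrib)
        for el in root.iter(XS + "element")
        if "name" not in el.attrib and "ref" not in el.attrib
    ]


if __name__ == "__main__":
    for attrs in unnamed_elements(XSD_FRAGMENT):
        print("element declaration without name/ref:", attrs)
```

Run against a local copy of the schema, a helper like this would point at the offending declaration without having to comment anything out first.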

When I run the script again, another error occurs:

Element '{http://www.mediawiki.org/xml/export-0.4/}namespace', attribute 'case': The attribute 'case' is not allowed.

To fix this one, I added the following line below line 92:

<attribute name="case" type="string" />
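The second error comes down to an attribute that appears in the exported XML but is not declared on the corresponding schema type. The sketch below, in standard-library Python, compares the two. It is an illustration under stated assumptions rather than a validator: the `NamespaceType` fragment is a hypothetical, cut-down type definition, the sample `<namespace>` element and its attribute values are illustrative, and `undeclared_attributes` is a helper invented for this example. Only the added `<attribute name="case" type="string" />` line comes from the report.

```python
import xml.etree.ElementTree as ET

# Namespace of XML Schema itself, in ElementTree's {uri}tag notation.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical, cut-down type definition before the change.
TYPE_BEFORE = """\
<complexType xmlns="http://www.w3.org/2001/XMLSchema" name="NamespaceType">
  <simpleContent>
    <extension base="string">
      <attribute name="key" type="integer" />
    </extension>
  </simpleContent>
</complexType>
"""

# The same type with the line from the report added.
TYPE_AFTER = TYPE_BEFORE.replace(
    '<attribute name="key" type="integer" />',
    '<attribute name="key" type="integer" />\n'
    '      <attribute name="case" type="string" />',
)

# Illustrative instance element; the attribute values are assumptions.
SAMPLE = '<namespace key="0" case="first-letter" />'


def undeclared_attributes(type_xml, instance_xml):
    """Return instance attribute names that the type does not declare."""
    declared = {a.get("name") for a in ET.fromstring(type_xml).iter(XS + "attribute")}
    used = set(ET.fromstring(instance_xml).attrib)
    return sorted(used - declared)


if __name__ == "__main__":
    print("before:", undeclared_attributes(TYPE_BEFORE, SAMPLE))
    print("after: ", undeclared_attributes(TYPE_AFTER, SAMPLE))
```

With the extra declaration in place, the sample element no longer carries any attribute the type does not know about, which matches the effect described for the real schema.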

After those two changes to the schema file, I'm able to validate the XML file. I'm attaching the script I'm using to test and a patch with the changes I made to the schema. I think the second change is OK, but the first issue needs a proper fix (instead of just commenting out the line).

Thanks, Rodrigo.


Version: unspecified
Severity: normal

Details

Reference
bz25753

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 11:21 PM
bzimport set Reference to bz25753.

rodrigosprimo wrote:

Script used to test the validation

Tomasz, weren't you the one that last messed around with this?

These are both fine in trunk and 1.16wmf4.

However, the XSD file in /usr/local/apache/common/docroot/mediawiki/xml needs updating and I'm not sure how to sync files from there to the cluster.

Tweaking to be a shell request bug.

File has been sync'd to the servers and purged from squid. Should validate now.