Author: robertb
Description:
While converting the Simple English Wikipedia XML dump to an SQL file (i.e. without inserting into a database on the fly), I get a Java exception after the conversion has partially succeeded. Here is the tail of the output:
...
8,740 pages (41.603/sec), 376,000 revs (1,789.777/sec)
8,778 pages (41.713/sec), 377,000 revs (1,791.493/sec)
8,801 pages (41.713/sec), 378,000 revs (1,791.554/sec)
Exception in thread "main" java.lang.IllegalArgumentException: Invalid contributor
at org.mediawiki.importer.XmlDumpReader.closeContributor(Unknown Source)
at org.mediawiki.importer.XmlDumpReader.endElement(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
at org.apache.xerces.parsers.AbstractXMLDocumentParser.emptyElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanStartElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(Unknown Source)
at org.mediawiki.importer.XmlDumpReader.readDump(Unknown Source)
at org.mediawiki.dumper.Dumper.main(Unknown Source)
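The trace reaches closeContributor by way of Xerces' emptyElement, which suggests the parser hit a self-closing <contributor ... /> element, i.e. one with no <username> or <ip> child. To check whether the dump really contains such elements, something like the following could be run against it. This is only a minimal sketch: the class EmptyContributorScan and its method countEmptyContributors are hypothetical names of mine and are not part of mwdumper, and it assumes the dump's elements are named contributor, username and ip.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical diagnostic, not part of mwdumper: counts <contributor>
// elements that contain neither a <username> nor an <ip> child.
public class EmptyContributorScan {

    static class Handler extends DefaultHandler {
        int emptyContributors = 0;
        private boolean inContributor = false;
        private boolean sawIdentity = false;

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            if (qName.equals("contributor")) {
                inContributor = true;
                sawIdentity = false;
            } else if (inContributor && (qName.equals("username") || qName.equals("ip"))) {
                sawIdentity = true;
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            if (qName.equals("contributor")) {
                if (!sawIdentity) {
                    emptyContributors++;
                }
                inContributor = false;
            }
        }
    }

    // Streams the XML through a default (non-namespace-aware) SAX parser,
    // so qName is the plain element name.
    static int countEmptyContributors(InputStream in) throws Exception {
        Handler handler = new Handler();
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return handler.emptyContributors;
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path to the uncompressed XML dump.
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(countEmptyContributors(in)
                    + " contributor element(s) without username or ip");
        }
    }
}
```

A non-zero count would at least confirm that the input contains contributor elements with no identity in them; it would not by itself prove that this is what closeContributor rejects.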
Versions:
OS: Linux 2.6.17-1.2142 (Fedora Core 4)
Java: 1.6.0_13-b03
mwdumper: 2008-04-13
Data: Simple English Wikipedia dump of 2009-03-30
Invocation:
java -Xmx512m -Xms128m -XX:NewSize=32m -XX:MaxNewSize=64m -XX:SurvivorRatio=6 -XX:+UseParallelGC -XX:GCTimeRatio=9 -XX:AdaptiveSizeDecrementScaleFactor=1 -server -jar mwdumper.jar --format=sql:1.5 simplewiki-20090330-pages-meta-history.xml > simplewiki-20090330-pages-meta-history.sql &
What is going wrong here, and how can I fix it?
Version: unspecified
Severity: normal