I tried to run the GUI version of the newest revision (r60229) of mwdumper under Java 6 Update 17 on an Intel Core i7 with 3.25 GB of RAM and Windows XP SP3, and it failed with this error:
Exception in thread "Thread-8" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Unknown Source)
at java.lang.StringCoding.safeTrim(Unknown Source)
at java.lang.StringCoding.access$300(Unknown Source)
at java.lang.StringCoding$StringEncoder.encode(Unknown Source)
at java.lang.StringCoding.encode(Unknown Source)
at java.lang.String.getBytes(Unknown Source)
at com.mysql.jdbc.StringUtils.getBytes(StringUtils.java:493)
at com.mysql.jdbc.StringUtils.getBytes(StringUtils.java:603)
at com.mysql.jdbc.ByteArrayBuffer.writeStringNoNull(ByteArrayBuffer.java:544)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1638)
at com.mysql.jdbc.Connection.execSQL(Connection.java:2972)
at com.mysql.jdbc.Connection.execSQL(Connection.java:2902)
at com.mysql.jdbc.Statement.execute(Statement.java:529)
at org.mediawiki.importer.SqlServerStream.writeStatement(SqlServerStream.java:25)
at org.mediawiki.importer.SqlWriter.flushInsertBuffer(SqlWriter.java:195)
at org.mediawiki.importer.SqlWriter.bufferInsertRow(SqlWriter.java:184)
at org.mediawiki.importer.SqlWriter15.writeRevision(SqlWriter15.java:68)
at org.mediawiki.importer.PageFilter.writeRevision(PageFilter.java:67)
at org.mediawiki.dumper.ProgressFilter.writeRevision(ProgressFilter.java:56)
at org.mediawiki.importer.XmlDumpReader.closeRevision(XmlDumpReader.java:346)
at org.mediawiki.importer.XmlDumpReader.endElement(XmlDumpReader.java:204)
at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(Unknown Source)
According to the Java docs, the default maximum heap size is 1/4 of the physical memory, that is, around 800M on this machine. Since a single revision is at most 2M, there is no reason for mwdumper to need that much space. (It was run on the huwiki full-history dump, writing directly to the database.)
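To check the heap figure rather than rely on the docs, something like the following could be run with the same JVM. This is only a sketch: the class name `HeapCheck` and its helper methods are made up for illustration and are not part of mwdumper. It assumes the one-quarter-of-physical-memory default, which is the rule that yields roughly 800M for 3.25G of RAM, and it uses the standard `Runtime.maxMemory()` call to print what the JVM will actually allow.

```java
public class HeapCheck {
    // Assumption: default max heap is one quarter of physical memory.
    // This is the rule of thumb under discussion, not something read from the JVM.
    public static long quarterOfPhysical(long physicalBytes) {
        return physicalBytes / 4;
    }

    // Convert a byte count to whole mebibytes (rounds down).
    public static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // 3.25 GiB, the amount of RAM on the reporting machine.
        long physical = (long) (3.25 * 1024 * 1024 * 1024);

        System.out.println("Expected default max heap (MiB): "
                + toMiB(quarterOfPhysical(physical)));

        // What this JVM instance will actually try to use at most.
        System.out.println("Actual max heap (MiB): "
                + toMiB(Runtime.getRuntime().maxMemory()));
    }
}
```

If the second number comes out far below the first, the JVM is not using the assumed default, and the heap may simply be too small for the insert buffer being flushed in the trace above.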
Version: unspecified
Severity: enhancement
OS: Windows XP
Platform: PC