There are several readers available for MediaWiki XML dumps (.xml.bz2). Some read the native format directly, and others transform the data first.
All of them suffer from the lack of an index into this data, which is a major barrier to development and to adoption by users.
The simplest remedy would be to register a dump filter which creates a text file mapping article title -> byte offset. If this is done during the backup process, there is almost no resource overhead.
I can write a patch if other developers agree this would be a worthwhile pursuit.
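To make the proposal concrete, here is a rough sketch of the kind of mapping such a filter might produce. It is only an illustration in Python, not the proposed patch: the names build_index and write_index are hypothetical, the offsets are assumed to be positions in the uncompressed XML stream, and it assumes the usual dump layout in which each <title> element sits on a single line.

```python
import html
import re

TITLE_RE = re.compile(rb"<title>(.*?)</title>")


def build_index(stream):
    """Yield (title, byte_offset) pairs from a binary XML dump stream.

    byte_offset is where the enclosing <page> element starts, counted in
    bytes of the *uncompressed* stream (an assumption of this sketch).
    """
    offset = 0          # bytes consumed before the current line
    page_start = None   # offset of the most recent unmatched <page>
    for line in stream:
        pos = line.find(b"<page>")
        if pos != -1:
            page_start = offset + pos
        if page_start is not None:
            match = TITLE_RE.search(line)
            if match:
                # Titles are XML-escaped in the dump (e.g. &amp;).
                title = html.unescape(match.group(1).decode("utf-8"))
                yield title, page_start
                page_start = None
        offset += len(line)


def write_index(stream, out):
    """Write one 'title<TAB>offset' line per page to a text file object."""
    for title, page_offset in build_index(stream):
        out.write(f"{title}\t{page_offset}\n")


# Hypothetical usage against an existing dump (file names are placeholders):
#
#   import bz2
#   with bz2.open("dump.xml.bz2", "rb") as dump, \
#           open("dump.index.txt", "w", encoding="utf-8") as out:
#       write_index(dump, out)
```

Because bz2 is not seekable by uncompressed offset without decompressing from the start, a usable index would probably also need to record compressed block boundaries. That part is left out of this sketch.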
Version: unspecified
Severity: enhancement