
enable importing of edits from newly released historical English Wikipedia database dumps to the current enwiki database
Closed, Resolved, Public

Description

Most edits from February 2002 onwards have survived intact in the Wikipedia database, but some have not, mostly due to deletion-related accidents. I've compiled a list of these at:
http://en.wikipedia.org/wiki/User:Graham87/Page_history_observations

I'd like to be able to use the newly released historical database dumps to re-add some of these missing edits. Ideally, these database dumps should be placed on read-only wikis, like the Nostalgia Wikipedia (see bug 20280), from which any admin could use Special:Import to retrieve the necessary edits.


Version: unspecified
Severity: enhancement
URL: http://en.wikipedia.org/wiki/Wikipedia:Village_pump_(proposals)/Archive_85#Restoring_long-lost_edits_using_the_newly_released_historical_database_dumps

Details

Reference
bz34465

Event Timeline

bzimport raised the priority of this task to Low. (Nov 22 2014, 12:19 AM)
bzimport set Reference to bz34465.
bzimport added a subscriber: Unknown Object (MLST).

On my todo list is to find all broken revisions on all projects and wade through the dumps to see what's recoverable. It's pretty far down on the list though :-(

(In reply to comment #0)

I'd like to be able to use the newly released historical database dumps to
re-add some of these missing edits. Ideally, these database dumps should be
placed on read-only wikis, where any admin can use Special:Import to retrieve
the necessary edits like the Nostalgia Wikipedia (see bug 20280).

Can't you "just" check them by hand and use Special:Import with importupload right (see [[m:Importer]])?
If the XML is too big you could also import it on some test wiki, delete the pages you don't need, re-export all the rest and import the XML.

Ah, remember that [[m:NWI]] are masters of such XML dump jobs. ;-)

(In reply to comment #2)

Can't you "just" check them by hand and use Special:Import with importupload
right (see [[m:Importer]])?
If the XML is too big you could also import it on some test wiki, delete the
pages you don't need, re-export all the rest and import the XML.

Hmmm, interesting idea (especially getting the help of the small-wiki importers from comment 3!). It's just that the dumps I'm most interested in (from May 2003 and 2002) aren't in XML format at all; they're in the formats used by old versions of MediaWiki and UseModWiki. They do carry a warning that they shouldn't be loaded wholesale into the latest version of MediaWiki, after all...

Hm, I'm not sure I understand what dumps you're talking about then: do you mean the content Tim Starling recovered from the diff txt files of UseModWiki, and which Reagle reworked a bit? http://reagle.org/joseph/blog/social/wikipedia/10k-redux.html
That may be tricky indeed. :/

Yes, it would be nice to restore the dumps from August 2001 that Tim Starling recovered, but for the purposes of this bug I'm specifically talking about these dumps:
http://dumps.wikimedia.org/archive/

I've finally bitten the bullet and imported the January 2003 dump into a local copy of MediaWiki (not an easy task for someone with almost no MySQL experience!). I did so with the help of MediaWiki 1.3 (using a skeleton database) and MediaWiki 1.5 (using its updater). I have requested import rights at Steward requests/Permissions on Meta so that I can import some of the needed revisions. My request is here:
http://meta.wikimedia.org/wiki/Steward_requests/Permissions#Miscellaneous_requests

What happened to this?

I got the necessary rights and ended up doing quite a lot with them, but there's still more to do:
https://en.wikipedia.org/wiki/User:Graham87/Page_history_observations
https://en.wikipedia.org/wiki/User:Graham87/SHA-1

Thanks. So, do you want to close this report because the "enable importing" part has been fulfilled, or do you prefer to replace "enable" with "complete" and keep it open, assigned to you?

Graham87 claimed this task.

The first option would probably be best. I've done it, I think.