This is a bug I'm tracking down and fixing. I'm filing it here so I have a place for notes and something to refer to.
CGZ compression was first committed in October 2004, r5940. In December 2004, r6640, this bug was discovered and a temporary fix put in place. Apparently nobody submitted it to Bugzilla at the time.
The issue was that the deletion UI was blind to the compression scheme, and was causing CGZ blobs and pointers to be moved into the archive table. Undeletion would move them back. Pointers to deleted rows cannot work and will give you an error message, so the text of these pointers is unreadable. If the whole article was undeleted, the CGZ blob would get a different old_id, which means that the pointers still don't work.
If the article was partially undeleted, then you could have pointers which point to deleted rows.
However, undeleted CGZ rows would still give you their default text, which left them open to subsequent irreversible corruption by recompressTracked.php, which may have deleted some of these CGZ blobs, replacing them with a pointer to the primary text only.
The subsequent fixes (r6640, r8983) only fixed the text corruption at the source (i.e. deletion). Apparently no script was run to fix corrupted archive rows or undeleted text rows.
Some archive rows even have pointers to external storage, apparently moved in from old/text via the same bug.
The reason this is coming up now is that a fair few revisions are either still accessible (CGZ default text) or inaccessible but recoverable (CGZ pointers), and they are now at risk of being lost permanently due to recompressTracked.php.
The basic plan of action is to compile a list of content hashes in affected CGZ blobs, and to match them up with broken pointers by comparing those content hashes.
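As a rough illustration of the matching step, here is a minimal Python sketch. It is not the actual recovery script. It assumes each CGZ blob can be unpacked into its list of item texts, that items are identified by the MD5 of their text, and that each broken pointer carries the hash it is looking for. The names content_hash, index_blobs and match_pointers, and the sample data, are hypothetical.

```python
from collections import defaultdict
import hashlib


def content_hash(text: bytes) -> str:
    # Assumption: items are identified by the MD5 of their text.
    return hashlib.md5(text).hexdigest()


def index_blobs(blobs):
    """Map each content hash to the ids of the blobs that contain it."""
    index = defaultdict(list)
    for blob_id, texts in blobs.items():
        for text in texts:
            index[content_hash(text)].append(blob_id)
    return index


def match_pointers(pointers, index):
    """Split broken pointers into recoverable (hash found) and lost."""
    recoverable, lost = {}, []
    for row_id, wanted_hash in pointers:
        candidates = index.get(wanted_hash)
        if candidates:
            # Any blob holding the same hash holds the same text.
            recoverable[row_id] = candidates[0]
        else:
            lost.append(row_id)
    return recoverable, lost


# Hypothetical sample data: blob id -> unpacked item texts.
blobs = {
    101: [b"first revision", b"second revision"],
    202: [b"third revision"],
}
# Hypothetical broken pointers: (row id, hash the pointer refers to).
pointers = [
    (7, content_hash(b"second revision")),
    (8, content_hash(b"third revision")),
    (9, content_hash(b"never stored")),
]

recoverable, lost = match_pointers(pointers, index_blobs(blobs))
print(recoverable)
print(lost)
```

Matching on the hash rather than on old_id is the point of the exercise: the old_id stored in a pointer is stale after undeletion, but the hash still identifies the text.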
I may be able to take this opportunity to normalise the entire archive table, by converting archive rows to the MW 1.5+ format, with a non-null ar_text_id, and blank ar_text and ar_flags. This will free up core database space and allow the deleted text to be recompressed.
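The normalisation could look roughly like the following Python sketch, run here against an in-memory SQLite database rather than the real schema. It only shows the shape of the conversion: copy the inline text and flags into a new text row, then point the archive row at it and blank the inline columns. The ar_id key and the normalise_archive function are hypothetical, the table definitions are heavily simplified, and the sketch assumes any broken CGZ pointers have already been repaired.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for the real tables; ar_id is a hypothetical key.
    CREATE TABLE "text" (old_id INTEGER PRIMARY KEY, old_text BLOB, old_flags TEXT);
    CREATE TABLE archive (ar_id INTEGER PRIMARY KEY, ar_text BLOB,
                          ar_flags TEXT, ar_text_id INTEGER);
    INSERT INTO "text" (old_id, old_text, old_flags) VALUES (5, 'kept text', 'utf-8');
    -- Old-format row: text stored inline, no ar_text_id.
    INSERT INTO archive (ar_text, ar_flags, ar_text_id)
        VALUES ('deleted text', 'utf-8', NULL);
    -- Row already in the 1.5+ format.
    INSERT INTO archive (ar_text, ar_flags, ar_text_id) VALUES ('', '', 5);
""")


def normalise_archive(conn):
    """Move inline archive text into the text table; return rows converted."""
    rows = conn.execute(
        "SELECT ar_id, ar_text, ar_flags FROM archive WHERE ar_text_id IS NULL"
    ).fetchall()
    for ar_id, ar_text, ar_flags in rows:
        # Copy the text and its flags unchanged into a new text row.
        cur = conn.execute(
            'INSERT INTO "text" (old_text, old_flags) VALUES (?, ?)',
            (ar_text, ar_flags),
        )
        # Point the archive row at the new text row and blank the inline copy.
        conn.execute(
            "UPDATE archive SET ar_text_id = ?, ar_text = '', ar_flags = '' "
            "WHERE ar_id = ?",
            (cur.lastrowid, ar_id),
        )
    conn.commit()
    return len(rows)


converted = normalise_archive(conn)
print(converted)
print(conn.execute("SELECT ar_id, ar_text_id FROM archive ORDER BY ar_id").fetchall())
```

Copying old_flags along with the text matters, since the flags say how the stored text is encoded or compressed.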
Version: 1.4.x
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=34925