Page MenuHomePhabricator

Regular deletion of revisions deleted with rev_deleted breaks links in log entries
Closed, ResolvedPublic

Description

Author: FT2.wiki

Description:
With admin level RevisionDelete now viewable and not the same as suppression in practice, we're seeing revisions with a wide range of combinations of delete/restore and RevDelete activity.

A major problem has come to light. The differences in how deleted and undeleted revisions are referenced, discussed in bug 18104, are causing RevisionDelete to break badly all over the place. Examples:

  1. Links in the suppression log often now error out. For example in my suppression 00:21 October 25 2009 (and several following it), "change visibility" now clicks through to non-working links. the links that once worked, no longer work, presumably since the revision(s) have since been deleted/undeleted.
  1. Diffs no longer show page history properly. Revisions exist where something's missing in history but it's not clear what. For example the first suppression on the deleted article [[Chris Fournier]], by IP user 124.171.155.209. Clicking "diff" on this revision shows a diff that appears to delete data but not add any new data (nothing on right hand side), and a revision text that includes clear defamatory material that (apparently) was neither introduced in that diff, nor existed in the previous revision.
  1. Revisions that were moved from "normal" to "deleted" after some RevDelete actions and before others, change their designation from "revision" to "archive" or something. RevDelete logs and page history becomes very hard to follow as the links fail and as revisions are split between multiple pages and sections within pages.

Sorry to throw this at the developers, but this is a ghastly mess; it's getting hard to track down what happened when something comes up on a wiki.

I think trying to keep two parallel schemes for "deleted revisions" and "all other revisions" is to blame.

The solution seems to be to ensure deleted revisions can also be looked up (and DIFFed) by their original revision_id even after deletion, which would mean RevisionDelete and other functionality wouldn't have to account for a possible change of reference id at some future time.

Suggestion for fixing:

  • Ensure that revision_id's will transparently link through to the correct deleted revision if the revision is deleted, preventing breakage of links when a revision is deleted or restored. This allows revisions to be referenced by one fixed identifier in RevisionDelete regardless of later deletion/restoration.
  • Modify RevDel and all functions that add page information to the logs, so that all links related to a revision are referenced by the revision's revision_id alone, whether or not the revision is deleted.

A big job but it's got to the point that we need revisions accessible via the same reference id (possibly keeping timestamp as a 2nd identifier for ease and compatibility) whether they are deleted or not.


Version: unspecified
Severity: critical

Details

Reference
bz21279

Event Timeline

bzimport raised the priority of this task from to High.Nov 21 2014, 10:54 PM
bzimport set Reference to bz21279.

FT2.wiki wrote:

(2 line summary: The interaction of RevDel and traditional delete, which involves multiple systems of revision identification, is causing major problems for page referencing, and leading to widespread issues of broken links and "lost" page history, and apparently missing or incorrect diffs.)

happy.melon.wiki wrote:

Fundamentally, "deleting" stuff by moving it from the revision table to an archive table is not the most durable way of doing it. Using another bit of the rev_deleted bitfield to indicate "deleted" would allow us to 'delete' just by setting that bit, and duplicate traditional oversight (bug20290) by setting the bit along with the "hide from admins" bit. And it would facilitate selective deletion, etc etc. But of course, that would involve rewriting our entire deletion and page management architecture (would have to stop deleting rows from the page table on deletion, but instead mark them with a page_deleted). Any volunteers? :D

FT2.wiki wrote:

Making the transparent mapping "usual revision_id/url" -> "deleted revision_id/url" work, either as an additional field or a failover catch in software, might work reasonably well as a short term fix. Revision_id would then at least be guaranteed to locate all edits/links/revisions/diffs whether deleted or not, until a more complete fix was possible.

FT2.wiki wrote:

See also bug 21312 for a possible option.

FT2.wiki wrote:

Confirming for user info how this breaks in a separate case. If RevDelete was used on multiple revisions, and some of those revisions later change deleted/undeleted state, then an attempt to click on the RevDelete action's link in the delete log to view the revisions concerned, will show (eg) 5 revisions in the log, but only 3 revisions will be listed when clicked. Locating the other 2 actions the admin acted upon could be easy or almost impossible, depending on the history and deleted history of the page.

FT2.wiki wrote:

(See also bug 18780 comment #12 for a few extra details)

I have a deleted_pages table local branch. It can get around this problem, the O(N) delete/restore problem, and page suppression issues.

Dusted it off today, with a few merge conflict fixes. I don't see anything changing soon (which I why stopped bothering with the branch after a while).

Aaron, time to merge into trunk?

RevisionDelete was enabled now; what will happen with that urgent bug request concerning RevDelete?

Assigning to Aaron.

Aaron: The concise version of this bug is that revision deletion log entries contain links to the revisions in question, which fail when those revisions have been deleted using the standard action=delete method.

FT2.wiki wrote:

... also in the much less common situation where RevDelete is applied to a deleted revision which becomes undeleted in future.

(Scenarios can arise where RevDelete would be applied to a deleted revision, but obviously not nearly as common)

Steps to reproduce (on a local trunk wiki):

  1. Create a page with two revisions.
  2. Use the single-revision deletion (rev_deleted column) interface to delete the first revision of the page.
  3. Delete the page.
  4. Undelete the page.
  5. View Special:Log/delete for the page.
  6. Click on one of the "more", "diff" or "change" links.

Expected result:
The interface shows the diff, suppression details or visibility editing interface.

Actual result:
An error message is shown.

FT2.wiki wrote:

Trivial example page (4 revisions, 2 revdeleted actions) exists at: http://en.wikipedia.org/wiki/Special:Undelete/User:FT2/test4 showing the errors generated by the 2 different modes of failure of log entries (revdeleted then delete, or revdeleted then undelete).

The same bug happens with page moves.

Please have a look at http://en.wikipedia.org/w/index.php?title=Special:Log&dir=prev&offset=20100331100647&limit=1&type=delete&user=DerHexer Clicking on "more..." takes you to http://en.wikipedia.org/w/index.php?title=Special:RevisionDelete&target=User%3ADerHexer%2Ftest2&type=revision&ids=364727852 which tells me "No matching items in log." That only happened because I've moved my page: http://en.wikipedia.org/w/index.php?title=User:DerHexer/test3&action=history Nobody can see now who did that revision deletion. That's a transparency issue which should be fixed as fast as possible.

FT2.wiki wrote:

Updated test - fix needed:

1/ revision with obvious deletions is not showing log entries.

Look at the first RevDel action at

http://en.wikipedia.org/w/index.php?title=Special:Log&page=User%3AFT2%2Ftest7

The link for "visible revisions" leads to a RevDel page with 2 log actions (correct). The link for "deleted revisions" leads to a RevDel page with none (error).

FT2.wiki wrote:

At the same time, can the RevDel page indicate clearly if the revisions listed are "Visible revisions" or "Deleted revisions". It would help.

Example page:

http://en.wikipedia.org/w/index.php?title=Special:RevisionDelete&target=User%3AFT2%2Ftest7&type=revision&ids=366963568

Simple fix, if the text at the top states "Selected [visible | deleted] revision(s) of {{PAGENAME}}"

and under "Other/additional reason" a line in bold "These revisions are currently [visible | deleted]".

Thanks!

FT2.wiki wrote:

Visible revisions have checkboxes and a "del/undel selected revisions" for multiple revisions in an action.

Can [[Special:Undelete]] have a similar button added? It already has the checkboxes.

A "del/undel selected revisions" button would be useful (eg to remove a RevDelete redaction on multiple deleted revisions), or to prevent the data being visible if deletion is reversed (eg page deletion is undone)

FT2.wiki wrote:

2nd fail case, see:

http://en.wikipedia.org/w/index.php?title=Special:Log&page=User%3AFT2%2Ftest8

Reproduction:

  1. Create page with 3 revisions
  2. Delete page
  3. RevDelete a field from middle revision (eg edit summary)
  4. Undelete page

Page delete log for the RevDelete action fails with "bad links".

FT2.wiki wrote:

Last, when the above fail case is fixed, can you check it also fixes the following (minor) fail case which I believe may be a byproduct of it.

I considered what happens if RevDelete is used on on old deleted revision without a rev_id and the revision is later restored - the issue being no rev_id so would log links then fail.

I theorized that if such a revision were selective undeleted and then redeleted, the selctive undeletion assigns a rev_id, so once redeleted this wouldn't be a problem, providing a quick workaround. In other words:

  1. Deleted revision without rev_id is redacted
  2. Revision is undeleted
  3. Deletion log links for the RevDelete action may fail.

WORKAROUND:

  1. Deleted revision without rev_id is undeleted and gains a rev_id, then redeleted
  2. Revision is _then_ redacted and undeleted at any future time
  3. Deletion log links for the RevDelete action should work correctly as the redaction will be linked to a correct permanent rev_id.

Can you check this at the same time? I suspect the issue at comment #19 may be affecting this so it can't be tested yet. I appreciate it's a pure edge case, but it's possible we may see some revisions moving from deletion to revdelete and it would be nice to check the log links issue wouldn't resurface or be an issue.

happy.melon.wiki wrote:

(In reply to comment #18)

Visible revisions have checkboxes and a "del/undel selected revisions" for
multiple revisions in an action.

Can [[Special:Undelete]] have a similar button added? It already has the
checkboxes.

A "del/undel selected revisions" button would be useful (eg to remove a
RevDelete redaction on multiple deleted revisions), or to prevent the data
being visible if deletion is reversed (eg page deletion is undone)

These are not directly related to this bug; please file (or find, I suspect this may have been raised before) separate bugs for them.

(In reply to comment #20)

I theorized that if such a revision were selective undeleted and then
redeleted, the selctive undeletion assigns a rev_id, so once redeleted this
wouldn't be a problem, providing a quick workaround. In other words:

You could check whether this definitely occurs with an old revision of the sandbox or something. I can imagine it actually making even more of a mess than that if ar_rev_id is null.

FT2.wiki wrote:

(In reply to comment #21)

1/ I'm using this bug as a tracking bug for the few last issues Werdna needs to check, to put the "RevDelete log links" issues to bed at this point. It sounds like he'll be tackling the last few together so this makes it easier for him to check them off, rather than opening several bugs for matters he will address in common. of course any tagged as "wontfix" or "later" may need extra bugs opened, but I think on this one, let's see how it goes before creating paperwork that may never be needed.

2/ I've tested the bug at #20 and tested revDelete on old deleted revisions of this kind. It appears to fail persistently. However it also fails on deleted revisions that do have an ar_rev_id. So at the moment it's impossible to tell whether in fact that's the problem and (by itself) old deleted revisions with ar_rev_id = null are actually not a big problem.

So I've noted this as an outstanding issue, and asked Werdna, when he's fixed RevDelete on deleted revisions that are later undeleted, to go back and recheck this one and see if that fixes this point as well. It's easier that way than waiting, then filing a new bug afterwards.

happy.melon.wiki wrote:

(In reply to comment #22)

(In reply to comment #21)

1/ I'm using this bug as a tracking bug for the few last issues Werdna needs to
check, to put the "RevDelete log links" issues to bed at this point.

The "button to mass-revdel deleted revisions is missing" issue is not related to this "RevDelete log links is borked" bug, nor is it something which will be fixed as a side-effect. They're totally unconnected, and it just clutters things to lump them together.

2/ I've tested the bug at #20 and tested revDelete on old deleted revisions of
this kind. It appears to fail persistently. However it also fails on deleted
revisions that do have an ar_rev_id. So at the moment it's impossible to tell
whether in fact that's the problem and (by itself) old deleted revisions with
ar_rev_id = null are actually not a big problem.

Fair point :D

So I've noted this as an outstanding issue, and asked Werdna, when he's fixed
RevDelete on deleted revisions that are later undeleted, to go back and recheck
this one and see if that fixes this point as well. It's easier that way than
waiting, then filing a new bug afterwards.

I'm not saying that "log links might also break in *this* way" or "log links also break in *that* way" should be separate bugs; it's certainly reasonable to keep them here. I'm saying that *totally separate issues* should be kept, well, separate... :D

Log entries also break if the page was moved. You can only see what happened if you're looking for the “revision author” in Special:Log/suppress. It can neither be found by “user” nor by “title”.

FT2.wiki wrote:

Anyone working on this - see bug 18780 which contains further discussion that could be relevant, or shed light on it.

The original case for this bug appears to have pretty much been resolved with a workaround:

  1. Adjust visibility for some revision of a page that exists
  2. Delete the page (or delete the page, then undelete all but that revision -- in any case, end up with the rev you adjusted deleted)
  3. Look at the log entries for the page

The link on the log page automatically changes from &type=revision&ids=<rev_id> to &type=archive&ids=<ar_timestamp> when this happens.

If you had saved the old link, then it won't work. But the link that's on the live page will update correctly, and it'll flip back again if you undelete the revision.

It doesn't seem to work the other way; if you adjusted visibility while deleted, then undelete it, the log entry doesn't update to point to the live revision.

And of course if you rename the page, all bets are off. Archived revisions are referenced by timestamp due to ar_rev_id being a new addition a few years ago, so -krrrrk- can't translate that.

I think the nicest thing to do would be to use &type=revision&ids=<rev_id|ar_rev_id> and save &type=archive for those with no saved ar_rev_id. These will be transportable across page moves and deletion/undeletion state changes, and no need to worry about them updating properly.

FT2.wiki wrote:

(In reply to comment #26)

If you had saved the old link, then it won't work. But the link that's on the
live page will update correctly

That's the problem. The wiki is full of discussions of past matters, also active users often have to keep a personal note of any diffs related to possible abuse wherefuture resolution might be needed (this is expected/directed behavior: users are told to keep negative diffs off-wiki unless a case of some kind is imminent and often a case can only be opened after considerable "history" has happened). These diffs are exactly the kind that no longer work.

FT2.wiki wrote:

A fairly straightforward workaround that would solve this (in almost all cases) without schema change is at bug 18104 comment 12.

Bug 18104 comment 12 seems like it would make it easier to (temporarily?) undelete particular deleted revisions that you have a rev id for but not the title+timestamp. Could be used to undelete something, tweak the link, then re-delete it?

Could help, but I still think it'll be better to just use the revision ID references whenever they're available on this side (the rev deletion / change visibility links).

Will see if I can throw that together...

FT2.wiki wrote:

(In reply to comment #29)

Could be used to undelete something, tweak the link, then re-delete it?

Could help, but I still think it'll be better to just use the revision ID
references whenever they're available on this side (the rev deletion / change
visibility links).

Will see if I can throw that together...

The aim is "old links shouldn't break under deletion", rather than "the old linked pages can be traced and then the old pages referring to them all edited to contain updated links". I _think_ that's what you're saying....? :)

Bug 18104 comment 12 is saying that because deleted/undeleted revisions still have their oldid (the few old historical exceptions can be given an id as a once-off task) which is not lost during deletion, _therefore_:

1/ If an oldid is not found in current revisions, Mediawiki can quickly check the _deleted_ revisions table before reporting "revision not found". This would guarantee that links and diffs would always work transparently regardless of deletion/undeletion.

2/ Special pages such as [[Special:Undelete]] and [[Special:DeletedContributions]] could provide links based on oldid rather than page/timestamp, and these oldid based links would also be guaranteed to always work regardless of future deletion/undeletion, unlike page/timestamp based links which could fail.

3/ Because a revision can be found whether current or deleted, diffs can be generated between any two revisions, even if one or both revisions is subsequently deleted/undeleted. At present they cannot.

4/ Traceability despite deletion is made simpler. At present it can be difficult.

5/ Any future migration to using revision id as the sole means to locate a revision would be made easier but no schema change would be needed for the above.

Speaking specifically to my plan for links in rev delete logs:

For any revision that's either existing in the revision table, or has an ar_rev_id value stored, we will output permanently working links in the form &type=revision&ids=<rev_id|ar_rev_id>.

This will apply to both old and new log entries, though old links that have been _copied_ will obviously remain in their original form.

New and old &type=revision&ids=<rev_id|ar_rev_id> links will always work in the future, by looking up each rev via both the revision & archive tables.

Old &type=archive&ids=<ar_timestamp> links should remain similarly workable for later-undeleted revisions *unless the page is renamed*, in which case we won't be able to reliably find it in the revision table.

Diff view, special:undelete, etc can benefit from similar improvements but they don't rely on each other directly.

FT2.wiki wrote:

Should a separate request (dependant on this one) be set up for modifying the links that are displayed in other pages, where those are page/timestamp based?

I'd say yes -- they'll be related in the underlying issue/goal, but keeping them separate will keep our eyes from glazing over as we run away from the giant bug report :D

Special:Undelete I think will need more work for a serious overhaul, while it looks like it should more isolated here.

FT2.wiki wrote:

Done. See bug 28819 ("Deleted revision links should use "rev_id" not
"page/timestamp") and bug 28820 ("Diffs using rev_id should work when one or
both revisions are deleted").

FT2.wiki wrote:

And bug 28821 ("Special:Undelete to use same radio button styling for diffs as
history pages") - forgot that.

(In reply to comment #33)

Special:Undelete I think will need more work for a serious overhaul, while it
looks like it should more isolated here.

If you rework Special:Undelete, try to make it use Title->userCan() instead of User->isAllowed
That's something that has been waiting for months.

This should cover most of the initial basics on trunk:

r87803 - fix for FakeResultWrapper

r87804 - the main stuff: allow Special:RevisionDelete to accept revision IDs for deleted revs, and keep using those same URLs in log entries.

r87805 - quick fix to Special:Undelete to use the revision IDs on links to Special:RevisionDelete, which should cover most cases generating those links. (Will still use type=archive for old items that have no rev id saved)

r87806 - quick fix for diff view to at least link to Special:Undelete on the error page for missing diff revisions. Needs more polishing (via bug 28820)

Not closing the bug yet as it'll need a little more tweaking to be fully nice.

(In reply to comment #37)

r87805 - quick fix to Special:Undelete to use the revision IDs on links to
Special:RevisionDelete, which should cover most cases generating those links.
(Will still use type=archive for old items that have no rev id saved)

What about links from Special:DeletedContributions?

Good catch -- looks like that needs update as well. Probably a mass-search for revDeleteLink and revDeleteLinkDisabled will turn up more cases to merge -- should make sure they're all treated consistently.

Assigning back to brion. I have no clue around Revision Delete, I just ran some searches :-)

(In reply to comment #38)

(In reply to comment #37)

r87805 - quick fix to Special:Undelete to use the revision IDs on links to
Special:RevisionDelete, which should cover most cases generating those links.
(Will still use type=archive for old items that have no rev id saved)

What about links from Special:DeletedContributions?

See r93860.

See r93860.

Does that mean this can be closed now?