Page MenuHomePhabricator

Add primary key and log fields to archive table; perhaps also use original page_id when page is restored?
Open, MediumPublicFeature

Description

Author: leucosticte512

Description:
I offer the following proposed/possible changes for discussion, before I write the implementing patch:

(1) Add to archive table these fields:

  • ar_id (primary key)
  • ar_log_id (the log_id of the deletion action)

(2) What about these fields? Is there a need for them or can we just rely on JOINs with the logging table?

  • ar_log_timestamp (time of deletion)
  • ar_log_user (deleting user id)
  • ar_log_user_text (deleting user name)
  • ar_log_comment (deletion reason)

(3) Would it be better to restore deleted pages to the same page id they had originally? That page id is stored in ar_page_id.


Version: 1.20.x
Severity: enhancement

Details

Reference
bz39675

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 1:11 AM
bzimport set Reference to bz39675.

Thank you for these ideas.

To submit a patch, if you don't already have a developer access, please require one at https://www.mediawiki.org/wiki/Developer_access

You'll then be able to submit your code to our code review system (we use Gerrit).


(1) Could you give a sample of a query with a performance gain for these new indexes?

(2) The fields you noted doesn't seem to be currently in use in core. I haven't checked extensions.

(3) Could you document pro/cons to use old/new page id?


[ Adding schema-change keyword. Priority: normal. Confirming the bug.

Bug assigned to bug submitter, as he offers to prepare a patch. ]

leucosticte512 wrote:

(1) The reason for adding a primary key is because, according to Brion's comments at https://www.mediawiki.org/w/index.php?title=Proposed_Database_Schema_Changes&diff=48189&oldid=48188 , not having a primary key "is generally annoying, and means that we refer to deleted revisions with a (namespace,title,timestamp) tuple which isn't guaranteed to be unique. (Dupe timestamps can occur in a page's history due to funny bugs or history merging from two formerly separate pages; very quick consecutive operations may also produce dupes on second-resolution timestamps.)"

(2) I'm sorry, I didn't make clear that I was seeking to open discussion on whether we should add those four fields (ar_log_timestamp, ar_log_user, ar_log_user_text, and ar_log_comment), which do not presently exist; and/or add ar_log_id. Do we need all that stuff from the logging table, or will it suffice to just store that ar_log_id so that row in the logging table can be looked up by that primary key (log_id)? The purpose is largely to help sort out different sets of deleted revisions from the same page; this is important if we're going to implement suggestion (3) below, because we won't be able to distinguish them by ar_page_id anymore, since going forward they'll all have the same ar_page_id. (We'll still need to keep ar_page_id for backward compatibility.)

(3) Pros: It seems logical that page IDs shouldn't change, if we can avoid having them change, for the same reasons that good URLs shouldn't change. If anything references those page IDs, it's helpful to have it continue referencing the same page when it's restored. Cons: I haven't thought of any yet.

[ Adding Asher Feldman as cc, to get performance advice. ]

What changes would people like to see to Special:Undelete and ApiQueryDeletedRevs? For example, do we want to add some filters or sorts to Special:Undelete or some more modes and parameters to ApiQueryDeletedRevs, to search through revisions by such criteria as deleting user, deletion timestamp, deletion summary, deletion action (i.e. logid), etc.?

We could use more discussion on http://www.mediawiki.org/wiki/Requests_for_comment/Page_deletion. I wouldn't invest to much time into that table unless redoing it entirely.

Follow-up to comment 4: Also, Special:DeletedContributions.

Change 51675 merged by jenkins-bot:
Add archive, externallinks PK

https://gerrit.wikimedia.org/r/51675

Change 92433 had a related patch set uploaded by saper:
Add ar_id and el_id sequences for PostgreSQL

https://gerrit.wikimedia.org/r/92433

Change 92433 merged by jenkins-bot:
Add ar_id and el_id sequences for PostgreSQL

https://gerrit.wikimedia.org/r/92433

All patches have been merged. Is there more work to do here (if yes: What exactly?), or can this ticket be closed as RESOLVED FIXED?

Some of the other proposals, such as ar_log_id, didn't make it into the final patch (I can't remember why; it may be that I decided it was too much stuff to put in one change). But those other proposals could become a moot point depending on what ultimately happens with the page deletion RFC.

The "perhaps also use original page_id when page is restored?" part was done in T28123.

Aklapper changed the subtype of this task from "Task" to "Feature Request".Feb 4 2022, 12:24 PM
Aklapper removed a subscriber: leucosticte.