
Change of page_id when moving makes life difficult
Closed, Declined (Public)

Description

Author: swalling

Description:
When you move a page, the page_id changes, and the new ID doesn't show up in the log. While end users can see the old and new page names and namespaces, this logging makes it hard to programmatically track page moves.

I see two solutions here:

  • Don't change the page_id when moving
  • Log the new and the old page_id when moving

Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=57084

Details

Reference
bz57079

Event Timeline

bzimport raised the priority of this task to Medium (Nov 22 2014, 2:41 AM).
bzimport set Reference to bz57079.
bzimport added a subscriber: Unknown Object (MLST).

swalling wrote:

Blergh. Should read: "makes it hard to programmatically track page moves."

I don't think the page ID changes; that would require an O(N) update of rev_page (one row per revision of the page), which does not exist AFAIK.

It doesn't change based on my local tests:

php> echo Title::newFromText('After move')->getArticleId();
320

php> echo Title::newFromText('After move')->getArticleId();
320

The same result is confirmed using wgArticleId. Can someone provide a way to reproduce this? Otherwise, I will close as WORKSFORME.

I agree. It seemed strange to me too. I just realized that the logging table could be storing the page ID of the redirect that was created. I'll try to demo the issue.

Get the last page move in the logging table on enwiki:

SELECT log_namespace, log_title, log_page, log_params FROM logging WHERE log_action = "move" and log_type = "move" AND log_namespace = 0 ORDER BY log_id DESC LIMIT 1;

+---------------+-----------+----------+------------------------------------------------------------------------+
| log_namespace | log_title | log_page | log_params                                                             |
+---------------+-----------+----------+------------------------------------------------------------------------+
|             0 | Chappo    | 41083072 | a:2:{s:9:"4::target";s:14:"Chappo (album)";s:10:"5::noredir";s:1:"0";} |
+---------------+-----------+----------+------------------------------------------------------------------------+
1 row in set (0.29 sec)

That means "Chappo" was moved to "Chappo (album)". But the recorded page_id (log_page) points to the redirect page that was created at the old title:

select page_title, page_namespace from page where page_id = 41083072;

+------------+----------------+
| page_title | page_namespace |
+------------+----------------+
| Chappo     |              0 |
+------------+----------------+
1 row in set (0.04 sec)

If you view the content of the article at that page ID, you should see:

#REDIRECT [[Chappo (album)]]

{{R from move}}
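Tracking moves programmatically from rows like the one above also means decoding log_params, which holds an array in PHP's serialize() format. PHP code can simply call unserialize(); outside PHP, a small decoder is needed. The following is a minimal Python sketch, not a complete implementation: php_unserialize is a hypothetical helper name, it handles only arrays, strings, integers, booleans and null, and it assumes the stored text is UTF-8. The s:<n>: prefix counts bytes rather than characters, so the sketch slices bytes first and decodes afterwards.

```python
# Minimal sketch: decodes only the subset of PHP serialize() output seen in log_params.


def php_unserialize(data: str):
    """Decode a PHP-serialized value (arrays, strings, ints, bools, null)."""
    buf = data.encode("utf-8")  # assumption: the stored text is UTF-8
    value, pos = _parse(buf, 0)
    if pos != len(buf):
        raise ValueError(f"trailing data at byte {pos}")
    return value


def _parse(buf: bytes, pos: int):
    """Parse one value starting at byte offset pos; return (value, next offset)."""
    kind = buf[pos:pos + 1]
    if kind == b"N":  # null: N;
        return None, pos + 2
    if kind in (b"i", b"b"):  # integer i:<n>; or boolean b:<0|1>;
        end = buf.index(b";", pos)
        number = int(buf[pos + 2:end])
        return (bool(number) if kind == b"b" else number), end + 1
    if kind == b"s":  # string s:<byte length>:"<bytes>";
        colon = buf.index(b":", pos + 2)
        length = int(buf[pos + 2:colon])
        start = colon + 2  # skip the ':' and the opening quote
        text = buf[start:start + length].decode("utf-8")
        return text, start + length + 2  # skip the closing quote and ';'
    if kind == b"a":  # array a:<count>:{<key><value>...}
        colon = buf.index(b":", pos + 2)
        count = int(buf[pos + 2:colon])
        pos = colon + 2  # skip ':{'
        result = {}
        for _ in range(count):
            key, pos = _parse(buf, pos)
            result[key], pos = _parse(buf, pos)
        return result, pos + 1  # skip '}'
    raise ValueError(f"unsupported type {kind!r} at byte {pos}")


params = php_unserialize(
    'a:2:{s:9:"4::target";s:14:"Chappo (album)";s:10:"5::noredir";s:1:"0";}'
)
print(params)
```

Applied to the Chappo row, this prints {'4::target': 'Chappo (album)', '5::noredir': '0'}.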

My proposed solution is to store the page_id of the *moved page* in log_page and, optionally, the page_id of the created redirect in the serialized array in log_params.

This solution is slightly problematic, since existing code could be relying on log_page holding the page_id of the redirect, though I can't imagine why anyone would look for it there.
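The proposal can be sketched against an in-memory SQLite database. This is a simplified model under stated assumptions, not MediaWiki's actual schema or move code: the page and logging tables keep only the columns used in this thread, make_db, move_page and php_serialize_strings are hypothetical helpers, and "6::redirect_id" is an invented parameter key for the redirect's page_id. The page is renamed in place, so its page_id (and every rev_page pointing at it) stays the same; log_page records the moved page, and the redirect's ID goes into log_params.

```python
import sqlite3

# Simplified schema: only the columns used in this thread (assumption, not MediaWiki's DDL).
SCHEMA = """
CREATE TABLE page (
    page_id INTEGER PRIMARY KEY,
    page_namespace INTEGER NOT NULL,
    page_title TEXT NOT NULL,
    page_is_redirect INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE logging (
    log_id INTEGER PRIMARY KEY,
    log_type TEXT, log_action TEXT,
    log_namespace INTEGER, log_title TEXT,
    log_page INTEGER, log_params TEXT
);
"""


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def php_serialize_strings(params: dict) -> str:
    """Serialize a flat dict as a PHP array of strings, the shape seen in log_params."""
    parts = []
    for key, value in params.items():
        for item in (str(key), str(value)):
            # PHP's s:<n>: prefix counts bytes, not characters.
            parts.append('s:%d:"%s";' % (len(item.encode("utf-8")), item))
    return "a:%d:{%s}" % (len(params), "".join(parts))


def move_page(db: sqlite3.Connection, page_id: int, new_title: str,
              create_redirect: bool = True) -> None:
    """Hypothetical move routine following the proposal in this task."""
    ns, old_title = db.execute(
        "SELECT page_namespace, page_title FROM page WHERE page_id = ?", (page_id,)
    ).fetchone()
    # Rename in place: the moved page keeps its page_id, so rev_page needs no update.
    db.execute("UPDATE page SET page_title = ? WHERE page_id = ?", (new_title, page_id))
    params = {"4::target": new_title, "5::noredir": "0" if create_redirect else "1"}
    if create_redirect:
        # The redirect left at the old title is a new page with a new page_id.
        cur = db.execute(
            "INSERT INTO page (page_namespace, page_title, page_is_redirect) VALUES (?, ?, 1)",
            (ns, old_title),
        )
        params["6::redirect_id"] = str(cur.lastrowid)  # hypothetical key name
    # log_page records the moved page, not the redirect.
    db.execute(
        "INSERT INTO logging (log_type, log_action, log_namespace, log_title, log_page, log_params)"
        " VALUES ('move', 'move', ?, ?, ?, ?)",
        (ns, old_title, page_id, php_serialize_strings(params)),
    )


db = make_db()
db.execute("INSERT INTO page (page_id, page_namespace, page_title) VALUES (320, 0, 'Before_move')")
move_page(db, 320, "After_move")
print(db.execute("SELECT log_page, log_params FROM logging").fetchone())
```

Here log_page comes out as 320, the moved page's unchanged ID, so a consumer can follow the page across moves without a title lookup. With create_redirect=False, the row simply omits the redirect key.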

Sure enough, log_page = 0 if no redirect is created:

SELECT log_id, log_namespace, log_title, log_page, log_params FROM logging WHERE log_action = "move" and log_type = "move" AND log_params LIKE '%"5::noredir";s:1:"1";%' ORDER BY log_id DESC LIMIT 1;

+----------+---------------+----------------------+----------+------------------------------------------------------------+
| log_id   | log_namespace | log_title            | log_page | log_params                                                 |
+----------+---------------+----------------------+----------+------------------------------------------------------------+
| 52627069 |             0 | UCI_(disambiguation) |        0 | a:2:{s:9:"4::target";s:3:"UCI";s:10:"5::noredir";s:1:"1";} |
+----------+---------------+----------------------+----------+------------------------------------------------------------+
1 row in set (0.04 sec)
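Taken together, the two queries show how a consumer has to read a move row under the current behaviour. The sketch below encodes that behaviour as inferred from these two enwiki rows, not from documented MediaWiki semantics; interpret_move_row and MoveEntry are hypothetical names, and log_params is assumed to be already decoded into a dict. The moved page's own page_id is not in the row at all, so it can only be found by looking up the target title in the page table, which may point at a different page if that title is later moved or reused.

```python
from typing import NamedTuple, Optional


class MoveEntry(NamedTuple):
    old_title: str
    new_title: str
    redirect_page_id: Optional[int]  # page_id of the redirect left at the old title, if any


def interpret_move_row(log_title: str, log_page: int, params: dict) -> MoveEntry:
    """Interpret a move/move logging row as observed on enwiki (not a documented contract).

    params is log_params already decoded into a dict.
    """
    no_redirect = params.get("5::noredir") == "1"
    # Observed: with a redirect, log_page holds the redirect's page_id;
    # with the redirect suppressed, log_page is 0.
    redirect_id = None if no_redirect or log_page == 0 else log_page
    # The moved page's own page_id is not recorded anywhere in the row.
    return MoveEntry(log_title, params["4::target"], redirect_id)


chappo = interpret_move_row(
    "Chappo", 41083072, {"4::target": "Chappo (album)", "5::noredir": "0"}
)
uci = interpret_move_row("UCI_(disambiguation)", 0, {"4::target": "UCI", "5::noredir": "1"})
```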

swalling wrote:

Resolving per all the comments.

Yeah, the old page ID makes sense to store, with the new one in params and possibly in log_relations.

See bug 57084 for a new report covering the actual issue.