
rcid cannot be easily retrieved - implementation seems patchy
Closed, DeclinedPublic

Description

Author: martinp23

Description:
If I do a query using the recentchanges list with rctype=new, the rcid is given in the results. This seems to be the only way to retrieve the rcid for a new page creation using the API, while common sense might suggest that a query for the first revision of a page (e.g. http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Judith%20Wood&rvdir=newer&rvlimit=1&rvprop=timestamp|ids ) would show it.

The rcid is needed to construct a URL in order to mark a page patrolled (on en.wikipedia).


Version: 1.12.x
Severity: enhancement
URL: http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Judith%20Wood&rvdir=newer&rvlimit=1&rvprop=timestamp|ids

Details

Reference
bz12394

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:00 PM
bzimport set Reference to bz12394.

rcid is only listed in list=recentchanges because it's stored in the recentchanges table, not in the revision table. For regular edits (i.e. edits other than page creations), rcid is likewise only available through Special:Recentchanges. The [Mark this page patrolled] link is an exception, but as you point out, you can get it through list=recentchanges&rctype=new as well. It could be a good idea to add filtering by page title to list=recentchanges, though; I'll add that.
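For illustration, the list=recentchanges&rctype=new lookup described above could be sketched roughly as follows. This is a minimal sketch, not an official client: the helper names build_new_pages_query and find_rcid are made up for this example, the rcprop and rclimit values are assumptions about a sensible request, and the sample response is hand-written to mirror the general shape of a JSON reply rather than copied from the live API.

```python
from urllib.parse import urlencode

# Endpoint taken from the report above.
API_ENDPOINT = "http://en.wikipedia.org/w/api.php"


def build_new_pages_query(limit=50):
    """Build a list=recentchanges URL restricted to page creations.

    Hypothetical helper: rcprop and rclimit are illustrative choices.
    """
    params = {
        "action": "query",
        "list": "recentchanges",
        "rctype": "new",
        "rcprop": "ids|title|timestamp",
        "rclimit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def find_rcid(response, title):
    """Return the rcid of the first entry whose title matches, else None."""
    for change in response.get("query", {}).get("recentchanges", []):
        if change.get("title") == title:
            return change.get("rcid")
    return None


# Hand-written sample for illustration only; the numbers are invented.
sample_response = {
    "query": {
        "recentchanges": [
            {"type": "new", "ns": 0, "title": "Some Other Page", "rcid": 111},
            {"type": "new", "ns": 0, "title": "Judith Wood", "rcid": 222},
        ]
    }
}

if __name__ == "__main__":
    print(build_new_pages_query(limit=10))
    print(find_rcid(sample_response, "Judith Wood"))
```

Because the title filter runs on the client, this only finds a page whose creation is still within the fetched window of recent changes, which is the limitation the report is about.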

Added rctitles in r31348. Now you can get the rcid of a page through list=recentchanges&rctitles=Foo

matthew.britton wrote:

Using 'rctitles' parameter on en.wikipedia seems to cause timeouts and teal screens of death.

(In reply to comment #3)

Removed rctitles parameter because of performance concerns in r46823. Patrolling stuff isn't very easy right now, I agree, but fixing bug 17237 should improve that.

This one was removed in 1.15 (documentation wasn't updated yet...)

From the commits related to this one, I gather that when this was first implemented it would scan the entire recent changes table, hence the slowness if the desired result is far back (or perhaps not in the table at all).

However, I see that in the current database structure there's a separate column for rc_title.
I'm not sure since when this exists, or whether it was originally used, but filtering on it in the query (AND rc_title='Foobar') would be like any other condition currently in the recentchanges API, right? (Same as for rc_user, with $this->addWhereFld().)

Sorry if this was indeed the way it was already done or if it's not a good way at all, just hoping to get this one fixed :-)

(In reply to comment #5)

The original implementation did do a WHERE on rc_namespace and rc_title, yes, but that's not the same as doing a WHERE on rc_user because the latter is indexed. Implementing this feature would require adding an index for it (kinda leery of that) and even then it'd have to sort by namespace, then title, then timestamp in order to work efficiently.

(In reply to comment #6)

There is a rc_namespace_title index though. But unlike the one for rc_user, it doesn't have rc_timestamp.

mediawiki-core@master:/maintenance/tables.sql:
INDEX rc_timestamp ON recentchanges (rc_timestamp);
INDEX rc_namespace_title ON recentchanges (rc_namespace, rc_title);
INDEX rc_cur_id ON recentchanges (rc_cur_id);
INDEX new_name_timestamp ON recentchanges (rc_new,rc_namespace,rc_timestamp);
INDEX rc_ip ON recentchanges (rc_ip);
INDEX rc_ns_usertext ON recentchanges (rc_namespace, rc_user_text);
INDEX rc_user_text ON recentchanges (rc_user_text, rc_timestamp);

Would it make sense for rc_namespace_title to include rc_timestamp as well? I wonder what the index is used for, and whether those uses would have a problem with the extra rc_timestamp sort. The default sort for rc_namespace_title is presumably rc_id, which should be very close to the sort order of rc_timestamp.

The main reason it needs rc_timestamp is not for the sort order, but to be able to do rcstart and rcend.
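To make the index point above concrete, here is a rough sketch that compares query plans for a title lookup bounded by a timestamp range (the shape an rctitles filter combined with rcstart and rcend would take). It runs on SQLite rather than MySQL, so it only illustrates the general idea and says nothing definite about how MediaWiki's production databases would plan the query. The simplified table, the three-column index name rc_ns_title_ts, and the helper query_plan are all assumptions made for this example.

```python
import sqlite3

# Title lookup bounded by a timestamp range, ordered by timestamp.
QUERY = """
SELECT rc_id FROM recentchanges
WHERE rc_namespace = ? AND rc_title = ?
  AND rc_timestamp BETWEEN ? AND ?
ORDER BY rc_timestamp
"""


def query_plan(index_sql):
    """Create a simplified recentchanges table with one index and
    return SQLite's query plan for QUERY as a single string."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE recentchanges ("
        " rc_id INTEGER PRIMARY KEY,"
        " rc_namespace INTEGER,"
        " rc_title TEXT,"
        " rc_timestamp TEXT)"
    )
    conn.execute(index_sql)
    rows = conn.execute(
        "EXPLAIN QUERY PLAN " + QUERY,
        (0, "Foo", "20080101000000", "20081231235959"),
    ).fetchall()
    conn.close()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


# Existing two-column index, as listed in tables.sql above.
plan_two_columns = query_plan(
    "CREATE INDEX rc_namespace_title"
    " ON recentchanges (rc_namespace, rc_title)"
)

# Hypothetical three-column variant that also covers rc_timestamp.
plan_three_columns = query_plan(
    "CREATE INDEX rc_ns_title_ts"
    " ON recentchanges (rc_namespace, rc_title, rc_timestamp)"
)

if __name__ == "__main__":
    print("two columns:  ", plan_two_columns)
    print("three columns:", plan_three_columns)
```

With the three-column index, the timestamp bounds can be applied inside the index itself; with the two-column index, the engine first has to find every row for the title and then filter by timestamp.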

Filed T57377 for adding support for rctitles to list=recentchanges.

@Catrope, this is one of the oldest tasks assigned to someone. Are you planning to work on it, and is this Normal priority correct?

Qgil removed Catrope as the assignee of this task. Feb 14 2015, 3:37 PM
Qgil set Security to None.

The rcid is needed to construct a URL in order to mark a page patrolled (on en.wikipedia).

No longer the case as of 50faf2138cb80a74db03abc3b02b5506dda2d310 (from 2013).

I guess we can decline this; there shouldn't be a reason to need a first revision's rc_id anymore.