
Pages with FlaggedRevisions protection set should give the latest-flagged version through search engine "API feeds"
Open, Medium, Public, Feature

Description

Per https://en.wikipedia.org/wiki/Wikipedia_talk:PC2012/RfC_2#Comment_-_The_Google_results_will_show_the_.22vandalized.22_versions

This undermines part of the goal of Pending changes, which is to make vandalized pages less visible to the general public.


Version: master
Severity: enhancement

Details

Reference
bz41003

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 22 2014, 12:47 AM
bzimport set Reference to bz41003.
bzimport added a subscriber: Unknown Object (MLST).

Somewhere in the linked discussions, Jdforrester (WMF) wrote:

Unfortunately I've confirmed that the PC work did not include modifying the
output of the API feeds in this way (so yes, GoogleBot et al. will get the
"latest" version regardless of the page's PC state).

Which "API feeds", specifically, do Googlebot and other search engine bots use?

(In reply to comment #1)

Somewhere in the linked discussions, Jdforrester (WMF) wrote:

Unfortunately I've confirmed that the PC work did not include modifying the
output of the API feeds in this way (so yes, GoogleBot et al. will get the
"latest" version regardless of the page's PC state).

Which "API feeds", specifically, do Googlebot and other search engine bots use?

I do not know which API feeds, but it should be restricted in all feeds unless the account or bot that is requesting the latest revision has the reviewer bit. This is for security.

(In reply to comment #2)

I do not know which API feeds, but it should be restricted in all feeds unless
the account or bot that is requesting the latest revision has the reviewer bit.
This is for security.

What "security"? Anyone, even people without accounts, can view the most recent revision of the article. This is by design. The ''only'' thing that pending changes does is change which revision is shown by default (i.e. when someone visits "http://en.wikipedia.org/wiki/Title").

(In reply to comment #3)

(In reply to comment #2)

I do not know which API feeds, but it should be restricted in all feeds unless
the account or bot that is requesting the latest revision has the reviewer bit.
This is for security.

What "security"? Anyone, even people without accounts, can view the most recent
revision of the article. This is by design. The ''only'' thing that pending
changes does is change which revision is shown by default (i.e. when someone
visits "http://en.wikipedia.org/wiki/Title").

That's not according to Wikipedia. It's only supposed to show the most recent approved revision and hide the unstable one until approved. Please elaborate.

(In reply to comment #4)

That's not according to Wikipedia. It's only supposed to show the most recent
approved revision and hide the unstable one until approved. Please elaborate.

No, it's not. It's supposed to show the most recent approved revision ''by default'', but there is nothing in the specification or the implementation on enwiki that says it's supposed to make the unreviewed revisions unviewable.

Try it out. Go to [[en:Wikipedia:Pending changes/Testing/4]] and either make an edit (if you're not a reviewer) or unapprove the most recent edit (if you are), and then look at the page while logged out. See the "Pending changes" tab between "Read" and "Edit"? Click that, and then click where it tells you there are unreviewed edits. Or click the "Edit" button and see how it tells you there are unreviewed edits and includes them in the edit box. Or click the History tab and just go to the unreviewed revision directly.
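
To make the distinction concrete, the behaviour described above can be modelled roughly as follows. This is only an illustrative sketch, not FlaggedRevs code: the `Page` class, its fields, and `revision_to_show` are hypothetical names, and the fallback for a page with no reviewed revision is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    """Hypothetical model of a page under pending changes."""
    latest_rev: int            # newest revision ID, reviewed or not
    stable_rev: Optional[int]  # newest reviewed revision ID, if any


def revision_to_show(page: Page, requested_rev: Optional[int] = None) -> int:
    """Pick the revision to display (illustrative only)."""
    if requested_rev is not None:
        # An explicit request (History link, "Pending changes" tab) is
        # honored for every reader, logged in or not.
        return requested_rev
    if page.stable_rev is not None:
        # Default view: the most recent approved revision.
        return page.stable_rev
    # Assumption: with no reviewed revision, fall back to the latest one.
    return page.latest_rev
```

In this model, pending changes only affects the branch taken when no specific revision is requested; nothing hides the unreviewed revision from a reader who asks for it.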

Clarified the title to turn it into an 'ask'. This is aimed not just at GoogleBot, but clearly that's an important criterion.

(In reply to comment #1)

Somewhere in the linked discussions, Jdforrester (WMF) wrote:

Unfortunately I've confirmed that the PC work did not include modifying the
output of the API feeds in this way (so yes, GoogleBot et al. will get the
"latest" version regardless of the page's PC state).

Which "API feeds", specifically, do Googlebot and other search engine bots use?

I don't know what feed this is either, but there are some vague, outdated meta pages that allude to some extension or feed somewhere. Maybe brion knows?

(In reply to comment #6)

Clarified the title to turn it into an 'ask'. This is aimed not just at
GoogleBot, but clearly that's an important criterion.

Clarified slightly more. It seems unlikely to me that "API feeds" has anything to do with the MediaWiki API, since I doubt Google is constantly hitting and then parsing action=query&list=recentchanges.
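
For reference, the kind of MediaWiki API request mentioned above would look roughly like the URL built below. This is a minimal sketch: the endpoint, the `rclimit` value, and the `recent_changes_url` helper are assumptions chosen for illustration, and the snippet only builds the URL without sending a request.

```python
from urllib.parse import urlencode

# Assumed endpoint for illustration; each wiki exposes its own api.php.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def recent_changes_url(limit: int = 10) -> str:
    """Build an action=query&list=recentchanges URL (hypothetical helper)."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rclimit": limit,   # number of changes to return
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


print(recent_changes_url(5))
```

A crawler would have to poll such a URL repeatedly and then fetch and parse each changed page, which is why it seems an unlikely candidate for the "API feeds" in question.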

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 12:24 PM