
Export should use the table "page_restrictions" and not the field of the table "page"
Closed, Resolved (Public)

Description

Author: dherding

Description:
Spezial:Export is supposed to show that a page is locked. This works for the
main page on de.wikipedia:
http://de.wikipedia.org/wiki/Spezial:Exportieren/Hauptseite has the line
<restrictions>edit=sysop:move=sysop</restrictions>, as it is supposed to be.

http://de.wikipedia.org/wiki/Spezial:Exportieren/Adolf_Hitler also works fine,
showing this line:
<restrictions>edit=autoconfirmed:move=autoconfirmed</restrictions>

But for other locked or semi-locked pages, this doesn't work:
http://de.wikipedia.org/wiki/Sekte is fully locked, however
http://de.wikipedia.org/wiki/Spezial:Exportieren/Sekte doesn't have restriction
tags.

http://de.wikipedia.org/wiki/Nationalsozialismus is semi-locked, but
http://de.wikipedia.org/wiki/Spezial:Exportieren/Nationalsozialismus doesn't
have restriction tags either.


Version: 1.10.x
Severity: normal

Details

Reference
bz9226

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 9:41 PM
bzimport set Reference to bz9226.

dherding wrote:

Note that this does not only occur on the German Wikipedia, for example
http://nl.wikipedia.org/wiki/Speciaal:Export/Sjabloon:Atlas also lacks
restriction tags although http://nl.wikipedia.org/wiki/Sjabloon:Atlas is fully
locked.

rotemliss wrote:

The export tool seems to use the obsolete "page.page_restrictions" field, while
the table "page_restrictions" is currently used and updated. It should probably
use the table.
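
To make the difference concrete, here is a minimal sketch in Python with an in-memory SQLite database. It is not the MediaWiki export code. It assumes the "page_restrictions" table has the columns pr_page, pr_type and pr_level, and the helper name restrictions_string is hypothetical. It rebuilds the legacy-style string shown in the export from the table rows instead of reading the old field.

```python
import sqlite3


def restrictions_string(conn, page_id):
    """Build an 'edit=sysop:move=sysop' style string from page_restrictions rows."""
    rows = conn.execute(
        "SELECT pr_type, pr_level FROM page_restrictions "
        "WHERE pr_page = ? ORDER BY pr_type",
        (page_id,),
    ).fetchall()
    # One 'type=level' entry per row, joined with ':' like the legacy field.
    return ":".join(f"{pr_type}={pr_level}" for pr_type, pr_level in rows)


# Demo data (assumed, for illustration only): page 1 is fully protected.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_restrictions (pr_page INTEGER, pr_type TEXT, pr_level TEXT)"
)
conn.executemany(
    "INSERT INTO page_restrictions VALUES (?, ?, ?)",
    [(1, "edit", "sysop"), (1, "move", "sysop")],
)

print(restrictions_string(conn, 1))
```

A page with no rows in the table yields an empty string, which would correspond to leaving the restrictions tag out.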

robchur wrote:

Marking as blocking bug 700 (code quality) and increasing severity, since it's a
data consistency issue.

dherding wrote:

A note on why I noticed this bug: the PyWikipediaBot framework relies on the
restrictions tag; if it is missing, the bot will try to edit locked pages with a
non-sysop account, which leads to an error message.
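
The dependency described here can be illustrated with a short Python sketch. This is not PyWikipediaBot's actual code: the function names parse_restrictions and can_edit are hypothetical, and the input is a simplified, namespace-free page fragment rather than a full export document.

```python
import xml.etree.ElementTree as ET


def parse_restrictions(page_xml):
    """Return e.g. {'edit': 'sysop', 'move': 'sysop'}, or {} if the tag is absent."""
    root = ET.fromstring(page_xml)
    elem = root.find("restrictions")
    if elem is None or not elem.text:
        return {}
    result = {}
    for part in elem.text.split(":"):
        action, sep, level = part.partition("=")
        if sep:  # skip parts that are not in 'action=level' form
            result[action] = level
    return result


def can_edit(page_xml, user_groups):
    """A client that trusts the export: no edit restriction means 'go ahead'."""
    required = parse_restrictions(page_xml).get("edit")
    return required is None or required in user_groups


locked = (
    "<page><title>Hauptseite</title>"
    "<restrictions>edit=sysop:move=sysop</restrictions></page>"
)
missing_tag = "<page><title>Sekte</title></page>"

print(can_edit(locked, {"user"}))
print(can_edit(missing_tag, {"user"}))
```

With the tag missing, a client written this way concludes that the page is editable, which is the failure mode described above.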

The unfortunate thing about this is that the table is hard to join on cleanly in
the bulk export query. Hmmm, maybe a join with a COUNT(*), and then look up the
individual items just for the pages with protection entries?

Need to test to ensure that won't bork up the speed of the query.
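
The two-step idea can be sketched roughly as follows, in Python against an in-memory SQLite database. This is an illustration under assumptions, not the attached patch: the column names page_id, page_title, pr_page, pr_type and pr_level are assumed, and export_rows is a hypothetical helper. The sketch counts pr.pr_page rather than using COUNT(*), because with a LEFT JOIN a COUNT(*) would report 1 even for pages with no protection rows.

```python
import sqlite3


def export_rows(conn):
    """Bulk query with a per-page count, then detail lookups only where needed."""
    bulk = conn.execute(
        "SELECT p.page_id, p.page_title, COUNT(pr.pr_page) AS n "
        "FROM page p LEFT JOIN page_restrictions pr ON pr.pr_page = p.page_id "
        "GROUP BY p.page_id, p.page_title ORDER BY p.page_id"
    ).fetchall()
    out = []
    for page_id, title, n in bulk:
        restrictions = ""
        if n:  # only protected pages cost an extra query
            rows = conn.execute(
                "SELECT pr_type, pr_level FROM page_restrictions "
                "WHERE pr_page = ? ORDER BY pr_type",
                (page_id,),
            ).fetchall()
            restrictions = ":".join(f"{t}={lvl}" for t, lvl in rows)
        out.append((title, restrictions))
    return out


# Demo data (assumed): one fully protected page, one unprotected page.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT)")
conn.execute(
    "CREATE TABLE page_restrictions (pr_page INTEGER, pr_type TEXT, pr_level TEXT)"
)
conn.executemany("INSERT INTO page VALUES (?, ?)", [(1, "Sekte"), (2, "Sandbox")])
conn.executemany(
    "INSERT INTO page_restrictions VALUES (?, ?, ?)",
    [(1, "edit", "sysop"), (1, "move", "sysop")],
)

for title, restrictions in export_rows(conn):
    print(title, restrictions or "(none)")
```

Whether the extra grouping slows down a real bulk export would still need to be measured; the sketch only shows the shape of the queries.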

Created attachment 3326
Attempted quick hack to use page_restrictions table

The query looks ok to me, but what do I know. :)

There are a few issues with this, though...

a) Information drawn from page_restrictions table may be out of sync.
A protected page could perhaps be deleted or have its protection levels changed
between the start of the query and the time the row is read, leading to
slightly inconsistent output if transactions aren't doing the right thing.

b) New features such as expirations and cascade options are not reported.
We should probably think about an expandable schema for protection information
and toss that in.

c) Who knows what else might be wrong. ;)
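
For point b), one possible shape for a more expandable representation is sketched below in Python. The element names "protections" and "protection" and their attributes are purely hypothetical and are not part of any export schema; the row keys assume the pr_expiry and pr_cascade columns of the page_restrictions table.

```python
import xml.etree.ElementTree as ET


def protection_elements(rows):
    """Emit one hypothetical <protection> element per page_restrictions row."""
    parent = ET.Element("protections")
    for row in rows:
        ET.SubElement(
            parent,
            "protection",
            {
                "type": row["pr_type"],
                "level": row["pr_level"],
                "expiry": row["pr_expiry"],
                "cascade": "1" if row["pr_cascade"] else "0",
            },
        )
    return parent


# Demo rows (assumed values, for illustration only).
rows = [
    {"pr_type": "edit", "pr_level": "sysop", "pr_expiry": "infinity", "pr_cascade": 1},
    {"pr_type": "move", "pr_level": "sysop", "pr_expiry": "infinity", "pr_cascade": 0},
]

print(ET.tostring(protection_elements(rows), encoding="unicode"))
```

Using one element per restriction with attributes leaves room for further options later without changing the string format that existing clients parse.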

As for the bot editing case, I have to warn that a page that's protected by
cascade from another page wouldn't end up listed here anyway, so I'm not sure
how much we can really do here. Maybe something else is best?

Mass component change: <some> -> Export/Import