
Inconsistencies in how lists of protected pages are found via API
Closed, Declined (Public)

Description

Author: a.d.bergi

Description:
I was asked to build a list of protected pages (at de:WP), sorted by protection date [1]. As the protection property of prop=info doesn't provide that data, I had to query the log for each page. Adding this information to the info module might be a separate bug, but while running the query I found:

  • pages listed by api.php?action=query&list=allpages&apprtype=move|edit (and having protection properties in prop=info), but having no log entries. Examples are [[de:Amerika]], [[de:Neonazismus]] and others. They can be found in the list [1] with "undefined" user/timestamp/comment.
  • pages having more than two protection properties in prop=info. This happened over 300 times; if it is intended, please explain. For example, [[de:Nationalsozialistische Deutsche Arbeiterpartei]] was protected only once according to the log, but has 3 edit and 3 move protections (api.php?action=query&prop=info&inprop=protection&titles=Nationalsozialistische%20Deutsche%20Arbeiterpartei).
  • pages listed at [[de:Spezial:Geschützte Seiten]], but not in the api query. For example [[de:Flatulenz]], which was protected, unprotected and re-protected according to the log and has two protection properties in prop=info, but is not listed in list=allpages: api.php?action=query&list=allpages&apprtype=move|edit&apprefix=Fl&prop=info&inprop=protection&titles=Flatulenz.

Could all these cases be caused by protect actions in previous MW versions (and conversion scripts missed them)? Are they intended? Should I file individual bugs for each?

[1]: The result of my query can be found at [[de:Wikipedia:Liste der am längsten geschützten Artikel]]
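
The per-page log lookup mentioned in the description could look roughly like the following sketch. It only builds the request URL for list=logevents filtered to the protection log; the endpoint constant, the helper name protection_log_url and the chosen leprop fields are assumptions for illustration, not what the reporter actually ran.

```python
from urllib.parse import urlencode

# Assumed endpoint for de:WP; adjust for other wikis.
API = "https://de.wikipedia.org/w/api.php"


def protection_log_url(title, limit=10):
    """Build a list=logevents URL for the protection log of one page.

    Hypothetical helper: it only assembles the query string and does
    not send a request.
    """
    params = {
        "action": "query",
        "list": "logevents",
        "letype": "protect",      # protection log only
        "letitle": title,         # log entries for this page
        "leprop": "user|timestamp|comment|type",
        "lelimit": limit,
        "format": "json",
    }
    return API + "?" + urlencode(params)


print(protection_log_url("Flatulenz"))
```

A page that was protected before protections were logged would return an empty logevents list here, which would match the "undefined" user/timestamp/comment entries in the list [1].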


Version: 1.18.x
Severity: normal

Details

Reference
bz33304

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 12:04 AM
bzimport set Reference to bz33304.

a.d.bergi wrote:

What is the difference in the database query between Special:Protected Pages and api.php?action=query&list=allpages&apprtype=move|edit?

For the multiple entries see bug 28751

Protection moves have been logged for some versions now. Have a look at the move log to find the protection.

It is unclear why
api.php?action=query&generator=allpages&prop=info&inprop=protection&gapprefix=Flatulenz

lists the page with protection properties, but

api.php?action=query&generator=allpages&prop=info&inprop=protection&gapprefix=Flatulenz&gapprtype=edit|move

does not list the page at all, even though the page is protected. Maybe there is a problem with the old way in which protections were stored (or with the maintenance script)? This needs a look into the database.

The pages from comment 1 are move-protected only. They are editable but not movable, and they are listed by:

api.php?action=query&generator=allpages&prop=info&inprop=protection&gapprefix=Bahnhof Eisenach&gapprtype=move

The difference is that Special:ProtectedPages can only filter for one type.

There are other problems too. Querying [1] results in a) a lot of duplicates, and b) semi-protected pages being listed along with fully protected ones.

So far I haven't been able to check whether it respects the indefinite parameter.

Using it as a generator [2] removes the duplicates, but semi-protected pages are still included in the list.

Example: [[hi:अंकोरवाट मंदिर]] which is (as of now) semi-protected indefinitely, is included both times. See its log: [3].

[1]: http://hi.wikipedia.org/w/api.php?action=query&list=allpages&apprlevel=sysop&apprexpiry=indefinite&aplimit=500

[2]: http://hi.wikipedia.org/w/api.php?action=query&generator=allpages&gapprlevel=sysop&gapprexpiry=indefinite&gaplimit=500&prop=revisions&rvprop=timestamp

[3]: http://hi.wikipedia.org/w/index.php?title=%E0%A4%B5%E0%A4%BF%E0%A4%B6%E0%A5%87%E0%A4%B7%3ALog&page=%E0%A4%85%E0%A4%82%E0%A4%95%E0%A5%8B%E0%A4%B0%E0%A4%B5%E0%A4%BE%E0%A4%9F_%E0%A4%AE%E0%A4%82%E0%A4%A6%E0%A4%BF%E0%A4%B0

https://de.wikipedia.org/w/api.php?action=query&prop=info&inprop=protection&titles=Nationalsozialistische%20Deutsche%20Arbeiterpartei&format=jsonfm

mysql> SELECT pr_page, pr_type, pr_level, pr_expiry, pr_cascade, page_namespace, page_title FROM page_restrictions, page WHERE page_id=pr_page AND pr_page=3627;
+---------+---------+---------------+-----------+------------+----------------+------------------------------------------------+
| pr_page | pr_type | pr_level      | pr_expiry | pr_cascade | page_namespace | page_title                                     |
+---------+---------+---------------+-----------+------------+----------------+------------------------------------------------+
|    3627 | edit    | autoconfirmed | infinity  |          0 |              0 | Nationalsozialistische_Deutsche_Arbeiterpartei |
|    3627 | move    | autoconfirmed | infinity  |          0 |              0 | Nationalsozialistische_Deutsche_Arbeiterpartei |
+---------+---------+---------------+-----------+------------+----------------+------------------------------------------------+
2 rows in set (0.00 sec)

This bug is somewhat of a mess.

Multiple different points have been raised, and it makes it somewhat difficult to work out what's what.

Pages appearing as protected in one place but not in another is one issue.

Pages having multiple protection entries is something different entirely. A quick look at the ApiQueryInfo code suggests it is likely caused by protections being checked via different methods. It can probably be fixed simply: rather than just "blindly" adding another protection to the array, we key it with something (level maybe?) so duplicates won't be inserted.
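
The keyed-array idea can be sketched as follows. This is an illustration in Python rather than the actual ApiQueryInfo PHP code; the function names and the choice of (type, level, expiry) as the key are assumptions, since the comment only suggests "level maybe".

```python
def add_protection(protections, entry):
    """Insert entry unless an identical protection is already present.

    The key (type, level, expiry) is an assumed choice for this sketch.
    """
    key = (entry["type"], entry["level"], entry["expiry"])
    protections.setdefault(key, entry)


def collect_protections(sources):
    """Merge protection entries gathered by several lookup methods."""
    protections = {}
    for source in sources:
        for entry in source:
            add_protection(protections, entry)
    return list(protections.values())


# Two hypothetical lookup paths reporting the same two protections.
from_restrictions_table = [
    {"type": "edit", "level": "autoconfirmed", "expiry": "infinity"},
    {"type": "move", "level": "autoconfirmed", "expiry": "infinity"},
]
from_other_check = [dict(e) for e in from_restrictions_table]

merged = collect_protections([from_restrictions_table, from_other_check])
print(len(merged))
```

Keying on level alone would collapse an edit and a move protection of the same level into one entry, which is why the sketch includes the type in the key.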

Can we move these to separate bugs (and/or keep some of it here)? People then going "and then there is also this similar issue" and dumping more information onto the same bug isn't exactly helpful either.

(In reply to comment #5)

> This bug is somewhat of a mess.

That's an understatement.

(In reply to comment #0)

> • pages listed by api.php?action=query&list=allpages&apprtype=move|edit (and having protection properties in prop=info), but having no log entries. Examples are [[de:Amerika]], [[de:Neonazismus]] and others. They can be found in the list [1] with "undefined" user/timestamp/comment.

I note that both are very old pages. According to [[en:Wikipedia:Protection log]], protections were not automatically logged before 23 December 2004. Is it possible those pages were protected before that date (or the corresponding date for dewiki, if it differs)? If so, this is not a bug.

> • pages having more than two protection properties in prop=info. This happened over 300 times, if it is intended please explain. For example [[de:Nationalsozialistische Deutsche Arbeiterpartei]] was only one time protected pursuant to the log, but has 3 edit and 3 move protections (api.php?action=query&prop=info&inprop=protection&titles=Nationalsozialistische%20Deutsche%20Arbeiterpartei).

As mentioned in comment 2, this is bug 28751.

> • pages listed at [[de:Spezial:Geschützte Seiten]], but not in the api query. For example [[de:Flatulenz]], which was protected, unprotected and re-protected according to the log and has two protection properties in prop=info, but is not listed in list=allpages: api.php?action=query&list=allpages&apprtype=move|edit&apprefix=Fl&prop=info&inprop=protection&titles=Flatulenz.

That's an actual bug: old protections store indefinite protection with pr_expiry = NULL, but the query used by list=allpages does not take this into account.

Gerrit change 46662 should fix it.
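
To illustrate why a NULL pr_expiry can make a row disappear, here is a small self-contained sketch using an in-memory SQLite table. The table is a reduced stand-in for page_restrictions, and the "not yet expired" comparison is an assumption about the kind of filter involved; it is not the actual list=allpages query or the content of the Gerrit change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_restrictions ("
    "pr_page INTEGER, pr_type TEXT, pr_level TEXT, pr_expiry TEXT)"
)
conn.executemany(
    "INSERT INTO page_restrictions VALUES (?, ?, ?, ?)",
    [
        (1, "edit", "sysop", "infinity"),        # newer indefinite protection
        (2, "edit", "sysop", None),              # old-style indefinite (NULL)
        (3, "edit", "sysop", "20300101000000"),  # expires in the future
        (4, "edit", "sysop", "20100101000000"),  # already expired
    ],
)

now = "20130101000000"  # assumed "current" timestamp for the demo

# A plain comparison silently drops NULL rows: NULL > x is never true.
naive = [r[0] for r in conn.execute(
    "SELECT pr_page FROM page_restrictions "
    "WHERE pr_expiry > ? ORDER BY pr_page", (now,))]

# Treating NULL as "never expires" brings the old protection back.
fixed = [r[0] for r in conn.execute(
    "SELECT pr_page FROM page_restrictions "
    "WHERE pr_expiry > ? OR pr_expiry IS NULL ORDER BY pr_page", (now,))]

print(naive)  # [1, 3]
print(fixed)  # [1, 2, 3]
```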

(In reply to comment #1)

> • There are also pages which are listed as protected by api's allpages (and which have protection properties in prop=info as well as some matching log entries), but which are a) editable and b) not listed on Special:Protected Pages. Examples for that are http://de.wikipedia.org/w/api.php?action=query&prop=info&inprop=protection&titles=Lansdowne%20Portrait|Schweizerische%20Käseunion|Bahnhof%20Eisenach|Eisenbahn%20in%20Thüringen
> What is the difference in the database query between Special:Protected Pages and api.php?action=query&list=allpages&apprtype=move|edit?

As mentioned in comment 2, these are not edit protected, only move protected. Special:ProtectedPages (now) has a dropdown to select the type of protection to search for; these pages will show up if you choose "move" rather than "edit".

(In reply to comment #3)

> There are other problems too. Querying [1] results in a) a lot of duplicates

Yeah, that shouldn't be happening. For some reason the module is only adding DISTINCT to the query when apprtype is used, while your query uses only apprexpiry.

Gerrit change 46665 should fix it.
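
The duplicates follow from joining pages against their restriction rows: a page with both an edit and a move protection matches twice. The sketch below reproduces that effect with simplified stand-in tables in SQLite; the schema and query are assumptions for illustration, not the query the module builds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (page_id INTEGER, page_title TEXT);
CREATE TABLE page_restrictions (pr_page INTEGER, pr_type TEXT, pr_expiry TEXT);
INSERT INTO page VALUES (1, 'A'), (2, 'B');
INSERT INTO page_restrictions VALUES
    (1, 'edit', 'infinity'),
    (1, 'move', 'infinity'),
    (2, 'edit', 'infinity');
""")

QUERY = (
    "SELECT {distinct} page_title FROM page "
    "JOIN page_restrictions ON pr_page = page_id "
    "WHERE pr_expiry = 'infinity' ORDER BY page_title"
)

# Without DISTINCT, page A appears once per matching restriction row.
plain = [r[0] for r in conn.execute(QUERY.format(distinct=""))]
# With DISTINCT, each page is returned once.
dedup = [r[0] for r in conn.execute(QUERY.format(distinct="DISTINCT"))]

print(plain)  # ['A', 'A', 'B']
print(dedup)  # ['A', 'B']
```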

> and b) semi-protected pages are also listed along with full protected ones.

apprlevel is only effective when combined with apprtype. This is documented.

The error could be detected and reported when apprlevel is used with apprexpiry but not apprtype, but that would break clients that use queries like those in comment 3 and I'm not sure it's worth breaking backwards compatibility.
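
Since the server does not reject such queries, a client can guard against the mistake itself. The following is a minimal sketch of a client-side check; the function name and warning text are hypothetical, and the prefix argument is only there so the same check covers generator use (gap instead of ap).

```python
def check_protection_filters(params, prefix="ap"):
    """Return warnings for protection filters that will be ignored.

    Hypothetical client-side helper; it encodes only the rule stated
    above, that the level filter needs the type filter to take effect.
    """
    warnings = []
    level_key = prefix + "prlevel"
    type_key = prefix + "prtype"
    if level_key in params and type_key not in params:
        warnings.append(
            f"{level_key} has no effect unless {type_key} is also given"
        )
    return warnings


# Parameters from the query [1] in comment 3.
query = {
    "action": "query",
    "list": "allpages",
    "apprlevel": "sysop",
    "apprexpiry": "indefinite",
    "aplimit": "500",
}
print(check_protection_filters(query))
```

Adding apprtype=edit (or gapprtype=edit with prefix="gap") to the parameters makes the warning list empty.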

Brad, thanks for the detailed analysis here, much appreciated.

Updating and summarizing based on comment 6 by Brad:

The first issue (before 2004) might not be a software bug.
The second issue is already handled in another bug report.
The two following issues have received patches that have been merged into the codebase, so they should be fixed.
The last issue is not a bug either, but expected behavior.

So to me this report seems to be 1/5 INVALID, 1/5 DUPLICATE, 2/5 FIXED and 1/5 WONTFIX (in that order).

I'm setting RESOLVED WORKSFORME, as that is somewhere in between all of these.

bergi: Please leave a comment here with exact steps to reproduce if any of the described (valid) issues still happen for you. Thanks!