
Allow multiple categories for categorymembers list
Closed, Declined
Public

Description

Author: jutiphan

Description:
In the list=categorymembers module, it would be great if cmtitle could accept more than one category. This would be useful when requesting members from multiple categories at once. The scenario I have in mind is querying lists of categories to be checked for tagging with WikiProject banners.
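Because cmtitle takes a single category, the only option today is one request per category. The following minimal sketch builds those request URLs with the standard library. It assumes the usual action=query parameters (cmtitle, cmlimit, format=json), uses the endpoint from this task's URL field, and does not handle continuation; the helper name is hypothetical.

```python
from urllib.parse import urlencode

# Endpoint taken from this task's URL field.
API = "http://th.wikipedia.org/w/api.php"


def categorymembers_urls(categories, limit=500):
    """Build one list=categorymembers request URL per category (hypothetical helper).

    cmtitle accepts a single category, so N categories need N requests
    (continuation requests for large categories are not handled here).
    """
    urls = []
    for cat in categories:
        params = {
            "action": "query",
            "list": "categorymembers",
            "cmtitle": cat,
            "cmlimit": limit,
            "format": "json",
        }
        urls.append(API + "?" + urlencode(params))
    return urls


for url in categorymembers_urls(["Category:Foo", "Category:Bar"]):
    print(url)
```

Each URL is independent, so a client can issue them concurrently, but the number of requests still grows with the number of categories.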


Version: unspecified
Severity: enhancement
URL: http://th.wikipedia.org/w/api.php

Details

Reference
bz14425

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 21 2014, 10:09 PM
bzimport set Reference to bz14425.

This can't be implemented efficiently, because we sort results by category name first, then by sortkey, then by page ID. That means cmtitles=Category:Foo|Category:Bar would list all members of one category before any member of the other, so it would be no better than just doing two queries.
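To illustrate that ordering, this toy model represents category memberships as hypothetical (category, sortkey, page ID) tuples and sorts them lexicographically. Whichever category name sorts first (here Category:Bar) forms one contiguous block, followed by the other, so a combined listing is just the two single-category listings concatenated. It is an illustration of the sort order only, not MediaWiki's actual query.

```python
from itertools import groupby

# Hypothetical membership rows: (category, sortkey, page_id).
rows = [
    ("Category:Foo", "Alpha", 3),
    ("Category:Bar", "Beta", 7),
    ("Category:Foo", "Alpha", 1),
    ("Category:Bar", "Alpha", 9),
    ("Category:Foo", "Gamma", 2),
]

# Sort by category name, then sortkey, then page ID (tuple comparison).
ordered = sorted(rows)

# Group consecutive rows by category: each category appears as exactly one block.
blocks = [cat for cat, _ in groupby(ordered, key=lambda r: r[0])]
print(blocks)  # ['Category:Bar', 'Category:Foo']
```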

aaron wrote:

Roan, an inefficient implementation would still help me a lot, so I hope you might reconsider.

I am writing an application which, for a given article, acquires lists of peer articles in some of its categories. Without paging, this means making one asynchronous HTTP connection per category, which breaks badly on "Amtrak" (with over 60 categories, many of which have almost no members). The order in which the contents are returned is less important than a big cut in the number of connections I need to make.

By executing action=query&generator=categories&prop=categoryinfo in advance, I can pick out which categories have suitable numbers of pages in them to proceed to the member-listing step. This means that a category with hundreds of members could come last in the list, giving me access to a good subset of the category members in a single HTTP request (several if I page on through the results, but still not 60).
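The pre-filtering step described here can be sketched as follows. The query parameters come from the comment; gcllimit=max (to get all of an article's categories in one request) is an assumption, and so is the response shape: query.pages entries with a title and a categoryinfo object carrying a pages count. The sample response is hand-written and hypothetical, and the size thresholds are arbitrary. Survivors are ordered smallest first so that a large category ends up last.

```python
from urllib.parse import urlencode

API = "http://th.wikipedia.org/w/api.php"


def categoryinfo_url(article):
    # All categories of `article` plus each category's member counts.
    params = {
        "action": "query",
        "generator": "categories",
        "titles": article,
        "gcllimit": "max",  # assumed: raise the generator's default limit
        "prop": "categoryinfo",
        "format": "json",
    }
    return API + "?" + urlencode(params)


def pick_categories(response, min_pages=2, max_pages=200):
    """Return titles of categories whose page count lies in [min_pages, max_pages],
    smallest first (hypothetical helper; thresholds are arbitrary)."""
    chosen = []
    for page in response["query"]["pages"].values():
        n = page.get("categoryinfo", {}).get("pages", 0)
        if min_pages <= n <= max_pages:
            chosen.append((n, page["title"]))
    chosen.sort()
    return [title for _, title in chosen]


# Hypothetical, hand-written response in the assumed JSON shape.
sample = {
    "query": {
        "pages": {
            "101": {"title": "Category:A", "categoryinfo": {"pages": 150}},
            "102": {"title": "Category:B", "categoryinfo": {"pages": 1}},
            "103": {"title": "Category:C", "categoryinfo": {"pages": 12}},
            "104": {"title": "Category:D", "categoryinfo": {"pages": 5000}},
        }
    }
}

print(pick_categories(sample))  # ['Category:C', 'Category:A']
```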

(In reply to comment #2)

> Roan, an inefficient implementation would still help me a lot, so I hope you
> might reconsider.

I meant inefficient on the server side. So I'm sure it would help you a lot, but it would also overload the database servers.

> I am writing an application which, for a given article, acquires lists of peer
> articles in some of its categories. Without paging, this means making one
> asynchronous HTTP connection per category, which breaks badly on "Amtrak" (with
> over 60 categories, many of which have almost no members). The order in which
> the contents are returned is less important than a big cut in the number of
> connections I need to make.

I understand that ordering isn't important to everyone, but it's needed to make query-continue possible.
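A simplified model of why ordering matters for continuation: if every row has a unique position in a total order, the last row of one batch is enough to say where the next batch starts, and no row is skipped or repeated. This is a hypothetical keyset-style sketch (fetch_page and the tuple rows are illustrative), not MediaWiki's actual query-continue code.

```python
def fetch_page(rows, limit, cont=None):
    """Return up to `limit` rows that sort strictly after `cont`, plus the
    continuation key for the next call (None once everything is returned)."""
    ordered = sorted(rows)  # total order: (category, sortkey, page_id)
    if cont is not None:
        # Resume right after the last row the client already has.
        ordered = [r for r in ordered if r > cont]
    batch = ordered[:limit]
    next_cont = batch[-1] if len(ordered) > limit else None
    return batch, next_cont


rows = [
    ("Category:Foo", "Alpha", 3),
    ("Category:Bar", "Beta", 7),
    ("Category:Foo", "Alpha", 1),
    ("Category:Bar", "Alpha", 9),
    ("Category:Foo", "Gamma", 2),
]

collected = []
cont = None
while True:
    batch, cont = fetch_page(rows, limit=2, cont=cont)
    collected.extend(batch)
    if cont is None:
        break

print(collected == sorted(rows))  # True
```

The page ID acts as the tie-breaker: without it, two rows with the same category and sortkey would compare equal and a continuation key could not distinguish them.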

> By executing action=query&generator=categories&prop=categoryinfo in advance, I
> can pick out which categories have suitable numbers of pages in them to proceed
> to the member-listing step. This means that a category with hundreds of members
> could come last in the list, giving me access to a good subset of the category
> members in a single HTTP request (several if I page on through the results, but
> still not 60).

Sounds like a good solution.