
Implement a special page to show items with the most sitelinks
Closed, Declined
Public

Description

A MediaWiki install has special pages (https://www.wikidata.org/wiki/Special:SpecialPages) such as "Most linked-to pages" and "Pages with the most interwikis". The Wikibase extension should add a special page for items with the most sitelinks.
Rough query:

SELECT ips_item_id, COUNT(ips_item_id) FROM wb_items_per_site GROUP BY ips_item_id ORDER BY COUNT(ips_item_id) DESC LIMIT 100;

URL: https://www.wikidata.org/wiki/Special:SpecialPages

Event Timeline

bzimport raised the priority of this task from to Low. Nov 22 2014, 1:23 AM
bzimport set Reference to bz46217.
bzimport added a subscriber: Unknown Object (MLST).

Maarten: You mean we should have such a special page on Wikidata? Or on the client (Wikipedia, Wikivoyage, Commons, ...)?

I was thinking about Wikidata itself.

Change 94830 had a related patch set uploaded by Bene:
(bug 46217) Implement a special page to show items with the most sitelinks

https://gerrit.wikimedia.org/r/94830

The suggested patch needs to be changed to allow more efficient SQL (see bug 40157 and bug 58032).

Change 94830 abandoned by Bene:
(bug 46217) Implement a special page to show items with the most sitelinks

Reason:
this is not likely to get implemented this way

https://gerrit.wikimedia.org/r/94830

Lydia_Pintscher removed a subscriber: Unknown Object (MLST).
Multichill set Security to None.

Looks like I worked on this months ago, I might as well finish it.

Change 181902 had a related patch set uploaded (by Multichill):
Implement a special page to show items with the most sitelinks

https://gerrit.wikimedia.org/r/181902

Patch-For-Review

I took a different approach than Bene: I extended the standard QueryPage. I would appreciate input. If this approach works, I plan to add some more special pages and maybe rewrite Special:UnconnectedPages.

Change 181902 had a related patch set uploaded (by Siebrand):
Implement a special page to show items with the most sitelinks

https://gerrit.wikimedia.org/r/181902

Patch-For-Review

Change 181902 abandoned by Multichill:
Implement a special page to show items with the most sitelinks

Reason:
Too frustrating, not going to invest time in this any more.

https://gerrit.wikimedia.org/r/181902

Multichill removed a project: Patch-For-Review.
Multichill added a subscriber: Ladsgroup.

Abandoned. Too frustrating, not going to invest time in this any more. Up for grabs.

The major problem with both patches is that they use COUNT, JOIN, and GROUP BY, which does not scale well for tables this big. What we do now is store the number of sitelinks per item in a "wb-sitelinks" page property (in the page_props table). pp_sortkey is a numeric field that can then be queried, and possibly sorted on (need to check this). This should already be deployed, as far as I know.

So whoever wants to pick this up, please pick one of the existing patches, reopen it and change the SQL query to query pp_sortkey in page_props instead.
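
To illustrate the suggestion above, here is a minimal sketch that runs both query shapes against an in-memory SQLite database. It is not the code from either patch. The schema is trimmed to the columns named in this task, the index name and the sample data are assumptions, and pp_page is treated as if it were the item id (on a real wiki it is a page id that would still have to be mapped to an item).

```python
import sqlite3


def build_demo_db():
    """Create a tiny stand-in for wb_items_per_site and page_props (demo schema only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE wb_items_per_site (
            ips_item_id INTEGER, ips_site_id TEXT, ips_site_page TEXT
        );
        CREATE TABLE page_props (
            pp_page INTEGER, pp_propname TEXT, pp_value TEXT, pp_sortkey REAL
        );
        -- Assumed index: lets the sortkey query read rows already in order.
        CREATE INDEX demo_propname_sortkey ON page_props (pp_propname, pp_sortkey);
        """
    )
    # Hypothetical sample data: item id -> sites it has sitelinks on.
    sitelinks = {1: ["enwiki", "dewiki", "frwiki"], 2: ["enwiki"], 3: ["enwiki", "nlwiki"]}
    for item_id, sites in sitelinks.items():
        for site in sites:
            conn.execute(
                "INSERT INTO wb_items_per_site VALUES (?, ?, ?)",
                (item_id, site, f"Page {item_id}"),
            )
        # Precomputed count, stored once per item as a page property.
        conn.execute(
            "INSERT INTO page_props VALUES (?, 'wb-sitelinks', ?, ?)",
            (item_id, str(len(sites)), float(len(sites))),
        )
    return conn


def top_by_group_by(conn, limit):
    """The rough query from the description: aggregate every sitelink row."""
    return conn.execute(
        "SELECT ips_item_id, COUNT(ips_item_id) AS n FROM wb_items_per_site "
        "GROUP BY ips_item_id ORDER BY n DESC LIMIT ?",
        (limit,),
    ).fetchall()


def top_by_sortkey(conn, limit):
    """The proposed alternative: read the precomputed numeric sort key."""
    return conn.execute(
        "SELECT pp_page, pp_sortkey FROM page_props "
        "WHERE pp_propname = 'wb-sitelinks' ORDER BY pp_sortkey DESC LIMIT ?",
        (limit,),
    ).fetchall()
```

Both functions return the same items in the same order on the sample data. The difference is that the second one never touches the per-sitelink rows, which is the point of storing the count as a page property.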

Change 181902 restored by JanZerebecki:
Implement a special page to show items with the most sitelinks

Reason:
Restoring to make it possible for Ricordisamoa to work on it.

https://gerrit.wikimedia.org/r/181902

With https://gerrit.wikimedia.org/r/232698 I explore a different approach: it would be possible to use something like https://www.wikidata.org/wiki/Special:PagesWithProp/wb-sitelinks?sortbyvalue=1.
The same would work with wb-claims, etc.

The page property wb-sitelinks has been introduced. That might open up some options. The value is stored as a string, so sorting on pp_value does not give a simple numeric ordering:

MariaDB [wikidatawiki_p]> SELECT * FROM page_props WHERE pp_propname='wb-sitelinks' ORDER BY pp_value DESC LIMIT 20;
+---------+--------------+----------+------------+
| pp_page | pp_propname  | pp_value | pp_sortkey |
+---------+--------------+----------+------------+
|   11009 | wb-sitelinks | 99       |         99 |
|   28674 | wb-sitelinks | 99       |         99 |
|   43781 | wb-sitelinks | 99       |         99 |
|    6406 | wb-sitelinks | 99       |         99 |
|    6151 | wb-sitelinks | 99       |         99 |
|   28679 | wb-sitelinks | 99       |         99 |
| 6271239 | wb-sitelinks | 99       |         99 |
|   28680 | wb-sitelinks | 99       |         99 |
| 8267529 | wb-sitelinks | 99       |         99 |
|     780 | wb-sitelinks | 99       |         99 |
| 6307853 | wb-sitelinks | 99       |         99 |
| 5741069 | wb-sitelinks | 99       |         99 |
| 8838925 | wb-sitelinks | 99       |         99 |
|   51214 | wb-sitelinks | 99       |         99 |
|   10768 | wb-sitelinks | 99       |         99 |
|   14868 | wb-sitelinks | 99       |         99 |
|   28698 | wb-sitelinks | 99       |         99 |
| 6674203 | wb-sitelinks | 99       |         99 |
|    3614 | wb-sitelinks | 99       |         99 |
|    7710 | wb-sitelinks | 99       |         99 |
+---------+--------------+----------+------------+
20 rows in set (2 min 50.05 sec)
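
The output above is topped by 99 because pp_value is compared as a string, so any item with 100 or more sitelinks sorts below it. A small sketch of the effect, using an in-memory SQLite table with made-up counts (the trimmed schema and the sample values are assumptions for illustration, not data from wikidatawiki):

```python
import sqlite3


def build_props_db(counts):
    """Create a demo page_props table with one wb-sitelinks row per count."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page_props "
        "(pp_page INTEGER, pp_propname TEXT, pp_value TEXT, pp_sortkey REAL)"
    )
    for page_id, count in enumerate(counts, start=1):
        # pp_value holds the count as text, pp_sortkey holds it as a number.
        conn.execute(
            "INSERT INTO page_props VALUES (?, 'wb-sitelinks', ?, ?)",
            (page_id, str(count), float(count)),
        )
    return conn


def values_ordered_by(conn, column):
    """Return pp_value for every row, sorted descending by the given column."""
    if column not in ("pp_value", "pp_sortkey"):
        raise ValueError("column must be pp_value or pp_sortkey")
    rows = conn.execute(
        "SELECT pp_value FROM page_props "
        f"WHERE pp_propname = 'wb-sitelinks' ORDER BY {column} DESC"
    ).fetchall()
    return [row[0] for row in rows]


conn = build_props_db([9, 99, 100, 250])
print(values_ordered_by(conn, "pp_value"))    # string comparison
print(values_ordered_by(conn, "pp_sortkey"))  # numeric comparison
```

Ordering by pp_value puts "99" first and "100" last, while ordering by pp_sortkey gives 250, 100, 99, 9. Sorting on pp_sortkey is also more likely to be served from an index, although whether that explains the runtime reported above has not been checked here.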

This change has become very old, and in the meantime this information is available via SPARQL, so anyone can retrieve the data. I don't think this is needed any more. I propose to abandon the change and close the task.

Change 181902 abandoned by Thiemo Mättig (WMDE):
Implement a special page to show items with the most sitelinks

https://gerrit.wikimedia.org/r/181902

@thiemowmde: Ricordisamoa was working on this change so I didn't close it yet because I wanted him to have the chance to comment. @Ricordisamoa sorry for the quick close.