DBQ-182 List of superseded images still used in the article name space
Closed, ResolvedPublic

Description

This issue was converted from https://jira.toolserver.org/browse/DBQ-182.
Summary: List of superseded images still used in the article name space
Issue type: Task - A task that needs to be done.
Priority: Major
Status: Done
Assignee: Hoo man <hoo@online.de>


From: Danhash <danhash@gmail.com>

Date: Mon, 09 Apr 2012 18:41:10

Pasted from WT:Database reports
On the English Wikipedia, the categories "Images made obsolete by a PNG version" and "Wikipedia images available as SVG" contain files which have been superseded. Many of these images have already been replaced in articles with their superseding versions, but a lot of articles still need to be updated to use the superseding images. I would like a list of the files from each category that are still used in main space, preferably sorted by the number of articles each file is used in. I can't find a way to do this with automated tools such as AWB or CatScan, but if there is one I'd be happy to do it myself. These lists would help me run my bot (DanhashBot). Can anybody write and run database queries for these two lists, or explain how I can use an automated tool to compile such a list myself? Thanks!


Version: unspecified
Severity: major

Details

Reference
bz59461

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 2:32 AM
bzimport set Reference to bz59461.

From: Hoo man <hoo@online.de>

Date: Mon, 09 Apr 2012 20:57:04

Sorry, but I can't help you with either CatScan or AWB, as I'm not familiar with those, so I did it in SQL:

SELECT image_pages.page_title AS image
FROM categorylinks
INNER JOIN page AS image_pages ON cl_from = image_pages.page_id
INNER JOIN imagelinks ON image_pages.page_title = il_to
INNER JOIN page ON il_from = page.page_id
WHERE cl_to = 'Images_made_obsolete_by_a_PNG_version'
  AND page.page_namespace = 0
GROUP BY cl_from
HAVING COUNT(*) > 0;

(Replace Images_made_obsolete_by_a_PNG_version with Wikipedia_images_available_as_SVG for the SVGs)

Result:
PNG: http://toolserver.org/~hoo/dbq/dbq-182_1.txt
SVG: http://toolserver.org/~hoo/dbq/dbq-182_2.txt
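
The query in the comment above returns only the file names, while the original request also asked for the list to be sorted by the number of articles using each file. A minimal sketch of such a variant follows. It is not the query that was actually run: it is wrapped in Python's sqlite3 module and executed against a tiny in-memory stand-in for the page, categorylinks and imagelinks tables so that it is self-contained. The sample rows, the helper names build_toy_db and usage_report, and the extra filter on the file namespace (page_namespace = 6) are assumptions made for illustration.

```python
import sqlite3

# Variant of the thread's query that also counts article uses and sorts by them.
# Assumptions: namespace 0 holds articles and namespace 6 holds file pages;
# the "?" placeholder is sqlite3 syntax (a literal string was used in the thread).
QUERY = """
SELECT image_pages.page_title AS image, COUNT(*) AS article_uses
FROM categorylinks
INNER JOIN page AS image_pages ON cl_from = image_pages.page_id
INNER JOIN imagelinks ON image_pages.page_title = il_to
INNER JOIN page ON il_from = page.page_id
WHERE cl_to = ?
  AND image_pages.page_namespace = 6
  AND page.page_namespace = 0
GROUP BY image_pages.page_id, image_pages.page_title
ORDER BY article_uses DESC, image ASC
"""


def build_toy_db():
    """Create a small in-memory stand-in for the three tables (made-up rows)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE page (page_id INTEGER PRIMARY KEY,
                           page_namespace INTEGER, page_title TEXT);
        CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
        CREATE TABLE imagelinks (il_from INTEGER, il_to TEXT);
    """)
    db.executemany("INSERT INTO page VALUES (?, ?, ?)", [
        (1, 6, "Old_logo.jpg"), (2, 6, "Old_map.gif"), (3, 6, "Unused.jpg"),
        (10, 0, "Article_A"), (11, 0, "Article_B"), (12, 2, "Sandbox"),
    ])
    category = "Images_made_obsolete_by_a_PNG_version"
    db.executemany("INSERT INTO categorylinks VALUES (?, ?)",
                   [(1, category), (2, category), (3, category)])
    db.executemany("INSERT INTO imagelinks VALUES (?, ?)", [
        (10, "Old_logo.jpg"), (11, "Old_logo.jpg"),  # used in two articles
        (10, "Old_map.gif"), (12, "Old_map.gif"),    # one article, one user page
        (12, "Unused.jpg"),                          # only used outside main space
    ])
    return db


def usage_report(db, category):
    """Return (file name, article count) rows, most-used files first."""
    return db.execute(QUERY, (category,)).fetchall()


if __name__ == "__main__":
    for image, uses in usage_report(build_toy_db(),
                                    "Images_made_obsolete_by_a_PNG_version"):
        print(image, uses)
```

On the real tables the same SELECT could be run directly, with the category name written as a string literal in place of the placeholder. Files that are only linked from outside the article namespace drop out because of the inner join on page.page_namespace = 0.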


From: Danhash <danhash@gmail.com>

Date: Tue, 10 Apr 2012 14:04:32

Thanks! I had asked before at WT:database reports with no luck. These lists will be very helpful for my bot task. There was a problem with mojibake with three of the files (they have special characters in their names and showed up incorrectly in the .txt): 451px-Escudo legal de Panamá 2.jpg, Cáncer1EN.png, and AppleChancery1¼FractionExample.png; perhaps character encoding is set incorrectly somewhere?

Could you run this again against the Commons categories PNG version available and Vector version available? There will probably be many more superseded images on Commons still used in English Wikipedia articles.
Thanks!
-danhash


From: Hoo man <hoo@online.de>

Date: Tue, 10 Apr 2012 21:17:49

Well, the files are encoded as UTF-8, but for some reason the browsers aren't identifying that correctly, so you need to set the encoding by hand to see the right file names. And I'm sorry, but as the Commons database isn't available at the moment on the server that hosts enwiki, I can't look for Commons images. I don't know how long it will take until they fix that; you might want to create a new ticket in a few weeks.
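
The mojibake described above is what appears when UTF-8 bytes are decoded with a single-byte encoding. The sketch below reproduces the effect for one of the reported file names and shows how to reverse it. The thread does not say which encoding the browsers fell back to, so the use of Latin-1 (ISO-8859-1) here is an assumption, and the helper names misread_as_latin1 and repair are hypothetical.

```python
def misread_as_latin1(name: str) -> str:
    """Show how UTF-8 bytes look when a viewer decodes them as Latin-1."""
    return name.encode("utf-8").decode("latin-1")


def repair(garbled: str) -> str:
    """Undo the misreading by reversing the two steps."""
    return garbled.encode("latin-1").decode("utf-8")


name = "Cáncer1EN.png"
garbled = misread_as_latin1(name)

if __name__ == "__main__":
    print(garbled)          # CÃ¡ncer1EN.png
    print(repair(garbled))  # Cáncer1EN.png
```

The file itself is intact in this situation; only the viewer's choice of decoder is wrong, which is why selecting UTF-8 by hand in the browser shows the correct names.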


From: Danhash <danhash@gmail.com>

Date: Wed, 11 Apr 2012 14:42:35

Alright thanks! I posted again at WT:Database reports and gave a link to your query and am trying to get it set up as a regularly run report. Do you have a bugzilla/jira/blog post/whatever link to the issue you mentioned?
-danhash


From: Hoo man <hoo@online.de>

Date: Wed, 11 Apr 2012 14:50:25

Once https://jira.toolserver.org/browse/MNT-1227 is resolved, Commons will be usable again (as rosemary has the Commons database) ;)

This bug was imported as RESOLVED. The original assignee has therefore not been
set, and the original reporters/responders have not been added as CC, to
prevent bugspam.

If you re-open this bug, please consider adding these people to the CC list:
Original assignee: hoo@online.de
CC list: hoo@online.de, danhash@gmail.com