
DBQ-201 Find Wikimedia Commons files without license
Closed, Resolved (Public)

Description

This issue was converted from https://jira.toolserver.org/browse/DBQ-201.
Summary: Find Wikimedia Commons files without license
Issue type: Task - A task that needs to be done.
Priority: Major
Status: Done
Assignee: Tim.Landscheidt <tim@tim-landscheidt.de>


From: Jarek Tuszynski <jaroslaw.w.tuszynski@saic.com>

Date: Tue, 12 Mar 2013 19:41:37

Could someone run a query to find images without any of the following templates:
License template tag
PD-Layout
GNU-Layout
CC-Layout
no license
Delete
Speedydelete
The resulting files do not transclude any of the standard license templates and are not tagged as lacking a license. If there are many of them, we can limit the search to 10k files for now. I tested the query on smaller sets of files using CatScan2, which unfortunately has no limit on the number of files and times out.


Version: unspecified
Severity: major

Details

Reference
bz59483

Event Timeline

bzimport raised the priority of this task to Needs Triage. (Nov 22 2014, 2:34 AM)
bzimport set Reference to bz59483.

From: Jarek Tuszynski <jaroslaw.w.tuszynski@saic.com>

Date: Tue, 02 Apr 2013 13:17:20

In the meantime, while waiting for this query, I ran several smaller queries with CatScan2 within medium-sized categories. So far I have identified ~3.5k files and added them to http://commons.wikimedia.org/wiki/Category:Media_without_a_license:_needs_history_check for processing. So ideally the query would look for all files that lack every template in the list above and that are not in Category:Media_without_a_license:_needs_history_check.


From: Jarek Tuszynski <jaroslaw.w.tuszynski@saic.com>

Date: Wed, 22 May 2013 19:44:48

The query would be:

SELECT /* SLOW_OK */ page_title
FROM page
WHERE page_is_redirect = 0
  AND page_namespace = 6
  AND NOT EXISTS (
    SELECT * FROM templatelinks
    WHERE tl_from = page_id
      AND tl_namespace = 10
      AND tl_title IN ('License_template_tag', 'PD-Layout', 'GNU-Layout', 'CC-Layout', 'No_license', 'Delete', 'Speedydelete')
    LIMIT 1
  );
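
For anyone who wants to try the pattern before running it against the replicas, here is a minimal sketch of the same NOT EXISTS anti-join on an in-memory SQLite database. It is not the query that was actually run. The toy tables model only the columns the query touches, the sample rows and the `find_unlicensed` and `demo` helpers are made up for illustration, and the `categorylinks` columns (`cl_from`, `cl_to`) are an assumption about the MediaWiki schema of the time rather than something stated in this thread. The sketch also adds the two refinements asked for earlier: skipping files already in Category:Media_without_a_license:_needs_history_check, and capping the result at 10k rows.

```python
import sqlite3

# Toy stand-ins for the MediaWiki tables; only the columns the query
# touches are modelled (an assumption, not the real schema).
SCHEMA = """
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER,
                   page_title TEXT, page_is_redirect INTEGER);
CREATE TABLE templatelinks (tl_from INTEGER, tl_namespace INTEGER, tl_title TEXT);
CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
"""

# Template titles taken from the query in this thread.
LICENSE_MARKERS = ("License_template_tag", "PD-Layout", "GNU-Layout",
                   "CC-Layout", "No_license", "Delete", "Speedydelete")
TRACKING_CATEGORY = "Media_without_a_license:_needs_history_check"


def find_unlicensed(conn, limit=10000):
    """Return titles of non-redirect File pages (namespace 6) that transclude
    none of LICENSE_MARKERS and are not yet in TRACKING_CATEGORY."""
    placeholders = ",".join("?" for _ in LICENSE_MARKERS)
    sql = f"""
        SELECT page_title FROM page
        WHERE page_is_redirect = 0 AND page_namespace = 6
          AND NOT EXISTS (SELECT 1 FROM templatelinks
                          WHERE tl_from = page_id AND tl_namespace = 10
                            AND tl_title IN ({placeholders}))
          AND NOT EXISTS (SELECT 1 FROM categorylinks
                          WHERE cl_from = page_id AND cl_to = ?)
        ORDER BY page_title
        LIMIT ?
    """
    rows = conn.execute(sql, (*LICENSE_MARKERS, TRACKING_CATEGORY, limit))
    return [title for (title,) in rows]


def demo():
    """Build a tiny made-up data set and run the query against it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO page VALUES (?, ?, ?, ?)", [
        (1, 6, "Licensed.jpg", 0),        # has a license marker
        (2, 6, "Unlicensed.jpg", 0),      # only an unrelated template
        (3, 6, "Already_listed.jpg", 0),  # already in the tracking category
        (4, 6, "Redirect.jpg", 1),        # redirect, skipped
        (5, 0, "Gallery_page", 0),        # not in the File namespace
    ])
    conn.executemany("INSERT INTO templatelinks VALUES (?, ?, ?)", [
        (1, 10, "License_template_tag"),
        (2, 10, "Information"),
    ])
    conn.execute("INSERT INTO categorylinks VALUES (?, ?)",
                 (3, TRACKING_CATEGORY))
    return find_unlicensed(conn)


if __name__ == "__main__":
    print(demo())
```

On the sample rows only Unlicensed.jpg should come back. SQLite is used here purely so the sketch runs anywhere; the replicas speak MySQL/MariaDB, where the SLOW_OK comment and the real table layout apply.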


From: Tim.Landscheidt <tim@tim-landscheidt.de>

Date: Wed, 22 May 2013 22:05:45

Run on Tools.


From: Jarek Tuszynski <jaroslaw.w.tuszynski@saic.com>

Date: Thu, 23 May 2013 13:03:24

Thanks a lot. Now I just have to process the results :)

This bug was imported as RESOLVED. The original assignee has therefore not been
set, and the original reporters/responders have not been added as CC, to
prevent bugspam.

If you re-open this bug, please consider adding these people to the CC list:
Original assignee: tim@tim-landscheidt.de
CC list: jaroslaw.w.tuszynski@leidos.com, tim@tim-landscheidt.de