Page MenuHomePhabricator

Make a tool to find images and categories that fell out of the monument category tree
Open, LowestPublic

Description

For every source we have a main category, a tracker template and a tracker category. I'll take the Netherlands as an example:

Every image and category that is tagged with the tracker template should also be somewhere in the tree under the main category (excluding the tracker category). This can go down several steps:

The tool should report for which categories and for which images this is not the case so it can be fixed.


Version: unspecified
Severity: normal

Details

Reference
bz54323

Event Timeline

bzimport raised the priority of this task from to Low.Nov 22 2014, 2:01 AM
bzimport set Reference to bz54323.
bzimport added a subscriber: Unknown Object (MLST).

The query to get this is quite horrible:

SELECT CONCAT('*[[:{{subst:ns:', tracked.page_namespace, '}}:', tracked.page_title, ']]') FROM page AS tracked
JOIN categorylinks AS trackcat ON tracked.page_id=trackcat.cl_from

WHERE
(page_namespace=6 OR
page_namespace=14)
AND page_is_redirect=0
AND cl_to='Rijksmonumenten_with_known_IDs'
AND NOT EXISTS(
SELECT * FROM categorylinks AS c1
JOIN page AS p1 ON c1.cl_to=p1.page_title

JOIN categorylinks AS c2 ON p1.page_id=c2.cl_from
JOIN page AS p2 ON c2.cl_to=p2.page_title

JOIN categorylinks AS c3 ON p2.page_id=c3.cl_from
JOIN page AS p3 ON c3.cl_to=p3.page_title

JOIN categorylinks AS c4 ON p3.page_id=c4.cl_from
JOIN page AS p4 ON c4.cl_to=p4.page_title

JOIN categorylinks AS c5 ON p4.page_id=c5.cl_from
JOIN page AS p5 ON c5.cl_to=p5.page_title

JOIN categorylinks AS c6 ON p5.page_id=c6.cl_from
JOIN page AS p6 ON c6.cl_to=p6.page_title

JOIN categorylinks AS c7 ON p6.page_id=c7.cl_from
JOIN page AS p7 ON c7.cl_to=p7.page_title

WHERE
tracked.page_id=c1.cl_from

AND NOT c1.cl_to='Rijksmonumenten_with_known_IDs'
AND NOT p1.page_title='Rijksmonumenten_with_known_IDs'
AND p1.page_namespace=14
AND p1.page_is_redirect=0

AND NOT c2.cl_to='Rijksmonumenten_with_known_IDs'
AND NOT p2.page_title='Rijksmonumenten_with_known_IDs'
AND p2.page_namespace=14
AND p2.page_is_redirect=0

AND NOT c3.cl_to='Rijksmonumenten_with_known_IDs'
AND NOT p3.page_title='Rijksmonumenten_with_known_IDs'
AND p3.page_namespace=14
AND p3.page_is_redirect=0

AND NOT c4.cl_to='Rijksmonumenten_with_known_IDs'
AND NOT p4.page_title='Rijksmonumenten_with_known_IDs'
AND p4.page_namespace=14
AND p4.page_is_redirect=0

AND NOT c5.cl_to='Rijksmonumenten_with_known_IDs'
AND NOT p5.page_title='Rijksmonumenten_with_known_IDs'
AND p5.page_namespace=14
AND p5.page_is_redirect=0

AND NOT c6.cl_to='Rijksmonumenten_with_known_IDs'
AND NOT p6.page_title='Rijksmonumenten_with_known_IDs'
AND p6.page_namespace=14
AND p6.page_is_redirect=0

AND NOT c7.cl_to='Rijksmonumenten_with_known_IDs'
AND NOT p7.page_title='Rijksmonumenten_with_known_IDs'
AND p7.page_namespace=14
AND p7.page_is_redirect=0

AND (c1.cl_to='Rijksmonumenten'
OR c2.cl_to='Rijksmonumenten'
OR c3.cl_to='Rijksmonumenten'
OR c4.cl_to='Rijksmonumenten'
OR c5.cl_to='Rijksmonumenten'
OR c6.cl_to='Rijksmonumenten'
OR c7.cl_to='Rijksmonumenten')

) /* ORDER BY tracked.page_namespace DESC, tracked.page_title ASC */
LIMIT 500;

I used this to create the report at https://commons.wikimedia.org/wiki/Category_talk:Rijksmonumenten#Categories_not_in_the_tree

Aklapper lowered the priority of this task from Low to Lowest.Dec 29 2014, 12:35 AM