
harvest_template should ignore placeholder images
Open, Low, Public

Description

When importing images with harvest_template.py, the source articles often contain various placeholder images. The bot shouldn't import them.

https://commons.wikimedia.org/wiki/Category:Image_placeholders

Usually only a few such images occur within one group of articles, so a parameter like -ignore:file:placeholder.svg,file:placeholder2.svg should help.
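A minimal sketch of how such an option could be parsed and applied. The -ignore option is only the proposal from this task, not an existing harvest_template.py argument, and the helper names `normalize_file_name`, `parse_ignore_option` and `is_ignored` are hypothetical. The normalization assumes the usual MediaWiki title rules on Commons (underscores equal spaces, first letter capitalized).

```python
def normalize_file_name(name):
    """Normalize a file name roughly the way MediaWiki titles are compared.

    Assumption: underscores and spaces are equivalent and the first
    letter is case-insensitive, as on Wikimedia Commons.
    """
    name = name.replace('_', ' ').strip()
    if name.lower().startswith('file:'):
        name = name[len('file:'):].strip()
    return name[:1].upper() + name[1:]


def parse_ignore_option(arg):
    """Parse a hypothetical '-ignore:file:a.svg,file:b.svg' argument.

    Returns a set of normalized file names, or None if the argument
    is not an -ignore option at all.
    """
    prefix = '-ignore:'
    if not arg.startswith(prefix):
        return None
    ignored = set()
    for item in arg[len(prefix):].split(','):
        if item.strip():
            ignored.add(normalize_file_name(item))
    return ignored


def is_ignored(file_title, ignored):
    """Return True if the file title is in the ignore set."""
    return normalize_file_name(file_title) in ignored
```

The bot would then skip any image value for which `is_ignored` returns True before adding the claim.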

A better approach would be for the bot to check the image's categories for these strings:

'placeholders', 'SVG flags of missing', 'Replace this image', 'Fully transparent images' ..
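The category check could be sketched as a case-insensitive substring match over the titles of the categories a file belongs to. The names `PLACEHOLDER_MARKERS` and `looks_like_placeholder` are hypothetical, and the marker list only repeats the strings quoted above. How the category titles are obtained is left open here; with Pywikibot they would presumably come from the file page's `categories()` method.

```python
# Marker strings taken from the task description; the list is not exhaustive.
PLACEHOLDER_MARKERS = (
    'placeholders',
    'SVG flags of missing',
    'Replace this image',
    'Fully transparent images',
)


def looks_like_placeholder(category_titles, markers=PLACEHOLDER_MARKERS):
    """Return True if any category title contains one of the markers.

    The comparison is case-insensitive and uses plain substring
    matching, so 'Category:Image placeholders' matches 'placeholders'.
    """
    lowered_markers = [marker.lower() for marker in markers]
    for title in category_titles:
        lowered_title = title.lower()
        if any(marker in lowered_title for marker in lowered_markers):
            return True
    return False
```

This only inspects the categories directly assigned to the file, so it avoids walking the category tree, at the cost of missing placeholders that are categorized under differently named subcategories.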


Version: core-(2.0)
Severity: major

Details

Reference
bz69286

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 3:30 AM
bzimport set Reference to bz69286.
bzimport added a subscriber: Unknown Object (????).
jayvdb set Security to None.
jayvdb removed a subscriber: Unknown Object (????).
Xqt triaged this task as Low priority. Jan 13 2019, 1:44 PM
Xqt added a subscriber: Multichill.

The category option gets tricky because the bot would have to resolve all subcategories to an unknown depth. The category also contains a lot of files: PetScan shows about 2.5 thousand. Even if only the names were stored, that is a sizable list.
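One way to keep the subcategory problem bounded is a breadth-first walk with a depth limit and a visited set, so that category loops cannot cause endless recursion. This is only a sketch: `collect_subcategories` is a hypothetical helper, `get_subcategories` is whatever callable returns the direct subcategories of a category, and the default depth of 3 is an arbitrary assumption.

```python
from collections import deque


def collect_subcategories(root, get_subcategories, max_depth=3):
    """Collect root and its subcategories down to max_depth levels.

    get_subcategories(category) must return an iterable of the direct
    subcategories. Already visited categories are skipped, so cycles
    in the category graph are harmless.
    """
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not descend below the depth limit
        for sub in get_subcategories(category):
            if sub not in seen:
                seen.add(sub)
                queue.append((sub, depth + 1))
    return seen
```

The file names from the collected categories could be stored in a plain `set`; a few thousand short strings take little memory and give constant-time membership checks, so the cost lies mainly in the API requests needed to build the list.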