
robots.txt for upload.wikimedia.org to noindex archive images
Closed, Resolved · Public

Description

User:Docu suggests adding a robots.txt to upload.wikimedia.org to prevent the indexing of http://upload.wikimedia.org/wikipedia/commons/archive/*


Version: unspecified
Severity: enhancement
URL: http://upload.wikimedia.org/robots.txt

Details

Reference
bz24319

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 11:04 PM
bzimport set Reference to bz24319.
bzimport added a subscriber: Unknown Object (MLST).

test5555 wrote:

It's now archived at [[Commons:Commons:Village_pump/Archive/2010Jul#http:.2F.2Fupload.wikimedia.org.2Frobots.txt]]. No issues or concerns were raised.

User-agent: *
Disallow: /wikipedia/commons/archive/

would probably be sufficient for robots.txt, but I'd rather leave this technicality to a specialist.
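For illustration only, here is a minimal sketch that checks the suggested rule with Python's standard urllib.robotparser, with no network access; the example URLs and the sample filename "Example.jpg" are hypothetical and only meant to show that archived file versions would be blocked while current files stay crawlable.

# Minimal sketch: verify the proposed robots.txt rule with the
# standard-library parser (no request to upload.wikimedia.org is made).
from urllib.robotparser import RobotFileParser

RULES = [
    "User-agent: *",
    "Disallow: /wikipedia/commons/archive/",
]

parser = RobotFileParser()
parser.parse(RULES)

base = "http://upload.wikimedia.org"
# Archived versions under /wikipedia/commons/archive/ should be disallowed...
print(parser.can_fetch("*", base + "/wikipedia/commons/archive/Example.jpg"))  # False
# ...while current files outside the archive path remain crawlable.
print(parser.can_fetch("*", base + "/wikipedia/commons/Example.jpg"))          # True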

I'm just curious: what would the benefit of this be, and why do we want to do it? Any images we don't want others to see would actually be deleted and therefore not indexable. (I'm not opposed to doing this, I'm just curious as to the why.)

It could potentially mean that if you viewed an image page in the archive.org archive, you wouldn't be able to see the image history section (which, to be honest, isn't very important).

Hmm, I just implemented this, having opened the bug before the last comment was made. If you folks change your mind, please reopen and let me know.

test5555 wrote:

Thanks for implementing this.

@Bawolff: there is rarely anything useful in the history, and Wikimedia itself provides no way to display these images.