
Provide filearchive table with fa_storage_key or, if it exists and is sufficiently indexed and populated, fa_sha1 for commonswiki
Closed, Resolved (Public)

Description

I'd (possibly) like to create a JSON/XML API that could be queried before an upload to check whether the same file has been uploaded before.
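A pre-upload check like this needs a lookup key for the local file first. A minimal sketch in Python, assuming the sha1 columns (img_sha1, oi_sha1, fa_sha1) hold the file's SHA-1 digest written in base 36 and zero-padded to 31 characters; the function name `mediawiki_sha1_base36` is hypothetical and not part of any existing tool:

```python
import hashlib

BASE36_DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def mediawiki_sha1_base36(data: bytes) -> str:
    """Return the SHA-1 of `data` in base 36, left-padded with zeros to 31 chars.

    Assumption: this matches the encoding used in the *_sha1 columns.
    """
    # Interpret the 20-byte digest as one big-endian integer.
    value = int.from_bytes(hashlib.sha1(data).digest(), "big")
    digits = []
    while value:
        value, remainder = divmod(value, 36)
        digits.append(BASE36_DIGITS[remainder])
    # Digits were collected least-significant first, so reverse before padding.
    return "".join(reversed(digits)).rjust(31, "0")


if __name__ == "__main__":
    # Hash some in-memory bytes; a real tool would read the file to be uploaded.
    print(mediawiki_sha1_base36(b"example file contents"))
```

The resulting string could then be compared against fa_sha1 (or used as the prefix of fa_storage_key, if that column is the one exposed).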


Version: unspecified
Severity: enhancement
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=58993

Details

Reference
bz57697

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 2:27 AM
bzimport added a project: Toolforge.
bzimport set Reference to bz57697.

I can see several issues with this, not least the ability to identify whether an arbitrary file has been uploaded in the past, which may have legal implications.

Since this table contains data not normally available to non-administrators (or to any user, where the sha1 is concerned), this will need evaluation from Legal.

What's the difference between this one and 58993? I feel like I'm missing something.

And note that with 58993, I have given legal signoff if that wasn't entirely clear :)

(In reply to Luis Villa (personal-for work use lvilla@wikimedia.org) from comment #3)

> What's the difference between this one and 58993? I feel like I'm missing something.
>
> And note that with 58993, I have given legal signoff if that wasn't entirely clear :)

bug 58993 is asking for the information to be available to everyone via http://commons.wikimedia.org/w/api.php

This bug is asking for it to be available in the db replicas at tools.wmflabs.org

From a legal perspective, not much different (probably). From a technical perspective, very different areas of Wikimedia, with different groups working on it.

Handing off to Sean as this now has Legal okay for the filearchive table.

Change 137938 had a related patch set uploaded by coren:
Labs: new replication views

https://gerrit.wikimedia.org/r/137938

Change 137938 merged by coren:
Labs: new replication views

https://gerrit.wikimedia.org/r/137938

*** Bug 61813 has been marked as a duplicate of this bug. ***

Thank you. However, what I observe is that fa_sha1 queries are notably slower than img_sha1 and oi_sha1 queries:

18:38:38 SELECT * FROM commonswiki_p.image WHERE img_sha1="qtexhtbcwt0tnkuxb2wf3xs7d7j761u" LIMIT 0, 1000 1 row(s) returned 0.172 sec / 0.000 sec

18:36:31 SELECT * FROM commonswiki_p.oldimage WHERE oi_sha1="0mpoldytyxspxrdbf44r1kc7m8vtbq67" LIMIT 0, 1000 0 row(s) returned 0.156 sec / 0.000 sec

18:36:07 SELECT * FROM commonswiki_p.filearchive WHERE fa_sha1="0mpoldytyxspxrdbf44r1kc7m8vtbq67" LIMIT 0, 1000 1 row(s) returned 5.990 sec / 0.000 sec

5.990 sec vs. 0.172 sec is a huge difference. Is something broken with indexing?
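To compare the three lookups quoted above side by side, the statements can be generated from one table-to-column map, with an optional EXPLAIN prefix to see whether an index on the sha1 column is used. This is only a sketch: the helper name `sha1_lookup_sql` is hypothetical, the `%s` placeholder assumes a MySQL driver with "format" paramstyle (such as PyMySQL), and the thread does not confirm that EXPLAIN is permitted on the replica views.

```python
# View and column names are taken from the queries quoted above.
SHA1_COLUMNS = {
    "image": "img_sha1",
    "oldimage": "oi_sha1",
    "filearchive": "fa_sha1",
}


def sha1_lookup_sql(table: str, explain: bool = False) -> str:
    """Build a parameterized sha1 lookup against a commonswiki_p view."""
    # Only known tables are accepted (KeyError otherwise), so no arbitrary
    # identifier is ever interpolated into the SQL text.
    column = SHA1_COLUMNS[table]
    sql = f"SELECT * FROM commonswiki_p.{table} WHERE {column} = %s LIMIT 1000"
    return f"EXPLAIN {sql}" if explain else sql


if __name__ == "__main__":
    for table in SHA1_COLUMNS:
        print(sha1_lookup_sql(table, explain=True))
```

If the plan for filearchive shows a full scan while image and oldimage show an index lookup, that would be consistent with the timings above.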

> 18:36:07 SELECT * FROM commonswiki_p.filearchive WHERE fa_sha1="0mpoldytyxspxrdbf44r1kc7m8vtbq67" LIMIT 0, 1000 1 row(s) returned 5.990 sec / 0.000 sec
>
> 5.990 sec vs. 0.172 sec is a huge difference. Is something broken with indexing?

See https://phabricator.wikimedia.org/T71088#2338421