
Adding the sha1 to the imageinfo query
Closed, Resolved (Public)

Description

Author: Bryan.TongMinh

Description:
Now that the SHA-1 hashes of the images are available in the database, it would be handy to also expose them through the API. I have attached a trivial patch.
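For illustration, a request for the hash might look roughly like the sketch below. It only builds the query URL and does not contact any server. The endpoint and the file title are placeholders, and `imageinfo_sha1_url` is a hypothetical helper name; `iiprop=sha1` is how the imageinfo module accepts the property in current MediaWiki, which may differ in detail from the attached patch.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption): substitute the api.php URL of the wiki being queried.
API_ENDPOINT = "https://example.org/w/api.php"


def imageinfo_sha1_url(title: str) -> str:
    """Build a query URL that asks the imageinfo module for a file's SHA-1."""
    params = {
        "action": "query",
        "prop": "imageinfo",
        "iiprop": "sha1",   # request the hash alongside the image info
        "titles": title,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


# Hypothetical file title, used only to show the resulting query string.
url = imageinfo_sha1_url("File:Example.jpg")
print(url)
```

The response format is not shown here, since it depends on the MediaWiki version and the chosen output format.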


Version: 1.11.x
Severity: enhancement

Details

Reference
bz11115

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 9:52 PM
bzimport set Reference to bz11115.

Bryan.TongMinh wrote:

Patch

As above

Attached:

Bryan.TongMinh wrote:

Some characters are not allowed in XML and I think that we need to hexlify the hash before returning it.

robchur wrote:

A SHA-1 hash *is* a hexadecimal number, and is thus alphanumeric.

Even if it weren't, assuming the API isn't insane, it'll be passing all values through the standard XML functions (see includes/Xml.php), or else using something sensible to generate valid XML, so this wouldn't be an issue.
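Both points can be checked with a small Python sketch (not MediaWiki code). The input bytes are made up, and `xml.sax.saxutils.quoteattr` merely stands in for whatever escaping the API output layer applies. The raw 20-byte digest may contain bytes that are awkward in XML, which is presumably the concern behind hexlifying, while the hex form uses only `0-9a-f`.

```python
import hashlib
from xml.sax.saxutils import quoteattr

data = b"example file contents"  # stand-in for an uploaded file (assumption)

raw = hashlib.sha1(data).digest()            # 20 raw bytes, arbitrary values
hex_digest = hashlib.sha1(data).hexdigest()  # 40 characters drawn from 0-9a-f

# The hex form is plain alphanumeric text, so it needs no escaping at all.
is_plain_hex = len(hex_digest) == 40 and all(
    c in "0123456789abcdef" for c in hex_digest
)

# A sensible XML writer escapes attribute values anyway; for a hex digest
# this only adds the surrounding quotes.
attr = quoteattr(hex_digest)
escaped_example = quoteattr("a&b")  # shows escaping when it is actually needed

print(len(raw), is_plain_hex)
print("sha1=" + attr)
print(escaped_example)
```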

(In reply to comment #3)

Even if it weren't, assuming the API isn't insane, it'll be passing all values
through the standard XML functions (see includes/Xml.php), or else using
something sensible to generate valid XML, so this wouldn't be an issue.

I haven't looked at the imageinfo code, but I dare say our XML handling (and that of the other output formats, for that matter) is sane. I'll test this patch tomorrow, and if it works, I'll commit it.

For some insane reason, the DB stores SHA1 hashes in base 36. I understand base 64 may not have been feasible due to uppercase/lowercase issues, but base 32 makes a lot more sense to me than 36. Also, storing it in hex (much less trouble) costs only 8 bytes per image (a negligible difference considering the images themselves range from 10 KB to 10 MB) and eliminates the wfBaseConvert() overhead. Would this be a sensible schema change?
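To make the size comparison concrete, here is a rough Python sketch of the conversion. `to_base` is a hypothetical stand-in for `wfBaseConvert()`, not its actual implementation, and the padding width of 31 is an assumption derived from the digit count a 160-bit value needs in base 36. By this count a digest takes 40 characters in hex, 32 in base 32 and 31 in base 36, so hex costs 9 characters more than base 36, close to the 8 bytes quoted above.

```python
import hashlib

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base(n: int, base: int, pad: int = 0) -> str:
    """Render a non-negative integer in the given base, left-padded with zeros."""
    if not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36")
    out = ""
    while n > 0:
        n, r = divmod(n, base)
        out = DIGITS[r] + out
    return (out or "0").rjust(pad, "0")


hex_digest = hashlib.sha1(b"example file contents").hexdigest()
value = int(hex_digest, 16)

b36 = to_base(value, 36, pad=31)  # 31 digits cover any 160-bit value
b32 = to_base(value, 32, pad=32)  # 160 bits / 5 bits per digit = 32 digits

print(len(hex_digest), len(b32), len(b36))

# Converting back recovers the original hex digest.
round_trip = to_base(int(b36, 36), 16, pad=40)
```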

Either way, it's fixed in r25456. Thanks to Bryan for the patch, which needed very little work.

I used a base-36 version of the hash keys for *filenames* in the deletion archive repository because they're more compact, and thus less unfriendly for URLs.

Not sure if they're the best thing to use in the DB, but *eh*. Gotta pick something. :)

(In reply to comment #6)

I used a base-36 version of the hash keys for *filenames* in the deletion
archive repository because they're more compact, and thus less unfriendly for
URLs.

Well, like I said, the difference between base 36 and base 16 is only 8 bytes per image, and storing hex eliminates the conversions. But then those conversions don't take up an earth-shattering amount of time either...