False positives for MEDIATYPE_VIDEO due to looking for string theora in (audio) ogg files
Open, Medium, Public

Description

Edit: The specific example below is fixed, but there might be other false positives.

Steps to reproduce:
https://commons.wikimedia.org//w/api.php?action=query&prop=imageinfo&format=xml&iiprop=mediatype&titles=File%3APresident%20Obama%20Speaks%20to%20the%20Muslim%20World%20from%20Cairo%2C%20Egypt%20%28audio%29.ogg

Actual Result:
<imageinfo><ii mediatype="VIDEO"/></imageinfo>

Expected Result:
<imageinfo><ii mediatype="AUDIO"/></imageinfo>

Note that TMH returns the correct value for the file in question:
https://commons.wikimedia.org/wiki/File:President_Obama_Speaks_to_the_Muslim_World_from_Cairo,_Egypt_%28audio%29.ogg

Details

Reference
bz63584

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 3:17 AM
bzimport set Reference to bz63584.
bzimport added a subscriber: Unknown Object (MLST).

Wow is the code to determine media type ever hacky...

We determine whether the media type is VIDEO by checking whether the string "theora" appears in the first 256 bytes of the file (MimeMagic::getMediaType). In this case, the file begins with:

00000000 4f 67 67 53 00 02 00 00 00 00 00 00 00 00 ea 37 |OggS...........7|
00000010 77 5a 00 00 00 00 32 03 54 97 01 1e 01 76 6f 72 |wZ....2.T....vor|
00000020 62 69 73 00 00 00 00 02 44 ac 00 00 00 00 00 00 |bis.....D.......|
00000030 80 38 01 00 00 00 00 00 b8 01 4f 67 67 53 00 00 |.8........OggS..|
00000040 00 00 00 00 00 00 00 00 ea 37 77 5a 01 00 00 00 |.........7wZ....|
00000050 c2 13 86 6c 0f 4b ff ff ff ff ff ff ff ff ff ff |...l.K..........|
00000060 ff ff ff a9 03 76 6f 72 62 69 73 1d 00 00 00 58 |.....vorbis....X|
00000070 69 70 68 2e 4f 72 67 20 6c 69 62 56 6f 72 62 69 |iph.Org libVorbi|
00000080 73 20 49 20 32 30 30 37 30 36 32 32 01 00 00 00 |s I 20070622....|
00000090 1a 00 00 00 45 4e 43 4f 44 45 52 3d 66 66 6d 70 |....ENCODER=ffmp|
000000a0 65 67 32 74 68 65 6f 72 61 2d 30 2e 32 33 01 05 |eg2theora-0.23..|
000000b0 76 6f 72 62 69 73 21 42 43 56 01 00 00 01 00 18 |vorbis!BCV......|
000000c0 63 54 29 46 99 52 d2 4a 89 19 73 94 31 46 99 62 |cT)F.R.J..s.1F.b|
[...]

So the Vorbis comment naming the encoder (ffmpeg2theora-0.23) makes MediaWiki think the file is a video, because the string "theora" appears near the beginning of the file.
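
For illustration only, here is a rough Python sketch of why the substring check misfires on this file, next to one possible stricter check. It is not the MediaWiki code and not what the patch does. The function names and the trimmed sample are made up for this example. The stricter check assumes the standard Ogg page layout (27-byte fixed header, then a segment table) and the usual codec identification headers (0x80 "theora" for Theora, 0x01 "vorbis" for Vorbis).

```python
from typing import Optional

# First 48 bytes of the file from the dump above, followed by the encoder
# comment. The bytes in between are omitted, so this is NOT a valid Ogg file,
# only enough data to exercise the two checks.
SAMPLE_HEAD = bytes.fromhex(
    "4f 67 67 53 00 02 00 00 00 00 00 00 00 00 ea 37"
    "77 5a 00 00 00 00 32 03 54 97 01 1e 01 76 6f 72"
    "62 69 73 00 00 00 00 02 44 ac 00 00 00 00 00 00"
) + b"ENCODER=ffmpeg2theora-0.23"


def looks_like_video_naive(head: bytes) -> bool:
    """Substring heuristic as described in this task: 'theora' anywhere in the first 256 bytes."""
    return b"theora" in head[:256]


def first_packet_codec(head: bytes) -> Optional[str]:
    """Hypothetical stricter check: inspect only the start of the first Ogg page's payload."""
    if len(head) < 28 or head[:4] != b"OggS":
        return None
    segment_count = head[26]             # byte 26: number of entries in the segment table
    payload = head[27 + segment_count:]  # payload begins right after the segment table
    if payload.startswith(b"\x80theora"):
        return "theora"
    if payload.startswith(b"\x01vorbis"):
        return "vorbis"
    return None  # e.g. a Skeleton stream or another codec comes first


if __name__ == "__main__":
    print(looks_like_video_naive(SAMPLE_HEAD))  # True: matches "ffmpeg2theora" in the comment
    print(first_packet_codec(SAMPLE_HEAD))      # vorbis
```

This sketch only looks at the first page, so it would return None for files whose first stream is Ogg Skeleton, and it would miss a Theora stream that does not come first. A proper fix would need to walk all beginning-of-stream pages.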

Moved to Change-Id Ib39ed06f895624b55d15a001cb0a2cd3129f4bb0

Change 130966 had a related patch set uploaded by Brian Wolff:
Less false positives for MEDIATYPE_VIDEO

https://gerrit.wikimedia.org/r/130966

Just as an aside, with newer versions of ffmpeg2theora this probably won't happen, as the first 255 bytes will be taken up by the Ogg Skeleton stream.

Change 130966 merged by jenkins-bot:
Less false positives for MEDIATYPE_VIDEO

https://gerrit.wikimedia.org/r/130966

Resetting to new - the above patch helps the situation, but isn't a "proper" fix.

(In reply to Gerrit Notification Bot from comment #6)

It was merged last month but did not yet hit commons?

(In reply to Marco from comment #8)

It was merged on June 29, which means that it will be deployed to WMF wikis with 1.24wmf12. That version went out to Commons yesterday; see https://www.mediawiki.org/wiki/MediaWiki_1.24/Roadmap for the full schedule.

(In reply to Brad Jorsch from comment #9)

Also note that the fix will only apply to new files (or to files that you purge with ?action=purge).