
.odb file recognized as application/acad (first type entry in our list)
Closed, Resolved, Public

Description

fresh OpenDocument database

When uploading an OpenDocument database, it is recognized as the application/acad type.

finfo, using magic.mime, does not recognize the type and returns application/octet-stream. MediaWiki thus tries to guess the file type from its extension. With upload stash, the temporary file does not have any file extension (the path is something like 'mwrepo://local/temp/4/46/20120104160114!phpFfHK5z.'), hence the detection gives unknown/unknown and then defaults to application/acad (the first entry in our list).
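The missing extension is easy to reproduce in isolation. The sketch below is an illustration only: guessMimeFromExtension() and its one-entry lookup table are hypothetical stand-ins, not MediaWiki's actual code. Only pathinfo() is a real PHP function.

```php
<?php
// Hypothetical helper, for illustration only: maps a path to a MIME type
// using nothing but its file extension.
function guessMimeFromExtension( string $path, array $extToMime ): string {
	// pathinfo() yields '' when the name has no extension or ends with a dot.
	$ext = strtolower( pathinfo( $path, PATHINFO_EXTENSION ) );
	return $extToMime[$ext] ?? 'unknown/unknown';
}

// Assumed one-entry table; the real list lives in mime.types.
$extToMime = [ 'odb' => 'application/vnd.oasis.opendocument.database' ];

// A stashed upload path as quoted above: trailing dot, no extension.
$stashed = 'mwrepo://local/temp/4/46/20120104160114!phpFfHK5z.';

echo guessMimeFromExtension( '/tmp/DB.odb', $extToMime ), "\n";
echo guessMimeFromExtension( $stashed, $extToMime ), "\n";
```

With the extension gone, an extension-based lookup has nothing to work with, which is consistent with the unknown/unknown result reported here.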

The attached file is an empty OpenDocument database file. Analyzing it with File::getPropsFromPath() yields:

Array
(
    [fileExists] => 1
    [mime] => application/vnd.oasis.opendocument.database
    [media_type] => OFFICE
    [metadata] => 
    [sha1] => 5tlzupz0ww3q8w9pkowrdq72g0wnsqa
    [width] => 0
    [height] => 0
    [bits] => 0
    [file-mime] => unknown/unknown
    [minor_mime] => vnd.oasis.opendocument.database
    [major_mime] => application
    [size] => 2498
)

Please note how 'mime' is correct but 'file-mime' is incorrect :-(


Version: 1.20.x
Severity: normal


Event Timeline

bzimport raised the priority of this task from to Medium (Nov 22 2014, 12:09 AM).
bzimport set Reference to bz33515.
bzimport added a subscriber: Unknown Object (MLST).

A long time later it is detected as application/zip on my machine:

$ php maintenance/shell.php 
>>> $mwProps = new MWFileProps( MediaWiki\MediaWikiServices::getInstance()->getMimeAnalyzer() );
=> MWFileProps {#334}

>>> $mwProps->getPropsFromPath( '/tmp/DB.odb', true );
=> [
     "fileExists" => true,
     "size" => 2498,
     "file-mime" => "application/zip",
     "major_mime" => "application",
     "minor_mime" => "zip",
     "mime" => "application/zip",
     "sha1" => "5tlzupz0ww3q8w9pkowrdq72g0wnsqa",
     "metadata" => "",
     "width" => 0,
     "height" => 0,
     "bits" => 0,
     "media_type" => "ARCHIVE",
   ]
>>> 

Debug log:

[Mime] MimeAnalyzer::loadFiles: loading mime types from /home/hashar/projects/mediawiki/core/includes/libs/mime/mime.types
[Mime] MimeAnalyzer::loadFiles: loading mime info from /home/hashar/projects/mediawiki/core/includes/libs/mime/mime.info
[Mime] MimeAnalyzer::doGuessMimeType: analyzing head and tail of /tmp/DB.odb for magic numbers.
[Mime] MimeAnalyzer::doGuessMimeType: ZIP header present in /tmp/DB.odb
[Mime] MimeAnalyzer::detectZipType: unable to identify type of ZIP archive
[Mime] MimeAnalyzer::guessMimeType: guessed mime type of /tmp/DB.odb: application/zip
[Mime] MimeAnalyzer::improveTypeFromExtension: improved mime type for .odb: application/zip
MediaHandlerFactory::getHandler: no handler found for application/zip.

But we have:

includes/libs/mime/mime.types
application/vnd.oasis.opendocument.database odb
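For reference, a line like the one above can be read as "MIME type, then the extensions that map to it". The following is a minimal sketch under that assumption. parseMimeTypes() is a hypothetical helper, not the loader MimeAnalyzer uses, and the "first declaration wins" rule is a choice made for this example.

```php
<?php
// Hypothetical parser for mime.types-style lines: a MIME type followed by
// whitespace-separated extensions. Lines starting with '#' are comments.
function parseMimeTypes( string $text ): array {
	$extToMime = [];
	foreach ( preg_split( '/\R/', $text ) as $line ) {
		$line = trim( $line );
		if ( $line === '' || $line[0] === '#' ) {
			continue;
		}
		$fields = preg_split( '/\s+/', $line );
		$mime = array_shift( $fields );
		foreach ( $fields as $ext ) {
			// Assumption for this sketch: the first declaration wins.
			$extToMime[$ext] ??= $mime;
		}
	}
	return $extToMime;
}

$map = parseMimeTypes(
	"# sample\napplication/vnd.oasis.opendocument.database odb\n"
);
echo $map['odb'], "\n";
```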

Ultimately, the issue is in MimeAnalyzer::detectZipType in includes/libs/mime/MimeAnalyzer.php. It has a list of known OpenDocument types, but the list lacks the base type, which is used for databases:

function detectZipType( $header, $tail = null, $ext = false ) {

    $opendocTypes = [
        'chart-template',
        'chart',
        'formula-template',
        'formula',
        'graphics-template',
        'graphics',
        'image-template',
        'image',
        'presentation-template',
        'presentation',
        'spreadsheet-template',
        'spreadsheet',
        'text-template',
        'text-master',
        'text-web',
        'text' ];

    // https://lists.oasis-open.org/archives/office/200505/msg00006.html
    $types = '(?:' . implode( '|', $opendocTypes ) . ')';
    $opendocRegex = "/^mimetype(application\/vnd\.oasis\.opendocument\.$types)/";

    if ( preg_match( $opendocRegex, substr( $header, 30 ), $matches ) ) {
        $mime = $matches[1];
        $this->logger->info( __METHOD__ . ": detected $mime from ZIP archive\n" );

Our code is thus based on a list from 2005 (https://lists.oasis-open.org/archives/office/200505/msg00006.html), which long predates the introduction of the database type.
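To see why the excerpt above misses databases, here is a self-contained sketch. fakeOdfHeader() and detectOpenDocType() are hypothetical helpers and the type list is abbreviated. The offset of 30 is the fixed size of a ZIP local file header, after which come the first entry's name (mimetype) and its stored content.

```php
<?php
// Build a minimal ZIP local file header for a stored "mimetype" entry,
// the layout an OpenDocument package starts with. Hypothetical helper.
function fakeOdfHeader( string $mime ): string {
	$name = 'mimetype';
	return pack(
		'VvvvvvVVVvv',
		0x04034b50,       // local file header signature "PK\x03\x04"
		20, 0, 0,         // version needed, flags, compression (0 = stored)
		0, 0,             // modification time and date
		crc32( $mime ),   // CRC-32 of the entry data
		strlen( $mime ),  // compressed size
		strlen( $mime ),  // uncompressed size
		strlen( $name ),  // file name length
		0                 // extra field length
	) . $name . $mime;
}

// Abbreviated copy of the list in the excerpt; note there is no 'database'.
$opendocTypes = [ 'spreadsheet', 'text-template', 'text' ];
$types = '(?:' . implode( '|', $opendocTypes ) . ')';
$opendocRegex = "/^mimetype(application\/vnd\.oasis\.opendocument\.$types)/";

// Hypothetical wrapper around the same substr() + preg_match() check.
function detectOpenDocType( string $header, string $regex ): ?string {
	return preg_match( $regex, substr( $header, 30 ), $m ) ? $m[1] : null;
}

$text = fakeOdfHeader( 'application/vnd.oasis.opendocument.text' );
$db = fakeOdfHeader( 'application/vnd.oasis.opendocument.database' );

var_dump( detectOpenDocType( $text, $opendocRegex ) ); // the text type
var_dump( detectOpenDocType( $db, $opendocRegex ) );   // NULL: not in the list
```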

The OASIS Open Document Format version 1.2 states:

3.19 MIME Types and File Name Extensions

Appendix C contains a list of MIME types and file name extensions to be used for office documents that conform to this specification and that are contained in a package. See 3.1.3.

Office documents that conform to this specification but are not contained in a package should use the MIME type text/xml.

Only MIME types and extensions that have been registered according to [RFC4288] should used for office documents that conform to this specification. The MIME types and extensions listed in appendix C should be used where appropriate.

Appendix C lists the MIME types registered at the time the format was published, but otherwise points to the IANA for the latest list (types are registered there under RFC 4288). The list is available online at https://www.iana.org/assignments/media-types/media-types.xhtml

| Name | Template | Reference |
| --- | --- | --- |
| vnd.oasis.opendocument.chart | application/vnd.oasis.opendocument.chart | [Svante_Schubert][OASIS] |
| vnd.oasis.opendocument.chart-template | application/vnd.oasis.opendocument.chart-template | [Svante_Schubert][OASIS] |
| vnd.oasis.opendocument.database | application/vnd.oasis.opendocument.database | [Svante_Schubert][OASIS] |
| vnd.oasis.opendocument.formula | application/vnd.oasis.opendocument.formula | [Svante_Schubert][OASIS] |
| vnd.oasis.opendocument.formula-template | application/vnd.oasis.opendocument.formula-template | [Svante_Schubert][OASIS] |
| vnd.oasis.opendocument.graphics | application/vnd.oasis.opendocument.graphics | [Svante_Schubert][OASIS] |
| vnd.oasis.opendocument.graphics-template | application/vnd.oasis.opendocument.graphics-template | [Svante_Schubert][OASIS] |
| vnd.oasis.opendocument.image | application/vnd.oasis.opendocument.image | [Svante_Schubert][OASIS] |
| vnd.oasis.opendocument.image-template | application/vnd.oasis.opendocument.image-template | [Svante_Schubert][OASIS] |
| vnd.oasis.opendocument.presentation | application/vnd.oasis.opendocument.presentation | [Svante_Schubert][OASIS] |
| vnd.oasis.opendocument.presentation-template | application/vnd.oasis.opendocument.presentation-template | [Svante_Schubert][OASIS] |
| vnd.oasis.opendocument.spreadsheet | application/vnd.oasis.opendocument.spreadsheet | [Svante_Schubert][OASIS] |
| vnd.oasis.opendocument.spreadsheet-template | application/vnd.oasis.opendocument.spreadsheet-template | [Svante_Schubert][OASIS] |
| vnd.oasis.opendocument.text | application/vnd.oasis.opendocument.text | [Svante_Schubert][OASIS] |
| vnd.oasis.opendocument.text-master | application/vnd.oasis.opendocument.text-master | [Svante_Schubert][OASIS] |
| vnd.oasis.opendocument.text-template | application/vnd.oasis.opendocument.text-template | [Svante_Schubert][OASIS] |
| vnd.oasis.opendocument.text-web | application/vnd.oasis.opendocument.text-web | [Svante_Schubert][OASIS] |

The first issue is that MediaWiki does not recognize the database type. The second issue is that the attached DB.odb file has a header that does not match the IANA registration: it simply uses base for the type: mimetypeapplication/vnd.oasis.opendocument.basePKCv$

Wikipedia states on https://en.wikipedia.org/wiki/OpenDocument_technical_specification#Documents :

| File Type | Extension | MIME Type | ODF Specification |
| --- | --- | --- | --- |
| Database | .odb | application/vnd.sun.xml.base | not defined in ODF 1.0/1.1 specifications; used in OpenOffice.org 2.x |
| Database | .odb | application/vnd.oasis.opendocument.base | ODF 1.2; used in OpenOffice.org 3.x |
| Database | .odb | application/vnd.oasis.opendocument.database | defined in IANA registration |

I have created a database with LibreOffice 5.2.7 and it uses mimetypeapplication/vnd.oasis.opendocument.base. That matches ODF 1.2 despite not being registered with the IANA.
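One way to handle both spellings, sketched under assumptions: accept database as well as base in the check, and report the IANA-registered name either way. This is not necessarily what the eventual patch does. normalizeOpenDocDatabase() is a hypothetical name, and mapping base to database is a choice made here so the result agrees with the entry in mime.types.

```php
<?php
// Hypothetical helper: given the bytes that follow the 30-byte ZIP local
// file header, return the database MIME type, or null if it is not one.
function normalizeOpenDocDatabase( string $afterHeader ): ?string {
	$regex = '/^mimetype(application\/vnd\.oasis\.opendocument\.(database|base))/';
	if ( !preg_match( $regex, $afterHeader, $m ) ) {
		return null;
	}
	// Assumption: prefer the IANA-registered type over the ODF 1.2 'base' spelling.
	return $m[2] === 'base'
		? 'application/vnd.oasis.opendocument.database'
		: $m[1];
}

// Content as seen in the attached DB.odb, right after the ZIP header.
echo normalizeOpenDocDatabase( 'mimetypeapplication/vnd.oasis.opendocument.basePK' ), "\n";
// The IANA spelling is accepted unchanged.
echo normalizeOpenDocDatabase( 'mimetypeapplication/vnd.oasis.opendocument.databasePK' ), "\n";
```

Both calls print application/vnd.oasis.opendocument.database. Any other package, for example a text document, yields null and would be left to the existing list.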

Change 481649 had a related patch set uploaded (by Hashar; owner: Hashar):
[mediawiki/core@master] Recognizes Open Document Database

https://gerrit.wikimedia.org/r/481649

greg lowered the priority of this task from Medium to Lowest (Apr 15 2019, 4:46 PM).
greg removed a subscriber: wikibugs-l-list.

Change 481649 had a related patch set uploaded (by Hashar; owner: Hashar):
[mediawiki/core@master] Recognizes Open Document Database

https://gerrit.wikimedia.org/r/481649

I have rebased the patch. Pending review / +2

hashar changed the task status from Open to Stalled (Jul 9 2019, 11:53 AM).

Change 481649 merged by jenkins-bot:
[mediawiki/core@master] Recognizes Open Document Database

https://gerrit.wikimedia.org/r/481649