
Can't upload PDF / ODF Hybrid file created by LibreOffice: "The file is a corrupt or otherwise unreadable ZIP file."
Open, Low, Public

Description

Author: fun-stuff

Description:
LibreOffice supports exporting PDFs in a hybrid ODF / PDF file format:

When trying to upload such a file MediaWiki reports that the ZIP file is ambiguous or has been damaged. (I have a German installation so I can't tell you the exact error message.)

I already took ZIP files out of the MediaWiki Blacklist and added the file extensions PDF, ODT and ZIP.

I think includes/ZIPDirectoryReader.php checks the file and throws the error because it doesn't know the new file format yet.

The new PDF / ODF hybrid format makes it easy to open documents for everyone while maintaining the possibility to edit them which might also be a great thing for Wikipedia. Therefore, this is a major bug for me.
Please fix this and thanks for the great software.

Tobias


Version: 1.18.x
Severity: normal

Details

Reference
bz28188

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 21 2014, 11:29 PM
bzimport set Reference to bz28188.
bzimport added a subscriber: Unknown Object (MLST).

Can you attach a sample file or provide a link to it?

fun-stuff wrote:

Test PDF/ODF (ODT) Hybrid Document

Can be edited with LibreOffice 3.3 Writer and viewed with any PDF viewer. But it cannot be uploaded in MediaWiki 1.18alpha.

Attached:

The cause is "ZipDirectoryReader: Fatal error: trailing bytes after the end of the file comment".

In simple words, we expect zip files to be... zip files and not contain something scary. We need to hack our detector to handle zips embedded in something known.
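To illustrate the error above, here is a minimal Python sketch (not MediaWiki's actual ZipDirectoryReader code, which is PHP) of how one might measure the bytes that follow the ZIP file comment. It assumes the last "end of central directory" signature in the file is the real one; the function and helper names are made up for this example.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"  # ZIP "end of central directory" record
EOCD_FIXED_SIZE = 22            # record size without the trailing comment


def trailing_bytes_after_eocd(data: bytes) -> int:
    """Return how many bytes follow the ZIP file comment, or -1 if no
    complete end-of-central-directory record is found."""
    pos = data.rfind(EOCD_SIGNATURE)
    if pos < 0 or pos + EOCD_FIXED_SIZE > len(data):
        return -1
    # The comment length is the last 2-byte little-endian field of the record.
    (comment_len,) = struct.unpack_from("<H", data, pos + 20)
    end_of_zip = pos + EOCD_FIXED_SIZE + comment_len
    if end_of_zip > len(data):
        return -1
    return len(data) - end_of_zip


def make_zip() -> bytes:
    """Build a tiny in-memory ZIP archive to experiment with (hypothetical helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("content.xml", "<doc/>")
    return buf.getvalue()
```

A plain archive reports zero trailing bytes. An archive followed by anything else, such as the closing part of a PDF, reports a positive count, which is the condition a strict reader would reject.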

fun-stuff wrote:

Thanks for the clarification and for taking care of the problem so quickly. I hope you can fix this bug in the near future.

From comments in triage:

"Workaround: 'don't save your PDF that way'. (Problem with workaround: if someone else made the file, you might not know how to re-save it.)"

So, we thought about dealing with it: "This presents same security threats as a PDF file.... need to check security model, probable threats."

"Our security checks are working as intended by detecting that the files have been smashed together unexpectedly. Might be possible to tweak it to consider 'oh that's ok' but not sure how much we want to. If not careful might accidentally allow all sorts of evil appended to a PDF file."

fun-stuff wrote:

Thanks for the comments.

I can imagine that deciding whether this is an 'OK' PDF file saved as hybrid ODF or not is difficult to code. However, I think it would be a great loss if this wasn't implemented as this format is so versatile.

dovijacobs wrote:

Hi, I asked about this problem here (and was referred to this bug):
http://commons.wikimedia.org/wiki/Commons:Village_pump#Uploading_embedded_PDFs_created_through_LibreOffice

The embedded PDF is an extremely useful file format, and one of the best features in the open source LibreOffice project. It is becoming extremely popular and is already being used in hundreds of millions of files around the world.

Therefore, I'd like to reiterate the comment before mine, which was made nearly two years ago: "I think it would be a great loss if this wasn't implemented as this format is so versatile."

If that was true two years ago, it is far more true today. I hope it can be made a basic part of PDF support in Wikimedia projects.

dovijacobs wrote:

In the meantime I've been uploading classic texts and educational materials at Internet Archive instead of at the Commons:
http://commons.wikimedia.org/wiki/Category:Talmud_(digital_text_vowelized_and_formatted)

This is extremely inconvenient for proper use at Wikimedia projects. I hope this will be taken care of eventually.

Aklapper lowered the priority of this task from Medium to Low. Mar 1 2017, 2:23 PM

Still an issue in 1.29:
Created a hybrid PDF with libreoffice-calc-5.2.6.1:

$:acko\> gvfs-info test.pdf 
attributes:
  standard::content-type: application/pdf

Opens fine in ghostscript-9.20-6.

Trying to upload that file via https://test2.wikipedia.org/wiki/Special:Upload I get "The file is a corrupt or otherwise unreadable ZIP file. It cannot be properly checked for security."
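For anyone wondering how a file can be reported as application/pdf and still hit the ZIP check, here is a small Python sketch of signature sniffing. It is only an illustration, assuming a hybrid file starts with the PDF header and also carries a ZIP end-of-central-directory signature somewhere inside; it is not how MediaWiki or gvfs-info decide, and the function name is made up.

```python
PDF_MAGIC = b"%PDF-"         # header at the start of a PDF file
ZIP_EOCD = b"PK\x05\x06"     # ZIP "end of central directory" signature


def sniff(data: bytes) -> set:
    """Return the set of formats whose signatures appear in the data."""
    kinds = set()
    if data.startswith(PDF_MAGIC):
        kinds.add("pdf")
    if ZIP_EOCD in data:
        kinds.add("zip")
    return kinds
```

A file for which this returns both "pdf" and "zip" looks like a PDF to a tool that only reads the header, while a tool that searches for ZIP structures will also find an archive in it.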

Aklapper renamed this task from Can't upload PDF / ODF Hybrid to Can't upload PDF / ODF Hybrid file created by LibreOffice: "The file is a corrupt or otherwise unreadable ZIP file.". Mar 1 2017, 2:24 PM