
Whitelist OASIS OpenDocument file format
Open, Medium, Public, Feature

Description

Author: leercontainer-bugzilla

Description:
Currently (as far as I'm aware) you can upload OpenOffice.org 1.x files, at
least with the extension ".sxw". OpenOffice.org 2.x uses the new OASIS file
format (see link).

The file upload whitelist should be extended to also include at least ".odt"
(writer), and possibly also ".odp", ".odg", ".odb".

OpenOffice documents are useful for providing presentations and promotional material.


Version: unspecified
Severity: enhancement
URL: https://commons.wikimedia.org/wiki/Commons_talk:File_types/Archive_1
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=43151
https://bugzilla.wikimedia.org/show_bug.cgi?id=40504
https://bugzilla.wikimedia.org/show_bug.cgi?id=71954

Details

Reference
bz2089

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 21 2014, 8:28 PM
bzimport set Reference to bz2089.
bzimport added a subscriber: Unknown Object (MLST).

jeluf wrote:

The trouble with file formats is MSIE. It tries to autodetect the format of
a file that it downloads. If the file *looks* like HTML, MSIE will display it as HTML,
executing any JavaScript that is in the file.

To add a file format, there must be a way to check that the file really is using
that format.
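For ODF, one way to "check that the file really is using that format" is to look at the raw bytes at offset 0. The Python sketch below assumes uploads follow the ODF packaging rules: the file is a ZIP, and its first member is an uncompressed file named mimetype that holds the document's MIME type. The name looks_like_odf is hypothetical and is not a MediaWiki API.

```python
import struct

# Hypothetical helper, assuming a conforming ODF package: a ZIP local file
# header at offset 0 whose entry is an uncompressed "mimetype" file.
ODF_MIME_PREFIX = b"application/vnd.oasis.opendocument."


def looks_like_odf(data: bytes) -> bool:
    # Every ZIP local file header starts with the signature PK\x03\x04.
    if len(data) < 30 or data[:4] != b"PK\x03\x04":
        return False
    # Fixed-size little-endian fields of the local file header.
    (compression,) = struct.unpack_from("<H", data, 8)
    (compressed_size,) = struct.unpack_from("<I", data, 18)
    name_len, extra_len = struct.unpack_from("<HH", data, 26)
    name = data[30:30 + name_len]
    # The first member must be "mimetype", stored (method 0) rather than deflated.
    if name != b"mimetype" or compression != 0:
        return False
    start = 30 + name_len + extra_len
    mimetype = data[start:start + compressed_size]
    return mimetype.startswith(ODF_MIME_PREFIX)
```

Because the check reads offset 0 directly, it rejects a file that has, for example, a PDF or HTML prefix in front of the ZIP data. It says nothing about the rest of the package.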

iztok.jeras wrote:

I am using OpenOffice.org to create figures on Wikibooks and there are many
figures for a single book. I would like to upload .odg source files for those
images, so that contributors could modify them.

About the MSIE problem: OpenDocument files are compressed (they do not look like
HTML), so just unzip them, and if there are no errors the file should be OK. If you
are still concerned about the archive content, it can be scanned for viruses...
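The "just unzip them" test can be sketched with Python's standard zipfile module. The name unzips_cleanly is hypothetical. The function only confirms that every member decompresses with a matching CRC-32. That is a weaker property than being a well-formed, or harmless, OpenDocument file.

```python
import io
import zipfile
import zlib


def unzips_cleanly(data: bytes) -> bool:
    """True if data is a ZIP archive whose members all decompress with
    matching CRC-32 values. Says nothing about what the members contain."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            # testzip() returns the name of the first bad member, or None.
            return zf.testzip() is None
    except (zipfile.BadZipFile, zlib.error, NotImplementedError, EOFError):
        return False
```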

It would be great to enable the OASIS file formats, at least for Commons. Storing all
kinds of documents that can be updated later would be a great benefit.

I would suggest *not* allowing *any* styled text or presentation files on
Commons. Text on WMF projects should generally be wikitext. That being said,
having presentations and promotional material in those formats may make sense
for meta, wikimediafoundation.org, wikimania sites, etc.

just my 2p

*** Bug 9127 has been marked as a duplicate of this bug. ***

christof.hahn wrote:

I'm an author from German Wikibooks. Over the last two months I have started a project
for schoolbooks. The intent of this project is to develop materials for
teachers and pupils. So what we need is support for the ODF format and a
place where we can upload raw material in any format, where teachers can
donate their existing learning materials. On the other hand, we also need the
possibility to upload 7-Zip archives to bundle learning material.

jeroenvrp wrote:

I think it's a good idea to disallow it on Commons, but to enable those file formats by default and let the individual projects (e.g. the Wikipedias) decide for themselves whether they want to allow those formats.

Also don't forget the .ods-files (spreadsheets).

Jeroenvrp

mail wrote:

In includes/mime.types the line

application/zip zip jar xpi sxc stc sxd std sxi sti sxm stm sxw stw

must be changed to

application/zip zip jar xpi sxc stc sxd std sxi sti sxm stm sxw stw odt ods odp odg odf

This really should be the default MediaWiki configuration. Not being able to upload the only standardised Office file format to the most common Wiki software is kind of strange...
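The small Python parser below shows what the proposed mime.types line does. It reads the "MIME type followed by extensions" layout shown above. The function parse_mime_types is illustrative and is not MediaWiki's own code. Its '#' comment handling is an assumption borrowed from the common mime.types convention. With the proposed line, each ODF extension maps to the generic application/zip. The REGISTERED table lists the specific IANA-registered OpenDocument types for comparison.

```python
from typing import Dict


def parse_mime_types(text: str) -> Dict[str, str]:
    """Map each extension to the MIME type on its line ("type ext ext ...")."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not line:
            continue
        mime, *extensions = line.split()
        for ext in extensions:
            # Keep the first mapping if an extension appears on several lines.
            mapping.setdefault(ext.lower(), mime)
    return mapping


# IANA-registered OpenDocument types, for comparison with application/zip.
REGISTERED = {
    "odt": "application/vnd.oasis.opendocument.text",
    "ods": "application/vnd.oasis.opendocument.spreadsheet",
    "odp": "application/vnd.oasis.opendocument.presentation",
    "odg": "application/vnd.oasis.opendocument.graphics",
    "odf": "application/vnd.oasis.opendocument.formula",
}
```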

Just a note -- the old StarOffice formats were disabled some time ago. OpenOffice (ODF) formats are enabled on our private/internal wikis, but not on the general wikis or in the MediaWiki default configuration.

An additional note is we have no current way to validate uploaded files as being ODF.

rmh wrote:

(In reply to comment #9)

Just a note -- the old StarOffice formats were disabled some time ago.
OpenOffice (ODF) formats are [...]

Note that the division is not really StarOffice/OpenOffice. Both used the old formats before, and both use OpenDocument now (along with a lot of other apps, since that was the point of standardising it).

An additional note is we have no current way to validate uploaded files as
being ODF.

As long as the check is filename-based, this problem isn't introduced by adding "odt ods odp odg odf" to the list, since you can pass any kind of ZIP file as *.zip already.

(In reply to comment #9)

An additional note is we have no current way to validate uploaded files as
being ODF.

That's not a reason for not enabling ODF; we don't really validate many types (bug 10823).
<spam>I am independently validating Commons uploads at #commons-image-uploads2 (it isn't
so hard)</spam>, and if ODF were allowed (at big projects) it wouldn't be that hard either,
just like PDFs: you need to manually review all of them and delete almost all of them.

Currently allowed formats on Commons are:

png, gif, jpg, jpeg, xcf, pdf, mid, ogg, svg, djvu

I'm fairly certain we do at least magic-number signature validation on all of those now. PNG, GIF, and JPEG are run through a simple header sanity check. SVG is checked for XML well-formedness. DjVu is, I believe, checked for metadata validity, though I don't recall the details.
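The Python sketch below illustrates a magic-number check of the kind described. The signatures are the well-known leading bytes of each format. The table and matches_magic are hypothetical and are not MediaWiki's implementation. A strict prefix match at offset 0 is an assumption. SVG is left out because it is plain XML text and is checked differently.

```python
from typing import Dict, List

# Well-known leading bytes ("magic numbers") of the Commons formats above.
# SVG is omitted: it is XML text and is checked for well-formedness instead.
MAGIC: Dict[str, List[bytes]] = {
    "png": [b"\x89PNG\r\n\x1a\n"],
    "gif": [b"GIF87a", b"GIF89a"],
    "jpg": [b"\xff\xd8\xff"],
    "jpeg": [b"\xff\xd8\xff"],
    "pdf": [b"%PDF-"],
    "djvu": [b"AT&TFORM"],
    "ogg": [b"OggS"],
    "mid": [b"MThd"],
    "xcf": [b"gimp xcf "],
}


def matches_magic(extension: str, data: bytes) -> bool:
    signatures = MAGIC.get(extension.lower())
    if signatures is None:
        return False  # unknown extension: reject rather than guess
    return any(data.startswith(sig) for sig in signatures)
```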

reschke.michael wrote:

Well, at the German Wikiversity we would need OASIS files to upload editable documents and presentations. OASIS files on Commons would make our work much easier.

rmh wrote:

You can use odt2txt (http://stosberg.net/odt2txt/) for validation:

$ odt2txt hello.odt

Hello

$ echo $?
0
$ zip test.zip hello.odt
updating: hello.odt (deflated 19%)
$ odt2txt test.zip
Can't read from test.zip: Is it an OpenDocument Text?
$ echo $?
1

It appears to work with the other types as well:

$ odt2txt hello.odp

Hello

This is a text

HTH
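Below is a rough Python analogue of the odt2txt check above, using only the standard library. It opens the package, parses content.xml and collects the text of paragraph and heading elements. The name odf_paragraphs is hypothetical. Like odt2txt, it fails on a ZIP that merely contains an .odt. It also shares odt2txt's limitation of looking only at the text. For untrusted uploads, a hardened XML parser such as defusedxml would be a safer choice than xml.etree.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import List

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
PARAGRAPH_TAGS = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}


def odf_paragraphs(data: bytes) -> List[str]:
    """Return the text of each <text:p>/<text:h> in content.xml.
    Raises ValueError if data is not a readable ODF package."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            root = ET.fromstring(zf.read("content.xml"))
    except (zipfile.BadZipFile, KeyError, ET.ParseError) as exc:
        raise ValueError(f"not an OpenDocument file: {exc}") from exc
    return ["".join(el.itertext()) for el in root.iter() if el.tag in PARAGRAPH_TAGS]
```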

michael.frey wrote:

(In reply to comment #15)

You can use odt2txt (http://stosberg.net/odt2txt/) for validation:

That only verifies that there is text; it doesn't warn about macros or other files that are included in the .odt file but unrelated to it.

(Otherwise someone could have the bright idea of uploading an .odt file that contains a macro virus, or pictures with forbidden content, and using the WMF servers to share them. Users who know about the hidden content can simply rename and extract the file to get at it; other users don't see it and think it's a normal text, but they also download the pictures with forbidden content.)
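One way to approach the concern about unrelated files is to compare the ZIP directory with the package manifest. The minimal Python sketch below assumes two things. First, that a conforming package lists every file except mimetype and the META-INF/ contents in META-INF/manifest.xml. Second, that macros live under Basic/ or Scripts/, which is where OpenOffice.org-derived suites are understood to store them. The helper suspicious_entries is hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import List

MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
# Directories where office suites are understood to keep macros (assumption).
MACRO_DIRS = ("Basic/", "Scripts/")


def suspicious_entries(data: bytes) -> List[str]:
    """List package members that the manifest does not declare, plus any
    member under a macro directory. Raises KeyError if there is no manifest."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        members = [n for n in zf.namelist() if not n.endswith("/")]
        manifest = ET.fromstring(zf.read("META-INF/manifest.xml"))
    declared = {
        entry.get(f"{{{MANIFEST_NS}}}full-path")
        for entry in manifest.iter(f"{{{MANIFEST_NS}}}file-entry")
    }
    flagged = []
    for name in members:
        # mimetype and META-INF/ contents are not listed in the manifest.
        if name == "mimetype" or name.startswith("META-INF/"):
            continue
        if name not in declared or name.startswith(MACRO_DIRS):
            flagged.append(name)
    return flagged
```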

rmh wrote:

(In reply to comment #16)

Or someone could use a program featuring steganography techniques (http://en.wikipedia.org/wiki/Steganography#Implementations) to embed forbidden content in a PNG.

As for the macro virus, proper sandboxing is expected to be present. If it isn't, that's an implementation bug.

ingo.thies wrote:

(In reply to comment #4)

I would suggest to *not* allow *any* styled text or presenation files on
commons. text on wmf projects should generally be wikitext. That being said,
having presentations and promotional material in those formats may make sense
for meta, wikimediafoundation.org, wikimania sites, etc.

Please keep in mind that OpenOffice.org file types also include spreadsheets (*.ods) that can be used not only for presentation but also as an interactive calculation tool. The author defines a "user area" within a sheet where the user can enter the parameters on which calculations on a scientific topic are based. For example, you can write a sheet that calculates, tabulates and/or plots the pressure, temperature and density of the standard atmosphere at a user-defined altitude, or the orbital parameters of satellites, or any other kind of scientific or technical stuff. In contrast to most other file types (and, as far as I know, all file types currently allowed on Commons), spreadsheets can be used *interactively*, which could be a great improvement for many science-related Wikipedia articles. Furthermore, I do not really see a reason for *not* allowing any styled content. The existence of wikitext IMHO does not strictly imply that all other text formats are invalid. Please also remember that ODF is now an ISO standard.

Therefore I would strongly suggest allowing the OpenDocument Format in general, or at least OpenDocument Spreadsheets (*.ods).

rmh wrote:

(In reply to comment #18)

Please keep in mind that OpenOffice.org file types [...]

Please, try to avoid confusing ODF with OpenOffice.org. There are many applications supporting ODF independently, and OpenOffice.org is just one of them (see http://boycottnovell.com/2008/01/20/odf-is-not-openoffice-org/).

[...]. Please also
remember that ODF is now an ISO standard.

which unfortunately doesn't mean much anymore. Even OOXML, which not even Microsoft themselves (http://www.fanaticattack.com/2008/ooxml-questions-microsoft-cannot-answer-in-geneva.html#comment-220) have implemented, can get its own ISO stamp.

IMHO, what's important is that any vendor can implement ODF, and the wide availability of ODF support in applications:

http://en.wikipedia.org/wiki/OpenDocument_software#Current_support

ingo.thies wrote:

(In reply to comment #19)

Please, try to avoid confusing ODF with OpenOffice.org. There are many
applications supporting ODF independently, and OpenOffice.org is just one of
them (see http://boycottnovell.com/2008/01/20/odf-is-not-openoffice-org/).

You are right, I sometimes mix them up, because I am using ODF mainly via OpenOffice.org.

IMHO, what's important is that any vendor can implement ODF, and the wide
availability of ODF support in applications:

http://en.wikipedia.org/wiki/OpenDocument_software#Current_support

That's fully true. But as mentioned above, the major benefit for Wikipedia (where formatted content seems to be frowned upon unless the format is wikitext, wikitables, etc.) would be the ability to use at least OpenDocument Spreadsheets (ODS) interactively. Allowing the upload of self-written source code in common programming languages would also allow interactivity, but ODS allows it in a very transparent and easy-to-use way. The following example (a zipped Excel spreadsheet, however) might explain what I mean:

http://nuclearweaponarchive.org/Library/Nukexls.zip

Such sheets, also including graphs, could be used for interactive illustrations of (not only) many physical and technical topics without forcing users to type in the formulas themselves.

cormaggio wrote:

As mentioned above, having greater possibility for interactivity in files would greatly benefit Wikiversity, particularly for presentations, but also for image files, spreadsheets (data), and others. As for opposition to this proposal: are there fears around certain formats on certain sites? If so, perhaps projects could draw up a list of file types which would be useful, and provide a rationale for them to be (selectively) whitelisted.

robert wrote:

(In reply to comment #21)

As mentioned above, having greater possibility for interactivity in files would
greatly benefit Wikiversity. Particularly for presentations, but also for image
files, spreadsheets (data), and others. On opposition to this proposal, are
there fears around certain formats on certain sites? If so, perhaps projects
could draw up a list of filetypes which would be useful, and provide a
rationale for them to be (selectively) whitelisted.

The main issue is that OASIS files can contain malicious content. Letting these be uploaded without validation would be undesirable, and as yet (AFAIK) there is no OASIS validation interface for MediaWiki.

Can you elaborate what malicious content do you refer? Zip files being uploaded as odf? Documents with embedded macros?

robert wrote:

(In reply to comment #23)

Can you elaborate what malicious content do you refer? Zip files being uploaded
as odf? Documents with embedded macros?

Macros are the main issue. They are XML, so running the file through a basic XML parser would eliminate any ZIP issue.

robert wrote:

(In reply to comment #24)

Macros are the main issue, they are XML so running it through a basic XML
parser would eliminate any Zip issue.

Ignore the zip bit, they can be compressed -- as I have just found out.

It seems macros live in <script> elements (<text:script>, <office:script>, ...), so this doesn't look too hard.
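The idea above can be sketched as an XML scan. The Python function below parses a package stream such as content.xml or styles.xml and reports any script elements. The namespace URIs are the standard ODF office and text namespaces. The tag set is an assumption: the two elements named above plus the office:scripts container. Macros bound to events through event listeners would need further checks.

```python
import xml.etree.ElementTree as ET
from typing import List

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
# <office:script> and <text:script> as named above, plus the <office:scripts>
# container; an assumed, not exhaustive, list of macro-bearing elements.
SCRIPT_TAGS = {
    f"{{{OFFICE_NS}}}script",
    f"{{{OFFICE_NS}}}scripts",
    f"{{{TEXT_NS}}}script",
}


def script_elements(xml_bytes: bytes) -> List[str]:
    """Return the qualified tags of script-related elements, in document order."""
    root = ET.fromstring(xml_bytes)
    return [el.tag for el in root.iter() if el.tag in SCRIPT_TAGS]
```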

mandavi wrote:

Sun published the ODF Validator. It "is a tool that validates OpenDocument files and checks them for certain conformance criteria." That sounds like the tool we need.

rmh wrote:

Unfortunately, with ISO's downfall in the IT sector, being an ISO standard has become less and less meaningful. I'm removing the "ISO" bits from the bug title (which IIRC I added myself a while ago).

lars wrote:

In a posting to wikitech-l, Brion Vibber elaborated on what's needed in an ODF validator,
http://lists.wikimedia.org/pipermail/wikitech-l/2008-November/040246.html

Brion said:

we have a basic file type check to confirm
that the file really thinks it's an ODF of the appropriate extension, but
not yet checks to confirm there's not evil Java classes also sitting in
the ZIP etc.) [...]

There's an optional zip extension for PHP which should include support
for listing out the ZIP file directory; however since this isn't
included in PHP by default it might be nice to be able to read the
directory independently without the extension for general MediaWiki
installs. (It shouldn't be necessary to actually decompress anything for
our purposes here -- we're mainly looking for subfiles not expected in
an ODF, particularly Java classes that could be used for a session attack.)

Yes, please.

Asking people to use a secondary file-hosting system for materials they are uploading for use with Wikiversity or Wikibooks projects is embarrassing, and it gets more so every year.

People who are trying to use Commons (for classes or other collaborative-knowledge projects) commonly work with these standard formats; asking them to convert things to/from PDF is quite difficult considering the scarcity of freely-licensed PDF-editing tools.

This bug may have been fixed in the meantime (in particular, I had the impression that Tim did work on allowing ZIP-based formats to be uploaded safely). Tagging testme. [Note comment 31: fixing this bug and enabling it on Wikimedia are two different things.]

Then again bug 35607 appears to suggest our support for open doc is broken (?).

*** Bug 46977 has been marked as a duplicate of this bug. ***

Thanks for including my request for enabling ODF upload for German Wikiversity.

Could you please indicate whether we are running any chance to have ODF upload implemented in the near future?

I would like to hand on the message to the German Wikiversity community ASAP.

Thx!

(In reply to comment #34)

Could you please indicate whether we are running any chance to have ODF upload
implemented in the near future?

No. (Although I'd like to say the contrary.)

See the links from http://lists.wikimedia.org/pipermail/wikitech-l/2012-April/059837.html

And the discussion at https://commons.wikimedia.org/wiki/Commons:Village_pump/Archive/2012/03#Enabling_upload_of_ZIP_types.2C_such_as_MS_Office_or_OpenOffice

It was stated that, with the resolution of bug 24230, « Uploads of ZIP types, such as MS Office or OpenOffice can now be safely enabled. A ZIP file reader was added which can scan a ZIP file for potentially dangerous Java applets. This allows applets to be blocked specifically, rather than all ZIP files being blocked. »

I have asked the question in several places and answers are both unclear and sometimes contradictory. Some have pointed out that concerns lie still with:

  • Potential embedded macros
  • Validation that it is actually ODF

Are these concerns valid? If not, what is missing to allow ODF upload on projects?

If nobody comes up with concrete concerns, would it be a valid proposal to just try and see how it goes, fixing problems as they come up?

PDF files are not exempt from problems either: we often get some with viruses, but most of them are deleted quickly, and we find the others thanks to your.org running antivirus software on their copy.

(In reply to comment #36)

See the links from http://lists.wikimedia.org/pipermail/wikitech-l/2012-April/059837.html

And the discussion at https://commons.wikimedia.org/wiki/Commons:Village_pump/Archive/2012/03#Enabling_upload_of_ZIP_types.2C_such_as_MS_Office_or_OpenOffice

It was stated that, with the resolution of bug 24230, « Uploads of ZIP types,
such as MS Office or OpenOffice can now be safely enabled. A ZIP file reader
was added which can scan a ZIP file for potentially dangerous Java applets.
This allows applets to be blocked specifically, rather than all ZIP files
being blocked. »

I have asked the question in several places and answers are both unclear and
sometimes contradictory. Some have pointed out that concerns lie still with:

  • Potential embedded macros
  • Validation that it is actually ODF

Are these concerns valid? If not, what is missing to allow ODF upload on
projects?

The ZIP reader prevents someone from uploading an ODF file that's really a Java archive, which was a pretty big security vulnerability. (It would also prevent those hacks where people make combined ODF/PDF files.)

It does not prevent embedded macros, nor does it validate that the file is an ODF file (beyond some very superficial checks). It would prevent someone from accidentally uploading another format; it would not prevent someone from intentionally uploading a non-ODF format that they've tweaked to look slightly like an ODF file.

Whether or not this is an acceptable situation (I consider the macro virus possibility a little scary; Platonides' suggestion in comment 26 may be something we should look into) is probably a matter for debate. I've cc'd Chris Steipp, as he probably has some thoughts on this and would probably have the final word on whether ODF upload is acceptable.

The major threats I'm most concerned with are these attachments opening up, and XSS caused by the browser thinking the file is HTML, a Java applet, SWF, etc.

So if it correctly unzips to something that validates as an ODF, and the binary is checked to make sure sniffing won't think it's HTML or another MIME type, then we can probably enable this. Bawolff, could you confirm that's what it does?

The macro / embedded virus threat is definitely a danger to our users, but we currently do not scan incoming binaries (as Nemo pointed out, we have plenty of pdfs with hostile code already).
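The "sniffing won't think it's HTML" part could be approximated by scanning the start of the raw file for HTML-like tags. This matters for ODF because stored ZIP members and member names appear uncompressed near the start of the file. In this Python sketch, the 1024-byte window, the tag list and the name might_sniff_as_html are assumptions chosen for illustration. They are not the exact rules that any browser or MediaWiki uses.

```python
import re

# Assumed sniffing window and a representative (not exhaustive) tag list.
SNIFF_WINDOW = 1024
HTML_LIKE = re.compile(
    rb"<\s*(?:!doctype|html|head|body|script|iframe|title|img|a\s+href|pre|table)",
    re.IGNORECASE,
)


def might_sniff_as_html(data: bytes) -> bool:
    """True if the first SNIFF_WINDOW bytes contain an HTML-like tag."""
    return HTML_LIKE.search(data[:SNIFF_WINDOW]) is not None
```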

(In reply to comment #39)

The major threats I'm most concerned with are these attachments opening up and
xss by causing the browser to think it's html, java applet, swf, etc.

So if it correctly unzips to something that validates as an odf, and the binary
is checked to make sure sniffing wont think it's html, or another mime type,
then we can probably enable this. Bawolff, could you confirm that's what it
does?

I believe that is correct.

OSM added odp a minute ago: https://trac.openstreetmap.org/ticket/3323#comment:2
So, there are no blockers left here AFAICS, but we also have a guinea pig.

So, per Chris's and Brian's comments, there are no security concerns blocking this?

If so, the "editorial" question is left on whether we want this on Wikimedia Commons or on Meta. I am more than willing to open a discussion over there, if that’s the last thing missing.

At a MediaWiki community level this looks like a consensus, yes. This report is at a point where no technical/legal obstacles are left.

Now, about deployment in Wikimedia... This is a Commons discussion. There is the right place to decide whether OpenDocument files belong to their domain (just like PDFs) or not. If they agree, then the related file extension can be enabled there. If they disagree... we can meet here again to discuss the next step.

Does this make sense? If so, can someone familiar with the Commons project share the news there, please?

(In reply to comment #42)

I am more than willing to open a discussion over there, if
that’s the last thing missing.

Jean-Fred, per Quim it is, so please go ahead, yes.

(In reply to comment #45)

(In reply to comment #42)

I am more than willing to open a discussion over there, if
that’s the last thing missing.

Jean-Fred, per Quim it is, so please go ahead, yes.

Good. I’ll open a community consultation soonish.

What's concretely the configuration setting needed here?

(In reply to Jean-Fred from comment #46)

Good. I’ll open a community consultation soonish.

Ping.

(In reply to dacuetu from comment #47)

What was the result of this?

Consensus for OpenDocument has always been taken for granted: in the innumerable discussions about it I don't recall ever finding an opposer. There are, for instance, a dozen supporters just in https://commons.wikimedia.org/wiki/Commons_talk:File_types/Archive_1 (several sections; nobody bothered to +1 what was obvious).

Also, https://commons.wikimedia.org/wiki/Commons:Project_scope/Allowable_file_types is a policy, and it states that the formats are allowed by policy and only blocked for technical reasons: «SXW, SWC, SXD, and SXI (OpenOffice.org 1.x), as well as ODT, ODS, ODG, and ODP (OpenDocument) are theoretically permissible.» Marking this as shell; more discussion is always possible but not necessary.

(In reply to Nemo from comment #48)

What's concretely the configuration setting needed here?

(In reply to Jean-Fred from comment #46)

Good. I’ll open a community consultation soonish.

Ping.

Thanks for the ping, I completely forgot about this. This is now opened at https://commons.wikimedia.org/wiki/Commons:Village_pump/Proposals#Support_for_OpenDocument_file_format_upload

(In reply to dacuetu from comment #47)

What was the result of this?

Consensus for OpenDocument has always been given for granted: in the
innumerable discussions about it I don't recall ever finding an opposer.
There are for instance a dozen supporters just in
https://commons.wikimedia.org/wiki/Commons_talk:File_types/Archive_1
(several sections; nobody bothered +1 what was obvious).

Hmmmm, indeed. Well, let's make it super clear :)

So the result was 'interested, but no consensus', due to the need for a media preview of this format and concerns that these uploaded documents may contain:

  • macros/scripts, which may be malicious
  • embedded typefaces, which may be non-free

https://commons.wikimedia.org/wiki/Commons:Village_pump/Proposals/Archive/2014/07#Support_for_OpenDocument_file_format_upload

In that discussion, PDF with OpenDocument embedded was raised as a bug and possible way forward, as we already have PDF preview support, so I have created bug 71954 for that.

We will also need bugs for detecting macros/scripts and embedded typefaces, and bug 17497 probably needs to be solved first.
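Detecting embedded typefaces could start with a package listing. The minimal Python sketch below assumes that embedded fonts appear as package members with font-file extensions, or under a Fonts/ directory, which is where LibreOffice is understood to put them. The name embedded_fonts and the extension list are hypothetical. Deciding whether a detected font is free would still need human review or font metadata.

```python
import io
import zipfile
from typing import List

# Illustrative font-file extensions (assumption, not exhaustive).
FONT_EXTENSIONS = (".ttf", ".otf", ".ttc", ".woff", ".woff2", ".eot", ".pfb")


def embedded_fonts(data: bytes) -> List[str]:
    """Package members that look like embedded font files."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [
            name for name in zf.namelist()
            if not name.endswith("/")
            and (name.startswith("Fonts/") or name.lower().endswith(FONT_EXTENSIONS))
        ]
```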

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 12:24 PM