
Commons uploads: Support spreadsheet data format ODS
Open, Low, Public, Feature

Description

There are currently no supported spreadsheet formats on Commons. Spreadsheets are among the most universal file and data types in the world, and wikitables are both poor substitutes and only usable for data up to a few hundred rows, so this poses a problem.

The potential future existence of Wikidata does not solve this problem; it will simply be a place to store the cell-level data held in spreadsheets. We will then have even more reason to store the original datasets on Commons (or on Wikidata itself, if it becomes a shared cross-project namespace).

Support should at least include one of {CSV, ODB}. (See also bug T4089 for the full suite of Office formats.)


Version: 1.21.x
Severity: enhancement
See Also:
T4089: Whitelist OASIS OpenDocument file format

Details

Reference
bz43151

Event Timeline

bzimport raised the priority of this task to Low (Nov 22 2014, 12:48 AM).
bzimport set Reference to bz43151.
bzimport added a subscriber: Unknown Object (MLST).

Moving bug to the ContentHandler system.

Why ODB and not ODS?

(In reply to comment #1)

Moving bug to the ContentHandler system.

Really?

(In reply to comment #2)

(...)
(In reply to comment #1)

Moving bug to the ContentHandler system.

Really?

I wonder whether interest in allowing uploads would be strongly limited if we couldn't print or edit the spreadsheet.

(In reply to comment #3)

I wonder whether interest in allowing uploads would be strongly limited if we couldn't print or edit the spreadsheet.

ODS upload is already enabled on several or most private wikis because it's extremely useful. CSVs on wiki pages would require much more thought.

(In reply to comment #4)

(In reply to comment #3)

I wonder whether interest in allowing uploads would be strongly limited if we couldn't print or edit the spreadsheet.

ODS upload is already enabled on several or most private wikis because it's extremely useful. CSVs on wiki pages would require much more thought.

Much more? We are already working towards supporting syntaxes besides WikiText in articles, and text-based formats that we can represent well, such as CSV, fit nicely within that.

(In reply to comment #5)

Much more? We are already working towards supporting syntaxes besides WikiText in articles, and text-based formats that we can represent well, such as CSV, fit nicely within that.

It's still something *not* available currently.
Moving back to Uploading/File management: please open another bug for the on-wiki handling and display of CSV, thanks.

  1. Only one request per bug report, please. See https://www.mediawiki.org/wiki/How_to_report_a_bug . This report is unfixable as nobody can define when "spreadsheet data formats" are "supported" by MediaWiki.
  2. What is ODB? Links to definitions welcome.
  3. Why would anybody use something as error-prone as CSV (e.g. no support for declaring the charset)? If I were a developer I'd close any request to support CSV as WONTFIX. It's a can of worms.

(In reply to comment #7)

  1. Only one request per bug report, please. See https://www.mediawiki.org/wiki/How_to_report_a_bug . This report is unfixable as nobody can define when "spreadsheet data formats" are "supported" by MediaWiki.

  2. What is ODB? Links to definitions welcome.

https://en.wikipedia.org/wiki/OpenDocument

However I vaguely remember comments about security issues involved with permitting the upload of document formats such as this.

  3. Why would anybody use something as error-prone as CSV (e.g. no support for declaring the charset)? If I were a developer I'd close any request to support CSV as WONTFIX. It's a can of worms.

I do not see how lack of charset specification is a bug. Plaintext has the same issue. As do other formats. And where a format does support internal charset specification I usually would not think too highly of how it's done. One typically simply expects everything to be UTF-8 encoded.
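The "expect UTF-8" convention can be enforced at upload time instead of guessing a charset. A minimal sketch in Python, assuming a hypothetical `decode_csv_upload` helper (not part of MediaWiki) that rejects anything that is not valid UTF-8:

```python
import csv
import io


def decode_csv_upload(data: bytes) -> list[list[str]]:
    """Decode an uploaded CSV as strict UTF-8 and parse it into rows.

    Hypothetical helper: raises ValueError instead of guessing a charset.
    """
    # "utf-8-sig" strips a leading byte-order mark if present and is
    # otherwise a strict UTF-8 decode.
    try:
        text = data.decode("utf-8-sig")
    except UnicodeDecodeError as exc:
        raise ValueError(f"not valid UTF-8 at byte {exc.start}") from exc
    # newline="" lets the csv module handle newlines inside quoted fields.
    return list(csv.reader(io.StringIO(text, newline="")))
```

Rejecting rather than guessing keeps the rule simple for uploaders: re-save the file as UTF-8 and try again.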

(In reply to comment #0)

There are currently no supported spreadsheet formats on commons.

Supported how? This request is extremely ambiguous. Do you just want to upload them? Do you want to be able to thumbnail/display them in some manner? Do you want to be able to reference specific cells from wikitext? Do you want to be able to edit them in-wiki? Etc.

What do you want to do with the spreadsheets? What are your use cases?

However I vaguely remember comments about security issues involved with permitting the upload of document formats such as this.

They've been fixed (assuming you're thinking of the zip-java one).

Also, ODB is a database format, not a spreadsheet format. (Actually, there are many formats named ODB; I'm assuming you mean the OpenOffice one.)

This report is unclear enough (plus covers several issues) that I might close it as INVALID soon if it doesn't see improvements. Also see https://www.mediawiki.org/wiki/How_to_report_a_bug for how to report requests.

(In reply to comment #12)

This report is unclear enough (plus covers several issues) that I might close it as INVALID soon if it won't see improvements.

Made it a request for ODS format. I don't know how secure this is for Commons and if something else has to be fixed in MediaWiki itself.

(In reply to Nemo from comment #13)

(In reply to comment #12)

This report is unclear enough (plus covers several issues) that I might close it as INVALID soon if it won't see improvements.

Made it a request for ODS format. I don't know how secure this is for Commons and if something else has to be fixed in MediaWiki itself.

Can we get some consensus here? Provided you want upload but no display, that's probably OK now that the zip bug is fixed (I haven't looked recently, don't quote me on that), but it needs approval from the Commons community.

So next step to resolve (or reject) this bug: Ask commons.

These files are not in scope (https://commons.wikimedia.org/wiki/Commons:Project_scope) so please don't enable this unless you have very clear community consensus in favor of it.

(In reply to Maarten Dammers from comment #15)

These files are not in scope (https://commons.wikimedia.org/wiki/Commons:Project_scope) so please don't enable this unless you have very clear community consensus in favor of it.

That's not what https://commons.wikimedia.org/wiki/Commons:Project_scope/Allowable_file_types currently says. If you think it's outdated and you can edit it to reflect what's current consensus that would be great.

Bawolff: yes, upload but not display. Thanks for the next step.

Maarten, can you clarify what you mean by 'not in scope'?

I admit that I don't understand the current process for enabling file types for upload on Commons, though not for lack of trying. When is the last time that a new file type was added? The next step here does seem to be finding community consensus; along with fixing the process for finding that consensus.

(In reply to Dereckson from comment #1)

Moving bug to the ContentHandler system.

Related: https://gerrit.wikimedia.org/r/#/c/160610/

(In reply to SJ from comment #17)

The next step here does seem to be finding community consensus; along with fixing the process for finding that consensus.

There is already consensus. In addition to Commons:Scope, a supermajority of users agreed to enable OpenDocument.
https://commons.wikimedia.org/wiki/Commons:Village_pump/Proposals/Archive/2014/07#Support_for_OpenDocument_file_format_upload

I'm surprised to see that the closure claimed there was no consensus; anyway, it's enough to repropose it, focusing on ODP and ODT to start with.

Made it a request for ODS format. I don't know how secure this is for Commons and if something else has to be fixed in MediaWiki itself.

These are zip-based files, so there are the usual problems: similar-but-forbidden zip-based files being distributed, hidden files distributed inside zip files, and GIFAR.


Sure, but that part has been cleared at T4089#64021.
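Some of the zip-related concerns can be checked structurally. A minimal sketch in Python, assuming a hypothetical `looks_like_ods` helper; it is not MediaWiki's own upload verification code, and it does not scan for hidden or nested files. It relies on the OpenDocument packaging convention that the first entry should be an uncompressed `mimetype` file, which also distinguishes ODS from ODB:

```python
import io
import zipfile

ODS_MIMETYPE = b"application/vnd.oasis.opendocument.spreadsheet"


def looks_like_ods(data: bytes) -> bool:
    """Rough structural check for an ODS package (hypothetical helper).

    OpenDocument packages are ZIP archives whose first entry should be an
    uncompressed file named "mimetype" holding the media type.
    """
    # Require the file to *start* with a ZIP local file header; a GIFAR-style
    # file (image bytes followed by an archive) is rejected here.
    if not data.startswith(b"PK\x03\x04"):
        return False
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            entries = zf.infolist()
            if not entries or entries[0].filename != "mimetype":
                return False
            if entries[0].compress_type != zipfile.ZIP_STORED:
                return False
            # ODB (database) and ODT (text) packages carry other media types.
            return zf.read("mimetype") == ODS_MIMETYPE
    except zipfile.BadZipFile:
        return False
```

A check like this only confirms the container looks like a spreadsheet package; it says nothing about what else the archive contains.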

Really needed. So much stuff that I’d love to put up on Commons in a good format!

Eight years after this Phabricator ticket was opened, I'd also like this a lot.

(Hi, please avoid "me too" comments as they create notifications for everyone. Thanks!)

I'm going to re-up some conversation around this by observing a few things that have changed quite a bit since 2012. Now that Wikidata is the center of gravity for a lot of what we do, the community is working with more data sets than ever, whether for OpenRefine or GLAM work. We have tangible needs for sharing data sets within the movement and collaborating on them with advanced tools. This argues against settling for GitHub or other external sites, and for actually allowing the storage of (reasonable) data sets in formats recognized as best practice.

For reference, the US Library of Congress has pointed to four "formats" for long-term preservation - XML, JSON, CSV and SQLite. Link: https://www.loc.gov/preservation/resources/rfs/data.html

As for whether "data" is in scope for Commons - the fact that we even have a "Data:" namespace in Commons means that we have a precedent in moving into this area.

I know that not all CSVs are created equal, and it's a very nonstandard landscape. But if the very active world of ML/data science can deal with nonstandard CSVs all the time, I don't think this should stop us from considering it.
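As a rough illustration of how data tooling copes with nonstandard CSV, Python's standard `csv.Sniffer` can guess a file's delimiter before parsing. A minimal sketch, assuming a hypothetical `parse_unknown_dialect` helper and a fixed set of candidate delimiters; sniffing is heuristic and can guess wrong:

```python
import csv
import io


def parse_unknown_dialect(text: str, sample_size: int = 4096) -> list[list[str]]:
    """Guess the delimiter/quoting of CSV-like text, then parse it.

    Hypothetical helper; raises csv.Error if no consistent dialect is found.
    """
    # Only inspect the start of the file, as typical sniffing code does.
    sample = text[:sample_size]
    # Restrict candidates to common delimiters: comma, semicolon, tab, pipe.
    dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
    return list(csv.reader(io.StringIO(text, newline=""), dialect))
```

Failing loudly when no dialect can be inferred is one option; an upload pipeline could instead ask the uploader to confirm the delimiter.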

@Aklapper :D Please consider this a "me one".

Having CSV + JSON + XML + SQLite support is increasingly important, and we should have ODT + ODS for user-facing rather than archival reasons. The initial concerns have all been met, and the positive impact has grown over time; I'd like to see the priority of this bug raised.

Let's keep this bug focused on ODS, others can be created to address the rest.

Aklapper changed the subtype of this task from "Task" to "Feature Request" (Feb 4 2022, 11:13 AM).