add CSV support to (most) special pages
Closed, ResolvedPublic

Description

I propose to add a CSV mode to special pages that display mainly a list. This
would greatly help bots and scripts with parsing, and would ease the server
load created by such scripts.

This patch implements CSV support in the QueryPage class, and adds CSV support
to (currently) 23 special pages without any extra effort.

patch will follow in a minute


Version: 1.6.x
Severity: enhancement

Details

Reference
bz3676

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 8:50 PM
bzimport set Reference to bz3676.
bzimport added a subscriber: Unknown Object (MLST).

Created attachment 974
patch for QueryPage.php

attachment csvQuery.diff ignored as obsolete

Removing patch keyword; patch fails security
review.

Exploit proof of concept:

  1. Set up MediaWiki on a server allowing PATHINFO
     expansion or with a rewrite rule or alias for wiki
     pages, plus this patch.

  2. Upload an image, with the description field
     filled out with this fragment:
     <script>alert(document.cookie)</script>

  3. In MSIE 6 on Windows, visit
     Special:Unusedimages/evil.html?csv=yes

The extended path fragment ending in ".html" is
interpreted by MSIE as an override of the unknown
Content-Type; the HTML fragment in output is then
interpreted and the JavaScript executed, leaving
the site wide open to cross-site scripting attack.

At a minimum, this needs to perform the kind of
security checks that action=raw does, checking
that a canonical URL is being used.
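
To make the idea of a canonical-URL check concrete, here is a minimal sketch in Python (the patch itself is PHP). It is not what action=raw actually does; the script path `/w/index.php` and the function name `is_canonical_request` are assumptions made up for illustration. The sketch only accepts requests whose path is exactly the script path, so nothing like "/evil.html" can be appended for a browser to sniff.

```python
from urllib.parse import urlsplit

# Hypothetical configuration: the single script path treated as canonical.
CANONICAL_SCRIPT = "/w/index.php"


def is_canonical_request(request_uri, script=CANONICAL_SCRIPT):
    """Return True only if the request path is exactly the script path.

    Extra path segments after the script (PATH_INFO such as "/evil.html")
    and rewritten "pretty" URLs are both rejected, so the URL path can
    never end in an extension chosen by an attacker.
    """
    path = urlsplit(request_uri).path
    return path == script


# The exploit URL from the proof of concept is refused in both forms.
ok = is_canonical_request("/w/index.php?title=Special:Unusedimages&csv=yes")
via_pathinfo = is_canonical_request("/w/index.php/Special:Unusedimages/evil.html?csv=yes")
via_rewrite = is_canonical_request("/wiki/Special:Unusedimages/evil.html?csv=yes")
```

A real check would probably refuse to serve the raw output (or fall back to normal HTML rendering) whenever this returns False.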

There may be additional information disclosure
issues if there is data in the internal rows being
passed around that shouldn't make it outside, but
I haven't checked for this yet.

Created attachment 977
improved patch, see comment

I improved the patch in a few ways:

  • All fields are now URL-encoded. This has two advantages:
    • it makes the CSV output immune to scripting attacks.
    • it allows lines to be split at the separator char reliably.
  • querycache is now used if applicable (untested)
  • The MIME type is now text/plain instead of text/csv. While text/csv is the
    proposed standard, it is not widely supported yet. text/plain is shown directly
    by web browsers, as opposed to asking for a download or an external app.
  • Subclasses can now specify which columns to include in CSV output by
    overriding the csvFields() function. This may be used to address any unwanted
    exposure of internal data.
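
As a rough illustration of the URL-encoding point, here is a sketch in Python of how a bot might produce and consume such lines. The comma separator and the helper names `encode_row` and `decode_row` are assumptions; the thread does not say which separator the patch uses.

```python
from urllib.parse import quote, unquote

SEPARATOR = ","  # assumed; the patch's actual separator is not stated here


def encode_row(fields):
    """Percent-encode every field, then join the fields with the separator."""
    # safe="" means "/" is encoded too; only letters, digits and "_.-~" pass through.
    return SEPARATOR.join(quote(str(field), safe="") for field in fields)


def decode_row(line):
    """Split on the separator first, then decode each field."""
    return [unquote(part) for part in line.split(SEPARATOR)]


row = ["Evil,name.png", "<script>alert(document.cookie)</script>", "line1\nline2"]
line = encode_row(row)
restored = decode_row(line)
```

Because the separator, angle brackets and newlines are all escaped inside fields, a consumer can split each line blindly and no literal markup reaches the browser.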

attachment csvQuery.diff ignored as obsolete

I'm not sure I like 'gen=csv'. It might make sense to be consistent
with existing things like 'action=raw'. On the other hand 'feed=rss'
etc... Bleh. :)

doCSV() should *not* exit; the script should continue running to
completion (so any post functions and profiling can be run at the end
among other things). If you're trying to disable the HTML output, use
$wgOut->disable().

There looks to be some duplication of code in doCSV(); it's running
queries and such all over again. This should instead share existing
code; extract common submethods where appropriate.

Created attachment 981
updated patch. see comment

I updated the patch to address the issues mentioned above. Specifically:

  • doCSV() no longer calls exit, but uses $wgOut->disable()
  • the actual database query has been factored out; recache(), doQuery(),
    doCSV(), and doFeed() now all use the same function. I hope I did not miss any
    subtle differences.
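
The shape of that refactoring can be sketched in Python as follows. This is not the patch: the class `QueryPageSketch` and its methods are hypothetical stand-ins, and an in-memory list replaces the database. The point is only that every output mode goes through one fetch method, and that the CSV path returns its output instead of exiting.

```python
from html import escape
from urllib.parse import quote


class QueryPageSketch:
    """Hypothetical stand-in for a QueryPage-like class; not MediaWiki code."""

    def __init__(self, rows):
        self._rows = rows      # list of (title, value) pairs standing in for the DB
        self.fetch_count = 0   # counts how often the shared fetch path is used

    def fetch_rows(self, limit):
        """The single shared place that 'runs the query'."""
        self.fetch_count += 1
        return self._rows[:limit]

    def do_query(self, limit=50):
        """HTML list output built on the shared fetch."""
        items = "".join(f"<li>{escape(title)}</li>" for title, _ in self.fetch_rows(limit))
        return f"<ol>{items}</ol>"

    def do_csv(self, limit=50):
        """Plain-text output built on the same fetch; returns rather than exits."""
        return "\n".join(
            f"{quote(title, safe='')},{quote(str(value), safe='')}"
            for title, value in self.fetch_rows(limit)
        )


page = QueryPageSketch([("A", 3), ("B & C", 1)])
html_out = page.do_query()
csv_out = page.do_csv()
```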

This still uses gen=csv to trigger CSV mode. Using action=csv would not work
without hacking around index.php, which is already quite ugly in that respect.
Also, action indicates *what* is shown, there should be a convention for a
separate parameter that determines *how* the data is shown. Consider that in
the future, CSV (and XML, and...) support could be added for instance to the
history view, which is triggered by action=history - the format needs to be in
a separate parameter. I'm using "gen" because it is already used for js and css
(right?). The alternative would be to (mis-)use the "feed" parameter, or a
common "output" or "format" parameter. This would have many implications,
though...

NOTE: I have not tested this with the querycache; I'm not sure how to do that. But it should work as before, since I have not changed anything in that code, at least not intentionally.

RSS Feed is also untested, because there is currently no special page that is
based on QueryPage and has syndication enabled.

attachment csvQuery.diff ignored as obsolete

I have put together some general ideas and suggestions for a REST interface on
Meta. See here:

http://meta.wikimedia.org/wiki/REST

The patch suggested in this bug is one of the cornerstones of a REST interface as
I propose it. Please have a look...

Created attachment 1003
factored out CSV creation. patch for the additional file to follow.

Note to anyone: CVS sucks. It can't include new files in diffs.

Created attachment 1004
standalone CSV class, required by previous patch. Diffed against a dummy.

Duping this to bug 14869, which is basically trying to accomplish the same thing, but in a much cleaner way.

*** This bug has been marked as a duplicate of bug 14869 ***