Allow Special:MIMESearch to work under miser mode
Closed, Resolved, Public

Description

Proposed patch

Special:MIMESearch would work efficiently on Wikimedia if we added an index on (img_major_mime, img_minor_mime).
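To illustrate the effect of such an index, here is a minimal sketch using Python's built-in sqlite3 module. It is not the attached patch: the table is a cut-down stand-in for MediaWiki's image table, SQLite stands in for MySQL, and the index name img_mime_demo and the helper names are made up for this example.

```python
import sqlite3


def build_demo_db():
    """Create a tiny stand-in for the image table (assumed, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE image ("
        " img_name TEXT PRIMARY KEY,"
        " img_major_mime TEXT NOT NULL,"
        " img_minor_mime TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO image VALUES (?, ?, ?)",
        [
            ("Apollo.ogg", "application", "ogg"),
            ("Moon.jpg", "image", "jpeg"),
            ("Logo.svg", "image", "svg+xml"),
        ],
    )
    return conn


def add_mime_index(conn):
    """Add the proposed two-column index (index name is hypothetical)."""
    conn.execute(
        "CREATE INDEX img_mime_demo ON image (img_major_mime, img_minor_mime)"
    )


def query_plan(conn):
    """Return SQLite's plan description for a MIME lookup."""
    cur = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT img_name FROM image "
        "WHERE img_major_mime = ? AND img_minor_mime = ?",
        ("application", "ogg"),
    )
    # The fourth column of each plan row holds the human-readable detail.
    return " ".join(row[3] for row in cur)


if __name__ == "__main__":
    conn = build_demo_db()
    print("before:", query_plan(conn))  # full scan of the table
    add_mime_index(conn)
    print("after: ", query_plan(conn))  # lookup through img_mime_demo
```

On a real wiki the planner, storage engine, and table size differ, so this only shows the general idea that the lookup can go through the index instead of scanning every row.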


Version: unspecified
Severity: enhancement
URL: http://en.wikipedia.org/wiki/Special:MIMESearch
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=34969

attachment patch-mimesearch.patch ignored as obsolete

Details

Reference
bz13438

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 10:06 PM
bzimport set Reference to bz13438.

You should be able to use the generic index updater function instead of writing a custom one here.

Created attachment 5021
Updated patch, use add_index()

Updated the previous patch to use add_index() per Brion's comments.

attachment test.patch ignored as obsolete

ayg wrote:

You have unrelated changes to User.php there. Also, I suggest you commit the reformatting of the img_sha1 lines separately and recreate the patch, because they just distract from the patch's actual content.

Created attachment 5057
With fixes

Updated previous patch. Removed unrelated changes.

This schema tweak will do the job of making the query faster, but IMHO it's not a super great system to begin with. If we have to make a schema change, it might be good to consider the basic issues first:

  • There's no secondary sorting or filtering, which means it's going to be of very limited utility unless you're searching for a particularly exotic type.

  • The results list will appear in semi-random order, and paging through results will be very slow.

A secondary index on name would at least allow for basic ordering and index-based paging.

This index *might* be useful sometimes for bulk statistics, but the core case (a sensible way of searching based on mime type) would probably be better served by making some image metadata (including mime type) available to the fulltext search index.

If appropriately integrated, you could then do a search for something like:

image:moon landing mime:application/ogg

and get a sensible result of files with a text match for "moon landing" and a MIME type matching application/ogg.
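As a rough illustration of how such a query string might be split up, here is a small Python sketch. The syntax is only the example given above, not an existing MediaWiki search feature; the function name parse_search, the treatment of "image:" as a namespace prefix, and the shape of the returned dictionary are all assumptions made for this example.

```python
def parse_search(query):
    """Split a query into namespace, free-text terms, and a MIME filter.

    Hypothetical syntax: an optional leading "image:" namespace prefix,
    free-text words, and an optional "mime:major/minor" filter.
    """
    namespace = None
    mime = None
    terms = []
    for token in query.split():
        key, sep, value = token.partition(":")
        if sep and key.lower() == "mime":
            # "application/ogg" -> ("application", "ogg")
            major, _, minor = value.partition("/")
            mime = (major, minor)
        elif sep and key.lower() == "image" and namespace is None:
            namespace = "image"
            if value:
                terms.append(value)  # text glued to the prefix, e.g. "moon"
        else:
            terms.append(token)
    return {"namespace": namespace, "terms": terms, "mime": mime}


if __name__ == "__main__":
    print(parse_search("image:moon landing mime:application/ogg"))
```

The text terms would then go to the fulltext index and the MIME pair would become an extra filter on the results; how that integration would actually work is outside this sketch.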

I'm building a tool (at http://brettz9.github.com/xqueryeditor/ ) to allow Ajax browsing of MediaWiki articles, currently for the purpose of performing XQueries against XML stored on wikis, and hopefully for optional local IndexedDB storage as well. It is very unsafe to make these queries at the moment (I'm working on that), but once I get that resolved, I'd like to point people by default to logical starting locations for browsing XML documents on any given MediaWiki wiki.

Currently, when the user chooses a MediaWiki wiki, I'm supplying its root category by default, but it would be great if the API could return only those categories belonging to a particular MIME type (or at least if the MIME search worked) so my users could avoid seeing non-XML pages (though I could parse a page fully into XHTML and expose that once I can figure out how to do that properly through the API). It would also be nice if all this did not require users to manually add categories for these file format types.

(Incidentally, would be great to have the ability to directly edit XML files such as SVG (and TEI--my main interest) with the benefit of diffs and all, rather than needing to treat them as images on the one hand, or to put them directly within articles without the choice of whether to disable wiki markup.)

The attached patch will probably still apply (almost; the updaters.inc part would have to be done manually I guess), but the index should probably be on (major_mime, minor_mime, name) to facilitate paging. Also, Special:MIMESearch's queries should be looked at to see what kind of index we'd actually need, and maybe tweaked to be more reasonable (like, use proper paging instead of OFFSET). I'd also like to change it to no longer be a QueryPage, because parameterized QueryPages don't really make sense.
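A minimal sketch of what "proper paging instead of OFFSET" could look like, again using Python's sqlite3 module as a stand-in for the real database. The cut-down table, the index name img_mime_name_demo, and the helpers make_db and fetch_page are assumptions for illustration, not code from MediaWiki or from the patch.

```python
import sqlite3


def make_db(names):
    """Build a toy image table with a three-column index (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE image ("
        " img_name TEXT PRIMARY KEY,"
        " img_major_mime TEXT NOT NULL,"
        " img_minor_mime TEXT NOT NULL)"
    )
    # Index on (major, minor, name): rows of one MIME type are stored in name order.
    conn.execute(
        "CREATE INDEX img_mime_name_demo "
        "ON image (img_major_mime, img_minor_mime, img_name)"
    )
    conn.executemany(
        "INSERT INTO image VALUES (?, 'application', 'ogg')",
        [(name,) for name in names],
    )
    return conn


def fetch_page(conn, major, minor, after_name=None, limit=2):
    """Return one page of names, continuing after the last name already shown."""
    sql = (
        "SELECT img_name FROM image "
        "WHERE img_major_mime = ? AND img_minor_mime = ?"
    )
    params = [major, minor]
    if after_name is not None:
        # Resume from a position in the index instead of skipping rows with OFFSET.
        sql += " AND img_name > ?"
        params.append(after_name)
    sql += " ORDER BY img_name LIMIT ?"
    params.append(limit)
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    conn = make_db(["C.ogg", "A.ogg", "E.ogg", "B.ogg", "D.ogg"])
    last = None
    while True:
        page = fetch_page(conn, "application", "ogg", after_name=last)
        if not page:
            break
        print(page)
        last = page[-1]
```

With OFFSET the database still has to walk past every skipped row, so deep pages get slower; resuming from the last name seen lets each page start at a known point in the index.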

And of course we should also expose this functionality in the API :)

sumanah wrote:

+reviewed since folks have given Chad code review

I made another attempt at this. With a little bit more complexity on the PHP side, I believe it is possible to do this efficiently without adding any more indices.

I agree that searching for MIME types can be done in a much better way, but this sort of simple search still has its uses. Thus if we can make it work without messing with the indices, I think we should. (That said, we should still attempt to do something better for searching by MIME type in the mysterious future. Fixing this doesn't mean we can't have both.)

Please see gerrit change 67468 (Where art thou, gerrit notification bot?)

Change 67468 merged by jenkins-bot:
Make Special:MIMESearch a non-expensive special page.

https://gerrit.wikimedia.org/r/67468