
Unexpected explicit image found in search results - possible search bomb?
Closed, Declined
Public

Description

So, I did a search on the English Wikipedia for the text "welcome", and I restricted the results to multimedia.

https://en.wikipedia.org/w/index.php?title=Special:Search&search=welcome&fulltext=Search&profile=images&redirs=1

When I did this, the first result that came up was [[:File:Anus.jpg]]. (Warning: neither the search link nor the file link is safe for work.)

The file description page doesn't have any mention of the text "welcome", and I couldn't spot any other reason why this image should turn up for a search for "welcome".

Also, the same search on Wikimedia Commons does not return Anus.jpg as a result.

https://commons.wikimedia.org/w/index.php?title=Special:Search&search=welcome&fulltext=Search&profile=images&redirs=1

My first thought was that it could be MediaWiki's version of a [[Google bomb]], but I don't know if this is true or not. Sorry if this is a false alarm, but this is a very strange result to get from this search, so I thought it was worth reporting here.


Version: unspecified
Severity: normal

Details

Reference
bz61477

Event Timeline

bzimport raised the priority of this task from to Needs Triage. Nov 22 2014, 3:01 AM
bzimport set Reference to bz61477.
bzimport added a subscriber: Unknown Object (MLST).

I have seen this topic ("interesting" image search results) brought up before on a Village Pump, but cannot find it now. But it's not a "new" issue.

Using the CirrusSearch backend via
https://en.wikipedia.org/w/index.php?title=Special:Search&search=welcome&fulltext=Search&profile=images&redirs=1&srbackend=CirrusSearch
this seems to not be a problem anymore.
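
For anyone wanting to repeat the comparison, the two search URLs in this report differ only in the `srbackend` query parameter. A minimal sketch of building them is below; the helper name `image_search_url` is made up for illustration, and the parameter names are copied from the URLs quoted in this task rather than from any API documentation.

```python
from urllib.parse import urlencode


def image_search_url(host, term, backend=None):
    """Build a Special:Search URL restricted to the images profile.

    Hypothetical helper: the query parameters mirror the URLs quoted
    in this task; nothing here is an official MediaWiki client.
    """
    params = {
        "title": "Special:Search",
        "search": term,
        "fulltext": "Search",
        "profile": "images",
        "redirs": "1",
    }
    # srbackend is only added when a backend is requested explicitly,
    # as in the CirrusSearch URL quoted above.
    if backend is not None:
        params["srbackend"] = backend
    # Keep ":" unescaped so "Special:Search" reads like the original URLs.
    return f"https://{host}/w/index.php?" + urlencode(params, safe=":")


default_url = image_search_url("en.wikipedia.org", "welcome")
cirrus_url = image_search_url("en.wikipedia.org", "welcome", "CirrusSearch")
```

Opening `default_url` and `cirrus_url` side by side would reproduce the comparison described here, assuming the wiki still accepts the `srbackend` override.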

(In reply to Andre Klapper from comment #1)

Using the CirrusSearch backend via
https://en.wikipedia.org/w/index.php?title=Special:Search&search=welcome&fulltext=Search&profile=images&redirs=1&srbackend=CirrusSearch
this seems to not be a problem anymore.

(helped me find bug 61483.)

Moving to Cirrus will give us many more options for tuning the search, so we can hopefully avoid these situations.

Chad/Nik, assuming that this is the result of a vandal trying to game the search results, do you feel like we can adequately respond to something like this in the future? We've been talking about the hiding content attack, but this actually seems like a more common form of someone trying to be annoying.

The big advantage Cirrus has over lsearchd in this department is that indexing is pretty fast. So if you found something like this you could edit the page to remove whatever caused it to show up and it'd reindex within a few seconds.

Beyond that, Cirrus gives us the ability to peer into the index and see what it contains, so if someone comes up with some crazy way to get it in there that we aren't expecting, we have a much better chance of figuring out what they did and stopping it in the future.

I think the general consensus after digging into it with Chad is that we're really not sure why this particular issue is happening, and lsearchd is woefully short on debugging information.

Since we're moving to Cirrus where this particular issue doesn't exist, I think we can say this is being addressed, and both Nik and Chad have this on their radar to watch out for as we migrate to the new system.

I'm going to close this for now to get it off the security queue (WONTFIX, because there isn't a "Fixed by an unrelated thing we're working on"). Feel free to reopen it if there is something actionable we need to do, or Nik/Chad if you want to keep digging into this for Cirrus.

That seems sensible to me. Thanks for looking into it.

Also, if the issue doesn't require fixing as such, how about making this a public bug? That will help to avoid duplicate reports, and will allow others to ponder what might have caused it.