
Beta search provides weird snippets from JS pages
Closed, Resolved (Public)

Description

Search results for "mw prefix:Utilizador:H" on ptwikibooks

Just look at the screenshot. Something is wrong.


Version: unspecified
Severity: critical
URL: https://pt.wikibooks.org/wiki/Special:Search?profile=all&search=mw+prefix%3AUtilizador%3AH&fulltext=Search&uselang=en
See Also:

Attached:

beta-search-js-pages.png (900×1 px, 145 KB)

Details

Reference
bz61752

Event Timeline

bzimport raised the priority of this task to Unbreak Now!. (Nov 22 2014, 3:00 AM)
bzimport added a project: CirrusSearch.
bzimport set Reference to bz61752.

Confirmed this earlier today after chatting with some people. This is really bad.

Here's what we're pretty sure is happening (need to confirm that):

  • Cirrus is parsing the JS page even though it shouldn't.
  • Because the page is treated as wikitext and parsed, we get weird results when we hit a header tag in the JavaScript (in this case, one with no parseable closing tag)
  • As a result, we generate bogus sections that don't point anywhere
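
To illustrate the suspected failure mode, here is a toy sketch in Python. It is not MediaWiki's parser or Cirrus code: `naive_sections` and the sample `js_source` line are hypothetical, made up only to show how scanning JavaScript source for heading tags can yield a junk "section" when the opening tag has no matching close.

```python
import re

# Toy stand-in for a heading scanner; NOT MediaWiki's real parser.
HEADING_OPEN = re.compile(r"<h([1-6])[^>]*>", re.IGNORECASE)


def naive_sections(text):
    """Return the heading texts found by scanning for <hN> tags."""
    sections = []
    for match in HEADING_OPEN.finditer(text):
        level = match.group(1)
        rest = text[match.end():]
        close = re.search(r"</h%s\s*>" % level, rest, re.IGNORECASE)
        # With no closing tag, fall back to whatever is left on the line.
        end = close.start() if close else len(rest.split("\n", 1)[0])
        sections.append(rest[:end].strip())
    return sections


# Hypothetical line from a user script: the tag lives inside a string literal.
js_source = "var box = '<h2>' + mw.msg('tools') + closeTag;\n"

print(naive_sections("<h2>Intro</h2>"))  # a well-formed heading
print(naive_sections(js_source))         # leftover JS code becomes a "section"
```

The second call returns a fragment of JavaScript as if it were a section title, which is the kind of bogus section described above.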

We should (in no particular order):

  • Check content handling to make sure CSS/JS pages aren't getting parsed like wikitext
  • Make sure core Special:Search doesn't do anything insane when given obviously bogus sections. See also bug 59717, where lsearchd is giving us bogus section titles as well.
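
The first item could look roughly like the sketch below, written in Python for brevity. This is an assumption about the shape of the check, not the actual patch: `build_search_doc` and the `parse_wikitext` callable are hypothetical names, and only the content model IDs ("javascript", "css") are meant to mirror the ones MediaWiki uses.

```python
# Content models whose text should be indexed raw, never parsed as wikitext.
CODE_MODELS = {"javascript", "css"}


def build_search_doc(content_model, raw_text, parse_wikitext):
    """Return (text, sections) for the search index.

    parse_wikitext is a hypothetical callable that takes raw wikitext
    and returns a (plain_text, headings) tuple.
    """
    if content_model in CODE_MODELS:
        # Code pages: index the source as-is and report no sections.
        return raw_text, []
    return parse_wikitext(raw_text)
```

With a guard like this, a JS or CSS page never reaches the wikitext parser, so it cannot contribute section titles to the index at all.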

There's a second bug (though it might just be a symptom of this one): we see two distinct titles (vector.js and Tools.js) with the same section name, which only exists in the former.

Change 115214 had a related patch set uploaded by Chad:
Don't use parsed wikitext when dealing with CSS/JS pages

https://gerrit.wikimedia.org/r/115214

Change 115214 merged by jenkins-bot:
Don't use parsed wikitext when dealing with CSS/JS

https://gerrit.wikimedia.org/r/115214

This will go out with the next deploy cycle, but results will still be wonky until the indexes are rebuilt.

He7d3r set Security to None.