
CirrusSearch: Can't find text in specific heading
Closed, Resolved (Public)

Details

Reference
bz62058

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 3:04 AM
bzimport added a project: CirrusSearch.
bzimport set Reference to bz62058.

Filing high because I'm not sure what is up with it.

Added some See Also bugs which might be the cause. Or might not.

Not quite sure what is going on, but this works in dev and not in production. Neither enwiki nor dewiki splits on the ":", but my dev machines do.

http://localhost:1234/dewiki_content/_analyze?analyzer=text&text=Kategorie:Stolpersteine
{
  "tokens": [
    {
      "token": "kategorie:stolperstein",
      "start_offset": 0,
      "end_offset": 23,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
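
To repeat this check, here is a minimal Python sketch that builds the same kind of `_analyze` URL and pulls the token strings out of a response like the one above. The host, port, index, and analyzer values are taken from the request above. The helper names `analyze_url` and `tokens_from_response` are hypothetical, and the sketch parses a pasted copy of the response rather than calling a live server.

```python
import json
from urllib.parse import urlencode


def analyze_url(base, index, analyzer, text):
    """Build an _analyze URL with query-string parameters, as in the request above."""
    query = urlencode({"analyzer": analyzer, "text": text})
    return f"{base}/{index}/_analyze?{query}"


def tokens_from_response(body):
    """Return the token strings from an _analyze JSON response body."""
    return [entry["token"] for entry in json.loads(body)["tokens"]]


# Response body copied from the output shown above.
sample = """
{"tokens": [{"token": "kategorie:stolperstein", "start_offset": 0,
             "end_offset": 23, "type": "<ALPHANUM>", "position": 1}]}
"""

url = analyze_url("http://localhost:1234", "dewiki_content", "text",
                  "Kategorie:Stolpersteine")
tokens = tokens_from_response(sample)

# A token that still contains ":" means the analyzer did not split on it.
unsplit = [t for t in tokens if ":" in t]
```

If `unsplit` is non-empty, the namespace prefix and the title ended up in one token, which is the failure described here.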

Ah, what is saving me in dev is the $wgCirrusSearchUseAggressiveSplitting setting, which _is_ enabled on mediawiki.org but currently only works in English. Enabling it everywhere as it stands would therefore not help other languages, and it might make some things harder to find. Let me see what I can do about that.
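
To illustrate why the split matters for matching, here is a toy Python sketch. It is not the Lucene or CirrusSearch implementation: `standard_like` and `aggressive_split` are hypothetical stand-ins that only mimic keeping or breaking on the ":", and stemming is not simulated, so the tokens keep their full form.

```python
import re


def standard_like(text):
    """Toy stand-in for the analysis shown above: lowercase, keep ':' inside a token."""
    return re.findall(r"[^\W_]+(?::[^\W_]+)*", text.lower())


def aggressive_split(tokens):
    """Toy stand-in for aggressive splitting: break each token on non-alphanumerics."""
    parts = []
    for token in tokens:
        parts.extend(p for p in re.split(r"[\W_]+", token) if p)
    return parts


unsplit = standard_like("Kategorie:Stolpersteine")
split = aggressive_split(unsplit)

# A search for the bare word only matches when it exists as its own token.
query = "stolpersteine"
found_without_splitting = query in unsplit
found_with_splitting = query in split
```

With the single fused token, a query for the word after the colon has nothing to match against. Once the token is split, it does.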

Stalling this for a moment while I wait on input from Dan and Chad. The question is whether to get aggressive splitting working everywhere or to use a smaller fix that handles just colons. I'd like to unify on aggressive splitting everywhere, to make regression testing easier and to avoid the confusion of some environments having it and some not.

I've gotten input: we should push aggressive splitting everywhere we can sensibly do it. I've filed https://github.com/elasticsearch/elasticsearch/issues/5648 upstream so we can more easily edit the analyzers built into Elasticsearch. Right now editing them requires rebuilding them as "custom" analyzers by hand, which is error-prone. The issue would let us instruct Elasticsearch to rebuild them as custom analyzers, and then we could make incremental changes to them.
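
As a rough illustration of what "rebuilding by hand" involves, here is a Python sketch that assembles index settings for an English-style custom analyzer with one extra splitting filter. The filter chain, the labels under "filter", and the position of the splitting filter are assumptions made for illustration. The configuration CirrusSearch actually generates is not shown in this thread. `word_delimiter`, `stemmer`, `stop`, and `lowercase` are existing Elasticsearch token filter types.

```python
import json


def english_with_splitting():
    """Spell out an English-style analyzer as a custom one, plus a splitting filter.

    Assumed layout for illustration only, not the CirrusSearch configuration.
    """
    return {
        "analysis": {
            "filter": {
                # Extra filter: splits tokens on intra-word delimiters.
                "aggressive_splitting": {"type": "word_delimiter"},
                "possessive_english": {"type": "stemmer", "language": "possessive_english"},
                "english_stop": {"type": "stop", "stopwords": "_english_"},
                "english_stemmer": {"type": "stemmer", "language": "english"},
            },
            "analyzer": {
                "text": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "aggressive_splitting",
                        "possessive_english",
                        "lowercase",  # built-in filter, needs no definition
                        "english_stop",
                        "english_stemmer",
                    ],
                }
            },
        }
    }


settings = english_with_splitting()
body = json.dumps(settings, indent=2)  # JSON to send when creating the index
```

Every step of the built-in analyzer has to be restated just to add one filter, and a missed step silently changes search behavior. That is the error-prone part the upstream issue is meant to remove.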

We don't actually need the issue closed upstream to work on this here, but we will need it for a few languages because some of the language analyzers can't actually be rebuilt as custom analyzers: Persian, Thai, and German I believe.

Aklapper set Security to None.
Restricted Application added a subscriber: Aklapper.

> Filing high because I'm not sure what is up with it.

@Manybubbles: This task has not seen updates for 15 months. Is this still high priority? Still working on this (as you're set as assignee)?

@Discovery folks: This task has not seen updates for 15 months. Is this still high priority?

Should the assignee (Nik) be reset?

Deskana lowered the priority of this task from High to Low.

Given that Nik said the Cirrus search should return three results, and now it does, I'm going to assume this is resolved.