Prefix Search: Would be nice if search engine could highlight the result rather than js
Open, Low, Public

Assigned To

None

Authored By

	• Manybubbles
	Feb 6 2014, 6:29 PM

Description

Sometimes we see weird results in the prefix search because Cirrus uses different matching rules than the jquery.suggestions library. In English, for example, Cirrus flattens high ASCII, so searching for "resume" will return "résumé". Cirrus is quite capable of highlighting the result properly, but it has no way to tell the front-end what the result should look like.

I don't believe it would be practical to replicate Cirrus's logic on the front end because it can change and it is different for different wikis.
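To make the mismatch concrete, here is a minimal sketch. The `fold` function is a hypothetical stand-in that only strips combining accents; it is not Cirrus's actual analysis chain, which varies per wiki. `naivePrefixMatch` approximates a plain case-insensitive prefix comparison of the kind a front-end highlighter might use.

```javascript
// Hypothetical accent folding, for illustration only (NOT Cirrus's real rules).
function fold(text) {
  return text
    .normalize('NFD')                 // split "é" into "e" + combining accent
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining marks
    .toLowerCase();
}

// Plain case-insensitive prefix check, with no folding at all.
function naivePrefixMatch(query, title) {
  return title.toLowerCase().startsWith(query.toLowerCase());
}

// Prefix check after folding both sides, roughly what the backend does.
function foldedPrefixMatch(query, title) {
  return fold(title).startsWith(fold(query));
}

console.log(naivePrefixMatch('resume', 'Résumé'));  // false: nothing to highlight
console.log(foldedPrefixMatch('resume', 'Résumé')); // true: backend returns it anyway
```

The backend returns the title, but the unfolded front-end comparison finds no prefix to bold.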

Details

Reference: bz60976

Related Objects

Mentioned In: T27187: Search suggestions should only highlight the first match in the title

Event Timeline

• bzimport raised the priority of this task from to Medium. Nov 22 2014, 3:03 AM

• bzimport added a project: MediaWiki-JavaScript.

• bzimport set Reference to bz60976.

• bzimport added a subscriber: Unknown Object (MLST).

• Manybubbles created this task. Feb 6 2014, 6:29 PM

I don't care how you do this, but please do. I hate the core search suggestions module.

Core could totally also output match indices from the opensearch API (that shouldn't be incompatible with anything, but I haven't checked), naively by default (we could just implement the same logic as the JS module has now), with a hook override for better search extensions. Then we could apply bolding in the UI trivially based on these indices.
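As a rough sketch of the UI side, assuming the API returned a start/end pair per result: the `boldRange` helper and the offset convention (UTF-16 code units, end exclusive) are assumptions for illustration, not an existing MediaWiki API.

```javascript
// Escape text before it is placed into HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: wrap title[start, end) in <strong>.
// Offsets are assumed to be UTF-16 code unit indices, end exclusive.
function boldRange(title, start, end) {
  return (
    escapeHtml(title.slice(0, start)) +
    '<strong>' + escapeHtml(title.slice(start, end)) + '</strong>' +
    escapeHtml(title.slice(end))
  );
}

console.log(boldRange('Resumé (magazine)', 0, 5));
// <strong>Resum</strong>é (magazine)
```

With indices supplied by the server, the client needs no knowledge of the matching rules.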

I'm glad it has bothered someone else too.

So, can we make this happen? Once the necessary information is exposed via the action=opensearch API, I'll be happy to implement the JavaScript part of this.

Krinkle updated the task description. Jan 8 2015, 9:16 PM

If I understand correctly, the OpenSearch API follows a standard response format we shouldn't change. We can add it to a prefixsearch or search API module, however. Probably using offsets or substrings to indicate what to highlight.

Current format:

{
"query": "resum",
"results": [
  "Resumé",
  "Resumé (magazine)",
  "RESUMECHAR (CONFIG.SYS directive)",
  "Resumen de acompañar"
]
}

Current (incomplete) highlighting behaviour:

Screen_Shot_2015-01-08_at_21.19.45.png (356×504 px, 39 KB)

Proposed formats:

{
"query": "resum",
"results": [
  [ 5, "Resumé" ],
  [ 5, "RESUMECHAR (CONFIG.SYS directive)" ],
  [ 5, "Resumé (magazine)" ],
  [ 5, "Resumen de acompañar" ]
]
}

{
"query": "resum",
"results": [
  [ "Resum", "Resumé" ],
  [ "RESUM", "RESUMECHAR (CONFIG.SYS directive)" ],
  ...
]
}

Actually... Unless there are cases where the interpretation of unicode code points is different for one of the flattened characters, wouldn't it always simply be the length of the input string?

Except for namespace prefixes, as we allow normalisation/localisation of those.

In T62976#964273, @Krinkle wrote:

Actually... Unless there are cases where the interpretation of unicode code points is different for one of the flattened characters, wouldn't it always simply be the length of the input string?

No, the processing can cause the number of separate characters to change, for example æ↔ae, ß↔ss. (I was also under the impression that Cirrus ignored/downplayed non-word characters like '(' when displaying search suggestions, but it doesn't seem to now.)

No, the processing can cause the number of separate characters to change, for example æ↔ae, ß↔ss. (I was also under the impression that Cirrus ignored/downplayed non-word characters like '(' when displaying search suggestions, but it doesn't seem to now.)

It does that in full text search but prefix search includes them. It's supposed to be just the right kind of sloppy matching....
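A small sketch of why the input length is not enough. The `fold` rules below (accent stripping plus ß→ss and æ→ae) are hypothetical and only loosely mimic what a search backend might do; `matchedPrefixLength` is an illustrative helper, not existing code.

```javascript
// Hypothetical folding rules, for illustration only.
function fold(text) {
  return text
    .toLowerCase()
    .normalize('NFD')
    .replace(/[\u0300-\u036f]/g, '') // strip combining accents
    .replace(/ß/g, 'ss')
    .replace(/æ/g, 'ae');
}

// Returns how many UTF-16 code units of `title` cover the folded query,
// or -1 if the folded title does not start with the folded query.
function matchedPrefixLength(query, title) {
  const target = fold(query);
  let prefix = '';
  for (const ch of title) { // iterates by code point
    if (fold(prefix).length >= target.length) break;
    prefix += ch;
  }
  return fold(prefix).startsWith(target) ? prefix.length : -1;
}

console.log(matchedPrefixLength('strasse', 'Straße')); // 6, although the query has 7 characters
console.log(matchedPrefixLength('stras', 'Straße'));   // 5, the whole "ß" is covered
```

The highlight length has to be computed against the original title, which only code that knows the folding rules can do reliably.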

But, yeah, the most flexibility possible would be best. We want the ability to properly handle whatever off the wall request comes in and if the highlighting code makes any assumptions then it'll break it. The best would be to accept offset pairs to highlight or the string marked up with <em> tags or something. The <em> tags might be simplest because you could transform them on the client side to whatever you like but they'd still be simple to read right in the string. Simpler than offset pairs, at least.
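For the `<em>` variant, a client-side transform could look roughly like this. It assumes the server sends the title as plain text with literal `<em>`/`</em>` markers around the matched part; that response shape and the `renderMarked` name are assumptions, and a title that itself contains the text "<em>" would be ambiguous under this scheme.

```javascript
// Escape text before it is placed into HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: turn "<em>…</em>" markers into the tag of our choice,
// escaping everything else in the string.
function renderMarked(marked, tag = 'strong') {
  return marked
    .split(/(<\/?em>)/) // capturing group keeps the markers in the result
    .map((part) => {
      if (part === '<em>') return '<' + tag + '>';
      if (part === '</em>') return '</' + tag + '>';
      return escapeHtml(part);
    })
    .join('');
}

console.log(renderMarked('<em>Resum</em>é (magazine)'));
// <strong>Resum</strong>é (magazine)
```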

Krenair subscribed. Jan 12 2015, 3:57 PM

If I understand correctly, the OpenSearch API follows a standard response format we shouldn't change.

Can we not extend it? Like add another key, say 'matches', that would contain indexes of matched substrings in each suggestion result?

In T62976#974395, @matmarex wrote:

If I understand correctly, the OpenSearch API follows a standard response format we shouldn't change.

Can we not extend it? Like add another key, say 'matches', that would contain indexes of matched substrings in each suggestion result?

OpenSearch format is an array with an array inside. No string keys.

https://www.mediawiki.org/w/api.php?action=opensearch&search=ap&limit=4

[
    "ap",
    [
        "Apache configuration",
        "Apps/Commons",
        "Apps",
        "API/maintenance"
    ]
]

It seems we already extended it by adding a second and third array at the end for text extracts and URLs:

[
    "ap",
    [
        "Apache configuration",
        "Apps/Commons",
        "Apps",
        "API/maintenance"
    ],
    [
        "Apache is probably the webserver used most with MediaWiki.",
        "",
        "",
        "This page is to document activity related to the MediaWiki API. This is an ongoing activity, led by Sam Reed."
    ],
    [
        "https://www.mediawiki.org/wiki/Apache_configuration",
        "https://www.mediawiki.org/wiki/Apps/Commons",
        "https://www.mediawiki.org/wiki/Apps",
        "https://www.mediawiki.org/wiki/API/maintenance"
    ]
]

That doesn't scale well though.
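To illustrate, this is roughly what a consumer has to do with the positional format shown above. `zipOpenSearch` is a hypothetical helper name; each additional kind of data, such as highlight information, would mean yet another parallel array to index.

```javascript
// Hypothetical helper: combine the parallel arrays of an
// action=opensearch response into one object per result.
function zipOpenSearch(response) {
  const [query, titles = [], descriptions = [], urls = []] = response;
  return {
    query,
    results: titles.map((title, i) => ({
      title,
      description: descriptions[i] || '',
      url: urls[i] || '',
    })),
  };
}

const sample = [
  'ap',
  ['Apache configuration', 'Apps'],
  ['Apache is probably the webserver used most with MediaWiki.', ''],
  [
    'https://www.mediawiki.org/wiki/Apache_configuration',
    'https://www.mediawiki.org/wiki/Apps',
  ],
];

console.log(zipOpenSearch(sample).results[0].title); // Apache configuration
```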

On second thought: from a design and user experience point of view, do we even need the highlighting? I've rarely seen this kind of highlighting done in other search interfaces or autocompleted form fields. They just show the results.

I've played with it a bit locally and am liking it a lot. It feels a little wrong because we're so used to it. I'd like to consider ditching that logic altogether and just displaying the results as normal (linked) text.

Screen_Shot_2015-03-14_at_03.20.10.png (377×630 px, 45 KB)

Screen_Shot_2015-03-14_at_03.19.57.png (428×642 px, 44 KB)

Thoughts?

matmarex mentioned this in T27187: Search suggestions should only highlight the first match in the title. Sep 13 2016, 12:23 AM

Restricted Application added a project: Discovery-Search. Sep 24 2018, 7:22 AM

• EBjune lowered the priority of this task from Medium to Low. Sep 27 2018, 5:17 PM

EBernhardson renamed this task from Prefix Search: Would be nice if php could highlight the result rather than js to Prefix Search: Would be nice if search engine could highlight the result rather than js. Sep 27 2018, 5:17 PM

• EBjune moved this task from needs triage to search-icebox on the Discovery-Search board. Sep 27 2018, 5:17 PM

Closing out low/est priority tasks over 6 months old with no activity within the last 6 months in order to clean out the backlog of tickets we will not be addressing in the near term. Please feel free to reopen if you think a ticket is important, but bear in mind that given current priorities and resourcing, it is unlikely that the Search team will pick up these tasks for the indefinite future. We hope that the requested changes have either been addressed by or made irrelevant by work the team has done or is doing -- e.g. upgrading Elasticsearch to a newer version will solve various ES-related problems -- or will be subsumed by future work in a more generalized way.

Re-opening tasks and removing from team workboard per IRC feedback given yesterday and discussion with MPham.
