
Search problem in ml.wikipedia.org (malayalam)
Closed, Resolved (Public)

Description

Author: sadik.khalid

Description:
When searching for an article on ml.wikipedia.org, the results are not related to the search text entered.

Search text: കൊറിയ
Wiki search result is: http://ml.wikipedia.org/wiki/%E0%B4%AA%E0%B5%8D%E0%B4%B0%E0%B4%A4%E0%B5%8D%E0%B4%AF%E0%B5%87%E0%B4%95%E0%B4%82:Search?ns0=1&ns100=1&search=%E0%B4%95%E0%B5%8A%E0%B4%B1%E0%B4%BF%E0%B4%AF&
(http://ml.wikipedia.org/wiki/പ്രത്യേകം:Search?ns0=1&search=കൊറിയ&fulltext=Search)

Google search result is:http://www.google.com.qa/search?hl=en&q=site%3Aml.wikipedia.org+%E0%B4%95%E0%B5%8A%E0%B4%B1%E0%B4%BF%E0%B4%AF&btnG=Google+Search&meta=
(http://www.google.com.qa/search?hl=en&q=site%3Aml.wikipedia.org+കൊറിയ&btnG=Google+Search&meta=)
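To confirm that the percent-encoded wiki URL above carries the same search text as the readable form, the query string can be decoded. This is a minimal sketch using Python's standard library; the helper name `search_term` is hypothetical and only for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# The wiki search URL exactly as reported above.
WIKI_URL = (
    "http://ml.wikipedia.org/wiki/"
    "%E0%B4%AA%E0%B5%8D%E0%B4%B0%E0%B4%A4%E0%B5%8D%E0%B4%AF%E0%B5%87%E0%B4%95%E0%B4%82:Search"
    "?ns0=1&ns100=1&search=%E0%B4%95%E0%B5%8A%E0%B4%B1%E0%B4%BF%E0%B4%AF&"
)

def search_term(url: str, param: str = "search") -> str:
    """Return the decoded value of one query parameter (UTF-8 percent-decoding)."""
    query = urlsplit(url).query
    return parse_qs(query)[param][0]

term = search_term(WIKI_URL)
print(term)                                    # the decoded search text
print([f"U+{ord(ch):04X}" for ch in term])     # its individual code points
```

Listing the code points makes it easy to see which of them are vowel signs rather than base consonants.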

The Google search results are much more relevant, while the wiki search results have no connection to the search query.

I think this bug is not limited to ml.wikipedia.org. See this link: http://de.wikipedia.org/wiki/Spezial:Suche

Please do the needful

Thank you


Version: unspecified
Severity: normal
URL: http://ml.wikipedia.org

Details

Reference
bz11021

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 21 2014, 9:52 PM)
bzimport set Reference to bz11021.
bzimport added a subscriber: Unknown Object (MLST).

Wikimedia wikis use Lucene Search 2.0, so I am moving this bug to the extensions.

We need a list box offering external search engines (Google, Yahoo, etc.), like the one on enwiki.

rainman wrote:

The main problem should be fixed in r25052. However, it will take a while for the software to be updated.

Another issue I would like some feedback on is that we still use a rather aggressive removal of all combining characters. Try searching for "കൊറിയ" (with quotes): the vowel signs are stripped, and it ends up as something like "കറയ". I wonder if this will produce serious ambiguities?
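The effect described above can be reproduced with a short sketch. This is not the Lucene Search code; it only assumes that "removal of all combining characters" means dropping every code point in a Unicode mark category (Mn, Mc, Me). The function name `strip_combining_marks` is hypothetical.

```python
import unicodedata

def strip_combining_marks(text: str) -> str:
    """Drop every code point whose Unicode general category is a mark (M*).

    Assumption: this approximates the 'aggressive removal of all combining
    characters' mentioned above; the real analyzer may differ.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")
    )

query = "\u0d15\u0d4a\u0d31\u0d3f\u0d2f"   # the search text from this report
stripped = strip_combining_marks(query)

print(query, "->", stripped)
print([f"U+{ord(ch):04X}" for ch in stripped])  # only the base consonants remain
```

Because Malayalam vowel signs are encoded as combining marks, any two words that share the same consonant sequence would collapse to the same stripped form under this assumption.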

For the additional search boxes, consult [[MediaWiki:Common.js]] on your favorite wiki.

jacob.jose wrote:

To answer your query:

Another issue I would like some feedback on is that we still use a rather aggressive removal of all combining characters. Try searching for "കൊറിയ" (with quotes), the vowels are stripped, and it ends up something like "കറയ", I wonder if this will produce serious ambiguities?

The word "കറയ" is significantly different in meaning compared to "കൊറിയ" (has no similarity at all) and hence could potentially generate serious ambiguities.

shijualex wrote:

If we search for a word (for example, കൊറിയ) after placing it inside quotes (i.e. "കൊറിയ"), then the result is almost perfect.

jacob.jose wrote:

If we search a word (for example, കൊറിയ) after placing it inside the quotes, (i.e "കൊറിയ") then the result is almost perfect.

I agree with Shiju's observation above. However, one new problem is that searching for “കൊറിയ” (curly quotes) does not behave like searching for "കൊറിയ" (straight quotes); it behaves like an unquoted search for കൊറിയ. Unfortunately, typing <Quote>കൊറിയ<Quote> in Mozhi 1.1 (used for Malayalam transliteration) produces the curly quotes, as in “കൊറിയ”, and not the straight quotes, as in "കൊറിയ".
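One possible workaround, sketched here purely as an illustration, is to map curly double quotes to straight ones before the query is submitted. Nothing in this thread says the search backend or Mozhi does this; the helper name `normalize_quotes` and the choice of characters to map are assumptions.

```python
# Map typographic double quotes to the ASCII quote that phrase search expects.
# Assumption: only U+201C and U+201D need handling for the case described above.
QUOTE_MAP = str.maketrans({
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})

def normalize_quotes(query: str) -> str:
    """Return the query with curly double quotes replaced by straight ones."""
    return query.translate(QUOTE_MAP)

curly = "\u201c\u0d15\u0d4a\u0d31\u0d3f\u0d2f\u201d"   # the search text in curly quotes
print(normalize_quotes(curly))                         # same text in straight quotes
```

With such a step in place, a query typed through a transliteration tool would reach the search engine as a quoted phrase rather than as a bare word.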

[Merging "MediaWiki extensions/Lucene Search" into "Wikimedia/lucene-search2", see bug 46542. You can filter bugmail for: search-component-merge-20130326 ]