Prefer pages in the user's language in multilingual wikis
Closed, Resolved (Public)

Description

As an Italian user using the Italian-language interface on a multilingual wiki, I expect Italian and English results (in this order) to come first.

"Italian and English" results here are defined by the [[mw:page content language]], which is currently set only by Translate. See for instance https://www.mediawiki.org/w/index.php?title=Help:Extension:Translate/Installation/it&action=info which is in Italian.

I suppose this requires storing the language of the document, which should be easy enough I think, and might later be used for fancier things (bug 54832).


Version: master
Severity: enhancement
URL: https://www.mediawiki.org/w/index.php?title=Special%3ASearch&search=memcached+prefix%3Ahelp%3Aextension%3Atranslate&go=Vai

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 3:25 AM
bzimport added projects: CirrusSearch, I18n.
bzimport set Reference to bz66829.
bzimport added a subscriber: Unknown Object (MLST).

Change 140866 had a related patch set uploaded by Chad:
Prefer articles in a user's language on multilingual wikis

https://gerrit.wikimedia.org/r/140866

Change 140866 merged by jenkins-bot:
Prefer articles in a user's language on multilingual wikis

https://gerrit.wikimedia.org/r/140866

I'm not sure this still works, https://www.mediawiki.org/w/index.php?title=Special%3ASearch&profile=default&search=throttle+prefix%3AManual%3APywikibot&fulltext=Search&uselang=it shows [[Manual:Pywikibot/Global Options/it]] only as 5th result for me, preceded by English and Catalan.

Reopening: the search result set and ordering on mw.org seem identical regardless of language preference or uselang=langcode. Compare a "badtoken" search with the same "badtoken" search with uselang=fr.

Nemo_bis triaged this task as Medium priority. Jul 25 2015, 12:26 PM
Nemo_bis set Security to None.

Change 315532 had a related patch set uploaded (by EBernhardson):
Prefer pages in the user's language in multilingual wikis

https://gerrit.wikimedia.org/r/315532

AFAICT the support was added to CirrusSearch but never turned on anywhere. The patch above turns it on for mediawiki.org with fairly arbitrarily chosen weights.

Change 315532 merged by jenkins-bot:
Prefer pages in the user's language in multilingual wikis

https://gerrit.wikimedia.org/r/315532

Deployed. The results are better, but not perfect. The problem is that the weight on incoming_links can, in some cases, outweigh the language-based rescore. So for the example query above:

https://www.mediawiki.org/w/index.php?title=Special%3ASearch&profile=default&search=throttle+prefix%3AManual%3APywikibot&fulltext=Search&uselang=it&cirrusDumpResult&cirrusExplain=pretty

Pywikibot/Global Options

text based scoring: 0.05815
incoming links weight: 2.576
language:en weight: 2.5
final score: 0.37453112

Pywikibot/Global Options/it

text based scoring:  0.05815
incoming links weight: 0.90309
language:it weight: 5.0
final score: 0.26257026
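
To make the comparison concrete, here is a minimal sketch that recombines the figures listed above. It assumes the three factors are simply multiplied together; that is inferred from the fact that the products match the reported final scores to within rounding, not taken from the CirrusSearch source. The names `combined_score` and `break_even_it_weight` are hypothetical.

```python
# Figures copied from the cirrusExplain breakdown quoted above.
EN = "Pywikibot/Global Options"
IT = "Pywikibot/Global Options/it"

results = {
    EN: {"text": 0.05815, "incoming_links": 2.576, "language": 2.5},
    IT: {"text": 0.05815, "incoming_links": 0.90309, "language": 5.0},
}


def combined_score(text, incoming_links, language):
    """Assumed combination: plain product of the three listed factors."""
    return text * incoming_links * language


scores = {title: combined_score(**factors) for title, factors in results.items()}

# Language weight the /it page would need to tie with the English page,
# holding its text score and incoming links weight fixed.
break_even_it_weight = scores[EN] / (
    results[IT]["text"] * results[IT]["incoming_links"]
)

for title, score in scores.items():
    print(f"{title}: {score:.5f}")
print(f"break-even language:it weight: {break_even_it_weight:.2f}")
```

Under this assumption, the English page's much larger incoming links weight more than cancels the doubled language weight of the Italian page, so the Italian page would need a language weight a little above 7 to rank first for this query.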

Now my first example search gives, in order:

Help:Extension:Translate/Installation
Help:Extension:Translate/Components
Help:Extension:Translate/Installation/it
Help:Extension:Translate/Installation/en

And the second:

Manual:Pywikibot/Global Options
Manual:Pywikibot/Global Options/it
Manual:Pywikibot/Global Options/en
Manual:Pywikibot/login.py/it
Manual:Pywikibot/login.py

This at least makes sense, because my interface language and the original language come before any other translation. So we could generously consider this task fixed (though the broader goal T56832 isn't), but it would be wise to keep track of your finding somewhere, e.g. to decide whether to revise the language weight; and perhaps it's another task to avoid showing the original language twice.

debt subscribed.

Cool - closing this ticket as resolved; I have opened T148207 for further investigation later on.

debt claimed this task.
debt moved this task from Backlog to Done on the Discovery-Analysis (Current work) board.

@TJones do we have any kind of test to verify that this keeps working as expected? Currently I'm unable to verify it's still fixed.

As pointed out by Erik, the number of incoming links can sometimes outweigh language preferences (this is especially true when using special keywords, as they change the scoring formula). I have some hope that we can unify the scoring behaviors with and without keywords in the query.
Without any keywords in the query I can see the language preference in effect.

But when a keyword is present, the effect of the language preference is diminished; this is perhaps why you don't see it in effect.
I hope this will become more consistent once we have refactored how we handle the query in Cirrus.

@Nemo_bis, I hope @dcausse answered your question. The pages related to Manual:Pywikibot/login.py are good examples: the /en and /it pages are currently almost identical, because only the "See also" section has been translated.