
in languages where the "User:" space has more than one gender flavor, autocomplete suggests pages from user space even before user types ":"
Open, Low, Public

Description

Autocomplete functionality is supposed to show only pages from article space (namespace 0), at least until one enters the first ":".

In Hebrew, when you type in "משתמ" (the first 4 characters of the localized word for "User"), autocomplete suggests pages in user space, and shows only pages in the user space of users who have set their gender to "Female" in preferences.

IOW, users whose "User:" translates to "משתמשת:" rather than the "standard", which is "משתמש:".


Version: 1.21.x
Severity: normal
See Also:
https://rt.wikimedia.org/Ticket/Display.html?id=2160

Details

Reference
bz31697

Event Timeline

bzimport raised the priority of this task from to High. Nov 21 2014, 11:47 PM
bzimport added a project: MediaWiki-Search.
bzimport set Reference to bz31697.
bzimport added a subscriber: Unknown Object (MLST).

Regression in behavior, marking highest priority.

Hmm, that's not what it's "supposed" to do so much as what it happens to do because of the default implementation. Whether it's a desirable behavior I don't know if we've thought about that.

(In reply to comment #2)

Hmm, that's not what it's "supposed" to do so much as what it happens to do
because of the default implementation. Whether it's a desirable behavior I
don't know if we've thought about that.

it seems reasonable that "autocomplete" should behave the same as "search". search by default is limited to namespace 0, and so should autocomplete, until the user types the first ":".

even if one thinks that typing the first few characters of the localised word for "User" should show pages from the "User:" space, it makes absolutely no sense to bring only pages of female users. IOW - bug.
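
A minimal sketch of the behavior argued for above, in Python rather than MediaWiki's own code. The `split_query` helper and the two-entry namespace table are hypothetical; 2 is MediaWiki's standard ID for the User namespace, and the two Hebrew forms are the ones quoted in the description. The query stays in namespace 0 until a complete, known namespace name followed by ":" has been typed.

```python
# Hypothetical table: both gendered forms map to the User namespace (ID 2).
NAMESPACE_NAMES = {
    "משתמש": 2,   # default form of "User"
    "משתמשת": 2,  # form shown for users whose gender preference is "Female"
}

def split_query(query):
    """Return (namespace_id, remaining_text) for an autocomplete query.

    Namespace 0 is used unless the text before the first ':' is a known
    namespace name, so a partial word such as the first 4 characters of
    "User" never leaves the main namespace.
    """
    prefix, sep, rest = query.partition(":")
    if sep and prefix in NAMESPACE_NAMES:
        return NAMESPACE_NAMES[prefix], rest
    return 0, query
```

With this rule both gendered forms behave identically, which is the point of the complaint: the suggestions should not depend on which form a user page happens to be stored under.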

Does WMF have any PrefixSearchBackend hook users or is it using the default implementation?

The patch on Bug 31602 seems to show that this is caused by HTML5 datalists.

i did not test the patch (i would if i could...), but i do not believe the patch has anything to do with this issue.
the issue has nothing to do with the appearance of the suggestion box, it's about its content (i.e., the data returned by the ajax call).

the problem is clearly in the API and not on the browser side, so r100348 is extremely unlikely to solve the issue.

Taking a peek over this per code review request...

Ok I can reproduce this on https://es.wikipedia.org/ but not on a local trunk installation.

Reverting r100348 doesn't appear to have any effect; typing 'Usuar' in the search box when set to Spanish doesn't show any user pages. Only once I get to 'Usuario:' or 'Usuaria:' or 'User:' do they show up.

It doesn't look like an API or client-end problem; it looks like something on the Lucene search backend's end (or at least the search plugin).

Nothing obvious about gendered-namespace support in MWSearch extension; the logic implementing the aliases might be in the lucene backend...

This may actually be a side-effect of bug 32376.

The XML export data that the search indexer is building from will have "undefined" namespaces like 'Usuaria:' (etc) in the export, which will be unrecognized and end up getting interpreted as ns 0.

This is why on es.wikipedia.org search for 'Usuar' turns up things in 'Usuaria:' and 'Usuario Discusión:' (different from the stock 'Usuario' and 'Usuario discusión') but not 'Usuario:' since they do get interpreted correctly.
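
A minimal sketch of the failure mode described above, in Python rather than the actual indexer code. The namespace table, the `parse_title` and `prefix_search` helpers, and the sample titles are hypothetical; 2 and 3 are MediaWiki's standard User and User talk IDs. The lookup is deliberately exact, so that 'Usuario Discusión' (capital D) is not recognized, matching the observation above.

```python
# Hypothetical view of the indexer: only the stock names are known.
KNOWN_NAMESPACES = {
    "Usuario": 2,
    "Usuario discusión": 3,
}

def parse_title(full_title):
    """Split 'Prefix:Rest' into (namespace_id, text).

    An unrecognized prefix is not stripped, so the whole string ends up
    as a namespace 0 title.
    """
    prefix, sep, rest = full_title.partition(":")
    if sep and prefix in KNOWN_NAMESPACES:
        return KNOWN_NAMESPACES[prefix], rest
    return 0, full_title

def prefix_search(index, query, namespace=0):
    """Case-insensitive prefix match restricted to one namespace."""
    q = query.lower()
    return [text for ns, text in index
            if ns == namespace and text.lower().startswith(q)]

titles = [
    "Usuario:Ejemplo",
    "Usuaria:Ejemplo",
    "Usuario Discusión:Ejemplo",
    "Usuarios",
]
index = [parse_title(t) for t in titles]
results = prefix_search(index, "Usuar")
print(results)
```

In this sketch 'Usuario:Ejemplo' is filed under namespace 2 and stays out of the main-namespace results, while the two unrecognized forms are returned alongside the genuine article 'Usuarios'.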

r103945 switches the export to canonical form titles for bug 32376; after search index updating this should clear this problem up.
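
A rough sketch of what exporting canonical form titles could look like, in Python and not taken from r103945. The alias table and the `canonical_title` helper are hypothetical, and the thread does not show which names the revision treats as canonical; the sketch assumes the stock local names 'Usuario' and 'Usuario discusión'.

```python
# Assumed canonical (stock) names per namespace ID.
CANONICAL = {2: "Usuario", 3: "Usuario discusión"}

# Hypothetical alias table, keyed by lowercased prefix, covering the
# gendered and differently capitalized forms.
ALIASES = {
    "usuario": 2,
    "usuaria": 2,
    "usuario discusión": 3,
    "usuaria discusión": 3,
}

def canonical_title(full_title):
    """Rewrite a known namespace alias to its canonical name.

    Titles without a recognized prefix are returned unchanged.
    """
    prefix, sep, rest = full_title.partition(":")
    ns = ALIASES.get(prefix.lower()) if sep else None
    if ns is None:
        return full_title
    return f"{CANONICAL[ns]}:{rest}"
```

Once every exported title carries a prefix the indexer recognizes, none of them should fall through to namespace 0, though already indexed entries stay wrong until the index is rebuilt.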

Needs merge & deployment to 1.18...

Merged to REL1_18 in r103953, 1.18wmf1 in r103954.

Roan's pushing the fix live; it may not fully update until a search index rebuild is triggered, not sure when those happen.

from IRC:

[12:27] <RoanKattouw> The reindexing occurs between 06:00 and 06:30 UTC I think

so hopefully it'll be clearer by tomorrow. :)

Testing on eswiki, I type "usuario" and see the expected "usario" and also "usarios". I do not see any "usuario:USERNAME"

Unexpectedly, I do see "Usuario Discusión:USERNAME".

Also unexpected, when I type "usuario:", I don't see any completions except "Usuario:! DanSkammelsrod !".

When I type "usario:u" I get more completions of usernames starting with "u".

When I do this on enwiki, though, I get similar results.

Hrm... but testing "usaria", I see a completion for "Usuaria:Miss Manzana/Retiro de nominación" so I don't think this is fixed yet.

If I type "usaria:", though, I do see completions including "usario:" so maybe that is the "canonical form" that brion talks about.

Looks like there are still bad entries (some 'Usuaria:...' entries come up prefix-searching for 'usuari' on es.wikipedia.org).

Adding rainman so he can take a peek at this though there is some
action on https://rt.wikimedia.org/Ticket/Display.html?id=2160 so
maybe who knows.

Copying this comment from RT:

Um... now eswiki is showing some REALLY funky results.

Before:

When I type "usario:u" I get more completions of usernames starting with "u".

Now I don't see user page results.

Before:

Testing on eswiki, I type "usuario" and see the expected "usario" and also "usarios". I do not see any "usuario:USERNAME"

Now I do not see that. Instead, the only completion I see is Usario:Leyón/Sobre mí (plus a lot of redirects for the same user).

Before:

When I type "usario:u" I get more completions of usernames starting with "u".

Now, no user pages are given. (On enwiki, "user:u" gives usernames starting with "u").

Maybe I checked too soon? But it looks worse now.

Peter, I've been told on IRC that you can look into this.

Assigning to Peter Youngmeister, as Amir indicated that this should be resolved by ops, and Peter allegedly has a solution (no details known).

For our information: The ops ticket has seen no changes since 2012-01-12 (That's January 12, 2012 for everyone who writes dates in a funny way).

Entering "Usuario" in the search bar on es.wikipedia.org still lists quite a few "Usuario Discusión:" links, so this bug is still valid.

Peter: ping - any updates on this?

I wonder if the new prefix stuff in ElasticSearch will have an effect on this nasty bug...

For the record, RT #2160 is closed "as wontfix because lsearchd is end of life and Cirrus is replacing it in the next few months".

Aklapper lowered the priority of this task from High to Low. Nov 16 2017, 10:34 AM