
Special:Search/intitle: broken in en.wikipedia for non-ASCII characters
Closed, Declined (Public)

Description

Author: Innocenti.Maresin

Description:
http://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=advanced&fulltext=Search&ns0=1&redirs=1&search=intitle: (append one UTF-8 character) does not show redirects if the searched character has a code point above U+007F. en.wikipedia.org/wiki/Special:Search/intitle: does not even show articles (non-redirect pages) for some code points. Apparently the bulk of this bug affects only en.wiki. Examples:
http://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=advanced&search=intitle%3A%C2%85&fulltext=Search&ns0=1&redirs=1&profile=advanced – but U+0085 NEL (non-graphic) redirect exists: http://en.wikipedia.org/wiki/%C2%85
http://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=advanced&search=intitle%3A%C2%A2&fulltext=Search&ns0=1&redirs=1&profile=advanced – but the [[¢]] redirect exists: http://en.wikipedia.org/wiki/%C2%A2

Note: I was unable to locate any wiki where intitle:¢ (U+00A2) returns any results, which suggests that something is broken in other wikis too.

http://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=advanced&search=intitle%3A%CF%B7&fulltext=Search&ns0=1&redirs=1&profile=advanced – but the [[Ϸ]] redirect exists: http://en.wikipedia.org/wiki/%CF%B7
http://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=advanced&search=intitle%3A%D0%81&fulltext=Search&ns0=1&redirs=1&profile=advanced

Note it works in another language: http://ru.wikipedia.org/w/index.php?title=Special%3ASearch&profile=advanced&search=intitle%3A%D0%81&fulltext=Search&ns0=1&redirs=1&profile=advanced&uselang=en

http://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=advanced&search=intitle%3A%D0%86&fulltext=Search&ns0=1&redirs=1&profile=advance

Note it works in another language: http://uk.wikipedia.org/w/index.php?title=Special%3ASearch&profile=advanced&search=intitle%3A%D0%86&fulltext=Search&ns0=1&redirs=1&profile=advanced&uselang=en

http://en.wikipedia.org/wiki/Special:Search/intitle:%E8%A5%BF – but the [[西湖]] dab page exists: http://en.wikipedia.org/wiki/%E8%A5%BF%E6%B9%96

Note it works in another language: http://zh.wikipedia.org/wiki/Special:Search/intitle:%E8%A5%BF
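
The example URLs above all follow the same pattern: the character under test is percent-encoded as UTF-8 and appended to intitle:. A minimal sketch for generating such URLs for any character, assuming Python 3 and only the standard library, is shown here. The function names are hypothetical and chosen for illustration; the parameter list mirrors the advanced-search URLs in this report and is not a complete description of Special:Search.

```python
from urllib.parse import quote, urlencode


def intitle_path_url(char, host="en.wikipedia.org"):
    """Short form: /wiki/Special:Search/intitle:<percent-encoded char>."""
    # quote() encodes the character as UTF-8 and percent-escapes each byte.
    return f"http://{host}/wiki/Special:Search/intitle:{quote(char)}"


def intitle_query_url(char, host="en.wikipedia.org"):
    """Long form: index.php query with redirects (redirs=1) included."""
    params = [
        ("title", "Special:Search"),
        ("profile", "advanced"),
        ("search", f"intitle:{char}"),
        ("fulltext", "Search"),
        ("ns0", "1"),      # main (article) namespace only
        ("redirs", "1"),   # ask the search to list redirects as well
    ]
    # urlencode() percent-escapes ':' as %3A, as in the URLs in this report.
    return f"http://{host}/w/index.php?{urlencode(params)}"


if __name__ == "__main__":
    for ch in ["\u0085", "¢", "Ϸ", "Ё", "І", "西"]:
        print(f"U+{ord(ch):04X}", intitle_path_url(ch))
```

Passing a different host (for example ru.wikipedia.org) gives the comparison URLs for other wikis.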

Version: unspecified
Severity: minor
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=61080

Details

Reference
bz33824

Event Timeline

bzimport raised the priority of this task to Low. (Nov 22 2014, 12:07 AM)
bzimport set Reference to bz33824.

Innocenti.Maresin wrote:

I am not sure I correctly understand what the combination of search=intitle: with redirs=1 is supposed to do, but, as the example with "西" shows, there are numerous characters that cannot be searched for even when they appear directly in an article title, and the problem is not restricted to en.wikipedia.

http://en.wikipedia.org/wiki/Special:Search/intitle:%E2%80%93 – though a huge number of articles with en dash exist in en.wikipedia
http://ru.wikipedia.org/wiki/Special:Search/intitle:%E2%80%94 – though a huge number of articles with em dash exist in ru.wikipedia
http://ru.wikipedia.org/wiki/Special:Search/intitle:%C2%AB – though a huge number of articles with guillemets exist in ru.wikipedia
http://ca.wikipedia.org/wiki/Especial:Cerca/intitle:%C2%B7 – though a large number of articles with interpunct exist in ca.wikipedia
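
To check which character a percent-encoded URL in this report is actually testing, the suffix after intitle: can be decoded back to its code point. The following is a rough sketch using only the Python 3 standard library; the helper name describe is hypothetical, and it assumes the encoded string holds exactly one character.

```python
import unicodedata
from urllib.parse import unquote


def describe(encoded):
    """Return 'U+XXXX NAME' for a single percent-encoded UTF-8 character."""
    char = unquote(encoded)  # decodes the escaped bytes as UTF-8 by default
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    # Control characters such as U+0085 have no name in the Unicode
    # database, so a placeholder is used instead of raising an error.
    name = unicodedata.name(char, "<unnamed>")
    return f"U+{ord(char):04X} {name}"


if __name__ == "__main__":
    for suffix in ["%E2%80%93", "%E2%80%94", "%C2%AB", "%C2%B7", "%C2%85"]:
        print(suffix, "->", describe(suffix))
```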

We're in the process of migrating over from Lucene to CirrusSearch, so I'm marking this bug as RESOLVED WONTFIX.

That said, this bug still seems to exist in CirrusSearch. I've filed an equivalent bug in MediaWiki extensions -> CirrusSearch. Check the See Also on this bug for more information.