can't find results for string ".?*/()44$$$" though it is present in a page
Closed, Resolved (Public)

Description

Author: sumanah

Description:
https://test2.wikipedia.org/wiki/User_talk:Sumanah has the string

.?*/()44$$$

but searching for that string turns up no results, even when searching all namespaces:

https://test2.wikipedia.org/w/index.php?title=Special:Search&search=.%3F*%2F%28%2944%24%24%24&fulltext=Search&profile=all&redirs=1
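To confirm that the URL above really carries the literal string from the page, the percent-encoded `search` parameter can be decoded. This is only a small sketch using Python's standard library; the helper name `search_term` is made up for illustration.

```python
from urllib.parse import parse_qs, urlsplit

# The search URL from the report, split across two literals for readability.
SEARCH_URL = (
    "https://test2.wikipedia.org/w/index.php?title=Special:Search"
    "&search=.%3F*%2F%28%2944%24%24%24&fulltext=Search&profile=all&redirs=1"
)


def search_term(url: str) -> str:
    """Return the decoded value of the `search` query parameter (hypothetical helper)."""
    query = parse_qs(urlsplit(url).query)
    return query["search"][0]


print(search_term(SEARCH_URL))  # .?*/()44$$$
```

So the query reaches the search backend intact; whatever drops the punctuation happens after the request is parsed.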


Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=47770

Details

Reference
bz53013

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 1:56 AM
bzimport added a project: CirrusSearch.
bzimport set Reference to bz53013.
bzimport added a subscriber: Unknown Object (MLST).

Lowering importance for now compared to the other worse problems we're seeing. I'm not sure if we should support searching stuff like this given that we're really designed to search for words. At this point your search is tokenized as just "44". Everything else is thrown away.
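As a rough illustration of why only "44" survives, here is a minimal sketch that assumes a word-oriented tokenizer which keeps runs of letters, digits, and underscores and discards everything else. It is not CirrusSearch's actual analysis chain, and the function name `tokenize_words` is hypothetical.

```python
import re
from typing import List


def tokenize_words(query: str) -> List[str]:
    """Hypothetical word tokenizer: lowercase, keep \\w+ runs, drop punctuation."""
    return re.findall(r"\w+", query.lower())


print(tokenize_words(".?*/()44$$$"))  # ['44']
print(tokenize_words("/dev/null"))    # ['dev', 'null']
```

Under this assumption a query made up only of punctuation produces no tokens at all, and titles like "/dev/null" lose their slashes before matching.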

sumanah wrote:

I understand that we'll be primarily searching for words. But there are English Wikipedia articles that have slashes in their titles, e.g., "/dev/null" and the results in https://en.wikipedia.org/w/index.php?search=%2Fdev%2F&title=Special%3ASearch&fulltext=1 , parentheses, e.g. https://en.wikipedia.org/wiki/%28I_Can%27t_Get_No%29_Satisfaction , question marks, e.g., ?uestlove , asterisks, e.g. *69 , and more.

Also, when someone is looking for technical help on mediawiki.org or in the help pages of the English Wikipedia (for instance, with templates), we will want to be able to help them, even if they are using {} and similar.

So I think we do need to worry about these kinds of characters.

Raising to normal - below being able to search for text that isn't really in the page, but above including unmentioned urls in search.

I'm not sure about exact matches on (I can't get no) satisfaction, but we can now find words delimited in camelCase on mediawiki.org which is an improvement.

I think we've fixed most of the big problems with this. There are a few that remain, mostly the !#$^@%#$%# kinds of searches. We've fixed the /dev/null and (I can't get no) satisfaction searches. Lowering priority.

Considering we're better in most normal cases, I wonder if we'll be able to solve the remainder with the analysis on the non-expanded forms (cf. bug 60487), which should definitely be able to find weird wikitext punctuation.

thuejk wrote:

Searching for just "<" returns

An error has occurred while searching: The search backend returned an error:

Obviously such an empty error message is in itself an error, and quite confusing. Even if you decide to not return pages containing "<" for technical reasons.

(In reply to Thue Janus Kristensen from comment #7)

> Searching for just "<" returns
>
> An error has occurred while searching: The search backend returned an error:
>
> Obviously such an empty error message is in itself an error, and quite
> confusing. Even if you decide to not return pages containing "<" for
> technical reasons.

That is a different bug affecting the old search engine. See bug 66259.

FriedhelmW claimed this task.
FriedhelmW subscribed.

User talk page is found by search.