Search for nonexistent page with quote mark in title doesn't show "create page link" on he.ws
Closed, Resolved, Public

Description

Author: inkbug200

Description:

  1. Go to Hebrew Wikisource
  2. Search for אילת השחר (מלבי"ם)/פרק נ
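  3. One is told ("No results matching the search were found."):
     > לא נמצאו תוצאות המתאימות לחיפוש.
     Instead of (the same message, followed by "There is no page named "אילת השחר (מלבי"ם)/פרק נ" on the Hebrew Wikisource. To create a new page with this name, click the red link (linked pages)."):
     > לא נמצאו תוצאות המתאימות לחיפוש.
     > אין בוויקיטקסט העברי דף בשם "אילת השחר (מלבי"ם)/פרק נ". ליצירת דף חדש בשם זה, לחצו על הקישור האדום (דפים מקושרים).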

Version: master
Severity: normal
URL: https://he.wikisource.org/w/index.php?search=%D7%90%D7%99%D7%9C%D7%AA+%D7%94%D7%A9%D7%97%D7%A8+%28%D7%9E%D7%9C%D7%91%D7%99%22%D7%9D%29%2F%D7%A4%D7%A8%D7%A7+%D7%A0

Details

Reference
bz64350

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 3:16 AM
bzimport added a project: CirrusSearch.
bzimport set Reference to bz64350.
bzimport added a subscriber: Unknown Object (MLST).

Confirming, also happens for above link with &srbackend=CirrusSearch

The bug only exists for double quotes ("); there is no bug for articles with a single quote (') in the title.

Switching to Cirrus and raising priority because this betrays a worse problem: in full-text search, Cirrus always interprets quotation marks as search syntax. It looks like in Hebrew they could be a [[Gershayim]]. Cirrus will see one as an unbalanced quote, helpfully balance it for the user, and then interpret the result as a phrase query. So full-text search with them will be broken.
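To make the failure mode concrete, here is a toy model of the behavior described above. It is not CirrusSearch code: the function names are made up for illustration, and the assumption that balancing means appending a closing quote at the end of the query is mine.

```python
def balance_quotes(query: str) -> str:
    """Toy model: if the query has an odd number of double quotes,
    append one at the end (assumed balancing strategy)."""
    if query.count('"') % 2 == 1:
        return query + '"'
    return query


def phrase_segments(query: str) -> list[str]:
    """Return the pieces of the balanced query that sit between
    an opening and a closing quote, i.e. the would-be phrase queries."""
    parts = balance_quotes(query).split('"')
    # After splitting on the quote character, the odd-indexed parts
    # are the ones enclosed by a pair of quotes.
    return parts[1::2]


if __name__ == "__main__":
    # A single quote in the middle of a word turns the rest of the
    # query into one phrase.
    print(phrase_segments('abc"d)/e f'))
```

Under this model, a lone gershayim inside an acronym turns everything after it into a single phrase, which is why the rest of the title would stop matching as ordinary words.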

So where do we go from here? Is "quoting some words" normal in Hebrew to represent a phrase? If so, we should keep that syntax. If not, then we can switch to whatever is normal for phrases and not use quotes at all for phrases in Hebrew.

inkbug200 wrote:

Quotation marks (מרכאות, “”) are used in Hebrew for phrases and quotes. Gershayim (גרשיים, ״) is used in Hebrew for acronyms.

On a regular Hebrew keyboard, none of these characters are found, and instead the typewriter double quote (") is used for both (this is also the practice on the Hebrew Wikisource).

However, it might be possible to differentiate between phrases and acronyms as follows:

  1. If the " character has only one character after it (מלבי"ם), then it should be considered to be a Gershayim, and the word should be dealt with as an acronym.
  2. Otherwise, it is a quotation mark, and it should be dealt with as a phrase.

This is not foolproof, but it should deal with most of the problems.
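The two rules above could be sketched roughly as follows. This is only an illustration, not a proposed patch: the function name is hypothetical, and I am assuming that "only one character after it" means exactly one letter between the " and the end of the word, with any non-letter (space, bracket, slash, end of input) ending the word.

```python
QUOTE = '"'


def classify_quotes(query: str) -> list[tuple[int, str]]:
    """Label every typewriter double quote in the query as either
    'gershayim' (acronym mark) or 'quotation' (phrase delimiter)."""
    labels = []
    for i, ch in enumerate(query):
        if ch != QUOTE:
            continue
        # Count the letters that follow the quote before the word ends.
        # str.isalpha() is True for Hebrew letters as well as Latin ones.
        j = i + 1
        while j < len(query) and query[j].isalpha():
            j += 1
        letters_after = j - (i + 1)
        # Rule 1: exactly one letter after the quote -> gershayim.
        # Rule 2: anything else -> quotation mark.
        kind = "gershayim" if letters_after == 1 else "quotation"
        labels.append((i, kind))
    return labels


if __name__ == "__main__":
    for sample in ['מלבי"ם', '"quoting some words"']:
        print(sample, [kind for _, kind in classify_quotes(sample)])
```

A known gap in this sketch: a phrase that starts with a one-letter word, such as "a b", has its opening quote classified as a gershayim. Also requiring a letter immediately before the quote would close that gap, but that goes beyond the rule as stated.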