feature request: replace forbidden characters with lookalike UTF8 signs in the wikipedia search input control
Closed, Resolved · Public

Assigned To

Authored By

	• bzimport
	May 18 2012, 11:01 AM

Description

Author: michael.manner

Description:
Replace forbidden characters with lookalike UTF-8 signs in the Wikipedia search input field [Alt-F].

Here are some alternatives:
Major:

# → ⧣ (⧣) EQUALS SIGN AND SLANTED PARALLEL (U+29E3) &#10723;

With this replacement it would be possible to find article titles like "C#".

Minor:

< → ‹ (‹) SINGLE LEFT-POINTING ANGLE QUOTATION MARK (U+2039) &#8249;
> → › (›) SINGLE RIGHT-POINTING ANGLE QUOTATION MARK (U+203A) &#8250;
| → ∣ (∣) DIVIDES (U+2223) &#8739;
{ → ❴ (❴) MEDIUM LEFT CURLY BRACKET ORNAMENT (U+2774) &#10100;
} → ❵ (❵) MEDIUM RIGHT CURLY BRACKET ORNAMENT (U+2775) &#10101;

No alternatives found:

[
]

Only CJK characters would be available, but they aren't supported by a large number of fonts.
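
The replacement proposed above could be sketched roughly as follows. This is only an illustration of the mapping in this request, not MediaWiki code: the names `LOOKALIKES` and `replace_forbidden` are hypothetical, and the square brackets are left untouched because no substitute was found for them.

```python
# Hypothetical sketch of the substitution table from this request.
# Each forbidden title character maps to the proposed lookalike code point.
LOOKALIKES = {
    "#": "\u29E3",  # EQUALS SIGN AND SLANTED PARALLEL
    "<": "\u2039",  # SINGLE LEFT-POINTING ANGLE QUOTATION MARK
    ">": "\u203A",  # SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
    "|": "\u2223",  # DIVIDES
    "{": "\u2774",  # MEDIUM LEFT CURLY BRACKET ORNAMENT
    "}": "\u2775",  # MEDIUM RIGHT CURLY BRACKET ORNAMENT
}

# str.maketrans accepts a dict of single characters to replacement strings.
_TABLE = str.maketrans(LOOKALIKES)


def replace_forbidden(query: str) -> str:
    """Return the query with each mapped character swapped for its lookalike."""
    return query.translate(_TABLE)


if __name__ == "__main__":
    print(replace_forbidden("C#"))
```

A search for "C#" would then be rewritten to "C" followed by U+29E3, which only helps if a page or redirect with that exact title exists.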

Version: unspecified
Severity: enhancement

Details

Reference: bz36954

Related Objects

Mentioned Here: T211824: Investigate a “rare-character” index

Event Timeline

• bzimport raised the priority of this task to Low. Nov 22 2014, 12:22 AM

• bzimport added a project: MediaWiki-Redirects.

• bzimport set Reference to bz36954.

• bzimport added a subscriber: Unknown Object (MLST).

• bzimport created this task. May 18 2012, 11:01 AM

This sounds like something that would get in the way of AntiSpoof.

mr.heat wrote:

This bug has been confirmed by popular vote. ***

Krinkle edited projects, added Discovery-ARCHIVED, MediaWiki-Search; removed MediaWiki-Redirects. Jul 31 2017, 9:31 PM

Krinkle removed a subscriber: • wikibugs-l-list.

Restricted Application added a project: Discovery-Search. Jul 31 2017, 9:31 PM

debt moved this task from needs triage to search-icebox on the Discovery-Search board. Aug 3 2017, 9:39 PM

I'm going to close this because it was written before we moved to Elasticsearch. The current behavior of Elasticsearch is the same for both these characters and their proposed replacements: all of them are ignored during tokenization. In general, we have implemented ICU Normalization for English-language projects, so most non-punctuation characters are normalized well.

If the goal is to be able to find these specific characters, see T211824: Investigate a “rare-character” index.
