Page MenuHomePhabricator

Narayam does not work properly on Chrome browser
Closed, ResolvedPublic

Description

Author: junu.pv+public

Description:
(Reported by a Sankrit wikimedian personally to me)

Narayam does not work properly on Chrome browser.

On debugging I found that it is what text replacement not doing its job. I am not sure but I think it might have been caused by some change in jquery.textSelection.js (??)


Version: unspecified
Severity: critical

Details

Reference
bz30130

Event Timeline

I can see some apparently incorrect behavior...

'ma me mi mo mu' on Firefox 5.0.1 gives
'म मॆ मि मॊ मु' which looks right to me.

on Chrome 12.0.742.122 it gives:
'म म्ॆ म्ि म्ॊ म्ु' which definitely shows a couple things in wrong order or such.

Safari 5.1 gives the same as Chrome 12.

However, behavior seems the same at least back to r90000, so this would be something that's been there a little while. Did it used to work correctly, is there a point we can compare from?

junu.pv+public wrote:

(In reply to comment #1)

I can see some apparently incorrect behavior...

'ma me mi mo mu' on Firefox 5.0.1 gives
'म मॆ मि मॊ मु' which looks right to me.

on Chrome 12.0.742.122 it gives:
'म म्ॆ म्ि म्ॊ म्ु' which definitely shows a couple things in wrong order or
such.

Safari 5.1 gives the same as Chrome 12.

However, behavior seems the same at least back to r90000, so this would be
something that's been there a little while. Did it used to work correctly, is
there a point we can compare from?

I put a notification regarding this on my network and no replies got, no one seems to be typing while they are in Chrome!

You are right, me too have checked it with very earlier revisions and not found working.

So I think it might be not working on Chrome once it has been rewritten to use jQuery.

A sample patch with text replacement part working on Chrome (still caret setting not working so cannot type within text): https://gist.github.com/1117113

Can it be confirmed that the code from jquery.textSelection.js working properly on Chrome too? (used in code starting from line 226 to 235: http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/Narayam/ext.narayam.core.js?revision=93572&view=markup#l226)

Ok, here's what it looks like is happening:

  • user types "m"
    • select (0, 0] (empty selection)
    • insert "म्" (two characters: "m" and no-vowel placeholder)
  • user types "i"
    • select (1, 2] (select the second char: no-vowel placeholder)
    • insert "ि" ("i" combining character, replacing the placeholder)

In Firefox, this works fine.

But in Chrome & Safari, the setSelection call to (1, 2] fails -- the resulting selection is set to (2, 2], so we end up inserting the new character after the character we meant to replace.

I think the system is trying to outsmart us and isn't letting us select a combining character by itself... which is pretty inconvenient since that's explicitly what we're trying to do. :P

Make that [0, 0), [1, 2), and [2, 2). Set notation. bah!

Manually selecting [0, 1) ends up selecting [0, 2) so it's definitely trying to select whole groups of base+combining characters.

HTML 5 spec doesn't say anything about this, though it does explicitly say that invisible characters need to be selectable: http://dev.w3.org/html5/spec/association-of-controls-and-forms.html#textFieldSelection

Created attachment 8869
Provisional testing patch

It looks like if we select & replace the entire base+combining character sequence it should work. Narayam itself doesn't distinguish between base & combining characters and only deals with character sequences... but this hack seems to sorta maybe work?

This quick cruddy patch adds a list of all combining characters, and when finding the place to start out selection with firstDivergence() I added a hack to have it jump back to the last non-combining character.

Very lightly tested, and includes debug code etc.

Attached:

Created attachment 8870
quickhack.php -- extracts combining chars list from UnicodeData.txt

used to generate the list i dropped into the testing patch

Attached:

Yeah, looks like that's probably the same bug.

junu.pv+public wrote:

@brion,

I personally keep a standalone version of this tool that is independent of any external js library (usually jQuery as we do here). It implements its own text selection, replacement and caret positioning functions (Most of them are collected from the web :D ).

You can finds its code here: http://code.google.com/p/naaraayam/source/browse/#svn%2Ftrunk

Just now, I created a test page at: http://toolserver.org/~junaid/transliteration/test.html
utilising this standalone tool. You may test it.

The tool is working fine for the case 'ma me mi mo mu' = 'म मॆ मि मॊ मु'. So it is sure we can fix the issue.

It is true that webkit does not allow we to place the caret within a ligature(?) combination, even programatically. So even using my standalone tool we cannot change 'म' to 'कम' by simply placing caret before म and typing 'ka'. Because, when we type 'k' just before to existing म it will form a ligature 'क्म', then webkit will outsmart (as you said) by jumping caret at the end of 'क्म'. But, we can overcome this issue by overwriting entire word.

So current situation can be improved either by improving jquery.textSelection.js or using code from my standalone version of the tool.

Another reason to like Firefox.

I don't see code in your version to overwrite entire words explicitly; offhand it seems to have the exact same firstDivergence() func etc. How/where exactly do you find the word boundary?

junu.pv+public wrote:

I didn't say the code overwrite entire word :(

The magic is happening in replaceString(). The function splits the string in the field to three parts (not considering middle part), then recombines it it back using replacement (variable named newString) as middle part. Then adjust the caret to new position. So it does not fail in replacement.

(variable names in my code may confuse you, sorry)

Hmmm... that seems to work by replacing the *entire text* of the input area. Won't this be fairly inefficient for interactive typing on article-sized textareas?

A quick test copy-pasting about 170k worth of text from Sanskrit Wikipedia into the textarea on your test page confirms that it's vvvveeeerrrrryyyyy sssslllloooowwww to type in in Chrome 12. In Firefox 5.0.1 it's not as slow but causes lots of visual artifacts -- flicker as the text redraws, and the scrolling jumps around.

I suspect that's why this code was replaced with the $.textSelection() calls in the first place...

Actually the current code is still pretty slow in Chrome 12 with a large input text -- not sure if it's the replacement or if it's something else like the regexes.

Urgh, unfortunately the set-entire-huge-text-value seems to be the only portable API for replacing the selection contents. Nice. $.textSelection at least is handling the scrolling correctly.

junu.pv+public wrote:

That is a nice idea. Hope people will edit long article by sections.

If I remember correctly, scrolling problem only appears in Firefox on windows platforms. isn't it?

Created attachment 8953
correct the selectionStart postition problem with chrome.

Since we know the correct selectionStart and selectionEnd, we can check the values returned by the browser by taking a difference. If start-end is not same as what we passed and we get from browser, that means there is a problem. In that case the selectionStart value should be the value we know and not the browser passed value.

I modified the jquery.textSelection.js encapsulateSelection method to accept two more optional arguments- selectionStart and selectionEnd. And avoided calling setSelection method from ext.narayam.core.js. The encapsulateSelection method will check for those extra options and will set the selection. Then compare the range values returned by browser and what we passed as options. If there is a mismatch , correct the startPos and proceed.

A quick test showed that it is working. I hope I did not break anything.

Attached:

Created attachment 8954
Use encapsulateSelection with selection range

Pass extra options to encapsulateSelection and avoid calling setSelection explicitly.

Attached:

Well, looks like we need to correct the caret positions also, since it use the wrong selectionStart.

To Santhosh per IRC triage. Implement a Webkit only work-around for https://bugs.webkit.org/show_bug.cgi?id=66630.

junu.pv+public wrote:

Applied a local workaround within extension: r96415. And the bug can be considered fixed and closed.

Please reopen if needed.

Committed Santosh's core patch in r96579. Will review and apply the Narayam patch after I eat.

(In reply to comment #22)

Committed Santosh's core patch in r96579. Will review and apply the Narayam
patch after I eat.

Oops, that patch does not work well as I mentioned in comment 18. r96415 alone fixes the bug. Please revert the patch from me.

(In reply to comment #23)

(In reply to comment #22)

Committed Santosh's core patch in r96579. Will review and apply the Narayam
patch after I eat.

Oops, that patch does not work well as I mentioned in comment 18. r96415 alone
fixes the bug. Please revert the patch from me.

I didn't apply your Narayam patch yet, and adding the selectionStart and selectionEnd params to jquery.textSelection (which is all I did in r96579) doesn't break anything in and of itself.

r96415 fixes it, but in a hacky and slow way. What is the caret position problem exactly?

Background on the selection bug:

Unicode text may include sequences composed of a base character followed by one or more combining characters:

[lowercase latin letter "a"] [combining acute accent] [combining underdot]

These are distinct, separately addressable characters, but will render together as a unit. This is extremely common for Indic scripts, which apply multiple vowel-indicating & other diacritics onto a consonant base character.

In Webkit, if we set a textarea's selection start or end point in the middle of a combining sequence, the selection point is silently moved to the boundary of the sequence. We can either select the entire sequence, or none of it.

Thus if we thought we had selected the [combining acute accent] and [combining underdot] we may find that our selection is actually empty, as the start point is moved down to the end of the sequence.

When we go to replace the text using jquery.textSelection's insertion goodie, it substrings out the pieces using the new, modified selection -- which means we've trimmed out a different substring than the one we had specified coordinates for, and inserted a new substring that doesn't fit properly.

One way to work around this is to check the new caret position, so we can adjust our replacement string to fit the actual insertion point.

Another is to do the string replacement directly and not use the selection at all... except on IE, it looks like jquery.textSelection's replacement actually works by taking the entire textarea contents as a string, substringing some pieces, and replacing the entire string anyway -- so it's only going awry because the selection position gets moved from what we asked for.

(There is surprisingly no standard DOM-ish way to say "replace the selected text with this value, without changing the other megabyte of text in this textarea"!)

junu.pv+public wrote:

(In reply to comment #24)

r96415 fixes it, but in a hacky and slow way. What is the caret position
problem exactly?

Example:

Consider "म्", it is a combined character having two individuals. If we try to select range (1, 2), webkit will end up returning selection (2, 2).

So we will be unable to transform, for example, "म्" to "मु" by doing replacement in the range (1, 2) with "ु", result will be "म्ु".

(In reply to comment #25)

Another is to do the string replacement directly and not use the selection at
all... except on IE, it looks like jquery.textSelection's replacement actually
works by taking the entire textarea contents as a string, substringing some
pieces, and replacing the entire string anyway -- so it's only going awry
because the selection position gets moved from what we asked for.

You seem to have this backwards, but your wording is ambiguous. Substringing and replacing is what we do in *non*-IE browsers. IE actually has a way to replace just the selection.

(There is surprisingly no standard DOM-ish way to say "replace the selected
text with this value, without changing the other megabyte of text in this
textarea"!)

Yes, this is extremely annoying. There is a standard DOM way to replace selected *elements* (which of course doesn't work in IE but never mind that), but not to replace selections *entirely within* an element.

(In reply to comment #26)

(In reply to comment #24)

r96415 fixes it, but in a hacky and slow way. What is the caret position
problem exactly?

Example:

Consider "म्", it is a combined character having two individuals. If we try to
select range (1, 2), webkit will end up returning selection (2, 2).

So we will be unable to transform, for example, "म्" to "मु" by doing
replacement in the range (1, 2) with "ु", result will be "म्ु".

Yes, I understand why WebKit setting the selection wrong is a problem. What I don't understand is how the patches you attached fail to address the bug: what parts of the bug do they not address?

(In reply to comment #24)

I didn't apply your Narayam patch yet, and adding the selectionStart and
selectionEnd params to jquery.textSelection (which is all I did in r96579)
doesn't break anything in and of itself.

The patch tries to correct the selection by correcting startPos using the passed selectionStart when it detects a problem. It does not break anything. But it is insufficient to fix the bug. The patch works fine as long as we type at the end of current word. But when we try to edit some letters inside a word, it goes wrong. eg: type കചടതപ, try inserting പ്ര just after ച. expected is കചപ്രടതപ , but what we get is കചപ്ടരതപ . This is what I meant by wrong caret positioning in the comment 18. Sorry I did not explain it with example. So my patch to the jquery.textSelection.js and ext.narayama.core.js is incomplete and does not solve the problem completely. So please don't apply that patch.

r96415 fixes it, but in a hacky and slow way. What is the caret position
problem exactly?

For now we can go ahead with r96415, it is conditional for webkit browsers alone and not good solution. But to solve that either I need to complete my patch to jquery.textSelection.js(I can try that once I get some free time) or webkit bug need to be fixed.

I hope it is clear now :).

(In reply to comment #28)

The patch tries to correct the selection by correcting startPos using the
passed selectionStart when it detects a problem. It does not break anything.
But it is insufficient to fix the bug. The patch works fine as long as we type
at the end of current word. But when we try to edit some letters inside a word,
it goes wrong. eg: type കചടതപ, try inserting പ്ര just after ച. expected is
കചപ്രടതപ , but what we get is കചപ്ടരതപ . This is what I meant by wrong caret
positioning in the comment 18.

So what you're saying is that, if selectionStart is not passed, textSelection will have to obtain it from getCaretPosition(), and that value will be wrong too?

(In reply to comment #29)

So what you're saying is that, if selectionStart is not passed, textSelection
will have to obtain it from getCaretPosition(), and that value will be wrong
too?

Yes, it will be wrong. I will be wrong only when we are at a combining character.- VIRAMA in case of Indic text. It will jump to the end of glyph cluster for webkit browser.