
Typing ओं is not possible in Hindi transliteration
Open, Medium, Public

Description

Typing ओं is currently not possible in Hindi transliteration due to a rule conflict.

Typing ओं requires the input oM. However, because of the rule

['ओM', '', 'ॐ']

this combination is reserved for ॐ, which makes writing ओं impossible.
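
To illustrate the conflict, here is a minimal sketch of a rule-based transliterator. It is a simplified model, not the input method's actual code: the helper names `applyRules` and `typeKeys`, the two extra rules for `M` and `o`, and the decision to ignore the middle (context) field are all assumptions made for illustration. Only the rule ['ओM', '', 'ॐ'] is taken from this report.

```javascript
// Simplified model (assumption): a rule is [pattern, context, replacement].
// The pattern is matched against the END of "text typed so far + new key".
// The context field is ignored in this sketch.
function applyRules(rules, text, key) {
  const input = text + key;
  for (const [pattern, _context, replacement] of rules) {
    const re = new RegExp(pattern + '$');
    if (re.test(input)) {
      // The first matching rule wins.
      return input.replace(re, replacement);
    }
  }
  return input; // no rule matched: keep the raw key
}

// Feed keys one at a time, as a user would type them.
function typeKeys(rules, keys) {
  let text = '';
  for (const key of keys) {
    text = applyRules(rules, text, key);
  }
  return text;
}

// Hypothetical subset of a rule table; only the first rule is from the report.
const withOmRule = [
  ['ओM', '', 'ॐ'],
  ['M', '', 'ं'],
  ['o', '', 'ओ'],
];
const withoutOmRule = withOmRule.slice(1);

console.log(typeKeys(withOmRule, 'oM'));    // ॐ
console.log(typeKeys(withoutOmRule, 'oM')); // ओं
```

In this model, typing `o` first produces ओ, so the following `M` sees ओM at the end of the text and the ॐ rule fires before the plain anusvara rule gets a chance.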

This is an unfortunate side-effect, but needs to be resolved somehow, and quickly.

I say quickly because several words in Hindi, when used in plural form in a sentence, end in ओं. Examples: भाषाओं, चिताओं, कलाओं, घटाओं etc.

I don't know why this bug wasn't noticed earlier, but it needs to be resolved somehow ASAP.


Version: unspecified
Severity: normal

Details

Reference
bz38238

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 1:00 AM
bzimport set Reference to bz38238.
bzimport added a subscriber: Unknown Object (MLST).

Do you have any key combination to suggest?

A few questions:

  1. Is there any other transliteration standard in which this is solved already?
  2. What is used more frequently: the OM (ॐ) character or the simple syllable ओं?
  3. Shantanoo - is this fix needed in Marathi, too?

ओं is used much more frequently than ॐ

One possible solution is to change the input for ॐ to auM. This is slightly harder than oM but has been used in other transliteration schemes. It was used in ITRANS as AUM. [1]

[1] http://en.wikipedia.org/wiki/ITRANS

A patch changing ॐ to auM and making oM type ओं was submitted here:
https://gerrit.wikimedia.org/r/#/c/15846/

It can be tested here:
http://sandbox.translatewiki.net/wiki/Main_Page?uselang=hi

This patch makes a significant change in the current behavior, so before deployment some community consensus should be demonstrated, for example in Village pumps of Hindi projects.

Also, as I said earlier, this change may be needed in Marathi, too. It can be deployed to the Hindi projects before the change for Marathi is made, however.

(In reply to comment #2)

A few questions:

  1. Is there any other transliteration standard in which this is solved already?
  2. What is used more frequently: the OM (ॐ) character or the simple syllable ओं?
  3. Shantanoo - is this fix needed in Marathi, too?

Yes. It should also be fixed for Marathi, e.g. 'Onkar' = 'ओंकार'
(https://www.google.co.in/search?q=onkar+marathi)

Also, when the name is from another language, one needs 'ओं'. E.g. 'Ontario, Canada'.

I added support for Marathi, too.

(In reply to comment #4)

This patch makes a significant change in the current behavior, so before
deployment some community consensus should be demonstrated, for example in
Village pumps of Hindi projects.

Good idea about discussion for consensus. Since this basically makes writing औं impossible instead of ओं, it's probably better to ask the community (to check in case something is written with औं too). I have raised this question and asked for consensus at the village pump at hi-wp: http://hi.wikipedia.org/wiki/%E0%A4%B5%E0%A4%BF%E0%A4%95%E0%A4%BF%E0%A4%AA%E0%A5%80%E0%A4%A1%E0%A4%BF%E0%A4%AF%E0%A4%BE:%E0%A4%9A%E0%A5%8C%E0%A4%AA%E0%A4%BE%E0%A4%B2#.E0.A4.A8.E0.A4.BE.E0.A4.B0.E0.A4.BE.E0.A4.AF.E0.A4.AE_.E0.A4.AE.E0.A5.87.E0.A4.82_.E0.A5.90_.E0.A4.95.E0.A5.87_.E0.A4.B2.E0.A4.BF.E0.A4.AF.E0.A5.87_.E0.A4.87.E0.A4.A8.E0.A4.AA.E0.A5.81.E0.A4.9F_.E0.A4.AE.E0.A5.87.E0.A4.82_.E0.A4.AA.E0.A4.B0.E0.A4.BF.E0.A4.B5.E0.A4.B0.E0.A5.8D.E0.A4.A4.E0.A4.A8

(In reply to comment #7)


The unit of weight 'ounce' is written as 'औंस', so support for that may also be required.

Hmm. If "auM" produces ॐ, then औंस can't be written.

Maybe make it "AUM"?

Does ॐ always appear as a separate word?
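
The clash between the proposed auM mapping and औंस can be sketched with a toy rule table. Everything here is an assumption for illustration: the helper names, the rule order, and the individual rules (including the virama handling for `s` and `a`) are hypothetical and may not match the real Hindi rule set. Only the conflict itself comes from the discussion above.

```javascript
// Simplified model (assumption): [pattern, context, replacement], with the
// pattern matched against the end of "text so far + new key". Context ignored.
function applyRules(rules, text, key) {
  const input = text + key;
  for (const [pattern, _context, replacement] of rules) {
    const re = new RegExp(pattern + '$');
    if (re.test(input)) {
      return input.replace(re, replacement);
    }
  }
  return input;
}

function typeKeys(rules, keys) {
  let text = '';
  for (const key of keys) {
    text = applyRules(rules, text, key);
  }
  return text;
}

// Hypothetical base rules; \u094D is the virama (halant).
const baseRules = [
  ['अu', '', 'औ'],        // a + u combine into the vowel AU
  ['\u094Da', '', ''],    // consonant + virama, then "a": drop the virama
  ['M', '', 'ं'],          // anusvara
  ['a', '', 'अ'],
  ['s', '', 'स\u094D'],   // bare consonant carries a virama until a vowel follows
];

// The proposal under discussion: reserve auM for the OM sign.
const proposedRules = [['औM', '', 'ॐ'], ...baseRules];

console.log(typeKeys(baseRules, 'auMsa'));     // औंस
console.log(typeKeys(proposedRules, 'auMsa')); // ॐस
```

With the reserved rule in place, the third keystroke already turns औM into ॐ, so the word औंस can no longer be reached from the natural key sequence.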

(In reply to comment #9)

Hmm. If "auM" produces ॐ, then औंस can't be written.

Maybe make it "AUM"?

+1 for AUM.

Does ॐ always appear as a separate word?

ॐकार, ॐकारेश्वर are valid words.

  1. Just to make sure: ॐकार and ओंकार are both valid?
  2. I can easily make auMsa->औंस and AUM->ॐ work, but will this finally cover all the cases? Is anybody familiar with a comprehensive standard on which I would be able to base our mapping? Is ITRANS comprehensive, for example?

(In reply to comment #11)

  1. Just to make sure: ॐकार and ओंकार are both valid?

Yes. I know 2 different people 'Omkar' and 'Onkar' :). Either way, we still have 'Ontario' which is ओंटारीओ(?).

(In reply to comment #8)

'Ounce' unit of weight is written as 'औंस'. Maybe, that support is also
required.

I missed one important example for औं:
http://mr.wikipedia.org/wiki/%E0%A4%94%E0%A4%82%E0%A4%A7

(In reply to comment #11)

  1. Just to make sure: ॐकार and ओंकार are both valid?
  2. I can easily make auMsa->औंस and AUM->ॐ work, but will this finally cover all the cases? Is anybody familiar with a comprehensive standard on which I would be able to base our mapping? Is ITRANS comprehensive, for example?

As per the discussion on hi-wp village pump, the following words use औं:
औंधा
औंगारी सूर्य मंदिर [1]
औंराडीह गाँव, गुरुआ (गया) [2]
औंकोलोजी (oncology)

Clearly, औं is also used in Hindi, and the submitted patch [3] won't solve the problem.

AUM won't solve the problem either, since that sequence is already used for आऊं, which is itself a popular (mis)spelling of the word आऊँ and is needed to write the word आऊंगा (correct spelling: आऊँगा).

Although I want to have an easy input method for inputting ॐ, I am wondering how much it is actually used in regular text, and if a direct input for it is needed (or it can be handled in the editing tools shown below the edit-window). I'll be asking the same at hi-wp discussion [4]. Shantanoo, is it used regularly in Marathi or rarely?

[1] http://hi.wikipedia.org/wiki/%E0%A4%94%E0%A4%82%E0%A4%97%E0%A4%BE%E0%A4%B0%E0%A5%80_%E0%A4%B8%E0%A5%82%E0%A4%B0%E0%A5%8D%E0%A4%AF_%E0%A4%AE%E0%A4%82%E0%A4%A6%E0%A4%BF%E0%A4%B0
[2] http://hi.wikipedia.org/wiki/%E0%A4%94%E0%A4%82%E0%A4%B0%E0%A4%BE%E0%A4%A1%E0%A5%80%E0%A4%B9_%E0%A4%97%E0%A4%BE%E0%A4%81%E0%A4%B5,_%E0%A4%97%E0%A5%81%E0%A4%B0%E0%A5%81%E0%A4%86_(%E0%A4%97%E0%A4%AF%E0%A4%BE)
[3] https://gerrit.wikimedia.org/r/#/c/15846/
[4] http://hi.wikipedia.org/wiki/%E0%A4%B5%E0%A4%BF%E0%A4%95%E0%A4%BF%E0%A4%AA%E0%A5%80%E0%A4%A1%E0%A4%BF%E0%A4%AF%E0%A4%BE:%E0%A4%9A%E0%A5%8C%E0%A4%AA%E0%A4%BE%E0%A4%B2#.E0.A4.A8.E0.A4.BE.E0.A4.B0.E0.A4.BE.E0.A4.AF.E0.A4.AE_.E0.A4.AE.E0.A5.87.E0.A4.82_.E0.A5.90_.E0.A4.95.E0.A5.87_.E0.A4.B2.E0.A4.BF.E0.A4.AF.E0.A5.87_.E0.A4.87.E0.A4.A8.E0.A4.AA.E0.A5.81.E0.A4.9F_.E0.A4.AE.E0.A5.87.E0.A4.82_.E0.A4.AA.E0.A4.B0.E0.A4.BF.E0.A4.B5.E0.A4.B0.E0.A5.8D.E0.A4.A4.E0.A4.A8

(In reply to comment #14)

Although I want to have an easy input method for inputting ॐ, I am wondering
how much it is actually used in regular text, and if a direct input for it is
needed (or it can be handled in the editing tools shown below the edit-window).
I'll be asking the same at hi-wp discussion [4]. Shantanoo, is it used
regularly in Marathi or rarely?

It is used rarely.
IMO, putting it in the editing tools should be fine. But I am not sure how one could use it in the search box. There is no editing toolbox for entering text in the search box.

How about adding '_' as a prefix and suffix (or any other prefix and/or suffix combination) for rarely used but important characters?

e.g. _aum_ or _#aum or __aum or aum## or (aum) or any other combination.

or as suggested on hiwiki,
'_M' will be 'ं' and should not be combined with other sequence. (Instead of '_' some other character may be used. _italics_ )
This can be extended to 'अे' (a_e), 'अै' (a_ai).
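
A rough sketch of how a delimiter-based sequence such as _auM_ could coexist with plain auM and oM. This uses a simplified model, and the helper names and rule table are assumptions for illustration, not an actual rule set. Note that in this model the rule has to be written against the already transliterated text (_औं_), because the inner keys are converted before the closing underscore arrives.

```javascript
// Simplified model (assumption): [pattern, context, replacement], with the
// pattern matched against the end of "text so far + new key". Context ignored.
function applyRules(rules, text, key) {
  const input = text + key;
  for (const [pattern, _context, replacement] of rules) {
    const re = new RegExp(pattern + '$');
    if (re.test(input)) {
      return input.replace(re, replacement);
    }
  }
  return input;
}

function typeKeys(rules, keys) {
  let text = '';
  for (const key of keys) {
    text = applyRules(rules, text, key);
  }
  return text;
}

// Hypothetical rule table for the delimiter idea.
const rules = [
  ['_औं_', '', 'ॐ'], // fires only once the closing underscore is typed
  ['अu', '', 'औ'],
  ['M', '', 'ं'],
  ['a', '', 'अ'],
  ['o', '', 'ओ'],
];

console.log(typeKeys(rules, '_auM_')); // ॐ
console.log(typeKeys(rules, 'auM'));   // औं
console.log(typeKeys(rules, 'oM'));    // ओं
```

The trade-off in this sketch is that a literal _औं_ can no longer be typed directly, which seems acceptable only because that sequence is unlikely in ordinary text.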

(In reply to comment #15)

It is used rarely.
IMO, putting it in the editing tools should be fine. But I am not sure how one
could use it in the search box. There is no editing toolbox for entering
text in the search box.

How about adding '_' as a prefix and suffix (or any other prefix and/or suffix
combination) for rarely used but important characters?

e.g. _aum_ or _#aum or __aum or aum## or (aum) or any other combination.

or as suggested on hiwiki,
'_M' will be 'ं' and should not be combined with other sequence. (Instead of
'_' some other character may be used. _italics_ )
This can be extended to 'अे' (a_e), 'अै' (a_ai).

I think if we do have to go with the _ idea, it should probably be au_M for ॐ, and auM should remain औं as it currently is (coz ॐ is probably used less often than औं). Also, since 'ं' is used much more than ॐ, it's best to keep its input as stable as possible.

Also (unrelated to this bug), the idea suggested on hi-wp was for a breaker key, typing which would ensure that joining rules are not applied on the next keystroke (i.e the next keystroke is rendered independent of combining rules, using only the basic input rules). So if = is the breaker key, writing a=u would output अउ instead of औ. This could probably also be made to work like: if the breaker key is pressed once, the first matching rule is skipped and the next one applied. If it is pressed twice, the first two matching rules are skipped and the third rule applied and so on.
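
The single-press variant of the breaker key could look roughly like the sketch below. It is only an illustration under stated assumptions: the split into `basicRules` and `combiningRules`, the helper names, and the tiny rule table are hypothetical, and the multi-press variant (skipping the first N matching rules) is not covered.

```javascript
// Hypothetical split of the rule table: basic rules map a single key,
// combining rules look at previously produced text.
const basicRules = [
  ['a', '', 'अ'],
  ['u', '', 'उ'],
];
const combiningRules = [
  ['अu', '', 'औ'],
];

// Simplified model (assumption): the pattern is matched against the end of
// "text so far + new key"; the context field is ignored.
function applyRules(rules, text, key) {
  const input = text + key;
  for (const [pattern, _context, replacement] of rules) {
    const re = new RegExp(pattern + '$');
    if (re.test(input)) {
      return input.replace(re, replacement);
    }
  }
  return input;
}

// After the breaker key, the next keystroke sees only the basic rules.
function typeWithBreaker(keys, breaker = '=') {
  let text = '';
  let skipCombining = false;
  for (const key of keys) {
    if (key === breaker) {
      skipCombining = true; // the breaker itself produces no output
      continue;
    }
    const rules = skipCombining
      ? basicRules
      : [...combiningRules, ...basicRules];
    text = applyRules(rules, text, key);
    skipCombining = false;
  }
  return text;
}

console.log(typeWithBreaker('au'));  // औ
console.log(typeWithBreaker('a=u')); // अउ
```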

(In reply to comment #16)

I think if we do have to go with the _ idea, it should probably be au_M for ॐ,
and auM should remain औं as it currently is (coz ॐ is probably used less often
than औं). Also, since 'ं' is used much more than ॐ, it's best to keep its input
as stable as possible.

Still think that instead of au_M, _auM_ or _aum_ is better.

Another way is to use meta/alt key combination. e.g. a + u + Meta/alt-m

Also (unrelated to this bug), the idea suggested on hi-wp was for a breaker
key, typing which would ensure that joining rules are not applied on the next
keystroke (i.e the next keystroke is rendered independent of combining rules,
using only the basic input rules). So if = is the breaker key, writing a=u
would output अउ instead of औ. This could probably also be made to work like: if
the breaker key is pressed once, the first matching rule is skipped and the
next one applied. If it is pressed twice, the first two matching rules are
skipped and the third rule applied and so on.

IMO, this may make the logic complex for the end user to remember the sequence for typing.

(In reply to comment #17)

Still think that instead of au_M, _auM_ or _aum_ is better.

Another way is to use meta/alt key combination. e.g. a + u + Meta/alt-m

_auM_ seems fine to me.

IMO, this may make the logic complex for the end user to remember the sequence
for typing.

The end-user has no need to bother with the logic, but yes, having a series of underscores differentiate between inputs would probably be confusing. It could still be used for just two (like auM and au_M), but as I said above, I don't mind _auM_ either.

I have asked at hi-wp if anyone objects to either _aum_ or _auM_. If no one does, we can go ahead with either.

There've been no objections. Feel free to go ahead.

The patch that was merged doesn't include all the requests from this discussion. I'll submit a new patch soon.

(In reply to comment #21)

The patch that was merged doesn't include all the requests from this
discussion. I'll submit a new patch soon.

Amir: Did you have time for this?

[Assignee was removed, hence also resetting ASSIGNED status]

Does this need to be reported upstream in the GitHub bug tracker of jquery.ime?

(In reply to comment #25)

Does this need to be reported upstream in the GitHub bug tracker of
jquery.ime?

That would be preferable. Then this issue can be marked "upstream" with reference to the upstream report.

Why so? So many Wikimedia code repositories are managed on GitHub, so why does only this one go 'upstream' there? Has jquery.ime stopped being part of the Wikimedia projects?

Most Wikimedia projects on GitHub are just mirrored there from the canonical code repositories located at https://gerrit.wikimedia.org/ . However, jquery.ime is intended to be used by other projects as well, not only by Wikimedia, and GitHub makes it easier to attract non-WMF developers, since GitHub accounts are more common than Wikimedia Gerrit accounts. But this is a bit off-topic here... :)