
Install Extension:Transliterator on fr, pl and en.wiktionary
Closed, Declined
Public

Description

Author: conrad.irwin

Description:
Please install the Transliterator extension (in SVN and documented at http://www.mediawiki.org/wiki/Extension:Transliterator) on the English Wiktionary. The consensus for this is at http://en.wiktionary.org/w/index.php?oldid=7110737.


Version: unspecified
Severity: enhancement

Details

Reference
bz20246

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 10:49 PM
bzimport set Reference to bz20246.

Assigning to myself for review.

conrad.irwin wrote:

Is there anything further I can do to accelerate progress here?

conrad.irwin wrote:

Status update: This has been glanced at by werdna (a while back), Roan and Alphos (yesterday), and a few minor issues have been resolved.

conrad.irwin wrote:

Ping, again...

As murmurings last time it was looked at indicated people would be more comfortable with it using strtr() internally, this is what it now does.

Some feedback would be wonderful.

Um, is anybody working on this? The communities have been waiting for almost eleven months for this to be installed. Will this be done anytime soon?

Seems not. It's not been reviewed, so it has no chance of being deployed until it has been.

Is anyone working on reviewing it, or planning to review it at some point?

internoob2010 wrote:

This bug is celebrating its first birthday today. Someone please resolve it!

Cannot go to keyword shell before it has been reviewed.

What is it for? I have read this bug report, the extension page and the Wiktionary vote page, and have found no answers.

msh210+wmfbugzilla wrote:

(In reply to comment #12)

What is it for?

Wiktionaries transliterate words. In particular, English Wiktionary (which happens to be the one I'm most familiar with) transliterates into English any words written in a non-Latin script, for the benefit of its (anglophone) readers. This is now done manually, but often can be done automatically according to set rules, depending on the language being transliterated. (The "maps" described in the extension description are intended to be one per language, generally.) For example, if we were transliterating Spanish (which we don't, but it's an easy example to give), we might have a map that says (pseudocode)
ll maps to y
j maps to h
á maps to a
etc. This will both relieve people from having to transliterate manually and increase the number of entries that have transliterations.
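
To make the pseudocode map above concrete, here is a minimal sketch in Python. It is not the extension's code (which is PHP), only an illustration. It assumes longest-key-first, single-pass replacement, which is how PHP's strtr() behaves when given an array. The names SPANISH_MAP and transliterate are hypothetical.

```python
import re

# Hypothetical map for illustration, mirroring the pseudocode above.
SPANISH_MAP = {"ll": "y", "j": "h", "á": "a"}


def transliterate(text, mapping):
    """Replace every key of `mapping` found in `text` in a single pass.

    Longer keys are tried first, and replaced text is never scanned again,
    so a rule's output cannot be rewritten by another rule.
    """
    # Ignore empty keys and sort so that "ll" is tried before "l".
    keys = sorted((k for k in mapping if k), key=len, reverse=True)
    if not keys:
        return text
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


print(transliterate("llave", SPANISH_MAP))  # yave
print(transliterate("jamás", SPANISH_MAP))  # hamas
```

The single pass matters: with a map such as a→b, b→a, chained replacements would turn "ab" into "aa" or "bb", while the single-pass version gives "ba".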

michael wrote:

And, perhaps most importantly, should eliminate errors and the use of transliteration schemes that are non-standard, inconsistent, and illogical.

conrad.irwin wrote:

Transliterations are used particularly in translations tables where the alphabet of the destination language is not Latin (see http://en.wiktionary.org/wiki/Uzbekistan#Translations), and throughout entries in non-Latin alphabets, see for example http://en.wiktionary.org/wiki/%D5%88%D6%82%D5%A6%D5%A2%D5%A5%D5%AF%D5%BD%D5%BF%D5%A1%D5%B6.

For further context:

http://wikt.jelzo.com/wiki/Test:el shows a comparison of Greek words and their automatic transliteration versus the transliterations that existed on Wiktionary at some point during 2010.

http://en.wiktionary.org/wiki/User_talk:Conrad.Irwin/Transliterator.php contains some desired transliteration maps for various languages, along with which standards they correspond to (where applicable).

Are these tables in any way similar to the tables used for transliteration by ICU? I wrote a PHP extension a couple of years ago which provides an interface to ICU's transliteration functions to PHP:

http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/transliterate/

The original idea was to use it to display transliterations of foreign names that come to a wiki via CentralAuth, but I never developed it any further. Would it be useful to provide access to these ICU transliterators via the extension you have developed?

conrad.irwin wrote:

Yes, I imagine that would be very useful, though not absolutely necessary.

A few revisions on this were marked "deferred" when they should be reviewed or at least marked "old." I've gone and changed the status to "new" and will send out an email asking people to review it. See http://www.mediawiki.org/w/index.php?title=Special:Code/MediaWiki/status/new&path=%2Ftrunk%2Fextensions%2FTransliterator

Since this is a new extension and nobody remembers what it was, it's not worth worrying about individual old revisions in code review -- the extension should simply be reviewed as a whole.

Overall the code looks pretty nice and has comments and stuff which makes me happy. ;)

Couple things that stick out to me:

  • static functions as hook callbacks are intermixed with a non-static singleton class which feels a bit odd to me; it's hard to tell what's what sometimes.
  • not sure what's up with the mPages, mMaps member variables; what's the lifetime of the ExtTransliterator object? If batch jobs are running, will this in-process cache get updated by actions in another process? Or will it be discarded between jobs within the same process?

It looks like a new object is created at ParserFirstCallInit time... offhand I'm not sure whether a new parser will get created for job runs or not. Probably won't break anything in practice, but it's worth looking at; in-process caches are always dangerous in a multi-node environment.

  • Several functions accept reference parameters, like isMapPage( &$title ), which don't appear to need them. Unless it's possible to *replace an entire object parameter with another object* or *alter a scalar value or array contents*, references should not be used. If these are to match hook definitions, in many cases the hooks probably need fixing upstream, and when they are fixed these functions will fail on PHP 5.3 unless they are also fixed to remove the references; I'd also recommend naming the functions with an 'on' prefix and the actual name of the calling hook if possible, to make it clearer what's going on.

In general I'd also recommend looking out for what happens when you're given a huge amount of input; the default NFD decomposition implementation is not very efficient, and it might be more likely to keel over and die if, say, you accidentally don't close the tag and try to transliterate a very large page full of non-English text.
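
One way to picture the large-input concern is a length guard in front of the normalization step. The following is a Python sketch, not the extension's code; MAX_INPUT_CHARS and decompose_bounded are hypothetical names, and the limit value is arbitrary.

```python
import unicodedata

# Hypothetical cap; a real limit would have to be chosen by measurement.
MAX_INPUT_CHARS = 10_000


def decompose_bounded(text, limit=MAX_INPUT_CHARS):
    """Return the NFD form of `text`, or None if `text` exceeds `limit`.

    Refusing oversized input up front keeps an unclosed tag around a whole
    page from triggering normalization of the entire page.
    """
    if len(text) > limit:
        return None
    return unicodedata.normalize("NFD", text)
```

A caller that gets None back could leave the text untransliterated or emit an error message, instead of spending time decomposing it.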

This bug is now two years old. Is there any chance it will be resolved any time soon?

(In reply to comment #21)

This bug is now two years old. Is there any chance it will be resolved any time soon?

If you want this deployed, then I suggest you find someone to work on the issues Brion has raised. Otherwise, this is likely to sit longer.

sumanah wrote:

Conrad, could you respond to Brion's thoughts? Also, I encourage you to use developer access to move this to Git:

https://www.mediawiki.org/wiki/Developer_access

https://www.mediawiki.org/wiki/Git/Conversion/Extensions_queue

Conrad / Beau: Could you comment on comment 20, please?

Also, see https://www.mediawiki.org/wiki/Writing_an_extension_for_deployment for general information on what is needed to get an extension reviewed before potentially deploying it on a wikisite.

beau wrote:

I don't think we need to consider pl.wiktionary anymore. We have implemented transliteration using Lua: https://pl.wiktionary.org/wiki/Module:transliterator

Looks to be stalled for a long time. I'm closing this request and people can reopen if needed. There are workarounds available (see comment 26).

wmf.amgine3691 wrote:

Ignored to death, resulting in a community hack which has to be recreated/reimplemented on every wiktionary. Brilliant wontfix.

But thoroughly predicted.

I agree handling of this ticket wasn't the best - comment 20 and comment 22 explain why this was and is stalled. If this is still wanted feel free to reopen, but it will still need somebody to fix the code first.

wmf.amgine3691 wrote:

The extension is still needed. Without it, every wiktionary which transliterates into the local script will do so manually, or using local template/module systems. It is still requested on at least English and French wiktionaries. I believe EL also wants it, but has assumed it will never happen. (There are already transliteration rules for EL in this extension. Ariel helped Conrad in the development of the extension.)

I do not know of anyone associated with wiktionary (and therefore familiar with the issues) who could fix the code *and* is willing to do hoop dancing for WMF devs (and therefore able to get through the iterative processes necessary to get a deity to merge it.)

Amgine's frustration is deserved/understandable, but let's see what small sacrifices we can make to the deities in question to help them help us (in Italian we say: aiutati che il ciel t'aiuta, roughly "help yourself and heaven will help you"). I've checked the last version of the review queue checklist: https://www.mediawiki.org/w/index.php?title=Review_queue&oldid=771682

  1. ok
  2. ok
  3. bug 53393
  4. ok
  5. screencast missing
  6. done now with +design keyword
  7. ok (review done by Brion, comment 20)
  8. ok³

(In reply to comment #31)

I've checked the last version of the review queue checklist: https://www.mediawiki.org/w/index.php?title=Review_queue&oldid=771682

  1. ok
  2. ok
  3. bug 53393
  4. ok
  5. screencast missing
  6. done now with +design keyword

For the design review, I would recommend emailing the design mailing list: https://lists.wikimedia.org/mailman/listinfo/design

  7. ok (review done by Brion, comment 20)

Have the changes been made that Brion suggested? There may need to be more back and forth here.

  8. ok³

Links for En and Fr:
https://en.wiktionary.org/w/index.php?oldid=7110737

https://fr.wiktionary.org/wiki/Sp%C3%A9cial:Filtre_antiabus#Mod.C3.A8le_pour_une_section_Translitt.C3.A9rations (I can't view this, apparently)

(In reply to comment #32)

For the design review, I would recommend emailing the design mailing list: https://lists.wikimedia.org/mailman/listinfo/design

Presumably best after a screenshot or something is produced? It's not clear to me what there is to review, this seems to be just a parser function.

  7. ok (review done by Brion, comment 20)

Have the changes been made that Brion suggested? There may need to be more back and forth here.

Well, "the code looks pretty nice" looks like a good enough review. Missing pieces should be filed as separate bugs.

Consensus from three and a half years ago, well... "Low enhancement". Seriously, should I laugh or cry?

wmf.amgine3691 wrote:

(In reply to comment #35)

Consensus from three and a half years ago, well... "Low enhancement". Seriously, should I laugh or cry?

4+ years total. We've gone through the full gamut: excitement, begging, demanding, giggling, shouting, crying...

You could go through previous comments and check what has been requested but not yet done by anybody interested in getting this fixed.
That's more productive than shouting and crying.

To start with comment 32: Please provide a link to the design review request on the design mailing list at http://lists.wikimedia.org/pipermail/design/. Adding it here helps to keep the process/progress transparent.

(In reply to comment #34)

  7. ok (review done by Brion, comment 20)

Have the changes been made that Brion suggested? There may need to be more back and forth here.

Well, "the code looks pretty nice" looks like a good enough review. Missing pieces should be filed as separate bugs.

Nobody has done this so it is obviously a very good next step.
I currently see zero Transliterator reports in Bugzilla (open or closed).

Sidenote: Only https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FTransliterator/HEAD/Transliterator.php mentions a license (GPL 2.0) for this extension; other files do not. I mention this because https://www.mediawiki.org/wiki/Extension:Transliterator is in Category:Extensions with unknown license.

wmf.amgine3691 wrote:

I believe CIrwin was the last semi-active developer with the Wiktionary project. Needless to say his experiences are the exemplar being passed down through generations of wiktionary contributors.

There's no one from the project who *can* do what you suggest, Andre.

In that case I don't see a good way forward here except for somebody picking up (at least temporary) maintainership of Transliterator to implement the "suggestions".
In general, deploying unmaintained code sounds like a bad idea (though in this case it's not "much" code so it might be less of a problem).

Pppery subscribed.

Extension is archived