
Enable Language Converter for Portuguese in a testwiki
Open, MediumPublic

Description

Some time ago, the Portuguese Wikipedia community proposed using LanguageConverter to deal with differences among Portuguese variants. The proposal was made at
https://pt.wikipedia.org/wiki/Wikipedia:Esplanada/propostas/Conversor_de_idiomas_para_as_variantes_do_portugu%C3%AAs_%2830mai2010%29
and discussed in its talk page.

We already have some example conversion tables which could be used as a basis for building the official tables:
https://pt.wikipedia.org/wiki/Wikipedia:Esplanada/propostas/Conversor_de_idiomas_para_as_variantes_do_portugu%C3%AAs_%2830mai2010%29/MediaWiki:Conversiontable/pt-pt
and
https://pt.wikipedia.org/wiki/Wikipedia:Esplanada/propostas/Conversor_de_idiomas_para_as_variantes_do_portugu%C3%AAs_%2830mai2010%29/MediaWiki:Conversiontable/pt-br

There is also a first version of the configuration files that should be needed to test it:
https://pt.wikipedia.org/wiki/Wikipedia:Esplanada/propostas/Conversor_de_idiomas_para_as_variantes_do_portugu%C3%AAs_%2830mai2010%29/PHP

Could someone add the corresponding code to a test wiki where the Portuguese community could try it, so they can decide whether they want the feature enabled on some Portuguese project (and whether it needs any further enhancement before that)?

There was a thread about this on wikitech, but it is pending an answer to my previous question:
http://lists.wikimedia.org/pipermail/wikitech-l/2010-August/048809.html

See Also:
T33015: Convert between English language variants in display of pages

Details

Reference
bz26121

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 11:21 PM
bzimport set Reference to bz26121.
bzimport added a subscriber: Unknown Object (MLST).

Any updates on this?
Is there anything we can do to accelerate the process?

Providing a unified diff of the code (And attaching it to this bug as a patch) will help significantly

Removing shell keyword. I can't imagine anyone is going to update the testwiki to use code not currently in svn.

Created attachment 7926
Generated from http://pt.wikipedia.org/wiki/Wikipedia:Esplanada/propostas/Conversor_de_idiomas_para_as_variantes_do_português_(30mai2010)/PHP

Just a note: I've also removed my old username from translation files MessagesPt_br.php and MessagesPt.php, as mentioned at
http://translatewiki.net/wiki/Thread:User_talk:Siebrand/Rename_user_accounts/reply_%285%29

attachment my.patch ignored as obsolete

(In reply to comment #2)

Providing a unified diff of the code (And attaching it to this bug as a patch) will help significantly

Done. Could you please confirm if I did it right, since this is my first patch?

Thanks

At a quick glance, the patch looks correctly posted.

Is there anything our community can do to accelerate the process?

It would be great to have this conversion system in use somewhere so that we can start trying to migrate from our gadget ( [[pt:MediaWiki:Gadget-LanguageConverter.js]], which is an adaptation of a script used at Wikisource for modernization of old texts) to the PHP implementation.

(In reply to comment #7)

Is there anything our community can do to accelerate the process?

Sure:

  1. Get the patch applied. That means bugging someone with commit access.
  2. Add the shell keyword and show community consensus for getting this deployed on the requested wiki.

If you've done that and you're still not getting anywhere, post another query on this bug.

Unassigning from self. Sorry, I'm not familiar with the LanguageConverter code at all, and at the moment I don't have the time to look through it in great depth, so I don't feel comfortable committing the patch. I'll probably have more time once exams are done, and I'll try to take another look at this bug then if it hasn't been resolved yet. (Hopefully it will already be resolved by then.)

Also sorry for not responding earlier (I think some of my bugmail got lost).

This part doesn't appear to have a place in the patch: /* Should we use translated names for the flags, as in "Sr" language?

Missing file PtConversion.php.

Created attachment 8410
Updated patch

(In reply to comment #10)

This part doesn't appear to have a place in the patch: /* Should we use translated names for the flags, as in "Sr" language?

Missing file PtConversion.php.

Fixed in the updated patch.
Could you commit it for us?


PhiLiP.NPC wrote:

As I had mentioned before, I'm afraid the current LanguageConverter would disrupt the content written in Portuguese.

As we know, Chinese does not use spaces to separate words. Since LanguageConverter was originally designed for languages like Chinese, it splits the text character by character rather than word by word. That works for Serbian because it only needs to convert between the Latin and Cyrillic scripts, which is also done character by character.

To build a Portuguese version of LC, we need to add a new feature to LC that converts text word by word, with words separated by spaces or punctuation. I don't think implementing this feature in pure PHP is a good idea; perhaps we need a C extension providing a function whose performance is comparable to PHP's built-in strtr.
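To illustrate the concern, here is a minimal Python sketch (not the actual LanguageConverter code) that emulates the longest-match substring replacement PHP's strtr performs when given an array. The function name `strtr_like` and the single-entry table (pt-BR "trem" to pt-PT "comboio") are assumptions chosen for illustration; they are not taken from the proposed conversion tables.

```python
def strtr_like(text, pairs):
    """Rough emulation of PHP's strtr($text, $pairs) with an array argument.

    At each position the longest matching key wins, and replaced text is
    not scanned again. Matching is purely substring-based: the function
    has no notion of where a word starts or ends.
    """
    keys = sorted((k for k in pairs if k), key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(pairs[key])
                i += len(key)
                break
        else:
            # No key matches here: copy one character and move on.
            out.append(text[i])
            i += 1
    return "".join(out)


# Hypothetical one-entry table, for illustration only.
table = {"trem": "comboio"}

# "trem" is converted as intended, but "extremo" is corrupted because it
# happens to contain the substring "trem".
print(strtr_like("O trem chegou ao extremo sul.", table))
```

With a substring-based table, every entry risks matching inside longer, unrelated words, which is why a word-aware conversion step is being asked for.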

sumanah wrote:

If you want to set up a test wiki, maybe you could use a labs instance? https://www.mediawiki.org/wiki/WMF_Projects/Wikimedia_Labs

(In reply to comment #12)

To build a Portuguese version of LC, we need to add a new feature to LC that converts text word by word, with words separated by spaces or punctuation. I don't think implementing this feature in pure PHP is a good idea; perhaps we need a C extension providing a function whose performance is comparable to PHP's built-in strtr.

I don't think strtr is enough for word-by-word conversion...
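A word-by-word conversion needs a tokenization step that plain substring replacement does not provide. The following Python sketch shows one possible approach, assuming a "word" is a maximal run of `\w` characters; the name `convert_words` and the sample table entry are hypothetical and not part of the attached patch.

```python
import re

# For str patterns in Python 3, \w is Unicode-aware, so accented letters
# such as "ô" or "ç" are treated as part of a word.
WORD = re.compile(r"\w+")


def convert_words(text, table):
    """Replace whole words found in `table`; leave everything else as is."""
    return WORD.sub(lambda m: table.get(m.group(0), m.group(0)), text)


# Hypothetical one-entry table, for illustration only.
table = {"trem": "comboio"}

# Only the standalone word "trem" is replaced; "extremo" is left alone.
print(convert_words("O trem chegou ao extremo sul.", table))
```

This sketch is case-sensitive and cannot handle multi-word entries, so it only shows the word-boundary idea, not a complete conversion engine.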

millosh wrote:

Thanks to Liangent for noting this to the bug #15161.

Presently, this is not possible via standard conversion engine methods, as it assumes conversion inside of one script.

To get this working, we need either a different method or a generalized engine, which is now possible thanks to the new Parser.

I will talk about it with the Parsoid team during the Amsterdam Hackathon.

(In reply to comment #11 by Helder)

Fixed in the updated patch.
Could you commit it for us?

Helder: Is this still wanted? If so, could you put it into Gerrit for review?

As far as I know, yes (and the gadget is still in use).
But I won't have the time to work on the patch again any time soon.

PS: There was also a related thread at
https://pt.wikipedia.org/wiki/WP:Esplanada/propostas/Uso_do_portugu%C3%AAs_de_Portugal,_pt-PT_%284mar2012%29

If it's necessary, I can prepare the Gerrit patch for that. Should I?

He7d3r set Security to None.

Now that we have Patch Demo, I suppose it may be significantly easier to set up a test instance to validate these changes. Is this something that could be done with the code as it is now, or does there need to be further work in the patch, e.g. rebasing it?

Depending on the amount of work needed, I might be able to work on this soon-ish, or alternatively I could tackle it as a project for this year's Wikimedia Hackathon.