
Set locale if system uses wrong default
Closed, Invalid (Public)

Description

Author: dr.trigon

Description:
The grid engine on Tool Labs uses a different default locale than the console.

Grid engine:

import locale
print locale.localeconv()

{'mon_decimal_point': '', 'int_frac_digits': 127, 'p_sep_by_space': 127, 'frac_digits': 127, 'thousands_sep': '', 'n_sign_posn': 127, 'decimal_point': '.', 'int_curr_symbol': '', 'n_cs_precedes': 127, 'p_sign_posn': 127, 'mon_thousands_sep': '', 'negative_sign': '', 'currency_symbol': '', 'n_sep_by_space': 127, 'mon_grouping': [], 'p_cs_precedes': 127, 'positive_sign': '', 'grouping': []}

print locale.getdefaultlocale()

(None, None)

print locale.getlocale()

(None, None)

print locale.getpreferredencoding()

ANSI_X3.4-1968

Console:

import locale
print locale.localeconv()

{'mon_decimal_point': '', 'int_frac_digits': 127, 'p_sep_by_space': 127, 'frac_digits': 127, 'thousands_sep': '', 'n_sign_posn': 127, 'decimal_point': '.', 'int_curr_symbol': '', 'n_cs_precedes': 127, 'p_sign_posn': 127, 'mon_thousands_sep': '', 'negative_sign': '', 'currency_symbol': '', 'n_sep_by_space': 127, 'mon_grouping': [], 'p_cs_precedes': 127, 'positive_sign': '', 'grouping': []}

print locale.getdefaultlocale()

('en_US', 'UTF-8')

print locale.getlocale()

(None, None)

print locale.getpreferredencoding()

UTF-8

The console environment works with pywikibot; the grid engine one does not, see Bug 58181. Essentially, the issue is that the locale on the grid engine is not set properly. But regardless of where this error comes from, the bots must not crash in such situations.

I propose checking 'locale.getdefaultlocale()' on startup and comparing it to 'config.textfile_encoding' (maybe also 'config.console_encoding'). If they mismatch, the encoding should be set according to the config so that the correct one is used.
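The startup check proposed above could look roughly like the following. This is only a sketch under assumptions: `CONFIGURED_ENCODING` is a hypothetical stand-in for 'config.textfile_encoding', the helper names are invented for illustration, and it reads the encoding via `locale.getpreferredencoding(False)` rather than `locale.getdefaultlocale()`, which newer Python versions deprecate. Names are compared through `codecs.lookup()` so that spellings such as 'UTF8' and 'UTF-8' count as equal.

```python
import codecs
import locale


def canonical_name(encoding):
    """Return Python's canonical codec name, or None if unset or unknown."""
    if not encoding:
        return None
    try:
        return codecs.lookup(encoding).name
    except LookupError:
        return None


def encoding_mismatch(detected, configured):
    """True when the detected encoding is unset or differs from the configured one."""
    return canonical_name(detected) != canonical_name(configured)


# Hypothetical stand-in for config.textfile_encoding.
CONFIGURED_ENCODING = 'utf-8'

# Ask the environment without changing the process locale.
detected = locale.getpreferredencoding(False)

if encoding_mismatch(detected, CONFIGURED_ENCODING):
    # Do not trust the environment; fall back to the configured value.
    effective_encoding = CONFIGURED_ENCODING
else:
    effective_encoding = detected
```

With the values reported above, the grid engine's 'ANSI_X3.4-1968' resolves to ASCII and would be flagged as a mismatch against a configured 'utf-8', while the console's 'UTF-8' would not.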


Version: compat-(1.0)
Severity: major

Details

Reference
bz58872

Related Objects

Status    Subtype  Assigned  Task
Resolved           coren
Invalid            None

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 2:20 AM
bzimport set Reference to bz58872.

This is not a pywikibot issue, but an issue with your code - as I have explained before. Filenames should *never* be unicode strings -- always byte strings. It's just luck (or rather: a combination of factors that happens to be just right) that it works in the shell.

In my case:

locale.getdefaultlocale()

('pl_PL', 'UTF8')

locale.getpreferredencoding()

'UTF-8'

Why is en_US better?

If I create files automatically, for example from article names, I prefer to .encode("utf-8") unicode strings manually, without resorting to the locale module.
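A minimal sketch of that approach, assuming a hypothetical helper `title_to_filename` and a '.txt' suffix chosen only for illustration. The title is encoded explicitly as UTF-8, so the resulting byte-string path does not depend on the process locale. Replacing '/' is an added assumption, since article names may contain the path separator. The `u''` and `b''` literals keep the example valid on both Python 2.7 and Python 3.

```python
import os


def title_to_filename(title, directory=b'.'):
    """Build a byte-string path from a unicode article title."""
    # Assumption: '/' is replaced because it is the path separator on POSIX.
    safe_title = title.replace(u'/', u'_')
    # Encode explicitly instead of relying on the locale's encoding.
    return os.path.join(directory, safe_title.encode('utf-8') + b'.txt')


# u'\xfc' is the letter u-umlaut; it becomes two bytes in UTF-8.
path = title_to_filename(u'Z\xfcrich', b'cache')
```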

dr.trigon wrote:

(In reply to comment #1)

This is not a pywikibot issue, but an issue with your code - as I have explained before. Filenames should *never* be unicode strings -- always byte strings. It's just luck (or rather: a combination of factors that happens to be just right) that it works in the shell.

As explained, I am always very confused by this unicode vs. bytestring stuff. I know the details, I just mix them up all the time... so please be patient with me.

I corrected all those errors and issues in my scripts once, and thus was not aware (and was even more confused) that there were still bugs.

Since you give the impression of being "the expert" on such string issues, I desperately need your help and may need it again in the future.

I was, e.g., enormously confused by the fact that unicode strings also need an internal representation in Python. I always assumed this had to be UTF (8, 16 or 32), and thus I was mixing up UTF and unicode conceptually. Now I have learned about UCS [1] and should have it sorted out:

-(byte)string (ASCII, UTF or else)
-unicode (internally UCS)

encode: unicode -> bytestring
decode: bytestring -> unicode
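The two directions above can be illustrated with a short round trip. This is only a sketch with an arbitrary sample string; the `u''` prefix keeps it valid on both Python 2.7 and Python 3 (where the unicode type is called `str` and the bytestring type `bytes`). It also shows why an ASCII default, like the one reported for the grid engine, fails on non-ASCII text.

```python
text = u'Z\xfcrich'            # unicode string with six code points
data = text.encode('utf-8')    # encode: unicode -> bytestring (seven bytes)
restored = data.decode('utf-8')  # decode: bytestring -> unicode

# Encoding with ASCII cannot represent u-umlaut and raises an error.
try:
    text.encode('ascii')
except UnicodeEncodeError:
    ascii_failed = True
else:
    ascii_failed = False
```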

[1] http://www.cmlenz.net/archives/2008/07/the-truth-about-unicode-in-python

Please correct me if I said something wrong (again ;) ...

Btw.: using a 'UTF' locale on the Tool Labs grid engine would not be the correct solution, but it would have helped and would not do any harm anyway, so I don't see the problem with that... but this is not an issue anymore... ;)

Greetings