
standardize_notes.py encoding
Closed, Resolved, Public

Description

Originally from: http://sourceforge.net/p/pywikipediabot/feature-requests/327/
Reported by: n-fran
Created on: 2013-01-25 14:38:46
Subject: standardize_notes.py encoding
Original description:
If I add text with Russian letters to the script, I get this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)

To avoid this error, I think it is necessary to add these lines, or something similar, to the bot code:

# -*- coding: utf-8 -*-
import sys
reload(sys)
sys.setdefaultencoding('utf-8')

And my bot started to function. Thanks.
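
For illustration, the error quoted above can be reproduced outside the bot. The following is only a minimal sketch, not code from standardize_notes.py: the helper name try_ascii_decode is hypothetical, and the word "Примечания" is taken from later in this thread. It encodes the word as UTF-8, whose first byte is 0xd0, and then decodes it with the ASCII codec, which is what Python 2 attempts implicitly when a byte string is mixed with a unicode string.

```python
# -*- coding: utf-8 -*-
# Sketch only: reproduces the kind of error reported in this task.
# "try_ascii_decode" is a hypothetical helper, not part of the bot.

# UTF-8 bytes of the Russian heading word; the first byte is 0xd0.
raw = u'Примечания'.encode('utf-8')


def try_ascii_decode(data):
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return data.decode('ascii')
    except UnicodeDecodeError as exc:
        return str(exc)


message = try_ascii_decode(raw)
print(message)
```

The printed message should name byte 0xd0 at position 0, matching the traceback in the report.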


Version: compat-(1.0)
Severity: enhancement
See Also:
https://sourceforge.net/p/pywikipediabot/feature-requests/327

Details

Reference
bz55018

Event Timeline

bzimport raised the priority of this task from to Needs Triage. (Nov 22 2014, 2:14 AM)
bzimport set Reference to bz55018.
bzimport added a subscriber: Unknown Object (????).

I cannot follow what you mean by "add to the script". Do you want to modify the script, or enter Russian characters on the command line?

What is the complete error you got?

Did you set transliteration_target and console_encoding in your user-config.py?

reload(sys) after import sys does not matter since it just reloads the same module.

Sorry, my English, particularly when it comes to technical terms, may be poor. I meant that I put Russian characters into the file standardize_notes.py. For example, I changed '\n== Notes ==\n' to '\n== Примечания ==\n' (line 987), and then this error appeared:

http://pastebin.ru/yzh2CdvX

When I added the lines shown above at the beginning of the file, the problem disappeared. Thank you.

In my user-config.py there are lines

console_encoding = 'cp1251'
transliteration_target = console_encoding

but there are still a lot of problems with the encoding. Thank you.

When using Python 2.x there are two kinds of strings:
ASCII (byte) strings are written like "This is an ASCII string"
Unicode strings are written like u"This is a Unicode string"

Just write a u before the string in line 987 (and remove that reload/encoding stuff):
new_text = new_text + u'\n== Notes ==\n'  # set to standard name

But OK, this part should be localized.
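
To make the difference concrete, here is a small sketch of the two cases. It is not the bot's actual code: new_text is borrowed from the line quoted above, while heading_bytes and heading_text are hypothetical names. The byte version stands in for what a plain '...' literal holds in a UTF-8 source file under Python 2, and the explicit decode('ascii') call mimics the implicit conversion Python 2 performs when such a literal is added to a unicode string.

```python
# -*- coding: utf-8 -*-
# Sketch only: byte string vs. unicode string when building page text.
# "heading_bytes" and "heading_text" are hypothetical names.

new_text = u'Article text'

# What a plain '...' literal contains in a UTF-8 source file on Python 2.
heading_bytes = u'\n== Примечания ==\n'.encode('utf-8')
# What a u'...' literal contains.
heading_text = u'\n== Примечания ==\n'

# unicode + unicode needs no conversion and always works.
result = new_text + heading_text

# unicode + bytes: Python 2 first decodes the bytes as ASCII, which fails
# for Cyrillic. The decode is written out explicitly here so the sketch
# behaves the same on Python 2 and Python 3.
try:
    new_text + heading_bytes.decode('ascii')
    implicit_ok = True
except UnicodeDecodeError:
    implicit_ok = False

# Decoding with the codec the bytes were really written in also works.
result_from_bytes = new_text + heading_bytes.decode('utf-8')

print(implicit_ok)
print(result == result_from_bytes)
```

With the u prefix the concatenation stays entirely in unicode, so no default-encoding change is needed.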

valhallasw claimed this task.

standardize_notes.py has the # -*- coding: utf-8 -*- header, so this should be OK now.