
en.planet stopped updating
Closed, ResolvedPublic

Description

en.planet hasn't updated since March 2.


Version: wmf-deployment
Severity: normal

Details

Reference
bz45806

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 1:24 AM
bzimport added a project: Wikimedia-Planet.
bzimport set Reference to bz45806.
bzimport added a subscriber: Unknown Object (MLST).

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 32: ordinal not in range(128)

hrm... Django/utf-8/something along these lines:

http://stackoverflow.com/questions/2513027/encoding-gives-ascii-codec-cant-encode-character-ordinal-not-in-range128

I got it to update once, so now it's at March 07, but after that I ran into the same issue again. Keeping this open.

Sounds kind of like bug 44569 but not quite

could be fixed (for now) by deleting all content from the cache directory and re-running:

root@zirconium:/var/cache/planet/en/

rm *

sudo -u planet /usr/bin/planet -v /usr/share/planet-venus/wikimedia/en/config.ini
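For anyone who wants to script this, here is a rough Python sketch of the same two steps. The function names `clear_cache` and `rerun_planet` are made up for illustration; the paths and the planet command line are the ones quoted above. Like `rm *`, it only removes non-hidden regular files and leaves subdirectories alone.

```python
import os
import subprocess


def clear_cache(cache_dir):
    """Remove non-hidden regular files directly inside cache_dir.

    Mirrors `rm *`: dotfiles and subdirectories are left in place.
    Returns the sorted list of removed file names.
    """
    removed = []
    for name in sorted(os.listdir(cache_dir)):
        path = os.path.join(cache_dir, name)
        if name.startswith(".") or not os.path.isfile(path):
            continue
        os.remove(path)
        removed.append(name)
    return removed


def rerun_planet(config_path, runner=subprocess.check_call):
    """Re-run planet as the planet user with the given config file.

    `runner` is injectable so the command can be inspected without
    actually executing it.
    """
    command = ["sudo", "-u", "planet", "/usr/bin/planet", "-v", config_path]
    runner(command)
    return command
```

On the host this would be called as `clear_cache("/var/cache/planet/en/")` followed by `rerun_planet("/usr/share/planet-venus/wikimedia/en/config.ini")`. As noted above, this only clears the symptom for now.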

this also fixed the atom link http://en.planet.wikimedia.org/atom.xml

Looks like the issue is back:
Last updated: March 10, 2013 09:02 PM

dzahn reran Comment #4 and it's still stuck:

13 05:55:55 < jeremyb_> mutante: did planet finish?
13 05:56:50 < mutante> unfortunately, no. UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 32: ordinal not in range(128)

INFO:planet.runner:Loading cached data
Traceback (most recent call last):

File "/usr/bin/planet", line 138, in <module>
  splice.apply(doc.toxml('utf-8'))
File "/usr/lib/pymodules/python2.7/planet/splice.py", line 118, in apply
  output_file = shell.run(template_file, doc)
File "/usr/lib/pymodules/python2.7/planet/shell/__init__.py", line 66, in run
  module.run(template_resolved, doc, output_file, options)
File "/usr/lib/pymodules/python2.7/planet/shell/tmpl.py", line 254, in run
  for key,value in template_info(doc).items():
File "/usr/lib/pymodules/python2.7/planet/shell/tmpl.py", line 193, in template_info
  data=feedparser.parse(source)
File "/usr/lib/pymodules/python2.7/planet/vendor/feedparser.py", line 3525, in parse
  feedparser.feed(data)
File "/usr/lib/pymodules/python2.7/planet/vendor/feedparser.py", line 1662, in feed
  sgmllib.SGMLParser.feed(self, data)
File "/usr/lib/python2.7/sgmllib.py", line 104, in feed
  self.goahead(0)
File "/usr/lib/python2.7/sgmllib.py", line 143, in goahead
  k = self.parse_endtag(i)
File "/usr/lib/python2.7/sgmllib.py", line 320, in parse_endtag
  self.finish_endtag(tag)
File "/usr/lib/python2.7/sgmllib.py", line 360, in finish_endtag
  self.unknown_endtag(tag)
File "/usr/lib/pymodules/python2.7/planet/vendor/feedparser.py", line 569, in unknown_endtag
  method()
File "/usr/lib/pymodules/python2.7/planet/vendor/feedparser.py", line 1512, in _end_content
  value = self.popContent('content')
File "/usr/lib/pymodules/python2.7/planet/vendor/feedparser.py", line 849, in popContent
  value = self.pop(tag)
File "/usr/lib/pymodules/python2.7/planet/vendor/feedparser.py", line 764, in pop
  mfresults = _parseMicroformats(output, self.baseuri, self.encoding)
File "/usr/lib/pymodules/python2.7/planet/vendor/feedparser.py", line 2219, in _parseMicroformats
  p.vcard = p.findVCards(p.document)
File "/usr/lib/pymodules/python2.7/planet/vendor/feedparser.py", line 2161, in findVCards
  sVCards += '\n'.join(arLines) + '\n'

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 32: ordinal not in range(128)

You should treat all strings in Python as unicode, not ASCII:

sVCards += '\n'.join(arLines) + '\n'
becomes
sVCards += u'\n'.join(arLines) + u'\n'

(In reply to comment #8)
> You should treat all strings in Python as unicode, not ASCII

That's surely not the root cause, right?

It is a matter of a UTF-8 string coming in and being treated as ASCII
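A minimal sketch of that failure mode, for illustration only. The sample text is invented (the actual feed content that triggered the error is not known from this ticket), and `decode_error` is a hypothetical helper. Byte 0xe2 is the lead byte of many UTF-8 encoded punctuation characters, for example U+2019, the right single quotation mark.

```python
def decode_error(data, encoding):
    """Return the UnicodeDecodeError raised by decoding data, or None on success."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as err:
        return err
    return None


# Invented sample: U+2019 encodes to the three bytes e2 80 99 in UTF-8.
raw = u"Wikimedia\u2019s planet".encode("utf-8")

ascii_err = decode_error(raw, "ascii")  # fails at the first non-ASCII byte
utf8_err = decode_error(raw, "utf-8")   # None: the bytes are valid UTF-8

print(ascii_err)
print(utf8_err)
```

The ASCII attempt fails at the first 0xe2 byte, while decoding the same bytes as UTF-8 succeeds. Python 2 performs the ASCII decode implicitly whenever a byte string is combined with a unicode string, which is where the traceback above ends up.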

Thank you very much for the reply, but it still does not work after I changed line 2161 in feedparser.py:

sVCards += u'\n'.join(arLines) + u'\n'

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 8: ordinal not in range(128)


INFO:planet.runner:Loading cached data
Traceback (most recent call last):

File "/usr/bin/planet", line 138, in <module>
  splice.apply(doc.toxml('utf-8'))
File "/usr/lib/pymodules/python2.7/planet/splice.py", line 118, in apply
  output_file = shell.run(template_file, doc)
File "/usr/lib/pymodules/python2.7/planet/shell/__init__.py", line 66, in run
  module.run(template_resolved, doc, output_file, options)
File "/usr/lib/pymodules/python2.7/planet/shell/tmpl.py", line 254, in run
  for key,value in template_info(doc).items():
File "/usr/lib/pymodules/python2.7/planet/shell/tmpl.py", line 193, in template_info
  data=feedparser.parse(source)
File "/usr/lib/pymodules/python2.7/planet/vendor/feedparser.py", line 3525, in parse
  feedparser.feed(data)
File "/usr/lib/pymodules/python2.7/planet/vendor/feedparser.py", line 1662, in feed
  sgmllib.SGMLParser.feed(self, data)
File "/usr/lib/python2.7/sgmllib.py", line 104, in feed
  self.goahead(0)
File "/usr/lib/python2.7/sgmllib.py", line 143, in goahead
  k = self.parse_endtag(i)
File "/usr/lib/python2.7/sgmllib.py", line 320, in parse_endtag
  self.finish_endtag(tag)
File "/usr/lib/python2.7/sgmllib.py", line 360, in finish_endtag
  self.unknown_endtag(tag)
File "/usr/lib/pymodules/python2.7/planet/vendor/feedparser.py", line 569, in unknown_endtag
  method()
File "/usr/lib/pymodules/python2.7/planet/vendor/feedparser.py", line 1512, in _end_content
  value = self.popContent('content')
File "/usr/lib/pymodules/python2.7/planet/vendor/feedparser.py", line 849, in popContent
  value = self.pop(tag)
File "/usr/lib/pymodules/python2.7/planet/vendor/feedparser.py", line 764, in pop
  mfresults = _parseMicroformats(output, self.baseuri, self.encoding)
File "/usr/lib/pymodules/python2.7/planet/vendor/feedparser.py", line 2219, in _parseMicroformats
  p.vcard = p.findVCards(p.document)
File "/usr/lib/pymodules/python2.7/planet/vendor/feedparser.py", line 2161, in findVCards
  sVCards += u'\n'.join(arLines) + u'\n'

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 8: ordinal not in range(128)

You need to go through the file and make sure that all references and assignments make sVCards unicode also, otherwise you are just repeating the same error when you attempt to +=

Searching through all of feedparser.py, there are just 3 occurrences of sVCards: the initialization sVCards = '' at the beginning, the line we changed, and the line that returns it.

For tonight I simply commented out line 2161 in feedparser.py:

# sVCards += u'\n'.join(arLines) + u'\n'

as a temporary fix, since I figured the error only occurs because a vCard is found in one feed. This way sVCards should simply be returned empty (sVCards = '').

and this made the update work for now. We are back to March 14th on https://en.planet.wikimedia.org/

sVCards = u''
is the proper method for creating a unicode string

Thanks, I tried, but also setting sVCards = u'' in line 1949 in addition to that did not fix it yet. Back to commenting out line 2161.
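One possible explanation, which is an assumption and not confirmed anywhere in this ticket: the items in arLines are themselves byte strings, so Python 2 still has to decode them implicitly as ASCII when they are joined with a unicode separator, whatever type sVCards has. A sketch of decoding each item explicitly before joining follows; `to_text` and `join_lines` are hypothetical names, not part of feedparser, and UTF-8 is an assumed encoding.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a unicode string, decoding byte strings explicitly.

    Undecodable bytes become U+FFFD instead of raising an exception.
    """
    if isinstance(value, bytes):
        return value.decode(encoding, "replace")
    return value


def join_lines(lines, encoding="utf-8"):
    """Join a mix of byte and unicode strings into one unicode block."""
    return u"\n".join(to_text(line, encoding) for line in lines) + u"\n"


# Invented sample lines: one UTF-8 byte string, one unicode string.
sample = [b"FN:Andr\xc3\xa9", u"ORG:Wikimedia"]
joined = join_lines(sample)
```

Whether something like this belongs in the vendored feedparser.py or upstream is a separate question; it is only meant to show why changing the string literals alone may not be enough.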

updates work for now, but not the real fix and we should report upstream

and compare this: https://github.com/rubys/venus/blob/master/planet/vendor/feedparser.py to the one we have from planet-venus version "0~bzr116-1" in Ubuntu precise

http://www.intertwingly.net/code/venus/

in http://www.intertwingly.net/code/venus/docs/index.html it links to http://feedparser.org/docs/ but that is a parking domain at godaddy, sigh.

http://www.intertwingly.net/code/venus/AUTHORS

Package: planet-venus
Priority: optional
Section: universe/python
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Original-Maintainer: Noah Slater <nslater@tumbolia.org>
Version: 0~bzr116-1

lowering importance to normal/normal because updates are working and en.planet is March 15 due to live hack.

Well, all that would be left to do here is reporting upstream, but we're fine with the hack ...

If there is a clear testcase and if somebody tells me where upstream is I can forward the ticket. https://github.com/rubys/venus/issues ? https://code.google.com/p/feedparser/issues/list ?

upstream is the github link you pasted. feedparser would be upstream for them i suppose. but it might also be a request to Ubuntu to have newer packages.. hrmm

^ i'd like to do that so it's not a live hack anymore but then close this ticket

Done. The workaround is puppetized now, so I claim it's resolved.