Author: bdanee88
Description:
I've just started writing a statistics program for the Hungarian Wikipedia. While downloading the deletion log for January 2008, my program threw an exception: the XML returned by the API was badly encoded. I wondered why, so I checked, and there really is an error:
In the 'item' element with logid 142820, the comment ends with an invalid character. It was probably a two-byte UTF-8 character that got cut in half when the comment was truncated. The problem is not very serious for me: I don't need the comment attribute, so I can leave it out by using &leprop= in the URL. But anyone who does need it won't be able to load the file.
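For anyone who wants the same workaround, here is a rough sketch of building the request URL without the comment property. The helper name `build_logevents_url` is made up for illustration, and the list of `leprop` values is my assumption about which properties are wanted; the other parameters are copied from the URL in this report.

```python
from urllib.parse import urlencode


def build_logevents_url(base, leprop):
    """Build a logevents query URL that asks only for the given properties.

    Hypothetical helper; parameter values mirror the URL from this report.
    """
    params = {
        "format": "xml",
        "action": "query",
        "list": "logevents",
        "letype": "delete",
        "lestart": "2008-01-25T22:12:03Z",
        "lelimit": "30",
        # Leaving "comment" out of leprop keeps the broken attribute
        # out of the response.
        "leprop": "|".join(leprop),
    }
    return base + "?" + urlencode(params)


url = build_logevents_url(
    "http://hu.wikipedia.org/w/api.php",
    ["ids", "title", "type", "user", "timestamp"],  # assumed property set
)
print(url)
```

This only avoids the symptom on the client side; the truncated comment is still broken for anyone who requests it.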
The bad line (see also the URL below):
<item logid="142820" pageid="0" ns="0" title="Borisz Szpasszkij" type="delete" action="delete" user="Bináris" timestamp="2008-01-25T21:19:30Z" comment="[[Wikipédia:Homokozó|teszt]]: a lap tartalma: „Boris Vasilievich Spassky [szerkesztés] A Wikipédiából, a szabad lexikonból. Ugrás: <small>NAVIGÁCIÓ</small>, <small>KERESÉS</small> Boris V Spassky () szovjet később francia...” (és csak �"/>
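To illustrate what I think happened, here is a minimal sketch in Python. It assumes the comment was cut at a fixed byte limit without regard to character boundaries; I have not checked how MediaWiki actually truncates, so both function names and the sample string are hypothetical. The second function shows one way to cut on a character boundary instead.

```python
def truncate_bytes_naive(text, limit):
    """Cut the UTF-8 encoding at a byte limit, ignoring character boundaries."""
    return text.encode("utf-8")[:limit]


def truncate_utf8_safe(text, limit):
    """Cut the UTF-8 encoding at a byte limit without splitting a character."""
    data = text.encode("utf-8")
    if len(data) <= limit:
        return data
    cut = limit
    # Bytes of the form 0b10xxxxxx are continuation bytes. If the first
    # excluded byte is one, the cut falls inside a character, so step back
    # to that character's lead byte and drop the whole character.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


sample = "csak é"  # 'é' takes two bytes in UTF-8, so the string is 7 bytes

broken = truncate_bytes_naive(sample, 6)  # ends with a lone lead byte
try:
    broken.decode("utf-8")
except UnicodeDecodeError as exc:
    print("naive cut is not valid UTF-8:", exc)

print(truncate_utf8_safe(sample, 6).decode("utf-8"))
```

A strict XML parser rejects the naive result for the same reason my program did, while the safe version simply loses the last character.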
Version: unspecified
Severity: normal
URL: http://hu.wikipedia.org/w/api.php?format=xml&action=query&list=logevents&letype=delete&lestart=2008-01-25T22:12:03Z&lelimit=30