
File metadata containing invalid characters produces malformed XML
Closed, Resolved (Public)

Description

See the URL. The file's title metadata field contains a bare 0x0F character (\x0f), which the API passes through untranslated to the user as XML. This would be bad enough if it were merely an invalid Unicode character, but because it is a control character below 32 that http://www.w3.org/TR/REC-xml/#charsets excludes, its presence makes the entire API result not well-formed, and XML parsers choke on it.
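
To illustrate the failure mode, here is a minimal Python sketch. The XML snippets and the helper name `is_well_formed` are made up for illustration and are not the actual API output; the point is only that a conforming parser rejects a document containing a raw 0x0F.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Hypothetical stand-ins for an API response; only the second one
# carries a raw 0x0F control character inside an attribute value.
clean = '<api><metadata name="title" value="some title" /></api>'
broken = '<api><metadata name="title" value="some\x0ftitle" /></api>'

print(is_well_formed(clean))
print(is_well_formed(broken))
```

The first document parses; the second is rejected because of the single control character, which is why one bad metadata field is enough to make the whole response unusable for XML clients.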

Related to bug 15261, bug 16105.


Version: 1.14.x
Severity: normal
URL: http://commons.wikimedia.org/w/api.php?format=xml&action=query&prop=imageinfo&iiprop=metadata&titles=Image:%EB%B0%95%ED%96%A5%EB%A6%BC_%EA%B9%80%ED%95%B4%EC%86%A1_%EC%A0%84%ED%99%94%EC%9D%BC%EA%B8%B0.ogg

Details

Reference
bz16262

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:23 PM
bzimport set Reference to bz16262.

Should be fixed in r45749: 0x0F is now treated as invalid and converted to the Unicode replacement character (U+FFFD).
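
For readers who want to apply the same idea on the client side, the following is a rough Python sketch of replacing characters outside the XML 1.0 Char production with U+FFFD. It is not the code from r45749; the function name `replace_invalid_xml_chars` and the regular expression are assumptions written for this example.

```python
import re

# Characters allowed by the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# The negated class below matches everything else.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

REPLACEMENT_CHAR = "\ufffd"


def replace_invalid_xml_chars(text: str) -> str:
    """Replace every character that XML 1.0 forbids with U+FFFD."""
    return _INVALID_XML_CHARS.sub(REPLACEMENT_CHAR, text)


# Hypothetical title value containing the offending 0x0F byte.
title = "some\x0ftitle"
print(repr(replace_invalid_xml_chars(title)))
```

Tab, line feed, and carriage return pass through unchanged because XML 1.0 permits them; only the remaining control characters and other excluded code points are substituted.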