wbsetitem api action returns invalid xml on error
Closed, Resolved, Public

Description

My bot gets an error message when parsing the response of wbsetitem.

This bug is not about the error itself but about the wrong return format:

Request:
http://wikidata-test-repo.wikimedia.de/w/api.php?action=wbsetitem&format=xml&item=add&data={%22label%22%3A{%22en%22%3A{%22language%22%3A%22en%22%2C%22value%22%3A%22Sina%22}%2C%22de%22%3A{%22language%22%3A%22de%22%2C%22value%22%3A%22Sina%22}%2C%22la%22%3A{%22language%22%3A%22la%22%2C%22value%22%3A%22Sina%22}%2C%22fy%22%3A{%22language%22%3A%22fy%22%2C%22value%22%3A%22Sina%22}%2C%22nl%22%3A{%22language%22%3A%22nl%22%2C%22value%22%3A%22Sina%22}%2C%22mg%22%3A{%22language%22%3A%22mg%22%2C%22value%22%3A%22Sina%22}}%2C%22links%22%3A{%22en%22%3A{%22site%22%3A%22en%22%2C%22title%22%3A%22Sina%22}%2C%22de%22%3A{%22site%22%3A%22de%22%2C%22title%22%3A%22Sina%22}%2C%22la%22%3A{%22site%22%3A%22la%22%2C%22title%22%3A%22Sina%22}%2C%22fy%22%3A{%22site%22%3A%22fy%22%2C%22title%22%3A%22Sina+%28betsjuttings%29%22}%2C%22nl%22%3A{%22site%22%3A%22nl%22%2C%22title%22%3A%22Sina%22}%2C%22mg%22%3A{%22site%22%3A%22mg%22%2C%22title%22%3A%22Sina%22}}}

Response:

Fatal error: Call to a member function getPrefixedDBkey() on a non-object in /var/www/wikidata-test-repo.wikimedia.de/w/extensions/Wikibase/repo/includes/ItemContent.php on line 139

So the returned content is plain text although the request contains 'format=xml'. This should _never_ happen, because most parsers expect valid XML. So please always catch all errors and wrap them into valid XML.

An example from the same module containing another error, but returning valid XML, so that it can be handled by the requester:
http://wikidata-test-repo.wikimedia.de/w/api.php?action=wbsetitem&format=xml&item=add&data={%22label%22%3A{%22en%22%3A{%22language%22%3A%22en%22%2C%22value%22%3A%22ABC%22}}%2C%22links%22%3A{%22en%22%3A{%22site%22%3A%22en%22%2C%22title%22%3A%22ABC%22}}}
Response:
<?xml version="1.0"?><api><error code="internal_api_error_DBQueryError" info="Database query error" xml:space="preserve">

#0 /var/www/wikidata-test-repo.wikimedia.de/w/includes/db/Database.php(939): DatabaseBase-&gt;reportQueryError('Duplicate entry...', 1062, 'INSERT INTO `w...', 'Wikibase\ItemSt...', false)
...
</error></api>


Version: unspecified
Severity: blocker
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=36519

Details

Reference
bz38234

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 12:59 AM
bzimport set Reference to bz38234.
bzimport added a subscriber: Unknown Object (MLST).

Short answer:
The first URL hits a bug in the code, which is now partly fixed in a newer version. Any fatal error caused by faulty program flow can produce output in any format, including but not limited to free text.

In addition, you use language codes as site ids, which should have been caught, but this module bypasses the validity checks. Your call is flawed in this respect.

The second URL tries to create a duplicate entry. This was previously allowed in some cases, but is not allowed anymore.

Long answer:
The behavior of wbsetitem is mostly undocumented and undefined, and may in the future include additional validation of the arguments. For the moment, _all_ actions that pass json to this module are an unsupported feature and may go away. ;)

The servers may, or may not, run in a debug mode where GET requests are allowed. If they are allowed, the GET requests are limited in length. Requests that exceed the limit are truncated, and a truncated request fails because the json is invalid and the call to json_decode fails. The code is currently missing several validity checks when transitioning from a json structure to an item structure (which is basically a json structure itself).
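The truncation failure described above can be reproduced on the client side. This is a minimal sketch in Python, not the server's PHP code: `decode_or_none` is a hypothetical helper standing in for the server's json_decode call, and the cut-off of 20 characters is an arbitrary value chosen only to force truncation.

```python
import json

# Payload in the flat form recommended later in this thread.
payload = {"label": {"en": "Sina"}, "links": {"enwiki": "Sina"}}
encoded = json.dumps(payload)


def decode_or_none(text):
    """Return the decoded object, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


# Simulate a length limit cutting the parameter short.
# The limit of 20 characters is hypothetical, not a real server setting.
truncated = encoded[:20]

complete_result = decode_or_none(encoded)     # decodes back to the payload
truncated_result = decode_or_none(truncated)  # None: the JSON is cut mid-string
```

Sending the data in the body of a POST request avoids depending on whatever URL length limit the server applies.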

The reason is that this module tries to map a json structure to an array structure representing an item. Later this array structure is re-created as a json structure in the item itself and as rows in special tables in the database. The mappings from json to the array are not well defined, and in particular the handling of requests that somehow violate the existing constraints is not defined at all.

Note that the json used as input to wbsetitem is _not_ the same as the json you get as output.

In my opinion _all_ calls that violate _any_ constraint should fail.

Especially note that the repo does important normalization and validation when called through wbsetsitelink, and that those are bypassed when you use wbsetitem. It is highly likely that the same normalization and validation will be enforced in wbsetitem.

First call reads something like
http://wikidata-test-repo.wikimedia.de/w/api.php?action=wbsetitem&format=xml&item=add&data=
{
"label":{

		"en":{"language":"en","value":"Sina"},
		"de":{"language":"de","value":"Sina"},
		"la":{"language":"la","value":"Sina"},
		"fy":{"language":"fy","value":"Sina"},
		"nl":{"language":"nl","value":"Sina"},
		"mg":{"language":"mg","value":"Sina"}

},
"links":{

		"en":{"site":"en","title":"Sina"},
		"de":{"site":"de","title":"Sina"},
		"la":{"site":"la","title":"Sina"},
		"fy":{"site":"fy","title":"Sina (betsjuttings)"},
		"nl":{"site":"nl","title":"Sina"},
		"mg":{"site":"mg","title":"Sina"}

}
}

It should be
http://wikidata-test-repo.wikimedia.de/w/api.php?action=wbsetitem&format=xml&item=add&data=
{
"label":{

		"en":"Sina",
		"de":"Sina",
		"la":"Sina",
		"fy":"Sina",
		"nl":"Sina",
		"mg":"Sina"

},
"links":{

		"enwiki":"Sina",
		"dewiki":"Sina",
		"lawiki":"Sina",
		"fywiki":"Sina (betsjuttings)",
		"nlwiki":"Sina",
		"mgwiki":"Sina"

}
}

The first form has been invalid since a previous change to the internal transform from an array to the internal json structure. It currently does not fail, because the validation checks are bypassed, but it should fail.
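For a bot that already builds the first form, the conversion to the second form can be sketched as below. This is an illustration only: `flatten_item_data` is a hypothetical helper, and deriving the site id by appending "wiki" to the language code is an assumption taken from the examples in this comment, not a general rule.

```python
def flatten_item_data(data, site_suffix="wiki"):
    """Convert the nested form into the flat form shown above.

    Assumes every label entry has a "value" key and every link entry has
    "site" and "title" keys, as in the examples in this thread.
    The site id is built as <site> + site_suffix (e.g. "en" -> "enwiki"),
    which is an assumption based on those examples.
    """
    labels = {
        lang: entry["value"]
        for lang, entry in data.get("label", {}).items()
    }
    links = {
        entry["site"] + site_suffix: entry["title"]
        for entry in data.get("links", {}).values()
    }
    return {"label": labels, "links": links}


nested = {
    "label": {"en": {"language": "en", "value": "ABC"}},
    "links": {"en": {"site": "en", "title": "ABC"}},
}
flat = flatten_item_data(nested)
```

The helper does no normalization of page titles; that remains the caller's job.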

Second call reads something like
http://wikidata-test-repo.wikimedia.de/w/api.php?action=wbsetitem&format=xml&item=add&data=
{
"label":{

		"en":{"language":"en","value":"ABC"}

},
"links":{

		"en":{"site":"en","title":"ABC"}

}
}

It should be
http://wikidata-test-repo.wikimedia.de/w/api.php?action=wbsetitem&format=xml&item=add&data=
{
"label":{

		"en":"ABC"

},
"links":{

		"enwiki":"ABC"

}
}

Note that upcoming changes will make the previous examples fail hard.
https://gerrit.wikimedia.org/r/#/c/14762/

See also the documentation on MediaWiki.org
http://www.mediawiki.org/wiki/Extension:Wikibase/API#wbsetitem

Use of wbsetitem to set sitelinks still bypasses normalization, and it is the bot operator's responsibility to check and verify that only valid canonical page names are used. Verification of the external page name will (probably) be added later.

As I already said, the bug is not about the reported error, but about the invalid format returned because of an error (there can be many other kinds of errors in the future).

My SAXParser throws an exception while reading the input stream from the TCP socket, because it expects valid XML if the response header contains a 2xx status code.

If you wrap the error in an XML message, everything would be OK and the bot could do error handling based on the error code.
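Until every error is wrapped, a bot can guard against both cases on its side. The following is a minimal sketch in Python rather than the SAX-based code my bot uses; `NotXmlResponse` and `extract_api_error` are hypothetical names, and the only structure assumed is the `<api><error code="..." info="...">` shape shown in the second response of the description.

```python
import xml.etree.ElementTree as ET


class NotXmlResponse(Exception):
    """Raised when the body is not well-formed XML, e.g. a plain-text fatal error."""


def extract_api_error(body):
    """Return (code, info) for an <api><error .../></api> document.

    Returns None if the document is well-formed but has no <error> child.
    Raises NotXmlResponse if the body cannot be parsed as XML at all.
    """
    try:
        root = ET.fromstring(body)
    except ET.ParseError as exc:
        # Keep only the start of the body so the message stays readable.
        raise NotXmlResponse(body[:200]) from exc
    error = root.find("error")
    if error is None:
        return None
    return error.get("code"), error.get("info")


wrapped = (
    '<?xml version="1.0"?><api>'
    '<error code="internal_api_error_DBQueryError" '
    'info="Database query error" xml:space="preserve">...</error></api>'
)
code_and_info = extract_api_error(wrapped)
```

With this split, a wrapped error can be handled by its code, while a plain-text body is reported as a separate failure instead of crashing the parser.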

Just to let you know:

  • My bot uses POST requests
  • All my submitted page titles should be normalised because these strings are extracted from a previous api request containing a /page/@title attribute value.

Sounds good. Note also that the list "label" should use the keyword "labels", "description" should use "descriptions", and "links" should use "sitelinks".