Page MenuHomePhabricator

Claim.fromJSON() fails on a deleted property
Closed, ResolvedPublic

Description

Today my bot encountered https://www.wikidata.org/wiki/Q3211336 . This contains the deleted property P288.

This crashes the bot:

Traceback (most recent call last):

File "C:\pywikibot\core\article_to_category.py", line 97, in <module>
  main()
File "C:\pywikibot\core\article_to_category.py", line 94, in main
  bot.run()
File "C:\pywikibot\core\article_to_category.py", line 52, in run
  if not u'P910' in itemB.get().get('claims').keys():
File "C:\pywikibot\core\pywikibot\page.py", line 2571, in get
  c = Claim.fromJSON(self.repo, claim)
File "C:\pywikibot\core\pywikibot\page.py", line 2776, in fromJSON
  if claim.getType() == 'wikibase-item':
File "C:\pywikibot\core\pywikibot\page.py", line 2715, in getType
  self.type = self.repo.getPropertyType(self)
File "C:\pywikibot\core\pywikibot\site.py", line 3533, in getPropertyType
  dtype = data['entities'][prop.getID().lower()]['datatype']

KeyError: u'p288'

If you look at https://git.wikimedia.org/blob/pywikibot%2Fcore.git/a1d63c8ba608b87d3a3aff6acc8dbbc29085eb8f/pywikibot%2Fpage.py from line 2776 and furthere, you'll see that it tries to lookup the type of the property. That fails because the property is deleted.

I don't think the lookup is needed, see https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q3211336&format=json :

"P288": [

{
    "id": "q3211336$00f2cba0-4c06-ee8d-9598-d8c034b663ed",
    "mainsnak": {
        "snaktype": "value",
        "property": "P288",
        "datavalue": {
            "value": {
                "entity-type": "item",
                "numeric-id": 2304194
            },
            "type": "wikibase-entityid"
        }
    },
    "type": "statement",
    "rank": "normal"
}

]

This should be used to construct the claim instead of the extra lookup


Version: core-(2.0)
Severity: normal

Details

Reference
bz54235

Related Objects

Mentioned Here
P18 my paste!

Event Timeline

bzimport raised the priority of this task from to Low.Nov 22 2014, 1:55 AM
bzimport set Reference to bz54235.

Unfortunately that won't work in some cases like commonsMedia or url. If you look at
https://www.wikidata.org/w/api.php?action=wbgetentities&ids=q76&format=jsonfm:

"property": "P18",
"datavalue": {
    "value": "President Barack Obama.jpg",
    "type": "string"
}

We could probably implement a fallback for the ones that will work like entity, coordinate and time, and emit a warning like "P288 does not exist"

This error is probably rare to run into since the datatype is cached forever, but most scripts need to re-cache everything because the ids went uppercase.

It looks like the problem mentioned in comment 1 is no longer an problem, as the JSON now includes the mainsnak datatype, whereas I assume it didnt previously based on comment 1.

e.g. q76 JSON now includes "datatype"

"P18": [

{
    "id": "q76$AA25BE21-FFCA-4C0A-A7B4-C041CBE549F7",
    "mainsnak": {
        "snaktype": "value",
        "property": "P18",
        "datatype": "commonsMedia",
        "datavalue": {
            "value": "President Barack Obama.jpg",
            "type": "string"
        }
    },

I ran into a very tightly related bug while loading items from an xml dump, without access to the wiki the dump was from - such as with a firewall in the way.

If I understand this correctly, any time the JSON parser finds a 'datatype' in a snak, it can use that instead of doing a lookup. The JSON *may* not include this, in which a lookup is required, and if a lookup isnt possible then the code should raise an appropriate exception.

(In reply to John Mark Vandenberg from comment #2)

It looks like the problem mentioned in comment 1 is no longer an problem, as
the JSON now includes the mainsnak datatype, whereas I assume it didnt
previously based on comment 1.

Awesome!

If I understand this correctly, any time the JSON parser finds a 'datatype'
in a snak, it can use that instead of doing a lookup. The JSON *may* not
include this, in which a lookup is required, and if a lookup isnt possible
then the code should raise an appropriate exception.

Yup, sounds good.

Change 137737 had a related patch set uploaded by Ricordisamoa:
initialize PropertyPage directly with datatype

https://gerrit.wikimedia.org/r/137737

Change 137737 merged by jenkins-bot:
initialize Property directly with datatype

https://gerrit.wikimedia.org/r/137737