
Encoding problems in property parser function
Closed, DeclinedPublic

Description

I added the property "Commons category" (P373) to a lot of items, see for example https://www.wikidata.org/wiki/Q1008 . It contains a string with the name of the category on Commons.

On the Dutch Wikipedia I added some parser functions to find problems with the data, see https://nl.wikipedia.org/w/index.php?title=Sjabloon%3ACommonscat&diff=37414305&oldid=36545480

I noticed a large number of pages with "funny" characters in https://nl.wikipedia.org/wiki/Categorie:Wikipedia:Commonscat_met_lokaal_andere_link_dan_op_Wikidata . This category is for pages that have a local value set which differs from the one on Wikidata. Some pages ended up there that shouldn't have. Take for example https://nl.wikipedia.org/wiki/Ivoorkust : on the Dutch Wikipedia it links to exactly the same category as its item on Wikidata does.
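The tracking check described above boils down to a plain string comparison between the local template parameter and the Wikidata value. A minimal Python model of that logic (the function name is hypothetical, and the escaped Wikidata value is an assumption for illustration) shows how an identical category name can still land in the mismatch category:

```python
def needs_tracking_category(local_value: str, wikidata_value: str) -> bool:
    """Hypothetical model of the #ifeq check: flag the page when a local
    value is set and it is not byte-for-byte equal to the Wikidata value."""
    if not local_value:
        return False  # no local override, so there is nothing to compare
    return local_value != wikidata_value


local = "Côte d'Ivoire"
from_wikidata = "Côte d&#39;Ivoire"  # assumed: apostrophe returned HTML-escaped
print(needs_tracking_category(local, from_wikidata))  # True: a false positive
```

Because the comparison is purely textual, any difference in encoding is reported as a different category, even though both strings render identically.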

My assumption is that the encoding gets mangled somewhere.


Version: unspecified
Severity: normal

Details

Reference
bz47619

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 1:35 AM
bzimport set Reference to bz47619.
bzimport added a subscriber: Unknown Object (MLST).

It seems that #property encodes some characters the same way PAGENAME does. Is that necessary? It makes checking for matches with #ifeq confusing, whether against PAGENAME or #property. In the example above, the template argument in "Commonscat|Côte d'Ivoire" does not match #property:P373, which returns "Côte d&#39;Ivoire". If this encoding is intended, we will need a function, or a Lua-based template, for encoding and decoding page names and Wikidata property values.
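One way to make such comparisons robust is to decode HTML character references on both sides before comparing. The Python sketch below illustrates the idea with `html.unescape`; the helper name is hypothetical, and treating underscores as spaces is an extra assumption about how page names are written. On-wiki, a Lua module could presumably do something similar with Scribunto's `mw.text.decode`.

```python
import html


def same_page_name(a: str, b: str) -> bool:
    """Hypothetical helper: compare two page names after decoding HTML
    character references (e.g. &#39;) and treating '_' as a space."""
    def normalize(s: str) -> str:
        return html.unescape(s).replace("_", " ").strip()
    return normalize(a) == normalize(b)


print(same_page_name("Côte d'Ivoire", "Côte d&#39;Ivoire"))  # True
print(same_page_name("Côte d'Ivoire", "Ivoorkust"))          # False
```

Normalizing both inputs, rather than only the Wikidata value, keeps the check correct whichever side happens to be encoded.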

Seems to have been fixed since.