
HTML entities do not work with inline queries of n-ary properties
Closed, Resolved, Public

Description

When making an inline query of an n-ary property whose first part is of type Page, if the property value contains a single quote ('), then the query will break if the magic word {{PAGENAME}} is used, but work if the page name is pasted in directly. The attached URL points to an example of this on the sandbox. The first query uses the pasted page name (and works), the second uses {{PAGENAME}} (and fails), and the third uses a non-n-ary property with {{PAGENAME}} (and works).

This appears to be a problem with escaping, somewhere in the n-ary code.


Version: unspecified
Severity: normal
URL: http://sandbox.semantic-mediawiki.org/index.php?oldid=19162

Details

Reference
bz21926

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:58 PM
bzimport set Reference to bz21926.

The actual problem is that {{PAGENAME}} returns an HTML entity in this example, and this entity is terminated by a semicolon that is mistaken by SMW for the separator of an n-ary value. The example page has been updated to illustrate this.
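To make the failure mode concrete, here is a minimal Python sketch (not SMW's actual PHP code). It assumes that {{PAGENAME}} delivers the apostrophe as the numeric entity `&#39;` and that the n-ary parser splits at every semicolon; the value `John&#39;s page; 42` and both function names are hypothetical. The second function shows one possible way to skip semicolons that close an entity. It is only an illustration and would also misread plain text such as `A&B;` as an entity.

```python
import re

# Hypothetical n-ary value: a page name with an apostrophe, then a number.
# Assumption: the apostrophe arrives as the entity "&#39;".
raw_value = "John&#39;s page; 42"


def naive_split(value):
    """Split at every semicolon, including the one that closes an entity."""
    return [part.strip() for part in value.split(";")]


# Matches either a complete entity (captured in group 1) or a bare semicolon.
ENTITY_OR_SEMICOLON = re.compile(
    r"(&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);)|;"
)


def entity_aware_split(value):
    """Split at semicolons, but keep entities such as "&#39;" intact."""
    parts, current, pos = [], [], 0
    for match in ENTITY_OR_SEMICOLON.finditer(value):
        current.append(value[pos:match.start()])
        if match.group(1):
            # An entity: keep it, including its terminating semicolon.
            current.append(match.group(1))
        else:
            # A bare semicolon: it really is a separator.
            parts.append("".join(current).strip())
            current = []
        pos = match.end()
    current.append(value[pos:])
    parts.append("".join(current).strip())
    return parts


print(naive_split(raw_value))         # three parts: the entity is cut in two
print(entity_aware_split(raw_value))  # two parts: page name and number
```

With the naive split the page name is cut in the middle of the entity, so the first component no longer names an existing page and the query cannot match.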

Alas, this gets us into encoding of special characters. SMW is rather faithful now in this respect: it really makes a difference whether you write "&ouml;" or "ö" in an annotation. In general, SMW sticks to what it gets from MediaWiki and does not do additional encoding or decoding. The problem is that neither encoding nor decoding is idempotent: if you execute either action twice, you may get different results from doing it just once. And doing it twice usually leads to wrong results. For example, if you write "&amp;ouml;" then you would expect a Unicode equivalent of "&ouml;", which would be rendered in HTML again as "&amp;ouml;" but never as "&ouml;".
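The non-idempotence can be checked with Python's standard `html` module. This is only an illustration of the general point, assuming the entity `&ouml;` as the example; it says nothing about which MediaWiki or SMW functions are involved.

```python
import html

# Source text meant to display the six literal characters "&ouml;".
text = "&amp;ouml;"

decoded_once = html.unescape(text)           # the literal text "&ouml;"
decoded_twice = html.unescape(decoded_once)  # the single character "ö"

encoded_once = html.escape("&ouml;")         # "&amp;ouml;"
encoded_twice = html.escape(encoded_once)    # "&amp;amp;ouml;"

# Applying either step twice changes the result again.
print(decoded_once != decoded_twice)
print(encoded_once != encoded_twice)
```

Decoding once yields the intended literal text, while decoding a second time silently turns it into "ö", which is the wrong result described above.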

This is why we cannot blindly decode references unless we are sure that it has not been done before. Ideally, we would treat HTML entity inputs just the same as the characters they encode. In fact, SMW used to do this for annotations at some point. It seems that the behaviour of MediaWiki has changed since then, or some change in SMW has led to the new behaviour. When trying to preserve compatibility with existing MW versions, it is necessary to understand what changed and when. More investigation is needed to find out whether or not MediaWiki decodes/encodes any entities in strings that various parts of SMW receive (clearly, this could be different for parser functions like #ask and for parsing extensions like our semantic links). Depending on this information, we can try to normalise SMW's stored data in such a way that it is feasible to apply entity decoding to inputs that properties in #ask receive (note that decoding, splitting at the remaining ";", and encoding the string again does not lead to the same input that the user has given; e.g. a given "&ouml;" would turn into "ö", which would no longer match unless the DB stores "&ouml;" and "ö" in the same way, which it currently does not).
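The round-trip caveat in the parenthesis can be sketched as follows. The function name, the input `K&ouml;ln; 5`, and the stored form are hypothetical, and Python's `html.escape` stands in for whatever encoder would be used; it only escapes `&`, `<`, `>` and quotes, so it never recreates `&ouml;`.

```python
import html


def decode_split_encode(value):
    """Decode entities, split at the remaining semicolons, then re-encode."""
    decoded = html.unescape(value)
    parts = [part.strip() for part in decoded.split(";")]
    return [html.escape(part) for part in parts]


# Hypothetical query input, and the form assumed to be stored verbatim
# from an annotation that was written with the entity.
user_input = "K&ouml;ln; 5"
stored_form = "K&ouml;ln"

result = decode_split_encode(user_input)

# The split is now correct, but the first part is "Köln", not "K&ouml;ln",
# so an exact comparison against the stored form fails.
print(result)
print(result[0] == stored_form)
```

The comparison only succeeds if both spellings are normalised to the same stored form, which is the normalisation the comment proposes to investigate.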

*** Bug 25178 has been marked as a duplicate of this bug. ***
Kghbln closed this task as Resolved. Edited Nov 29 2014, 12:47 PM
Kghbln subscribed.

Fixed with pull request 654 by MWJames.