
Invalid extension attribute encoding
Closed, Resolved (Public)

Description

There was a round-trip (rt) testing crasher:

error in ruwiki:Калашников,_Анатолий_Иванович
TypeError: Object Некоторые работы А. Калашникова на [[почтовая марка|почтовых марках]],[object Object], ,[object Object],[[СССР]] has no method 'replace'
at Object.Util.escapeEntities (/usr/lib/parsoid/src/lib/mediawiki.Util.js:1511:14)
at WikitextSerializer.WSP._serializeAttributes (/usr/lib/parsoid/src/lib/mediawiki.WikitextSerializer.js:3289:15)
at WikitextSerializer.WSP._buildExtensionWT (/usr/lib/parsoid/src/lib/mediawiki.WikitextSerializer.js:3533:17)

We found a minimal test case:
echo '<gallery caption="&nbsp;"></gallery>' | tests/parse.js --wt2wt
or
echo '<math title="&nbsp;"></math>' | tests/parse.js --wt2wt

The emitted HTML is:
<span data-parsoid='{"src":"&lt;math title=\"&amp;nbsp;\">&lt;/math>","dsr":[0,28,2,2]}' typeof="mw:Extension/math" data-mw='{"name":"math","attrs":{"title":[{"type":"TagTk","name":"span","attribs":[{"k":"typeof","v":"mw:Entity"}],"dataAttribs":{"src":"&amp;nbsp;","srcContent":" ","tsr":[13,13]}}," ",{"type":"EndTagTk","name":"span","attribs":[],"dataAttribs":{"tsr":[19,19]}}]},"body":{"extsrc":""}}' about="#mwt3">
</span>

Note the "type":"TagTk" in the data-mw attrs.title value. Tokens don't belong there!
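
A quick way to spot this kind of output is to check whether every extension attribute in the parsed data-mw object is a plain string. The snippet below is only a sketch: `findNonStringAttrs` is a hypothetical helper, not something that exists in Parsoid, and the sample object is a trimmed-down version of the data-mw shown above.

```javascript
// Hypothetical check, not part of Parsoid: list the extension
// attributes in a parsed data-mw object whose values are not strings.
function findNonStringAttrs(dataMw) {
    var attrs = (dataMw && dataMw.attrs) || {};
    return Object.keys(attrs).filter(function(name) {
        return typeof attrs[name] !== 'string';
    });
}

// Trimmed-down version of the data-mw from the emitted HTML above.
var sampleDataMw = {
    name: 'math',
    attrs: {
        title: [
            { type: 'TagTk', name: 'span' },
            '\u00a0',
            { type: 'EndTagTk', name: 'span' }
        ]
    },
    body: { extsrc: '' }
};

console.log(findNonStringAttrs(sampleDataMw)); // [ 'title' ]
```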


Version: unspecified
Severity: normal

Details

Reference
bz62663

Event Timeline

bzimport raised the priority of this task to Needs Triage. (Nov 22 2014, 2:52 AM)
bzimport added a project: Parsoid.
bzimport set Reference to bz62663.

(Bug 62664 is for WTS crashing -- it shouldn't crash, even if the input is bogus.)
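
As a rough illustration of "shouldn't crash": the stack trace shows a string-only method (`replace`) being called on a non-string value. A guard along the following lines would avoid the TypeError. This is a sketch under assumptions, not the actual fix for bug 62664: `escapeAttrSafely` and its `fallback` parameter are hypothetical, and `escapeFn` stands in for whatever string-only escaper the serializer uses.

```javascript
// Hypothetical guard, not Parsoid code: only hand real strings to a
// string-only escaper; otherwise return a caller-supplied fallback
// instead of throwing.
function escapeAttrSafely(escapeFn, value, fallback) {
    if (typeof value === 'string') {
        return escapeFn(value);
    }
    // Non-string input (for example a token array) ends up here.
    return fallback;
}

// Stand-in escaper for the demo; the real one may behave differently.
function demoEscape(s) {
    return s.replace(/&/g, '&amp;');
}

console.log(escapeAttrSafely(demoEscape, 'a & b', ''));          // a &amp; b
console.log(escapeAttrSafely(demoEscape, [{ type: 'TagTk' }], '')); // prints an empty line
```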

Known issue. We don't support templated attributes for extensions. From ext.core.ExtensionHandler.js:


// SSS FIXME: We seem to have a problem on our hands here.
//
// AttributeExpander runs after ExtensionHandler which means
// the native handlers will not receive fully expanded tokens.
//
// In the case of Cite.ref and Cite.references, this is not an issue
// since the final processing takes place in the DOM PP phase,
// by which time the marker tokens would have had everything expanded.
// But, this may not be true for other exensions.
//
// So, we wont be able to robustly support templated ext. attributes
// without a fix for this since attribute values might be ext-generated
// and ext-attribute values might be templated.
//
// The fix might require breaking this cycle by expliclitly-expanding
// ext-attribute-values here in a new pipeline.  TO BE DONE.

We use the same technique for any extension attribute that is not a plain string (which &nbsp; is not). Perhaps we should at least forcibly convert the tokens to a string, which would handle the common case of entities in attribute values.
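
Something like the following could do the forced conversion for the entity case. It is a minimal sketch, not the patch itself: `attrValueToString` is a hypothetical name, and it assumes the token shape seen in the data-mw above (an mw:Entity span TagTk carrying the original source in dataAttribs.src, then the decoded character, then an EndTagTk). Any other token is simply dropped.

```javascript
// Hypothetical helper, not Parsoid's actual code: flatten a mixed
// array of strings and serialized tokens into a plain string.
function attrValueToString(value) {
    if (typeof value === 'string') {
        return value;
    }
    if (!Array.isArray(value)) {
        return String(value);
    }
    var out = '';
    var inEntity = false;
    value.forEach(function(part) {
        if (typeof part === 'string') {
            // Inside an entity span the string is the decoded character;
            // its original source was already emitted from dataAttribs.src.
            if (!inEntity) {
                out += part;
            }
            return;
        }
        if (!part || typeof part !== 'object') {
            return;
        }
        var isEntityStart = part.type === 'TagTk' &&
            (part.attribs || []).some(function(a) {
                return a.k === 'typeof' && a.v === 'mw:Entity';
            });
        if (isEntityStart) {
            inEntity = true;
            out += (part.dataAttribs && part.dataAttribs.src) || '';
        } else if (part.type === 'EndTagTk' && inEntity) {
            inEntity = false;
        }
        // Any other token is dropped in this sketch.
    });
    return out;
}

// Token array modeled on the title attribute from the bug description.
var titleTokens = [
    { type: 'TagTk', name: 'span',
      attribs: [{ k: 'typeof', v: 'mw:Entity' }],
      dataAttribs: { src: '&nbsp;', srcContent: '\u00a0', tsr: [13, 13] } },
    '\u00a0',
    { type: 'EndTagTk', name: 'span', attribs: [], dataAttribs: { tsr: [19, 19] } }
];

console.log(attrValueToString(titleTokens)); // &nbsp;
```

This only covers entities; templated attribute values would still need the pipeline fix described in the FIXME comment.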

Change 118784 had a related patch set uploaded by Subramanya Sastry:
(Bug 62663) Temp fix for handling non-string extension attribute values

https://gerrit.wikimedia.org/r/118784

Change 118784 merged by jenkins-bot:
(Bug 62663) Temp fix for handling non-string extension attribute values

https://gerrit.wikimedia.org/r/118784