Page MenuHomePhabricator

VisualEditor: Re-used refs wrongly expanded to <ref name="…" ></span>
Closed, ResolvedPublic

Description

Details

Reference
bz47417

Event Timeline

This seems to be Firefox specific. I am seeing this on a lot of pages in Firefox, but not in Chrome. Gabriel can reproduce as well.

  • Bug 47419 has been marked as a duplicate of this bug. ***

Related URL: https://gerrit.wikimedia.org/r/60840 (Gerrit Change Ib6a0cfb880e769f28b42c9fa63ddc1abc75c399d)

The problem appears to be with the output from Parsoid with regard to <ref> tags not being escaped.

To give http://parsoid.wmflabs.org/en/Foobar as an example, in Chrome this looks fine:

<span id="cite_ref-rfc3092-1-0" class="reference" about="#mwt5" typeof="mw:Object/Ext/Ref" data-parsoid="{&quot;src&quot;:&quot;<ref name=\&quot;rfc3092\&quot; />;&quot;,&quot;tsr&quot;:[416,438],&quot;dsr&quot;:[416,438,null,null]}"><a href="#cite_note-rfc3092-1" data-parsoid="{}">[1]</a></span>
<meta typeof="mw:Ext/Ref/Marker" about="#mwt5" group="" name="rfc3092" content="" skiplinkback="0" data-parsoid="{&quot;tsr&quot;:[416,438],&quot;selfClose&quot;:true,&quot;src&quot;:&quot;&lt;ref name=\&quot;rfc3092\&quot; /&gt;&quot;,&quot;dsr&quot;:[416,438,22,0]}">

... but in Firefox the browser presumably doesn't like the un-escaped "<ref name=... />" in the attribute of the <span> and so it does this:

<span id="cite_ref-rfc3092-1-0" class="reference" about="#mwt5" data-parsoid="{"src":"<ref name=\"rfc3092\" />","tsr":[416,438],"dsr":[416,438,null,null]}" typeof="mw:Object/Ext/Ref"></span>
<meta typeof="mw:Ext/Ref/Marker" about="#mwt5" group="" name="rfc3092" content="" skiplinkback="0" data-parsoid="{"tsr":[416,438],"selfClose":true,"src":"<ref name=\"rfc3092\" />","dsr":[416,438,22,0]}"></meta>

I.e., closing the <span> early and throwing away the contents. Note that the following meta is also fragile as it contains unescaped JSON as well.

The result of this is that <ref name="…" /> is wrongly expanded to <ref name="…" ></span> and so corrupted on save by Firefox users.

This is what I see using wget, FF, and the parse.js in the shell:

<span id="cite_ref-rfc3092-1-0" class="reference" about="#mwt5" typeof="mw:Object/Ext/Ref" data-parsoid='{"src":"<ref name=\"rfc3092\" />","tsr":[416,438],"dsr":[416,438,null,null]}'><a href="#cite_note-rfc3092-1" data-parsoid="{}">[1]</a></span><meta typeof="mw:Ext/Ref/Marker" about="#mwt5" group="" name="rfc3092" content="" skiplinkback="0" data-parsoid='{"tsr":[416,438],"selfClose":true,"src":"<ref name=\"rfc3092\" />","dsr":[416,438,22,0]}'>

Make sure you are not looking at a pretty-printed inspect tool. Either use innerHTML (which will use non-smart quoting) or the actual page source (ctrl-u in Firefox).

From what I see Firefox parses our HTML just fine on the way in, with the entire JSON string being parsed as the attribute value. Do you actually see a parsing difference on the way in resulting in a different DOM?

Changing the title back, as I am pretty sure this is not a bug on our end.

This is a bug in jQuery. To reproduce:

$( '<div>' ).html( '<span data-parsoid="{&quot;src&quot;:&quot;<ref name=\&quot;Foo\&quot; />&quot;,&quot;tsr&quot;:[32,50],&quot;dsr&quot;:[32,50,null,null]}">' ).html();

"<span data-parsoid="{&quot;src&quot;:&quot;<ref name=&quot;Foo&quot; ></span>&quot;,&quot;tsr&quot;:[32,50],&quot;dsr&quot;:[32,50,null,null]}"></span>"

(In reply to comment #6)

This is a bug in jQuery. To reproduce:

That confirms my suspicion. The bug appeared first after you started using it to avoid dropping figure elements.

Ed's quick hack is now merged, but we still need to fix this - upstream bug in jQuery or just a work around?

The reason this doesn't happen in Chrome is because Chrome normalizes < to &lt; in attribute values in .innerHTML. Firefox doesn't do this and gives us the original < as originally provided by Parsoid.

The presence of these <> then triggers a jQuery regex looking for self-closing tags:

/<(?!area|br|col|embed|hr|img|input|link|meta|param)(([\w:]+)[^>]*)\/>/gi

...which believes our span (!) is self-closing because it starts with <span and contains /> later, so it helpfully "fixes" our HTML and replaces the /> with </span>

(In reply to comment #7)

(In reply to comment #6)

This is a bug in jQuery. To reproduce:

That confirms my suspicion. The bug appeared first after you started using it
to avoid dropping figure elements.

Yup, which is why we can't just go back, because that would break <caption> elements again.

Filed upstream as http://bugs.jquery.com/ticket/13821

  • Bug 47430 has been marked as a duplicate of this bug. ***

This is an instance of the jQuery bug 47737 - marking as FIXED (in the specific instance) but we need a wider fix.