Cite: Reference extension outputs unescaped fragments in parsoid
Closed, ResolvedPublic

Description

https://parsoid.wmflabs.org/enwiki/Channel_Tunnel?oldid=581851617

Parsoid is outputting IDs with spaces in the fragment.

<li about="#cite_note-EU reg impact 220-222-86" id="cite_note-EU reg impact 220-222-86" data-parsoid="{}"><span rel="mw:referencedBy"><a href="#cite_ref-EU reg impact 220-222-86-0">↑</a></span> European Commission pp. 220–222</li>

PHP output:

<li id="cite_note-EU_reg_impact_220-222-86"><span class="mw-cite-backlink"><b><a href="#cite_ref-EU_reg_impact_220-222_86-0"><span class="cite-accessibility-label">Jump up </span>^</a></b></span> <span class="reference-text">European Commission pp. 220–222</span></li>


Version: unspecified
Severity: normal

Details

Reference
bz57252

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 2:23 AM
bzimport added a project: Parsoid-DOM.
bzimport set Reference to bz57252.
*** This bug has been marked as a duplicate of bug 55400 ***

This looks like a separate issue (no id munging similar to the XHTML4-style PHP behavior), so reopening.

Restrictions on id attributes in HTML5: http://www.whatwg.org/specs/web-apps/current-work/multipage/elements.html#the-id-attribute

The same restrictions apply to <a> fragment links, since #-style page anchors resolve to element ids.

RDF about attributes don't accept spaces either. So applying the id restrictions to the ref target, and then using the result in <span id>, <li about/id>, and <a href>, should work.
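
To make the proposal concrete, here is a minimal sketch of the idea, not the code from the actual patch. It assumes ASCII whitespace is replaced with underscores (borrowed from the PHP output quoted above); the helper names html5_safe_id and cite_note_attrs are hypothetical and exist only for illustration.

```python
import re

# ASCII whitespace as defined by HTML5: space, tab, LF, FF, CR.
_ASCII_WS = re.compile(r"[ \t\n\f\r]+")


def html5_safe_id(raw: str) -> str:
    """Return a value usable as an HTML5 id: non-empty, no ASCII whitespace.

    Assumption: whitespace runs become "_" (mirrors the PHP output above).
    """
    cleaned = _ASCII_WS.sub("_", raw.strip(" \t\n\f\r"))
    if not cleaned:
        raise ValueError("an HTML5 id must contain at least one character")
    return cleaned


def cite_note_attrs(ref_name: str, index: int) -> dict:
    """Derive id, about and href from one sanitized target so they stay in sync."""
    target = html5_safe_id(f"cite_note-{ref_name}-{index}")
    return {"id": target, "about": "#" + target, "href": "#" + target}


if __name__ == "__main__":
    print(cite_note_attrs("EU reg impact 220-222", 86))
```

Deriving all three attributes from a single sanitized string means the <li id>, the about value, and any <a href> pointing at it cannot drift apart.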

Change 96892 had a related patch set uploaded by Subramanya Sastry:
(Bug 57252) Generate HTML5-compliant cite id/about attr values

https://gerrit.wikimedia.org/r/96892

I'd love to leave the weird XHTML4 encoding I added a while ago behind. We should investigate whether people are linking to current citations from other pages. If they do, then we might want to consider not breaking those links for now.

(In reply to comment #5)

> I'd love to leave the weird XHTML4 encoding I added a while ago behind. We should investigate whether people are linking to current citations from other pages. If they do, then we might want to consider not breaking those links for now.

They are used in links from the talk page. Those could be addressed with oldid links, if the old versions kept the same ids, though that may be more work.

Change 105406 had a related patch set uploaded by Subramanya Sastry:
(Bug 57252) Generate HTML5-compliant cite id/about attr values

https://gerrit.wikimedia.org/r/105406

Change 96892 abandoned by Subramanya Sastry:
(Bug 57252) Generate HTML5-compliant cite id/about attr values

Reason:
Migrated to the new repo (https://gerrit.wikimedia.org/r/#/c/105406/)

https://gerrit.wikimedia.org/r/96892

Change 105406 merged by jenkins-bot:
(Bug 57252) Generate HTML5-compliant cite id/about attr values

https://gerrit.wikimedia.org/r/105406