Parsoid chokes on unescaped single quotes in URLs
Closed, Resolved (Public)

Description

See https://en.wikipedia.org/wiki/United_States_v._11_1/4_Dozen_Packages_of_Articles_Labeled_in_Part_Mrs._Moffat%27s_Shoo-Fly_Powders_for_Drunkenness. For the last link on that page ([http://archive.nlm.nih.gov/fdanj/items-by-subject?subject=Mrs.+Moffat's+Shoo+Fly+Powders+for+Drunkenness Case information at the NIH National Library of Medicine]), VisualEditor (VE) incorrectly determines which part is the URL and which part is the displayed text. The URL is cut off at the single quote, so the page text within VE reads "'s+Shoo+Fly+Powders+for+Drunkenness Case information at the NIH National Library of Medicine".


Version: unspecified
Severity: minor

Details

Reference
bz49688

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 1:57 AM
bzimport added a project: Parsoid.
bzimport set Reference to bz49688.

A Parsoid issue, though really it's a very messy URL.

Example breaking wikitext:

[http://example.org/Foo'bar No bar in title]

Should be:

<p data-parsoid="…"><a rel="mw:ExtLink" href="http://example.org/Foo'bar" data-parsoid="…">No bar in title</a></p>

Instead is:

<p data-parsoid="…"><a rel="mw:ExtLink" href="http://example.org/Foo" data-parsoid="…">'bar No bar in title</a></p>
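
The difference between the two outputs above comes down to where the URL is considered to end. The following Python sketch illustrates that, under the assumption that a bracketed external link's URL runs up to the first whitespace. It is not Parsoid's tokenizer; the names split_ext_link and split_ext_link_buggy are hypothetical and exist only for this illustration.

```python
import re

# Hypothetical pattern (not Parsoid's grammar): the URL is everything after
# "[" up to the first whitespace; the rest, up to "]", is the link text.
EXT_LINK = re.compile(r"^\[(?P<href>[^\s\]]+)(?:\s+(?P<label>[^\]]*))?\]$")

# Variant that also ends the URL at a single quote, which reproduces the
# symptom reported in this task.
BUGGY_EXT_LINK = re.compile(r"^\[(?P<href>[^\s\]']+)(?P<label>[^\]]*)\]$")


def split_ext_link(wikitext):
    """Return (href, label) for a bracketed external link, or None."""
    match = EXT_LINK.match(wikitext)
    if match is None:
        return None
    return match.group("href"), match.group("label") or ""


def split_ext_link_buggy(wikitext):
    """Same as split_ext_link, but stops the URL at the first single quote."""
    match = BUGGY_EXT_LINK.match(wikitext)
    if match is None:
        return None
    return match.group("href"), match.group("label")


if __name__ == "__main__":
    sample = "[http://example.org/Foo'bar No bar in title]"
    print(split_ext_link(sample))
    print(split_ext_link_buggy(sample))
```

With the sample wikitext from this comment, the first function keeps 'bar inside the href, while the second leaves it at the start of the link text, matching the "Instead is" output.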

[Parsoid component reorg by merging JS/General and General. See bug 50685 for more information. Filter bugmail on this comment. parsoidreorg20130704]

Change 126853 had a related patch set uploaded by Cscott:
Fix a number of link-parsing and serialization issues.

https://gerrit.wikimedia.org/r/126853

Change 126853 merged by jenkins-bot:
Fix a number of link-parsing and serialization issues.

https://gerrit.wikimedia.org/r/126853