
Auto URL linker in API formatted output doesn't match Wikipedia articles with parens
Closed, Resolved (Public)

Description

Wikipedia URLs frequently contain parentheses from article disambiguation markers. The automatic URL highlighting in the API's jsonfm and xmlfm output doesn't handle these. For instance, this bit:

<Url>http://en.wikipedia.org/wiki/HMS_Neptune_(1797)</Url>

gets formatted to:

<span style="color: blue;">&lt;Url&gt;</span><a href="http://en.wikipedia.org/wiki/HMS_Neptune_">http://en.wikipedia.org/wiki/HMS_Neptune_</a>(1797)<span style="color: blue;">&lt;/Url&gt;</span>

cutting off the "(1797)" from the link. The autolinker should be a bit friendlier to Wikipedia here; at a minimum, standard heuristics would accept a parenthetical as long as it's opened within the link. Alternatively, since URLs here are generally standalone rather than part of running text, it could just extend the link until an illegal character such as " or < appears.
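
To make the two suggestions concrete, here is a rough sketch in Python. It is not the API's actual formatter code: the pattern and the names URL_RE, trim_url and find_urls are made up for illustration, and it assumes the linker sees the raw, unescaped text. The match runs until whitespace, a quote or an angle bracket, and a trailing ")" is only dropped when it has no matching "(" inside the URL.

```python
import re
from typing import List

# Hypothetical pattern (an assumption, not the formatter's real regex):
# a URL runs until whitespace, a double quote, or an angle bracket.
URL_RE = re.compile(r'https?://[^\s"<>]+')


def trim_url(candidate: str) -> str:
    """Drop trailing punctuation, keeping a ')' that closes a '(' in the URL."""
    while candidate:
        last = candidate[-1]
        if last in ".,;:!?":
            # Sentence punctuation right after a URL is rarely part of it.
            candidate = candidate[:-1]
        elif last == ")" and candidate.count(")") > candidate.count("("):
            # Unbalanced ')': it belongs to the surrounding text.
            candidate = candidate[:-1]
        else:
            break
    return candidate


def find_urls(text: str) -> List[str]:
    """Return every URL in text after applying the trimming heuristic."""
    return [trim_url(m.group(0)) for m in URL_RE.finditer(text)]


print(find_urls("<Url>http://en.wikipedia.org/wiki/HMS_Neptune_(1797)</Url>"))
print(find_urls("(see http://example.org/page)."))
```

With this heuristic the HMS Neptune URL keeps its "(1797)" suffix, while a ")" that merely closes a parenthesis in the surrounding prose stays outside the link.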


Version: unspecified
Severity: enhancement
URL: http://en.wikipedia.org/w/api.php?action=opensearch&search=Neptune&limit=50&format=xmlfm

Details

Reference
bz17182

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:28 PM
bzimport set Reference to bz17182.

Should be easy to fix. Ironically, though, Bugzilla doesn't handle the (1797) part completely right either: the closing parenthesis isn't linked.