
Category links that show up as modified and have a "./" in the <link> href serialize to Badtitletext
Closed, Resolved (Public)

Description

$ echo '<link rel="mw:PageProp/Category" href="./Category:Toxine_bactérienne"/><link rel="mw:PageProp/Category" href="./Category:Toxine_bact%C3%A9rienne"/>' | node tests/parse.js --html2wt --apiURL=http://fr.wikipedia.org/w/api.php

[[MediaWiki:Badtitletext]]
[[MediaWiki:Badtitletext]]

Serialization works correctly if the href matches data-parsoid, but only when the client hasn't URL-encoded the href. This is why VE is introducing corruption like https://fr.wikipedia.org/w/index.php?title=Exotoxine&diff=prev&oldid=107508831, and only in Firefox: Firefox URL-encodes é in hrefs, whereas Chrome doesn't.
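To illustrate the mismatch, here is a minimal sketch of comparing the two hrefs from the command above after percent-decoding. The helpers `decodeHref` and `sameHref` are hypothetical names made up for this example and are not Parsoid functions; the sketch only assumes that comparing decoded forms is one reasonable way to make the check browser-independent.

```javascript
// Hypothetical helper (not Parsoid code): percent-decode an href,
// falling back to the raw string if it contains malformed escapes.
function decodeHref(href) {
    try {
        return decodeURIComponent(href);
    } catch (e) {
        return href;
    }
}

// Hypothetical helper: treat two hrefs as equivalent when their
// decoded forms are identical.
function sameHref(a, b) {
    return decodeHref(a) === decodeHref(b);
}

// The two hrefs from the repro: unencoded (as Chrome sends it) and
// URL-encoded (as Firefox sends it).
const chromeHref = './Category:Toxine_bactérienne';
const firefoxHref = './Category:Toxine_bact%C3%A9rienne';

console.log(chromeHref === firefoxHref);         // false
console.log(sameHref(chromeHref, firefoxHref));  // true
```

A plain string comparison sees these as different links, which matches the "shows up as modified" symptom in the task title.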


Version: unspecified
Severity: major

Details

Reference
bz70894

Event Timeline

bzimport raised the priority of this task to Unbreak Now!. (Nov 22 2014, 3:47 AM)
bzimport set Reference to bz70894.

A quick look at wts.LinkHandler.js reveals that modified links go through wikilink content escaping, which is where this gets tripped up. Modification detection is based on inspecting data-parsoid and comparing it with the href, etc.

So, something is broken in the state.env.isValidLinkTarget(linkTarget) function (used in escapeWikiLinkContentString).

(In reply to ssastry from comment #1)

This is also broken for links with special characters whose hrefs then get URL-encoded. This leads to the links being normalized to underscore form.

$ echo '<a href="../Le_Maillon_faible_%28jeu_t%C3%A9l%C3%A9vis%C3%A9%29" rel="mw:WikiLink" data-parsoid="{&quot;stx&quot;:&quot;piped&quot;,&quot;a&quot;:{&quot;href&quot;:&quot;../Le_Maillon_faible_(jeu_télévisé)&quot;},&quot;sa&quot;:{&quot;href&quot;:&quot;Le Maillon faible (jeu télévisé)&quot;},&quot;dsr&quot;:[133,184,35,2]}" title="Le Maillon faible (jeu télévisé)">Maillon faible</a>' | node tests/parse.js --html2wt --prefix frwiki

[[Le Maillon_faible_(jeu_télévisé)|Maillon faible]]

$ echo '<a href="../Le_Maillon_faible_(jeu_télévisé)" rel="mw:WikiLink" data-parsoid="{&quot;stx&quot;:&quot;piped&quot;,&quot;a&quot;:{&quot;href&quot;:&quot;../Le_Maillon_faible_(jeu_télévisé)&quot;},&quot;sa&quot;:{&quot;href&quot;:&quot;Le Maillon faible (jeu télévisé)&quot;},&quot;dsr&quot;:[133,184,35,2]}" title="Le Maillon faible (jeu télévisé)">Maillon faible</a>' | node tests/parse.js --html2wt --prefix frwiki

[[Le Maillon faible (jeu télévisé)|Maillon faible]]
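The two runs above differ only in whether the href is URL-encoded. As a rough sketch of the behaviour one would want, the function below reuses the original source (`sa.href`) whenever the decoded href equals the decoded `a.href` from data-parsoid. The data-parsoid shape is copied from the example; `safeDecode` and `linkTarget` are hypothetical names, and the fallback for modified links is an assumption, not Parsoid's actual serializer logic.

```javascript
// Hypothetical helper: percent-decode, tolerating malformed escapes.
function safeDecode(s) {
    try {
        return decodeURIComponent(s);
    } catch (e) {
        return s;
    }
}

// Hypothetical sketch: pick the wikilink target for an <a> element.
// dp is the parsed data-parsoid object ({ a: { href }, sa: { href } }).
function linkTarget(href, dp) {
    const current = safeDecode(href);
    if (dp && dp.a && dp.sa && current === safeDecode(dp.a.href)) {
        // Unmodified apart from encoding: reuse the original wikitext.
        return dp.sa.href;
    }
    // Assumed fallback for modified links: strip leading ./ or ../
    // and turn every underscore into a space.
    return current.replace(/^(\.\.?\/)+/, '').replace(/_/g, ' ');
}

const dp = {
    a: { href: '../Le_Maillon_faible_(jeu_télévisé)' },
    sa: { href: 'Le Maillon faible (jeu télévisé)' }
};
const encoded = '../Le_Maillon_faible_%28jeu_t%C3%A9l%C3%A9vis%C3%A9%29';

console.log(linkTarget(encoded, dp)); // Le Maillon faible (jeu télévisé)
```

With this comparison, the Firefox-style encoded href and the unencoded one both resolve to the original source text.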

I thought it was weird the first space didn't get converted to an underscore there, but that seems to be happening in general:

$ echo '<a href="./Le_Maillon_faible_(jeu_télévisé)" rel="mw:WikiLink">Maillon faible</a>' | node tests/parse.js --html2wt --prefix frwiki

[[Le Maillon_faible_(jeu_télévisé)|Maillon faible]]

This happens without the ./ prefix too.
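For reference, one JavaScript behaviour that produces exactly this pattern (only the first underscore converted) is `String.prototype.replace` with a string pattern, which replaces just the first match. Whether that is the actual cause here is not confirmed by anything in this task; the snippet only shows the language behaviour.

```javascript
const title = 'Le_Maillon_faible_(jeu_télévisé)';

// A string pattern replaces only the first occurrence.
const firstOnly = title.replace('_', ' ');

// A global regular expression replaces every occurrence.
const all = title.replace(/_/g, ' ');

console.log(firstOnly); // Le Maillon_faible_(jeu_télévisé)
console.log(all);       // Le Maillon faible (jeu télévisé)
```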

gerritadmin wrote:

Change 160795 had a related patch set uploaded by Subramanya Sastry:
(Bug 70894) Fix bugs serializing modified wikilinks

https://gerrit.wikimedia.org/r/160795

gerritadmin wrote:

Change 160795 merged by jenkins-bot:
(Bug 70894) Fix bugs serializing modified wikilinks

https://gerrit.wikimedia.org/r/160795

gerritadmin wrote:

Change 161141 had a related patch set uploaded by Subramanya Sastry:
(Bug 70894) Fix regressions introduced by 6e302233 (found in RT-testing)

https://gerrit.wikimedia.org/r/161141

gerritadmin wrote:

Change 161141 merged by jenkins-bot:
(Bug 70894) Fix regressions introduced by 6e302233 (found in RT-testing)

https://gerrit.wikimedia.org/r/161141

gerritadmin wrote:

Change 163292 had a related patch set uploaded by Subramanya Sastry:
New parser tests for lang/category/wiki links (wt2wt and html2wt modes)

https://gerrit.wikimedia.org/r/163292

gerritadmin wrote:

Change 163292 merged by jenkins-bot:
New parser tests for lang/category/wiki links (wt2wt and html2wt modes)

https://gerrit.wikimedia.org/r/163292