
Parsoid base URL should be independent of page
Closed, Resolved (Public)

Description

Currently Parsoid sets the URL of the page itself as the base URL. This means that [[OS/2]] has <base href="//en.wikipedia.org/wiki/OS/2">, which means that links on that page have to look like <a href="../Unix"> in order to point to the right place.

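This practice is evil and should die in a fire. Instead, the base URL should be set to the base URL of the wiki, e.g. <base href="//en.wikipedia.org/wiki/"> (with the trailing slash, so that a relative href like "Unix" resolves to /wiki/Unix rather than /Unix).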
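To make the difference concrete, here is a rough sketch of how a relative href resolves under each choice of base. It uses Python's standard `urljoin`, which follows the usual relative-reference rules; the `https:` scheme, the constant names, and the `resolve` helper are assumptions added for illustration and are not part of Parsoid.

```python
from urllib.parse import urljoin

# Illustrative values only. The wiki-level base is written with a trailing
# slash, because relative resolution drops the last path segment of the base.
PAGE_BASE = "https://en.wikipedia.org/wiki/OS/2"  # base tied to the page
WIKI_BASE = "https://en.wikipedia.org/wiki/"      # base tied to the wiki


def resolve(base: str, href: str) -> str:
    """Resolve a relative href against a base URL."""
    return urljoin(base, href)


# With the page as base, the "/2" segment is dropped and "OS/" acts as a directory.
print(resolve(PAGE_BASE, "../Unix"))  # https://en.wikipedia.org/wiki/Unix
print(resolve(PAGE_BASE, "Unix"))     # https://en.wikipedia.org/wiki/OS/Unix (wrong page)

# With the wiki as base, the plain title works regardless of the current page.
print(resolve(WIKI_BASE, "Unix"))     # https://en.wikipedia.org/wiki/Unix
```

The second call shows why a page-dependent base forces `../` prefixes on any page whose title contains a slash.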

The fact that the base URL currently depends on the page name causes lots of problems.

Mixing content from multiple pages (like in Flow) is hard, because you have to normalize away all the <base> differences. Even embedding content from one page standalone is difficult because Parsoid's (variable) choice for the base URL is not a reasonable choice for your entire UI's base URL.

Creating new content (like in VE) would be hard if it weren't for the fact that Parsoid tolerates <a href="Foo"> where it would really expect <a href="../Foo">. If VE had to actually produce correct hrefs in its output, it would have to do some pretty evil analysis of the base URL.

Copying content from one page to another is hit by both issues: you have to process the hrefs of copied links based on both the source's base URL and the destination's base URL, which is quite error-prone.
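The copy case can be sketched roughly as follows. `rebase_href` is a hypothetical helper, not anything Parsoid, VE, or Flow ships; it assumes both bases are on the same host and that the href carries no query string or fragment.

```python
import posixpath
from urllib.parse import urljoin, urlsplit


def rebase_href(href: str, src_base: str, dst_base: str) -> str:
    """Rewrite a relative href so it points at the same target under dst_base.

    Simplifying assumptions: same host for both bases, and no query string
    or fragment on the href.
    """
    # Step 1: resolve against the source base to find the real target path.
    target_path = urlsplit(urljoin(src_base, href)).path
    # Step 2: express that target relative to the destination base's directory.
    dst_dir = posixpath.dirname(urlsplit(dst_base).path)
    return posixpath.relpath(target_path, start=dst_dir)


OS2 = "https://en.wikipedia.org/wiki/OS/2"
LINUX = "https://en.wikipedia.org/wiki/Linux"
WIKI = "https://en.wikipedia.org/wiki/"

# Page-dependent bases: the href has to change when content moves.
print(rebase_href("../Unix", OS2, LINUX))  # Unix
print(rebase_href("Unix", LINUX, OS2))     # ../Unix

# Wiki-level base on both sides: the href is unchanged.
print(rebase_href("Unix", WIKI, WIKI))     # Unix
```

With a single wiki-level base the rewrite becomes a no-op, which is the point of the proposal: copied links need no processing at all.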

This change has been discussed before and everyone seems to agree that it should happen, but it hasn't happened yet, so let's start tracking it here.


Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=73291

Details

Reference
bz70743

Event Timeline

bzimport raised the priority of this task to Unbreak Now!. Nov 22 2014, 3:54 AM
bzimport added a project: Parsoid-DOM.
bzimport set Reference to bz70743.

Let us tackle this soon since this seems to get in the way of Flow dropping data-parsoid usage.

gerritadmin wrote:

Change 170359 had a related patch set uploaded by Subramanya Sastry:
WIP: (Bug 70743): Point base href to the wiki + fix wikilink hrefs

https://gerrit.wikimedia.org/r/170359

gerritadmin wrote:

Change 170359 merged by jenkins-bot:
(Bug 70743): Point base href to wiki base; update link, img, tpl hrefs

https://gerrit.wikimedia.org/r/170359

GWicke subscribed.
This comment was removed by GWicke.
Arlolra subscribed.

This broke citations in Parsoid HTML when <base href="http://pathtowiki" /> is set.

The hrefs for citations are rendered as "#justahash", and a fragment-only href like that is resolved against the base URL rather than against the current page.

See http://parsoid-lb.eqiad.wikimedia.org/enwiki/Barack_Obama?oldid=645788571 where clicking a citation leads you to, for example, https://en.wikipedia.org/wiki/Main_Page#cite_ref-347.

Hash links, however, seem to include the full page path and remain unaffected.
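The citation breakage can be reproduced outside the browser with a small sketch. The URLs and the `resolve_fragment` name are assumptions for illustration; the `Main_Page` in the reported URL is presumably the server redirecting the bare `/wiki/` path, which this sketch does not model.

```python
from urllib.parse import urljoin

# Assumed values for illustration; the wiki-level base has a trailing slash.
WIKI_BASE = "https://en.wikipedia.org/wiki/"
PAGE_URL = "https://en.wikipedia.org/wiki/Barack_Obama"


def resolve_fragment(base: str, fragment_href: str) -> str:
    """Resolve a fragment-only href the way a <base> element makes it resolve."""
    return urljoin(base, fragment_href)


# With the wiki as base, the citation link leaves the article.
print(resolve_fragment(WIKI_BASE, "#cite_ref-347"))
# https://en.wikipedia.org/wiki/#cite_ref-347

# With the page itself as base, the link stays on the article.
print(resolve_fragment(PAGE_URL, "#cite_ref-347"))
# https://en.wikipedia.org/wiki/Barack_Obama#cite_ref-347
```

One way out, consistent with the observation above, is for same-page links to carry the page path in front of the fragment so that they no longer depend on the base.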

What is left to do here? Is this still priority "Unbreak Now!"?

ssastry lowered the priority of this task from Unbreak Now! to Medium. Apr 2 2015, 1:29 PM
ssastry set Security to None.