Generalize foster parented content detection in early DOM postprocessor pass
Closed, Resolved (Public)

Description

Many DOM passes depend on accurate identification of foster-parented content (see http://www.whatwg.org/specs/web-apps/current-work/multipage/tree-construction.html#foster-parent). We have already implemented some detection in dom.markFosteredContent.js, but we also still depend on a hack (formerly a convenient bug) in the HTML5 treebuilder that disables fostering for meta tags.

It would be great if we could generalize and improve the existing algorithm so that

  • it can be run as a first pass on the DOM,
  • its marking of fostered content can be relied upon by all other DOM passes,
  • it properly detects fostering of pure text, and
  • we can remove the no-fostering-for-metas hack from the HTML5 treebuilder.

Fostered text detection can probably be addressed with this trick:

For each <table> TagTk, we can prepend a <meta typeof="mw:FosterMarker"> SelfclosingTagTk just before adding a tagId sequence number and feeding those tokens to the treebuilder. This will then create a 'fostering box' in the DOM:

content..
<meta typeof="mw:FosterBox" data-parsoid="{tagId: 3}"/>
potentially fostered content
<table data-parsoid="{tagId: 4}">..</table>

Fostered element content will have higher tagIds than both the meta and the table.
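
The token-side half of this trick could look roughly like the sketch below. It is only an illustration under stated assumptions: the token shape ({ type, name, attribs, dataAttribs }), the function name addFosterMarkers, and the choice to number only start and self-closing tags are all hypothetical, not Parsoid's actual code. The marker type uses the mw:FosterBox value from the markup above (the prose also calls it mw:FosterMarker).

```javascript
// Sketch only: token shapes and names are assumptions, not Parsoid's real API.
const FOSTER_MARKER_TYPE = 'mw:FosterBox';

function addFosterMarkers(tokens) {
  const out = [];
  let nextTagId = 1;

  // Return a copy of the token carrying the next sequence number.
  const withTagId = (tok) => ({
    ...tok,
    dataAttribs: { ...(tok.dataAttribs || {}), tagId: nextTagId++ },
  });

  for (const tok of tokens) {
    // Prepend the marker meta right before every <table> start tag, so the
    // marker always gets a lower tagId than its table.
    if (tok.type === 'TagTk' && tok.name === 'table') {
      out.push(withTagId({
        type: 'SelfclosingTagTk',
        name: 'meta',
        attribs: { typeof: FOSTER_MARKER_TYPE },
      }));
    }
    // Assumption: only start tags and self-closing tags are numbered.
    const isStartTag = tok.type === 'TagTk' || tok.type === 'SelfclosingTagTk';
    out.push(isStartTag ? withTagId(tok) : tok);
  }
  return out;
}

// Usage: the <div> inside the table would be fostered by the treebuilder.
const stream = addFosterMarkers([
  { type: 'TagTk', name: 'p' },
  { type: 'EndTagTk', name: 'p' },
  { type: 'TagTk', name: 'table' },
  { type: 'TagTk', name: 'div' },
  { type: 'EndTagTk', name: 'div' },
  { type: 'EndTagTk', name: 'table' },
]);
console.log(stream.filter((t) => t.dataAttribs).map((t) => `${t.name}=${t.dataAttribs.tagId}`));
```

Because the div's start tag follows the table's start tag in the token stream, it receives a higher tagId than both the meta and the table, even though the treebuilder later moves it in front of the table in the DOM.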

One complication we should ignore for now is cases like <table><meta><table>..; let's tackle those rare edge cases later.

The goal is to mark all fostered content with data.parsoid.fostered. Fostered text nodes need to be wrapped into a span for this. The extra meta tags for fostering detection should be stripped so that they don't interfere with later passes.
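
A minimal sketch of that marking pass follows, using plain objects instead of real DOM nodes so that it runs standalone. The node shape ({ type, name, attrs, data, children } for elements, { type: 'text', value } for text) and the helper names are assumptions made for illustration. The sketch treats every sibling between a marker meta and the next table as fostered, and it deliberately ignores the nested <table><meta><table> case mentioned above.

```javascript
// Sketch only: plain-object nodes stand in for DOM nodes (an assumption).
const FOSTER_BOX_TYPE = 'mw:FosterBox';

const isElement = (n) => n.type === 'element';
const isTable = (n) => isElement(n) && n.name === 'table';
const isFosterBox = (n) =>
  isElement(n) && n.name === 'meta' && !!n.attrs && n.attrs.typeof === FOSTER_BOX_TYPE;

// Flag a node as fostered; text nodes are wrapped in a span to carry the flag.
function markFostered(node) {
  if (node.type === 'text') {
    return {
      type: 'element',
      name: 'span',
      attrs: {},
      data: { parsoid: { fostered: true } },
      children: [node],
    };
  }
  node.data = node.data || {};
  node.data.parsoid = node.data.parsoid || {};
  node.data.parsoid.fostered = true;
  markFosteredContent(node); // also strip markers nested further down
  return node;
}

function markFosteredContent(parent) {
  const kids = parent.children || [];
  const out = [];
  for (let i = 0; i < kids.length; i++) {
    const node = kids[i];
    if (!isFosterBox(node)) {
      if (isElement(node)) markFosteredContent(node);
      out.push(node);
      continue;
    }
    // Marker found: look for the table that closes the fostering box.
    let j = i + 1;
    while (j < kids.length && !isTable(kids[j])) j++;
    if (j < kids.length) {
      for (let k = i + 1; k < j; k++) out.push(markFostered(kids[k]));
      i = j - 1; // the table itself is handled by the next iteration
    }
    // The marker meta is never copied to `out`, which strips it.
  }
  parent.children = out;
  return parent;
}
```

Later passes would then only need to check data.parsoid.fostered on an element, without knowing about the marker metas at all.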


Version: unspecified
Severity: normal

Details

Reference
bz53110

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 2:03 AM
bzimport added a project: Parsoid-DOM.
bzimport set Reference to bz53110.

A clarification: the hack (convenient bug) in the HTML5 treebuilder that disables fostering of meta tags is used by a different pass (markTreeBuilderFixups) and is independent of the task in this bug, which is accurate detection of fostered tags. Consequently, the fourth item in the description (removing the no-fostering-for-metas hack) can be implemented separately from the task here. We can create a new bug for it and outline the problems and requirements there.

The fourth step (re-enabling foster-parenting) will definitely require more work than just implementing fostering detection, but it should be significantly easier once reliable fostering info is available. Let's create a separate bug for that once we get close to tackling it.

Change 80675 had a related patch set uploaded by Arlolra:
WIP: Generalize foster parented content detection

https://gerrit.wikimedia.org/r/80675

Change 80675 merged by jenkins-bot:
Generalize foster parented content detection

https://gerrit.wikimedia.org/r/80675