
Update markTreeBuilderFixups pass to work without relying on metas being unfostered out of tables
Closed, Resolved
Public

Description

markTreeBuilderFixups analyzes the DOM and tries to find opening and closing tags added automatically by the tree builder (where they didn't exist in the token stream) and unbalanced closing tags that got deleted (where they were present in the token stream).

To do this, it adds a shadow meta tag after each opening and closing tag in the HTML token stream fed into the HTML tree builder. It then checks whether each shadow meta has a corresponding tag in the DOM, and whether any tag in the DOM is missing its shadow. However, this analysis requires the shadows to remain in the same place where they were inserted in the HTML token stream.
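The shadow-meta idea can be illustrated with a minimal Python sketch. This is not Parsoid's implementation: the token and event tuples, the "shadow-start"/"shadow-end" kinds, and both function names are hypothetical, and the DOM is assumed to be flattened into open/close events in document order. It only shows why the check depends on each shadow staying next to its tag.

```python
from typing import List, Tuple

# Hypothetical representation: (kind, name) pairs.
# Token kinds: "start", "end", "text". DOM event kinds: "open", "close", "text".
# Shadow metas appear in both as "shadow-start" / "shadow-end".
Pair = Tuple[str, str]


def add_shadow_metas(tokens: List[Pair]) -> List[Pair]:
    """Emit a shadow meta right after every start and end tag token."""
    out: List[Pair] = []
    for kind, name in tokens:
        out.append((kind, name))
        if kind == "start":
            out.append(("shadow-start", name))
        elif kind == "end":
            out.append(("shadow-end", name))
    return out


def find_fixups(events: List[Pair]) -> List[Pair]:
    """Classify tree-builder fixups by checking shadow adjacency.

    Assumes shadows were not moved: a tag from the source is immediately
    followed by its shadow, and an end-tag shadow is immediately preceded
    by the close of the element that the end tag closed.
    """
    fixups: List[Pair] = []
    for i, (kind, name) in enumerate(events):
        prev = events[i - 1] if i > 0 else None
        nxt = events[i + 1] if i + 1 < len(events) else None
        if kind == "open" and nxt != ("shadow-start", name):
            fixups.append(("auto-inserted-start", name))
        elif kind == "close" and nxt != ("shadow-end", name):
            fixups.append(("auto-inserted-end", name))
        elif kind == "shadow-end" and prev != ("close", name):
            fixups.append(("stripped-end", name))
    return fixups


# Source "<p>a</b>": the stray </b> is dropped and </p> is added at EOF.
tokens = add_shadow_metas([("start", "p"), ("text", "a"), ("end", "b")])
dom_events = [
    ("open", "p"), ("shadow-start", "p"), ("text", "a"),
    ("shadow-end", "b"), ("close", "p"),
]
fixups = find_fixups(dom_events)
# fixups == [("stripped-end", "b"), ("auto-inserted-end", "p")]
```

Because `find_fixups` looks only at neighbouring events, any shadow that the tree builder relocates would be misread as a fixup, which is the position dependency described above.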

Foster-parenting complicates this, since the shadow metas can be hoisted out of the table. That is why we've hacked the HTML tree builder to not foster metas.

But it might be possible to do the analysis without relying on shadow-meta position. For example, assigning ids to tags and associating each id with its shadow meta might work (or not, since the tree builder might clone inserted tags). Or maybe the position can be inferred from the position in the foster box. Not sure. This is the problem that needs to be solved. Once it is solved satisfactorily, our fostering hack in the HTML5 library can be removed.

A lot of wt2html and wt2wt results rely crucially on this information, so parser tests should tell you accurately what is working and what is not. You can test this by turning off the markTreeBuilderFixups pass and seeing what happens.


Version: unspecified
Severity: normal

Details

Reference
bz53284

Event Timeline

bzimport raised the priority of this task to High. (Nov 22 2014, 1:46 AM)
bzimport added a project: Parsoid-DOM.
bzimport set Reference to bz53284.

In the dsr calculation, we'll also need to skip over all fostered content including fostered meta end markers. Those should still be stripped from fostered content.

Change 81455 had a related patch set uploaded by Arlolra:
Remove hack from tree builder to not foster metas.

https://gerrit.wikimedia.org/r/81455

Change 81455 merged by jenkins-bot:
Remove hack from tree builder to not foster metas.

https://gerrit.wikimedia.org/r/81455