Page MenuHomePhabricator

Italics and bold: multiple quote sequences
Open, MediumPublic

Description

We have multiple html2wt regressions in the "Italics and bold: multiple quote sequences" tests, which were hidden because these tests were on the blacklist (the <nowiki> insertion previously made the html2wt tests fail).

The failing tests, along with their wt2wt output (as "After") are:

Italics and bold: multiple quote sequences: (3,4,2)
Before: '''foo''''bar''

<b>foo'</b>bar<i></i>

After: '''<nowiki>foo'</nowiki>'''bar''''

<b>foo'</b>bar'<b></b>

Italics and bold: multiple quote sequences: (3,4,3)
Before: '''foo''''bar'''

<b>foo'</b>bar<b></b>

After: '''<nowiki>foo'</nowiki>'''bar''''''

'<i>foo<b>bar'</b></i>

In the process of fixing these bugs, we should probably also write "clean" versions of the tests which include the proper <nowiki> tags in the input and therefore round-trip correctly, to ensure that we are better protected against regressions in the future.

These next tests are ok, but should have a clean version added. The "Before" version is what's in the test currently; after one round of --wt2wt it becomes the equivalent "Clean" version, which subsequently round-trips correctly:

Italics and bold: multiple quote sequences: (2,4,2)
Before: ''foo''''bar''
Clean: ''<nowiki>foo'</nowiki>'''bar'''''

Italics and bold: multiple quote sequences: (2,4,3)
Before: ''foo''''bar'''
Clean: ''<nowiki>foo'</nowiki>'''bar'''''

Italics and bold: multiple quote sequences: (2,4,4)
Before: ''foo''''bar''''
Clean: ''<nowiki>foo'</nowiki>'''<nowiki>bar'</nowiki>'''''


Version: unspecified
Severity: normal

Details

Reference
bz62569

Event Timeline

bzimport raised the priority of this task from to Needs Triage.Nov 22 2014, 3:03 AM
bzimport added a project: Parsoid.
bzimport set Reference to bz62569.

Another legit bug:

5 quotes, code coverage +1 line
Before: '''''

<b><i></i></b>

After: ''''''''''

'''''<b><i></i></b>

See also bug 64321, which is quote-related.

(In reply to C. Scott Ananian from comment #0)

We have multiple html2wt regressions in the "Italics and bold: multiple
quote sequences" tests, which were hidden because these tests were on the
blacklist (the <nowiki> insertion previously made the html2wt tests fail).

The failing tests, along with their wt2wt output (as "After") are:

Italics and bold: multiple quote sequences: (3,4,2)
Before: '''foo''''bar''

<b>foo'</b>bar<i></i>

After: '''<nowiki>foo'</nowiki>'''bar''''

<b>foo'</b>bar'<b></b>

Italics and bold: multiple quote sequences: (3,4,3)
Before: '''foo''''bar'''

<b>foo'</b>bar<b></b>

After: '''<nowiki>foo'</nowiki>'''bar''''''

'<i>foo<b>bar'</b></i>

This is messy because of the context-sensitive nature of how quotes get parsed. Adding additional quotes for auto-closed italic tags changes the tokenization and hence parse. While we do have the auto-insertion flags in the HTML, we cannot use this unconditionally since edits could have changed the node.

However, for empty nodes that have auto-inserted flags, we can check DSR widths to determine whether it is safe to use information from the flags. Will experiment.

Change 120802 had a related patch set uploaded by Subramanya Sastry:
WTS: Skip auto-inserted tags for empty nodes with zero-range child dsr

https://gerrit.wikimedia.org/r/120802

(In reply to C. Scott Ananian from comment #2)

See also bug 64321, which is quote-related.

Looks like you provided the wrong bug id here.

Change 120802 merged by jenkins-bot:
WTS: Skip auto-inserted tags for empty nodes with 0-wide subtree DSR

https://gerrit.wikimedia.org/r/120802

The "five quotes" case is still broken. (See comment 1.) Should I open a new bug?

Opened bug 63119 for the unfixed case.

Reopening as the fix from https://gerrit.wikimedia.org/r/120802 is not considered safe.

Test case simulating a wrapper removal like <table><td><b></td></table>foo -> <b>foo:

$ cat /tmp/test.html
<b data-parsoid='{"stx":"html","autoInsertedEnd":true,"dsr":[15,18,3,0]}'></b>foo
$ cat /tmp/test.html | node parse --html2wt
<b>foo

Arlolra triaged this task as Medium priority.Nov 26 2014, 10:32 PM
Arlolra subscribed.
ssastry moved this task from Backlog to Bugs & Crashers on the Parsoid board.