Failed to cope with mismatched superscript and subscript tags
Closed, Declined (Public)

Description

Intention:
Just started VE on the article [[en:Divergent series#Zeta function regularization]]

Steps to Reproduce:

  1. https://en.wikipedia.org/wiki/Divergent_series?veaction=edit
  2. scroll down to the Zeta function regularization section near the end
  3. observe the last two sentences.

Actual Results:
The final sentence has kept the superscript style. It appears as
...the trace of ''A''<sup>–''s''. For example...</sup>

Expected Results:
The final sentence should appear as
...the trace of ''A''<sup>–''s''</sup>. For example...

Reproducible: Always

If you edit the second sentence and then press Save page and preview, the preview shows that VE has put the closing </sup> in the wrong place, at the end of the second sentence.


Version: unspecified
Severity: minor
URL: http://parsoid.wmflabs.org/enwiki/Divergent_series?oldid=594238903

Details

Reference
bz61011

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 3:05 AM
bzimport set Reference to bz61011.

Just spotted the problem. The source text had mismatched tags, a sup and a sub: ''A''<sup>–''s''</sub>

Maybe still a tiny bug. The normal render recovers nicely when given the text
''A''<sup>–''s''</sub>: it seems to change the closing </sub> to a </sup>. VE (Parsoid?) does not recover quite so nicely, putting the closing </sup> at the end of the paragraph.
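For illustration, here is a minimal sketch of how a mismatch like this could be pinpointed by pairing tags on a stack. It is not Parsoid or MediaWiki code; the names TAG_RE and find_mismatches are hypothetical, and it only looks at attribute-free <sup> and <sub> tags.

```python
import re

# Hypothetical pattern: plain <sup>, </sup>, <sub>, </sub> tags only.
TAG_RE = re.compile(r"<(/?)(sup|sub)\s*>", re.IGNORECASE)

def find_mismatches(text):
    """Return (offset, message) pairs for sup/sub tags that do not pair up."""
    problems = []
    stack = []  # (name, offset) of the tags that are currently open
    for m in TAG_RE.finditer(text):
        name = m.group(2).lower()
        if not m.group(1):
            # Opening tag: remember it until its close shows up.
            stack.append((name, m.start()))
        elif stack and stack[-1][0] == name:
            # Closing tag that matches the innermost open tag.
            stack.pop()
        elif stack:
            # Closing tag of the wrong kind: report it and treat the
            # innermost open tag as closed (an assumption of this sketch).
            open_name, _ = stack.pop()
            problems.append((m.start(), f"<{open_name}> closed by </{name}>"))
        else:
            problems.append((m.start(), f"stray </{name}>"))
    for name, offset in stack:
        problems.append((offset, f"<{name}> never closed"))
    return problems
```

Calling find_mismatches("''A''<sup>–''s''</sub>. For example") reports a single problem, "<sup> closed by </sub>", at the position of the closing tag.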

I've fixed the source now so you will need to look at an old version
https://en.wikipedia.org/w/index.php?title=Divergent_series&oldid=594238903

Shifting over to Parsoid – Gabriel, do you have a view as to whether Parsoid should assume </sub> means </sup> in context?

We generally rely on the HTML5 treebuilder to fix issues like this. We could add code to handle sub / sup mismatches in a non-standard way, but this would add complexity in both that pass and the serializer (to avoid dirty diffs).

Is this issue common enough to warrant the cost of special-case handling?

Probably not. It's quite an easy mistake to make, and no doubt there are other instances, but it is a very minor problem. A better option might be a periodic scan of the database to spot mismatched tags.

I think this could be one more test case / scenario for the wikitext linting (bug 46705), but a WONTFIX for primary Parsoid functionality?

Created attachment 14679
Python file to scan a database dump looking for mismatched <sup> and <sub>

I've run a scan on one part of the database dumps and it found 97 mainspace articles with some problems. There are 27 dump files, so I guess it's about 2700 articles with some problems. Often these are just an extra </sup> at the end of a reference, which might come from a copy and paste, but there are a few errors in maths pages with unbalanced tags.

The attached Python script is quite simple: it just counts the number of opening and closing tags and prints the article name if they don't match. The -d option prints the lines where the counts don't match.
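The attachment itself is not reproduced here, so the following is only a sketch of the counting approach as described, not the original script. The function names, the (title, text) input shape, and the show_lines flag (standing in for the -d option) are all assumptions.

```python
import re

# Matches <sup>, </sup>, <sub>, </sub>, with optional attributes or a
# self-closing slash. Hypothetical pattern, not taken from the attachment.
TAG_RE = re.compile(r"<(/?)(sup|sub)\b[^>]*?(/?)>", re.IGNORECASE)

def tag_counts(text):
    """Return {'sup': [opens, closes], 'sub': [opens, closes]} for text."""
    counts = {"sup": [0, 0], "sub": [0, 0]}
    for m in TAG_RE.finditer(text):
        closing, name, self_closing = m.group(1), m.group(2).lower(), m.group(3)
        if self_closing:
            continue  # <sup/> opens and closes nothing
        counts[name][1 if closing else 0] += 1
    return counts

def is_unbalanced(text):
    """True if opening and closing counts differ for sup or for sub."""
    return any(opens != closes for opens, closes in tag_counts(text).values())

def unbalanced_lines(text):
    """Lines whose own counts do not match (what a -d style option would print)."""
    return [line for line in text.splitlines() if is_unbalanced(line)]

def scan(pages, show_lines=False):
    """Print and return the titles of pages with unbalanced sup/sub tags.

    pages is any iterable of (title, wikitext) pairs; how they are read
    from a dump file is left out of this sketch.
    """
    flagged = []
    for title, text in pages:
        if is_unbalanced(text):
            flagged.append(title)
            print(title)
            if show_lines:
                for line in unbalanced_lines(text):
                    print("    " + line)
    return flagged
```

For example, scan([("Divergent series", "''A''<sup>–''s''</sub>")], show_lines=True) flags the page, because <sup> has one opening tag and no closing tag. Pure counting cannot see ordering problems such as </sup> appearing before <sup>, which is one reason a linter that tracks nesting is the better long-term tool.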

The Check Wikipedia project has been made aware of this problem and a scan found 7,096 articles with problems. https://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Check_Wikipedia#Mismatched_sub_and_sup_tags

For the most part these have been fixed on en.wiki, and tests are being built into AWB and other tools.

The people at checkwiki did ask for more info about the HTML5 treebuilder, to see if there are similar problems which might be worth checking.

I would say a WONTFIX would be fine for this as there are now tools for bots to check this.

Arlolra set Security to None.
ssastry renamed this task from "Failed to cope with miss matched superscript and subscript tags" to "Failed to cope with mismatched superscript and subscript tags". Feb 2 2015, 2:56 PM

This kind of tag mismatch is now handled by wikitext linting (for unclosed tags and stripped tags), and there is nothing to do in Parsoid here.