Failed to cope with mismatched superscript and subscript tags
Closed, Declined (Public)

Assigned To

None

Authored By

	SalixAlba
	Feb 7 2014, 7:49 AM

Description

Intention:
Just started VE on the article [[en:Divergent series#Zeta function regularization]]

Steps to Reproduce:

https://en.wikipedia.org/wiki/Divergent_series?veaction=edit
scroll down to the Zeta function regularization section near the end
observe the last two sentences.

Actual Results:
The final sentence has kept the superscript style. It appears as
...the trace of ''A''<sup>–''s''. For example...</sup>

Expected Results:
The final sentence should appear as
...the trace of ''A''<sup>–''s''</sup>. For example...

Reproducible: Always

If you edit the second sentence and then press Save page and preview, it shows that VE has put the closing </sup> in the wrong place, at the end of the second sentence.

Version: unspecified
Severity: minor
URL: http://parsoid.wmflabs.org/enwiki/Divergent_series?oldid=594238903

Details

Reference: bz61011

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 3:05 AM

bzimport added projects: Browser-Support-Google-Chrome, Parsoid.

bzimport set Reference to bz61011.

SalixAlba created this task. Feb 7 2014, 7:49 AM

Just spotted the problem. The source text had mismatched tags, a <sup> and a </sub>: ''A''<sup>–''s''</sub>

Maybe still a tiny bug. The normal render recovers nicely when given the text ''A''<sup>–''s''</sub>; it seems to change the closing </sub> to a </sup>. VE (Parsoid?) does not recover quite so nicely, putting the closing </sup> at the end of the paragraph.

I've fixed the source now so you will need to look at an old version
https://en.wikipedia.org/w/index.php?title=Divergent_series&oldid=594238903

Shifting over to Parsoid – Gabriel, do you have a view as to whether Parsoid should assume </sub> means </sup> in context?

We generally rely on the HTML5 treebuilder to fix issues like this. We could add code to handle sub / sup mismatches in a non-standard way, but this would add complexity in both that pass and the serializer (to avoid dirty diffs).

Is this issue common enough to warrant the cost of special-case handling?

Probably not. It's quite an easy mistake to make and no doubt there are other instances, but it is really a very minor problem. A better option might be a periodic scan of the database to spot mismatched tags.

I think this could be one more test case / scenario for the wikitext linting (bug 46705), but a WONTFIX for primary Parsoid functionality?

Created attachment 14679
Python file to scan a database dump looking for mismatched <sub> and <sup>

Attached:

subsup.py (884 B)

I've run a scan on one part of the database dumps and it found 97 mainspace articles with some problems. There are 27 dump files, so I guess it's about 2700 articles with some problems. Often these are just an extra tag at the end of a reference, which might come from a copy and paste, but there are a few errors in maths pages with unbalanced tags.

The attached Python script is quite simple: it just counts the number of opening and closing tags and prints the article name if they don't match. The -d option prints the lines where the counts don't match.
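
For readers without the attachment, the counting approach described above might look roughly like the following. This is a minimal sketch, not the attached subsup.py: the function names (tag_counts, is_balanced, scan), the regular expression, and the (title, wikitext) input shape are all assumptions made for illustration, and reading an actual dump file is left out.

```python
import re

# Matches <sub>, </sub>, <sup>, </sup> (case-insensitive, attributes allowed).
# The \b keeps longer tag names such as <support> from matching.
TAG_RE = re.compile(r"<(/?)(sub|sup)\b[^>]*>", re.IGNORECASE)


def tag_counts(text):
    """Return {"sub": [opens, closes], "sup": [opens, closes]} for text."""
    counts = {"sub": [0, 0], "sup": [0, 0]}
    for match in TAG_RE.finditer(text):
        is_closing = bool(match.group(1))
        name = match.group(2).lower()
        counts[name][1 if is_closing else 0] += 1
    return counts


def is_balanced(text):
    """True if opening and closing counts agree for both tag names."""
    return all(opens == closes for opens, closes in tag_counts(text).values())


def scan(pages, show_lines=False):
    """Collect titles whose tag counts don't match.

    pages is an iterable of (title, wikitext) pairs (a hypothetical input
    shape). With show_lines=True, the offending lines are added, indented,
    after each title, roughly what the -d option is described as doing.
    """
    report = []
    for title, text in pages:
        if is_balanced(text):
            continue
        report.append(title)
        if show_lines:
            for line in text.splitlines():
                if not is_balanced(line):
                    report.append("    " + line)
    return report


if __name__ == "__main__":
    sample = [
        ("Divergent series", "the trace of ''A''<sup>–''s''</sub>. For example"),
        ("Fine page", "x<sup>2</sup> and H<sub>2</sub>O"),
    ]
    for entry in scan(sample, show_lines=True):
        print(entry)
```

Because this only compares counts, it flags a <sup> closed by </sub> but would miss tags that are balanced in number yet wrongly ordered or nested.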

(A related discussion is at https://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Check_Wikipedia#Mismatched_sub_and_sup_tags .)

The Check Wikipedia project has been made aware of this problem and a scan found 7,096 articles with problems. https://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Check_Wikipedia#Mismatched_sub_and_sup_tags

For the most part these have been fixed on en.wiki, and tests are being built into AWB and other tools.

The people at checkwiki did ask for more info about the HTML5 treebuilder, to see if there are similar problems which might be worth checking.

I would say a WONTFIX would be fine for this as there are now tools for bots to check this.

Arlolra removed GWicke as the assignee of this task. Nov 25 2014, 7:57 PM

Arlolra set Security to None.

ssastry moved this task from Needs Triage to Robustness on the Parsoid board. Dec 22 2014, 12:39 AM

ssastry renamed this task from "Failed to cope with miss matched superscript and subscript tags" to "Failed to cope with mismatched superscript and subscript tags". Feb 2 2015, 2:56 PM

RobLa-WMF added a project: Parsoid-Robustness. Feb 11 2015, 6:12 PM

marcoil moved this task from Robustness to Needs Triage on the Parsoid board. Feb 13 2015, 12:49 PM

Arlolra removed a project: Browser-Support-Google-Chrome. Feb 14 2015, 11:03 PM

Krinkle unsubscribed. Jan 12 2018, 11:52 PM

LGoto moved this task from Needs Triage to Backlog on the Parsoid board. Feb 17 2020, 4:43 PM

This kind of tag mismatch is now handled by wikitext linting (for unclosed tags and stripped tags), so there is nothing to do in Parsoid here.

	F13324: subsup.py
	Nov 22 2014, 3:05 AM
