Fix parsing of "|}" on non-empty lines (table end tag should always be on a new line)
Closed, Resolved (Public)

Description

The PHP parser does not recognize "|}" as a table closing tag on a non-empty line, which is how we end up with pages on WPs that have stray trailing "|}" wikitext on some lines. Parsoid, however, recognizes these as valid closing tags, which causes us to bomb spectacularly on those pages (Parsoid tries to recover and fix things up, but that doesn't always work).

The right fix is to change the tokenizer to require "|}" to be on a new line (leading whitespace and other sol-transparent text should be fine).
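To make the intended rule concrete, here is a minimal sketch in Python (not Parsoid's actual tokenizer grammar). It accepts "|}" only when nothing but sol-transparent text precedes it on the line. The function name `table_end_offsets` is hypothetical, and the sketch assumes that only spaces, tabs, and single-line HTML comments count as sol-transparent; the real set may be larger.

```python
import re
from typing import List

# Assumption: "sol-transparent" is approximated as spaces, tabs and
# single-line HTML comments. The real tokenizer may accept more.
_SOL_TRANSPARENT = r"(?:[ \t]|<!--(?:(?!-->).)*-->)*"

# "|}" counts as a table end only at start-of-line (SOL), optionally
# preceded by sol-transparent text.
_TABLE_END_AT_SOL = re.compile(r"^" + _SOL_TRANSPARENT + r"\|\}", re.MULTILINE)


def table_end_offsets(wikitext: str) -> List[int]:
    """Return the offsets of every "|}" that sits in SOL position."""
    return [m.end() - 2 for m in _TABLE_END_AT_SOL.finditer(wikitext)]
```

With this rule, the trailing "|}" in a line such as `| cell |}` is left alone as plain text, while a "|}" preceded only by whitespace or a comment still closes the table.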


Version: unspecified
Severity: normal

Details

Reference
bz57360

Event Timeline

bzimport raised the priority of this task from to High. Nov 22 2014, 2:31 AM
bzimport set Reference to bz57360.

This will also require fixing the Parsoid serializer to emit "|}" on new lines.
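The serializer side of this can be sketched in a few lines of Python. This is an illustration under assumptions, not the Parsoid serializer itself: the hypothetical helper `emit_table_end` takes the wikitext emitted so far as a plain string and makes sure the closing tag starts on its own line.

```python
def emit_table_end(serialized_so_far: str) -> str:
    """Append "|}" so that it always lands in start-of-line position."""
    # Only add a newline when the output does not already end with one,
    # so no blank line is introduced before the end tag.
    if serialized_so_far and not serialized_so_far.endswith("\n"):
        serialized_so_far += "\n"
    return serialized_so_far + "|}"
```

For example, `emit_table_end("| cell")` produces the cell line followed by "|}" on the next line, rather than `| cell|}`.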

Change 103572 had a related patch set uploaded by Subramanya Sastry:
(Bug 57360) Fix parser/serializer to accept/emit "|}" in SOL posns

https://gerrit.wikimedia.org/r/103572

Change 103572 merged by jenkins-bot:
(Bug 57360) Fix serializer to emit "|}" in SOL posn

https://gerrit.wikimedia.org/r/103572

Followup patch coming from gwicke.

<gwicke> Re the {| |} issue, I re-did my grep search with a better regexp and am now finding quite a few matches that look like {| <some attributes |}
<gwicke> the PHP parser strips the end tag in those cases, so maybe we should just strip it too?
<gwicke> {| class="wikitable"|} is a construct I see repeatedly
<gwicke> also {| class="wikitable"|}" style="text-align:center"
<gwicke> would be interesting to see where that was all copy & pasted from ;)
<gwicke> {|border=1 align=left cellpadding=0 cellspacing=0 style="width: 48%" {{Election city polls FPTP begin|locale = town| title=[[Canadian federal election, 2006]]<br>Hudson's Hope polls in Prince George—Peace River<ref name=06fed/>}}|}
<gwicke> just dropping the end tag token should be good enough I think
<gwicke> and accepting it anywhere in the attribute sequence
<gwicke> can write a patch for that
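The approach discussed above, dropping a stray end tag anywhere in the table start tag's attribute sequence, might look roughly like the following Python sketch. It is a string-level approximation with a hypothetical function name, not the tokenizer patch itself. It assumes the attributes run to the end of the `{|` line and naively removes every "|}" there, so it would also damage a template call that contains "|}" (for example an empty trailing parameter).

```python
import re

# A table start tag: optional leading whitespace, "{|", then the
# attribute text up to the end of that line.
_TABLE_START = re.compile(r"^([ \t]*\{\|)(.*)$", re.MULTILINE)


def strip_stray_table_ends(wikitext: str) -> str:
    """Drop "|}" sequences found inside table start tag attributes."""

    def fix(match: "re.Match[str]") -> str:
        start, attrs = match.group(1), match.group(2)
        # Naive assumption: any "|}" on the start tag line is stray.
        return start + attrs.replace("|}", "")

    return _TABLE_START.sub(fix, wikitext)
```

A "|}" on its own line is untouched, because only lines that begin with `{|` are rewritten.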

Change 105019 had a related patch set uploaded by GWicke:
Bug 57360: Eat stray table end tags in table start tag attributes

https://gerrit.wikimedia.org/r/105019

Change 105019 merged by jenkins-bot:
Bug 57360: Eat stray table end tags in table start tag attributes

https://gerrit.wikimedia.org/r/105019