Page MenuHomePhabricator

Template creating CSS in a table's cell does not come through
Closed, ResolvedPublic

Description

Screenshot

When edited in VisualEditor, the following table:

{| class="wikitable"
|- 
! Header
|-
| {{Yes}}
|}

... shows the following string in place of the {{Yes}} template:

style="background: #90ff90; color: black; vertical-align: middle; text-align: center; " class="table-yes"|Yes

See screenshot.
Live at http://en.wikipedia.org/wiki/User:Nicolas1981/Sandbox


Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=50589

Attached:

template-in-table.png (133×742 px, 8 KB)

Details

Reference
bz44498

Event Timeline

bzimport raised the priority of this task from to High.Nov 22 2014, 1:22 AM
bzimport set Reference to bz44498.

This is a Parsoid bug; couldn't find the master with which to mark as dupe (sorry, Gabriel!).

This will be hard to support, as the nested template produces part of the table cell syntax plus the table cell content. Moving the pipe starting the table cell into the template would avoid this issue as the template would then expand to a self-contained table cell with content.

Are there other templates following this pattern apart from Yes (and probably No)?

  • Bug 50772 has been marked as a duplicate of this bug. ***

There are many templates in this set.
https://en.wikipedia.org/wiki/Template:Table_cell_templates

And the most commonly used ones are copied to other languages. See the interwikis for {{Yes}}
https://en.wikipedia.org/wiki/Template:Yes

  • Bug 50366 has been marked as a duplicate of this bug. ***

kwwilliams wrote:

This one as well: http://en.wikipedia.org/w/index.php?title=List_of_Big_Time_Rush_episodes&diff=prev&oldid=565906957

This really isn't a "low importance" bug by any stretch of the imagination. It affects most discographies, most television episode lists, and most award lists.

Bumping up the importance, per comment 7.

From an email thread with Subbu & C.Scott:

Subbu writes:

Gabriel and I started discussing this the week before he headed out on vacation ... we'll continue once he is back ... we might either have to hack or use other tricks to support these templates. A bot pass (https://bugzilla.wikimedia.org/show_bug.cgi?id=50547) is one idea we've considered (first fix up the templates, and then use a bot to go and fix pages that use the templates).


I asked:

If we wanted to improve the markup of these templates so they can be more sanely parsed, what instructions would we provide?


Subbu responds:

Instead of generating just style attributes of a table cell, make the "|" char part of the template. So, the templates should produce a styled-cell rather than just the cell-style. And, that also means that the pages that use these should not have "|" char before the transclusion -- but we may be able to work around by recognizing that pattern (I say that without working through the details).

So, instead of (old wikitext):
{|

{{yes}}
}

they would have to code this as:
{|
{{yes}}

}

Looks like there are quite a few of these here: https://en.wikipedia.org/wiki/Template:Table_cell_templates I haven't investigated all the various ways these templates are used, but at first pass, I think that approach should fix the problem.


C.Scott responds:

It's probably worth bringing the VE team into this discussion. I
think that VE would probably struggle trying to edit table cell(s)
generated with a template containing a style + cell contents in this
fashion and/or have difficulty processing whatever markup we decide
Parsoid "should" generate for this case. My intuition is that fixing
the templates to include the leading vertical bar, so that the
template is clearly a cell, not part-of-a-cell, is a better long-term
solution for editing (Parsoid+VE). It seems like a bot to make this
change could be implemented + deployed relatively quickly, although I
haven't actually written a bot before, maybe there are community or
other factors limiting the speed at which bot edits are made?
--scott
ps. who'd be interested in writing said bot, because he's never done
that before, although I suspect it would be best farmed off to someone
who's done this sort of thing before.

And also adds:

Note that there are some anomalous usages, for example:

! {{yes}}

rowspan=2 {{n/a}}

The first case would have to be {{yes|type=header}} and the second
something like {{n/a|rowspan=2}}, both with additional changes to the
table cell template. In other words, the issue isn't just that the
style and cell content boundary is spanned, it's that some of the
style comes from outside the template and some comes from inside.

I still think a bot might be the best long-term fix, but it's not
entirely straightforward. If we were to fix it inside parsoid, we'd
have to introduce something like an "open style context", probably
buffering the style information and spitting it out only after we've
seen the style context closed. It would still be an "interesting"
editing challenge in VE -- some of the table style information would
be editable, some would come from the template (and be editable only
inside the template editor?).
--scott

kwwilliams wrote:

Why would we change the contents of articles to match something that Parsoid likes instead of repairing Parsoid?

The specification of Parsoid surely must have included a spec that all templates and wikitext syntax in widespread use in English wikipedia needed to be handled properly. Since Parsoid doesn't meet that requirement, repair Parsoid.

We are definitely not trying to make life harder for editors. We are just trying to figure out the best solution going forward.

Visual editing is a HTML-based operation, and HTML has structural semantics. Longer-term, it would make for a better and simpler software and editing interface if templates generated entire html elements instead of part of elements (which these templates do). We are just trying to reconcile wikitext structural information with HTML structural information with a view towards the long-term.

While we definitely welcome participation and support of existing template/bot authors here, we are definitely not planning to offload this job of writing bots or updating pages onto editors.

All that said, resolving this issue might well involve a temporary interim fix and a strategy for a longer-term resolution that moves wikitext and templates towards a more cleaner solution that is more aligned with HTML DOM semantics.

kwwilliams wrote:

(In reply to comment #11)

We are definitely not trying to make life harder for editors. We are just
trying to figure out the best solution going forward.

The best solution going forward is to repair Parsoid. While I agree that we can do things with wikitext that no sane person would *want* to, there's no way to prevent us from doing so, and VE cannot fail based on things that work fine in the standard editor.

Bots that modify existing articles to bypass bugs in Parsoid should be off the table as a solution.

Change 76626 had a related patch set uploaded by Cscott:
Add 2 more dumpGrepper patterns to get a handle on bug 44498.

https://gerrit.wikimedia.org/r/76626

Change 76626 merged by jenkins-bot:
Add 2 more dumpGrepper patterns to get a handle on bug 44498.

https://gerrit.wikimedia.org/r/76626

(In reply to comment #12)

Bots that modify existing articles to bypass bugs in Parsoid should be off
the table as a solution.

[[Hard cases make bad law]]. :-)

As a general rule, Parsoid should be handle wikitext in the wild. However, we need more information about the prevalence of these constructs on Wikimedia wikis and the amount of work required to properly handle them before we can make any real decisions here. Scott's Gerrit changeset 76626 seems to be a step in the right direction.

kwwilliams wrote:

(In reply to comment #15)

(In reply to comment #12)

Bots that modify existing articles to bypass bugs in Parsoid should be off
the table as a solution.

[[Hard cases make bad law]]. :-)

As a general rule, Parsoid should be handle wikitext in the wild. However, we
need more information about the prevalence of these constructs on Wikimedia
wikis and the amount of work required to properly handle them before we can
make any real decisions here. Scott's Gerrit change #76626 seems to be a step
in the right direction.

I'm more an adherent to the concept that rigorous adherence to fundamental principles always pays off in the long run, and "Parsoid should handle all wikitext currently in existence in English Wikipedia" strikes me as one of those fundamentals.

I'm concerned by the existence of this one and its interplay with 50589. It appears that Parsoid and VE made some assumptions about how templates, wikimarkup, and tables would interact that aren't true.

Take a look at the certification table in http://en.wikipedia.org/w/index.php?title=Metallica_%28album%29&oldid=566196578&veaction=edit for example. There's something going on in http://en.wikipedia.org/wiki/Template:Certification_Table_Top or http://en.wikipedia.org/wiki/Template:Certification_Table_Entry that's causing more weirdness. I don't know if that's this one, 50589, or yet another related bug.

Having looked into it a bit, I agree that the current bug is related to bug 50589. However, bug 50589 *should* work -- the template is still well structured, it is just generating content for more than one table cell. This bug, on the other hand, has a template which spans structural boundaries in the HTML. For example:

{|

align=center {{table_attribs}}
}

where {{table_attribs}} expands to 'style="color: red"|Foo'. This wants to generate something like the following HTML:

Foo

Note that the <td> and some of its attributes come from the article text, and the rest of the attributes come from the template. I might be able to hack this through Parsoid, but VE will probably never be able to edit the result. (Imagine how you'd present the table cell editing UI for something like this.) So the bot approach should probably be used to clean up constructs like these, to something like:

{|
{{table_attribs|align=center}}

}

Now the template is well-structured in the resulting html.

My current approach is to attempt to get these cases parsed by parsoid, resulting in uneditable-but-visually-correct output. We will still want a bot to rewrite the templates so to make them editable in the articles using them.

As MZMcBridge correctly guessed, I'm currently running a grep over an enwiki dump to figure out exactly how many articles use table cell templates. It's not done running yet so I don't have exact statistics, but the short answer is, "a lot". So far nothing that can't be

rewritten in a more-or-less straightforward manner.

kwwilliams wrote:

Scott, is the glitch with the certification tables at http://en.wikipedia.org/w/index.php?title=Metallica_%28album%29&oldid=566196578&veaction=edit another effect of 50589, this bug, or should I open a new bug report for it?

Change 76758 had a related patch set uploaded by Cscott:
Add new parserTests for table cell attributes coming from templates.

https://gerrit.wikimedia.org/r/76758

Change 76758 merged by jenkins-bot:
Add new parserTests for table cell attributes coming from templates.

https://gerrit.wikimedia.org/r/76758

(In reply to comment #19)

Scott, is the glitch with the certification tables at
http://en.wikipedia.org/w/index.
php?title=Metallica_%28album%29&oldid=566196578&veaction=edit
another effect of 50589, this bug, or should I open a new bug report for it?

No, this is bug 52254 (which Roan already opened based on your report).

kwwilliams wrote:

(In reply to comment #22)

(In reply to comment #19)

Scott, is the glitch with the certification tables at
http://en.wikipedia.org/w/index.
php?title=Metallica_%28album%29&oldid=566196578&veaction=edit
another effect of 50589, this bug, or should I open a new bug report for it?

No, this is bug 52254 (which Roan already opened based on your report).

52254 is relative to the incorrect italicization and bolding in the lead. It does not cover the corrupted certification table (near the bottom), which is another case of a template creating table markup which is displayed incorrectly.

kwwilliams wrote:

(In reply to comment #23)

(In reply to comment #22)

(In reply to comment #19)

Scott, is the glitch with the certification tables at
http://en.wikipedia.org/w/index.
php?title=Metallica_%28album%29&oldid=566196578&veaction=edit
another effect of 50589, this bug, or should I open a new bug report for it?

No, this is bug 52254 (which Roan already opened based on your report).

52254 is relative to the incorrect italicization and bolding in the lead. It
does not cover the corrupted certification table (near the bottom), which is
another case of a template creating table markup which is displayed
incorrectly.

Opened https://bugzilla.wikimedia.org/show_bug.cgi?id=52296

The more I play with it, the more it feels like a third related bug. It's clearly not 50589 because Parsoid can't parse it correctly, but I don't think it's a duplicate of this one, either.

Change 76861 had a related patch set uploaded by Cscott:
WIP: first draft of td reparse.

https://gerrit.wikimedia.org/r/76861

On the enwiki-20130102-pages-articles-multistream.xml.bz2 dump, there are:
13,057,082 articles, of which
11,877 (0.091%) have some use of [[Template:Table_cell_templates]], and
1,875 (0.014%) have a non-trivial use (ie, the preceding character is not |).

kwwilliams wrote:

(In reply to comment #26)

On the enwiki-20130102-pages-articles-multistream.xml.bz2 dump, there are:
13,057,082 articles, of which

That looks like all namespaces, not just articles.

11,877 (0.091%) have some use of [[Template:Table_cell_templates]], and
1,875 (0.014%) have a non-trivial use (ie, the preceding character is not

).

That sounds about right. As a percentage of our most frequently edited articles, I suspect the percentage is much higher, though. The primary use is in award tables, and while the millions of articles about ghost towns, obscure shellfish, and asteroids tend not to discuss awards, while the ones about modern music, television, and films frequently do.

Parsoid support for this and a related table parsing issue is tracked in bug 50603, and should be ready in the next weeks.

I'm working on a DOM-based handler for this. Re-assigning the bug to myself.

(In reply to comment #19)

Scott, is the glitch with the certification tables at
http://en.wikipedia.org/w/index.
php?title=Metallica_%28album%29&oldid=566196578&veaction=edit
another effect of 50589, this bug, or should I open a new bug report for it?

That was bug 50589, which is now fixed.

(In reply to comment #31)

(In reply to comment #19)

Scott, is the glitch with the certification tables at
http://en.wikipedia.org/w/index.
php?title=Metallica_%28album%29&oldid=566196578&veaction=edit
another effect of 50589, this bug, or should I open a new bug report for it?

That was bug 50589, which is now fixed.

Correction: It was very similar to 50589, but not quite the same.

In other news about this bug, https://gerrit.wikimedia.org/r/#/c/79943/ is now deployed, which covers the vast majority of cases including the ones mentioned here. Closing as fixed.

Minor niggle: The caches are not yet cleared as we don't have the rights to do so. Cached pages will only update to the correct rendering after that has happened.

Change 76861 abandoned by Subramanya Sastry:
WIP: first draft of td reparse.

Reason:
Not required anymore. A DOM-based solution is already merged and in production.

https://gerrit.wikimedia.org/r/76861

ssastry set Security to None.