
thead, tbody, tfoot for wikitable syntax
Open, Medium, Public

Description

Author: michael

Description:
Adding wikitable support for thead, tbody, and tfoot elements would be a harmless enhancement, allowing more sophisticated
formatting of tables (both in pages' wikitext, and using style attributes or style sheets).

Logically, each element would only need a start tag, and would be closed when the following element starts or the table ends (as
|- serves for table rows). Possible wikitable shortcuts:

thead:

|!    associated with table headers (but possibly confusing)
|^    analogous to GREP start of string
|<    analogous to XML/HTML tag opening
|[    opening bracket representing start

tbody:

|=    fatter version of table row
|[    enclosing bracket representing a block

tfoot:

|_    underscore=bottom line
|$    analogous to GREP end of string
|>    analogous to HTML/XML tag closing
|/    analogous to HTML/XML closing tag
|]    closing bracket representing ending

See also Bug 3156: T5156: Request not to filter <tbody> and </tbody> codes

Details

Reference
bz4740

Event Timeline

bzimport raised the priority of this task to Low (Nov 21 2014, 9:02 PM).
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz4740.
bzimport added a subscriber: Unknown Object (MLST).

ayg wrote:

Why not use heuristics? Not ideal, but a great improvement over the current
situation. Proposal:

  1. Any sequence of rows falling at the end of a table and consisting entirely of header cells is a <tfoot>.
  2. Any other sequence of rows consisting entirely of header cells is a <thead>.
  3. Any other sequence of rows is a <tbody>.

Thus you would get, e.g.

{|
|+ Metasyntactic variables
! Computing
|-
| Foo
|-
| Bar
|-
! English names
|-
| Jack
|-
| Jill
|}

<table><caption>Metasyntactic variables</caption>
  <thead><tr><th>Computing</th></tr></thead>
  <tbody>
    <tr><td>Foo</td></tr>
    <tr><td>Bar</td></tr>
  </tbody>
  <thead><tr><th>English names</th></tr></thead>
  <tbody>
    <tr><td>Jack</td></tr>
    <tr><td>Jill</td></tr>
  </tbody>
</table>

which I believe is correct. Any counterexamples?
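The three rules above can be sketched in a few lines of Python. This is an illustration, not MediaWiki parser code: each row is assumed to be a list of (kind, text) cells with kind "th" or "td", the names `group_rows` and `render` are hypothetical, and one assumption goes beyond the rules as written: a table made only of header rows is treated as <thead> rather than <tfoot>.

```python
from itertools import groupby
from typing import List, Tuple

Cell = Tuple[str, str]  # (kind, text), where kind is "th" or "td"
Row = List[Cell]


def is_header_row(row: Row) -> bool:
    """A row counts as a header row if every cell in it is a <th>."""
    return all(kind == "th" for kind, _ in row)


def group_rows(rows: List[Row]) -> List[Tuple[str, List[Row]]]:
    """Split rows into maximal runs and label each run thead, tbody or tfoot."""
    runs = [list(run) for _, run in groupby(rows, key=is_header_row)]
    sections = []
    for i, run in enumerate(runs):
        if not is_header_row(run[0]):
            tag = "tbody"  # rule 3: any other sequence of rows
        elif i == len(runs) - 1 and i > 0:
            tag = "tfoot"  # rule 1: header-only run at the end of the table
        else:
            tag = "thead"  # rule 2 (and the header-only-table assumption)
        sections.append((tag, run))
    return sections


def render(caption: str, rows: List[Row]) -> str:
    """Emit compact HTML (no whitespace) for the grouped rows."""
    parts = ["<table>"]
    if caption:
        parts.append(f"<caption>{caption}</caption>")
    for tag, run in group_rows(rows):
        trs = "".join(
            "<tr>" + "".join(f"<{k}>{t}</{k}>" for k, t in row) + "</tr>"
            for row in run
        )
        parts.append(f"<{tag}>{trs}</{tag}>")
    parts.append("</table>")
    return "".join(parts)
```

Feeding it the metasyntactic-variables rows yields the same element structure as the HTML above. One point worth weighing against this rule: HTML permits at most one <thead> per table, so the second header run in that example would not be valid as a <thead>.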

ezyang wrote:

The only trouble is if the heuristic turns out to be wrong. Unlikely, but
possible, and if you don't offer any way around it there will be problems.

ayg wrote:

Still better than the current setup, and it doesn't complicate wikimarkup (which
I think is why this isn't enabled).

fgregg wrote:

sorttable.js uses thead and tfoot to know which portions of a table not to sort.
Allowing the use of thead and tfoot would make that table-sorting script much
easier to integrate with complicated tables.

paul wrote:

My original suggestion was to pass <thead> and </thead> through. If you can't pass
them, could you at least not display them on the output page, in effect ignoring them?

paul wrote:

One way would be to wrap the <thead>, <tbody> etc. tags and their closing tags in
<!-- and -->, or to have some way to mark them as non-displayed, so that while
they are ignored for functionality, they don't show up on the rendered page.
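The comment-wrapping idea can be sketched as a single regular-expression substitution. This is a hypothetical post-processing step (the function name `hide_row_group_tags` is illustrative, not part of MediaWiki), and it assumes attribute values never contain ">" or "--".

```python
import re

# Matches opening or closing thead/tbody/tfoot tags, with or without attributes.
ROW_GROUP_TAG = re.compile(r"</?(?:thead|tbody|tfoot)\b[^>]*>", re.IGNORECASE)


def hide_row_group_tags(html: str) -> str:
    """Wrap each row-group tag in an HTML comment so it is neither rendered nor applied."""
    return ROW_GROUP_TAG.sub(lambda m: "<!--" + m.group(0) + "-->", html)
```

Replacing each match with an empty string instead would drop the tags outright, which has the same visible effect.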

bluehairedlawyer wrote:

I'm currently working on an implementation of this bug as per Aryeh's comments above. I have run into considerable problems implementing his suggestion that a line of header cells at the end of a table be treated as a tfoot. The problem is that when the program encounters:

! some header cells

it outputs:

<tr><th> some </th><th> header </th><th> cells

We would only find out later whether it was actually a footer. We could perform a simple search and replace, but that would be greatly complicated by the possibility of embedded tables within the footer cells. As far as I can see, implementing the full heuristics would require an almost complete rewrite. Or we could use something like:

{|
|+ Metasyntactic variables
! Computing
|-
| Foo
|-
| Bar
|-
! English names
|-
| Jack
|-
| Jill
|=
| Footer
|}

bluehairedlawyer wrote:

a structural method to implement structural elements: tbody, thead and tfoot

Ignore my previous comments. I've now substantially rewritten the doTableStuff() function, by separating the wiki syntax reading part from the bit that outputs the html. doTableStuff() now collects information about the table into an array which a new function, printTableHtml(), converts into html.
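The two-phase design (first read the wikitext into an array, then print HTML from it) can be sketched as follows. This is not the patch itself, which changed doTableStuff() and added printTableHtml() in PHP; `read_wikitable` and `print_table_html` are hypothetical stand-ins. It assumes simple one-level tables: no attributes, no nested tables, and "!!" only on header lines.

```python
from typing import List, Optional, Tuple

Cell = Tuple[str, str]  # (kind, text), kind is "th" or "td"
Row = List[Cell]


def read_wikitable(wikitext: str) -> Tuple[Optional[str], List[Row]]:
    """Phase 1: collect the table into plain data, emitting no HTML yet."""
    caption: Optional[str] = None
    rows: List[Row] = []
    current: Optional[Row] = None
    for raw in wikitext.splitlines():
        line = raw.strip()
        if not line or line.startswith("{|"):
            continue
        if line.startswith("|}"):
            break
        if line.startswith("|+"):
            caption = line[2:].strip()
        elif line.startswith("|-"):
            current = None  # the next cell line opens a new row
        elif line[0] in "!|":
            kind, sep = ("th", "!!") if line[0] == "!" else ("td", "||")
            if current is None:
                current = []
                rows.append(current)
            current.extend((kind, c.strip()) for c in line[1:].split(sep))
    return caption, rows


def print_table_html(caption: Optional[str], rows: List[Row]) -> str:
    """Phase 2: with the whole table known, leading header rows become
    thead and trailing header rows become tfoot."""
    is_head = [all(kind == "th" for kind, _ in row) for row in rows]
    n = len(rows)
    head = 0
    while head < n and is_head[head]:
        head += 1
    foot = 0
    while foot < n - head and is_head[n - 1 - foot]:
        foot += 1
    sections = [
        ("thead", rows[:head]),
        ("tbody", rows[head:n - foot]),
        ("tfoot", rows[n - foot:]),
    ]
    out = ["<table>"]
    if caption:
        out.append(f"<caption>{caption}</caption>")
    for tag, group in sections:
        if not group:
            continue  # omit empty row groups
        out.append(f"<{tag}>")
        for row in group:
            out.append("<tr>" + "".join(f"<{k}>{t}</{k}>" for k, t in row) + "</tr>")
        out.append(f"</{tag}>")
    out.append("</table>")
    return "".join(out)
```

The trade-off of this design is that the whole table is held in memory before any HTML is emitted.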

Attached:

bluehairedlawyer wrote:

I forgot to mention that the patch includes changes to wikibits.js, which didn't appear to support tbody, thead or tfoot elements after all. The changes make sortable tables work in Safari v3 and Firefox v3. It still needs to be tested in IE6 and other browsers.

Btw...

{|
! header
|}

{|
! header
|-
| content
|-
! footer
|}

but...

{|
! header
|-
| content
|-
! header
|-
|}

I did this on purpose, in case people want to have headers at the bottom of their tables. It can be changed!

nicdumz wrote:

keywords : Patch, need-review

andy wrote:

I support the proposal to add these three elements; their availability, with class attributes, will greatly facilitate the use of microformats.

*** Bug 3156 has been marked as a duplicate of this bug. ***

a structural method to implement structural elements: tbody, thead and tfoot v2

Updated patch to apply cleanly to trunk.
It fails heaps of parser tests; fixing that now.

attachment new.diff ignored as obsolete

a structural method to implement structural elements: tbody, thead and tfoot v2 v2

last patch contained unrelated changes

attachment new.diff ignored as obsolete

7912: a structural method to implement structural elements: tbody, thead and tfoot v

This one passes all parser tests (except those that get upset by the new <tbody>). The new HTML tags are now whitelisted as well.
This patch would enable us to migrate to a better tablesorter script, which would fix a lot of the open table sorting bugs.

Attached:

I think it would be nice to have a new syntax for tfoot and thead rather than (only) hacking around the current one.

Parse with first row in thead:
{|
|+ Title
|-
! Head cell !! Head cell
|-
| Normal cell || Normal cell
|-
| Normal cell || Normal cell
|}

Parse without thead:
{|
|+ Title
|-
! Head cell
| Normal cell
|-
| Normal cell || Normal cell
|-
| Normal cell || Normal cell
|}

Rows starting with "|!-" are moved to thead (only if they are consecutive), and rows starting with "|>-" are moved to tfoot (likewise only if consecutive):
{|
|+ Title
|!-
! Head cell !! Head cell
|!-
! Head cell !! Head cell
|-
| Normal cell || Normal cell
|-
! Head cell not in thead
| Normal cell
|-
| Normal cell || Normal cell
|>-
| Footer cell || Footer cell
|}
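The "|!-" / "|>-" proposal can be sketched as a pass over row-start markers. The function names are hypothetical, and two interpretations are assumptions: "only if consecutive" is read as a contiguous run at the start (for thead) or at the end (for tfoot), and a marked row anywhere else is treated like an ordinary "|-" row. The sketch also assumes every row begins with an explicit marker, as in the example above.

```python
from typing import List

# Row-start markers from the proposal; none is a prefix of another.
ROW_MARKERS = ("|!-", "|>-", "|-")


def row_markers(wikitext: str) -> List[str]:
    """Return the row-start marker of each row, in document order."""
    markers = []
    for line in wikitext.splitlines():
        stripped = line.strip()
        for marker in ROW_MARKERS:
            if stripped.startswith(marker):
                markers.append(marker)
                break
    return markers


def assign_sections(markers: List[str]) -> List[str]:
    """Map each row to thead, tbody or tfoot.

    Only a contiguous run of "|!-" rows at the start becomes thead, and only a
    contiguous run of "|>-" rows at the end becomes tfoot; any other marked row
    is treated as a normal tbody row (an interpretation, see above).
    """
    n = len(markers)
    head = 0
    while head < n and markers[head] == "|!-":
        head += 1
    foot = 0
    while foot < n - head and markers[n - 1 - foot] == "|>-":
        foot += 1
    return ["thead"] * head + ["tbody"] * (n - head - foot) + ["tfoot"] * foot
```

For the third example above, this assigns the two "|!-" rows to thead, the three "|-" rows to tbody, and the "|>-" row to tfoot.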

Nux: I'd say that it's better to do it on the existing syntax, since I can't see the use case of having a row that looks like a thead but structurally isn't.

Fixed in r85922

michael wrote:

So when the patch is implemented, what syntax would I use to divide a table into two or more row groups using tbody elements? This is not clear from the descriptions above.

andy wrote:

Can we get an update, please?

Nobody is currently working on this.

I think this proposal needs a clearer description of use cases, and why those use cases justify the complexity costs in:

  • the wikitext user interface
  • the VisualEditor user interface
  • Parsoid

As an example, how would this be sensibly presented in VE?

andy wrote:

Gabriel: The use case is outlined in Michael Zajac's initial post (timestamp: 2006-01-24 02:41:52) and in comments 4 & 11. Do you have questions about those?

It appeared from comment 15 that this was resolved four years ago; no reason for its reversion has been given here.

andy wrote:

Also, the heuristic suggested above won't work, as it's necessary to allow for more than one tbody per table.

(In reply to Andy Mabbett from comment #24)

Gabriel: The use case is outlined in Michael Zajac's initial post
(timestamp: 2006-01-24 02:41:52) and in comments 4 & 11. Do you have
questions about those?

What I see there is

  1. allows for more sophisticated formatting (comment 1)
  2. sorttables not sorting thead / tfoot (comment 4)
  3. facilitation of microformats (comment 11)

Are 1) and 2) actually still issues? To me it sounds like 2) would only be an issue with a footer, which is relatively rare. Otherwise, detecting a row with <th> elements should not be hard in a script.

As for 3), it is rather nebulous, given that you can just as well attach classes to trs.

What I am asking for is a clear use case: I want to do X, it's not possible because of Y, and it will be possible once thead / tbody / tfoot are supported. This is worth the costs because of Z.

A related use case: allows Parsoid to handle arbitrary table markup in WTS phase.

Although my proposal (for the record) would be *not* to add new pipes-and-punctuation markup for <thead> <tfoot> etc, but instead to just allow them to be generated by literal HTML embedded in wikitext, eg https://en.wikipedia.org/wiki/Help:Table#Other_table_syntax

Once your table is sufficiently complicated, it's probably best to use literal HTML, IMO. But we still need to permit thead/tfoot/colgroup etc in literal HTML within wikitext.

michael wrote:

(In reply to Gabriel Wicke from comment #26)

  1. allows for more sophisticated formatting (comment 1)

The main reason I requested this is the ability for an editor to create multiple row groups by adding multiple tbody elements in a table. This would allow grouping data in tables, make these groups accessible to assistive devices like screen readers, allow visual formatting of the groups with CSS (other than redundant inline CSS), and enable behaviours like collapsing groups.

The solution in comment 1 simply automates adding a whole-table tbody element, and does not satisfy the requirement (the HTML DOM implicitly includes a full-table tbody anyway, so this solution is redundant.)

Some use-case examples that would benefit from this:

michael wrote:

(In reply to C. Scott Ananian from comment #27)

Once your table is sufficiently complicated, it's probably best to use
literal HTML, IMO.

But grouping table rows is a very simple concept.

There is high demand. Editors are already attempting to do this in tens of thousands of tables using complex, inconsistent, inaccessible, inadequate, and inappropriate hacks (rows of table headers, horizontal rules, inline CSS, nested tables, etc.).

It should be possible to accomplish this with dead-simple wikitext, and visually format it consistently and automatically in standard style sheets.

@Michael Zajac: nothing related to table parsing in wikitext is simple, unfortunately. So I'm suggesting to concentrate on *making it possible*, and let the template authors and/or VE, etc, worry about making it "dead-simple".

But I'm open to suggestions. The HTML elements not currently supported in wikitext are thead, tbody, tfoot, colgroup, and col. If someone would like to open a new wikipage proposing concrete "dead-simple" wikitext syntax for these, I'd be happy to re-evaluate. (But please make your proposal on a wikipage, so that this bugzilla isn't bloated out with endless bikeshedding over tweaks to the syntax.)

Note that the original patch was reverted (as I understand the history of this bug) because the implementation built an in-memory model of the entire table during processing. Wikipedia tables can be *huge*. So any syntax proposal must be parseable without buffering, using as little table context information as possible. Similarly, you should be prepared to demonstrate (using greps over a Wikipedia dump, or similar) that the proposed syntax does not break any existing table markup.
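The dump check suggested above can start as a simple line scan. This is a minimal sketch with a hypothetical function name; it assumes the candidate prefixes are known (for example "|!-", "|>-" or "|!" from the proposals in this task) and only counts line-initial occurrences, without checking whether a line is actually inside a table. In a raw XML dump, ">" may be escaped as "&gt;", so decode the page text first or search for the escaped form.

```python
import re
from typing import Dict, Iterable


def count_marker_lines(lines: Iterable[str], prefixes: Iterable[str]) -> Dict[str, int]:
    """Count lines that already begin with any of the proposed prefixes.

    Prefixes are tried longest first so that, e.g., "|!-" is not counted as "|!".
    Leading spaces/tabs are skipped, since table markup may be indented.
    """
    ordered = sorted(set(prefixes), key=len, reverse=True)
    pattern = re.compile(r"[ \t]*(" + "|".join(map(re.escape, ordered)) + ")")
    counts = {p: 0 for p in ordered}
    for line in lines:
        m = pattern.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

Because it accepts any iterable of lines, an open text file can be passed directly, so a large dump is streamed rather than loaded into memory.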

michael wrote:

@C. Scott Ananian I do appreciate that the parsing and programming are likely very complex, and also that whitelisting the HTML is a good improvement and probably a step towards creating a wikitext syntax for these elements.

But wikitable syntax is fairly simple for editors to use, and I hope that these efforts can eventually add a simple way to mark the start of a new tbody (row group), and the other elements. I’m sorry that currently I can’t invest time in this, but thanks for the suggestions on how to proceed.

(In reply to C. Scott Ananian from comment #30)

If someone would like
to open a new wikipage proposing concrete "dead-simple" wikitext syntax for
these, I'd be happy to re-evaluate. (But please make your proposal on a
wikipage, so that this bugzilla isn't bloated out with endless bikeshedding
over tweaks to the syntax.)

I note the correct place for such a proposal would be https://www.mediawiki.org/wiki/Requests_for_comment

@Brad -- yes, I thought about mentioning that, but reconsidered; I thought it would probably be more useful to stage a draft in some user's talk space (or similar) first and let people hack on it for a while, before making things formal and hoisting the text into the RfC namespace. I didn't want to discourage contributors by forcing the RfC template and formatting on them right away.

But if you're not afraid of extra process and formatting and are feeling confident in your proposal, then sure, throw it directly into RfC space.

Some (fixable) template bugs appear to have been introduced as a result of the introduction of automatically generated <thead> and <tbody> tags. See https://en.wikipedia.org/wiki/Template_talk:Articles_by_Quality_and_Importance#rowspan_and_thead_bug


In that talk, @TheDJ points out that jquery.tablesorter sets thead/tfoot automatically (based on th elements). So, as an alternative to the suggested manual wikitable syntax, what about setting thead/tfoot consistently, i.e. also for unsortable tables and without JavaScript?

Task T5156 was marked as a duplicate of this one in 2010 (T5156#74974). However, that task was proposing to allow thead and tbody in wikitext the same way we already allow <table>, <tr> and <td>, thus essentially opting out of our own wikitext syntax.

That proposal was merged into this one, where the conversation has mainly revolved around what custom syntax to use. I propose to shelve that in favour of doing what Brion already proposed there in 2009:

In T5156#74956, @brion wrote:

We could toss thead, tbody, and tfoot into the table whitelist in Sanitizer::removeHTMLtags... to do it right one would need to expend extra effort to ensure nesting is correct, though. (Or else leave it to Tidy...)

And later by others at T26274: Allow Sanitizer to process tbody.

I think a custom syntax might have merit, but there's all sorts of compatibility and usability things to keep in mind there. Either way, we would need to allow it in the Sanitizer, and for use in complex templates we'd want the HTML-like syntax to have feature parity, so I propose to first cut through this and unlock the primitive. Then the conversation about custom syntax can continue at its own pace as an improvement, rather than as a blocker. (possibly on a separate task, or we can unmerge them and re-open T5156 and address that first).
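Brion's caveat about nesting can be illustrated with a small checker. This is a sketch using Python's standard-library html.parser, not MediaWiki's PHP Sanitizer; the class and function names are hypothetical, and it only checks that thead/tbody/tfoot sit directly inside <table> and that <tr> sits inside a table or row group.

```python
from html.parser import HTMLParser
from typing import List

ROW_GROUPS = {"thead", "tbody", "tfoot"}
VOID = {"area", "br", "col", "hr", "img", "wbr"}  # elements that never get an end tag


class RowGroupNestingChecker(HTMLParser):
    """Collect nesting problems for thead/tbody/tfoot and tr elements."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[str] = []   # currently open elements
        self.errors: List[str] = []

    def handle_starttag(self, tag, attrs):
        parent = self.stack[-1] if self.stack else None
        where = f"<{parent}>" if parent else "top level"
        if tag in ROW_GROUPS and parent != "table":
            self.errors.append(f"<{tag}> at {where}, expected directly inside <table>")
        elif tag == "tr" and parent not in ROW_GROUPS | {"table"}:
            self.errors.append(f"<tr> at {where}, expected inside <table> or a row group")
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if tag not in self.stack:
            self.errors.append(f"stray </{tag}>")
            return
        # Pop up to and including the matching start tag; anything left
        # open in between is treated as implicitly closed.
        while self.stack.pop() != tag:
            pass


def check_nesting(fragment: str) -> List[str]:
    checker = RowGroupNestingChecker()
    checker.feed(fragment)
    checker.close()
    return checker.errors
```

A real sanitizer would more likely repair or escape misnested tags than merely report them; the sketch only shows what "ensure nesting is correct" has to detect.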

Jdlrobson raised the priority of this task from Low to Medium (Oct 26 2022, 8:14 PM).
Jdlrobson added a subscriber: Jdlrobson.

Hi @cscott, this is now causing some issues in desktop improvements, as certain articles use sticky positioning for th rows (more background in T289817#8225410). Vector would like to offset the thead instead. How can we make using thead in templates possible?

We cannot make syntax changes right now. We should probably have a conversation to see how we can move this forward. @JMcLeod_WMF, can you please handle this?

In T6740#90295, @cscott wrote:

A related use case: allows Parsoid to handle arbitrary table markup in WTS phase.

Although my proposal (for the record) would be *not* to add new pipes-and-punctuation markup for <thead> <tfoot> etc, but instead to just allow them to be generated by literal HTML embedded in wikitext, eg https://en.wikipedia.org/wiki/Help:Table#Other_table_syntax

Once your table is sufficiently complicated, it's probably best to use literal HTML, IMO. But we still need to permit thead/tfoot/colgroup etc in literal HTML within wikitext.

Re-upping my suggestion from 8 years ago.

And in an effort to scope this task: the desktop refresh work AFAICT *only* wants to be able to put the <TH> cells inside a <THEAD>. This could be a stopgap hack, e.g. in the sanitizer (look for a table whose first row contains only TH cells and wrap it in THEAD); we don't necessarily need to boil the ocean all at once.

Technically, I think they'd want what tablesorter does. Find the first TR rows that only contain headercells and put those rows inside thead.
https://github.com/wikimedia/mediawiki/blob/master/resources/src/jquery.tablesorter/jquery.tablesorter.js#L277
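The behaviour described above (take the first rows that contain only header cells and put them in thead) can be sketched on a simplified row model. This is an illustration, not a port of jquery.tablesorter: each row is assumed to be the list of its cell tag names, and `split_leading_header_rows` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def split_leading_header_rows(
    rows: Sequence[Sequence[str]],
) -> Tuple[List[Sequence[str]], List[Sequence[str]]]:
    """Return (thead_rows, remaining_rows).

    Each row is the sequence of its cell tag names, e.g. ["th", "th"].
    Scanning stops at the first row that is empty or contains a non-th cell,
    so header-only rows further down the table stay where they are.
    """
    n = 0
    while n < len(rows) and rows[n] and all(tag == "th" for tag in rows[n]):
        n += 1
    return list(rows[:n]), list(rows[n:])
```

Only the leading run moves, which matches the stated scope: column headers at the top, nothing further down.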

@cscott, @ssastry and myself met to talk about this. We agree this is a missing feature in wikitext syntax and want to put this on the parser roadmap.

Fixing this in the legacy parser is likely to be risky, and we don't think there's a quick solution here right now.
For the sticky header bug, I'll create a new ticket referencing this one to explore short term workarounds.

I agree with TheDJ that moving consecutive "tr" rows that contain only "th" cells to the "thead" element would work, and wouldn't need any new wikitext syntax. Sortable and the sticky gadget (for wikitable) do this using JavaScript. It should be done for all tables (sortable, wikitable, plain table) so they are easier to style. Ideally, it should be moved during template translation (wikitext to HTML) instead of relying on JavaScript (see No-JavaScript).

Subsection headers might be an issue, since the first one could be swept into the consecutive move. Ideally you only want the column headers for the whole table in the "thead". Maybe add a class to exclude such rows; this would matter if, for example, the "thead" element is made sticky at the top (which preserves rowspan). I would also suggest fixing T355492 so "sorttop" rows aren't mixed into the "thead".
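The exclusion idea can be sketched by stopping the thead scan at any row carrying an opt-out class. Assumptions: rows are modeled as dataclasses with their cell tag names and the <tr>'s classes; "sorttop" is the class named in the comment above, while "nothead" and `split_thead` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass
class Row:
    cells: List[str]                        # cell tag names, e.g. ["th", "th"]
    classes: FrozenSet[str] = frozenset()   # class attribute of the <tr>


# "sorttop" comes from the discussion; "nothead" is a hypothetical opt-out class.
EXCLUDE = frozenset({"sorttop", "nothead"})


def split_thead(rows: List[Row], exclude: FrozenSet[str] = EXCLUDE) -> Tuple[List[Row], List[Row]]:
    """Move leading header-only rows into thead, stopping at an excluded row."""
    n = 0
    while n < len(rows):
        row = rows[n]
        header_only = bool(row.cells) and all(c == "th" for c in row.cells)
        if not header_only or row.classes & exclude:
            break
        n += 1
    return rows[:n], rows[n:]
```

A subsection header row marked with the opt-out class then stays in the body, even when it directly follows the column headers.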

I think the problem is that existing tables present their headers in different ways.

For example, there are usually 2 ways to present table headers:

https://www.mediawiki.org/wiki/Special:Version#mw-version-ext

(HTML modified/truncated for demonstration)

[ File 1 ] (screenshot: image.png)

[ File 2 ] (screenshot: image.png)

(We're not debating which one is better here, as we can't really prevent something like [ File 2 ] from happening in wikitext typed by users.)

I think the problem is that existing tables present their headers in different ways.
(We're not debating which one is better here, as we can't really prevent something like [ File 2 ] from happening in wikitext typed by users.)

I think it's important to think about what should be easy and automatic for 98% of the users of the wikitext (grouping of header rows into a thead to benefit sticky headers and tablesorter), vs giving people access to every single last option they could possibly want in order to make it perfect.

Automatically having a thead that works for sticky table headers and tablesorter seems valuable and common to me. Writing convoluted tables with multiple tbodies and multiple headers that also work correctly with sticky table headers, requiring a couple hundred more words to document how to do it using wikitext syntax that no one will be able to remember and that isn't used by other languages, is considerably less so.

To me, a problem like this is similar to the situation with talk pages. Yes, theoretically you can do almost anything on a wikitext talk page. But something like DiscussionTools won't be able to support absolutely everything. So if you deviate too far from what DiscussionTools expects, your page might not work with it. And that's fine, because it will still be a basic wikitext page.

So if you are making such a "wikitable", the automation can either give up in a case like that, or we can have an attribute/class to disable the automated behaviour, and maybe you know some magic that can make it even more feature-rich. But I see no reason to bother the other 98% of people with that complexity.

Yeah. I just mean we should make sure it is fail-safe and compatible, with a fallback behavior.

(A guide about which kind of usage is preferred and works better would also be good.)