
Colon (:) & semicolon (;) shouldn't output as HTML definition list when used for indentation, boldfacing
Open, Medium, Public

Description

Author: roac

Description:
Wikipedians often use lines starting with one or more colons to indent a line:
e.g.
: this line is indented
:: double indented

I recently noticed that the generated HTML contains a definition list (<dl/>).
Obviously this is not a good means to indent text. (http://www.w3.org/TR/html401/struct/lists.html#h-10.3)
Apparently, the MediaWiki syntax for definition lists (;term : def) is being abused for visual effects.

I suggest that a <div style="margin-left: 2em"></div> is used instead of <dl><dd></dd></dl>, whenever
a colon is found without a preceding semicolon.
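A minimal sketch of the proposed behaviour, written in Python rather than the parser's own PHP. The function name `indent_lines` and the `step_em` parameter are hypothetical, and the sketch assumes a colon line counts as "preceded by a semicolon" when it directly follows a `;` line or another definition line under that term:

```python
def indent_lines(wikitext: str, step_em: float = 2.0) -> str:
    """Render colon-indented lines as margin-indented divs (illustrative only)."""
    out = []
    in_definition = False  # True while we are under a ';term' line
    for line in wikitext.splitlines():
        if line.startswith(";"):
            # A term line: leave it (and the ':' lines after it) to the list code.
            in_definition = True
            out.append(line)
            continue
        depth = len(line) - len(line.lstrip(":"))  # number of leading colons
        if depth == 0:
            in_definition = False
            out.append(line)
            continue
        if in_definition:
            # Colon preceded by a semicolon line: a real definition, untouched.
            out.append(line)
            continue
        text = line[depth:].strip()
        out.append(f'<div style="margin-left: {depth * step_em:g}em">{text}</div>')
    return "\n".join(out)
```

With the two example lines from the description, this yields one div with `margin-left: 2em` and one with `margin-left: 4em`, and no `<dl>` at all.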

keywords: indentation colon semicolon dl dt dd definition list abuse parser


Version: 1.6.x
Severity: minor
URL: http://en.wikipedia.org/wiki/User:Joris_Gillis/SF
See Also:
T33146: Replace colons by em spaces at the beginning of verses inside poem tag

Details

Reference
bz4521

Event Timeline


michael wrote:

Div-style hacks won't work without CSS

Which is a problem for all fifty people who don't use CSS

Divs are semantically meaningless. Using divs + css to create so-called
structure on a page is just inadequate, 1999-style markup, unworthy of
open software Wikimedia or a forward-looking project like Wikipedia.

For some perspective, see "level 4" at both:

Levels of CSS knowledge
<http://friendlybit.com/css/levels-of-css-knowledge/>

Levels of HTML knowledge
<http://www.456bereastreet.com/archive/200605/levels_of_html_knowledge/>

ayg wrote:

(In reply to comment #17)

Divs are semantically meaningless. Using divs + css to create so-called
structure on a page is just inadequate, 1999-style markup, unworthy of
open software Wikimedia or a forward-looking project like Wikipedia.

I'm well aware of the benefits of semantic markup. The problem is, 95% of our
editors are not, and they're most of the ones who are using things like :. If
you use <dd> or <blockquote> to indent it, you're saying that the item is a
definition or a quote, when nine-tenths of the time it isn't. It's usually a
comment in a discussion, or otherwise just something that someone wants to indent.

To put it another way: users are entering content presentationally, not
semantically, and adding probably-false semantic meaning to their solely
presentational input is much worse from a semantics perspective than admitting
that, in fact, there are no semantics to the input. Genuine semantics is good;
calling all indentation "blockquote" so that you don't have to use the dreaded
meaningless div is pointless from a semantic perspective. <div class="indent">
is much more sensible and honest, because that's what it was entered as.

michael wrote:

Yes, but this bug and follow-up comments are an attempt to improve the situation in some way, not just throw
up our hands and give up. A definition list may not be perfect for discussions, but at least it expresses the
nested structure. Switching to divs would be discarding even this and turning the page to text soup.

It would be nice if wikitext markup for blockquotes existed. We may be stuck with definitions and discussion
threads being conflated, but maybe we can come up with some bright ideas to fix that.

ui2t5v002 wrote:

If I remember correctly, the HTML spec for definition lists is pretty loose,
allowing them to be used for plays, for instance. Using them for talk pages is
not semantically wrong.

Using them for indentation is.

michael wrote:

Dialogues is given as an example of other applications of definition lists in
the specification, but this still implies a term-definition relationship
between the parts. For example:

<dl>
<dt>John</dt><!-- a label for John's statements -->
<dd>Hi, how are you?</dd><!-- defines what John said -->
</dl>

Nested discussion in Wikipedia is a list of definition lists containing
definitions, but no defined terms, so I don't think it really makes semantic
sense as a definition list in the same way as a script. However, the nested
lists do imply a hierarchy, and the default indented formatting of
definitions does imply the same hierarchy in almost any text-only or
graphical web browser (don't know how well it works in screen readers).

Changing this formatting to divs would eliminate both the semantic and
visual relationship completely. This would make things worse, and not
acceptable, even if CSS was added to make it look the same in graphical
browsers.

Technically, the DTD allows a DL to contain only one or more terms, or
only one or more definitions, so there is no problem with validation.

So although using colons for threaded discussion is semantically odd, it
does work visually and semantically. The more I think about it, the more
comfortable I am with it. Since it is easy to type and is firmly entrenched
in Wikipedia talk pages, I suggest closing this bug.


Tangentially-related issue:

However, the indented display of definition lists is actually rather
unsuitable for definition lists in articles—it breaks up the left-hand vertical
line of text and looks sloppy. In countless instances, editors enter
wikitext like the following instead, where a definition list would be a
perfect semantic fit:

'''Term'''<br />
Definition

So, when both Bug 6200 (Linebreaks are mishandled in <blockquote> and
<li>) and Bug 4827 (blockquote support in wikitext) are fixed, so that
there is no longer any incentive to abuse definition lists for block
quotation formatting in articles, then the common style sheet ought to be
updated to not indent definitions, in articles only.


References:

Introduction to lists
http://www.w3.org/TR/html401/struct/lists.html#h-10.1
"Definition lists, created using the DL element, generally consist of a series
of term/definition pairs (although definition lists may have other
applications)."

Definition lists: the DL, DT, and DD elements
http://www.w3.org/TR/html401/struct/lists.html#h-10.3
"Definition lists vary only slightly from other types of lists in that list items
consist of two parts: a term and a description."

In the DTD for XHTML 1.0 transitional
http://www.w3.org/TR/xhtml1/dtds.html#dtdentry_xhtml1-transitional.dtd_dl
"<!ELEMENT dl (dt|dd)+>"

ui2t5v002 wrote:

(In reply to comment #21)

Dialogues is given as an example of other applications of definition lists in
the specification, but this still implies a term-definition relationship
between the parts.

They also give an example of a recipe, where the DTs function almost like
headers, and the DD contains another list or paragraph. I think that when they
say "although definition lists may have other applications", they mean it quite
liberally. I don't think using them for threaded discussions is semantically
wrong, regardless of the fact that there are no DTs.

Since it is easy to type and is firmly entrenched
in Wikipedia talk pages, I suggest closing this bug.

That would imply that the bug is only about using definition lists for threaded
discussions, but the title of the bug is about indentation, which *is* still a
problem in articles. Things like indenting math formulas and the like.

However, the indented display of definition lists is actually rather
unsuitable for definition lists in articles—it breaks up the left-hand vertical
line of text and looks sloppy.

I think it looks good. :-) I convert things like the example you gave to
definition lists whenever I see them.

So, when both Bug 6200 (Linebreaks are mishandled in <blockquote> and
<li>) and Bug 4827 (blockquote support in wikitext) are fixed, so that
there is no longer any incentive to abuse definition lists for block
quotation formatting in articles, then the common style sheet ought to be
updated to not indent definitions, in articles only.

I disagree. We should figure out what people are semantically trying to do when
they use DDs for indentation, and provide markup and CSS to provide the same
effect the correct way. They use it for blockquotes, so we have the
<blockquote> tag instead. They use it for indentation of disambig links, so we
should build the indentation into the dablink class and remove the DD from the
template, etc.

buzz wrote:

This one is really bugging me. I want to use definition lists for their real purpose and have them styled so they are inline and with slightly different default margins etc

Title: Definition
Title: Definition

but this of course messes up everywhere that : has been used for indentation. Having it so : :: ::: is parsed differently would be great. I would be quite happy for it to use <div class="indent"> or so

of course I can wrap my inline definition lists in a div and style it like that, but it seems like overcoming the problem in the wrong way

And it just seems wrong to use definition lists for indentation anyway.

Is it possible to parse

;title:definition
and
;title
:definition

differently from

:indent
::indent

?

smccandlish wrote:

In response to an older comment, it isn't "arguably" semantically incorrect, it IS semantically incorrect. I don't care which of the proposed solutions is implemented, as long as its use for simple indentation renders as CSS not definition lists by the time it hits the user agent. Web markup semantics are important for accessibility reasons, among others.

Changing summary to reflect the direction of attack we would actually follow.

smccandlish wrote:

Generally I like the way this is heading. My 2 cents:

  1. General-purpose indentation should be done with divs.
  2. Blockquotes should be used for quotations, not general indentation.
  3. Definition lists should be used when the editor is intentionally demonstrating a relationship between the two parts (term + definition, character + dialogue, etc.).
  4. Ul/ol lists should not be used for things that are not actually lists.

And I concur strongly that the present behavior of using dl/dt/dd lists for presentational indentation IS semantically invalid (not "arguably"), and that this is important to fix for accessibility and other reasons.

smccandlish wrote:

FYI: While fixing this bug would be very helpful, I have to point out that there are quite a few other definition list problems in MediaWiki, as shown by some simple test cases:

http://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style_(glossaries)/DD_bug_test_cases

I'll probably file this as another bug (or more - there may be more than one issue here) after further investigation, and notify this Cc loop of the new bug number. But I think this may be something to do with MW's weird handling of lists more generally.

PS: For anyone pulling their hair out with vagaries of ";" and ":" markup, just use the real HTML tags, and the problems melt away. But don't mix-and-match.

smccandlish wrote:

Updating bug title to reflect related problem. Colon is output as a definition list definition (dd element), and semicolon, often abused for boldfacing and creating pseudo-headings, outputs a definition list term (dt element). Both of these should be replaced with CSS, at least if they are not in an actual definition list. There are three ways to handle this:

  1. Stop connecting this wikimarkup in any way to definition lists (which would have to be HTML-coded manually, like blockquotes and various other things that MediaWiki doesn't have special wikimarkup for).
  2. Have the parser test the conditions of the markup, such that if the material is formatted like:

    ;A1 :A2 ;B1 :B2

it is treated as a definition list, but if it has blank lines between any of these, or a ; without one or more :'s or vice versa, or otherwise doesn't fit this pattern, treat it as CSS-styled, non-list text.

  3. Always treat this markup as CSS-style regular prose, unless it is inside an explicit HTML dl element, in which case always treat it as a definition list (regardless of whitespacing and regardless of missing definitions or terms).
  4. Or some combination of these. I'm marginally against option 1, and feel that 3 should usually apply (always apply in the case of explicit dl markup), but can't see anything wrong with MW doing some very limited guesswork as in option 2.

ayg wrote:

I'd prefer (2). There are plenty of times I've used this for an actual definition list, and almost all of the abuses I've seen are cases where ";" isn't used at all, which is very easy to detect automatically. We should just treat ":" without accompanying ";" as <div class="indent"> or something, that's enough to fix the large majority of the cases.

smccandlish wrote:

content hidden as private in Bugzilla

cogden1970 wrote:

I just want to register my strong agreement with #2. I think there is a need for the association list markup, and that it should not be difficult to separate out sole : lines, or ; lines without one or more corresponding :s, for special treatment as <div>s.

Will this be addressed in Brion Vibber's parser rewrite?

(In reply to comment #31)

That would work for me too, provided that the corresponding case of ";" without
accompanying ":" be treated as a div with class="indent boldface" or whatever,
so that both abuses of def. list markup are fixed. PS: If you find that, when
you are actually using ";" and ":" for def. lists, that you can't get the
layout you want, try using explicit dl, dt, dd markup, and the problems go away
(see [[WP:MOSGLOSS]] for details).

(I'm quoting this message by S. McCandlish, posted on 2010-07-28 21:45:46 UTC, because his original reply contained appended spam, which is the reason I hid the reply from view)

sumanah wrote:

Will this be addressed in Brion Vibber's parser rewrite?

I'm adding the newparser keyword to bring it to the parser rewriters' attention.

The rendering of :'s was improved at least inside of <poem> tags to fix bug 31146 (see gerrit change 13539).

Wouldn't it be possible to do something similar here?

As was pointed out on http://en.wikipedia.org/wiki/Help_talk:Wiki_markup#semicolon_issue.3F , the <dl> markup currently produced by unpaired ; or : fails validation since the switch to HTML5. Shouldn't the importance of the bug be raised?

Since this is now an HTML5 issue, I have added it to bug 19719.

frungi wrote:

As long as the software translates colons to definitions, the syntax should absolutely not be used in articles except when a definition list is what is actually desired. But since _everyone_ uses these or *lists for threads on Talk pages, could the HTML conversion be changed specifically for Talk pages?

The issue has been recently noticed and deeply discussed also on the Italian Wikipedia. For the sake of open and standard formats (cf. Wikipedia's third pillar), validity and correctness appear to be important also for the copious discussion pages, where indentation in dialogues abounds. In the quest for a solution to the problem, on it.wiki we contemplated abandoning the long-established incorrect habit of colon-indentation, or, even more unlikely, suggesting to HTML working groups a revision in the language grammar. Nevertheless, the easiest place where one can fix such an issue is a (most probably) small patch to the MediaWiki parser.

I am raising the importance of this task and looking for a partner in the job.

SoujaK raised the priority of this task from Low to Medium. Feb 19 2015, 5:14 PM
SoujaK set Security to None.

@SoujaK There is also the issue of empty line-breaks between responses (which editors often do, to make it easier to read the wikitext, and to add a slightly larger gap between lines in the rendered output), E.g.

:1st person's comment. ~~~~
[blank line]
:: 2nd person's comment. ~~~~

This creates a 2nd definition list! This is terrible for anyone using some screen-reader software (eg NVDA, but not JAWS). The enwiki guideline cautioning against this habit, is at https://en.wikipedia.org/wiki/Wikipedia:LISTGAP and there have been some frustrating discussions about it at enwiki, over the years (e.g. archives 12 and 13 of [[WT:ACCESSIBILITY]])
(I couldn't see it mentioned in the googletranslation of the itwiki discussion; my apologies if I missed it.)
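The list-gap effect can be illustrated with a small Python sketch. It is a simplified model, not the parser's actual code: it assumes that any line not starting with `:` or `;` (including a blank line) closes the open list, and the name `count_lists` is made up for illustration.

```python
def count_lists(wikitext: str) -> int:
    """Count the separate top-level lists a run of ':'/';' lines would produce."""
    lists = 0
    in_list = False
    for line in wikitext.splitlines():
        is_item = line.startswith((":", ";"))
        if is_item and not in_list:
            lists += 1  # a new list opens after any non-list line
        in_list = is_item
    return lists


# Two replies with no gap share one list; a blank line splits them into two.
print(count_lists(":1st comment\n::2nd comment"))
print(count_lists(":1st comment\n\n::2nd comment"))
```

Under this model the first call reports one list and the second reports two, which is why a screen reader that announces lists reads the second reply as the start of a new list.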

This is one of the many reasons that I initially applied to work with the team creating Flow. It has a long ways to go, in terms of features to be added and bugs to be fixed - and accessible/standards-compliant output (as you mention at itwiki) is something that needs more work - but in the medium-to-long term, that extension is likely to be the cleanest way to resolve this bug (and many many others).

As for changing the way that the parser interprets the existing :'s and ;'s in the millions of pages .... I'm not a developer, so not qualified to comment.
Hope that helps.

At the German Wikibooks project there is the following workaround: We have templates such as {{---|...}} for indentation. For example the code

{{---|A paragraph...

* item 1
* item 2
* item 3

Another paragraph.}}

will result in a three-times-indented block with a list inside. With

{{Formel|<math>...</math>}}

we can indent equations ("Formel" is the German word for "equation"). Those templates are mainly used in the project Mathe für Nicht-Freaks and on discussion pages (mostly for long posts because the colons don't need to be repeated for each paragraph of the post).

Technical details

The template Vorlage:Einrücken is used to indent something ("einrücken" means "to indent"). It basically has the definition

<div style="margin: 0.8em 0 0.8em {{#expr: {{{count|1}}} * 1.6 }}em;">
{{{content}}}
</div>

1.6em is the currently defined left indentation of <dd> with the current CSS. So count gives the number of indentation steps, which defaults to 1 if this parameter is not defined. Actually it has some additional parameters to define the type of the indented HTML block (it might also be <p> for example) and to make the content inline such as

<div style="margin: 0.8em 0 0.8em {{#expr: {{{count|1}}} * 1.6 }}em;">{{{content}}}</div>

Now we can define the templates {{-|...}} to {{-----|...}}. For example the template {{---|...}} has the definition

{{indent
 |count=3
 |html_tag={{{html_tag|div}}}
 |content={{{1}}}
}}

(I translated all variables and template names from German to English so that they are understandable)

For the template {{Equation|<math>...</math>}} one can define

{{indent
 |count=1
 |inline=yes
 |html_tag=p
 |content={{{1}}}
}}

So one can use

{{Equation|<math>1+1=0</math>}}

instead of

:<math>1+1=0</math>

Notes about inline and block content
Sometimes it's necessary to control whether content is parsed as inline text or not. I define content as "inline text" if you have the definition

<div ...>{{{content}}}</div>

In contrast this is a "block"-content for me:

<div ...>
{{{content}}}
</div>

When you have something like {{indent|content=test}} and we have the second "block"-definition, an additional <p> is created around the word "test". This is not the case for the first "inline"-definition.

Sometimes it's important to control how the content is rendered. For example

{{indent
 |content=
* item1
* item2
* ...
}}

does not work for an "inline"-definition of the content because the first asterisk is not the first character of the line. Here the content must be rendered as a "block"-content.

Definition of Template:Indent
The template {{indent|...}} might be defined as

<{{{html_tag|div}}} style="margin: 0.8em 0 0.8em {{#expr: {{{count|1}}} * 1.6 }}em;" {{#ifeq: {{{inline|no}}} | yes | >{{{content}}} | >
{{{content}}} }}
</{{{html_tag|div}}}>

with the parameters:

  • html_tag is the type of the indented HTML block, which defaults to div
  • count is the number of indentation steps of the block
  • content is the main content of the block
  • Unless inline is set to yes the content is rendered as an extra "block". If inline is equal to yes the content is rendered as inline text

This workaround is of course not the best one but it works with the current possibilities of the mediawiki software. Each improvement of this workaround is welcome because those improvements can be migrated to the currently used templates at the German Wikibooks project.
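For readers who find the template syntax hard to follow, the same logic can be mirrored in a few lines of Python. This is only an illustration of what the template computes, not something the wiki runs; the function name `indent` and its keyword arguments simply reuse the (translated) template parameter names.

```python
def indent(content: str, count: int = 1, html_tag: str = "div",
           inline: bool = False) -> str:
    """Mimic the indent template: each step adds 1.6em of left margin."""
    left = count * 1.6  # 1.6em is the left indentation of <dd> quoted above
    style = f"margin: 0.8em 0 0.8em {left:g}em;"
    # "Block" content sits on its own lines so that a leading '*' or ':'
    # starts a line; "inline" content follows the opening tag directly.
    body = content if inline else f"\n{content}\n"
    return f'<{html_tag} style="{style}">{body}</{html_tag}>'


print(indent("<math>1+1=0</math>", html_tag="p", inline=True))
print(indent("* item 1\n* item 2", count=3))
```

The second call shows why the block form matters for lists: the asterisks stay at the start of their lines.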

There may be an alternative solution using WAI-ARIA roles.

If the output wouldn't fit the content model for the <dl> element — i.e. it starts with a colon line (<dd>) or ends with a semicolon line (<dt>) — the parser could make the list presentational: <dl role="presentation">. Assistive technologies such as JAWS and NVDA should treat a <dl role="presentation"> element and its children as plain text instead of a list.

Making an initial colon line <dl role="presentation"><dd> should be an easy one-line change in openList(). Reading ahead to check if the definition list ends with a semicolon line would be a bit tougher.
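The check described above could look roughly like the following Python sketch. It is not the code in openList(); the function name `open_dl_tag` is hypothetical, and the test for an inline definition (`;term : def`) is a rough approximation that would misfire on terms containing a colon, such as URLs.

```python
from typing import List


def open_dl_tag(block: List[str]) -> str:
    """Pick the opening tag for one contiguous run of ';' / ':' lines."""
    first, last = block[0], block[-1]
    starts_with_dd = first.startswith(":")
    # A ';' line with no ':' after the marker is a bare term (dt).
    ends_with_dt = last.startswith(";") and ":" not in last[1:]
    if starts_with_dd or ends_with_dt:
        # Does not fit the <dl> content model: mark it as presentational.
        return '<dl role="presentation">'
    return "<dl>"


print(open_dl_tag([":comment", "::reply"]))
print(open_dl_tag([";term", ":definition"]))
```

The first case needs only the first line, which matches the "easy one-line change"; the second needs the last line of the run, which is the read-ahead part.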

This still bothers me a lot :)
As a short term improvement, i'm really liking @MattFitzpatrick's idea of decorating it with role="presentation". It should be relatively easy to get that rolled out. But while we currently have an incorrect semantic structure, the dl/dd's do provide A structure, where role=presentation will give none.... I'd really like to have some understanding about how @Graham87 currently interacts with this. Especially, as I know he does make use of the indentation. I wonder if he just counts the colons in the wikitext, or if he also uses the dl nesting structure of the HTML to figure out who is replying to who... In case of the latter, then adding role="presentation", might actually make the accessibility even worse.

I'd also really love to get the opinion of the Parsing-Team--ARCHIVED on this, because they have a lot of the long term vision about this.

Currently : is used for two things basically

  1. visual indentation
  2. structured indentation in discussions

For the first, you'd probably best use <div>s with margin really. Visual indentation is however also rather rare in Wikipedia articles. Most of the uses of this in Wikipedia articles have been replaced with specific templates (hatnotes, quote templates etc). For newcomers however, those templates are difficult to find (they ask about indentation buttons in the toolbars for instance), we need to think about UX here, but maybe we don't even have to special-case the "visual indentation" at all, because of this small usage.

For talk pages, I've recently been thinking a bit wilder. I've been considering if we should not put parsing logic about discussions into the parser itself. Make it possible to recognise the larger discussion patterns in wiki code and output customised HTML for it. We could output nested <section>s HTML, extract usernames, dates etc and add them as data attributes to sections. Custom UI could make use of that information etc. We would basically output a 'structured discussion' widget, IF the wiki code pattern complies with standard conventions. That will automatically convince people to use a more standardised approach with regard to the wiki code. Exposing this Flow style UI, might also make it easier to eventually convert pages into a different content model or something. Basically, you build Flow (or rather structured discussion) from two directions and make them converge. I admit it's a bit complex and annoying to go that route, but.. maybe more viable ?

I'd really like to have some understanding about how @Graham87 currently interacts with this. Especially, as I know he does make use of the indentation. I wonder if he just counts the colons in the wikitext, or if he also uses the dl nesting structure of the HTML to figure out who is replying to who... In case of the latter, then adding role="presentation", might actually make the accessibility even worse.

I just figure out who is replying to whom by context, generally. To reply using indentation, I generally just copy and paste the colons from the last message and add one more. I'm short of time at the moment so can't look into role=presentation.

@Graham87 thank you Graham. I know you are busy atm, I wasn't even expecting you to reply at this moment :)
So you ignore the definition lists in the html altogether (JAWS announces them and you ignore them, or JAWS doesn't even announce them at all) and make your own interpretation. And when replying you extend the previous reply.

I think I might make a few demo HTML pages at some point and when I have those I will ask people (including you hopefully) to review them. No need to look into role=presentation right now for you.

@TheDJ: JAWS doesn't announce the definition lists in colon-indented discussions. NVDA does though.

Quick and dirty test page. Definition lists conforming or not conforming to HTML5 spec, with or without presentation role.

https://en.wikipedia.org/w/index.php?title=User:Matt_Fitzpatrick/sandbox&oldid=793629139

List | JAWS 15.0 | NVDA 2017.2
1 Semantic definition list | "Definition list of two items" ... "List end" | "List with four items" ... "Out of list"
2 Semicolon bold | "Definition list of one items" ... "List end" | "List with one items" ... "Out of list"
3 Semicolon bold with presentation role | "" | ""
4 Colon indent | "" | "List with two items" ... "Out of list"
5 Colon indent with presentation role | "" | ""

Some of the comments above suggested treating article pages and talk pages differently. That's a mistake. A generic page should be presumed to contain both article content and discussions. Copying arbitrary article content to any page is a fundamental part of our work.

I'm not sure why this has gone on for so long without resolution. It seems pretty simple to me:

If the parser encounters a line starting with ; then it should not treat it as a dt unless the line includes a non-escaped : (a dd), or the next contiguous line starts with : (a dd). Instead, generate a styled div for boldface.

If a line starts with : then do not treat it as a dd unless the previous line started with ; (a dt). Instead, generate a styled div for indentation.

Do not generate a dl if there are no dt/dd elements for one.

If a line starts with * then do not treat it as an li (nor create the ul wrapper) unless there's more than one of them and they're contiguous. Instead, treat it as a styled div, with indentation and a bullet. (If people actually want to create the weird case of a "list" of one item, e.g. to match multi-item lists elsewhere in the document, they can do this with explicit HTML ul and li markup).

They may need display:inline-block, for things like ::* and *:: to work right when used as discussion markup.

Could also apply a rule like this to #, but it is much less frequently abused (because if it doesn't produce a legit list, every item with such markup starts with "1.", which is not what anyone wants).
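The rules above can be sketched as a line classifier in Python. This is an illustration under stated assumptions, not a patch: the function `classify` and its label names are invented here, escaping of `:` is ignored, and a `:` line that follows another definition is assumed to stay a dd so that one term can have several definitions.

```python
from typing import List


def classify(lines: List[str]) -> List[str]:
    """Label each line with the element the proposed rules would emit."""
    labels: List[str] = []
    for i, line in enumerate(lines):
        prev = lines[i - 1] if i > 0 else ""
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if line.startswith(";"):
            # dt only if a definition follows, inline or on the next line.
            has_dd = ":" in line[1:] or nxt.startswith(":")
            labels.append("dt" if has_dd else "bold-div")
        elif line.startswith(":"):
            # dd only directly under a term (or under another dd, by assumption).
            under_term = prev.startswith(";") or (bool(labels) and labels[-1] == "dd")
            labels.append("dd" if under_term else "indent-div")
        elif line.startswith("*"):
            # li only when part of a contiguous run of two or more items.
            in_run = prev.startswith("*") or nxt.startswith("*")
            labels.append("li" if in_run else "bullet-div")
        else:
            labels.append("text")
    return labels


print(classify([";Term", ":Definition"]))
print(classify([":comment", "::reply"]))
```

A `<dl>`, or `<ul>`, wrapper would then be emitted only around runs that contain dt/dd, or li, labels.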

This isn't really true:

Currently : is used for two things basically

  1. visual indentation
  2. structured indentation in discussions

Those are two common misuses of the markup.

It is also used for what the element actually exists for: legit dd elements, i.e. descriptions/definitions/associations for the dt items/terms in a dl list. Examples are most of en.WP's "Glossary of ..." articles. See also https://en.wikipedia.org/wiki/Template:Defn and its documentation.

I'm not sure why this has gone on for so long without resolution.

Because nobody has submitted a patch proposal in Gerrit yet. Everybody can though: You are very welcome to use developer access to submit the proposed code changes as a Git branch directly into Gerrit. If you don't want to set up Git/Gerrit, you can also use the Gerrit Patch Uploader. Thanks in advance!

Those are two common misuses of the markup [..] It is also used for what the element actually exists for

That's why I said use of : not use of ; :

Because nobody has submitted a patch proposal in Gerrit yet.

There are more considerations though.

  1. Using div/span's doesn't completely fix accessibility either. A screenreader would still not have any context about which block of text is a reply to which other block of text. It's an improvement, but not a complete fix.
  2. This is some of the older code of the parser, with lots of side effects (consolidations, reparenting of HTML elements etc etc). Changing it is challenging (though easier now that Parsoid has added so many testcases)
  3. Due to parsoid, you will however need to fix two parsers now :(
  4. Changing the meaning of wiki syntax depending on how it is used, is not something that we do anywhere else as far as I know..

Just FYI: WMF's failure to fix this simple and obvious bug after over a decade is likely (given where this RfC is going) to result in en.Wikipedia literally PROHIBITING the use of more accessible markup instead of abuse of the <dd> element for visual indentation. This is pretty much an accessibility debacle of stunning proportions, resulting from a combination of editorial ignorance of accessibility and of HTML specs, laziness, and WMF failure to prioritize development. I'm stunned and depressed that it's come to this. I have no idea how an organization with tens of millions of dollars and a dedicated development team with almost all focus on a single software project cannot get it together to fix basic, obvious, and comparatively easy-to-solve problems in its functionality when given over ten years to do it. This should be fixed within this month.

If the ":" is not preceded by a line starting with ";" (or an explicit HTML <dt>) then it is not a <dd> and should emit a <span> with padding-left to produce the indent, without bogus list markup.

Just FYI: It's not up to "WMF" to fix every and any issue, to correct expectations a bit.

FYI, for anyone wanting to work on this, you will need to modify the BlockLevel pass of the parser:
https://github.com/wikimedia/mediawiki/blob/master/includes/parser/BlockLevelPass.php#L271

Instead of directly creating the corresponding HTML for lists (now it just concats everything), you would have to move the open list command and first interpret the : and ; chars, before adding the content to the output and varying the list type.

We'll discuss this soon -- with a corpus as big as we have on all the wikis, it is always tricky making changes like this. All the nastiness is in testing, identifying edge cases, and ensuring that nothing else breaks.

Apparently there is a major discussion going on at en.wp that brought this ticket back on the radar:
https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(policy)#RfC:_Accessibility_versus_convenience_in_indentation

Next time, please link such things people, it can become very confusing if a party is talking with 2 discussions in mind, and the rest is having just the one discussion.

Since we all seem to agree on the use cases for colons, I'll list them and comment:

  1. Talk pages
  2. Indenting math
  3. Other random indents
  4. Definition lists

#1 is a lost cause. I don't think we should change core parser functionality to suit the insanity that is talk pages, and which have a known solution that some wikis choose to reject or ignore. (A solution with its own problems which need to be recognized and fixed, but the only sensible solution [or at least, only sensible class of solutions].) #2 has suitable alternatives that need working on and which would be time better spent. I've filed a few these past couple of days. #3 I see in the wild sometimes (especially on policy pages?), but we should identify and subsequently support those other ways as appropriate rather than vaguely insisting that the : making a span instead of a dd is the correct way to go. #4 I don't see much need to comment on.

Additional reasons

I agree entirely with these preventing this ticket from seeing implementation and there are aspects that extend from many of those. I'm in fact quite confused why this task hasn't been declined in the past 10 years. We have the alternatives. Let's make those more attractive.

I'd add one other, which is that allowing : to render an indent without semantic intent is exactly what semantic HTML tries to prevent (which is the mixing of semantic markup with unsemantic markup). We turn : into just a regular old blob in that scenario.

This is pretty much an accessibility debacle [inclusive_snip]

Avoid the hyperbole please.

My much earlier comment mistakenly suggested using a stylized <span> when I meant <div>; indented "things" often contain block-level markup.

The "turn : into just a regular old blob" approach would be workable, though it would be preferable to retain its interpretation as a <dd> when it's actually immediately after a ;-marked <dt>. If that really must be sacrificed, we can live with it, since we have alternatives. The <dl> list output that actually represents description/definition/association lists at all is quite rare, and <dl> lists built with ; and : are "brittle" anyway – they can't tolerate line breaks and stuff, except with trickery like using <p>...</p> inline on the same line as the : – which is itself easily wrecked the next time someone edits the material. https://en.wikipedia.org/wiki/MOS:GLOSSARIES provides a detailed overview of the pitfalls, and the more robust alternatives for creating real description lists (see especially the "/DD bug test cases" subpage). Even if we lose the description list markup on everything currently using semicolon and colon markup, this is a small price to pay, since we can fix it later as part of everyday gnoming, and the output in the interim will actually look fine. (It will even be parseable by screen readers, just without it being announced as a definition list. Given that about 99.99% of that markup on WP is today mis-announced as d-lists when it's just visual spacing, I'm sure they'd all be happy about the change. Analogy: If Toyota puts out a car model that explodes 99 times out of 100 when you turn the key, it just doesn't matter how great it handles the 1 time it doesn't blow up.)

In Izno's list of :-markup use cases, 1 through 3 (talk, math, and random-content indentation, respectively) are all the exact same thing: visual indentation by a set amount, that has nothing to do with description lists. The same fix (emitting <div> with CSS padding-left, or something roughly equivalent) will deal with them all simultaneously. Yep, even talk pages. However, those could be improved further by also not interpreting * or # as list markup, and just doing them as <div> with some CSS – except when they actually form valid lists of two or more items in an unbroken list. A "list" of one item is not a list anyway, it's just a line with a bullet in front of it (usually intentional) or (usually not intentional) an extraneous "1." that makes no sense in the context.
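A minimal sketch of that rule, assuming a simplified line-based renderer rather than the real MediaWiki parser: a single colon is emitted as a <dd> only when it directly continues a ;-started list, and every other colon run becomes a <div> with CSS padding-left. The function name, the 2em step, and the plain <p> fallback are hypothetical; the inline ";term : def" form and a lone ; with no definition after it are not handled here.

```python
import html
import re


def render_colon_lines(lines, em_per_level=2):
    """Render ';' and ':' lines; hypothetical stand-in for the parser's list pass."""
    out = []
    in_dl = False  # True while inside a description list opened by a ';' line
    for line in lines:
        colons = re.match(r":+", line)
        depth = len(colons.group()) if colons else 0
        if line.startswith(";"):
            # A term opens (or continues) a real description list.
            if not in_dl:
                out.append("<dl>")
                in_dl = True
            out.append(f"<dt>{html.escape(line[1:].strip())}</dt>")
        elif depth == 1 and in_dl:
            # A single colon right after a term or definition is a genuine <dd>.
            out.append(f"<dd>{html.escape(line[1:].strip())}</dd>")
        else:
            # Anything else ends the list; colon runs are purely visual indents.
            if in_dl:
                out.append("</dl>")
                in_dl = False
            text = html.escape(line[depth:].strip())
            if depth:
                pad = depth * em_per_level
                out.append(f'<div style="padding-left: {pad}em">{text}</div>')
            elif text:
                out.append(f"<p>{text}</p>")
    if in_dl:
        out.append("</dl>")
    return out
```

Under these assumptions a talk-page reply such as ":: double indented" never reaches the <dl> branch, while "; term" followed by ": def" still produces a real description list.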

None of this actually has anything to do with math markup in particular – indenting a <math> block or any other formula display with : is precisely the same as doing that to any other chunk of content. The "math markup is special" shtick is a red herring that came up in that train-wreck of an RfC, but which can be dispensed with. And it's not hyperbolic to call it a train-wreck; it's shocking how many people who clearly should know better are (or pretend they are) utterly convinced that using : to cause a purely-visual indentation "is a Wikipedia standard" and should be enforced against all alternatives. Even this very day, multiple parties are showing up to pages like MOS:LISTS and MOS:ACCESSIBILITY declaring that the canvassed RfC is proof of consensus against more accessible markup and trying to delete the latter from the guidelines. I kid you not.

The parser shouldn't be overly strict. A one-item ul or ol is okay in the HTML spec. And there are reasons a wiki might legitimately want to do that. For example, a single placeholder item, with JavaScript to add or remove items to the list.

That said, dl is a different case. A dl that begins with a dd or ends with a dt is not okay in the HTML spec. And since JavaScript may or may not run on any particular browser, this would be a bad idea for a placeholder or fallback. So these should be tweaked into a <dl role="presentation"> or, to be stricter but possibly more disruptive, <div class="whatever-dl-pseudo-indent-class">.
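A rough sketch of how that check could look, assuming the list's children are already known as a sequence of "dt"/"dd" tag names. It treats a dl as well-formed only when it consists of groups of one or more dt followed by one or more dd, which covers the "begins with a dd" and "ends with a dt" cases. The function names are hypothetical, and the class name is the placeholder from the paragraph above.

```python
import re


def is_valid_dl_sequence(tags):
    """Return True if tags form zero or more (dt+ dd+) groups."""
    encoded = []
    for tag in tags:
        if tag == "dt":
            encoded.append("t")
        elif tag == "dd":
            encoded.append("d")
        else:
            raise ValueError(f"unexpected child element: {tag!r}")
    return re.fullmatch(r"(t+d+)*", "".join(encoded)) is not None


def choose_wrapper(tags):
    """Pick the opening tag for a ';' / ':' block (hypothetical policy)."""
    if tags and is_valid_dl_sequence(tags):
        return "<dl>"
    # Not a real term/description list: fall back to a styled block.
    return '<div class="whatever-dl-pseudo-indent-class">'
```

With this policy a run of bare colons, i.e. only dd children, would get the div wrapper, and only sequences that start with a term and end with a description would keep the dl.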

I would caution, though, that such a tweak may never happen. This ticket is almost 12 years old with 0 patches submitted. It may not even be technically feasible. This ticket should not be a reason to continue lazy HTML habits, assuming someone else will patch it up later.

@MattFitzpatrick:

  1. The HTML specs say all kinds of things are permissible, but MW doesn't support all of them. That's okay. Heck, there are entire elements MW just ignores.
  2. Your "one-item ul" case won't matter: the in-wiki markup will remain *, and the output will look identical. (If we're concerned someone might be doing something special with style, we can give converted-to-div "list" elements a class they can use to match their list styling; edge case probably no one cares about.)
  3. When someone comes along and actually uses your placeholder "*" to create a real list with 2+ items, then after saving the page, MW will parse it as a list, not as a styled div. Meanwhile, if there is only one * we get the double benefit of a) screen readers no longer being told there's a "list" but finding only one item, and b) same users being told there's a list but there's nothing there at all, just an empty <li>.
  4. <dl role="presentation"> is still going to be invalid without at least one each of <dt> and <dd>, so by all means we should do something like <div class="whatever-dl-pseudo-indent-class">.

This ticket should not be a reason to continue lazy HTML habits, assuming someone else will patch it up later.

But it is and it will be until it actually is fixed. The largest Wikipedia, and largest project WMF has, just had an RfC that concluded to continue abusing the markup because the devs are just going to fix it eventually. Every single project I look at is also rampantly abusing : as indentation, ; as a boldfacing and pseudo-heading shortcut, * by itself as a way to make an emphasized note with a bullet, and so on. Without exception. This isn't a user education issue, it's a "we've provided a dangerous tool, and not provided a better one, so the dangerous one will keep being used and keep on maiming" issue. I would love to just fix this myself, but it's not my kind of coding. I don't even know what language MW is written in. I'm a scripter and a plain-English writer, not an application programmer.

This comment was removed by Jc86035.

Honestly, I also think that this is up to the developers, especially for discussion pages. For content, maybe we can still avoid it, as it's much less prevalent there.
Unfortunately, when a usage has become so ingrained among people, it becomes hard to change it retroactively. Additionally, you have the entire corpus of 17 years of revisions that has already been 'poisoned'. When you reach that stage, it's too late to start correcting people. It's a bit like [[Desire path]].

However, we also need to be realistic. This is BAD, but in all the years I've had few indications that this prohibits people from contributing. It's mostly used outside of content, where people are so committed to the project that they have learned to deal with it (and even none of the above proposed fixes would significantly improve their ability to participate and read discussions unfortunately) and among content it is rare enough. Not correct, annoying, but not obstructive.

But it's been 17 years, and it is about time we fix it.

Agreed with TheDJ on all counts, except that it's actually worse. What's happening is that particular and well-organized camps of editors are declaring that they refuse in mainspace to use more accessible markup. They are using talk page habituation to the bad markup as a "reason" to abuse description list code for purely visual layout gimmicks in articles we send to all our readers, and to effectively force all editors to do so (by changing our content guidelines to impose the bad style and to remove any mention of more accessible approaches). This is really, really not okay.

Related discussion on the English Wikipedia: this is part of the local discussion for the WMF's talk pages consultation 2019. @Whatamidoing-WMF suggested that I should post something here.

In summary, I made/developed(?) several hypothetical proposals to introduce either/both new indentation syntax and/or changes to the four tildes so that they also produce an extension tag to indicate the end of a comment; I think something similar could potentially be implemented without significantly disrupting user workflows, i.e. so that the current syntax still works (see comment 16:56, 27 February 2019).

The proposals received some positive feedback, and some "this would be unnecessary" feedback (both mostly related to the earliest proposal, which proposed a complete switch to a new indentation character).

To elaborate, using an extension tag could avoid unnecessarily re-parsing(?) older pages, and would allow for the extension tag to be manually re-added to older discussions. Automatically including each comment's diff ID in the extension tag (with suffixes added for multiple-signature edits) would also allow for permalinks/anchors and such. Furthermore, introducing new syntax while also allowing the current syntax to be used would allow users to continue to use actual (semantically correct) lists within comments, and could allow incorrect nesting in discussions (which happens fairly frequently) to be ignored for the most part. It could also prevent some edge case issues with visual editing or inline replies (depending on implementation), and would avoid breaking definition lists used outside discussions.

Jc86035 wrote:

When used before a signature, the use of the current list item syntax would be interpreted as discussion nesting – unless preceded by the new-style markers, in which case it would be interpreted as list item syntax. This would make it more complicated for existing users, who would nevertheless not be forced to use the new syntax; but (with an inline comment/reply interface) would also make commenting less fragile and easier to use for users in general.

As noted in other sections of the local consultation page, resolving this bug and related issues is almost certainly less likely to alienate existing users than re-introducing and/or continuing development on something like Structured Discussions, which a lot of users clearly dislike for various reasons. (Fixing this could also potentially help to allow for interface changes, better CSS, visible indication of nesting, inline replies, VisualEditor on discussion pages, etc.) I think it could be possible to make the current discussion system formally "structured" in a satisfactory way while also staying with a wikitext format, although (since I am not a MediaWiki developer) I don't know whether something like this would be technically infeasible due to the complexity of the existing code and/or other issues.

I hope this is useful. Sorry if this post is too lengthy or overly emphasizes the (purely hypothetical) proposal, although most of the previous discussion doesn't really discuss anything similar; and it would be nice to know at some point if this would actually be somewhat feasible or if I've just been spouting into the void.


I've copied part of one of my comments from the page so that the actual effects of the proposals are clearly illustrated (the original comment by Alsee is real but I shortened it). At time of writing, no one has actually provided feedback to the "modified proposal" indicated below, although the extension tag specifically was (briefly) discussed by Wnt.

==== Current syntax ====
* I oppose the proposal for the following reasons:
*# People use the existing markup in comments for a reason, such as numbered lists. And yes I deliberately used # in my response here. [[User:Alsee|Alsee]] ([[User talk:Alsee|talk]]) 15:23, 27 February 2019 (UTC)

==== Original proposal ====
<discussion type="vote">
\ I oppose the proposal for the following reasons:
\# People use the existing markup in comments for a reason, such as numbered lists. And yes I deliberately used # in my response here. [[User:Alsee|Alsee]] ([[User talk:Alsee|talk]]) 15:23, 27 February 2019 (UTC)

==== Modified proposal, style 1 (the <sig/> tag would be generated by the four tildes) ====
* I oppose the proposal for the following reasons:
*# People use the existing markup in comments for a reason, such as numbered lists. And yes I deliberately used # in my response here. [[User:Alsee|Alsee]] ([[User talk:Alsee|talk]]) 15:23, 27 February 2019 (UTC)<sig rev="885361658"/>

==== Modified proposal, style 2 ====
>1*
I oppose the proposal for the following reasons:
# People use the existing markup in comments for a reason, such as numbered lists. And yes I deliberately used # in my response here. [[User:Alsee|Alsee]] ([[User talk:Alsee|talk]]) 15:23, 27 February 2019 (UTC)<sig rev="885361658"/>

==== Modified proposal, style 3 ====
>1* I oppose the proposal for the following reasons:
# People use the existing markup in comments for a reason, such as numbered lists. And yes I deliberately used # in my response here. [[User:Alsee|Alsee]] ([[User talk:Alsee|talk]]) 15:23, 27 February 2019 (UTC)<sig rev="885361658"/>
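
A minimal sketch of how the four tildes could expand into a signature followed by the proposed <sig/> tag, assuming the revision ID is already known when the text is saved. This is purely illustrative and not existing MediaWiki behaviour: the function name is hypothetical, and the "-2", "-3" suffix format for several signatures in one edit is an assumption, since the proposal does not specify one.

```python
import re

# Exactly four tildes; runs of three or five have other meanings and are left alone.
FOUR_TILDES = re.compile(r"(?<!~)~{4}(?!~)")


def expand_signatures(wikitext, signature, rev_id):
    """Replace each ~~~~ with the signature plus a <sig/> tag (hypothetical)."""
    count = 0

    def replace(match):
        nonlocal count
        count += 1
        # Assumed convention: the first signature carries the bare revision ID,
        # later ones in the same edit get a numeric suffix.
        suffix = "" if count == 1 else f"-{count}"
        return f'{signature}<sig rev="{rev_id}{suffix}"/>'

    return FOUR_TILDES.sub(replace, wikitext)
```

Because only the tildes are rewritten, the *, # and : characters at the start of each line would be left as typed, which matches the "current syntax still works" goal of the modified proposal.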