
ins or del 1st occurrence in blockquote causes bogus newline
Closed, Resolved (Public)

Description

Author: smccandlish

Description:
This was briefly mentioned at Comment #27, Bug #6200, but the issues appear to be essentially unrelated, so I'm filing this as a separate bug.

Working examples of this problem are well-documented at http://en.wikipedia.org/wiki/Template:Ins

The problem affects both manual and templated use of blockquote. While bug 6200 #27 proposes use of div inside blockquote to resolve this, I think it is actually more semantically proper to work around it the way that Template:Ins does (using span), until it is genuinely fixed in the software.

Failure to fix this means disuse of semantic markup tags in favor of ones with no semantic meaning at all (u and s), which are ignored by screen readers, thus making this an accessibility issue: "Foo <del>bar</del> <ins>baz</ins> quux" will be read by screen reader software in most cases as "foo bar baz quux", which is obviously wrong.


Version: unspecified
Severity: minor
URL: http://en.wikipedia.org/wiki/Template:Ins
See Also: T53086

Event Timeline

bzimport raised the priority of this task to Low. (Nov 21 2014, 10:21 PM)
bzimport set Reference to bz15491.
bzimport added a subscriber: Unknown Object (MLST).

smccandlish wrote:

Self-correction! I meant to say:

'Failure to fix this means disuse of semantic markup tags in favor of ones with
no semantic meaning at all (u and s), which are ignored by screen readers, thus
making this an accessibility issue: "Foo <s>bar</s> <u>baz</u> quux"
will be read by screen reader software in most cases as "foo bar baz quux",
which is obviously wrong in almost all cases in Wikipedia, where something more like "Foo <del>bar</del> <ins>baz</ins> quux" was intended, and will be rendered properly.'

To continue in more detail:

Depending upon the screen reader, the aural rendering will be something like "[normal tone] Foo [tone change] deleted-text bar [different tone change] inserted-text baz [return to normal tone] quux" - but people won't actually do this properly because of the bug reported here. With the u/s rendering it will simply be "[normal tone] foo bar baz quux", with all the non-semantic u/s markup ignored.

No one's going to do it properly until this bug is fixed, yet by a HUGE margin, the WP usage of u/s is in talkspace (and occasionally quotations of manuscript materials) as stand-ins for ins/del, respectively. There are almost zero legit uses of u and s outside of the insertion and deletion contexts. U has been abandoned almost Web-wide, and s even off of Wikipedia is hardly ever used to indicate anything other than deleted/redacted material (i.e., what del is for). As non-semantic markup elements, u and s serve no purpose other than decoration, and should be abandoned on WP just as they are being deprecated in the specs in favor of CSS.

ayg wrote:

I'm curious -- do you have specific examples of popular screen readers where <del> is treated differently from <s>, and <ins> differently from <u>?

smccandlish wrote:

I'm curious -- do you have specific examples of popular screen readers where
<del> is treated differently from <s>, and <ins> differently from <u>?

ALL of them, as far as I know. I'm unaware of any screen reader that will do anything with pure-presentation markup (b, i, u, s, font, etc.), which are only useful for [fully-]sighted users, unless the user overrides the default behavior of ignoring them. Screen readers generally *will* do something audibly different with each semantic markup tag (again, unless overridden). Instead of pestering me about this sort of thing bug after bug after bug, please just go do some research on Web accessibility; this is honestly getting very tedious.

ayg wrote:

(In reply to comment #3)

ALL of them, as far as I know. I'm unaware of any screen reader that will do
anything with pure-presentation markup (b, i, u, s, font, etc.), which are only
useful for [fully-]sighted users, unless the user overrides the default behavior
of ignoring them. Screen readers generally *will* do something audibly
different with each semantic markup tag (again, unless overridden). Instead of
pestering me about this sort of thing bug after bug after bug, please just go
do some research on Web accessibility; this is honestly getting very tedious.

I do know something (not a lot, I freely admit) about web accessibility. What I knew about screen readers made me believe that <ins> and <u> would be treated the same. I just downloaded the JAWS trial version, and I have confirmed that without my reconfiguring anything, all markup like <ins>, <u>, <strong>, <b>, whatever seems to be totally ignored. Specifically, this page:

data:text/html,<!doctype html><title>Test</title><ins>Hello</ins>

when loaded in Firefox, is read as "One hundred percent. Page has no links. Test hello". It's read in exactly the same way if I replace <ins> by anything else, like <u> or <strong> or <b> or <i> or <em>. There is no change in tone or anything. "Foo <ins>bar</ins> <del>baz</del> quuz" is read as "Foo bar baz quuz".
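The comparison described above can be repeated with one test page per tag. The following is a minimal sketch, assuming Python is available; the helper name `build_page_url` and the tag list are illustrative choices, not part of the original test. It percent-encodes the markup, which differs from the raw data: URL quoted above but should decode to the same page in a browser.

```python
from urllib.parse import quote

# Tags to compare, taken from the ones mentioned in this thread.
TAGS = ["ins", "del", "u", "s", "strong", "b", "em", "i"]


def build_page_url(tag: str, text: str = "Hello") -> str:
    """Build a data: URL for a one-element test page (hypothetical helper)."""
    html = f"<!doctype html><title>Test</title><{tag}>{text}</{tag}>"
    # Percent-encode everything outside the unreserved set so the URL
    # survives copy/paste; the browser decodes it back to the same markup.
    return "data:text/html," + quote(html, safe="")


if __name__ == "__main__":
    # Print one URL per tag; load each in the browser with the screen
    # reader running and note whether the spoken output differs.
    for tag in TAGS:
        print(build_page_url(tag))
```

Each printed URL can be pasted into the address bar. The script only generates the pages; whether a given screen reader announces anything different still has to be checked by ear.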

I don't blame you for not being willing to put in the work to test things. I also don't blame you for being overconfident and assuming that your knowledge of screen readers was correct. But I really have to object to the fact that when you made a statement without providing any supporting evidence, and were asked to provide evidence, your response was to tell *me* to do the research, in a condescending tone no less.

If you do not have evidence, you should say so. Not having evidence beyond hearsay and rumor is fine, but you should not act as though your claims are well-supported and obviously correct when they are not.

So, anyway, I did do the research. At least for this screen reader -- the most popular one in the world, in its default configuration as far as I can tell -- you are mistaken. If, again, you have specific evidence to the contrary, I would be interested in hearing it, just as a matter of general knowledge about screen readers. But in the future, I will continue to "pester" you and everyone else for evidence when you make claims that I suspect are incorrect.

Of course, this is still a bug and should still be fixed. But correct conclusions do not justify erroneous arguments.

ayg wrote:

(BTW, I've looked at the code for this, and I don't think it's easy. Removing keyword.)

smccandlish wrote:

Aryeh: Fair enough, but your argument assumes that screen reader users will go with the defaults. For purely presentational markup, that's a "safe" option generally (vision-impaired users are unlikely to care if something is italicized [as opposed to emphasized]). But not customizing at least some of the more important semantic elements will lead to confusing, even total gibberish output, especially in the case of del and ins. Ergo, it seems a very long bet that experienced screen reader users do not customize them to work around such issues.

I checked this out. The wikitext:

<blockquote>Foo <del>bar</del> <ins>baz</ins> quux</blockquote>

does indeed get rendered as:

<blockquote>
<p>Foo</p>
<del>bar</del> <ins>baz</ins> quux</blockquote>

but it's not the parser's fault: it's tidy that adds the <p> tags, specifically because of the 'enclose-block-text' option in includes/tidy.conf.

On the other hand, if bug 6200 is fixed, the parser output would be:

<blockquote><p>Foo <del>bar</del> <ins>baz</ins> quux</p></blockquote>

...and tidy doesn't screw with this output. So maybe fixing bug 6200 is the Right Thing.

Change 77961 had a related patch set uploaded by Cscott:
Make linebreaks in <blockquote> behave like <div> (bug 6200).

https://gerrit.wikimedia.org/r/77961

Change 77961 merged by jenkins-bot:
Make line breaks in <blockquote> behave like <div> (bug 6200).

https://gerrit.wikimedia.org/r/77961

Change 78972 had a related patch set uploaded by Cscott:
Make line breaks in <blockquote> behave like <div> (bug 6200).

https://gerrit.wikimedia.org/r/78972

Reopened bug, since earlier fix for bug 6200 was reverted.

Change 78972 merged by jenkins-bot:
Make line breaks in <blockquote> behave like <div> (bug 6200).

https://gerrit.wikimedia.org/r/78972

(In reply to C. Scott Ananian from comment #8)

On the other hand, if bug 6200 is fixed, the parser output would be:

<blockquote><p>Foo <del>bar</del> <ins>baz</ins> quux</p></blockquote>

...and tidy doesn't screw with this output. So maybe fixing bug 6200 is the
Right Thing.

Bug 6200 does not fix this issue, since <blockquote>inline-content</blockquote> does not get p-tags around inline-content. The p-tags are added only if inline-content is not on the same line as the opening/closing blockquote tags, for example <blockquote>\ninline-content\n</blockquote>.
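The rule in this comment can be illustrated with a toy model. This is only a sketch of the behavior as described here, not MediaWiki's parser code: the function name `wrap_blockquote_paragraphs` is hypothetical, the exact whitespace of the output is an assumption, and nested or attribute-bearing blockquotes are ignored.

```python
import re

# Matches a simple, non-nested <blockquote>...</blockquote> pair.
_BLOCKQUOTE = re.compile(r"<blockquote>(.*?)</blockquote>", re.DOTALL)


def wrap_blockquote_paragraphs(wikitext: str) -> str:
    """Toy model: add <p> only when the content sits on its own lines."""

    def repl(match: re.Match) -> str:
        inner = match.group(1)
        # Content counts as "on its own lines" when a newline separates it
        # from both the opening and the closing tag.
        on_own_lines = inner.startswith("\n") and inner.endswith("\n")
        if on_own_lines and inner.strip():
            return "<blockquote>\n<p>" + inner.strip() + "</p>\n</blockquote>"
        # Same-line content is left untouched, so no <p> is added.
        return match.group(0)

    return _BLOCKQUOTE.sub(repl, wikitext)


if __name__ == "__main__":
    same_line = "<blockquote>Foo <del>bar</del> <ins>baz</ins> quux</blockquote>"
    own_lines = "<blockquote>\nFoo <del>bar</del> <ins>baz</ins> quux\n</blockquote>"
    print(wrap_blockquote_paragraphs(same_line))
    print(wrap_blockquote_paragraphs(own_lines))
```

In this model the same-line form from the bug report comes back unchanged, which is why it still reaches tidy without p-tags, while the multi-line form gets wrapped.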

Change 445449 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/core@master] <ins>/<del> elements can be phrasing or flow

https://gerrit.wikimedia.org/r/445449

Change 445449 merged by jenkins-bot:
[mediawiki/core@master] <ins>/<del> elements can be phrasing or flow

https://gerrit.wikimedia.org/r/445449

Arlolra claimed this task.

Change 468859 had a related patch set uploaded (by Legoktm; owner: Arlolra):
[mediawiki/core@REL1_31] <ins>/<del> elements can be phrasing or flow

https://gerrit.wikimedia.org/r/468859

Change 468859 merged by jenkins-bot:
[mediawiki/core@REL1_31] <ins>/<del> elements can be phrasing or flow

https://gerrit.wikimedia.org/r/468859