
Image syntax shouldn't use caption as alt= text
Closed, Resolved, Public

Description

Author: mpt

Description:
Wikimedia's image syntax should have separate fields for
the caption and for alternate text. Captions and
alternate text have opposite purposes: captions only make
sense if an image *is* visible, whereas alternate text is
intended for display only if the image is *not* visible. So
it doesn't make sense for them to use the same data.

The current effect, in screenreaders and in text-only
browsers, is two recitals of the same non-sequitur.


Version: unspecified
Severity: normal
URL: http://en.wikipedia.org/wiki/Wikipedia_talk:Extended_image_syntax#Alternate_text_shouldn.27t_be_the_same_as_caption_text

Details

Reference
bz368

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 21 2014, 7:07 PM)
bzimport set Reference to bz368.
bzimport added a subscriber: Unknown Object (MLST).

kop wrote:

See http://meta.wikimedia.org/wiki/MediaWiki_1.3_comments_and_bug_reports#Separate_captions_and_alt_text
for further comments, links to further comments, and related issues. (In particular, I think that
alternate text should be stored with the image, on the image page. If that's the only place it goes,
then the current wiki syntax can be used to encode the caption. But this may not be the best solution.)

mpt wrote:

Unfortunately, that wouldn't work. Appropriate alternate
text will almost always, and appropriate caption text will
often, be different for the same image used in different
articles.

apb wrote:

There are actually three different text values, not just two.

Alt text (shown instead of the image by text browsers or audio browsers), title
text (typically shown as a tool tip by graphical browsers), and caption text
(shown below the image) should all be distinct. See
[[en:Wikipedia_talk:Extended image syntax#Alt.2C_title.2C_and_caption_text_in_extended_markup]]
for some discussion.

I suggest syntax like
[[Image:Filename.ext|other|options|here|title=Title text|alt=Alt text|caption=Caption text]],
with defaults like: if caption is unset, then no caption; if alt is unset, then copy the title
or the caption, or fall back to the file name; if title is unset, then copy the alt text or the
caption, or fall back to the file name. For backward compatibility, allow the existing
[[Image:Filename.ext|other|options|here|Text that gets used for all three purposes]].
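
The defaults suggested above can be sketched roughly as follows. This is only an illustration in Python, not MediaWiki code: the function name `resolve_texts`, the `ImageText` tuple, and the `legacy_text` parameter are hypothetical, and the rule that the legacy text applies only when none of the three named fields is given is an assumption, since the comment does not say how the two forms should interact.

```python
from typing import NamedTuple, Optional


class ImageText(NamedTuple):
    alt: str
    title: str
    caption: Optional[str]  # None means "render no caption"


def first_set(*values: Optional[str]) -> str:
    """Return the first value that was explicitly set (is not None)."""
    for value in values:
        if value is not None:
            return value
    raise ValueError("no value set")


def resolve_texts(
    filename: str,
    alt: Optional[str] = None,
    title: Optional[str] = None,
    caption: Optional[str] = None,
    legacy_text: Optional[str] = None,
) -> ImageText:
    # Assumption: the old single text is used for all three purposes only
    # when no named field is present.
    if legacy_text is not None and alt is None and title is None and caption is None:
        return ImageText(legacy_text, legacy_text, legacy_text)
    # alt: explicit alt, else title, else caption, else the file name.
    resolved_alt = first_set(alt, title, caption, filename)
    # title: explicit title, else alt, else caption, else the file name.
    resolved_title = first_set(title, alt, caption, filename)
    # caption: never defaulted; unset means no caption at all.
    return ImageText(resolved_alt, resolved_title, caption)
```

Note that the fallbacks read only the explicitly set values, which avoids alt and title each waiting on the other when neither is given.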

michael wrote:

ALT is always required for an image, but caption and title should only be present if explicitly set and not generated by
default. Title is least important. A better order for arguments would be:

[[Image:Filename.ext|other|options|here|alt=Alt text|caption=Caption text|title=Title text]]

mpt wrote:

However, the most appropriate alt text for an image is
usually nothing at all -- even on a Wiki, despite Wikis'
avoidance of purely decorative images. Usually a Wiki
image is either a diagram of a point that has already
been put, as well as it can be in text format, in the
article text already (in which case repeating it as more
text is pointless), or it's an illustration of something
that's marginally interesting to those viewing images
(e.g. a person's appearance) but not interesting enough
to feature in a text-only rendition of an article (e.g.
you wouldn't bother mentioning the image if reading the
article to someone over the phone). I am not saying alt
should always be nothing at all, but it is more likely
that the most appropriate alternate text will be nothing
than that the most appropriate caption will be nothing.

Therefore, I think alternate text should come after the
caption, to reduce the compulsion for people to create
redundant alt text (which they frequently do). I don't
see the point of having a customizable title= at
all, but I suppose that belongs to another bug report.

wmahan_04 wrote:

In the 1.4pre CVS branch, there are problems with image captions that
contain links; the image alt text and "title" attribute of the link to
the image page are set incorrectly. I think fixing this bug would be a
way of fixing that problem as well.

michael wrote:

(In reply to comment #5)

However, the most appropriate alt text for an image is
usually nothing at all

This is simply incorrect.

This is saying that users of alternative browsers or on slow connections
should be denied access to images. It's also treating some
handicapped users as second-class citizens. If there's a point in
putting an image into an article, then everyone who reads the article
should be able to see that it's there, get an idea of what it is, and have
the option of loading or downloading it.

All images on WP must have alt attributes. Many will have captions, but
they aren't strictly necessary, as many images can speak for themselves
(unless, of course, they have empty alt text). E.g.: a prominent portrait
on a bio page, a chart with a graphical title/caption, a country's
location map.

And remember that every image on WP also serves as a link to the
Image: page, which may contain more information that's not in the
article, important copyright info, or a more detailed text description of
the image.

rowan.collins wrote:

Firstly, note that image parameters are not order-specific, so all discussion of
"which should come first" is irrelevant. The only parameter that is currently
order specific is that which is currently used for alt, title and caption; this
is order specific in the sense that it is not treated as a parameter at all.
(Roughly, the syntax is
[[<image-name>|<zero-or-more-params>|<caption-and-alt-text>]]). If we wanted, we
could define parameters of form alt=, caption= and title= and use the last
not-really-a-parameter only as a fall-back. This leaves us with two questions:
precedence, and defaults.

[I'll refer to the text currently used for all three as <alltext>]

  • <caption> text is only used on images using the "frame" or "thumb" parameters;
    it needs to default to <alltext> to avoid breaking current usage.
  • <alt> text should be present on every image, but ideally be different from a
    visible caption; however, non-captioned images (those with neither "thumb" nor
    "frame") may have <alltext> deliberately chosen as a good <alt>; we could
    therefore fall back to <alltext> for non-captioned images, but generate
    something from the filename for captioned images.
  • <title> text is the other position of <alltext> in non-captioned images, and so
    like <alt> should probably fall back to that; it's less clear, however, what
    fallback title a captioned image should have.

I propose the following:
[[Image:<filename>|<options>|alt=<alt>|title=<title>|<caption>]]

  • <caption> is compulsory, as now; the others are optional

For a non-captioned image:

  • the alt attribute contains <alt> if set, <caption> otherwise
  • the title attribute contains <title> if set, <caption> otherwise

For a captioned image:

  • the alt attribute contains <alt> if set, <filename> otherwise
  • the title attribute contains <title> if set; if none is set, I'm not sure: <alt>?
    <filename>? <caption>? Note that if <alt>, we need a third fallback for when
    that isn't set either.
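
As a rough illustration of the fallback rules proposed above, here is a minimal Python sketch. It is not MediaWiki code; the names `resolve_alt_and_title` and `CAPTIONED_OPTIONS` are hypothetical. Because the proposal leaves the title fallback for captioned images open, the sketch returns `None` in that case rather than picking one of the candidates.

```python
from typing import Iterable, Optional, Tuple

# An image is "captioned" when it uses either of these options.
CAPTIONED_OPTIONS = {"thumb", "frame"}


def resolve_alt_and_title(
    filename: str,
    options: Iterable[str],
    caption: str,
    alt: Optional[str] = None,
    title: Optional[str] = None,
) -> Tuple[str, Optional[str]]:
    """Return (alt, title); a title of None means "fallback not decided"."""
    captioned = bool(CAPTIONED_OPTIONS & set(options))
    if captioned:
        # The caption is already visible, so it is not reused as alt text.
        resolved_alt = alt if alt is not None else filename
        # Open question in the proposal: <alt>, <filename>, or <caption>?
        resolved_title = title
    else:
        # No visible caption: the trailing text stands in for both attributes.
        resolved_alt = alt if alt is not None else caption
        resolved_title = title if title is not None else caption
    return resolved_alt, resolved_title
```

For example, `resolve_alt_and_title("Foo.png", ["thumb"], "A red fox")` yields the file name as alt text and leaves the title undecided, while the same call without `thumb` uses the caption for both.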

rowan.collins wrote:

(In reply to comment #6)

In the 1.4pre CVS branch, there are problems with image captions that
contain links; the image alt text and "title" attribute of the link to
the image page are set incorrectly. I think fixing this bug would be a
way of fixing that problem as well.

If we still use the current <alltext> as a fall-back for anything, we need a way
of fixing that problem anyway, otherwise existing instances will remain broken.
However, no non-captioned image should have links in its <alltext> anyway, so if
we generate <title> and <alt> from <filename> for these cases, we can indeed
avoid fixing the problem.

tom wrote:

(In reply to comment #7)

(In reply to comment #5)

However, the most appropriate alt text for an image is
usually nothing at all

This is simply incorrect.

No, it isn't. It's a very valid viewpoint (and I don't believe alt text is an exact science, there are many
different points of view), and one that I share.

This is saying that users of alternative browsers or on slow connections
should be denied access to images. It's also treating some
handicapped users as second-class citizens. If there's a point in
putting an image into an article, then everyone who reads the article
should be able to see that it's there, get an idea of what it is, and have
the option of loading or downloading it.

Blank alt text does not deny the image to anyone. It simply prevents the image from sometimes becoming a great
annoyance.

As Matthew pointed out, alt text is currently usually duplicated caption text - what's the point in that? The
caption text is already in the document. Having it displayed or read out twice is pointless.

All images on WP must have alt attributes.

Yes, but the attribute may be empty.

Many will have captions, but they aren't strictly necessary, as many images
can speak for themselves (unless, of course, they have empty alt text).
E.g.: a prominent portrait on a bio page, a chart with a graphical
title/caption, a country's location map.

For a portrait, good alt text would either be nothing or possibly a brief description of the person's appearance
(especially if their appearance is specifically relevant to the article). Bad alt text is "George W. Bush", it adds
nothing - but it's what a lot of portraits currently have.

For a chart, good alt text would be details about the data it shows - possibly a brief summing up of the results
with a link to full information. Bad alt text is "a chart showing the growth of Wikipedia".

For a map, good alt text could be a brief description on where the country is, presuming it isn't already in the
text. Bad alt text is "a map of <wherever>". Especially bad alt text is "Image:LocationUSA.png", as you currently
get.

Most of the bad alt text would make an appropriate title, however.

And remember that every image on WP also serves as a link to the
Image: page, which may contain more information that's not in the
article, important copyright info, or a more detailed text description of
the image.

I'd suggest that's what the longdesc attribute is for. But caption text can also be linked to the image page.

rowan.collins wrote:

(In reply to comment #10)

As Matthew pointed out, alt text is currently usually duplicated caption text -
what's the point in that? The caption text is already in the document. Having it
displayed or read out twice is pointless.

Hence my suggestion that it not default to being the same as <caption>, ever.
But images that don't use |frame| or |thumb| don't have automatic captions
anyway, so it's a case of deciding whether the text that has been entered should
be used for alt, title, or both.

For a chart, good alt text would be details about the data it shows - possibly
a brief summing up of the results with a link to full information. Bad alt text
is "a chart showing the growth of Wikipedia".

Just a small point: as far as I know, alt attributes can't contain links;
they're just a short string.

For a map, good alt text could be a brief description on where the country is,
presuming it isn't already in the text. Bad alt text is "a map of <wherever>".
Especially bad alt text is "Image:LocationUSA.png", as you currently get.

Most of the bad alt text would make an appropriate title, however.

The problem is, how to tell which any given image's label is: a decent alt text,
or a decent title text. If we default to alt="" title="<alltext>" we break any
images where people *did* consider the alt text. For captioned images, though,
you might be right: just fall back on "".

And remember that every image on WP also serves as a link to the
Image: page, which may contain more information that's not in the
article, important copyright info, or a more detailed text description of
the image.

I'd suggest that's what the longdesc attribute is for. But caption text can
also be linked to the image page.

Auto-linking caption text to the image page runs contrary to the current ability
to put formatting - including links - in the caption. I'm afraid I don't know
anything about the longdesc attribute: what is its official role, and is it
widely supported by UAs?

I guess it all boils down to whether we think that people who can't see images
would want to know they were there anyway. The advantage of having a non-blank
alt text is that it shows the user that there is something there, and allows
them to easily access a description page which may have more details about it
(which could include the description of results for a graph...). The
disadvantage is that, since they can't see it, they may simply be frustrated at
an alt text that tells them little more than that there is an image there.

michael wrote:

(In reply to comment #10)

(In reply to comment #7)

(In reply to comment #5)

However, the most appropriate alt text for an image is
usually nothing at all

This is simply incorrect.

No, it isn't. It's a very valid viewpoint (and I don't believe alt text is an exact science, there are many
different points of view), and one that I share.

This is saying that users of alternative browsers or on slow connections
should be denied access to images. It's also treating some
handicapped users as second-class citizens. If there's a point in
putting an image into an article, then everyone who reads the article
should be able to see that it's there, get an idea of what it is, and have
the option of loading or downloading it.

Blank alt text does not deny the image to anyone. It simply prevents the image from sometimes becoming a
great annoyance.

Images with blank alt attributes, *and* the links surrounding them, are hidden and inaccessible in Lynx. I
presume that audible and braille page readers would similarly omit an empty string, and possibly the existence
of the link, too. There are probably other cases we haven't thought of.

As Matthew pointed out, alt text is currently usually duplicated caption text - what's the point in that? The
caption text is already in the document. Having it displayed or read out twice is pointless.

I agree; this is the point of this bug. A mechanism that lets us specify useful alt text, and optionally captions. A
caption is *not* a substitute for alt text.

Captions are displayed next to an image, *adding to or clarifying* information that we can see by viewing the
image.

Alt attributes (optionally supplemented by title attributes and longdesc destinations) attempt to be a *substitute
for the image*. The W3's Web Content Accessibility Guidelines say that these should constitute "equivalent
information to the visual or auditory content."

All images on WP must have alt attributes.

Yes, but the attribute may be empty.

Yes, this is a way to *hide* a purely decorative image. The HTML 4 recommendation section 13.8 says:

<blockquote>
Do not specify irrelevant alternate text when including images intended to _format_ a page, for instance,
alt="red ball" would be inappropriate for an image that adds a red ball for decorating a heading or paragraph. In
such cases, the alternate text should be the empty string ("").
</blockquote>

I can't think of any instances on Wikipedia where an uploaded image is purely decorative. If the image is
content, then alt text must also be content. Alt attributes with zero content are for images of zero content.

Many will have captions, but they aren't strictly necessary, as many images
can speak for themselves (unless, of course, they have empty alt text).
E.g.: a prominent portrait on a bio page, a chart with a graphical
title/caption, a country's location map.

For a portrait, good alt text would either be nothing or possibly a brief description of the person's appearance
(especially if their appearance is specifically relevant to the article). Bad alt text is "George W. Bush", it adds
nothing - but it's what a lot of portraits currently have.

A portrait is not decorative -- it conveys information about a person's appearance, dress, and age. It can
provide information about context and period (e.g. sepia-toned photo in a top-hat, or fresco with crown and
sceptre).

A brief description would be good, but even "portrait photo of George W. Bush" tells you what's there, and helps
you decide whether to look at it.

If the alt text is empty, how does a user browsing without images know that there is a portrait available for
viewing or downloading? At best, some browser *might* show that some image exists or display its filename,
which is practically never an acceptable text equivalent.

For a chart, good alt text would be details about the data it shows - possibly a brief summing up of the
results with a link to full information. Bad alt text is "a chart showing the growth of Wikipedia".

For a map, good alt text could be a brief description on where the country is, presuming it isn't already in the
text. Bad alt text is "a map of <wherever>". Especially bad alt text is "Image:LocationUSA.png", as you
currently get.

Agreed, but in all these cases bad alt text is better than no alt text.

Most of the bad alt text would make an appropriate title, however.

And remember that every image on WP also serves as a link to the
Image: page, which may contain more information that's not in the
article, important copyright info, or a more detailed text description of
the image.

I'd suggest that's what the longdesc attribute is for. But caption text can also be linked to the image page.

longdesc is one way of indicating more info about an image. Wikipedia makes it more accessible (in most
current browsers) by linking the image to the full image page, which may carry additional text.

W3 HTML 4 recommendation section 13.2 says about alt and longdesc:

<blockquote>
The alt attribute provides a short description of the image. This should be sufficient to allow users to decide
whether they want to follow the link given by the longdesc attribute to the longer description...
</blockquote>

Ideally, the Image: page would include a fuller "text equivalent" of the image. E.g. a description of George
Bush's appearance and circumstances of the photo.

As Wikipedia style guides are developed, we should incorporate more accessibility guidelines into them. The
Wikimedia interface must also be developed with the tools to support them.

wmahan_04 wrote:

I think everyone can agree that the alt text for an image is logically
separate from the caption, as Matthew and others noted. As a practical
matter, a caption may contain links and be as long as reasonably
necessary, while alt text should be text-only and is generally shorter,
as Rowan said.

So we should be able to do:
[[Image:foo.jpg|alt=<alt>|<caption>]]. I agree with that part of
Rowan's proposal, and hope most others do too.

The obvious question is what to do if no alt text is specified:

  1. Use empty alt text
  2. Use the caption

Option 2 would be most consistent with current behavior, and
is what Rowan proposed. However, I would favor option 1, for
the reasons Matthew and Tom gave. A caption is a long description
and could contain links, and in general doesn't make good alt
text, IMHO. If an image needs alt text, and it isn't specified,
I think that's a deficiency in the article, and not something
for which we can find a technical solution.

As Rowan noted, there is also the issue of the title attribute
of the link. Unlike the alt text, I don't see any good reason
to allow editors to specify the title explicitly; I think we
should keep the image syntax as simple as reasonably possible.
Practically speaking, the main use of the title attribute is in
a "tooltip" in modern browsers. Possibilities for title include:

  1. Blank
  2. The caption
  3. The name of the link target (always the image page, for now)
  4. Whatever the alt text is

I would argue against option 2 for the same reason as above.
I like option 3, which would give users with modern browsers
a hint about what happens when they click on the image, and
could complement a fix to bug 539. I don't feel strongly about
it, though.

(In reply to comment #8)

  • title text is the other position of <alltext> in non-captioned images, and so
    like <alt> should probably fall back to that; it's less clear, however, what
    fallback title a captioned image should have

When neither "thumb" nor "frame" is specified, the "caption" text isn't
used as a caption at all, so it makes sense to use it for the title.
I don't think it's necessary to use the caption for the title with
"thumb" or "frame", though, because then the caption is always displayed.
I don't think it makes sense to repeat the caption, which the reader can
presumably already see, in the tooltip.

mpt wrote:

The HTML spec's guidelines are not exhaustive. They say
alt="" is appropriate for "purely decorative" images, but
that does not mean alt="" is inappropriate in less "pure"
cases. Even very smart people often misunderstand this.

Maybe this will help: Imagine that MediaWiki didn't allow
graphics at all, just text. Would it be acceptable to
litter articles with non-sequiturs like "foopy.jpg" or
"Eiffel Tower viewed from the east"? Of course not. That
wouldn't make any sense when listening to the article.
Contributors would instead work towards the best possible
text for the article.

Now imagine graphics support has been added to MediaWiki.
For those using text-only UAs, has the best possible text
for each article suddenly changed? No, the ideal text is
still the ideal text. So there still isn't any reason to
add gibberish like "foopy.jpg" or "Eiffel Tower viewed from
the east" to the text. That still wouldn't make any sense
when listening to the article. Anyone who listens to it can
tell that any alt= text like that is wrong.

So when to use alt=? Well, occasionally a graphic makes
some of that ideal text redundant. For example, a map may
replace a sentence or two telling where a place is. A graph
may replace a text summary of some data. A diagram may
replace a list of instructions, a description of a chess
move, etc. In these cases, the text that the image replaces
should be moved into the alt= attribute for the image.

That is the only reason for alt= to be anything other than
"". It's the only reason. alt= is the *text equivalent*:
the text that you honestly would have included if the image
was never there, but that is redundant when the image is
visible. alt= is hard to understand because often we have
to work backwards. The image is introduced before the
text-only version is finished (especially on a Wiki, where
the text is never finished), so we have to ask: "If we had
the ideal text, would this image replace any of it?"

Portraits are an interesting example. Graphic portraits
feature regularly in biographical articles, but textual
portraits hardly ever do. So even if you *could* describe
someone's appearance textually, the most appropriate alt=
for a portrait image is still "" -- unless you can argue
with a straight face that their appearance would still have
been important enough to describe in the text if MediaWiki
didn't allow images (which probably isn't true), *and* that
the most appropriate text description is neatly made
redundant by the image (which probably isn't true either).

... Apologies for turning Bugzilla into a podium.

For those concerned about breaking existing syntax for
alternate text: I just visited [[Special:Randompage]] until
I had encountered 50 images with custom alternate text
specified. The alternate text was appropriate in ... zero
of them. (That's even fewer than I was expecting.) So if
the syntax is changed now, it will break very little that's
not already broken, and certainly much less than it fixes.

Images with blank alt attributes, *and* the links
surrounding them, are hidden and inaccessible in Lynx.

Hidden? Of course, that's the whole point. Inaccessible?
Not true -- see bug 371 comment 4.

michael wrote:

(In reply to comment #14)

so we have to ask: "If we had the ideal text, would this image
replace any of it?"

[...]

So even if you *could* describe someone's appearance textually, the
most appropriate alt= for a portrait image is still ""

You're using a backwards argument to say that images have no inherent
value. An image that shows *what someone looks like* isn't replacing
some text in an article, *it is part of the article*. It could still
have value to someone who is browsing without images turned on. A
"text equivalent" (which can consist of one or more of alt, title, and
longdesc) attempts to:

  1. Indicate the presence and nature of the image.
  2. Convey some information which is in that image.

In an ideal world, the image's longdesc attribute links to a page that
actually describes the image in detail, so an unsighted user could get
an idea of what George Bush looks like. Wikipedia's not there yet, so
we should be trying to improve accessibility, not work around it!

W3's Web Content Accessibility Guideline number 1 is "Provide
equivalent alternatives to auditory and visual content". Alt="Portrait
of George W. Bush" is infinitely closer to this than alt="".

For those concerned about breaking existing syntax for alternate
text: I just visited [[Special:Randompage]] until I had encountered
50 images with custom alternate text specified. The alternate text
was appropriate in ... zero of them. (That's even fewer than I was
expecting.) So if the syntax is changed now, it will break very
little that's not already broken, and certainly much less than it
fixes.

Okay. Let's build the tools that let us do it right, then show people
how to use them well.

Images with blank alt attributes, *and* the links surrounding them,
are hidden and inaccessible in Lynx.

Hidden? Of course, that's the whole point. Inaccessible? Not true --
see bug 371 comment 4.

Okay, so if a Lynx user suspects that maybe you've hidden a link and a
portrait on a page, he can type "*", guess that "bush_por.jpg" might be
useful, download the file, and fire up the Gimp to view the image.
Oops, turns out it's a Portuguese flowering shrub.

This doesn't prove that all non-visual browser users have access to
an image with alt="".

This doesn't fulfil WCAG Guideline 1.

This scheme substitutes dumb luck for established accessibility
techniques. It's a somewhat poorer user experience than FTP.

wmahan_04 wrote:

(In reply to comment #15)

W3's Web Content Accessibility Guideline number 1 is "Provide
equivalent alternatives to auditory and visual content". Alt="Portrait
of George W. Bush" is infinitely closer to this than alt="".

Even if there is already a caption with exactly that text?

I don't think there's any dispute that editors should be able to
specify alt text separately from the caption. To me, the only question
is what default to use when no alt text is given, and that is either
blank, or the caption text.

Beyond that, it's not a MediaWiki issue, IMHO, and you can take the
discussion to [[Wikipedia talk:Alternative text for images]], or
wherever, and argue for whatever policy you want.

ayg wrote:

*** Bug 8186 has been marked as a duplicate of this bug. ***

ayg wrote:

Fixed in r41364:

  • [[Image:Foo.png|alt=xyz]] now works as desired, setting the alt text.
  • If no alt text is specified, the alt text is the empty string, since it's useless to repeat the caption.
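
To make the two points above concrete, here is a small Python sketch of the described behaviour. It is not the code from r41364 and not MediaWiki's parser: `parse_image_link` and `render_img` are hypothetical names, the parsing assumes a simple image link with no nested links in the caption, and the file name is used directly as the `src` value purely for illustration.

```python
import html


def parse_image_link(wikitext: str) -> dict:
    """Extract the file name and alt text from a simple [[Image:...]] link."""
    prefix, suffix = "[[Image:", "]]"
    if not (wikitext.startswith(prefix) and wikitext.endswith(suffix)):
        raise ValueError("not a simple image link")
    inner = wikitext[len(prefix):-len(suffix)]
    filename, *params = inner.split("|")
    alt = ""  # Default: empty alt text, so the caption is not repeated.
    for param in params:
        if param.startswith("alt="):
            alt = param[len("alt="):]
    return {"filename": filename, "alt": alt}


def render_img(wikitext: str) -> str:
    """Render a bare <img> tag; a real renderer would resolve the file URL."""
    parsed = parse_image_link(wikitext)
    return '<img src="{}" alt="{}">'.format(
        html.escape(parsed["filename"], quote=True),
        html.escape(parsed["alt"], quote=True),
    )
```

With this sketch, `[[Image:Foo.png|alt=xyz]]` gets `alt="xyz"`, while `[[Image:Foo.png|thumb|A caption]]` gets `alt=""`.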

If the second point causes problems for users of deficient user agents that are incapable of allowing their users to interact with images without alt text, please open another bug (and link to it from this one). longdesc support would be nice, too, probably would be done by linking to a specially-named page like [[Image:Imagename/longdesc]] if it exists and doing nothing otherwise (but that scheme would have to handle non-local images). Again, that's another bug.

It's kind of sad that this was like a ten-line fix that anyone could have done at any point in the last four years, which could significantly increase Wikipedia's accessibility. Oh well, it's done now.

This caused parser test regressions; reverted in r41407:

Revert r41364 -- broke 22 parser test cases with change of alt behavior.

The caption was originally defined *as* the alt text (defaulting to the image file name if there is no alt text). Note that a separate caption text is only displayed in some display modes ('frame' and 'thumb', iirc), and not by default.

Please run the parser tests and check the effect you have on them. If it's really an appropriate change, then update the test cases. If you're not sure, consider backing out pending further discussion. :)

It might be appropriate to not set the 'alt' attribute for frame/thumb cases, but definitely not for inline images where we already have a way of setting the alt text which you're removing!

ayg wrote:

I tried running the parser tests, but they weren't working for me, per the e-mail I sent to Wikitech-l: they crashed, r40209 broke them. So I couldn't make sure they passed. And they're still broken on current trunk, with the same error message. So I pointed out the problem and committed without the parser tests.

The behavior of using the same syntax to mean captions for thumbs/frames but alt text for inline images seems confusing and kind of broken, not at all what would be expected. It seems like it would make more sense to consistently require alt= for alt text, for all images. I'm guessing it wouldn't cause much incompatibility, because I don't think I've *ever* seen the extra parameter used for alt text on inline images. Probably in most of the cases where it's used, people were trying to make a caption anyway, so the text would be just as inappropriate as in all the other cases.

(In reply to comment #20)

I don't think I've *ever* seen the extra parameter
used for alt text on inline images. Probably in most of the cases where it's
used, people were trying to make a caption anyway, so the text would be just as
inappropriate as in all the other cases.

I have seen images with the extra parameter with text. Perhaps not written as alt text,
but appropriate to be used as such. So it may be good to use the alt parameter for 'normal'
images, but it should fall back to the text.

ayg wrote:

Re-committed with the requested modifications in r41837.