
Images: parse caption separately all the way to DOM and use DOMFragment encapsulation
Closed, Resolved, Public

Description

The image tag produced by this wikitext, [[File:Wiki.png|caption<b>123</b>]], should have its alt attribute set to "caption123".

It would be great for the VE team if we could get the HTML DOM of the caption (instead of text with the HTML tags stripped out), because then we could provide a nice experience when converting from an inline image to a block image and the other way round.


Version: unspecified
Severity: normal

Details

Reference
bz48958

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 1:41 AM
bzimport added a project: Parsoid.
bzimport set Reference to bz48958.

Our plan so far has been to only set the alt if it was explicitly specified using the alt=foo option. The caption HTML DOM will be in the data-mw.caption member. This lets VE set the alt and caption separately, while client-side or server-side postprocessing can set the default alt to the textValue of the caption when rendering for viewing.
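To make the fallback concrete, here is a minimal sketch of the postprocessing step described above, assuming the caption is available as an HTML string. The helper names `caption_text` and `effective_alt` are hypothetical and are not part of Parsoid or VE; the sketch only illustrates "explicit alt wins, otherwise use the caption's text content".

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def caption_text(caption_html):
    """Return the text content of a caption's HTML, with all tags dropped."""
    collector = _TextCollector()
    collector.feed(caption_html)
    collector.close()
    return "".join(collector.parts)


def effective_alt(explicit_alt, caption_html):
    """Hypothetical helper: prefer an explicit alt=, else fall back to caption text."""
    if explicit_alt is not None:
        return explicit_alt
    return caption_text(caption_html)


print(effective_alt(None, "caption<b>123</b>"))    # caption123
print(effective_alt("logo", "caption<b>123</b>"))  # logo
```

Keeping the fallback in a separate step like this is what lets an editor store the alt and the caption independently.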

In the current implementation, parsing the caption to DOM is completely missing for inline images and is only implemented implicitly for block-level images (by returning tokens for the figcaption content inline).

We should use a full pipeline to parse captions all the way to DOM, and then use the internal DOMFragment mechanism to preserve these fragments through token transformations. They are then unpacked at the end of DOMPostProcessor processing. As a side effect this will also properly enforce nesting of captions without the hackish closeUnclosedBlockTags helper.
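The encapsulation idea can be sketched roughly as follows. This is a toy model, not Parsoid's actual DOMFragment code: the `fragments` side table, the token dictionaries, and the functions `encapsulate`, `uppercase_text`, and `unpack` are all assumptions made for illustration. The point is only that a finished fragment travels through token transformations as an opaque placeholder and is spliced back in at the end.

```python
import itertools

_ids = itertools.count(1)
fragments = {}  # hypothetical side table: fragment id -> finished DOM (an HTML string here)


def encapsulate(dom_html):
    """Store a finished fragment and return an opaque placeholder token for it."""
    frag_id = "frag%d" % next(_ids)
    fragments[frag_id] = dom_html
    return {"type": "fragment", "id": frag_id}


def uppercase_text(tokens):
    """Stand-in for a token transformation: it rewrites text tokens only."""
    return [
        {"type": "text", "value": t["value"].upper()} if t["type"] == "text" else t
        for t in tokens
    ]


def unpack(tokens):
    """Stand-in for the final post-processing step: splice fragments back in."""
    out = []
    for t in tokens:
        out.append(fragments[t["id"]] if t["type"] == "fragment" else t["value"])
    return "".join(out)


tokens = [
    {"type": "text", "value": "see "},
    encapsulate("<figcaption>caption<b>123</b></figcaption>"),
]
print(unpack(uppercase_text(tokens)))
# SEE <figcaption>caption<b>123</b></figcaption>
```

Because the transformation never sees the fragment's contents, the caption's nesting cannot be disturbed by later stages.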

Sounds reasonable to me. Now that we have implemented the properly-nested-DOM requirement for some wikitext constructs, we should be able to use that for image captions as well.

[Parsoid component reorg by merging JS/General and General. See bug 50685 for more information. Filter bugmail on this comment. parsoidreorg20130704]

On a related note, we should *not* set an alt attribute for read-only viewing that just contains an image's file name. The alt attribute should only contain proper alternate descriptions of the image that would be useful for users with screen readers.

I think this is a dup of bug 52567, and should probably be resolved as such.

*** Bug 52567 has been marked as a duplicate of this bug. ***

Some notes re accessibility from a recent meeting with Gerardo Capiel:

  • Long-term we should try to store the alt text or long description along with the image itself, and use that to populate the alt attribute if none was provided explicitly. This requires significant work in core to store metadata along with image pages.
  • For screenreaders it would be good to also provide a longdesc attribute linking to the textual description on the image page.

Open a separate bug for the longdesc issue?

Focusing this bug on the "parse caption to DOM" issue -- test case is:

[[File:Wiki.png|caption<b>123]]

which should really have the </b> in the data-mw attribute.
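As a rough illustration of why parsing to a tree fixes this test case, the sketch below re-serializes a caption while tracking open elements, so anything left unclosed is closed at the end. It uses Python's standard `html.parser` rather than Parsoid's pipeline, ignores attributes, and knows only a few void elements; `BalancingSerializer` and `balance` are hypothetical names.

```python
from html import escape
from html.parser import HTMLParser

VOID = {"br", "hr", "img", "wbr"}  # small, incomplete set of void elements


class BalancingSerializer(HTMLParser):
    """Re-serializes a fragment so that every opened tag is closed."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        self.out.append("<%s>" % tag)  # attributes are ignored in this sketch
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag: drop it
        # Close everything up to and including the matching open tag.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append("</%s>" % top)
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    def result(self):
        self.close()  # flush any buffered text first
        while self.open_tags:
            self.out.append("</%s>" % self.open_tags.pop())
        return "".join(self.out)


def balance(caption_html):
    serializer = BalancingSerializer()
    serializer.feed(caption_html)
    return serializer.result()


print(balance("caption<b>123"))  # caption<b>123</b>
```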

This appears to have regressed. The data-mw.caption value is wikitext again. It should be Parsoid DOM.

Change 145029 had a related patch set uploaded by Cscott:
WIP: Parse caption to DOM using recursive wikitext parse.

https://gerrit.wikimedia.org/r/145029

ssastry removed a project: Patch-For-Review.
ssastry set Security to None.