
lang support for SVG images using SystemLanguageAttribute ill-defined and not properly supported in browsers
Open, LowPublic

Description

This was added to the PHP parser in bug 32987. Basic parsoid support was added in change Iae50f6e4948844b94a66f8437e12e05aa3ec1685.

Currently Parsoid sets the lang attribute on the <img> and references the SVG directly in the <img> tag's src attribute. This is semantically correct.

However, in my tests, Firefox and Chrome (and all other browsers?) ignore the current lang when directly embedding SVG content.

Further, the SVG spec itself appears to be somewhat broken: http://www.w3.org/TR/SVG11/struct.html#SystemLanguageAttribute says that the system language should be *operating system's* language, not the current document language. Therefore, if I'm viewing he.wikipedia.org from my standard en desktop, all the SVGs will still be in English. That seems wrong.

So there are two wrongs here -- Parsoid isn't implementing this exactly the same as the PHP parser, and the PHP parser appears to be misusing the systemLanguage attribute. Hopefully someone will straighten this out eventually.


Version: unspecified
Severity: normal
See Also:
T18052: Support for multilingual SVGs

Details

Reference
bz58920

Event Timeline

bzimport raised the priority of this task from to Low. Nov 22 2014, 2:23 AM
bzimport added projects: Parsoid, I18n.
bzimport set Reference to bz58920.

Here is our evaluation of the pros/cons of using SVGs directly:

+ future-proof
+ accessibility / screen readers
+ quality of scaling, support for high-resolution devices

- potential performance disadvantage with large or complex SVGs
- no support in old browsers, but fall-back to PNG possible with JS

Further, the SVG spec itself appears to be somewhat broken: http://www.w3.org/TR/SVG11/struct.html#SystemLanguageAttribute says that the system language should be *operating system's* language, not the current document language. Therefore, if I'm viewing he.wikipedia.org from my standard en desktop, all the SVGs will still be in English. That seems wrong.

I think systemLanguage was intended to be similar to content negotiation with accept-language.

So there are two wrongs here -- Parsoid isn't implementing this exactly the same as the PHP parser, and the PHP parser appears to be misusing the systemLanguage attribute. Hopefully someone will straighten this out eventually.

I fail to see how the PHP parser is misusing the attribute. From the PHP parser's perspective, the system in question isn't the browser; it is MediaWiki.

bawolff: yes, the confusion about 'system in question' is part of the problem here. In a client/server system where some duties (for example, image thumbnailing) are sometimes done on the client (for example, scaling driven by the height/width attributes on an <img> tag) and sometimes done on the server (using MediaWiki's imagescaler to turn SVG into bitmaps), which of these is "the operating system", and who gets to decide what the current language is? You can answer one way or the other; I'm just saying the spec is not clear.

Concretely, parsoid would like to emit an <img> tag with the SVG in question and allow the browser to do the scaling --- and language conversion, presumably. The current way that 'systemLanguage' is designed and implemented does not allow that. Mediawiki *must* render all SVGs into bitmaps server side because we are implementing a behavior at odds with the SVG spec (since we are using "wiki language" and an explicit "lang option" override, not the "user's operating system's language") and at odds with browser behavior (since no browsers appear to implement systemLanguage at all).

(In reply to C. Scott Ananian from comment #3)

bawolff: yes, the confusion about 'system in question' is part of the problem here. In a client/server system where some duties (for example, image thumbnailing) are sometimes done on the client (for example, scaling driven by the height/width attributes on an <img> tag)

Well, in the MediaWiki context, the img width/height attributes are there to prevent re-layouts during rendering. They're not used for scaling. But I get what you're saying about the term "system" being extremely ambiguous, and I agree.

Concretely, parsoid would like to emit an <img> tag with the SVG in question and allow the browser to do the scaling --- and language conversion, presumably. The current way that 'systemLanguage' is designed and implemented does not allow that. Mediawiki *must* render all SVGs into bitmaps server side because we are implementing a behavior at odds with the SVG spec (since we are using "wiki language" and an explicit "lang option" override, not the "user's operating system's language") and at odds with browser behavior (since no browsers appear to implement systemLanguage at all).

Well, it's at odds with the browser behaviour, but the SVG spec says:

"Evaluates to "true" if one of the languages indicated by user preferences exactly equals one of the languages given in the value of this parameter, or if one of the languages indicated by user preferences exactly equals a prefix of one of the languages given in the value of this parameter such that the first tag character following the prefix is "-"."

So if we were blindly following the spec, we would use the language in Special:Preferences. Nowhere in the spec do I see it say operating system language.
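To make the quoted matching rule concrete, here is a minimal Python sketch. The function name `system_language_matches` and its inputs are hypothetical; where `user_prefs` comes from (browser settings, Special:Preferences, or the page language) is exactly the point under dispute in this thread. It compares tags case-sensitively, following the quoted wording literally.

```python
def system_language_matches(user_prefs: list[str], attr_value: str) -> bool:
    """Evaluate an SVG systemLanguage value against the user's preferred tags.

    user_prefs: the user's preferred language tags (hypothetical input).
    attr_value: the comma-separated systemLanguage attribute value.
    """
    candidates = [tag.strip() for tag in attr_value.split(",") if tag.strip()]
    for pref in user_prefs:
        for cand in candidates:
            # Exact match, e.g. "de" == "de".
            if pref == cand:
                return True
            # The preference is a prefix of the candidate and the next
            # character is "-", e.g. "de" against "de-CH".
            if cand.startswith(pref) and cand[len(pref):len(pref) + 1] == "-":
                return True
    return False
```

For example, a preference of `de` matches `systemLanguage="de-CH, fr"`, but a preference of `de-CH` does not match `systemLanguage="de"`: the rule only lets the user's tag act as a prefix of the attribute's tag, not the reverse.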

This is hardly the only place where librsvg is going to be at odds with browser behaviour. The interpretation of the spec MediaWiki currently uses is much more useful to our users (imo) than the alternative.

If Parsoid really wants to serve raw SVGs to users (which would be problematic for other reasons, given how large in file size some of them are on Commons), it could always just blacklist SVGs with systemLanguage in them (surely you're going to need to scale some other image formats anyway: DjVus, TIFFs, PDFs, XCFs, JPEGs with a rotation set, etc.). It could even inject JS into the SVG to change its structure appropriately (although that would be hacky beyond hacky).
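A rough sketch of the blacklist idea, assuming Parsoid (or a helper service) can read the SVG source: scan every element for a `systemLanguage` attribute and fall back to server-side thumbnails when one is present. `uses_system_language` is a hypothetical name, and this uses Python's standard `xml.etree.ElementTree` rather than anything Parsoid actually ships.

```python
import xml.etree.ElementTree as ET


def uses_system_language(svg_text: str) -> bool:
    """Return True if any element in the SVG carries a systemLanguage attribute."""
    root = ET.fromstring(svg_text)
    # systemLanguage is an unprefixed attribute, so its key in attrib is the
    # plain name even though the elements live in the SVG namespace.
    return any("systemLanguage" in el.attrib for el in root.iter())
```

Real uploads from Commons would need a parser hardened against untrusted XML; the standard library parser is used here only to keep the sketch short.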

@Bawolff: hm, maybe I got "operating system" from http://www.w3.org/TR/SVG11/i18n.html which states, "Multi-language SVG documents are possible by utilizing the ‘systemLanguage’ attribute to have different text strings appear based on the client machine's language setting."?

In any case, there are issues: https://bugzilla.mozilla.org/show_bug.cgi?id=936517 http://lists.w3.org/Archives/Public/www-svg/2013Nov/0039.html http://www.w3.org/TR/SMIL3/smil-content.html#adef-allowReorder

Wait, SMIL3 allows a switch statement based on CPU architecture... that's the silliest feature I've ever heard of for a high-level markup language.

Like seriously.

(In reply to C. Scott Ananian from comment #3)

no browsers appear to implement systemLanguage at all

I'm not sure quite what you mean: Firefox, for example, implements the spec perfectly. (As an example, add "de" to intl.accept_languages in your about:config; I'm not sure exactly where this is exposed in the GUI, but I'm sure you can work it out.) Then visit https://upload.wikimedia.org/wikipedia/commons/4/48/P-n_junction.svg to see captions in German.

@Jarry1250: this may have been implemented recently in Firefox, and/or I might have been tripped up by the 'allowReorder' issue mentioned in the URLs referenced in comment 5. It didn't work as I expected it to when I tested it.

But regardless, it doesn't work as *wiki* wants it to. That is, we expect the "system language" to correspond to the *wiki language*, which isn't how the feature is implemented in Firefox at all. Nor does the SVG honor the HTML lang attribute, even if it is embedded. In short, [[File:Foo.svg|lang=de]] is nothing like <img src="Foo.svg" lang="de"> or <figure><svg lang="de"> ... </svg></figure>.

(In reply to C. Scott Ananian from comment #8)

But regardless, it doesn't work as *wiki* wants it to. That is, we expect the "system language" to correspond to the *wiki language*, which isn't how the feature is implemented in firefox at all. Nor does the SVG honor the HTML lang attribute, even if it is embedded. In short, [[File:Foo.svg|lang=de]] is nothing like <img src="Foo.svg" lang="de"> or <figure><svg lang="de"> ... </svg></figure>.

"Nothing like" is a bit strong. If your point is that if you serve SVGs directly in some glorious SVG future, then you can't force the language they're displayed in (for the moment, at least), then yes, I agree. For the moment, we can, and, in some sense, we have to (that is, we are obliged to *try* to display SVGs in the correct language). I'm not sure where this bug is headed. Something to do with Parsoid?

Bug 61649 relates: it wants the 'lang' attribute in an SVG to be inherited from the document (or really, the wiki, which sets the document language).

Bug 58663 covers a related issue in media viewer.

Bug 3593 asks for client-side SVG rendering, which is obviously not possible when browsers implement 'lang' differently from the thumbnailer.

Bug 16052 attempts to be the master bug for multilingual SVGs.

Jarry1250: I'm just trying to figure out what the desired behavior of 'systemLanguage' is. From the bugs I reference in comment 10, it seems obvious that we *want* the systemLanguage to be inherited from the wiki language/document language, which is not how the existing SVG spec works. That's probably fine, it just means we will continue to use thumbnailers to convert our SVGs (and need to patch Parsoid to do so); the comments on bug 3593 give a few reasons why we might want to do that anyway.

Longer-term, it would be nice if the SVG guys got together with the HTML guys and gave us a proper solution.

One small clarification, as I've learned a bit more about MediaWiki recently. The wiki's *content language* is not necessarily the same as the *wiki language*. The various translation tools allow you to have translated content in (say) Italian on an English-language wiki. The MediaWiki API provides hooks to override the content language for a particular article.

So bug 61649 is really asking for the lang attribute in an SVG to be inherited from the *current article's* content language (which is not necessarily the wiki language).

I believe that the content language usually equals the wiki language, and the page language is the content language for the current page (if Translate or something similar changes it).

Yes, I'm still a bit fuzzy on the exact APIs in use here, see bug 71380.

The main point of my comment is to warn whoever writes a patch for this issue not to blindly use the wiki's default language.
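The fallback order this warning implies can be sketched as below. The function and parameter names are hypothetical and do not correspond to MediaWiki's actual API; the only point is that the wiki's default content language comes last, after an explicit `lang=` option and the current page's language.

```python
from typing import Optional


def rasterizer_language(
    lang_option: Optional[str],
    page_language: Optional[str],
    wiki_content_language: str,
) -> str:
    """Pick the language to render an SVG in (hypothetical helper)."""
    # 1. An explicit [[File:Foo.svg|lang=xx]] option wins.
    if lang_option:
        return lang_option
    # 2. Otherwise use the current page's language, which a translation
    #    tool or per-page override may set differently from the wiki default.
    if page_language:
        return page_language
    # 3. Only then fall back to the wiki's default content language.
    return wiki_content_language
```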

Long story short: most likely if we served SVGs directly we'd have to preprocess them for (among other things) the language setting to ensure expected behavior. (We'd probably also want to preprocess them for size, whitespace, stripping comments, decimating nodes from paths, etc, so this doesn't worry me terribly.)
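A minimal sketch of that language preprocessing step, assuming the target language has already been chosen (for example from the page language): each `<switch>` keeps only the first child whose `systemLanguage` matches (or which has no `systemLanguage` at all), and the attribute is removed from the survivor so every client renders the same text. `pin_language` is a hypothetical name; the sketch ignores other conditional attributes such as `requiredFeatures`, and it does not touch `systemLanguage` on elements outside a `<switch>`.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialize SVG elements without an "ns0:" prefix.
ET.register_namespace("", SVG_NS)


def _matches(lang: str, attr_value: str) -> bool:
    # Same rule as the SVG 1.1 text quoted earlier: exact match, or `lang`
    # is a prefix of a listed tag and is followed by "-".
    for cand in (t.strip() for t in attr_value.split(",")):
        if cand == lang or cand.startswith(lang + "-"):
            return True
    return False


def pin_language(svg_text: str, lang: str) -> str:
    """Resolve every <switch> for one fixed language (hypothetical preprocessor)."""
    root = ET.fromstring(svg_text)
    # Snapshot the switches first: ElementTree leaves the result undefined
    # if the tree is modified during iteration.
    switches = list(root.iter(f"{{{SVG_NS}}}switch"))
    for switch in switches:
        chosen = None
        for child in list(switch):
            value = child.get("systemLanguage")
            # A child without systemLanguage always evaluates to true.
            if chosen is None and (value is None or _matches(lang, value)):
                chosen = child
            else:
                switch.remove(child)
        if chosen is not None:
            chosen.attrib.pop("systemLanguage", None)
    return ET.tostring(root, encoding="unicode")
```

When no child matches the requested language, the first child without `systemLanguage` (typically the author's fallback text) is kept, which mirrors how the `<switch>` would render for a client with no matching preference.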

@brion Yes. Unless we can lobby the HTML/SVG working group to make the HTML lang attribute work the way we want it to. Our input should be valuable to them, since I'd guess Commons is one of the largest multilingual SVG repos out there. (But as you point out we might want to munge the SVGs somewhat in any case.)

Agreed. We'd also have to think about what the "expected behaviour" would be: at the moment I can fix the language of just the SVG, without fiddling around with anything else. I suspect that continuing something akin to that would be desirable.

From the task description:

Further, the SVG spec itself appears to be somewhat broken: http://www.w3.org/TR/SVG11/struct.html#SystemLanguageAttribute says that the system language should be *operating system's* language, not the current document language. Therefore, if I'm viewing he.wikipedia.org from my standard en desktop, all the SVGs will still be in English. That seems wrong.

Since I was around for the original drafting of that language (see the test attribute section of the SMIL 1.0 spec), I can speak with some authority on what the intent was.

The point of this was to put language choice in the user's hands, so that if someone was viewing a document on a Hungarian website, but they don't speak Hungarian (e.g. they only speak Japanese), they can set their client preference to Japanese, and then hope that a Japanese version is available. The site can still only offer Hungarian, at which point, Hungarian it is.

We (the working group responsible for SMIL 1.0) didn't fully anticipate the possibility that the user preference would be clearly articulated and controllable server-side, and it's not clear what we should recommend to the SVG working group. The preprocessing step that @brion outlines in T60920#1690716 seems sensible to me; that means we could still serve compliant SVG in accordance with user preference.

@RobLa as I recommended above, it would be valuable to have a feature like the "system language" that actually inherited the value from the HTML lang attribute. This would be consistent with the rest of the HTML and browser specs, and would allow server-side control where that was desirable.

This could be specified *in addition to* the existing "system language" attribute, to allow document authors the ability to have either client- or server-side control (and to accommodate standalone SVG viewers). However, in the browser context, I expect that the new attribute would be generally preferred since the browser provides greatly superior user control of language selections, fallbacks, etc.

I agree that we could preprocess, and that we'll want to do so for other reasons as well.

@cscott: Hi, I'm resetting the task assignee due to inactivity. Please feel free to reclaim this task if you plan to work on this - it would be welcome! Also see https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for more information - thanks!

I would close this task as can't fix or won't fix.

The systemLanguage attribute is not ill-defined in the SVG spec.

We have no control over the SVG Specification and the semantics of systemLanguage. We cannot change the semantics to be the page language or the language specified by the lang attribute of the including document.

The systemLanguage attribute is properly supported in browsers. The attribute follows the specification. It may not be what we think is right, but the support follows the spec.

Currently Parsoid sets the lang attribute on the <img> and references the SVG directly in the <img> tag's src attribute. This is semantically correct.

[[File:Foo.svg|lang=de]] is NOT equivalent to <img src="Foo.svg" lang="de">. The HTML lang attribute is declarative. It states the content of the element is in a certain language. So <span lang="en">good day</span> says that "good day" is English. The HTML <span lang="de">good day</span> is not a request to translate "good day" to "guten Tag". Quite simply, if Foo.svg is multilingual, then <img src="Foo.svg" lang="de"> is a false declaration. Also, lang is single-valued; it is not designed to accept the set of languages that Foo.svg may support.

HTML's lang attribute and MW's lang parameter may be spelled the same, but they are not equivalent.

MW serves PNGs of localized SVG files. MW will set the "user agent" language to the wiki page language if that is available and call the rasterizer.

Is it clear what Parsoid wants?