
RFC: Re-evaluate librsvg as SVG renderer on Wikimedia wikis
Open, HighPublic

Assigned To
None
Authored By
MZMcBride
Jun 27 2012, 10:00 PM
Referenced Files
F34149094: SVG_CSS_Test.png
Mar 10 2021, 5:40 AM
F34144683: Screenshot from 2021-03-08 07-48-13.png
Mar 8 2021, 8:37 PM
F28610136: Screenshot from 2019-04-11 13-39-13.png
Apr 11 2019, 11:41 AM

Description

I don't know the exact history, but at some point Wikimedia wikis added the ability to support inline SVGs by passing them through librsvg, which takes the SVG code and generates PNGs, as I vaguely understand it.

There are some notes here: https://meta.wikimedia.org/wiki/SVG_image_support.

I can't find any information about which version of librsvg Wikimedia is currently using, but the choice of using librsvg should be re-evaluated, given its rendering issues (cf. other bugs in this bug tracker) and the existence of perhaps better alternatives.

See Also:
T53555: librsvg seems unmaintained
T120746: Improve SVG rendering
T10901: [DO NOT USE] SVG rasterisation and management on Wikimedia sites (tracking)

Details

Reference
bz38010


Event Timeline


The new version of librsvg is not an acceptable renderer:

  1. it does not take an IETF langtag (major)
  2. it does not handle textPath (medium)

The IETF langtag problem is growing. SVG Translate is injecting illegal langtags into SVG files: T271000. I suspect that confusion may have developed from librsvg 2.40 failing to handle hyphenated langtags correctly: T154237. SVG Translate's bogus langtags allow it to trick librsvg 2.40 into displaying Serbian in either Latin or Cyrillic scripts.

The new version of librsvg 2.44 wants a Unix locale string in the LANG environment variable instead of an IETF langtag. In simple terms, that means that MW would have to map IETF langtags to locale strings and then set the Unix LANG environment variable to that locale string. That is not a 1:1 mapping. It is not a problem for langtags such as en, de, or fr; it is a problem for sr-Latn, sr-Cyrl, zh-Hans, and zh-Hant. And there is probably no way for non-IETF langtags such as sr-EC and sr-EL to survive the round trip.
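To illustrate why the mapping is not 1:1, here is a minimal sketch. The two helper functions are hypothetical and deliberately naive (they are not MediaWiki or librsvg code); they only assume the usual POSIX locale shape language[_territory][.codeset][@modifier], which has no slot for a script subtag.

```python
import re

def langtag_to_locale(tag: str) -> str:
    """Naive, hypothetical conversion: keep language and region, drop the rest."""
    subtags = tag.split("-")
    language = subtags[0].lower()
    # A region subtag is two letters or three digits; a script subtag (Latn, Hans) is not.
    region = next(
        (s.upper() for s in subtags[1:] if re.fullmatch(r"[A-Za-z]{2}|\d{3}", s)),
        None,
    )
    locale = language if region is None else f"{language}_{region}"
    return locale + ".UTF-8"

def locale_to_langtag(locale: str) -> str:
    """Naive reverse step: strip codeset/modifier and swap '_' for '-'."""
    base = re.split(r"[.@]", locale, maxsplit=1)[0]
    return base.replace("_", "-")

for tag in ["en", "de", "es-ES", "sr-Latn", "sr-Cyrl", "zh-Hans", "zh-Hant"]:
    locale = langtag_to_locale(tag)
    print(f"{tag:8} -> {locale:12} -> {locale_to_langtag(locale)}")
```

With this naive approach sr-Latn and sr-Cyrl both come back as sr, and zh-Hans and zh-Hant both come back as zh, so the script distinction is lost on the way through the locale string.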

Consequently, MW would have to localize the SVG file before handing it off to librsvg. That's not hard to do, and MW should do it in the long run, but MW does not do it now.
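A rough sketch of what such a localization pass could look like, using only Python's standard library. This is an assumption about one possible approach, not existing MediaWiki code: `localize_svg` is a hypothetical helper, it uses SVG 1.1 first-match order, and it matches langtags by simple case-insensitive equality.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # keep the default SVG namespace on output

def localize_svg(svg_text: str, langtag: str) -> str:
    """In every <switch>, keep only the first child that applies to `langtag`."""
    root = ET.fromstring(svg_text)
    wanted = langtag.lower()
    # Collect the switches first so that removing children does not disturb iteration.
    for switch in list(root.iter(f"{{{SVG_NS}}}switch")):
        chosen = None
        for child in list(switch):
            langs = child.get("systemLanguage")
            if langs is None:
                chosen = child  # unconditional child acts as the fallback
                break
            tags = [t.strip().lower() for t in langs.split(",")]
            if wanted in tags:
                chosen = child
                break
        for child in list(switch):
            if child is not chosen:
                switch.remove(child)
        if chosen is not None:
            # Drop the condition so the renderer's own locale cannot hide the text.
            chosen.attrib.pop("systemLanguage", None)
    return ET.tostring(root, encoding="unicode")
```

After this pass the renderer no longer needs to know the requested language at all, which is the point of localizing before the hand-off.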

MW cannot upgrade to a new version of librsvg; the multilingual SVG files will break.

The absence of even a limited textPath has always been an annoyance. IIRC, Firefox copped out by treating it as a text element; it didn't follow the path, but at least it provided the information. MW has lived without it for a long time, but that has had ugly consequences. Graphic artists convert their text to curves. Commons is a multilingual project, and curves are hard to translate. It is time to require some textPath support, and librsvg does not have it.

resvg will have its own problems, but it may be the only expedient. I presume it will be easy to pass the IETF langtag argument to it. There may be some rendering differences and problems, but that does not scare me given the mountain of workarounds librsvg has required. The vast majority of SVG files will be pedestrian and render the same. The only significant reservation is the resvg CSS parser.

I'd prefer that we could mark SVG files (say < 40 kB) to be served directly. I suspect that most modern browsers have sufficient SVG support. SVG files loaded into img elements will display title elements and animations while disabling scripts. It's 2021, and MW is using animated GIFs from 1987. (For consistent semantics, MW might localize the SVG file before serving it.)

@JoKalliauer Your timing results for Inkscape were a big surprise for me, and I ran all your tests in inkscape --shell on my little laptop.

114 files from Commons Featured collection rendered at 512px in 92 seconds. At least 75% of those were created by Inkscape, and no more than 15% by Illustrator.

1335 files from resvg collection rendered at 512px in 88 seconds (10 files entered some weird loops and inkscape exited; those were later excluded)

512 files from W3C collection rendered at 512px in 40 seconds (6 did not finish)

The actions pasted into the shell were like
file-open:Iowa_16_inch_Gun.svg; export-type:png; export-width:512px; export-do;
file-open:Pianino_-_mechanizm_angielski.svg; export-type:png; export-width:512px; export-do;
file-open:Flag-map_of_the_world.svg; export-type:png; export-width:512px; export-do;
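For anyone wanting to reproduce this, the action lines can be generated rather than typed. The sketch below only builds the text in the same form as the lines above; `inkscape_actions` is a hypothetical helper name, and it does not start Inkscape itself.

```python
def inkscape_actions(svg_files, width_px=512):
    """Build one line of shell-mode actions per SVG file, in the form shown above."""
    lines = []
    for name in svg_files:
        lines.append(
            f"file-open:{name}; export-type:png; export-width:{width_px}px; export-do;"
        )
    return "\n".join(lines)

print(inkscape_actions(["Iowa_16_inch_Gun.svg", "Flag-map_of_the_world.svg"]))
```

The resulting text can then be pasted into inkscape --shell as described.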

I don't understand how you got those huge numbers.

@Glrx

I suspect that most modern browsers have sufficient SVG support.

Sadly, it's not true. Or at least as long as "browsers" is Chrome. Firefox has pretty bad textPath support, while Chrome is pretty bad with effects.
Firefox has issues even with clipPath and doesn't support baseline-shift (aka subscript/superscript) at all.
And no browser supports enable-background, although it was deprecated in SVG2.
Overall, browsers are way better than librsvg, but you still need workarounds for them. And more importantly, browser-specific ones.

The only significant reservation is the resvg CSS parser.

It's not perfect, but I wouldn't call it that bad.

I'd prefer that we could mark SVG files (say < 40 kB) to be served directly.

SVG size doesn't matter. Content does.

@Glrx

I suspect that most modern browsers have sufficient SVG support.

Sadly, it's not true. Or at least as long as "browsers" is Chrome. Firefox has pretty bad textPath support, while Chrome is pretty bad with effects.
Firefox has issues even with clipPath and doesn't support baseline-shift (aka subscript/superscript) at all.
And no browser supports enable-background, although it was deprecated in SVG2.
Overall, browsers are way better than librsvg, but you still need workarounds for them. And more importantly, browser-specific ones.

That "browsers are way better than librsvg" is sort of the point. Most of the SVG on Commons is pedestrian because librsvg is limited; textPath sometimes slips in for a big map, but otherwise it is only in little used files. I'm also less concerned with Chrome and Firefox because they do fix bugs (albeit on a timescale of about 6 months to a year) and they can probably be shamed by adding tests to https://www.caniuse.com.

Yes, baseline-shift is a problem, but the text is still readable. Furthermore, nobody gets it right: try doing e^(x^2).

I'm more concerned about support on Safari and other browsers. WMF has a diverse audience.

One proposal (below) is to serve only SVG that has been marked. There are a lot of SVG files that are trivial and should display reasonably on any browser.

The only significant reservation is the resvg CSS parser.

It's not perfect, but I wouldn't call it that bad.

JoKalliauer's tests showed good selector functionality, so I'm not too worried there. Many files on commons are done with Inkscape, and those files tend to not use CSS style elements (but heavily use style attributes). Illustrator overuses class, but its uses seem to be pedestrian. From what you've said, I just expect some CSS surprises.

I'd prefer that we could mark SVG files (say < 40 kB) to be served directly.

SVG size doesn't matter. Content does.

Size may be an issue for server bandwidth. Commons has lots of big SVG files.

The reality is WMF is not going to serve SVG any time soon, so that (partial) option is off the table. WMF cannot upgrade to the recent librsvg because that will break systemLanguage files. To use the recent librsvg, WMF will need to localize the SVG before passing it to librsvg. At the end of all of that, librsvg still has rendering issues; many bugs are fixed in the new release, but others still remain. That leaves two minimal-effort paths: limp along with the old librsvg or use a plug-in replacement for it. The replacement candidates appear to be resvg, inkscape, and batik.

Consider setting the SVG agent's IETF langtag preference(s).

  • librsvg uses the Unix $LANG environment variable, which is a locale string rather than a langtag. That's a problem. There is not an option for setting the langtag.
  • inkscape also uses the Unix $LANG environment variable. I do not see a command line argument that sets the langtag preference. Unless inkscape has some way to set the langtag (e.g., writing an options file), then it has the same locale string problem as librsvg.
  • batik has a -lang command line option that sets a langtag.
  • resvg has a --languages command line option that sets a list of langtags (no q value).

So systemLanguage constraints leave batik and resvg on the table.
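As a sketch of how a caller might pass the langtag on the command line for the two remaining candidates: the --languages and -lang options are the ones named above; the other flags, the jar path and the file names are my assumptions and should be checked against each tool's own help output. The functions only build argument lists and do not run anything.

```python
def resvg_command(svg_path, png_path, langtags, width=512):
    """Argument list for resvg; --languages takes a comma-separated list of langtags."""
    return [
        "resvg",
        "--width", str(width),
        "--languages", ",".join(langtags),
        svg_path,
        png_path,
    ]

def batik_command(svg_path, png_path, langtag, width=512, jar="batik-rasterizer.jar"):
    """Argument list for the Batik rasterizer; the jar location is a placeholder."""
    return [
        "java", "-jar", jar,
        "-m", "image/png",
        "-w", str(width),
        "-lang", langtag,
        "-d", png_path,
        svg_path,
    ]

print(resvg_command("map.svg", "map.png", ["sr-Latn", "en"]))
print(batik_command("map.svg", "map.png", "sr-Latn"))
```

Passing the langtag as an argument keeps it out of the process locale, so nothing has to be translated into a locale string first.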

There are a lot of SVG files that are trivial and should display reasonably on any browser.

This is actually a very good idea. Automatically analyzing SVG to detect what features they use should simplify the process.
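A minimal sketch of such an analysis, assuming a hand-picked watch list. The names in `WATCHED_ELEMENTS` and `WATCHED_PROPERTIES` are only the features mentioned in this thread plus style and script, not a vetted list, and `watched_features` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical watch lists; a real deployment would need a reviewed set.
WATCHED_ELEMENTS = {"textPath", "filter", "clipPath", "style", "script"}
WATCHED_PROPERTIES = {"baseline-shift", "enable-background"}

def local_name(qualified_name: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts in front of names."""
    return qualified_name.rsplit("}", 1)[-1]

def watched_features(svg_text: str) -> set:
    """Return the watched elements and properties that occur in the SVG."""
    found = set()
    for element in ET.fromstring(svg_text).iter():
        name = local_name(element.tag)
        if name in WATCHED_ELEMENTS:
            found.add(name)
        for attr, value in element.attrib.items():
            attr_name = local_name(attr)
            if attr_name in WATCHED_PROPERTIES:
                found.add(attr_name)
            elif attr_name == "style":
                # Also look inside inline style="prop: value; ..." declarations.
                for declaration in value.split(";"):
                    prop = declaration.split(":", 1)[0].strip()
                    if prop in WATCHED_PROPERTIES:
                        found.add(prop)
    return found
```

A file for which this returns an empty set would be a candidate for the "trivial" category; anything else would stay on the server-side rendering path.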

From what you've said, I just expect some CSS surprises.

The main limitation is that CSS3/"CSS4" would not work. And mainly because of processing and not parsing. Something like CSS variables is pretty hard to implement.

The replacement candidates appear to be resvg, inkscape, and batik.

Inkscape and batik are far behind resvg. Especially performance-wise.

resvg has a --languages command line option that sets a list of langtags (no q value).

Yes, resvg doesn't care about $LANG, mainly because it's surprisingly hard to implement in a crossplatform and safe way. So it simply uses the value user provided.
On the other hand, resvg doesn't have a complete support for language tags. To do so you have to properly parse them and do some complex matching, so for now it simply matches the whole value.

@Glrx:
client side rendering; animated SVGs
I agree that whitelisted SVGs should be rendered on the client side ( T5593 ), with opt-out or opt-in in the preferences.
Animated GIFs and videos (e.g. WebM) are imho still the gold standard for HTML. I hardly see animated SVGs on the web. However, I think that's also an advantage of client-side rendering, since converters for animated SVG are hardly known (imho e.g. GPAC), and animated SVGs, animated GIFs and movies imho have different scopes.

CSS might be the biggest issue for resvg. However, the bug reports I read on help pages are often related to one of these (in decreasing order of importance): T36947 T217990 T35245 T20463 T276684, and CSS is hardly mentioned (resvg has imho better CSS support than librsvg 2.40, which is currently used on Commons). So the biggest downside is imho still an improvement. I know help pages do not necessarily represent importance: if you ask Commons SVG experts about the biggest current issue on Commons, imho most would say T11420, which agrees with the most common bug in the featured test suite, although it is hardly mentioned in questions on help pages.
Generally, for CSS problems you can easily make a workaround, often by just using https://svgworkaroundbot.toolforge.org/ . I know that CSS can be helpful, but it is imho mostly used by SVG experts, and I personally avoid it, and in e.g. Inkscape you imho cannot add CSS. (I also find it confusing if the XML contains a different value than the CSS, and one of them overrides the other, depending on the priority list.)

@Ponor:

I reran the tests at 512px

| | librsvg | resvg | batik | Inkscape (start per image) | inkscape (run all in the same job) | inkscape (remove two files) |
| time featured-collection (512px) | 4m 28,886s | 1m 15,307s | 10m 8,168s | 5m 9,164s | 2m 27,598s | |
| time resvg-collection (512px) | 6m 13,054s | 2m 35,135s | 63m 36,648s | 38m 5,628s | 17m 8.889s | 2m 22.970s |
| time w3c-collection (512px) | 1m 46,776s | 1m 12,591s | 29m 46,446s | 21m 14,825s | 4m 13.46s | |
| time 2006-MediaWiki-collection (512px) | 23.129s | 9.551s | 186.809s | 87.313s | | |

Differences to you:

  • I start and exit Inkscape (without GUI) for every image (inkscape "$file" -w 512 --export-type="png"), which is very time-consuming for Inkscape. This is imho how it is currently done on WMF servers. @Gilles I'm not sure it is a good idea to keep an Inkscape job open and run all images in the same process (e.g. if it hangs).
  • I measure the CPU time (limited to one CPU), not the real wall-clock time.
  • I excluded only one image, Cone clutch.svg (featured), where Inkscape hangs. So in my case only one featured image fails in Inkscape (and no other); in your case 16 different test-suite images fail.

As discussed with you, you excluded images that fail after a long time. (Which makes sense but causes huge differences!)
Inkscape in the resvg-collection (my times)

  • with restarting for each image: about 38 minutes
  • without restarting: 17 minutes, and
  • with two images removed (as you did): 2 minutes.

So how to limit the maximum time (before success/crash) should depend on the time-out limit; see T200866 as well as https://commons.wikimedia.org/wiki/User_talk:JoKalliauer/SVG_test_suites#time-out-limit
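For the benchmark side, a per-image time-out can be applied uniformly to every renderer. This is a minimal sketch with a hypothetical helper name and not what Thumbor or the test suite actually does; note that subprocess time-outs measure wall-clock time, not CPU time.

```python
import subprocess

def render_with_timeout(argv, timeout_s):
    """Run one renderer command and classify it as 'ok', 'failed' or 'timeout'."""
    try:
        done = subprocess.run(argv, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left hanging.
        return "timeout"
    return "ok" if done.returncode == 0 else "failed"
```

It could be called as, for example, render_with_timeout(["inkscape", "file.svg", "-w", "512", "--export-type=png"], 60), so that a hanging file counts as a time-out instead of being excluded by hand.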

From what you've said, I just expect some CSS surprises.

The main limitation is that CSS3/"CSS4" would not work. And mainly because of processing and not parsing. Something like CSS variables is pretty hard to implement.

SVG 1.1 uses a subset of CSS2. The simple view is WMF only supports SVG 1.1, so CSS3/CSS4 are irrelevant. Even SVG 2.0 subsets CSS. IIRC, SVG 2.1 is toying with ::before and ::after pseudo selectors. Most graphics editors are going to output pedestrian SVG with no or trivial CSS.

Yes, resvg doesn't care about $LANG, mainly because it's surprisingly hard to implement in a crossplatform and safe way. So it simply uses the value user provided.

I wish Gnome understood that.

On the other hand, resvg doesn't have a complete support for language tags. To do so you have to properly parse them and do some complex matching, so for now it simply matches the whole value.

And that is where Gnome went astray. BCP 47 has several types of langtag matching. One can create langtags with * wildcards. If one looks at all that BCP 47 might imply, then one could believe that SVG needs complicated langtag matching and therefore should use somebody's BCP 47 langtag library. But that is not the case. SVG 1.0, 1.1, and 2.0 have all specified the "Basic Filtering" matching method. SVG did not adopt "Extended Filtering".

Adding HTTP Accept-Language with SMIL allowReorder processing takes less than half a page. It does not do complicated matching but rather scores each clause and keeps the best.

With a specific locale string of LANG=es_ES.utf8 (which is a transliteration of es-ES to a locale string), librsvg displays systemLanguage="es" text. It should only display text that is at least es-ES. See T261192#7053643
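To show how small Basic Filtering is, here is a sketch of it as I understand RFC 4647 and the systemLanguage wording: a user preference matches a tag if it is equal to it, or is a prefix of it that is followed by "-", ignoring case. The function names are hypothetical.

```python
def basic_filter_match(language_range: str, tag: str) -> bool:
    """True if `tag` equals the range or extends it with further '-' subtags."""
    r, t = language_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")

def system_language_true(user_prefs, system_language: str) -> bool:
    """Evaluate a systemLanguage attribute (comma-separated tags) against user preferences."""
    tags = [t.strip() for t in system_language.split(",") if t.strip()]
    return any(basic_filter_match(pref, tag) for pref in user_prefs for tag in tags)

print(system_language_true(["es-ES"], "es"))  # False: "es" is not at least es-ES
print(system_language_true(["es"], "es-ES"))  # True: es-ES extends the preference "es"
```

Under this rule a preference of es-ES does not select systemLanguage="es" text, which is the behaviour expected in the librsvg example above.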

Incidentally we're just having a very similar discussion in the Inkscape project, and I believe I can clarify some things here:

Usage of system locale

Our current opinion is that usage of the system locale (for example $LANG variable of the form "de_DE.UTF-8" which holds a POSIX locale) is the most suitable thing to do for many applications:

  • The SVG spec simply states "Evaluates to "true" if one of the language tags indicated by user preferences is a case-insensitive match"
  • It does not say anything about how applications are supposed to enable the user to state their preferences.
  • Implementing something similar to how browsers allow users to set Accept-Language to "arbitrary" values certainly is *one* way to go but is likely to be overkill for most applications.
  • Considering the system locale therefore is the most obvious way to derive the user's preferences.
  • If for example the user prefers "es_ES" locale it's only reasonable to present them with "es-ES,es" (in that order).

Overriding system locale

The observation above ("inkscape also uses the Unix $LANG environment variable") is not wrong but actually only captures a small part of what Inkscape uses. In fact Inkscape considers *all* locale-related environment variables, i.e. LANGUAGE, LC_ALL, LC_MESSAGES and LANG (and possibly even other native locale indicators depending on OS). For the inclined reader: Inkscape internally uses glib's g_get_language_names() for this.

The environment variable most people will want to use to override the system locale is therefore the much more suitable LANGUAGE, which even accepts a list of languages like LANGUAGE=es_ES:en and would only match "es-ES" in that case but not "es".
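A caller could prepare such an environment along these lines. This is only a sketch under the assumption that a plain hyphen-to-underscore transliteration is acceptable; it does not handle script subtags such as sr-Latn, and `env_with_language` is a hypothetical helper.

```python
import os

def env_with_language(langtags, base_env=None):
    """Copy the environment and set LANGUAGE to a colon-separated preference list."""
    env = dict(os.environ if base_env is None else base_env)
    # Naive transliteration of each langtag (es-ES) into locale form (es_ES).
    env["LANGUAGE"] = ":".join(tag.replace("-", "_") for tag in langtags)
    return env

print(env_with_language(["es-ES", "en"], base_env={})["LANGUAGE"])  # es_ES:en
```

The returned mapping can be handed to subprocess.run(..., env=env) when starting the renderer.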

Unfortunately I believe MediaWiki currently does not allow setting LANGUAGE.

In any case we'd be interested if it made sense for potential users (and I count MediaWiki here) if Inkscape offered a command line option to allow specifying the "language preference" explicitly (it could always be added if the environment variables are not sufficient).

allowReorder

First of all, note allowReorder is not part of any version of the SVG specification.

Inkscape currently renders according to SVG 1.1 spec. In this version of the spec the first matching object in the <switch> is rendered, even if it is not the "most preferable" language for the user.

SVG2 changed the spec: A <switch> is now always rendered as if the allowReorder attribute, defined in the SMIL specification, was set to 'yes'. The allowReorder attribute itself is still not part of the spec, though.
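The difference between the two behaviours can be sketched as follows. This is a rough model rather than any renderer's actual code: children are given as hypothetical (label, systemLanguage) pairs, and a child without the attribute is treated as a fallback that ranks after every stated preference.

```python
def _matches(pref, system_language):
    """Basic prefix match of one preference against a comma-separated attribute."""
    tags = [t.strip().lower() for t in system_language.split(",")]
    p = pref.lower()
    return any(t == p or t.startswith(p + "-") for t in tags)

def pick_first_match(children, prefs):
    """SVG 1.1 style: first child in document order whose condition is true."""
    for label, system_language in children:
        if system_language is None or any(_matches(p, system_language) for p in prefs):
            return label
    return None

def pick_best_match(children, prefs):
    """allowReorder style: score each child by the best preference it satisfies."""
    best_label, best_score = None, None
    for label, system_language in children:
        if system_language is None:
            score = len(prefs)  # fallback ranks after all preferences
        else:
            hits = [i for i, p in enumerate(prefs) if _matches(p, system_language)]
            if not hits:
                continue
            score = hits[0]
        if best_score is None or score < best_score:
            best_label, best_score = label, score
    return best_label

children = [("English text", "en"), ("Spanish text", "es"), ("fallback", None)]
prefs = ["es", "en"]
print(pick_first_match(children, prefs))  # English text
print(pick_best_match(children, prefs))   # Spanish text
```

With preferences es,en the first-match rule shows the English child because it comes first in the file, while the reordering rule shows the Spanish one.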

Incidentally we're just having a very similar discussion in the Inkscape project, and I believe I can clarify some things here:

Usage of system locale

Our current opinion is that usage of the system locale (for example $LANG variable of the form "de_DE.UTF-8" which holds a POSIX locale) is the most suitable thing to do for many applications:

  • The SVG spec simply states "Evaluates to "true" if one of the language tags indicated by user preferences is a case-insensitive match"
  • It does not say anything about how applications are supposed to enable the user to state their preferences.
  • Implementing something similar to how browsers allow users to set Accept-Language to "arbitrary" values certainly is *one* way to go but is likely to be overkill for most applications.
  • Considering the system locale therefore is the most obvious way to derive the user's preferences.
  • If for example the user prefers "es_ES" locale it's only reasonable to present them with "es-ES,es" (in that order).

It is not clear what your goal is. Locale may be "the most suitable thing to do for many applications," but that is not what is being discussed here. SVG agents need to be able to set their language preference much like the HTTP Accept-Language header sets preferences. If I want to set the SVG language preference to es-ES, then I do not want any processing such as "If for example the user prefers "es_ES" locale it's only reasonable to present them with "es-ES,es" (in that order)."

It is not a problem if an SVG agent at start up guesses that the user's default locale es_ES implies something like LANGUAGE=es_ES,es. That's a reasonable guess. But the current problem is whether librsvg can be told an explicit preference without garbling or extending it. There is a huge type conflict: librsvg wants locale string types and WMF wants to specify IETF langtag types. (Well, WMF is even confused there because there are also Wiki language tags that are different from IETF langtags.)

Overriding system locale

The observation above ("inkscape also uses the Unix $LANG environment variable") is not wrong but actually only captures a small part of what Inkscape uses. In fact Inkscape considers *all* locale-related environment variables, i.e. LANGUAGE, LC_ALL, LC_MESSAGES and LANG (and possibly even other native locale indicators depending on OS). For the inclined reader: Inkscape internally uses glib's g_get_language_names() for this.

The environment variable most people will want to use to override the system locale is therefore the much more suitable LANGUAGE, which even accepts a list of languages like LANGUAGE=es_ES:en and would only match "es-ES" in that case but not "es".

Unfortunately I believe MediaWiki currently does not allow setting LANGUAGE.

In any case we'd be interested if it made sense for potential users (and I count MediaWiki here) if Inkscape offered a command line option to allow specifying the "language preference" explicitly (it could always be added if the environment variables are not sufficient).

MediaWiki software currently only passes its PHP $lang argument through the LANG environment variable. That was done to make librsvg work. It is trivial to set other environment variables in the PHP rasterize() method. Setting one or more environment variables is not the correct method for resvg and probably also incorrect for batik (both of which take command line arguments).

In the larger sense, setting the locale is the wrong thing to do. Say the program discovers an error and wants to log that error. To me, it should write the error in the system's language (say English) rather than the Chinese that it might be processing at the moment. I expect server logs to be in the local language and be independent of any language a user may have requested.

In Inkscape's case, imagine an English-speaking graphic artist who wants to look at the systemLanguage="zh-Hant" text to see if the spacing is OK. Does he really want his whole user interface turned into Chinese? Or does he just want the graphic to display in Chinese?

The preferred method for WMF is a command line argument that looks exactly like Accept-Language. Initially, WMF would only use one langtag, but its langtags can be nonstandard (e.g., als, sr-EC, sr-EL, zh-Hans, and zh-Hant). Nonstandard langtags probably do not survive a trip through locale string processing.

Interestingly, the librsvg locale code can take IETF langtags and possibly even an Accept-Language string, so it should be easy for librsvg to add a command line argument. Gnome has an issue number for it.

allowReorder

First of all, note allowReorder is not part of any version of the SVG specification.

Inkscape currently renders according to SVG 1.1 spec. In this version of the spec the first matching object in the <switch> is rendered, even if it is not the "most preferable" language for the user.

SVG2 changed the spec: A <switch> is now always rendered as if the allowReorder attribute, defined in the SMIL specification, was set to 'yes'. The allowReorder attribute itself is still not part of the spec, though.

allowReorder processing snuck in during SVG 1.1. IIRC, Firefox had and obeyed the allowReorder attribute, but it did the langtag processing wrong. SVG 2.0 never had the attribute, but early versions made allowReorder processing optional.

OK, sorry for trying to clarify and help. Got the message, will keep out of the discussion going forward again.

@Patrick87

I for one found your insights useful, and would like you to feel encouraged to participate in this and other tickets of your interest.

From the Discussion on Inkscape about getting the Wikipedia-renderer

Martin Owens (the owner of the Inkscape project) raised the question of whether we would like SVG 2.0 support; SVG 2.0 is an unofficial draft. Side note: validators would call SVG 2.0 files invalid in most cases, even if it does not influence rendering. (But validity is imho not something to aim for.)

According to Owens, that's a bit of a political question: if Wikimedia supports SVG 2.0 files, it is more likely that more renderers will support SVG 2.0 and that it won't "end up being just another Inkscape SVG format" (like SVG 1.2, which imho will never be released).

Owens wrote that Inkscape is primarily an editor for SVG documents, and only secondarily an SVG generator for browsers. So it supports Inkscape features which are in neither the SVG 1.1 nor the SVG 2.0 DTD. For example, Inkscape uses <sodipodi:namedview pagecolor="#ffffff" inkscape:pageopacity="1"/> for creating a white background, but only <rect width="100%" height="100%" fill="#ffffff" sodipodi:insensitive="true"/> or <circle r="1e4" fill="#ffffff" sodipodi:insensitive="true"/> are supported by browsers/renderers. That is imho a good feature for SVG editors, but maybe not for SVG renderers.

I see the attitude of Wikimedians as wanting SVG files that are editable by any software, not Inkscape files; that's one reason why we require free licenses even for file formats (e.g. don't allow *.mp4). So I personally see Inkscape as an SVG renderer as problematic, because it supports features that are not defined by any SVG DTD (neither in SVG 1.1 nor in SVG 2.0), even though Inkscape itself is under a free license.

Currently, of about 100 files on Commons that librsvg renders incorrectly, about 2 contain rendering-relevant SVG 2.0, so the importance of SVG 2.0 on Commons is currently negligible (note that this is even among the broken ones, and librsvg 2.40 imho does not support any SVG 2.0). Support for SVG 2.0 is imho not something to aim for, since we should stick to the current SVG 1.1 standard to have a clear, unique rendering (knowing that browsers have some SVG 2.0 support). Support for SVG 2.0 is imho not something bad either, although supporting these features is, strictly speaking, imho wrong (i.e. a bug) according to the current SVG 1.1 standard.

I think Inkscape sounds good for people who edit with Inkscape, but I think it is optimized for creating files, not for rendering. (But it is still a good choice, imho at least compared to librsvg.)

I do not care (that much) whether we change to resvg or Inkscape (or similar); however, sticking to librsvg (even the current version is too buggy) just because we do not know how to decide is imho the wrong way to go.

I do not know if headless browsers would also be a suitable solution.

My recommendation: Maybe mass SVG rendering should be done by resvg (fast, SVG 1.1), with a flag so that specific images (e.g. with unofficial Inkscape features) get rendered by Inkscape. [Iff we want Inkscape files (invalid SVG) that can only be rendered/edited by Inkscape.]

OK, sorry for trying to clarify and help. Got the message, will keep out of the discussion going forward again.

Please don't. I'll echo @Krinkle here. I found your input useful and I learned some things from your comment, which I found very down to earth and clarifying.

It is not clear what your goal is.

Hello, Keep CoC and Phabricator etiquette in mind. Thank you.

Hello, Keep CoC and Phabricator etiquette in mind. Thank you.

In the Wikiverse: new anonymous newbies have the ability to change content without understanding Wikipedia, so many experienced users in the community are used to speaking quite bluntly (not something I like); otherwise the wiki gets crowded with wrong edits. I think that's a reason why many newbies find Wikipedia discussions generally quite harsh, although we have similar rules in :w:en:Wikipedia:Etiquette and :w:en:Wikipedia:No_personal_attacks. So I would not overinterpret the tone of Glrx (a Commons community member).

I find Patrick87's and Glrx's comments useful; I hope you both stay in the discussion.


Coming back to the topic:

What I understood from today's discussion T283083 is that this evaluation is stuck until Thumbor has been upgraded (T216815); before that we/WMF can't do "anything".

As far as I understood @AntiCompositeNumber, the upgrade is planned for this summer, and whether Thumbor can be upgraded this summer or later might depend on the workload of @Gilles and when he is able to work on Thumbor.

Is using an :w:en:AppImage, e.g. as provided by Inkscape, a possible solution? That would make the renderer version independent of the Debian version or any library version. As far as I know, AppImages (single executable file, distribution-independent) are portable and can be run on any Linux system without any prerequisites and without installing (no root permission needed).

So for Inkscape we could use the latest release, the latest development version, any older release or even all of them alongside each other (e.g. to avoid regression bugs), independent of when/if we update Thumbor.

librsvg repo has been disabled and doesn't support node v12+ (https://github.com/2gis/node-rsvg/tree/0.7.0). I see we could switch to puppeteer. E.g. https://github.com/etienne-martin/svg-to-img as a replacement?

(Debian bullseye uses nodejs 12.22.5).

Even the repo it says to use hasn't received an update since 2019...

librsvg repo has been disabled and doesn't support node v12+ (https://github.com/2gis/node-rsvg/tree/0.7.0). I see we could switch to puppeteer. E.g. https://github.com/etienne-martin/svg-to-img as a replacement?

(Debian bullseye uses nodejs 12.22.5).

Even the repo it says to use hasn't received an update since 2019...

We don't use that. Thumbor is written in Python (2, we know), but we shell out to rsvg-convert anyway. Librsvg is written mostly in Rust now, but the version currently in production is still C. Upstream is https://gitlab.gnome.org/GNOME/librsvg, packaged as https://packages.debian.org/stretch/librsvg2-bin.

librsvg repo has been disabled and doesn't support node v12+ (https://github.com/2gis/node-rsvg/tree/0.7.0). I see we could switch to puppeteer. E.g. https://github.com/etienne-martin/svg-to-img as a replacement?

(Debian bullseye uses nodejs 12.22.5).

Even the repo it says to use hasn't received an update since 2019...

We don't use that. Thumbor is written in Python (2, we know), but we shell out to rsvg-convert anyway. Librsvg is written mostly in Rust now, but the version currently in production is still C. Upstream is https://gitlab.gnome.org/GNOME/librsvg, packaged as https://packages.debian.org/stretch/librsvg2-bin.

Mathoid uses it though. I presumed that's what this task was about.

librsvg repo has been disabled

No it has not been. That's an unrelated repo.

Mathoid uses it though. I presumed that's what this task was about.

Indeed. Please file a different task, though, tagged Mathoid and Platform Engineering, and subscribe @Physikerwelt as well. If a dependency is abandoned, a replacement needs to be found/written. Otherwise we'll eventually need to disable that functionality.

Mathoid uses it though. I presumed that's what this task was about.

Indeed. Please file a different task, though, tagged Mathoid and Platform Engineering, and subscribe @Physikerwelt as well. If a dependency is abandoned, a replacement needs to be found/written. Otherwise we'll eventually need to disable that functionality.

T247697: Rethink mathoids SVG to PNG conversion already exists as a subtask of this one, which may have been the cause of the confusion. I'm not sure it really should be, as Mathoid is self-contained (producing both the SVG and PNG output) and doesn't have to use the general-purpose renderer.

Mathoid uses it though. I presumed that's what this task was about.

Indeed. Please file a different task, though, tagged Mathoid and Platform Engineering, and subscribe @Physikerwelt as well. If a dependency is abandoned, a replacement needs to be found/written. Otherwise we'll eventually need to disable that functionality.

T247697: Rethink mathoids SVG to PNG conversion already exists as a subtask of this one, which may have been the cause of the confusion.

I see, my mistake; I missed that. Thanks for pointing it out. I did leave some comments on that task. Let's not hijack this task any further for Mathoid's SVG functionality.

@JoKalliauer

I've basically read through all the discussion here. Is there currently a problem with librsvg not being upgraded to the latest version (Rust)?

If you need a Chrome-like rendering effect, I would suggest trying skr-canvas, which runs in Node.js and uses skia to render SVG.

With skr-canvas + resvg it should solve most of your problems in SVG_test_suites [1], and you might also consider adding skr-canvas to SVG_test_suites for test results.

[1]: https://commons.wikimedia.org/wiki/User:JoKalliauer/SVG_test_suites/resvg_Issues_details

@Yisibl The better is the enemy of the good. Thanks for that comment; however, skr-canvas describes itself as "This project is in pre-release stage. And there may some bugs existed.", and I don't see any benchmarks. resvg had optional Skia backend support in earlier versions, but @RazrFalcon dropped it, so I'm doubtful that it is at a development level comparable to resvg.

Making timing benchmarks is pretty straightforward, but comparing features is tricky, because you often have to read and interpret the SVG 1.1 spec to know who is right and who is wrong, and many cases are not defined. However, you are free to add it to User:JoKalliauer/SVG_test_suites.

As long as WMF developers do not have time to make any progress on this, I don't see any sense in adding another renderer to the benchmark. If WMF needs a benchmark to decide, I will check skr-canvas; otherwise it is imho a useless waste of time.

Improve SVG Rendering is currently in 5th place in the Community Wishlist Survey 2022. In each of the last two years, the top 5 projects were taken on. If you would like to support the project, your vote might be essential. You can vote until 11 February, 18:00 UTC (so vote in advance to avoid time-zone problems).

My opinion is to let the browser do the rendering. We just have to strip the JavaScript within the SVG on upload to prevent XSS attacks.

About the SVG language, all browsers currently support it. https://caniuse.com/?search=systemLanguage

I understand the requirement that the language should switch depending on which wiki the user is on. This could be solved with JavaScript. First we can get the current browser language from navigator.languages, then we search the DOM using JavaScript's querySelector() and make the changes in each <switch> tag.

Just my two cents.

@Arthurfragoso, the only issue is that font rendering can be wildly inconsistent between devices and browsers. It can cause labels to not line up correctly, text to overlap, or even certain characters not to show up. Sure, best practice may be to convert raw text to paths, but there are lots of cases where that isn't practical or desirable (especially if an SVG file needs frequent edits or updates).

My opinion is to let the browser do the rendering. We just have to strip the JavaScript within the SVG on upload to prevent XSS attacks.

No, there are several more things to consider. E.g. embedding an external SVG into the SVG. But I'm not an expert on this.

I listed all the illegal SVG content I found at https://commons.wikimedia.org/wiki/User:JoKalliauer/IllegalSVGPattern; JavaScript is point 5. (So it already gets blocked during upload, even though some files still exist that were uploaded before the filter was introduced.)

About the SVG language, all browsers currently support it. https://caniuse.com/?search=systemLanguage

Yes and no. Yes, all common browsers support simple cases; however, systemLanguage is handled completely differently by different engines.
E.g. a systemLanguage without a switch, or systemLanguage="en_US" alongside systemLanguage="en".
Some things are rendered incorrectly, and others are not even defined in the specification, so there is no unique rendering.

In the end you have to check every SVG to see whether it is supported or not; that's what I documented on User:JoKalliauer/SVG_test_suites/ReSVG-Test-suite for 1000s of files.
E.g. https://commons.wikimedia.org/wiki/File:Test_suite_resvg_a-systemLanguage-006.svg is rendered correctly by Wikimedia but incorrectly by Chrome, so Chrome, for one, does not fully support systemLanguage, and Firefox is no better.

SVG compatibility in browsers is its own task: T134410 Evaluate SVG rendering compatibility in browsers

If an SVG file contains systemLanguage (e.g. File:Unicode_Geschlechtersymbole.svg) and the SVG is to be rendered locally, Wikimedia should imho provide a separate image for each language.

@Arthurfragoso, the only issue is that font rendering can be wildly inconsistent between devices and browsers. It can cause labels to not line up correctly, text to overlap, or even certain characters not to show up. Sure, best practice may be to convert raw text to paths, but there are lots of cases where that isn't practical or desirable (especially if an SVG file needs frequent edits or updates).

That is a very valid point. Most SVGs that are embedded directly have their text converted to paths. However, files on Commons should be editable for derivatives, so they should almost always contain real text. Users often think only about their own device and use copyright-protected fonts like Arial or Times, which are not available on e.g. default Linux systems. Since the image is rendered correctly on their PC, they don't care.
We imho need to ensure a unique rendering, so that it is the uploader's responsibility and not the reader's; otherwise we will end up with many broken files.

@Ahecht: Maybe using µsvg might be a solution. µsvg converts the SVG into the simplest possible SVG (text is converted to paths, CSS and use elements are resolved, ...). Those SVGs can generally be rendered correctly even by a simple SVG renderer. So the uploaded SVG could be complex, but the SVG sent to the client for rendering would be simple.

Another problem with browser rendering is imho that browsers mostly support the unreleased draft of SVG 2.0. Supporting the latest cutting-edge features is something everyone wants, but in 10 years rendering those files correctly might be a pain in the ass. (Okay, SVG 2.0 won't change that much, but you get the point.)

I would like WMF to directly serve SVG files. Today's browsers offer reasonable SVG support. Letting the browser render the image also allows for dynamic interaction.

The font issue is solvable. The brute force solution would have WMF serve webfonts.

The systemLanguage issue can be solved many ways. It can be localized at the server or it can be localized at the client. WMF could even choose to change the semantics of its webpages: the SVG image displays the image in the user's preferred language rather than that wiki's language. It would have little impact on most users. If I'm usually on the de.Wiki and my preferred language is de, then it does not impact my viewing. It's only strange when I visit the zh.Wiki and see German illustrations.
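
A minimal sketch of what localizing at the server could look like, offered only as an illustration. The function names `language_matches` and `resolve_switches` are hypothetical, and the matching follows one reading of the SVG 1.1 prefix rule (a user language matches a listed tag if it equals it, or is a prefix of it followed by "-"); as noted earlier in this thread, real engines do not all agree on that rule.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # keep output free of ns0: prefixes


def language_matches(system_language: str, user_langs: list[str]) -> bool:
    """Return True if any user language equals a listed tag or is a
    prefix of one followed by '-' (assumed SVG 1.1-style rule)."""
    listed = [t.strip().lower() for t in system_language.split(",") if t.strip()]
    for user in (u.lower() for u in user_langs):
        for tag in listed:
            if tag == user or tag.startswith(user + "-"):
                return True
    return False


def resolve_switches(svg_text: str, user_langs: list[str]) -> str:
    """Keep only the first matching child of every <switch> element."""
    root = ET.fromstring(svg_text)
    # Snapshot the switches first, because children are removed below.
    for switch in list(root.iter(f"{{{SVG_NS}}}switch")):
        chosen = None
        for child in list(switch):
            attr = child.get("systemLanguage")
            # A child without systemLanguage acts as the fallback.
            if attr is None or language_matches(attr, user_langs):
                chosen = child
                break
        for child in list(switch):
            if child is not chosen:
                switch.remove(child)
    return ET.tostring(root, encoding="unicode")
```

The result is a single-language SVG that any client would show the same way, at the cost of producing one variant per language.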

I do not know WMF's constraints, but one of the advantages of its servers converting SVG to PNG and then serving PNG is lower average network bandwidth. That is a big deal for JPEG and PNG files, which I think make up the bulk of WMF's images. Instead of serving a full-size 3 MB JPEG, WMF can serve a small 40 kB thumbnail. My gut tells me that WMF needs to do that to be efficient. I also expect that cellphone users appreciate the smaller bandwidth and quicker page loads.

I believe SVG files are a small fraction of the images on Commons, so serving them at full size would not hurt as much as serving full-size JPEGs. I still see merit in thumbnailing SVGs into PNGs. One editor's measurements suggested the average SVG file is 700 kB. With GZip compression, the transferred size might be 200 kB. That's probably still 5 times larger than the expected PNG thumbnail.

Furthermore, thumbnailing protects users not only from XSS but also from malicious rendering. WMF puts a clock on rendering an image into PNG. It must render in a few seconds or it is abandoned. Even if the SVG took several seconds to render on a WMF server, the rendering of the resulting PNG on a browser at 1:1 will be fast (sub second). Somebody can make an SVG file that is computationally expensive. Consider an image that has 8,192 Gaussian-blurred layers that are semi-opaque. If that image is not slow enough, then use a million layers. What happens when one views a wiki page with several such images with her browser?
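
The "clock" idea can be sketched as below. This is not WMF's actual mechanism, only an illustration of abandoning an external render that exceeds a deadline; `run_with_deadline` is a hypothetical helper, and the demo runs a sleeping Python process in place of a real renderer.

```python
import subprocess
import sys


def run_with_deadline(argv: list[str], seconds: float) -> bool:
    """Run an external command; return False if it fails or overruns."""
    try:
        subprocess.run(
            argv,
            check=True,          # non-zero exit counts as failure
            timeout=seconds,     # child is killed when the deadline passes
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return True
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        return False


# Stand-ins for a fast render and a pathologically slow one.
fast = [sys.executable, "-c", "pass"]
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
fast_ok = run_with_deadline(fast, 10.0)
slow_ok = run_with_deadline(slow, 0.5)
```

A browser rendering the SVG directly has no equivalent server-side cutoff, which is the point being made above.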

There are even more sophisticated issues. I can make an SVG file that blinks at 3 Hz. If that file is directly served, then it might trigger an epileptic seizure. I might also generate some SVG that flashes single frame subliminal messages.

I would like WMF to directly serve some SVG files, but there are both technical and security issues that arise.

JoKalliauer raised the priority of this task from Low to High. Jun 10 2022, 9:04 PM

According to https://www.mediawiki.org/wiki/User:JoKalliauer/phab/wikimedia-svg-rendering#table I think this task might be the most important one in Wikimedia-SVG-rendering. Almost all bugs reported in Wikimedia-SVG-rendering depend on the renderer.

Here's what I understand.

Directly serving SVG is not a short-term decision.

There are only two thumbnail renderers in contention: the Rust version of librsvg and the newly-minted resvg. Either should fix a huge backlog of SVG rendering problems, so either renderer will be a big improvement. Both have better CSS support than the current renderer. Developers for both renderers want to support WMF.

Both those renderers use Rust, a language that WMF's current version of Debian does not support.

The task of upgrading Debian will allow Rust-based renderers such as the new librsvg and resvg to be dropped in.

I do not know, but I expect both will consult the operating system for the list of fonts. Nothing special should be needed.

Both renderers should work for hyphenated systemLanguage langtags such as zh-Hant (the current WMF version of librsvg does not).

However, MW currently passes the language-to-render $lang variable in the $LANG environment variable. That is a type violation: $LANG should be a Unix locale string; it is not an IETF langtag. There is not a 1:1 mapping between IETF langtags and Unix locale strings (which are also supposed to be opaque!).

Consequently, MW / Thumbor must change how the $lang variable is passed to the renderer. The argument passing should not use locale environment variables.

As I understand it, both librsvg and resvg now take command line arguments. (Citations)

librsvg 2.52.x will have a new --accept-language parameter, which will allow specifying the user's preferred languages by passing the HTTP Accept-Language header to librsvg: https://gitlab.gnome.org/GNOME/librsvg/-/issues/356 (Not sure if it will get backported to the 2.50.x series)
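
A hedged sketch of passing the langtag on the command line instead of through $LANG. `build_render_argv` is a hypothetical helper, not MW or Thumbor code. Only --accept-language comes from the issue linked above; the other option spellings (-w, -f, -o for rsvg-convert; --languages and -w for resvg) are assumptions from memory and should be checked against each tool's --help output before use.

```python
def build_render_argv(renderer: str, src: str, dst: str,
                      width: int, lang: str = "en") -> list[str]:
    """Build an argument vector that carries the IETF langtag explicitly.

    Returning a list (not a shell string) means no shell quoting and no
    locale environment variables are involved.
    """
    if renderer == "rsvg-convert":
        # Option names other than --accept-language are assumptions.
        return ["rsvg-convert", "--accept-language", lang,
                "-w", str(width), "-f", "png", "-o", dst, src]
    if renderer == "resvg":
        # Option names here are assumptions as well.
        return ["resvg", "--languages", lang, "-w", str(width), src, dst]
    raise ValueError(f"unknown renderer: {renderer}")


argv = build_render_argv("rsvg-convert", "in.svg", "out.png", 300, "zh-Hant")
```

The list can be handed to `subprocess.run(argv)` unchanged, and swapping the renderer is a one-word change in the caller.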

guessing this is the Rust source for command line arguments...

For MW, that means a rewrite of rasterize().

That is, starting at line 343, the external command must also process a $lang pattern substitution. The code starting at line 355 should be deleted: $lang is not a Unix locale string.

The $lang pattern substitution could be involved if $lang is false, but I think that could be simplified. The current MW semantics render SVG files in English if a language is not specified. Therefore, if $lang is not specified, then set it to en. That should give the same semantics as before. In the past, if $lang was not specified, then the operating system's $LANG environment variable would come into play, and that variable would specify en.

Consequently upload URLs with .../300px... would be rendered in en English, and URLs with .../langzh-hant-300px... would be rendered in zh-hant Chinese.
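
The defaulting rule can be illustrated with a small parser for the thumbnail name prefix. This is a sketch under assumptions: `parse_thumb_name` and the regular expression are made up for illustration and only cover the two URL shapes mentioned above, not every form MW accepts.

```python
import re

# Optional "lang<tag>-" prefix followed by "<width>px-".
THUMB_PREFIX = re.compile(
    r"^(?:lang(?P<lang>[A-Za-z0-9-]+?)-)?(?P<width>\d+)px-"
)


def parse_thumb_name(name: str, default_lang: str = "en") -> tuple[str, int]:
    """Return (langtag, width); fall back to default_lang when absent."""
    m = THUMB_PREFIX.match(name)
    if m is None:
        raise ValueError(f"not a recognised thumbnail name: {name!r}")
    lang = (m.group("lang") or default_lang).lower()
    return lang, int(m.group("width"))


with_lang = parse_thumb_name("langzh-hant-300px-Example.svg.png")
without_lang = parse_thumb_name("300px-Example.svg.png")
```

With this, a missing language always resolves to en before the renderer is invoked, so the operating system's $LANG never has to be consulted.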

For Thumbor, engine/svg/svg.py needs similar changes.

See also T261192.

T308395 reveals a problem when the SVG filename contains strings such as $output.
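
One way such a problem can arise is sequential placeholder substitution, where a value inserted by an earlier replacement is rewritten by a later one. The sketch below only demonstrates that failure mode and a single-pass alternative; it is not MW's code, the command template and paths are invented, and it has not been checked against the actual cause described in T308395.

```python
import re


def substitute_sequential(template: str, values: dict[str, str]) -> str:
    """Replace placeholders one after another (the fragile approach)."""
    out = template
    for key, val in values.items():
        out = out.replace(key, val)
    return out


def substitute_single_pass(template: str, values: dict[str, str]) -> str:
    """Replace all placeholders in one scan; inserted text is never rescanned."""
    keys = sorted(values, key=len, reverse=True)  # longest placeholder first
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: values[m.group(0)], template)


template = "renderer -o $output $input"
values = {"$input": "/tmp/cost_$output.svg", "$output": "/tmp/thumb.png"}

broken = substitute_sequential(template, values)
safe = substitute_single_pass(template, values)
```

Building an argument list directly, rather than filling in a command string, avoids the substitution step altogether.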

The code should be written so either librsvg or resvg can be used.

We should choose one of the renderers. JoKalliauer should have good insight into that choice.

Thanks for that roundup, @Girx! There's a lot that has happened in the SVG world since I dared to call myself an expert on the subject.

Who is the expert on the SVG sanitation code in MediaWiki core these days, and when was the last time they made a serious change to that part of the code? My hunch/fear is that the answer is "no one has seriously looked at that code in a while", and I'm guessing that's the biggest bottleneck.

Your hunch/fear would be correct.

I would like to note that this can all easily be implemented for non-WMF wikis, if someone just spent some time adapting SVGHandler (or created an extension to override SVGHandler).

It just CANNOT easily go to WMF production any time soon because of security reviews, Thumbor plugins that would have to be written, and the fact that the Thumbor install itself is stuck on old systems that require updating everything, for which there are currently no WMF budgets.

I am… getting impatient enough to ask: how hard is it to, really, just make our own statically-compiled rsvg-convert binary into a deb package and then deploy it? I mean:

  • Rust already builds binaries with rust stuff statically linked in.
  • rustup is available for getting us an installation of rust without going through debian, and without interfering with anything stored in a prefix. Only thing that could stop rustup is the glibc version, but even then we could just build it on a newer distro and do static-crt.
  • System C deps for librsvg feel… reasonably conservative? I am not ruling out the possibility that it’s too new though.
  • deb packages are easily assembled from a DESTDIR structure with dpkg-buildpackage.

We can get this as a stop-gap measure *while* we talk about what else to switch to. The surface for any security review would be minimal compared to anything that requires adding a layer of adaptation to the PHP side (hopefully we just do the language code change).

I am… getting impatient enough to ask: how hard is it to, really, just make our own statically-compiled rsvg-convert binary into a deb package and then deploy it?

Discussion on upgrading librsvg should go to T265549. It's too hard to follow this task when multiple proposals are discussed in parallel.

In a recent discussion in WikiProject Mathematics yet another rendering bug was encountered. Several users expressed the sentiment that SVG support in MediaWiki will never get fixed, and that it is better to give up on SVGs altogether and revert to PNGs. This is especially frustrating because browsers can render the SVG correctly, but MediaWiki insists on passing it through librsvg and serving the resulting garbage instead.

Mathoid stopped serving PNG images quite a while ago, and there were no complaints, even though the SVG images from Mathoid are quite special. Thus, we have one more data point that browsers' SVG support is quite good.