
Han characters in SVG files misplaced and clustered
Open, Stalled, Low, Public

Description

Author: winstonyin

Several fonts installed on Wikimedia servers, including some default fonts, do not support vertical writing for Han scripts.

Centre-aligned Han text is misplaced, and tb-rl vertical Han text is completely clustered into one blob. The former problem does not occur with Latin-script text. SVG files generated with Adobe Illustrator (AI) or Inkscape have this issue.

Examples:
Problematic Chinese version:

English version, no problem (note the vertical text is rotated, not tb-rl):
*https://commons.wikimedia.org/wiki/File:History_of_the_Universe.svg
Latin characters, no problem:
*https://commons.wikimedia.org/wiki/File:SFR_Yugoslavia_autoput_sr.svg

Partly addressed by https://bugzilla.gnome.org/show_bug.cgi?id=664533

https://gitlab.gnome.org/GNOME/librsvg/issues/364 "Top-to-bottom text is rendered incorrectly"


Version: wmf-deployment
Severity: normal

Reference: bz63236

Event Timeline

bzimport raised the priority of this task to Low (Nov 22 2014, 2:55 AM).
bzimport set Reference to bz63236.
bzimport added a subscriber: Unknown Object (MLST).

Thanks for taking the time to report this! This might need fixing in librsvg, which is used for rendering here but is not maintained by Wikimedia developers.

@Shizhao, per your action, does this also happen on zhwiki?

If yes, a screenshot would be very helpful.

@Shizhao: I don't see anything specific about Wikimedia Commons in this task. Hence removing the Commons tag. If there is, please elaborate.

The following image is also problematic on Commons:
https://commons.wikimedia.org/wiki/File:2014%E5%A4%A7%E8%BF%9E%E7%BB%B4%E5%9F%BA%E4%BA%BA%E6%98%A5%E5%AD%A3%E8%80%83%E5%AF%9F%E5%9B%A2%E6%B5%B7%E6%8A%A5%E7%AB%96%E7%89%88.svg
I also noticed that the GIMP repository recently closed a bug about a similar problem:
https://gitlab.gnome.org/GNOME/gimp/issues/2518
In that bug, a user reported that the WenQuanYi Zen Hei font makes Chinese characters overlap into one piece when the text is converted to a vertical layout; the developers pointed out that this font has broken vertical metrics and closed the bug. If we are also using this font, I suggest replacing it with a more appropriate font such as Source Han Sans.
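The "broken vertical metrics" diagnosis can be roughly checked from Python. This is a minimal sketch, assuming the third-party fontTools package is installed: it only reports whether the vertical header and metrics tables (vhea, vmtx) and a vertical-forms GSUB feature (vert or vrt2) are present, so it can flag a font that lacks them but cannot prove that the values inside them are correct. The helper names are hypothetical, and the font path in the usage note is an assumption about where Debian installs WenQuanYi Zen Hei.

```python
from dataclasses import dataclass


@dataclass
class VerticalSupport:
    has_vhea: bool          # 'vhea' table: vertical header
    has_vmtx: bool          # 'vmtx' table: per-glyph vertical advances
    has_vert_feature: bool  # GSUB 'vert'/'vrt2': alternate glyphs for vertical text

    @property
    def looks_usable(self) -> bool:
        # Presence check only: a font can ship these tables with bad values.
        return self.has_vhea and self.has_vmtx


def classify(table_tags, gsub_feature_tags) -> VerticalSupport:
    """Pure helper so the logic can be tested without a font file."""
    tables = set(table_tags)
    features = set(gsub_feature_tags)
    return VerticalSupport(
        has_vhea="vhea" in tables,
        has_vmtx="vmtx" in tables,
        has_vert_feature=bool(features & {"vert", "vrt2"}),
    )


def inspect_font(path: str, font_number: int = 0) -> VerticalSupport:
    """Read one face of a .ttf/.otf/.ttc file with fontTools and classify it."""
    from fontTools.ttLib import TTFont  # third-party: pip install fonttools

    font = TTFont(path, fontNumber=font_number)  # fontNumber picks a face in a .ttc
    try:
        tags = list(font.keys())
        features = []
        if "GSUB" in font:
            feature_list = font["GSUB"].table.FeatureList
            if feature_list is not None:
                features = [rec.FeatureTag for rec in feature_list.FeatureRecord]
    finally:
        font.close()
    return classify(tags, features)
```

Usage (path assumed): `print(inspect_font("/usr/share/fonts/truetype/wqy/wqy-zenhei.ttc"))`. Comparing the result with the same check on a Noto Sans CJK file gives a quick first signal, but a rendering test remains the real proof.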

For non-Chinese speakers, any chance to explain / show how the image should look in order to not be problematic?

For the record, the SVG file at https://commons.wikimedia.org/wiki/File:2014大连维基人春季考察团海报竖版.svg defines font-family="'FZYXJW--GB1-0'".
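Checking which fonts a file requests can be automated. This minimal standard-library sketch (the function name is hypothetical) counts every font-family value, whether given as a presentation attribute or inside an inline style. It does not look at `<style>` stylesheets or CSS classes, so treat its output as a first pass rather than a complete list.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Union

_STYLE_FF = re.compile(r"font-family\s*:\s*([^;]+)")


def font_families(svg_source: Union[str, bytes]) -> Counter:
    """Count the font-family values used in an SVG document.

    Looks at the font-family presentation attribute and at inline
    style="..." declarations (the inline style wins when both are present).
    """
    root = ET.fromstring(svg_source)
    found = Counter()
    for el in root.iter():
        m = _STYLE_FF.search(el.get("style") or "")
        value = m.group(1).strip() if m else el.get("font-family")
        if value:
            found[value] += 1
    return found
```

Usage: `with open("file.svg", "rb") as f: print(font_families(f.read()).most_common())`. Reading bytes lets ElementTree honour the file's own encoding declaration.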

OK, this image shows how it should look:
https://commons.wikimedia.org/wiki/File:2014%E5%A4%A7%E8%BF%9E%E7%BB%B4%E5%9F%BA%E4%BA%BA%E6%98%A5%E5%AD%A3%E8%80%83%E5%AF%9F%E5%9B%A2%E6%B5%B7%E6%8A%A5%E7%AB%96.png
The font-family property refers to a non-free font made by Founder Technology (the "FZ" prefix stands for 方正, Fangzheng), which should not be referenced.

Ah, thanks! I did not compare well, because this is how it looks when I open that image in inkscape-0.92.3-5 using librsvg2-2.44.8-1:

Screenshot from 2018-12-05 14-49-59.png (692×524 px, 26 KB)

In Inkscape 0.92.3 (2405546, 2018-03-11) on Windows 10 it looks like:

2014大连维基人春季考察团海报竖版.png (192 KB)

with the command:
inkscape 2014大连维基人春季考察团海报竖.svg --export-png=2014大连维基人春季考察团海报竖.png

@Aklapper : Does Inkscape use librsvg?
@RazrFalcon said Inkscape has its own rendering backend.
https://phabricator.wikimedia.org/T40010#4443804

@Great_Brightstar:
I used the current SVG (02:11, 21 March 2014, identical to the first version):
https://upload.wikimedia.org/wikipedia/commons/archive/5/5a/20140321020128%212014%E5%A4%A7%E8%BF%9E%E7%BB%B4%E5%9F%BA%E4%BA%BA%E6%98%A5%E5%AD%A3%E8%80%83%E5%AF%9F%E5%9B%A2%E6%B5%B7%E6%8A%A5%E7%AB%96%E7%89%88.svg

The SVG uses: font-family="'FZYXJW--GB1-0'"
I don't know which fallback font Inkscape used, and I don't know how to find out (except by trying and comparing results).

Uh! librsvg is not listed as a dependency of inkscape. I never realized that! Thanks a lot!
Alright, so with librsvg2-2.44.8-1 (that is a newer version which is NOT yet on Wikimedia servers, see https://phabricator.wikimedia.org/T193352 ):

Screenshot from 2018-12-05 19-06-20.png (957×600 px, 94 KB)

The simplest way to find out is to open %windir%/Fonts directly, which shows a preview of each font; from there you can pick out which fonts could have been used.

The vertical text layout still does not look very good; it would be nice if librsvg offered some options for configuring fonts.

JoKalliauer changed the task status from Open to Stalled (May 16 2021, 12:59 PM).
JoKalliauer added a project: Upstream.

For https://commons.wikimedia.org/wiki/File:History_of_the_Universe-zh-hant.svg , the SVG file itself is incorrect.
See comment in https://gitlab.gnome.org/GNOME/librsvg/issues/364 :

I'm afraid this file is incorrect; it has writing-mode inside <tspan>, but per https://gitlab.gnome.org/GNOME/librsvg/-/issues/565 that property should only be applied to <text>.
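Since librsvg only honours writing-mode on `<text>`, a file like this one could be repaired by copying the value from its `<tspan>` children up to the enclosing `<text>`. This is a minimal standard-library sketch with hypothetical helper names, assuming the file uses the default SVG namespace. It ignores values inherited from an ancestor `<g>`, only acts when all spans in a `<text>` agree, and leaves the tspan attributes in place.

```python
import re
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
SVG = "{" + SVG_NS + "}"
_STYLE_WM = re.compile(r"(?:^|;)\s*writing-mode\s*:\s*([^;]+)")


def writing_mode(el):
    """writing-mode set directly on el (inline style first, then attribute)."""
    m = _STYLE_WM.search(el.get("style") or "")
    return m.group(1).strip() if m else el.get("writing-mode")


def hoist_writing_mode(root):
    """Copy writing-mode from <tspan> descendants onto their <text>.

    Returns a list of (text id or None, mode) for every <text> changed.
    Values inherited from an ancestor <g> are not taken into account.
    """
    changed = []
    for text in root.iter(SVG + "text"):
        if writing_mode(text):
            continue  # already set on <text>, where it belongs
        modes = {writing_mode(span) for span in text.iter(SVG + "tspan")}
        modes.discard(None)
        if len(modes) == 1:  # only act when all spans agree
            mode = modes.pop()
            text.set("writing-mode", mode)
            changed.append((text.get("id"), mode))
    return changed


def fix_file(src, dst):
    ET.register_namespace("", SVG_NS)  # keep the default namespace unprefixed
    tree = ET.parse(src)
    changed = hoist_writing_mode(tree.getroot())
    tree.write(dst, encoding="utf-8", xml_declaration=True)
    return changed
```

ElementTree rewrites unregistered namespace prefixes (for example Inkscape's) as ns0, ns1 and so on, so for a file that will be re-uploaded, a manual edit in a text editor may still be preferable.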

For https://commons.wikimedia.org/wiki/File:2014大连维基人春季考察团海报竖版.svg the writing mode is part of <text> as it should be.
The SVG is rendered correctly when using, e.g., librsvg2 2.54.5, a version much newer than the one on Wikimedia servers.

Another possible solution is to install Source Han Sans on Wikimedia servers as a replacement for WenQuanYi Zen Hei, which is somewhat buggy for vertical text layout (I can reproduce this with Inkscape and LibreOffice Writer). (T336684)

The file https://commons.wikimedia.org/wiki/File:History_of_the_Universe-zh-hant.svg has font-family="SimHei".

writing-mode="tb-rl" is only set in text and g elements.

The file https://commons.wikimedia.org/wiki/File:History_of_the_Universe-zh-hant_(2).svg has font-family="AR PL UKai TW".

T335361 indicates that WMF is using two renderer versions: server: Thumbor/6.3.2 (librsvg 2.40) and server: Thumbor/7.3.2 (librsvg 2.44?).

Attempted a rendering of zh-hant.svg: Thumbor/7.3.2 displays it incorrectly (the vertical Chinese characters are drawn over each other).

Attempted renderings of zh-hant (2): Thumbor/6.3.2 displays it correctly.

Changed zh-hant.svg to use font-family="AR PL UKai TW, SimHei". It now displays correctly.

So the Commons default font is bad.

The file https://commons.wikimedia.org/wiki/File:2014%E5%A4%A7%E8%BF%9E%E7%BB%B4%E5%9F%BA%E4%BA%BA%E6%98%A5%E5%AD%A3%E8%80%83%E5%AF%9F%E5%9B%A2%E6%B5%B7%E6%8A%A5%E7%AB%96%E7%89%88.svg uses font-family="'FZYXJW--GB1-0'". Also has glyph-orientation-vertical="0" writing-mode="tb". (Strangely, the text elements are also stroked; that is an unusual practice and makes the characters look odd.)

If that font is not on Commons, then the default font on Commons does not support vertical Chinese writing.

Changed the file to use AR PL UKai TW; the file then displays correctly.

The file https://commons.wikimedia.org/w/index.php?lang=qct&title=File%3AHistory_of_the_Universe_%28multilingual%29.svg uses the default font and displays the langtags qcs and qct correctly. That suggests the default font displays Chinese correctly.

UPDATE: the file has font-family="DejaVu Sans, Liberation Sans, Arial, sans-serif", so it may not use the default font. DejaVu Sans and Liberation Sans are in fc-list.

UPDATE: the vertical Chinese text tests for a requiredFeature. If that feature is not present, then it inserts letter-spacing="1.2em". The file is hacked and therefore not a good test.
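The hack works because an SVG 1.1 `<switch>` displays only the first direct child whose conditional attributes evaluate to true. The sketch below models just the requiredFeatures part of that rule; the feature string in the demo is hypothetical (not necessarily the one this file tests), and which features a given renderer claims to support is an assumption you would have to supply. requiredFeatures was dropped in SVG 2, so different renderers may pick different branches, which is another reason the file is not a reliable test.

```python
import xml.etree.ElementTree as ET


def passes_required_features(el, supported) -> bool:
    """SVG 1.1 rule for requiredFeatures on one element."""
    value = el.get("requiredFeatures")
    if value is None:
        return True   # absent attribute evaluates to true
    features = value.split()
    if not features:
        return False  # empty value evaluates to false
    return all(f in supported for f in features)


def chosen_child(switch, supported):
    """First direct child of <switch> a renderer would display, or None.

    Only requiredFeatures is modelled; systemLanguage and
    requiredExtensions are ignored here.
    """
    for child in switch:
        if passes_required_features(child, supported):
            return child
    return None


if __name__ == "__main__":
    # Hypothetical feature string for illustration.
    FEATURE = "http://www.w3.org/TR/SVG11/feature#Text"
    sw = ET.fromstring(
        '<switch xmlns="http://www.w3.org/2000/svg">'
        f'<text id="vertical" requiredFeatures="{FEATURE}"/>'
        '<text id="spaced" letter-spacing="1.2em"/>'
        "</switch>")
    print(chosen_child(sw, {FEATURE}).get("id"))  # vertical
    print(chosen_child(sw, set()).get("id"))      # spaced
```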

SimHei and FZYXJW--GB1-0 are not in the font list copied at T280718 or the current https://noc.wikimedia.org/conf/fc-list

So that suggests the default font displays incorrectly?

Yes, I believe so. As Noto CJK fonts are already available on our server, it's reasonable to update the list.

I do not see NotoSansCJK in the fc-list.

We should test whether WenQuanYi Zen Hei is the default Chinese font....
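One way to test this on a renderer host is to ask fontconfig which family it would pick for a Chinese-language request of each generic family. This is a minimal Python sketch wrapping the fc-match command-line tool; the `family:lang=xx` pattern syntax and the `--format` option are standard fontconfig, while the wrapper functions are hypothetical helpers. librsvg resolves fonts through Pango, which may add its own fallback, so this shows fontconfig's first choice rather than exactly what gets drawn.

```python
import subprocess


def fc_match_command(lang: str, generic: str = "sans-serif"):
    # "%{family}" prints the matched family names, comma-separated.
    return ["fc-match", "--format=%{family}\n", f"{generic}:lang={lang}"]


def parse_families(output: str):
    """Split the first line of fc-match output into family names."""
    stripped = output.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    return [name.strip() for name in first_line.split(",") if name.strip()]


def default_font(lang: str, generic: str = "sans-serif"):
    result = subprocess.run(fc_match_command(lang, generic),
                            capture_output=True, text=True, check=True)
    return parse_families(result.stdout)


if __name__ == "__main__":
    for generic in ("sans-serif", "serif", "monospace"):
        for lang in ("zh-cn", "zh-tw"):
            print(generic, lang, default_font(lang, generic))
```

If WenQuanYi Zen Hei comes back for sans-serif and serif but not for monospace, that would line up with the test-file results reported later in this thread.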

Really? I uploaded some SVG images to https://test.wikipedia.org/wiki/File:Test_ground.svg with the font face set to Noto Sans CJK or Noto Serif CJK, and they really do work.

I can also reproduce it with https://commons.wikimedia.org/wiki/File:%E5%85%83%E7%B4%A0%E8%B1%90%E5%BA%A6.svg

Many Noto fonts are in the fc-list, but NotoSansCJK is not. Even "CJK" does not appear in the fc-list.

I just tried several fonts at https://commons.wikimedia.org/wiki/File:SVG_Test_Vertical_Chinese_Fonts.svg

Generic sans-serif and serif fail; monospace works.

There are Noto Sans CJK fonts (so the list is out of date).

T280718: Re-evaluate whether keeping around https://noc.wikimedia.org/conf/fc-list is a good practice

List is not current?

@Glrx: According to T65236#8458679 the SVG file has been fixed. Could you provide an image that currently shows this issue and edit the task description? (Otherwise I find it unclear and confusing.)

@Dzahn: Could you provide the fc-list as you already did in T280718#7025405 ?

@JoKalliauer Sure, I can paste the current output, but that's about all I have. I don't know why it is so much smaller nowadays:

[mw2300:~] $ fc-list :fontformat=TrueType
/usr/share/fonts/truetype/ttf-bitstream-vera/VeraMoBI.ttf: Bitstream Vera Sans Mono:style=Bold Oblique
/usr/share/fonts/truetype/ttf-bitstream-vera/VeraBI.ttf: Bitstream Vera Sans:style=Bold Oblique
/usr/share/fonts/truetype/ttf-bitstream-vera/VeraBd.ttf: Bitstream Vera Sans:style=Bold
/usr/share/fonts/truetype/ttf-bitstream-vera/VeraSe.ttf: Bitstream Vera Serif:style=Roman
/usr/share/fonts/truetype/ttf-bitstream-vera/VeraMoBd.ttf: Bitstream Vera Sans Mono:style=Bold
/usr/share/fonts/truetype/ttf-bitstream-vera/VeraSeBd.ttf: Bitstream Vera Serif:style=Bold
/usr/share/fonts/truetype/ttf-bitstream-vera/Vera.ttf: Bitstream Vera Sans:style=Roman
/usr/share/fonts/truetype/ttf-bitstream-vera/VeraMono.ttf: Bitstream Vera Sans Mono:style=Roman
/usr/share/fonts/truetype/ttf-bitstream-vera/VeraIt.ttf: Bitstream Vera Sans:style=Oblique
/usr/share/fonts/truetype/ttf-bitstream-vera/VeraMoIt.ttf: Bitstream Vera Sans Mono:style=Oblique

If you need more, kindly contact this workboard (put the ticket on that tag and/or contact the members there): https://phabricator.wikimedia.org/project/view/3775/
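A listing like the one above can be checked for CJK fonts mechanically. This minimal Python sketch (hypothetical helper names) parses fc-list's default "path: families:style=..." output. Note that the `:fontformat=TrueType` filter used above would by itself leave out CFF-based OpenType fonts, which typically include the Noto Sans CJK .ttc files, so unfiltered `fc-list` output is the better input. The CJK line in the test data is illustrative, not taken from a WMF host.

```python
def parse_fc_list(text: str) -> dict:
    """Map each font file to all family names listed for it.

    Expected line shape: '/path/to/file: Family A,Family B:style=Regular'.
    A .ttc collection appears once per face, so names are accumulated.
    """
    fonts = {}
    for line in text.splitlines():
        if ": " not in line:
            continue  # skips lines without the 'path: ' separator
        path, rest = line.split(": ", 1)
        family_part = rest.split(":", 1)[0]  # drop ':style=...'
        names = [f.strip() for f in family_part.split(",") if f.strip()]
        fonts.setdefault(path.strip(), []).extend(names)
    return fonts


def cjk_families(fonts: dict, needle: str = "CJK") -> set:
    """Family names containing `needle` (case-insensitive)."""
    return {fam for fams in fonts.values() for fam in fams
            if needle.lower() in fam.lower()}
```

Usage: feed it `subprocess.run(["fc-list"], capture_output=True, text=True).stdout` on the Thumbor host (not the MediaWiki appserver) and check whether `cjk_families(...)` is empty.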

Thumbor uses the following font packages:
https://phabricator.wikimedia.org/diffusion/THMBREXT/browse/master/.pipeline/blubber.yaml$31

The fc-list script might be looking in the wrong place.

For a test file, use

That file shows that most generic fonts DO NOT WORK correctly. From the comments and tests above, the bug is due to the default Chinese fonts lacking support for vertical mode writing. If a font that has that support (such as Noto Sans CJK SC or AR PL UKai TW) is used, then the vertical text is rendered correctly.

The fc-list is not the whole story. It only tries to report the available fonts. It does not report other font configuration details.

The right goal is to set default fonts that support vertical writing. WenQuanYi Zen Hei may be the current default font; it should not be used. A simple approach would be to use the Noto Sans CJK fonts as the default Chinese fonts. Debian may configure several vertically challenged Chinese fonts as defaults.

In addition, the font substitution list should be checked (if it is not the same thing). It is responsible for mapping commercial font names such as Arial to reasonable facsimiles. For example, a font such as "SimHei" should not map to the faulty "WenQuanYi Zen Hei" font.
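As a sketch of what such configuration could look like, the Python below generates a fontconfig snippet that (a) prepends a Noto CJK font to the generic sans-serif or serif family for Chinese-language requests and (b) aliases SimHei to Noto Sans CJK SC. The `<match>`/`<test>`/`<edit>` and `<alias>`/`<prefer>` elements are standard fontconfig syntax; the family choices, language codes, and the generator itself are assumptions for illustration, not WMF's configuration. Where such a file would live (for example /etc/fonts/local.conf, or inside the Thumbor image built from the blubber.yaml linked above) is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical choices for illustration only; not WMF's actual configuration.
PREFERRED = [
    ("zh-cn", "sans-serif", "Noto Sans CJK SC"),
    ("zh-tw", "sans-serif", "Noto Sans CJK TC"),
    ("zh-cn", "serif", "Noto Serif CJK SC"),
]
SUBSTITUTES = {"SimHei": "Noto Sans CJK SC"}


def build_fontconfig(preferred=PREFERRED, substitutes=SUBSTITUTES) -> str:
    root = ET.Element("fontconfig")
    for lang, generic, family in preferred:
        # When a request for this language asks for the generic family,
        # insert the vertical-capable CJK font just before it.
        match = ET.SubElement(root, "match", target="pattern")
        t_lang = ET.SubElement(match, "test", name="lang", compare="contains")
        ET.SubElement(t_lang, "string").text = lang
        t_fam = ET.SubElement(match, "test", name="family")
        ET.SubElement(t_fam, "string").text = generic
        edit = ET.SubElement(match, "edit", name="family", mode="prepend")
        ET.SubElement(edit, "string").text = family
    for commercial, family in substitutes.items():
        # Map a commercial font name to a free font that handles vertical text.
        alias = ET.SubElement(root, "alias")
        ET.SubElement(alias, "family").text = commercial
        prefer = ET.SubElement(alias, "prefer")
        ET.SubElement(prefer, "family").text = family
    ET.indent(root)  # Python 3.9+
    return ('<?xml version="1.0"?>\n'
            '<!DOCTYPE fontconfig SYSTEM "fonts.dtd">\n'
            + ET.tostring(root, encoding="unicode") + "\n")


if __name__ == "__main__":
    print(build_fontconfig())
```

After installing a snippet like this, running `fc-match "sans-serif:lang=zh-cn"` again is a quick way to confirm the change took effect before re-rendering a test file.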

What does /etc/fonts/fonts.conf look like for the WMF distribution?

For a workaround, edit individual files showing an issue to use a font that works (such as one of the Noto Sans CJK XX fonts).
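The per-file workaround can be scripted. This minimal standard-library sketch follows the same idea as the "AR PL UKai TW, SimHei" edit tested earlier: it puts a working font in front of each existing font-family on text-related elements. Noto Sans CJK SC is only a default chosen for illustration, the function name is hypothetical, and elements that set no font-family at all are left alone.

```python
import re
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
TEXT_TAGS = {f"{{{SVG_NS}}}{t}" for t in ("text", "tspan", "textPath", "g")}
_STYLE_FF = re.compile(r"(font-family\s*:\s*)([^;]+)")


def prepend_font(root, font="Noto Sans CJK SC"):
    """Put `font` first in every font-family on text-related elements.

    Returns the number of attributes changed. Existing families are kept
    as fallbacks; elements that already mention `font` are skipped.
    """
    quoted = f"'{font}'"
    count = 0
    for el in root.iter():
        if el.tag not in TEXT_TAGS:
            continue
        attr = el.get("font-family")
        if attr and font not in attr:
            el.set("font-family", f"{quoted}, {attr}")
            count += 1
        style = el.get("style")
        if style and "font-family" in style and font not in style:
            el.set("style", _STYLE_FF.sub(
                lambda m: f"{m.group(1)}{quoted}, {m.group(2)}", style, count=1))
            count += 1
    return count
```

To apply it to a file, parse with `ET.parse`, call `ET.register_namespace("", SVG_NS)` before `tree.write(...)`, and check the result visually; for a single file, a find-and-replace in a text editor does the same job.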


Winston_Sung renamed this task from "Chinese text in SVG files misplaced and clustered" to "Han characters in SVG files misplaced and clustered" (Jun 10 2023, 1:30 AM).
Winston_Sung updated the task description.