
Many non-latin fonts don't cover the latin character set
Closed, Declined (Public)

Description

The XeLaTeX backend for the new OCG PDF renderer does not fall back to another font when the font selected for a given language lacks a given codepoint. Many candidate fonts for a given language do not include the Latin code pages. As a result, page numbers, dates, citation numbers, and even list bullets render as tofu (blank square boxes).

The LaTeX renderer should keep track of which code pages are present in a font and add explicit font-switch commands to the output when needed. (This includes redefining the command used for list bullets so that they are rendered in a Latin font.)
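To make the font-switch idea concrete, here is a minimal Python sketch, not the renderer's actual code. It assumes the set of codepoints covered by the language font is already known (`DEVANAGARI_ONLY` is a made-up example), that the text has already been escaped for TeX, and that a hypothetical `\latinfont` command has been defined in the preamble (for instance with fontspec's `\newfontfamily`).

```python
from itertools import groupby

# Hypothetical coverage for illustration only: a font that maps the
# Devanagari block (U+0900..U+097F) and the space character, nothing else.
DEVANAGARI_ONLY = set(range(0x0900, 0x0980)) | {0x20}


def wrap_uncovered(text, covered, switch=r"\latinfont"):
    """Wrap each run of characters missing from `covered` in a font-switch group.

    `switch` is an assumed command name; the real renderer may use another.
    `text` is assumed to be TeX-escaped already.
    """
    out = []
    # groupby splits the text into maximal runs that are all covered
    # or all uncovered by the current font.
    for is_covered, run in groupby(text, key=lambda ch: ord(ch) in covered):
        chunk = "".join(run)
        out.append(chunk if is_covered else "{%s %s}" % (switch, chunk))
    return "".join(out)


if __name__ == "__main__":
    # A page label mixing Devanagari text with Latin digits.
    print(wrap_uncovered("\u092a\u0943\u0937\u094d\u0920 12", DEVANAGARI_ONLY))
```

Grouping by run rather than by character keeps the number of font switches small, which matters for long stretches of digits or Latin text inside a non-Latin page.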

Ideally we could automatically generate coverage tables from a font. But at *least* we should treat the Latin codepoints as a special case, and fall back to the default Latin font when Latin codepoints appear in text set in a font without Latin code page coverage.
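A coverage table could be as simple as a list of codepoint ranges. The sketch below is only an illustration: it assumes the set of mapped codepoints has already been read from the font's character map (the `cmap` table) by some font library, which is not shown, and both function names are made up here.

```python
def coverage_ranges(codepoints):
    """Collapse a set of codepoints into sorted, inclusive (start, end) ranges."""
    ranges = []
    for cp in sorted(codepoints):
        if ranges and cp == ranges[-1][1] + 1:
            # Extend the current range by one codepoint.
            ranges[-1] = (ranges[-1][0], cp)
        else:
            # Gap found: start a new range.
            ranges.append((cp, cp))
    return ranges


def covers_basic_latin(codepoints):
    """Return True if every printable ASCII codepoint (U+0020..U+007E) is mapped."""
    return all(cp in codepoints for cp in range(0x20, 0x7F))
```

The second function is the "special case" check: if it returns False for the selected font, Latin text would be routed to the default Latin font instead.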

(This has been an issue for Russian, Indic languages, the Google Noto fonts, etc., and accounts for most of the present difficulty in choosing an appropriate font for a particular wiki language.)


Version: unspecified
Severity: normal

Details

Reference
bz68922

Event Timeline

bzimport raised the priority of this task to High (Nov 22 2014, 3:40 AM).
bzimport added projects: Collection, I18n.
bzimport set Reference to bz68922.

Change 151360 had a related patch set uploaded by Cscott:
Use Lohit fonts when possible.

https://gerrit.wikimedia.org/r/151360

Change 151360 merged by jenkins-bot:
Use Lohit fonts when possible.

https://gerrit.wikimedia.org/r/151360

The above patches partially fix the problem: they switch to the default Latin font for Latin code pages. But we should really enumerate the full set of code points mapped by a font. That's part two of fixing this bug.

Note that the default Latin font doesn't cover the ~ character (!), which is used in https://en.wikipedia.org/wiki/Moon#Internal_structure in the sentence, "this is only ~20% the size of the Moon, in contrast to the ~50% of most other terrestrial bodies".
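Gaps like this one could be caught before rendering by checking the page text against the font's coverage. A rough Python sketch follows. The coverage set used in the example is invented (printable ASCII minus `~`), not the real coverage of the default Latin font, and `missing_characters` is a hypothetical helper name.

```python
def missing_characters(text, covered):
    """Return the distinct characters of `text` not in `covered`, in first-seen order."""
    missing = []
    for ch in text:
        if ord(ch) not in covered and ch not in missing:
            missing.append(ch)
    return missing


if __name__ == "__main__":
    # Assumed coverage for illustration: printable ASCII without the tilde.
    ascii_without_tilde = set(range(0x20, 0x7F)) - {0x7E}
    sentence = "this is only ~20% the size of the Moon"
    print(missing_characters(sentence, ascii_without_tilde))
```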

This sort of font hardcoding really doesn't scale... https://gerrit.wikimedia.org/r/#/c/151360/1/lib/index.js,cm

Is it really impossible to load the appropriate fonts as installed on the server? We already install them for EasyTimeline etc. (e.g. bug 20825).
See also the ULS fontrepo: https://git.wikimedia.org/tree/mediawiki%2Fextensions%2FUniversalLanguageSelector/HEAD/data%2Ffontrepo

@cscott: Any news (as you're set as assignee)? Still "high priority"?