PDF on gu.wikisource only shows squares instead of characters
Closed, Resolved (Public)

Description

PDF generated from gu.wikisource

When exporting to PDF on gu.wikisource, the fonts are not rendered, so the PDF shows only boxes instead of characters (see attached). I checked another Indic wikisource to find out whether there is a general issue with Indic fonts in PDF output, but Devanagari fonts are displayed correctly in PDFs from the Marathi wikisource.


Version: REL1_19-branch
Severity: normal
See Also:
T39384: Modify Collection format variables from PDF to ODT for Gujarati Wikiprojects

Details

Reference
bz35668

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 12:18 AM
bzimport added projects: Collection, I18n.
bzimport set Reference to bz35668.
bzimport added a subscriber: Unknown Object (MLST).

I suspect this is because the required fonts are not present on the PDF server. Bug 28206 might still affect the creation of proper Indic books, but I think this bug is even more fundamental than that. I tried printing a Gujarati page on my wiki through the PediaPress mw-serve, with $wgCollectionMWServeURL = "http://tools.pediapress.com/mw-serve/"; and got a similar PDF with squares. I am still figuring out how to set up Collection locally.

So I created a symlink as described at the link below and was then able to get Gujarati rendered by mw-render.

http://www.mail-archive.com/mwlib@googlegroups.com/msg01073.html

The fontconfig.py in mwlib.rl contains a reference to the font below, so please check whether this font exists.

{'name': 'Sarai',
 'code_points': ['Gujarati', 'Devanagari'],
 'file_names': ['ttf-devanagari-fonts/Sarai_07.ttf'],
 },

If the above font is unavailable, Lohit Gujarati can be used instead; it will be present anyway as part of ttf-indic-fonts. In that case, the config below needs to be added to fontconfig.py in mwlib.rl, which then has to be rebuilt.

{'name': 'Lohit Gujarati',
 'code_points': ['Gujarati'],
 'file_names': ['ttf-indic-fonts-core/lohit_gu.ttf'],
 },
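
As a rough aid for the "please check if this font exists" step, here is a minimal Python sketch that reports which font files referenced by the two entries above are missing from a font directory. The helper name `missing_font_files` and the default path `/usr/share/fonts/truetype` are assumptions for illustration; they are not part of mwlib.rl, and the real font root on the PDF servers may differ.

```python
from pathlib import Path

# Entries copied from the comments above. In mwlib.rl they would live in
# fontconfig.py; this list is only a stand-in for illustration.
FONT_ENTRIES = [
    {'name': 'Sarai',
     'code_points': ['Gujarati', 'Devanagari'],
     'file_names': ['ttf-devanagari-fonts/Sarai_07.ttf']},
    {'name': 'Lohit Gujarati',
     'code_points': ['Gujarati'],
     'file_names': ['ttf-indic-fonts-core/lohit_gu.ttf']},
]


def missing_font_files(entries, font_root):
    """Return (font name, relative path) pairs whose file is absent under font_root.

    Hypothetical helper, not an mwlib.rl function.
    """
    root = Path(font_root)
    missing = []
    for entry in entries:
        for rel in entry['file_names']:
            if not (root / rel).is_file():
                missing.append((entry['name'], rel))
    return missing


if __name__ == '__main__':
    # ASSUMPTION: fonts are installed under this directory; adjust as needed.
    for name, rel in missing_font_files(FONT_ENTRIES, '/usr/share/fonts/truetype'):
        print(f'{name}: {rel} not found')
```

An empty result only means the files are present; it says nothing about whether the renderer can shape the script correctly.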

That's very good news, Srikanthlogic; it seems things are at least moving somewhere.

One thing to note: please try not to use the Lohit fonts. They are far from the natural Gujarati script and look more like Devanagari, and in books that would be the last thing we would like to use. I don't know which fonts are the basic ones, but those represent the original Gujarati script much more closely.

The following fonts are now installed on pdf1-3:

ii  ttf-bengali-fonts     1:0.5.0-0ubuntu1  Free TrueType fonts for the Bengali language
ii  ttf-dejavu            2.23-1            Metapackage to pull in ttf-dejavu-core and ttf-dejavu-extra
ii  ttf-dejavu-core       2.23-1            Vera font family derivate with additional characters
ii  ttf-dejavu-extra      2.23-1            Vera font family derivate with additional characters
ii  ttf-devanagari-fonts  1:0.5.0-0ubuntu1  Free TrueType fonts for languages using the Devanagari script
ii  ttf-gujarati-fonts    1:0.5.0-0ubuntu1  Free TrueType fonts for the Gujarati language
ii  ttf-indic-fonts       1:0.5.0-0ubuntu1  Metapackage for free Indian language fonts
ii  ttf-indic-fonts-core  1:0.5.0-0ubuntu1  Core collection of free Indian language fonts
ii  ttf-kannada-fonts     1:0.5.0-0ubuntu1  Free TrueType fonts for the Kannada language
ii  ttf-malayalam-fonts   1:0.5.0-0ubuntu1  Free TrueType fonts for the Malayalam language
ii  ttf-oriya-fonts       1:0.5.0-0ubuntu1  Free TrueType fonts for the Oriya language
ii  ttf-punjabi-fonts     1:0.5.0-0ubuntu1  Free TrueType fonts for the Punjabi language
ii  ttf-tamil-fonts       1:0.5.0-0ubuntu1  Free TrueType fonts for the Tamil language
ii  ttf-telugu-fonts      1:0.5.0-0ubuntu1  Free TrueType fonts for the Telugu language

Implemented in https://gerrit.wikimedia.org/r/#/c/7282/

volker.haas wrote:

The problem with the Gujarati script is two-fold:

a) The current configuration uses an unsuitable font for Gujarati (Sarai_07.ttf)

I have fixed this issue with https://github.com/pediapress/mwlib.rl/commit/ecbaa8b871621a08dc4136fd55d2387925039e95

Please note that I haven't updated the software on the servers because of the second issue.

b) The rendering engine mwlib uses to produce the PDFs is not capable of handling the complex character shaping and ligatures that Indic scripts require. Therefore the final PDF is still broken (see the screenshot I'll attach).

Fixing b) is unfortunately a very complex and time-consuming task that involves a couple of unsolved technical problems, so it is currently not on my agenda. One of the biggest problems is that I haven't found a PDF back-end that would meet all requirements.
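
To make point b) more concrete, the following Python sketch groups the code points of one Gujarati word into rough visual clusters. It is a deliberately simplified approximation (marks attach to the preceding character, and a virama joins the next consonant), not the algorithm used by mwlib or by any shaping engine; the function name `rough_clusters` is hypothetical. The sample word is સૌરાષ્ટ્ર (Saurashtra), written with explicit escapes so the code points are unambiguous.

```python
import unicodedata

VIRAMA = '\u0ACD'  # GUJARATI SIGN VIRAMA, joins consonants into a conjunct


def rough_clusters(text):
    """Group code points into approximate visual units.

    Simplification for illustration only: a character joins the previous
    cluster if it is a combining mark (category M*) or if the previous
    character was a virama.
    """
    clusters = []
    prev = ''
    for ch in text:
        is_mark = unicodedata.category(ch).startswith('M')
        if clusters and (is_mark or prev == VIRAMA):
            clusters[-1] += ch
        else:
            clusters.append(ch)
        prev = ch
    return clusters


# SA, VOWEL SIGN AU, RA, VOWEL SIGN AA, SSA, VIRAMA, TTA, VIRAMA, RA
WORD = '\u0AB8\u0ACC\u0AB0\u0ABE\u0AB7\u0ACD\u0A9F\u0ACD\u0AB0'

if __name__ == '__main__':
    for cluster in rough_clusters(WORD):
        names = [unicodedata.name(c) for c in cluster]
        print(len(cluster), names)
```

The nine code points fall into three clusters of 2, 2 and 5 code points. A renderer that draws one glyph per code point, without a shaping step, cannot form the last cluster's conjunct, which is consistent with the broken output described above even when a suitable font is installed.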

sumanah wrote:

Volker, I think you haven't attached the sample screenshot of a broken PDF yet?

Volker, thanks for the update. I agree that complex rendering is still a dependency and might take some time to fix. As far as the font is concerned, though, Dhaval points out that the Lohit font is not good for reading in PDFs. Maybe he could suggest alternatives from ttf-gujarati-fonts (http://packages.debian.org/lenny/all/ttf-gujarati-fonts/filelist) or any freely licensed font that can be used in mwlib.rl.

(In reply to comment #8)

.....Dhaval points out Lohit font is not good for reading on pdf. May be
he could suggest alternatives from ttf-gujarati-fonts
(http://packages.debian.org/lenny/all/ttf-gujarati-fonts/filelist) or any free
licensed font which can be used in mwlib.rl

I would suggest Raghu as the best font to use; aesthetically it is the most natural-looking font. However, when we tested it on Firefox, there was a rendering issue (see https://bugzilla.wikimedia.org/show_bug.cgi?id=33932 and http://crossbrowsertesting.com/users/34057/screenshots/zc6a1910ebcefa7d4d1c/public). If that's not going to affect us, Raghu is the best.

Btw, what is the other font currently used for Gujarati wikis, apart from Lohit? The default font? It would be best to use that font, and to consider Raghu only if that is not possible. Most of the fonts in the Debian package are too artistic; they are good for headings, etc., but not for a whole book or page.

volker.haas wrote:

Comparison of Gujarati rendered in the browser and as PDF with mwlib

Attached:

gujarati_browser_vs_pdf.png (926×1 px, 693 KB)

(In reply to comment #10)

Created attachment 10599 [details]
Comparison of Gujarati rendered in the browser and as PDF with mwlib

This is the same font rendering issue we faced on Firefox... I think the font used in the PDF is Lohit; would the result differ if we chose a different font?

See http://www.jainlibrary.org/elib_master/jlib/004501_book_gujarati_21/Narsimha_Mahetana_Pado_004610_TOC.pdf for an example of Gujarati being correctly rendered in PDF.

If you know of any additional Debian/Ubuntu font packages that should be installed on the PDF servers, feel free to tell us.

P.S. The link to github.com above gives me a 404

ralf_wikimedia wrote:

bugzilla is messing up the github links.

(In reply to comment #13)

bugzilla is messing up the github links.

Please file a report against product=Wikimedia / component=Bugzilla separately.

(In reply to comment #12 by Daniel Zahn)

P.S. The link to github.com above gives me a 404

Link works for me, now that Bugzilla bug 40344 is fixed.

As per comment 6 b), I currently don't see anything that could be solved by ops. Removing keyword.

Steps to reproduce the problem:

  1. Go to https://gu.wikisource.org/wiki/સૌરાષ્ટ્રના_ખંડેરોમાં
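  2. Click પુસ્તક નિર્માતા નિષ્ક્રિય કરો ("Disable book creator") in the side pane
  3. Click પુસ્તક બતાવો (૧ પાનું) ("Show book (1 page)")
  4. Under ડાઉનલોડ: તમારું પુસ્તક ડાઉનલોડ કરવા શૈલી પસંદ કરો અને બટન પર ક્લિક કરો. શૈલી: ("Download: to download your book, choose a format and click the button. Format:"), choose e-book (PDF)
  5. Click ફાઇલ ડાઉનલોડ કરો ("Download file")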

Just for the record, other Gujarati font packages included in Fedora:

  • kalapi-fonts
  • lohit-gujarati-fonts
  • samyak-gujarati-fonts

This appears to be fixed now... Can someone confirm?

Jdlrobson subscribed.

Is this still going to be an issue with the new Proton service?
Also might be fixed... https://phabricator.wikimedia.org/T37668#1292110 ?

TheDJ claimed this task.
TheDJ subscribed.

Results with Proton

Per that result and per T37668#1292110, I'm considering this fixed. New tickets can be filed separately if this is still not OK.