PDF export: Use LaTeX formulas instead of inline images
Closed, Invalid
Public

Assigned To
Authored By
bzimport
Feb 20 2011, 12:06 AM
Referenced Files
F27191709: Satz_des_Pythagoras.pdf
Nov 11 2018, 7:58 PM
F15962351: 1362.pdf
Mar 22 2018, 1:13 PM
F11013902: current_rendering.pdf
Nov 26 2017, 6:26 PM
F11013899: how_it_should_look.pdf
Nov 26 2017, 6:26 PM
Tokens
"Like" token, awarded by He7d3r.

Description

Author: wikimedia

Description:
Say I have the page

https://secure.wikimedia.org/wikipedia/en/wiki/Septic_equation

then I create a PDF from it via "Print/export, Download as PDF".
In the generated document the formulas are obviously embedded as bitmap images.
How about using LaTeX formulas directly (that is enclosed in $..$)? What we have in <math>..</math> Wikipedia markup are actually LaTeX formulas.
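To illustrate the idea only: the sketch below rewrites each <math>..</math> span as inline LaTeX math. It is not how the Collection extension or mwlib works; the function name and the sample sentence are made up for this example, and tags carrying attributes (such as <math display="block">) are deliberately left untouched.

```python
import re

# Matches a plain <math>...</math> span; tags with attributes are not handled.
MATH_TAG = re.compile(r"<math>(.*?)</math>", re.DOTALL)


def math_tags_to_latex(wikitext: str) -> str:
    """Replace each <math>...</math> span with inline LaTeX math ($...$)."""
    # A function is used as the replacement so that backslashes in the
    # formula (e.g. \cdots) are copied literally instead of being treated
    # as escape sequences by re.sub.
    return MATH_TAG.sub(lambda m: "$" + m.group(1).strip() + "$", wikitext)


if __name__ == "__main__":
    # Hypothetical sample text, not copied from the article.
    sample = r"A septic equation has the form <math>ax^7+bx^6+\cdots+h=0</math>."
    print(math_tags_to_latex(sample))
```

A real converter would also have to translate the surrounding wiki markup, which is the harder part of generating a complete LaTeX document.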


Version: unspecified
Severity: enhancement

Details

Reference
bz27574

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 11:25 PM
bzimport added a project: Collection.
bzimport set Reference to bz27574.

volker.haas wrote:

This is not possible, since the PDFs are generated using a toolkit that cannot render LaTeX formulas. Embedding them as images is therefore the only solution.

wikimedia wrote:

Then I propose that PDFs be generated from LaTeX code that is itself generated from the wiki markup. I had assumed that it already works this way.

Hmm, well really it *ought* to be possible to render the latex to EPS and then embed that into the PDF output, or some such. I don't know how hard that'd be to plug into the current mwlib & friends export arch though (might still need bitmaps for other formats?)

wikimedia wrote:

Problems with embedding EPS: you have to ensure that the font sizes of text and formulas match, and you should certainly avoid embedding the math fonts in every EPS formula.
All of this is easily avoided by generating the PDF from LaTeX in one go.

wikimedia wrote:

I have found a LaTeX style called 'wiki.sty' that allows typesetting (currently very simple) wiki markup with LaTeX:
http://www.latex-community.org/index.php?option=com_content&view=article&id=279:wikipedia-markup-for-latex&catid=44:news-latex&Itemid=111

I'm reopening this as I agree that the bitmap math is not really very nice. It looks like the PDF output gets something circa 150dpi, which may be ok for on-screen reading (depending on how the viewer scales them), but looks pretty bad when zoomed in or blown up.

wikimedia wrote:

I found the project wb2pdf, which might solve the problem.
http://de.wikibooks.org/wiki/Benutzer:Dirk_Huenniger/wb2pdf

hunniger wrote:

Well, my tool (wb2pdf) didn't work yesterday because there was a change in the MediaWiki software. I have now taken this into account, so it works again, and you can generate LaTeX documents from wikis using the tool found at the link given above. In particular, the formulas are embedded as vector graphics in the PDF file.

Note that we're also looking into switching our main math rendering to using MathJax (bug 32696) which uses MathML or HTML+CSS to render lovely equations in client-side web goodness.

dirk.hunniger wrote:

My tool is now part of Debian sid. The package is called mediawiki2latex. So this problem is actually solved. The question is whether you want to integrate it into MediaWiki. Currently it's a standalone command-line version. A GUI version for Windows is also provided.

Jdlrobson subscribed.

Is this still a problem? Something we can take care of easily in Electron?

The formulae in the PDF are not nicely integrated into the text, but they are at least readable. If https://bugs.chromium.org/p/chromium/issues/detail?id=152430 is resolved, we can improve that. So I think there is nothing that can be done with a Chromium-based rendering engine at the moment.

@ovasileva I took a look at the Septic equation article on production and generated a PDF to compare. They look the same to me. The equations are present in the PDF.

ovasileva claimed this task.
Debenben subscribed.

For me the Septic equation article looks horrible.


I know that it is very unfortunate that Chrome cannot handle MathML, but if we wait for it to be implemented, we will probably have to wait forever. For the "how_it_should_look" I cheated a little, because I used Firefox. However, what I did was:

  • using a math font that roughly fits the text font
  • setting the scaling to 100% as opposed to 118% currently used in enWP
  • using MathJax with HTML-CSS output to render the <math> tags

In principle all of this should also be possible in Chrome.

This is mainly due to browsers converting SVGs to PDF poorly. However, this could be worked around by optimizing the mathjax-node configuration in Mathoid.

@Pkra how exactly would you optimize the config? @mobrovac and I are now very actively working on the final update of the WMF forks to MathJax 2.7 and mathjax-node 1.0.

@Debenben we are not exactly waiting for the MathML implementation to happen. We will start collecting funds in 2018 to help browsers that lack MathML support. I can share more details with you if you are interested in this topic...

I just got to know http://mediawiki2latex.wmflabs.org. That also works:

It contains one little glitch: it appears to be unable to convert the not-equal sign of the math template in en.wikipedia to LaTeX notation, so the font weight and spacing of that sign are not perfect. But that is nothing compared to the rendering problems of the current setup.

People may also have different opinions about the layout of the "Polynomials and polynomial functions" navbox at the end of the article, but it is still better than the current rendering setup, which does not include the navbox at all.

@ovasileva I just created a PDF of https://de.wikipedia.org/wiki/Satz_des_Pythagoras; half of it is not shown at all, and the other half still looks horrible:

Can you please put this back on the to-do list for the PDF creator tool?

the other half still looks horrible:

FWIW, the overly thick equation rendering in PDF should show fine when printed (not that that helps PDF users); it could also be fixed completely by changing the MathJax configuration slightly (as mentioned on other threads).

Hmm, it seems that the renderer prints before the page has finished downloading all of its images.

Regardless, this ticket seems unrelated to the problems in that PDF. This ticket is about using native amsmath in the PDF renderer. But since the PDF renderer is no longer based on LaTeX, that isn't even possible. Please file separate tickets.

Given @TheDJ's response, this issue seems invalid.

I also cannot reproduce the issue with the example @Debenben gave (even though I remember being able to do so).

Once we change the MathJax configuration, the SVG output should also actually look good in PDF (not just in actual print, where it already should).