
use mediawiki's HTML output
Closed, ResolvedPublic

Description

Author: ralf_wikimedia

Description:
The Collection extension should generate PDFs from MediaWiki's HTML
output instead of using a custom parser and relying on MediaWiki's
broken expand-templates feature.
That would fix all bugs related to parsing MediaWiki markup and expanding templates.


Version: unspecified
Severity: normal

Details

Reference
bz47867

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 1:23 AM
bzimport added a project: Collection.
bzimport set Reference to bz47867.
bzimport added a subscriber: Unknown Object (MLST).

I recommended this a few years ago, but we went with the second parser solution as it was already in development.

Using a headless WebKit browser to generate PDFs is fairly straightforward, but I'm not sure how best to handle combining multiple articles together etc. This wouldn't be a trivial project, but would be nice to investigate.
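One possible way to approach the multi-article question is to merge the rendered HTML of each article into a single document before handing it to the headless browser, and let print CSS start each article on a new page. The following is only a sketch under assumptions: `combine_articles` and the stylesheet are hypothetical names made up for illustration, the article bodies are assumed to have been fetched already, and nothing here reflects how the Collection extension actually works.

```python
from html import escape

# Hypothetical print stylesheet: start every article except the first
# on a fresh page when the document is printed or rendered to PDF.
PRINT_CSS = """
article { page-break-before: always; }
article:first-of-type { page-break-before: avoid; }
"""


def combine_articles(articles):
    """Merge (title, html_body) pairs into one HTML document string.

    The bodies are assumed to be trusted, already-rendered HTML, so only
    the titles are escaped.
    """
    sections = []
    for title, body in articles:
        sections.append(
            f"<article>\n<h1>{escape(title)}</h1>\n{body}\n</article>"
        )
    return (
        '<!DOCTYPE html>\n<html>\n<head>\n<meta charset="utf-8">\n'
        f"<style>{PRINT_CSS}</style>\n</head>\n<body>\n"
        + "\n".join(sections)
        + "\n</body>\n</html>\n"
    )
```

The resulting string could be written to a file and opened by whichever headless browser is used for rendering. Things like a shared table of contents or cross-article links would still need separate handling.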

Parsoid HTML with RDFa might be a better starting point for print-specific customization as it contains a lot of semantic information. See
http://www.mediawiki.org/wiki/Parsoid/MediaWiki_DOM_spec.
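To illustrate what "semantic information" could mean for print customization, here is a rough sketch that lists the elements carrying an RDFa `typeof` attribute in a piece of HTML, using only Python's standard library. The helper names (`TypeofCollector`, `collect_typeof`) are hypothetical and the sample markup in the usage note is made up for illustration; the DOM spec linked above is the authority on which `typeof` values Parsoid actually emits.

```python
from html.parser import HTMLParser


class TypeofCollector(HTMLParser):
    """Collect (tag, typeof) pairs for every element with a typeof attribute."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        typeof = dict(attrs).get("typeof")
        if typeof:
            self.found.append((tag, typeof))


def collect_typeof(html):
    """Return the (tag, typeof) pairs found in an HTML string, in document order."""
    parser = TypeofCollector()
    parser.feed(html)
    parser.close()
    return parser.found
```

For example, `collect_typeof('<span typeof="mw:Transclusion" about="#mwt1">x</span>')` yields one entry for the `span`. A print pipeline could use such a list to decide which template output (navigation boxes, for instance) to drop or restyle.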

Another potentially relevant project would be http://bookjs.net/, a JS library that prepares HTML content for printing in WebKit. It did not work in my testing and seems to be pretty cutting-edge, but there are probably ways to make it work, as it is used by the Booktype project.

Update re book.js: The demo at http://bookjs.net/data/body.html does not work for me, but the demos in the git checkout work both in Chromium 28 and Chrome 29:

git clone https://github.com/sourcefabric/BookJS.git

(In reply to comment #4)

Is this fixed by the new PDF renderer?

I'd say yes. I'll leave the pleasure of closing this bug to the PDF team though ;)

The new PDF renderer isn't deployed yet; maybe we should wait until then?

mwalker wrote:

With the 'public' release of the OCG renderer, I'm going to close this bug.