use mediawiki's HTML output
Closed, Resolved (Public)

Assigned To

None

Authored By

	• bzimport
	Apr 30 2013, 11:09 AM

Description

Author: ralf_wikimedia

Description:
The Collection extension should generate PDFs from MediaWiki's HTML
output instead of using a custom parser and relying on MediaWiki's
broken expand-templates feature.
That would fix all bugs related to parsing MediaWiki markup and expanding templates.

Version: unspecified
Severity: normal

Details

Reference: bz47867

Related Objects

		Status	Subtype	Assigned	Task
		Resolved		Aklapper	T49575 Generating PDFs for books with more than one page fails horribly at pl.wikisource
		Resolved		None	T49867 use mediawiki's HTML output

Event Timeline

• bzimport raised the priority of this task to Low. Nov 22 2014, 1:23 AM

• bzimport added a project: Collection.

• bzimport set Reference to bz47867.

• bzimport added a subscriber: Unknown Object (MLST).

• bzimport created this task. Apr 30 2013, 11:09 AM

I recommended this a few years ago, but we went with the second parser solution as it was already in development.

Using a headless WebKit browser to generate PDFs is fairly straightforward, but I'm not sure how best to handle combining multiple articles together etc. This wouldn't be a trivial project, but would be nice to investigate.
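One possible way to approach the "combining multiple articles" part is to concatenate the rendered article bodies into a single HTML document and let CSS page breaks separate the chapters before handing the result to a headless browser. The following is only a minimal sketch under that assumption; the function name `combine_articles`, the `book-chapter` class, and the input shape (a list of title/HTML pairs) are all hypothetical and not part of the Collection extension.

```python
from html import escape

# Assumed print stylesheet: start every chapter except the first on a new page.
PAGE_BREAK_CSS = (
    "article.book-chapter { page-break-before: always; } "
    "article.book-chapter:first-of-type { page-break-before: avoid; }"
)


def combine_articles(book_title, articles):
    """Join already-rendered article bodies into one printable HTML document.

    `articles` is a list of (title, body_html) pairs. Titles are escaped;
    body_html is trusted as-is because it is assumed to be parser output.
    """
    parts = [
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8">',
        f"<title>{escape(book_title)}</title>",
        f"<style>{PAGE_BREAK_CSS}</style>",
        "</head><body>",
    ]
    for title, body_html in articles:
        parts.append('<article class="book-chapter">')
        parts.append(f"<h1>{escape(title)}</h1>")
        parts.append(body_html)
        parts.append("</article>")
    parts.append("</body></html>")
    return "\n".join(parts)
```

The resulting string could then be written to a file and printed by whichever headless browser is chosen. This sketch does not address cross-article concerns such as a table of contents, duplicate element ids, or merged reference lists, which is probably where most of the non-trivial work lies.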

Parsoid HTML with RDFa might be a better starting point for print-specific customization as it contains a lot of semantic information. See
http://www.mediawiki.org/wiki/Parsoid/MediaWiki_DOM_spec.
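To illustrate what "semantic information" could mean for print customization: Parsoid marks generated content with RDFa `typeof` attributes such as `mw:Transclusion`, so a print pipeline could locate those nodes and restyle or drop them. The sketch below uses only Python's standard library; the class and function names are hypothetical, and the sample fragment in the usage note is hand-written for illustration rather than real Parsoid output.

```python
from html.parser import HTMLParser


class TypeofCollector(HTMLParser):
    """Collect (tag, typeof-token) pairs for elements carrying mw:* types."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # typeof may hold several space-separated tokens, or be absent.
        for token in (attr_map.get("typeof") or "").split():
            if token.startswith("mw:"):
                self.found.append((tag, token))


def find_semantic_nodes(html):
    """Return all elements in `html` whose typeof contains an mw:* token."""
    collector = TypeofCollector()
    collector.feed(html)
    collector.close()
    return collector.found
```

For example, feeding it a fragment like `<span typeof="mw:Transclusion" about="#mwt1">...</span>` would report the span as template-generated, which a print stylesheet might then treat differently from ordinary prose.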

Another potentially relevant project is http://bookjs.net/, a JS library that prepares HTML content for printing in WebKit. It did not work in my testing and seems to be pretty cutting-edge, but there are probably ways to make it work, since it is used by the Booktype project.

Update re book.js: The demo at http://bookjs.net/data/body.html does not work for me, but the demos in the git checkout work both in Chromium 28 and Chrome 29:

git clone https://github.com/sourcefabric/BookJS.git

Is this fixed by the new PDF renderer?

Re-implementing PDF support

http://lists.wikimedia.org/pipermail/wikitech-l/2013-November/073059.html

Status update on new Collections PDF Renderer

http://lists.wikimedia.org/pipermail/wikitech-l/2013-November/073238.html

(In reply to comment #4)

> Is this fixed by the new PDF renderer?

I'd say yes. I'll leave the pleasure of closing this bug to the PDF team though ;)

The new PDF renderer isn't deployed yet; maybe we should wait until then?

mwalker wrote:

With the 'public' release of the OCG renderer, I'm going to close this bug.
