Page MenuHomePhabricator

Lua modules are ignored during PDF export
Closed, ResolvedPublic

Description

It appears that Lua modules are completely ignored when using Download as PDF.

Whether that requires a change to Scribunto or a change to Special:Book, I don't know. At present the conversion of templates to Lua is causing the corresponding information to be lost in the PDF exports.


Version: master
Severity: major
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=50178
https://bugzilla.wikimedia.org/show_bug.cgi?id=50750
https://bugzilla.wikimedia.org/show_bug.cgi?id=48052

Details

Reference
bz45861

Event Timeline

bzimport raised the priority of this task from to Unbreak Now!.Nov 22 2014, 1:27 AM
bzimport added projects: Collection, Upstream.
bzimport set Reference to bz45861.

The PDF exporter uses its own parser rather than MediaWiki's, and so probably needs to be updated somehow. See for example bug 39095 and bug 41748, which seem to be similar sorts of issues.

I contacted Volker Haas and Ralf Schmitt last August, explaining that Lua support would be required in mwlib, and giving an outline of the approach they might take to implement it. They didn't seem to be in any hurry.

The deadline has passed. Can we just disable the Collection extension for now?

(In reply to comment #2)

Can we just disable the Collection extension for now?

This seems reasonable. Or at least add a warning to users so that their PDFs are not silently missing pieces.

(In reply to comment #1)

The PDF exporter uses its own parser rather than MediaWiki's [...]

Yeah, this is pretty wtf to me. Unless there's a really, really good reason, a separate parser shouldn't be used here.

volker.haas wrote:

Can someone please post a link to a minimal example where lua modules fail. Unfortunately the two bugs mentioned above are not related to mwlibs inability to handle lua.

Thanks

ralf_wikimedia wrote:

we are working on a solution for this

Would it be possible to get a rough timeline for how long you believe it may take for this bug to be solved and a fix deployed? Days? Weeks? More than a month?

Enwiki is essentially ready to start deploying Lua-based citations; however, if we migrate the citation templates while this bug is still open, then all of the references are going to start disappearing from the PDF exports.

For coordinating purposes it would be nice to have some idea of whether the PDF engine is likely to be fixed quickly or not.

This question should be asked when deployment of Scribunto was pondered, not now. In my opinion it is not easy thing and nothing that can be done withing days.

I don't think that disabling Collection "for now" in exchange for new cool feature is a good thing (recently stumbled into really good PDF stuff made by community on different projects).

Marcin, Tim said above that the PDF team was notified in August. Lua is not a surprise, and there are very important reasons for it.

ralf_wikimedia wrote:

To be fair, I asked Tim for a schedule for deployment. What I got back
is his comment above asking to disable the collection extension
because the deadline has passed.

We currently do plan to replace our own template expansion with a call
to api.php with action=expandtemplates. As MZMcBride noticed this is
still pretty wtf, but it would give him a chance to bitch about
mediawiki's parser itself, as that one intermingles template expansion
with parsing, i.e. for certain pages the expanded page does parse
differently from what you get if you expand and parse in one step.

I'm not sure at the moment if we're allowed to use
action=expandtemplates, or if that will bring the whole of wikipedia
down cause it overloads the cluster.

If we're allowed to do that, I guess it's a matter of days until this
issue is resolved.

(In reply to comment #10)

To be fair, I asked Tim for a schedule for deployment. What I got back
is his comment above asking to disable the collection extension
because the deadline has passed.

In August, the rollout schedule had not been decided. In September, Heiko again asked for a schedule. In October, Rob Lanphier replied to Heiko, saying that it had already been rolled out to mediawiki.org at that time, and that it would be progressively deployed to more wikis. That rollout is still in progress.

The right time to start development work was in August, which is why I sent my email at that time.

I'm not sure at the moment if we're allowed to use
action=expandtemplates, or if that will bring the whole of wikipedia
down cause it overloads the cluster.

You can use it. It's unlikely that there would be any problems. A CPU overload would not bring the site down, since API requests go to a separate Apache cluster. At most you would bring down the API.

ralf_wikimedia wrote:

ok, so I'm going to implement the action=expandtemplates thing. I guess we can still disable the collection extension, when the API request cluster is down.

Another thing that's not going to work then is template blacklisting/print templates. volker took a look at that some time ago, and it should not have a big effect on pdf output. note that some elements will still be filtered via the class=noprint attribute.

Thanks Ralf. Robert, can you hold off on the Cite-to-Lua migration at least until 3/18 or so? Ralf, please let us know if you run into any issues with your attempted solution and if we can help. This is a rather urgent issue and we'd prefer to avoid disabling the PDF feature (especially the sidebar link receives very substantial usage).

ralf_wikimedia wrote:

do I have to sync with anyone when deploying a new version?

Ralf: Let me know when you have something ready (ie: something in Gerrit and reviewed) and we (you and I) can coordinate a deploy slot. I've noted this issue on the deployments calendar (https://wikitech.wikimedia.org/wiki/Deployments) in the to be scheduled section.

Thanks.

Not sure there is anything to be changed in the Collection extension code.

I think what needs to be done is to update mw-lib (https://github.com/pediapress/)
and the code on the rendering server the PediaPress is running. So I think at least some communication should be prepared towards the non-tech wiki community that changes are coming.

I'm happy to help with the deployment and/or handling bug reports/issues/whatever comes after this.

(In reply to comment #16)

Not sure there is anything to be changed in the Collection extension code.

I think what needs to be done is to update mw-lib
(https://github.com/pediapress/)
and the code on the rendering server the PediaPress is running.

In that case, it seems appropriated to tag this with the 'upstream' keyword.

ralf_wikimedia wrote:

we need to update mwlib on the pdf cluster. this is something I can do on my own.
but mwlib will then call api.php with different parameters and there's a chance that will have a negative effect on some of the machines. we've been through this when we changed the size of images we fetch, and I'd prefer not to repeat this rather unpleasant experience.

ralf_wikimedia wrote:

the new version is ready to be deployed.

(In reply to comment #16)

[...] So I think at least some communication should be prepared towards the
non-tech wiki community that changes are coming.

What changes are relevant to the non-tech wiki community?

(In reply to comment #18)

we need to update mwlib on the pdf cluster. this is something I can do on my
own.
but mwlib will then call api.php with different parameters and there's a
chance that will have a negative effect on some of the machines. we've been
through this when we changed the size of images we fetch, and I'd prefer not
to repeat this rather unpleasant experience.

Right. So I think a few things should happen here:

  1. Ensure that the API requests are using an identifiable (preferably versioned) User-Agent string (cf. [[m:User-Agent policy]]).
  1. Coordinate with either Erik, Tim, or some other Wikimedia Foundation technical staffer to schedule a deployment slot (time window) of this new code, while preparing for the possibility of needing to roll back the new code, if necessary.
  1. Stay on IRC (#wikimedia-operations on irc.freenode.net) during the deployment and for a few hours subsequent to the deployment to ensure that the increased API load isn't problematic.

ralf_wikimedia wrote:

here's the changelog: http://mwlib.readthedocs.org/en/latest/changelog.html

I don't know what the "non-tech wiki community" is, so I don't know which of those changes are relevant.

the user agent being used should be mwlib.

This is very good news.

current mwlib rendering is completely (well, almost completely: enough to be unusable) borked on RTL wikis. iirc, someone said once something about "not having access to appropriate free hebrew fonts", but the whole page was mangled with RTL, regardless of the super-ugly fonts. i sure hope that switching to API rendering will fix this too.

peace.

(In reply to comment #20)

  1. Coordinate with either Erik, Tim, or some other Wikimedia Foundation

technical staffer to schedule a deployment slot (time window) of this new
code,
while preparing for the possibility of needing to roll back the new code, if
necessary.

Hi, your friendly Release Manager here :)

Please coordinate with me. The current Deployments calendar is available here:

https://wikitech.wikimedia.org/wiki/Deployments

Please tell me what time slot works for you that is available and has good coverage in Pacific timezone (so we have Ops/Platform teams available as needed).

  1. Stay on IRC (#wikimedia-operations on irc.freenode.net) during the

deployment and for a few hours subsequent to the deployment to ensure that
the
increased API load isn't problematic.

Agreed.

kipod, that sounds like a different bug. Please file if there's not already one.

ralf_wikimedia wrote:

I've logged into #wikimedia-operations as schmir.

(In reply to comment #24)

kipod, that sounds like a different bug. Please file if there's not already
one.

it is, of course, a different bug (didn't look too deeply, but i'll be very surprised if it's not reported at least once).

i just expressed hope that switching from proprietary parsing to API parsing will squash this one "en passant". if it won't, i'll make sure a separate bug report exists.

peace.

ralf_wikimedia wrote:

My first attempt to upgrade mwlib failed. I had to fix another bug, which I introduced in mwlib 0.15. I had hoped to upgrade the code right now, but greg says that's not possible.

Can you offer us some deployment windows?

ralf_wikimedia wrote:

forget that last message. I've deployed the new mwlib code a few minutes ago.

The test case at https://en.wikipedia.org/wiki/User:BJorsch_(WMF)/Bug_45861 seems to work now. Can folks test with a few more complex cases?

Lua tests:

http://en.wikipedia.org/wiki/Bays_of_the_Philipines

The presentation of the Lua-based coordinates is somewhat malformed, but still usable (whereas before they were simply absent).

http://en.wikipedia.org/wiki/Climate_of_Russia

The Lua-based climate boxes look great in PDF, but all the references embedded in the templates are absent from the References section.

http://en.wikipedia.org/wiki/List_of_longest_bridges_in_the_world

The Lua-based number formatting and conversions look perfect. However some of the non-Lua based references seem incorrect (e.g. #54).

Complex, non-Lua test:

http://en.wikipedia.org/wiki/Barack_Obama

On wiki there are 335 numbered notes, the PDF skips many of the numbers so that only 58 are actually present, and the numbers that are present don't match the wiki numbering. To be clear this is testing the Barack Obama article with _traditional template-based citations_. So whatever change was made appears to have caused problems with normal article rendering.

So altogether these changes look like good progress towards handling Lua, BUT there also seem to be negative impacts the presentation of other unrelated page elements, especially references. This includes the traditional {{cite}} template-based references.

volker.haas wrote:

(In reply to comment #30)

Lua tests:

http://en.wikipedia.org/wiki/Bays_of_the_Philippines

The presentation of the Lua-based coordinates is somewhat malformed, but
still
usable (whereas before they were simply absent).

I fixed this with https://github.com/pediapress/mwlib/commit/fbce5a01c3b05b5f6aea6ee38bdf7acf76b44719

http://en.wikipedia.org/wiki/Climate_of_Russia

The Lua-based climate boxes look great in PDF, but all the references
embedded
in the templates are absent from the References section.

The problem is that the expandTemplates API call returns with some unexpanded templates:

Minimal Example: http://en.wikipedia.org/wiki/User:Volker.haas/TemplateExpansion

Result: https://en.wikipedia.org/w/api.php?action=expandtemplates&format=jsonfm&text={{User:Volker.haas/TemplateExpansion}}

Can this be fixed on the mediawiki end?

http://en.wikipedia.org/wiki/List_of_longest_bridges_in_the_world

The Lua-based number formatting and conversions look perfect. However some
of
the non-Lua based references seem incorrect (e.g. #54).

This is the same problem as described above: unexpanded templates after an expandTemplates mediawiki API call.

Complex, non-Lua test:

http://en.wikipedia.org/wiki/Barack_Obama

On wiki there are 335 numbered notes, the PDF skips many of the numbers so
that
only 58 are actually present, and the numbers that are present don't match
the
wiki numbering.

I haven't checked that thoroughly but I am pretty sure it is the same problem as above.

Sidenote about the numbering being off (despite the bug that is present): In order to present links in an unobtrusive way they are also put in the reference section of the PDF. This is the cause that the numbering of references in the article and the PDF is different (whoever, in the PDF the numbering should be "correct")

To be clear this is testing the Barack Obama article with
_traditional template-based citations_. So whatever change was made appears
to
have caused problems with normal article rendering.

So altogether these changes look like good progress towards handling Lua, BUT
there also seem to be negative impacts the presentation of other unrelated
page
elements, especially references. This includes the traditional {{cite}}
template-based references.

Thanks for the work on this!

(In reply to comment #32)

http://en.wikipedia.org/wiki/Climate_of_Russia
The Lua-based climate boxes look great in PDF, but all the references
embedded in the templates are absent from the References section.

The problem is that the expandTemplates API call returns with some unexpanded
templates:

Minimal Example:
http://en.wikipedia.org/wiki/User:Volker.haas/TemplateExpansion

Result:
https://en.wikipedia.org/w/api.
php?action=expandtemplates&format=jsonfm&text={{User:Volker.haas/
TemplateExpansion}}

Can this be fixed on the mediawiki end?

Would be good to file a separate ticket for this, so it doesn't get lost.
This report is about code changes in the "Collection" extension.

(In reply to comment #33)

Thanks for the work on this!

(In reply to comment #32)

http://en.wikipedia.org/wiki/Climate_of_Russia
The Lua-based climate boxes look great in PDF, but all the references
embedded in the templates are absent from the References section.

The problem is that the expandTemplates API call returns with some unexpanded
templates:

Minimal Example:
http://en.wikipedia.org/wiki/User:Volker.haas/TemplateExpansion

Result:
https://en.wikipedia.org/w/api.
php?action=expandtemplates&format=jsonfm&text={{User:Volker.haas/
TemplateExpansion}}

Can this be fixed on the mediawiki end?

Would be good to file a separate ticket for this, so it doesn't get lost.
This report is about code changes in the "Collection" extension.

It wouldn't surprise me if there's already a bug about it somewhere. Note the problem is in the parser, not the API.

The problem is that the underlying parser logic behind expandtemplates treats anything inside a tag hook's tags (e.g. between "<ref>" and "</ref>") as opaque text to be handled by the tag hook's callback. Some of those callbacks turn right around and parse the contents, but that happens at a later stage than where expandtemplates stops.

volker.haas wrote:

Thanks for the suggestions! I created a new ticket (despite searching I couldn't find a prior ticket with that issue):

https://bugzilla.wikimedia.org/show_bug.cgi?id=46115

Alright, since it seems this bug is now done, I'm resolving it. Let's work on any other lingering issues with templates and PDF export in separate/new bugs.

Thanks all!

i do not think it's done - Reopening.
(this is done based on the assumption that the fix is already deployed on enwiki. if it isn't deployed yet, please close again and let me know, so i can slap myself).

See page

https://en.wikipedia.org/wiki/User:%D7%A7%D7%99%D7%A4%D7%95%D7%93%D7%A0%D7%97%D7%A9/sandbox

(if this page, which is a user sandbox, will lose its content anytime soon: the problematic content is a single {{chess diagram-fen}} on enwiki. no other content on the page)

the page displays an image of chessboard and chess pieces, in game-opening stance. it's created by Lua, by placing several div-wrapped images on top of another image, using z-index and "position:absolute" in the style of the divs wrapping the pieces.

the page shows the chess board with the pieces in opening stance.
the pdf seem to ignore image sizes (i.e., it scales the board image using different scaling factor than the one used for the pieces), and ignore the "position:absolute" (hence also the "top" and "left") style attributes of the divs wrapping the pieces.

as a side, i'll mention that this is an improvement over yesterday (or was it a day before?) when the pdf entirely missed all the images that were placed on the page by Lua code.

peace.

volker.haas wrote:

(In reply to comment #37)

i do not think it's done - Reopening.
(this is done based on the assumption that the fix is already deployed on
enwiki. if it isn't deployed yet, please close again and let me know, so i
can
slap myself).

See page

https://en.wikipedia.org/wiki/User:
%D7%A7%D7%99%D7%A4%D7%95%D7%93%D7%A0%D7%97%D7%A9/sandbox

(if this page, which is a user sandbox, will lose its content anytime soon:
the
problematic content is a single {{chess diagram-fen}} on enwiki. no other
content on the page)

the page displays an image of chessboard and chess pieces, in game-opening
stance. it's created by Lua, by placing several div-wrapped images on top of
another image, using z-index and "position:absolute" in the style of the divs
wrapping the pieces.

the page shows the chess board with the pieces in opening stance.
the pdf seem to ignore image sizes (i.e., it scales the board image using
different scaling factor than the one used for the pieces), and ignore the
"position:absolute" (hence also the "top" and "left") style attributes of the
divs wrapping the pieces.

as a side, i'll mention that this is an improvement over yesterday (or was
it a
day before?) when the pdf entirely missed all the images that were placed on
the page by Lua code.

peace.

The bug you describe is not related to lua templates. The PDF rendering engine ignores the position attribute of tags - therefore the output is probably not what you would expect.

if i understand correctly, you say that the same problem will appear whenever style="position:something" is used. i was surprised to hear that - we use "position:absolute" in more than one place: specifically for "pushpins" in map displays.

i tried the PDF of [[en:Buenos Aires]], and found (to my dismay) that this is indeed the case: the pushpin is shoved from its correct location completely outside of the map.

is there an open bug about it? either way, i am reverting my "reopen" status change and return the status to "resolved", because my previous change is really not related to Lua but to PDF rendering. at large.

peace.

volker.haas wrote:

(In reply to comment #39)

if i understand correctly, you say that the same problem will appear whenever
style="position:something" is used. i was surprised to hear that - we use
"position:absolute" in more than one place: specifically for "pushpins" in
map
displays.

i tried the PDF of [[en:Buenos Aires]], and found (to my dismay) that this is
indeed the case: the pushpin is shoved from its correct location completely
outside of the map.

is there an open bug about it?

yes. Unfortunately it is not possible to solve this in a reasonable amount of time.

either way, i am reverting my "reopen" status
change and return the status to "resolved", because my previous change is
really not related to Lua but to PDF rendering. at large.

peace.

Actually setting this to Resolved again. Please open a new bug about any further issues you are facing with the PDF export tool.

Thanks

(In reply to comment #12)

...
Another thing that's not going to work then is template blacklisting/print
templates. volker took a look at that some time ago, and it should not have a
big effect on pdf output. note that some elements will still be filtered via
the class=noprint attribute.

https://www.mediawiki.org/wiki/Thread:Extension_talk:Collection/Hide_in_print_broken

https://en.wikipedia.org/wiki/User:Unjedai/Hide_in_print_test

volker.haas wrote:

(In reply to comment #42)

(In reply to comment #12)

...
Another thing that's not going to work then is template blacklisting/print
templates. volker took a look at that some time ago, and it should not have a
big effect on pdf output. note that some elements will still be filtered via
the class=noprint attribute.

https://www.mediawiki.org/wiki/Thread:Extension_talk:Collection/
Hide_in_print_broken

https://en.wikipedia.org/wiki/User:Unjedai/Hide_in_print_test

Also see: https://en.wikipedia.org/wiki/Template_talk:Hide_in_print#Possible_Fix

(In reply to comment #41)

Actually setting this to Resolved again. Please open a new bug about any
further issues you are facing with the PDF export tool.

Thanks

not sure why it was set to "resolved", as it seems that the pdf creator still does not respect "position:" attribute. however, i will act on the advise, and will open a new ticket.

note that i do not have this extension installed on my wiki, and the bug report is based on enwiki.
for instance, look at [[en:Brussels]]: in section "Languages", there is a cute pie chart, whose implementation depends on various things, including position:absolute. the pdf completely misses it.
again, all the location maps that use pushpins (such as [[en:Buenos Aires]] mentioned above). in deference to your request, i'll open a new ticket.

peace.

(In reply to comment #44)

not sure why it was set to "resolved", as it seems that the pdf creator still
does not respect "position:" attribute.

Because that's a different bug. It's clearer for everyone involved if a new bug is opened for a new bug instead of repurposing an existing bug about a different issue.

(In reply to comment #45)

(In reply to comment #44)

not sure why it was set to "resolved", as it seems that the pdf creator still
does not respect "position:" attribute.

Because that's a different bug. It's clearer for everyone involved if a new
bug
is opened for a new bug instead of repurposing an existing bug about a
different issue.

my bad. see Bug 50178 for further details about the other issue.
peace.