
Add date to download in ZIM and PDF
Closed, Declined · Public

Description

Author: jwild

Description:
It would be good if a time/date stamp could be added to the bottom of every downloaded article. I think this stamp should be the time/date of the last revision that was extracted, rather than the time/date the file was downloaded.


Version: unspecified
Severity: enhancement

Details

Reference
bz30511

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 11:50 PM
bzimport set Reference to bz30511.
bzimport added a subscriber: Unknown Object (MLST).

(In reply to comment #0)

> I think this stamp should be the time/date of the last revision that was extracted (rather than the time/date that the file was downloaded)

That very same timestamp notion is already used on the bottom of each HTML page:

"This page was last modified on 16 August 2011 at 21:40."

So I guess Collection can just steal that.
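A minimal sketch of reusing that footer wording for an exported article, assuming the last revision's timestamp is available as the ISO 8601 UTC string the MediaWiki API returns (e.g. `2011-08-16T21:40:00Z`). `last_modified_footer` is a hypothetical helper, not part of Collection or mwlib.

```python
from datetime import datetime, timezone


def last_modified_footer(rev_timestamp: str) -> str:
    """Render a revision timestamp like the wiki's HTML footer line.

    Hypothetical helper. Assumes `rev_timestamp` is an ISO 8601 UTC string
    such as the MediaWiki API returns, e.g. "2011-08-16T21:40:00Z".
    """
    ts = datetime.strptime(rev_timestamp, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    # Use ts.day rather than %d so the day is not zero-padded ("3", not "03").
    return f"This page was last modified on {ts.day} {ts:%B %Y} at {ts:%H:%M}."


print(last_modified_footer("2011-08-16T21:40:00Z"))
# This page was last modified on 16 August 2011 at 21:40.
```

Because the input is the revision timestamp and not the current time, the stamp matches what the original request asks for: the date of the extracted revision, not the download date.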

volker.haas wrote:

I had this discussion with someone else a while ago, and the conclusion was that the link to the article's revision at the back of the PDF is sufficient (at least for licensing purposes).

jwild wrote:

This is mostly for the ZIM downloads rather than the PDFs. I don't see a date in there, but I could be missing it?

Also, I think the reason for this is less about licensing and more about a helpful, informative disclaimer of sorts. For example, if an article marked "13 November 2006" claims that someone *is* the current president, I can put that claim in context.

(In reply to comment #3)

> Mostly, this is for use of the ZIM downloads, rather than PDF...

The Collection extension doesn't produce ZIM dumps, does it? Tomasz, could you put this in the right component?

(In reply to comment #4)

> (In reply to comment #3)
>
>> Mostly, this is for use of the ZIM downloads, rather than PDF...
>
> The Collection extension doesn't produce ZIM dumps, does it? Tomasz, could you put this in the right component?

The Collection extension DOES create ZIM collections. This is the correct place for it to be.

(In reply to comment #5)

> The collections extension DOES create zim collections. This is the correct place for it to be.

My bad. As Tomasz explained to me on IRC, we at WMF configure Collection to include ZIM as a format, and the PediaPress backend understands this and produces ZIM dumps. Those dumps would need to include the timestamps, and presumably Kiwix would have to display them somewhere too.

volker.haas wrote:

I guess this ticket needs further clarification...

If I understand correctly, this whole ticket is only about zim files produced with the Collection extension [1]. Therefore I won't do anything for PDFs.

Regarding the time stamps in the zim files: the zim files contain a timestamp of the export date of the zim file. Is this sufficient? Or do you want individual time stamps for each article? If so, should they be displayed below the article text?

[1] Let me clarify one thing about the "architecture": the Collection extension indeed does *not* create zim files or PDFs. It lets the user collect Wikipedia (MediaWiki) articles, and this article collection can then be saved, for example as a book (e.g. [2]). The Collection extension also knows (through its config) about the different export options (PDF, zim, odf...) *and* about the render server, which does the actual work of rendering. The render server is (strictly speaking) part of the mwlib module [3]. That component parses the MediaWiki text and exports it using the submodules mwlib.rl (rl = ReportLab = PDF) or mwlib.zim.

In other words, the Collection extension does not render anything, but it's fine if anyone other than the PediaPress developers does *not* make this distinction and talks about the Collection extension exporting zim files. (The only danger is that other devs, maybe like Roan above, can't find anything related to zim files in the Collection extension code and therefore conclude that it can't produce zim files, which is either correct or not, depending on your viewpoint ;))

tl;dr The Collection Extension happily produces PDFs, zim files etc.!

[2] http://en.wikipedia.org/wiki/Book:Hadronic_Matter
[3] https://github.com/pediapress/mwlib

jwild wrote:

Thanks for the clarifications, Volker! Sorry for my ambiguity and limited architectural knowledge...

So, for the time stamp: I think it should be in two additional places, besides just the "Created" stamp in the file info:
(1) At the bottom of the table of contents ("This collection was created <xx date>")
(2) At the top of each article text ("The article as seen at <xx time/date>")
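A rough sketch of the two proposed stamps, assuming each article carries the timestamp of its extracted revision and the collection has a creation time. All function names, the `(title, revision_timestamp, text)` tuple shape, and the "13 November 2006, 14:05 (UTC)" date format are hypothetical choices for illustration; none of them come from mwlib or the Collection extension.

```python
from datetime import datetime, timezone


def _fmt(ts: datetime) -> str:
    # Expects a timezone-aware datetime; normalise to UTC before formatting.
    ts = ts.astimezone(timezone.utc)
    return f"{ts.day} {ts:%B %Y}, {ts:%H:%M} (UTC)"


def toc_stamp(created: datetime) -> str:
    # (1) Line appended at the bottom of the table of contents.
    return f"This collection was created {_fmt(created)}"


def article_stamp(revision_ts: datetime) -> str:
    # (2) Header placed at the top of each article. It uses the timestamp of
    # the extracted revision, not the download time, as the request asks.
    return f"The article as seen at {_fmt(revision_ts)}"


def stamp_collection(created, articles):
    """Add both stamps to a collection.

    `articles` is a list of (title, revision_timestamp, text) tuples; this
    shape is hypothetical and does not mirror mwlib's internal data model.
    """
    toc = [title for title, _, _ in articles] + [toc_stamp(created)]
    pages = [f"{article_stamp(ts)}\n\n{text}" for _, ts, text in articles]
    return toc, pages


created = datetime(2011, 8, 20, 12, 0, tzinfo=timezone.utc)
rev = datetime(2006, 11, 13, 14, 5, tzinfo=timezone.utc)
toc, pages = stamp_collection(created, [("Example article", rev, "Article text")])
print(toc[-1])  # This collection was created 20 August 2011, 12:00 (UTC)
print(pages[0].splitlines()[0])  # The article as seen at 13 November 2006, 14:05 (UTC)
```

Putting the per-article stamp into the article text itself, rather than only into the file's "Created" metadata, means the date shows up in whatever reader displays the page.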

volker.haas wrote:

Jessie, I don't blame you (or anybody else) for not knowing all architectural details of the Collection Extension (and mwlib) ;). I was just trying to give a brief overview of how all this stuff is linked together.

Anyway, I'll take your suggestion and add the timestamps in the coming days.

PDFs produced by the new OCG/rdf2latex still have the permalink, but not the date. I didn't test ZIM and ePub, but they just consume MediaWiki's HTML.

Can Parsoid's HTML be modified to add annotations like this, so that all the consumers/renderers benefit?

20.20 < Kelson> Nemo_bis: ZIM/PDF already have the date in the metadata

We don't think this is really needed nowadays. ZIM and PDF metadata contain dates; printers usually print the date of the printing; people looking for very exact information can check the oldid/permalink.