
Export and PDF generator returns "file not found" error on completion
Closed, ResolvedPublic

Description

Author: neil

Description:
Using the "download as PDF" option on en.wikibooks (create a PDF from a Collection) completes successfully. The URL that is then provided to link to the generated PDF returns "file not found" error. E.g.:

  1. For this book: https://en.wikibooks.org/wiki/Fundamentals_of_Transportation
  2. Request collection rendering: https://en.wikibooks.org/w/index.php?title=Special:Book&bookcmd=rendering&return_to=Wikibooks%3ACollections%2FFundamentals+of+Transportation&collection_id=7c6b44dc136f2d81&writer=rl
  3. Generates this download link: https://en.wikibooks.org/w/index.php?title=Special:Book&bookcmd=download&collection_id=7c6b44dc136f2d81&writer=rl&return_to=Wikibooks%3ACollections%2FFundamentals+of+Transportation
  4. Returns error "The file you are trying to download does not exist: Maybe it has been deleted and needs to be regenerated. "

Impacting at least two users on multiple books. Believed to have been working yesterday.


Version: master
Severity: normal

Details

Reference
bz36950

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 12:22 AM
bzimport set Reference to bz36950.
bzimport added a subscriber: Unknown Object (MLST).

martin_kraus_germany wrote:

It didn't work yesterday for me; but I think it worked 2 days ago.

martin_kraus_germany wrote:

de.wikibooks.org is also affected.

Just confirmed by requesting a PDF of the front page of en.wikibooks: the render completed, but the download returned a 404.

Thuvack wrote:

Yes. This is the same at Wikiversity too.

I sent an email to the PediaPress guys to see what can be done about this.

Ok, so this isn't actually Collection specifically at fault. It's seemingly some change in MediaWiki core.

1.20wmf2 MediaWiki with 1.20wmf2 branched Collection works fine.

1.20wmf2 MediaWiki with head Collection works fine.

1.20wmf3 MediaWiki with head Collection doesn't work.

1.20wmf3 MediaWiki with 1.20wmf3 branched Collection doesn't work.

1.20wmf3 MediaWiki with 1.20wmf2 branched Collection doesn't work.

In the meantime, I have put wikibooks back to 1.20wmf2

So the code is falling over at:

			$info = self::mwServeCommand( 'download', array(
				'collection_id' => $wgRequest->getVal( 'collection_id' ),
				'writer' => $wgRequest->getVal( 'writer' ),
			) );
			$content_type = $info['content_type'];
			$content_length = $info['download_content_length'];
			$content_disposition = null;
		}
		if ( !$info ) {
			$wgOut->showErrorPage( 'coll-download_notfound_title', 'coll-download_notfound_text' );
			return;
		}

Ok, so the problem is this code recently added to our HTTP code:

		if ( isset( $this->respHeaders['content-length'] ) ) {
			if ( strlen( $this->content ) < $this->getResponseHeader( 'content-length' ) ) {
				$this->status->fatal( 'http-truncated-body' );
			}
		}

We're getting zero content, but the header declares a non-zero length, so the response is then classed as a fatal error.
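In isolation, the failing check amounts to the following (a minimal Python sketch of the logic described above, not the actual MediaWiki PHP code; the function name is hypothetical):

```python
def check_truncated_body(headers, body):
    """Flag a response whose body is shorter than its declared Content-Length.

    Mirrors the check quoted above: a declared length combined with an
    empty body is treated as a fatal 'http-truncated-body' error.
    """
    declared = headers.get("content-length")
    if declared is None:
        return None  # no header, nothing to verify
    if len(body) < int(declared):
        return "http-truncated-body"
    return None
```

A response declaring `content-length: 700197` with an empty body, as in the log lines below, trips the check even though the download itself would have succeeded.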

url: http://pdf1.wikimedia.org:8080//08/084377b53fc2a31c/output.rl
2012-05-18 20:19:00 mw10 frwikibooks: content-length-header: 700197
content-length-actual: 0
url: http://pdf3.wikimedia.org:8080//f8/f84485f7ffd576cf/output.rl
2012-05-18 20:19:25 srv267 dewikibooks: content-length-header: 57811
content-length-actual: 0
url: http://pdf1.wikimedia.org:8080//ce/ce9d841ca7076699/output.rl

I've disabled that code for the moment, and now all wikis that were previously on 1.20wmf3 are back on it.

The question here, is why are we apparently getting no content. In this case, at least, it doesn't seem to make any difference, as we can still download the file fine.

Is it safe/sensible to add an option for the curl downloader so that we just ignore this check? Collection seems to be the main (but not the only) offender.
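Such an opt-out could look roughly like this (a Python sketch only; the option name `ignoreContentLength` is a hypothetical illustration, not an existing MediaWiki request option):

```python
def validate_body_length(headers, body, opts=None):
    """Sketch of the proposed opt-out: callers that know the body may
    legitimately be shorter than Content-Length can disable the check."""
    opts = opts or {}
    if opts.get("ignoreContentLength"):
        return None  # caller opted out, as Collection could here
    declared = headers.get("content-length")
    if declared is not None and len(body) < int(declared):
        return "http-truncated-body"
    return None
```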

Too many things might use Content-Length for purposes other than the body length. Like POSTing or PUTing to Swift/S3 will give a content-length of the data stored, not the response. Also, HEAD requests, of course, shouldn't have this check. Really, only the caller of the Http class can determine whether the headers make sense for the body in this case.

(In reply to comment #9)

Too many things might use Content-Length for purposes other than the body length.

You mean violating RFC 2616?

" The Content-Length entity-header field indicates the size of the

entity-body, in decimal number of OCTETs, sent to the recipient or,
in the case of the HEAD method, the size of the entity-body that
would have been sent had the request been a GET."

Also, HEAD requests, of course, shouldn't have this check.

It seems HEAD requests weren't skipped from the check, which is a bug.

You could also argue for not adding it for status codes 1xx, 204, and 304 (but they shouldn't have a non-zero Content-Length anyway).
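The exemptions discussed here (HEAD plus the body-less status codes) could be captured in a small predicate; a hedged Python sketch, not the actual patch:

```python
def should_verify_content_length(method, status):
    """Return True when comparing the body against Content-Length is valid.

    Per RFC 2616, a HEAD response advertises the length the corresponding
    GET body would have had, and 1xx, 204, and 304 responses carry no body,
    so the header cannot be checked against the (empty) body in those cases.
    """
    if method.upper() == "HEAD":
        return False
    if 100 <= status < 200 or status in (204, 304):
        return False
    return True
```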

Like POSTing or PUTing to Swift/S3 will give a content-length of the
data stored, not the response.

That's the content-length of the client request, not of the server reply, which is what we're dealing with here.
If the server replies to the POST with the request size, that violates the HTTP specification. How do you know the response length, then?

Really, only the caller of the Http class can determine if the
headers make sense for the body in this case.

The caller shouldn't need to manually check the content.

Reedy, can you provide the server headers in the reply to the output.rl POST?

(In reply to comment #11)

Like POSTing or PUTing to Swift/S3 will give a content-length of the
data stored, not the response.

That's the content-length of the client request, not of the server reply, which
is what we're dealing with, here.
If the server replies the POST with the request size, that violates the HTTP
specification. How do you know the response length, then?

No, it's the response to the POST that has this header used like this. The client knows that it gets only headers back for such things (and Swift will use statuses like 204 on success).

(In reply to comment #12)

No it's the response to the POST that has this header used like this. The
client knows that it gets only headers back for such things (and Swift will use
statuses like 204 on success).

Well, HTTP status 204 means "No content", so that would make it a slightly lesser violation.
Still, I see no good reason for doing it that way.

I don't see such behavior documented nor reflected in the doc samples, though:
http://docs.openstack.org/api/openstack-object-storage/1.0/content/create-update-object.html

(In reply to comment #13)

(In reply to comment #12)

No it's the response to the POST that has this header used like this. The
client knows that it gets only headers back for such things (and Swift will use
statuses like 204 on success).

Well, HTTP status 204 means "No content", so that would make it a slightly
lesser violation.
Still, I see no good reason for doing it that way.

I don't see such behavior documented nor reflected in the doc samples, though:
http://docs.openstack.org/api/openstack-object-storage/1.0/content/create-update-object.html

Right. I think I mixed this with something else.

This is still an issue. I am a newbie here and can't figure out where to report this bug, so here goes (please pass this on as necessary, as I can't figure out how to proceed). Here's my issue... Original message, 04/19/2016 16:55:

Al Adams wrote:

Hello: I am having a problem trying to download a PDF copy of a Wikipedia document. It is totally repeatable, on several browsers, for this web document: Robert Fuller (actor) - Wikipedia, the free encyclopedia. It says it has the document ready to download, but when I go to click Save As... it can't find the file. "The document file has been generated. Download the file to the computer" fails every time, and the "Create a book" feature doesn't seem to generate a book either.
Please help ...

This is the Wikipedia item I was accessing: Robert Fuller (actor) – Wikipedia, the free encyclopedia. "Leonard Leroy “Buddy” Lee (born July 29, 1933), better known by his stage name of Robert Fuller, is an American horse rancher and retired actor."


My Answer Came Back as...

Dear Al Adams

Unfortunately that "download PDF" function is buggy and ill-maintained. You are welcome to report the bug (see https://en.wikipedia.org/wiki/Wikipedia:Bug_reports_and_feature_requests on how to do so), but it's considered a low priority.
The best advice I can give you is to try again later. I apologize for the inconvenience.

Yours sincerely, Ingo Schröder
-- 

Wikipedia - https://en.wikipedia.org/

Disclaimer: all mail to this address is answered by volunteers, and responses are not to be considered an official statement of the Wikimedia Foundation. For official correspondence, please contact the Wikimedia Foundation by certified mail at the address listed on https://www.wikimediafoundation.org/