commons fails to purge large djvu files
Closed, Resolved, Public

Description

For the past few days, attempts to purge large djvu files have failed, so the text layer of those files is not accessible. After a successful purge, creating a page should show the text layer here: http://fr.wikisource.org/wiki/Livre:Croiset_-_Histoire_de_la_litt%C3%A9rature_grecque,_t4.djvu


Version: 1.16.x
Severity: normal
URL: http://commons.wikimedia.org/wiki/File:Croiset_-_Histoire_de_la_litt%C3%A9rature_grecque,_t4.djvu

Details

Reference
bz21809

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 21 2014, 10:50 PM
bzimport set Reference to bz21809.

thomasV1 wrote:

The bug can be seen for the following files:

http://commons.wikimedia.org/wiki/File:Burnouf_-_Lotus_de_la_bonne_loi.djvu
http://commons.wikimedia.org/wiki/File:Croiset_-_Histoire_de_la_litt%C3%A9rature_grecque,_t4.djvu
http://commons.wikimedia.org/wiki/File:Michaud_-_Biographie_universelle_ancienne_et_moderne_-_1843_-_Tome_10.djvu

The first two were uploaded recently, and their djvu text layer has not been extracted,
either because of this bug or because the text layer extraction itself causes the bug.

The last file was uploaded a long time ago, when it could still be purged, so
its djvu text was successfully extracted and is still available in the metadata.

I tested the first file on my machine with a recent MediaWiki install, and it worked fine:
the file can be purged and the text layer is correctly extracted.

lars wrote:

When I try to "purge" the large djvu file from Commons, I get an HTTP 500 internal server error response after exactly 30 seconds. Why is purge taking so long? It should just remove old stuff (supposedly a quick operation), and then schedule a queued job for reindexing (a slower operation, depending on the job queue length).
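
The split described in the comment above (a quick invalidation followed by a deferred slow job) could be sketched roughly as follows. This is a minimal Python illustration, not MediaWiki code: `purge`, `run_one_job`, `cached_metadata`, and `job_queue` are hypothetical names, and the in-memory dictionary and queue only stand in for a real metadata store and job queue.

```python
import queue

# Hypothetical in-memory stand-ins; not MediaWiki's actual cache or job queue.
cached_metadata = {"Example.djvu": "stale text layer"}
job_queue = queue.Queue()


def purge(title):
    """Drop stale cached data and schedule the slow work, then return at once."""
    cached_metadata.pop(title, None)        # quick step: forget the old metadata
    job_queue.put(("extract_text", title))  # slow step is deferred to a worker
    return "purged"


def run_one_job(extract):
    """Worker side: take one queued job and run the slow extraction."""
    kind, title = job_queue.get()
    if kind == "extract_text":
        cached_metadata[title] = extract(title)
    job_queue.task_done()
```

With this shape the web request only pays for the dictionary removal and the enqueue, so it would not be tied to how long the text layer extraction takes.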

thomasV1 wrote:

Purge takes time because of the djvu text layer extraction.

The bug should be fixed in r61258.

thomasV1 wrote:

Reopening this bug because the fix is not live.

Bryan.TongMinh wrote:

Fix has been deployed, but purging still doesn't work.

Purging works fine now. Bryan, which File: page fails to purge for you?

Bryan.TongMinh wrote:

I got a 403 "Wikimedia has an error" page when trying to purge http://commons.wikimedia.org/wiki/File:Uppslagsbok_f%C3%B6r_alla_1910.djvu (presumably a timeout, because the page takes a long while to load).
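
For anyone trying to reproduce this, a purge request can be built from a file page URL along these lines. This is a small Python sketch: `purge_url` is a hypothetical helper, and the `/w/index.php` script path is an assumption about how the wiki is configured. The `action=purge` query parameter is the standard MediaWiki purge action.

```python
from urllib.parse import unquote, urlencode, urlsplit, urlunsplit


def purge_url(file_page_url):
    """Turn a /wiki/<Title> URL into the matching index.php?action=purge URL."""
    parts = urlsplit(file_page_url)
    prefix = "/wiki/"
    if not parts.path.startswith(prefix):
        raise ValueError("expected a /wiki/<Title> URL")
    title = unquote(parts.path[len(prefix):])  # decode %C3%B6 and similar escapes
    query = urlencode({"title": title, "action": "purge"})
    # "/w/index.php" is an assumed script path, not taken from this report.
    return urlunsplit((parts.scheme, parts.netloc, "/w/index.php", query, ""))
```

The helper only builds the URL. It sends no request, so timing the purge is left to whatever HTTP client is used.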