InstantCommons doesn't display DjVu on some installs
Closed, ResolvedPublic

Description

Reference the following examples:
http://www.wikilivres.ca/wiki/File:PhotographyTheoryAndPracticeOCRed.djvu#filehistory
http://wiki.doughendrick.com/wiki/File:Three_Books_of_Occult_Philosophy_(De_Occulta_Philosophia)_(1651).djvu
http://commons.wikimedia.beta.wmflabs.org/wiki/Help:DjVu

DjVu files are not thumbnailing.
In the File namespace, the description page is displayed, but the image is replaced with a gray box and the words: "Error creating thumbnail: Unable to fetch XML for DjVu file". A similar result is obtained when the DjVu is called inline as a
There is no documentation for this issue.

However, compare:
http://en.wikisource.org/wiki/File:Three_Books_of_Occult_Philosophy_(De_Occulta_Philosophia)_(1651).djvu

Use of DjVus on all Wikisources via InstantCommons is standard and does not exhibit this problem.

This issue has not yet been tested with PDFs or paged TIFFs.


Version: unspecified
Severity: normal

Details

Reference
bz37764

Event Timeline

bzimport raised the priority of this task to Medium (Nov 22 2014, 12:25 AM).
bzimport added a project: MediaWiki-DjVu.
bzimport set Reference to bz37764.

Assigning to Antoine, as I think this is part of the ongoing beta cluster configuration.

The files are there. This looks like an issue with DjVuImage::retrieveMetaData(). That method relies on command-line utilities to get the metadata as XML:

  • djvudump (output converted to XML)
  • djvutoxml

Looks like $wgDjvuDump is properly set up:

$ mwscript eval.php commonswiki

var_dump( $wgDjvuDump );

string(17) "/usr/bin/djvudump"

var_dump( $wgDjvuToXML );

NULL
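
To make the consequence of those two values concrete, here is a minimal Python sketch of the tool selection. It is not MediaWiki's PHP code: the function name `pick_metadata_tool` is hypothetical, and the preference order (djvudump first, then djvutoxml) is an assumption that may not match DjVuImage::retrieveMetaData() exactly.

```python
from typing import Optional, Tuple


def pick_metadata_tool(
    djvu_dump: Optional[str], djvu_to_xml: Optional[str]
) -> Optional[Tuple[str, str]]:
    """Return (tool_name, path) for the first configured utility, or None.

    Assumption: djvudump is preferred over djvutoxml when both are set.
    """
    if djvu_dump:
        return ("djvudump", djvu_dump)
    if djvu_to_xml:
        return ("djvutoxml", djvu_to_xml)
    return None


# Values reported by eval.php in this ticket: $wgDjvuDump is set,
# $wgDjvuToXML is NULL.
print(pick_metadata_tool("/usr/bin/djvudump", None))
```

Under these assumptions only the djvudump path is usable on the beta cluster, so a missing djvutoxml setting alone would not explain the failure.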

The first utility does seem to be installed on the beta apaches:

deployment-apache30:~$ djvudump
DJVUDUMP --- DjVuLibre-3.5.24
Describes DjVu and IFF85 files

Will need a bit more investigation.

So I eventually found that a call to ForeignAPIFile->getMetaData() returns null. Need to find out why there is no metadata.

ForeignAPIFile sends a query to api.php, something like:

curl 'http://commons.wikimedia.org/w/api.php?titles=File%3AAlice_in_Wonderland.djvu&iiprop=timestamp%7Cuser%7Ccomment%7Curl%7Csize%7Csha1%7Cmetadata%7Cmime&prop=imageinfo&iimetadataversion=2&format=json&action=query&redirects=true'
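
For anyone wanting to reproduce the request for other files, the same URL can be assembled programmatically. This is a minimal Python sketch, not what ForeignAPIFile itself does; the helper name `build_imageinfo_url` is hypothetical, and the parameters are copied from the curl call above.

```python
from urllib.parse import urlencode

API_ENDPOINT = "http://commons.wikimedia.org/w/api.php"


def build_imageinfo_url(title: str) -> str:
    """Build an imageinfo query URL using the parameters from the curl call."""
    params = {
        "titles": title,
        "iiprop": "|".join(
            ["timestamp", "user", "comment", "url", "size", "sha1", "metadata", "mime"]
        ),
        "prop": "imageinfo",
        "iimetadataversion": "2",
        "format": "json",
        "action": "query",
        "redirects": "true",
    }
    # urlencode percent-encodes ':' as %3A and '|' as %7C, as in the curl URL.
    return API_ENDPOINT + "?" + urlencode(params)


print(build_imageinfo_url("File:Alice_in_Wonderland.djvu"))
```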

The JSON returned apparently does not contain any metadata:

{
  "query": {
    "normalized": [{
      "from": "File:Alice_in_Wonderland.djvu",
      "to": "File:Alice in Wonderland.djvu"
    }],
    "pages": {
      "3567014": {
        "pageid": 3567014,
        "ns": 6,
        "title": "File:Alice in Wonderland.djvu",
        "imagerepository": "local",
        "imageinfo": [{
          "timestamp": "2007-10-01T21:06:20Z",
          "user": "Yann",
          "size": 3548027,
          "width": 2550,
          "height": 3562,
          "pagecount": 114,
          "comment": "{{Information\n|Description=Alice in Wonderland\n|Source=http:\/\/gutenberg.cc\/DjVu.htm\n|Date=1865\n|Author=Lewis Carroll\n|Permission=\n|other_versions=\n}}\n\n[[Category:En Wikisource book djvu]]\n",
          "url": "http:\/\/upload.wikimedia.org\/wikipedia\/commons\/2\/29\/Alice_in_Wonderland.djvu",
          "descriptionurl": "http:\/\/commons.wikimedia.org\/wiki\/File:Alice_in_Wonderland.djvu",
          "sha1": "0c0fff99f7c61272a602edb870ff59f3564e3723",
          "metadata": null,
          "mime": "image\/vnd.djvu"
        }]
      }
    }
  }
}
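
The same check can be automated when testing several files. The following is a minimal Python sketch with a hypothetical helper, `titles_without_metadata`, that lists pages whose imageinfo entry has a null (or absent) `metadata` field. The sample response is a trimmed copy of the JSON above, not a fresh API result.

```python
import json
from typing import List


def titles_without_metadata(response: dict) -> List[str]:
    """Return titles of pages with a null or missing imageinfo metadata field."""
    missing = []
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        for info in page.get("imageinfo", []):
            # .get() returns None both for JSON null and for an absent key.
            if info.get("metadata") is None:
                missing.append(page.get("title", ""))
                break
    return missing


# Trimmed copy of the response quoted in this ticket.
sample = json.loads("""
{"query": {"pages": {"3567014": {
  "title": "File:Alice in Wonderland.djvu",
  "imageinfo": [{"pagecount": 114, "metadata": null, "mime": "image/vnd.djvu"}]
}}}}
""")
print(titles_without_metadata(sample))
```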

Created attachment 10792
djvudump of the Alice .djvu file

djvudump of the Alice .. .djvu file from the Wikimedia apache server srv247.

Stepping out of this bug and returning it to the pool. I am totally unfamiliar with our image metadata system and with the DjVu system specifically.

Patch received a -1. Tpt, would you have time to rework your patch?

Change 99544 had a related patch set uploaded by Brian Wolff:
Make DjVu metadata be stored as serialized PHP array.

https://gerrit.wikimedia.org/r/99544

Change 24660 abandoned by Tpt:
(bug 37764) Allow imageInfo API query to give XML DJVU metadata

Reason:
I5c1d2d2434f70b57137837bade797d4133c47b70 is a far better solution.

https://gerrit.wikimedia.org/r/24660

Adding bug 35925 (tracking), since this bug may block people interested in running Wikisource forks (DjVu is mostly used on Wikisource).

Change 99544 merged by jenkins-bot:
Make DjVu metadata be stored as serialized PHP array.

https://gerrit.wikimedia.org/r/99544

All patches mentioned in this report are either merged or abandoned - is there more work left to do here (if yes: please reset the bug report status to NEW or ASSIGNED), or can you close this ticket as RESOLVED FIXED?

No reply to comment 13 - assuming this bug is FIXED.
If that is not the case: Please reopen and elaborate what is left to do here to get this report fixed.