
Show metadata information from SVGs on file description page
Closed, ResolvedPublic

Description

Author: thomas

Description:
Add the ability to show metadata from SVGs

Inkscape and other SVG editors let you add a lot of useful metadata to SVGs, which duplicates the information that we currently put on the image page.

Example:
http://commons.wikimedia.org/wiki/Image:Asl_alphabet_gallaudet_ann.svg
contains the licence, language, author, description, etc.

The attached patch does two things:

  1. adds the ability to show SVG metadata on image pages, and
  2. renames wgShowEXIF to wgShowImageMetadata. I'm not sure how to auto-detect this as we did before, so for the moment it's hardcoded to 1.

Version: 1.12.x
Severity: enhancement

Attached:

Details

Reference
bz12649

Event Timeline

bzimport raised the priority of this task to Medium (Nov 21 2014, 10:02 PM).
bzimport set Reference to bz12649.
bzimport added a subscriber: Unknown Object (MLST).

thomas wrote:

Other things I need to fix up, looking over it again:

  • The equivalent function for finding the height and width of an SVG uses a regex and not an XML parser; I need to write a comment explaining why an XML parser is really needed here
  • I think everyone uses the "cc" and "dc" prefix for those namespaces, but really we need to ask the XML parser to turn them back into the full namespace names for disambiguation
  • something elsewhere requires getMetadataType() to return "exif" when there's useful information; we should fix that and not just work around it here
  • I need to find out all the URLs for all the licences.
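The point about prefixes versus full namespace names can be sketched as follows. This is an illustration in Python rather than the PHP patch itself, and the sample document, the `extract_metadata` helper and the deliberately odd `dublin` prefix are all made up for the example. The idea is only that matching on the namespace URI keeps working whatever prefix the file's author chose.

```python
import xml.etree.ElementTree as ET

# Full namespace URIs; the prefixes used in a given file are arbitrary.
DC_NS = "http://purl.org/dc/elements/1.1/"
CC_NS = "http://web.resource.org/cc/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical sample: note the Dublin Core prefix is "dublin", not "dc".
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:dublin="http://purl.org/dc/elements/1.1/"
     xmlns:cc="http://web.resource.org/cc/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <metadata>
    <rdf:RDF>
      <cc:Work rdf:about="">
        <dublin:title>Example drawing</dublin:title>
        <dublin:language>en</dublin:language>
        <cc:license rdf:resource="http://web.resource.org/cc/PublicDomain"/>
      </cc:Work>
    </rdf:RDF>
  </metadata>
</svg>"""


def extract_metadata(svg_text):
    """Collect Dublin Core fields and the licence URL, keyed by local name."""
    root = ET.fromstring(svg_text)
    found = {}
    for element in root.iter():
        # ElementTree reports tags as "{namespace-uri}localname",
        # so the prefix in the source file never enters the comparison.
        if element.tag.startswith("{" + DC_NS + "}"):
            local_name = element.tag.split("}", 1)[1]
            found[local_name] = (element.text or "").strip()
        elif element.tag == "{" + CC_NS + "}license":
            found["license"] = element.get("{" + RDF_NS + "}resource")
    return found


metadata = extract_metadata(SAMPLE_SVG)
```

A parser that compared the literal string "dc:title" would miss every field in this sample, which is the disambiguation problem described in the list above.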

A warning -- XMLReader isn't included in PHP 5.0, and while it's enabled by default in 5.1 it's possible that not all distro builds include it.

If it's not convenient to rewrite it to the uglier event-based XML parser, you should ensure that the functionality degrades gracefully when the extension class isn't present.

+ wfMsg('license='.$xml->getAttribute('rdf:resource')));
[snip]
+'license=http://web.resource.org/cc/PublicDomain' => 'Public domain',

^^ That's probably not such a good idea; while it might work at the moment, you can reasonably expect it to break. Message keys shouldn't contain arbitrary URLs...

This technique will also show really ugly output for anything not already thought of and added to a list. It might be better to show a nicely formatted default instead for unknown values.

thomas wrote:

Good point: I'll use a hash table to map the licence URLs to unique identifiers in the i18n table, and something similar will do very nicely for mapping the namespace names to unique identifiers too.

What sort of nicely formatted output are you thinking of? It looks something like

<svgmeta-foobar>     42

at the moment.
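A minimal sketch of the hash-table idea, again in Python for illustration only. The key name `svgmeta-licence-pd`, the `MESSAGES` stand-in for the i18n table and the `describe_licence` helper are hypothetical; only the Public Domain URL comes from this thread. Falling back to the bare URL for unknown licences is one possible default, not something agreed here.

```python
# Map licence URLs to stable message keys, so that no URL ever
# appears inside a message key. The key name is hypothetical.
LICENCE_MESSAGE_KEYS = {
    "http://web.resource.org/cc/PublicDomain": "svgmeta-licence-pd",
}

# Stand-in for the i18n message table.
MESSAGES = {
    "svgmeta-licence-pd": "Public domain",
}


def describe_licence(url):
    """Return the localised licence name, or the URL if it is unknown."""
    key = LICENCE_MESSAGE_KEYS.get(url)
    if key is not None and key in MESSAGES:
        return MESSAGES[key]
    # Unknown licence: show the URL itself rather than a
    # missing-message placeholder such as <svgmeta-foobar>.
    return url


known = describe_licence("http://web.resource.org/cc/PublicDomain")
unknown = describe_licence("http://example.org/some-other-licence")
```

With this shape, adding a licence means adding one entry to each table, and a file carrying a licence nobody has listed still produces readable output.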

What about XPath?

There's a PHP library on SourceForge which does XPath fairly nicely. You can find things inside an XML document by giving it the text of the document, and it works on PHP 4 through 5 without dependencies on any of the XML libraries.

http://sourceforge.net/projects/phpxpath/
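To show what an XPath-style lookup buys here, the sketch below uses the limited XPath support in Python's standard `xml.etree.ElementTree`. It is not the phpxpath library's API, which is not reproduced here, and the sample document is made up. The prefixes in `NAMESPACES` are local to the query and are resolved to full namespace URIs before matching.

```python
import xml.etree.ElementTree as ET

# Query-side prefixes mapped to full namespace URIs.
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "cc": "http://web.resource.org/cc/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

# Hypothetical sample document.
SVG_TEXT = """<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:cc="http://web.resource.org/cc/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <metadata>
    <rdf:RDF>
      <cc:Work rdf:about="">
        <dc:title>Example drawing</dc:title>
        <cc:license rdf:resource="http://web.resource.org/cc/PublicDomain"/>
      </cc:Work>
    </rdf:RDF>
  </metadata>
</svg>"""

root = ET.fromstring(SVG_TEXT)

# One path expression per field replaces a hand-written tree walk.
title = root.findtext(".//cc:Work/dc:title", namespaces=NAMESPACES)

licence_element = root.find(".//cc:license", NAMESPACES)
# Attribute names are looked up by full "{namespace-uri}localname".
licence_url = licence_element.get("{%s}resource" % NAMESPACES["rdf"])
```

Each metadata field becomes a single declarative path, which is the main readability gain over walking the tree by hand.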

thomas wrote:

That's a really good plan, and would make everything much cleaner. I'll give it a whirl.

Basics for metadata collection are now in r75968.

Still needs a lot of work though. I held off on the RDF parsing, because I'd first like to see the new metadata branch merged into trunk.