I am writing an open-source library to extract image metadata and store it as XMP serialized in an XML sidecar (ideally identical to how Adobe's apps serialize their metadata).

My problem is that BitmapMetadata appears to have all of the values I need, but the keys are mangled. Many of them are just integers rather than their corresponding XMP namespaced XML-style names. Microsoft claims that BitmapMetadata uses XMP and reads/writes many different formats of metadata within media, but I do not see any way to reconstruct some of the standard XMP names from these keys.

For example, what I have is Name="/{ushort=272}", Format="ifd", but what I need is <tiff:Model>, where xmlns:tiff="http://ns.adobe.com/tiff/1.0/". For this one, I can use the ExifTags from my ExifUtils library to map some of the keys, because I know what they are. I'm not sure about many of the others, though.
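To make the kind of mapping I mean concrete, here is a minimal sketch in Python (not the library's actual code). The function name `ifd_key_to_xmp` and the table `TIFF_TAG_NAMES` are hypothetical, and the table covers only a handful of well-known TIFF tag numbers. It assumes keys always look like `/{ushort=N}`.

```python
import re

TIFF_NS = "http://ns.adobe.com/tiff/1.0/"

# Partial, illustrative table: TIFF tag number -> XMP local name.
# A real mapping would need far more entries than these four.
TIFF_TAG_NAMES = {
    271: "Make",
    272: "Model",
    274: "Orientation",
    305: "Software",
}

# Assumption: the key is exactly "/{ushort=<decimal tag number>}".
_USHORT_KEY = re.compile(r"^/\{ushort=(\d+)\}$")


def ifd_key_to_xmp(name):
    """Return (namespace URI, prefixed name) for a known key, else None."""
    match = _USHORT_KEY.match(name)
    if match is None:
        return None  # not in the "/{ushort=N}" form
    local = TIFF_TAG_NAMES.get(int(match.group(1)))
    if local is None:
        return None  # tag number not in the partial table
    return (TIFF_NS, "tiff:" + local)


print(ifd_key_to_xmp("/{ushort=272}"))
# ('http://ns.adobe.com/tiff/1.0/', 'tiff:Model')
```

Unknown tag numbers return None here, which is exactly the gap I am asking about.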

My questions:

  1. Is anyone familiar enough with BitmapMetadata to know whether I'm headed down a dead end?

  2. Is there a standardized mapping that Microsoft is encoding to? I haven't found one yet in Adobe's official XMP specs.


Update: the open source code for this library is now available at Google Code as XmpUtils library. It supports reading/writing XMP metadata as the standard RDF-based XML.

A: 

I seem to have stumbled upon the key mapping on page 18 in Part 3 of the XMP spec. It looks like BitmapMetadata is simply exposing the JPEG-encoded XMP data sections:

The marker types FFE0-FFEF are generally used for application data, named APPn. By convention, an APPn marker begins with a string identifying the usage, called a namespace or signature string. An APP1 marker identifies Exif and TIFF metadata; an APP13 marker designates a Photoshop Image Resource (PSIR) that contains IPTC metadata; another APP1 marker designates the location of the XMP packet.

Not sure where the definitive list comes from as this seems incomplete.
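
The APPn layout described in that passage can be explored with a short script. This is a rough sketch in Python, not anything BitmapMetadata does internally: the function name `list_app_segments` is hypothetical, and it assumes a well-formed stream with no fill bytes between segments and stops at the first scan.

```python
import struct


def list_app_segments(data):
    """Return (n, signature) for each APPn segment found before the image data."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    found = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break  # not at a marker; give up rather than guess
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):
            break  # start of scan or end of image: no more headers
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if 0xE0 <= marker <= 0xEF:
            # The signature string runs up to the first NUL byte.
            signature = payload.split(b"\x00", 1)[0].decode("latin-1")
            found.append((marker - 0xE0, signature))
        pos += 2 + length
    return found
```

Called on the bytes of a JPEG file, it returns pairs such as (1, "Exif") for the Exif/TIFF block and (1, "http://ns.adobe.com/xap/1.0/") for the XMP packet, which matches the two different APP1 usages the spec mentions.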


Update:

I just stumbled upon a set of MSDN pages ("Photo Metadata Policy") which links to a fairly comprehensive list of Microsoft Metadata Query Language paths (on the left) for each property they support. It is an absolutely horrible format, with one path per page, but it seems to contain a lot of the data I need. Unfortunately, it looks like there are different paths for JPEG and TIFF...


Update:

Also, this page is key, as it defines the craziness that is this XPath-like syntax: Metadata Query Language Overview

McKAMEY
+1  A: 

As it turns out, the Windows Imaging Component (WIC) used by BitmapMetadata reads/writes many different types of metadata blocks, including TIFF, EXIF, IPTC, and XMP. This explains why, unfortunately, their object model does not correspond very closely to the XMP serialization model: it is highly generalized.

The key mapping I was looking for depends on which section is being decoded, and even in the case of XMP, it isn't exactly a clean conversion. The MSDN links in the other answer give detailed descriptions of Metadata Query Language, which is the XPath-like syntax that WIC uses to reference metadata sections within media. This is useful for parsing each path segment into a key, which may then be used to determine the corresponding XMP namespace and property name.
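
The path-segment parsing can be sketched roughly as follows. This is a simplified illustration in Python, not the code from the library: the function name `parse_query_path` is hypothetical, it only converts `ushort` items to integers, and it ignores escaping and any item whose value itself contains a slash.

```python
import re

# A typed item looks like "{type=value}", e.g. "{ushort=272}".
_TYPED_ITEM = re.compile(r"^\{(\w+)=(.*)\}$")


def parse_query_path(path):
    """Split a query path into segments.

    Plain segments stay strings; typed items become (type, value) tuples.
    """
    segments = []
    for raw in path.strip("/").split("/"):
        if not raw:
            continue  # skip empty pieces, e.g. from a bare "/"
        match = _TYPED_ITEM.match(raw)
        if match is None:
            segments.append(raw)
            continue
        kind, value = match.group(1), match.group(2)
        # Assumption: only "ushort" is treated as numeric in this sketch.
        segments.append((kind, int(value) if kind == "ushort" else value))
    return segments


print(parse_query_path("/app1/ifd/{ushort=272}"))  # JPEG-style path
# ['app1', 'ifd', ('ushort', 272)]
print(parse_query_path("/ifd/{ushort=272}"))       # TIFF-style path
# ['ifd', ('ushort', 272)]
print(parse_query_path("/xmp/tiff:Model"))
# ['xmp', 'tiff:Model']
```

The last typed segment, together with the block it sits in (ifd, exif, xmp, and so on), is what selects the XMP namespace and property name. The JPEG and TIFF paths differ only in their leading container segments.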

As I mentioned in the question, I've now built this library which correctly converts a very high percentage of the metadata properties from the TIFF, EXIF, and XMP blocks.

See the XmpUtils library source code for the details of how I ended up extracting this data in a standardized manner.

McKAMEY