I am using a library called EXIFextractor to extract metadata from images. The library relies in part on System.Drawing.Imaging.PropertyItem to do all the hard work. According to the Microsoft documentation, some of the data in PropertyItem, such as image details, is returned as an ASCII string stored in a byte[].

My problem is that international characters (å, ä, ö, and so on) are dropped and replaced by question marks. When I debug the code, it is apparent that the byte[] holds a UTF-8 representation of the text.

I'd like to parse the byte[] as a UTF-8 string. How can I do this without losing any information in the process?

Thanks in advance!


Update:

I have been asked to provide a snippet from my code:

The first snippet is from the class I use, namely EXIFextractor.cs, written by Asim Goheer:

foreach( System.Drawing.Imaging.PropertyItem p in parr )
{
    string v = "";

    // ...

    else if( p.Type == 0x2 )
    {
        // string
        v = ascii.GetString(p.Value);
    }
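
A minimal sketch of why the ascii.GetString call above produces replacement characters, assuming the property bytes really are UTF-8. The byte values and the class name AsciiVsUtf8 are made up for illustration and are not taken from EXIFextractor.

```csharp
using System;
using System.Text;

static class AsciiVsUtf8
{
    static void Main()
    {
        // Illustrative bytes: "blå" encoded as UTF-8 ("å" is the pair 0xC3 0xA5).
        byte[] raw = { 0x62, 0x6C, 0xC3, 0xA5 };

        // Bytes above 0x7F are not valid ASCII, so the ASCII decoder
        // substitutes a replacement character for each of them.
        string viaAscii = Encoding.ASCII.GetString(raw);

        // The UTF-8 decoder combines 0xC3 0xA5 into the single character U+00E5.
        string viaUtf8 = Encoding.UTF8.GetString(raw);

        Console.WriteLine(viaAscii.Length); // one char per input byte
        Console.WriteLine(viaUtf8.Length);  // the two-byte sequence collapses to one char
        Console.WriteLine(viaUtf8 == "bl\u00E5");
    }
}
```

Once the ASCII decoder has substituted the characters, the original bytes are gone from the string, so any later re-encoding of v cannot bring them back. The decoding has to be fixed where the byte[] is first read.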

And this is my code, where I try my best to handle the results of the above:

try {
    EXIFextractor exif = new EXIFextractor(ref bmp, "");
    object o;
    if ((o = exif["Image Description"]) != null)
        MediaFile.Description = Tools.UTF8Encode(o.ToString());

I have also tried a couple of other ways of getting my precious å, ä, ö from the data, but nothing seems to do the trick. I am starting to think Hans Passant is right about his conclusions in his answer below.

+3  A: 

Use the GetString method on the Encoding.UTF8 object.

Tim Robinson
+10  A: 
string yourText = Encoding.UTF8.GetString(yourByteArray);
Scoregraphic
Thanks for the swift answer. However I have already tried this. No luck. I am starting to wonder if the sources (image files) are correctly encoded in the first place.
dotmartin
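
One way to check whether the bytes in a file are valid UTF-8 at all is to decode them with a strict UTF8Encoding, which throws instead of silently substituting characters. This is only a sketch: IsValidUtf8 and the sample byte arrays are hypothetical names and values, not something from the thread.

```csharp
using System;
using System.Text;

static class Utf8Check
{
    // Hypothetical helper: returns true when the bytes decode as UTF-8 without errors.
    public static bool IsValidUtf8(byte[] bytes)
    {
        // throwOnInvalidBytes: true makes GetString throw on malformed input
        // instead of inserting replacement characters.
        var strict = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false,
                                      throwOnInvalidBytes: true);
        try
        {
            strict.GetString(bytes);
            return true;
        }
        catch (ArgumentException) // DecoderFallbackException derives from ArgumentException
        {
            return false;
        }
    }

    static void Main()
    {
        byte[] utf8Bytes = { 0x62, 0x6C, 0xC3, 0xA5 };  // "blå" as UTF-8
        byte[] latin1Bytes = { 0x62, 0x6C, 0xE5 };      // "blå" as ISO-8859-1

        Console.WriteLine(IsValidUtf8(utf8Bytes));   // True
        Console.WriteLine(IsValidUtf8(latin1Bytes)); // False
    }
}
```

A passing check does not prove the text is UTF-8 (pure ASCII passes too), but a failing one rules UTF-8 out.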
If you can share an example, we may check or try on our own.
Scoregraphic
Of course. Since I am new at this, shall I provide it as an answer or in a comment or what is the preferred way of doing this?
dotmartin
You should edit and update your question. A bold "Update" label in the text with the "new" stuff should do.
Scoregraphic
Thanks for the guidance! :)
dotmartin
Please see my comment in Hans Passant's answer
Scoregraphic
Alright, this seems to be the solution after all. Sort of, at least. I was just a bit off regarding the encoding. The metadata seems to be encoded using ISO-8859-1, which makes sense since we are using Windows across all our sites. So I simply create an encoding object: Encoding enc = Encoding.GetEncoding("ISO-8859-1"); Then I use it to decode the byte array: v = enc.GetString(p.Value, 0, p.Len - 1); where p is the PropertyItem. This seems to work! Thanks for all your help! I am impressed by your enthusiasm and your helpfulness. I sure hope I can contribute in the same way! Again, thanks!
dotmartin
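
The approach described in the comment above can be sketched as a small helper. ExifText.Decode is a hypothetical name, and trimming trailing NUL characters is an assumption standing in for the p.Len - 1 arithmetic, so that values without a terminator do not lose their last character.

```csharp
using System;
using System.Text;

static class ExifText
{
    // Hypothetical helper, not part of EXIFextractor: decodes the value of a
    // type 0x2 (string) property with the given encoding and drops trailing NULs.
    public static string Decode(byte[] value, Encoding encoding)
    {
        if (value == null || value.Length == 0)
            return string.Empty;

        return encoding.GetString(value).TrimEnd('\0');
    }

    static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");

        // Illustrative bytes: "blå" in ISO-8859-1 followed by a NUL terminator.
        byte[] raw = { 0x62, 0x6C, 0xE5, 0x00 };

        string text = Decode(raw, latin1);
        Console.WriteLine(text == "bl\u00E5");
        Console.WriteLine(text.Length);
    }
}
```

Passing the encoding in as a parameter keeps the helper usable for files that turn out to be UTF-8, for example Decode(p.Value, Encoding.UTF8).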
You're welcome :D
Scoregraphic
A: 

Maybe you could try another encoding? UTF-16, Unicode? If you aren't sure whether it was encoded correctly in the first place, try viewing the EXIF metadata with another EXIF reader.

codymanix
+1  A: 

Yes, this is a problem with the app or camera that originated the image. The EXIF standard has horrible support for text: it has to be encoded in ASCII. That only ever works out well when the photographer speaks English. No doubt the software that encoded the image is ignoring this requirement. Which is what the PropertyItem class is doing as well: it encodes a string to byte[] with Marshal.StringToHGlobalAnsi(), which assumes the system's default code page.

There's no obvious fix for this. You'll get mojibake when the photo was made too far away from your machine.

Hans Passant
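
To illustrate the mojibake mentioned in the answer above, here is a small sketch that writes text with one encoding and reads it back with another. ISO-8859-1 is used as a stand-in for "the system's default code page", which is an assumption; the real default depends on the machine.

```csharp
using System;
using System.Text;

static class MojibakeDemo
{
    static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1"); // stand-in for a default code page
        string original = "bl\u00E5"; // "blå"

        // The writer stores UTF-8, the reader assumes ISO-8859-1:
        // the bytes 0xC3 0xA5 are read as the two characters U+00C3 and U+00A5.
        byte[] stored = Encoding.UTF8.GetBytes(original);
        string garbled = latin1.GetString(stored);
        Console.WriteLine(garbled == "bl\u00C3\u00A5"); // True

        // ISO-8859-1 maps every byte to exactly one character, so the original
        // bytes can be recovered from the garbled string and decoded again.
        byte[] recovered = latin1.GetBytes(garbled);
        string repaired = Encoding.UTF8.GetString(recovered);
        Console.WriteLine(repaired == original); // True
    }
}
```

The repair step only works because ISO-8859-1 is lossless for arbitrary bytes. A string that went through ASCII decoding first has already lost the information.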
This was what I expected. However, I was still hoping that Photoshop and the built-in tool by XMP would be able to get things straight. Are there any suggestions on what one could do to resolve the issue? My company has a lot of files with bad encoding, so a batch processor would be preferred.
dotmartin
Is it still true that in the byte-array all bytes are correct according to your locale? If it is, you may try encoding/decoding using your locale instead of UTF8 / ascii. See http://msdn.microsoft.com/en-us/library/system.text.encoding.getencoding.aspx
Scoregraphic
No luck. I still get the question marks.
dotmartin
I downloaded an application called GeoSetter, which is used to geotag photos but also has the capability to read and write EXIF and IPTC metadata. It tells me that the metadata is UTF-8 encoded and displays the Swedish characters correctly.
dotmartin
I wonder if you could add an example of such a picture (if allowed). You may edit the picture as well, as long as the EXIF data is still written.
Scoregraphic
I might be on the right course towards a solution. I have managed to edit the EXIFextractor class to translate the byte array to a correctly encoded string right away. I will conduct some more research and soon be able to tell if my theories hold up!
dotmartin