Hi,

I need to create a file that embeds an image as text within some records. I'm having some trouble writing the images as text. What I'm doing is gathering the image as a byte array from a SQL database (image type) then I'm writing that image to a text file by going through each byte and writing that byte's ASCII equivalent to the file.

Before I can write that image to a text file, I must convert it to a TIFF (it was formerly a jpeg) in CCITT4 format. To double check that this is being done correctly, I also save the stream as a TIFF and view it in "AsTiffTagViewer," which shows that the compression is correct. I AM able to view the tiff in a proper viewer; however, when gathering the text from the file, I am unable to view the image.

Here's the code:

    byte[] frontImage = (byte[])imageReader["front_image"];
    MemoryStream frontMS = new MemoryStream(frontImage);
    Image front = Image.FromStream(frontMS);
    Bitmap frontBitmap = new Bitmap(front);
    Bitmap bwFront = ConvertToBitonal(frontBitmap);
    bwFront.SetResolution(200, 200);
    MemoryStream newFrontMS = new MemoryStream();
    bwFront.Save(newFrontMS, ici, ep);
    bwFront.Save("c:\\Users\\aarong\\Desktop\\C#DepositFiles\\" + checkReader["image_id"].ToString() + "f.tiff", ici, ep);
    frontImage = newFrontMS.ToArray();
    String frontBinary = toASCII(frontImage);

    private String toASCII(byte[] image)
    {
        String returnValue = "";
        foreach (byte imageByte in image)
        {
            returnValue += Convert.ToChar(imageByte);
        }

        return returnValue;
    }

It is frontBinary that's being written to the file. Does anyone have an idea as to what is wrong? The tiff that's saved is correct, yet the exact same byte array, when written as ASCII text, is not being written correctly.

Thank you.

EDIT: This issue was resolved by using BinaryWriter.Write(byte[]) to write the image bytes directly to the file instead of converting them to text first. Thank you all for your help!

+10  A: 

Well ASCII is only seven-bit, for one thing. However, I don't believe your code actually uses ASCII. It sort of uses ISO-8859-1, implicitly.

Never treat text as binary or vice versa. It will always lead to problems.

The best way of converting binary to ASCII text is to use Base64:

string text = Convert.ToBase64String(frontImage);
byte[] data = Convert.FromBase64String(text);

Also note that if your code did work, it would still be painfully inefficient - read up on StringBuilders and then consider that your code is semi-equivalent to

Encoding.GetEncoding(28591).GetString(data);

However, base64 is definitely the way to go to convert between text and binary data losslessly. You'll need to convert it back to binary in order to view the TIFF again, of course.
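To make the comparison concrete, here is a minimal sketch. It rewrites the question's loop with a StringBuilder, checks that the result matches the ISO-8859-1 decoder, and round-trips the bytes through Base64. The names `BinaryTextDemo` and `ToLatin1Chars` and the sample bytes are made up for illustration; they are not from the original code.

```csharp
using System;
using System.Linq;
using System.Text;

static class BinaryTextDemo
{
    // Same result as the question's toASCII, but without repeated string concatenation.
    public static string ToLatin1Chars(byte[] data)
    {
        var sb = new StringBuilder(data.Length);
        foreach (byte b in data)
        {
            sb.Append((char)b); // byte 0xNN becomes char U+00NN
        }
        return sb.ToString();
    }

    static void Main()
    {
        // Hypothetical sample: a little-endian TIFF header ("II*\0") followed by two high bytes.
        byte[] data = { 0x49, 0x49, 0x2A, 0x00, 0x80, 0xFF };

        // The loop and the ISO-8859-1 decoder should produce the same string.
        string viaLoop = ToLatin1Chars(data);
        string viaEncoding = Encoding.GetEncoding(28591).GetString(data);
        Console.WriteLine(viaLoop == viaEncoding);

        // Base64 gives plain ASCII text that converts back to the original bytes.
        string base64 = Convert.ToBase64String(data);
        byte[] restored = Convert.FromBase64String(base64);
        Console.WriteLine(base64);
        Console.WriteLine(restored.SequenceEqual(data));
    }
}
```

The Base64 text is about a third longer than the binary, which is the price of keeping the file pure text.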

Note that you haven't shown how you're saving or loading your data - you may have problems there too. In fact, I suspect that if you were able to save the string accurately, you might have been lucky and preserved the data, depending on exactly what you're doing with it... but go with base64 anyway.
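As a rough illustration of how the save step can change the data, the sketch below writes the same bytes as characters through a StreamWriter with two different encodings. The helper name `SaveAsText` and the sample bytes are assumptions for the example; the question does not show its actual writing code.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

static class SaveDemo
{
    // Writes each byte as a char through a StreamWriter and returns what lands in the stream.
    public static byte[] SaveAsText(byte[] data, Encoding encoding)
    {
        using (var ms = new MemoryStream())
        {
            using (var writer = new StreamWriter(ms, encoding))
            {
                foreach (byte b in data)
                {
                    writer.Write((char)b);
                }
            }
            return ms.ToArray(); // ToArray still works after the stream is closed
        }
    }

    static void Main()
    {
        // Hypothetical sample bytes, including two values above 0x7F.
        byte[] data = { 0x49, 0x49, 0x2A, 0x00, 0x80, 0xFF };

        // UTF-8 without a BOM, which is what StreamWriter uses when no encoding is given.
        byte[] utf8 = SaveAsText(data, new UTF8Encoding(false));
        // ISO-8859-1 maps chars U+0000..U+00FF straight back to single bytes.
        byte[] latin1 = SaveAsText(data, Encoding.GetEncoding(28591));

        Console.WriteLine(utf8.SequenceEqual(data));   // high bytes expand to two bytes each
        Console.WriteLine(latin1.SequenceEqual(data)); // the "lucky" case
    }
}
```

Only the ISO-8859-1 writer hands back the original bytes, and relying on that is fragile compared with Base64 or a plain binary write.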

Jon Skeet
great explanation +1
Demi
Hi Jon,Thank you very much for your explanation. Thanks also for the info on StringBuilders...I'll definitely use those instead of my needlessly inefficient concatenations.I'll be trying out your suggestions soon and will let you know how it goes.
Aaron
Hi Jon. I tried the Base64 String, and it certainly worked. I was able to write the image to the file and produce the image again when reading the file. Thank you for your help, and I'm sure that your explanation will help others with similar problems. :)
Aaron
Cool - glad it worked out for you :)
Jon Skeet
Hi again Jon. The vendor that I'm sending the file to just informed me that base64 is not the correct format, and it should be in "Intel Format" instead. My contact told me that the text should start with "II", which is what it started with when I was using my old method. If I go back to the old method, the text will still not be correct. I am using a StreamWriter object to actually write to the file, and I did some research suggesting that maybe a BinaryWriter is the way to go. Perhaps this is an underlying issue?
Aaron
See if you can get an absolutely precise file format - not just the name, but a full specification. If it's meant to be a *text* file then it shouldn't have arbitrary bits of binary data in it.
Jon Skeet
I do have the exact specifications. It states "All characters and symbols must be represented using 8-bit EBCDIC encoding with the exception of the Image Data field, which is binary data." It further states that the image must be TIFF 6.0 with CCITT4 compression, which I tested to be correct when I saved just the image to a file. Perhaps what you said earlier about getting the image from the database is incorrect? Maybe I'm losing data somewhere when I'm loading it.
Aaron
Ah, so you're using EBCDIC for the rest. Well if you were trying to write out binary data converted into a string using ISO-8859-1 and then writing that string out with an EBCDIC encoding that would certainly be a problem. BinaryWriter may well be exactly what you want in this case. However, you'll have to use Write(char[]) instead of Write(string) for your text data, as you don't want the length prefix.
Jon Skeet
Thank you very much again for the response. I really appreciate your help. I'll let you know how it works out.
Aaron
Hi again Jon. So I used BinaryWriter.Write(byte[]) to write the image to a .data file as well as a .tiff file. I then compared the two files using Beyond Compare, and they show no differences. Just to check, I reverted to my previous way of writing the image, and there were plenty of differences. Hopefully, I now have this image writing correctly. Thanks again very much for your help! I'm going to have to buy your second edition of C# in Depth when it comes out :).
Aaron
That will definitely dump the data verbatim to the file. Now you've just got to get the rest of it right :) If you're stuck for EBCDIC encodings, have a look at http://pobox.com/~skeet/csharp/ebcdic - but be aware that there are various different forms of EBCDIC.
Jon Skeet
Great. Thanks for the info! I'll let you know how everything works out.
Aaron
A: 

One approach to converting binary data to text is to use a StreamReader and provide the desired encoding. As Jon mentioned above, it is unwise to use ASCII, but in case anyone DOES want to stream binary to some other text encoding, here is some code to do it.

public static String GetString(System.IO.Stream inStream)
{
    // Decodes the whole stream as text; bytes that are not valid in the chosen encoding are not preserved.
    using (var reader = new System.IO.StreamReader(inStream, System.Text.Encoding.ASCII)) // or any other encoding
    {
        return reader.ReadToEnd();
    }
}
Will Charczuk
No, that's still a very bad idea - because text and binary data just don't play together nicely like that, particularly with ASCIIEncoding which will clear the top bit of every byte.
Jon Skeet
I'm just giving the dude what he asked for which is binary data converted to ascii, not passing judgment on the purpose or the end result.
Will Charczuk
Whereas I don't think it's a good idea to give someone exactly what they ask for if I know it's going to lose data (without even saying that it'll happen). Clearly that goes against the OP's bigger goal.
Jon Skeet
It's understandable to want to keep useless or detritus posts off S.O. I'll clean this post up to show how to stream any binary to a text encoding; admittedly ASCII is bad in this case.
Will Charczuk
A: 

Is there a specific reason why you use text instead of a binary file?

Storing binary data in text files is always a bad idea since encodings may convert the bytes to another representation and special characters like linefeed may also be treated specially and converted.

Either store the data as a byte array in a binary file, or use a proper binary-to-ASCII conversion such as Jon's Base64 proposal. A list of comma-separated hex values would also work.
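For the hex option, a minimal sketch might look like this. `HexList`, `ToHexList` and `FromHexList` are names made up for the example, and it assumes two-digit uppercase hex values separated by commas.

```csharp
using System;
using System.Linq;

static class HexList
{
    // Formats each byte as two uppercase hex digits, separated by commas.
    public static string ToHexList(byte[] data)
    {
        return string.Join(",", data.Select(b => b.ToString("X2")));
    }

    // Parses a comma-separated hex list back into bytes.
    public static byte[] FromHexList(string text)
    {
        if (text.Length == 0)
        {
            return new byte[0];
        }
        return text.Split(',').Select(h => Convert.ToByte(h, 16)).ToArray();
    }

    static void Main()
    {
        byte[] data = { 0x49, 0x2A, 0x00, 0xFF }; // hypothetical sample bytes
        string text = ToHexList(data);
        Console.WriteLine(text);
        Console.WriteLine(FromHexList(text).SequenceEqual(data));
    }
}
```

This roughly triples the size of the data, so Base64 is usually the more compact choice.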

codymanix
A: 

If you are writing only the image data to the file, you should not write it as text at all, but as binary data.

If you are mixing text and binary data in the file, you should not convert the binary data to text. Converting back and forth might work with some specific encodings, but turning each byte into a Unicode character (using Convert.ToChar) will certainly not survive being written out with an arbitrary encoding.

Do it the other way around. Encode the text into binary data using the GetBytes method of the proper Encoding object, so that you only have binary data to write to the file.
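A minimal sketch of that idea follows, assuming a made-up record layout of one text field followed by the image bytes. `RecordWriter`, `BuildRecord` and the "HDR" field are hypothetical, and Encoding.ASCII only stands in for whatever encoding the real file format requires.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

static class RecordWriter
{
    // Hypothetical layout: an encoded text field immediately followed by raw image bytes.
    public static byte[] BuildRecord(string header, byte[] image, Encoding textEncoding)
    {
        using (var ms = new MemoryStream())
        using (var writer = new BinaryWriter(ms))
        {
            writer.Write(textEncoding.GetBytes(header)); // text becomes bytes, no length prefix
            writer.Write(image);                         // image bytes are written verbatim
            writer.Flush();
            return ms.ToArray();
        }
    }

    static void Main()
    {
        byte[] image = { 0x49, 0x49, 0xFF }; // hypothetical stand-in for the TIFF bytes
        // Encoding.ASCII is a placeholder; pass the encoding your format actually specifies.
        byte[] record = BuildRecord("HDR", image, Encoding.ASCII);
        Console.WriteLine(BitConverter.ToString(record));
    }
}
```

Because the text is turned into bytes before it reaches the writer, the image bytes never pass through a character encoding at all.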

Guffa
Converting binary data to text with base64 (or similar - you could use base16, for example) is fine. It sounds like the output of this has to be a record format that another tool can load - which suggests that he can't really define the whole file format, as would be suggested by changing to write everything in binary.
Jon Skeet
A: 

You're probably reading the database back as Unicode, which will alter some of the binary values in the image.

You can use methods on the System.IO.File class to read/save as binary and text. These might help along with the Base64 options mentioned above.
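For example, something along these lines uses File.WriteAllBytes / File.ReadAllBytes for the binary route and File.WriteAllText / File.ReadAllText with Base64 for the text route. `FileDemo` and `RoundTrips` are names invented for this sketch, and it writes to temporary files rather than any real path.

```csharp
using System;
using System.IO;
using System.Linq;

static class FileDemo
{
    // Saves the bytes once as binary and once as Base64 text, then checks both read back intact.
    public static bool RoundTrips(byte[] data)
    {
        string binPath = Path.GetTempFileName();
        string txtPath = Path.GetTempFileName();
        try
        {
            File.WriteAllBytes(binPath, data);
            byte[] fromBinary = File.ReadAllBytes(binPath);

            File.WriteAllText(txtPath, Convert.ToBase64String(data));
            byte[] fromText = Convert.FromBase64String(File.ReadAllText(txtPath));

            return fromBinary.SequenceEqual(data) && fromText.SequenceEqual(data);
        }
        finally
        {
            File.Delete(binPath);
            File.Delete(txtPath);
        }
    }

    static void Main()
    {
        byte[] data = { 0x49, 0x49, 0x2A, 0x00, 0x80, 0xFF }; // hypothetical sample bytes
        Console.WriteLine(RoundTrips(data));
    }
}
```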

Adam Ruth