ansaurus

Question

Answer 1

+2 A:

Yes, that's because ASCII is only 7-bit - it doesn't define any values above 127. Encodings typically decode unknown binary values to '?' (although this can be changed using DecoderFallback).
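A minimal sketch of the behaviour described above, using the JPEG start bytes 0xFF 0xD8 from the question. The class and method names are illustrative only. The second method assumes you want a hard failure instead of silent '?' substitution, and gets it by passing an exception fallback to `Encoding.GetEncoding`.

```csharp
using System;
using System.Text;

// Illustrative names only; not from the original thread.
public static class AsciiFallbackDemo
{
    // JPEG files start with 0xFF 0xD8; both bytes are above ASCII's 0-127 range.
    static readonly byte[] JpegStart = { 0xFF, 0xD8 };

    // Default ASCII decoding: each out-of-range byte becomes '?'.
    public static string DecodeWithDefaultFallback()
    {
        return Encoding.ASCII.GetString(JpegStart);
    }

    // Same encoding, configured to throw instead of substituting '?'.
    public static bool DecodeWithExceptionFallbackThrows()
    {
        Encoding strictAscii = Encoding.GetEncoding(
            "us-ascii",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strictAscii.GetString(JpegStart);
            return false;
        }
        catch (DecoderFallbackException)
        {
            return true;
        }
    }
}
```

With the default fallback the two bytes decode to "??", which is exactly the symptom in the question.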

If you're about to mention "extended ASCII" I suspect you actually want Encoding.Default which is "the default code page for the operating system"... code page 1252 on most Western systems, I believe.

What characters were you expecting?

EDIT: As per the accepted answer (I suspect the question was edited after I added my answer; I don't recall seeing anything about JPEG originally) you shouldn't convert binary data to text unless it's genuinely encoded text data. JPEG data is binary data - so you should be checking the actual bytes against the expected bytes.
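Checking the actual bytes can be as short as a prefix comparison. The sketch below uses `ReadOnlySpan<byte>.StartsWith`, which assumes .NET Core 2.1+ (or the System.Memory package) and so postdates this thread; the class name is hypothetical.

```csharp
using System;

// Hypothetical helper; requires span support (.NET Core 2.1+ or System.Memory).
public static class ByteSignatureCheck
{
    static readonly byte[] JpegSignature = { 0xFF, 0xD8 };

    // Compares raw bytes; no text decoding is involved at any point.
    public static bool StartsWithJpegSignature(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        ReadOnlySpan<byte> header = data;
        ReadOnlySpan<byte> signature = JpegSignature;
        // Returns false if data is shorter than the signature.
        return header.StartsWith(signature);
    }
}
```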

Any time you convert arbitrary binary data (such as images, music or video) into text using a "plain" text encoding (such as ASCII, UTF-8 etc) you risk data loss. If you have to convert it to text, use Base64 which is nice and safe. If you just want to compare it with expected binary data, however, it's best not to convert it to text at all.
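To make the data-loss point concrete, here is a small sketch (illustrative names) that round-trips the JPEG header from the question through ASCII and through Base64 and checks whether the original bytes survive.

```csharp
using System;
using System.Linq;
using System.Text;

// Illustrative comparison of a lossy text round trip vs. Base64.
public static class Base64Demo
{
    // bytes -> ASCII string -> bytes. Bytes above 127 come back as 0x3F ('?').
    public static bool AsciiRoundTripPreserves(byte[] data)
    {
        byte[] back = Encoding.ASCII.GetBytes(Encoding.ASCII.GetString(data));
        return back.SequenceEqual(data);
    }

    // bytes -> Base64 string -> bytes. Base64 is designed to be lossless.
    public static bool Base64RoundTripPreserves(byte[] data)
    {
        byte[] back = Convert.FromBase64String(Convert.ToBase64String(data));
        return back.SequenceEqual(data);
    }
}
```

For the bytes 255, 216, 255, 225 the ASCII round trip fails while the Base64 one succeeds; the Base64 text itself is "/9j/4Q==".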

EDIT: Okay, here's a class to help with image detection for a given byte array. I haven't made it HTTP-specific; I'm not entirely sure whether you should really fetch the InputStream, read just a bit of it, and then fetch the stream again. I've ducked the issue by sticking to byte arrays :)

using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

public sealed class SignatureDetector
{
    public static readonly SignatureDetector Png =
        new SignatureDetector(0x89, 0x50, 0x4e, 0x47);

    public static readonly SignatureDetector Bmp =
        new SignatureDetector(0x42, 0x4d);

    public static readonly SignatureDetector Gif =
        new SignatureDetector(0x47, 0x49, 0x46);

    public static readonly SignatureDetector Jpeg =
        new SignatureDetector(0xff, 0xd8);

    public static readonly IEnumerable<SignatureDetector> Images =
        new ReadOnlyCollection<SignatureDetector>(new[]{Png, Bmp, Gif, Jpeg});

    private readonly byte[] bytes;

    public SignatureDetector(params byte[] bytes)
    {
        if (bytes == null)
        {
            throw new ArgumentNullException("bytes");
        }
        this.bytes = (byte[]) bytes.Clone();
    }

    public bool Matches(byte[] data)
    {
        if (data == null)
        {
            throw new ArgumentNullException("data");
        }
        if (data.Length < bytes.Length)
        {
            return false;
        }
        for (int i=0; i < bytes.Length; i++)
        {
            if (data[i] != bytes[i])
            {
                return false;
            }
        }
        return true;
    }    

    // Convenience method
    public static bool IsImage(byte[] data)
    {
        return Images.Any(detector => detector.Matches(data));
    }        
}
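The class above deliberately works on byte arrays. As a hedged sketch of the part it leaves out, the hypothetical `StreamImageSniffer` below reads a header from any `Stream` (such as the question's `postedFile.InputStream`) and compares the same four signatures. It is self-contained, so it repeats the signatures rather than depending on `SignatureDetector`. One detail worth noting: `Stream.Read` may return fewer bytes than requested, so the helper loops instead of relying on a single call as the question's code does.

```csharp
using System;
using System.IO;

// Hypothetical helper; not part of the original answer.
public static class StreamImageSniffer
{
    static readonly byte[][] Signatures =
    {
        new byte[] { 0x89, 0x50, 0x4E, 0x47 }, // PNG
        new byte[] { 0x42, 0x4D },             // BMP ("BM")
        new byte[] { 0x47, 0x49, 0x46 },       // GIF ("GIF")
        new byte[] { 0xFF, 0xD8 },             // JPEG
    };

    // Keep reading until the buffer is full or the stream ends.
    // Returns the number of bytes actually read.
    static int ReadHeader(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int read = stream.Read(buffer, total, buffer.Length - total);
            if (read == 0) break; // end of stream
            total += read;
        }
        return total;
    }

    public static bool LooksLikeImage(Stream stream)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));
        byte[] header = new byte[4]; // longest signature above is 4 bytes
        int count = ReadHeader(stream, header);
        foreach (byte[] signature in Signatures)
        {
            if (count < signature.Length) continue;
            bool match = true;
            for (int i = 0; i < signature.Length; i++)
            {
                if (header[i] != signature[i]) { match = false; break; }
            }
            if (match) return true;
        }
        return false;
    }
}
```

This consumes up to four bytes of the stream and does not rewind it; if the caller needs to read the whole file afterwards, it would have to seek back (when `CanSeek` is true) or reopen the stream, which is the issue the answer above sidesteps.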

Jon Skeet 2009-05-31 14:47:23

Why the downvote?

Jon Skeet 2009-05-31 18:30:45

lol, not this again... downvote removed due to your edit. Given the new information the author added now - it'd be best to properly write the IsFileImage method for him. You working on that? I'm not wasting my time if you are...

TheSoftwareJedi 2009-05-31 18:44:33

Note: I didn't see who answered it this way. If I had known it was you, I would have commented and explained what he was trying to do... maybe still a downvote tho :P I thought it was a bad answer originally - but then again, it was a bad question too. :)

TheSoftwareJedi 2009-05-31 18:48:43

I'm not currently implementing IsFileImage, but might do later. Before the edit which talked about the JPEG part, the author just asked why it was showing question-marks instead of the expected characters (IIRC). Once again, I think a downvote is almost *always* worth a comment. Why would anyone else have gained less from the comment than I would have done?

Jon Skeet 2009-05-31 19:08:46

I downvote a lot - I see bad answers and I want them below the better ones. It's just how it works. Requiring a comment would be annoying. Some answers aren't worthy of a comment. You can take that to UserVoice, but it always gets rejected. Personally, I'll try to look to see if I'm downvoting Jon Skeet and write a little note. Most times it's not needed though - it was clear (even to you) what was wrong with your answer.

TheSoftwareJedi 2009-05-31 19:41:03

Also note Jon, there was no edit adding the JPEG comment. It was always there. You just missed it. I made a mistake once before too - it was painful. lol

TheSoftwareJedi 2009-05-31 19:42:33

@TSJ: Are you absolutely sure? Don't forget that if a question or answer is edited within a short time frame, we don't get to see it as an edit. (Moderators may do.)

Jon Skeet 2009-05-31 19:44:39

(And I believe I answered this question very soon after it was originally posted, btw.) As for the downvotes - I believe it's almost *always* helpful to add a comment. I know that the various requests to make this mandatory have been declined, but I believe it's because people would just add rubbish instead of writing useful comments. That in no way disproves the usefulness of comments.

Jon Skeet 2009-05-31 19:46:20

Too bad I'm not a C# programmer, so I don't understand all that great stuff this guy is writing :(

Johannes Schaub - litb 2009-05-31 20:06:00

Answer 2

A:

Are you sure "????" is the result?

What is the result of:

(int)ascii[0]
(int)ascii[1]

On the other hand, pure ASCII is 0-127 only...
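The point of casting to `int` is to tell a real '?' (code 63) apart from a character the console merely displays as '?'. A small illustrative helper (hypothetical name) that collects those codes:

```csharp
using System;

// Hypothetical helper: the UTF-16 code of each char, i.e. what (int)ascii[i] shows.
public static class CharCodeCheck
{
    public static int[] CharCodes(string s)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        int[] codes = new int[s.Length];
        for (int i = 0; i < s.Length; i++)
        {
            codes[i] = s[i]; // implicit char -> int conversion
        }
        return codes;
    }
}
```

Applied to `Encoding.ASCII.GetString` of the JPEG header 255, 216, 255, 225, every code is 63, so the question marks are genuinely in the string and not a display artifact.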

Philippe Leybaert 2009-05-31 14:48:56

Answer 3

+1 A:

If you then wrote:

Console.WriteLine(ascii);

And expected "FFD8" to print out, that's not how GetString works. For that, you would need:

string ascii = String.Format("{0:X2}{1:X2}", header[0], header[1]);
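Generalising that to a whole header, a sketch (hypothetical name) that formats the raw bytes as hex, before any text decoding takes place:

```csharp
using System;
using System.Text;

// Hypothetical helper: hex dump of raw bytes.
public static class HexFormatting
{
    // "X2" = uppercase hex padded to two digits, so 0x0A becomes "0A".
    public static string ToHex(byte[] bytes)
    {
        if (bytes == null) throw new ArgumentNullException(nameof(bytes));
        var sb = new StringBuilder(bytes.Length * 2);
        foreach (byte b in bytes)
        {
            sb.Append(b.ToString("X2"));
        }
        return sb.ToString();
    }
}
```

For the header 255, 216, 255, 225 this gives "FFD8FFE1". The built-in `BitConverter.ToString` produces the same digits with dashes ("FF-D8-FF-E1").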

James Curran 2009-05-31 14:55:23

It would then print "3F3F" - the biggest problem (IMO) is the fact that it's converted into text at all.

Jon Skeet 2009-05-31 18:37:09

Answer 4

A:

I'm trying to write a function which determines whether a file is an image based only on its header information...

Currently, if I pass in a JPEG, the header array contains 255, 216, 255, 225, but isImageHeader always ends up false...

public static bool IsFileImage(HttpPostedFileBase postedFile)
{
    bool isImageHeader = false;
    if (postedFile.ContentLength > 0)
    {
        byte[] header = new byte[4];
        string[] imageHeaders = new[]
        {
            "\xFF\xD8", // JPEG
            "BM",       // BMP
            "GIF",      // GIF
            Encoding.ASCII.GetString(new byte[]{137, 80, 78, 71}) // PNG
        };

        postedFile.InputStream.Read(header, 0, header.Length);

        string ascii = Encoding.ASCII.GetString(header);

        isImageHeader = imageHeaders.Count(str => ascii.StartsWith(str)) > 0;
    }

    return isImageHeader;
}
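A sketch of why only the JPEG case fails in the code above (names are illustrative). The decoded header is "????", while the JPEG pattern "\xFF\xD8" is U+00FF U+00D8, so `StartsWith` can never succeed. PNG appears to work only because 0x89 is turned into '?' on both sides, which also means any data beginning with the text "?PNG" would be accepted.

```csharp
using System;
using System.Text;

// Illustrative reproduction of the asker's string comparison.
public static class AsciiHeaderMatch
{
    // The asker's JPEG pattern: U+00FF followed by U+00D8.
    const string JpegPattern = "\xFF\xD8";

    public static string DecodeAscii(byte[] header)
    {
        return Encoding.ASCII.GetString(header);
    }

    public static bool JpegPatternMatches(byte[] header)
    {
        return DecodeAscii(header).StartsWith(JpegPattern, StringComparison.Ordinal);
    }

    public static bool PngPatternMatches(byte[] header)
    {
        // Decoding 137 via ASCII yields '?', so the pattern is "?PNG".
        string pngPattern = Encoding.ASCII.GetString(new byte[] { 137, 80, 78, 71 });
        return DecodeAscii(header).StartsWith(pngPattern, StringComparison.Ordinal);
    }
}
```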

ListenToRick 2009-05-31 15:04:38

Encoding.d.ASCII.GetString(header) should read: Encoding.ASCII.GetString(header)

ListenToRick 2009-05-31 15:05:09

I suspect this issue is related to the fact that I'm not using the extended ASCII character set?

ListenToRick 2009-05-31 15:24:39

There's no single "extended ASCII" character set. There are *lots* of character encodings that share the first 128 values with ASCII. The main problem is that you're converting into text at all, when you only really want binary data. See my answer for more information.

Jon Skeet 2009-05-31 18:38:02

Edit your question and include this - don't add supplementary info in answers which aren't answers. One of us can easily show you how to implement that method now that we have all the details.

TheSoftwareJedi 2009-05-31 18:45:29

Answer 5

+4 A:

In this case you'd be better off comparing the byte arrays rather than converting to a string.

If you must convert to a string, I suggest using the Latin-1 encoding (aka ISO-8859-1, aka code page 28591), as it maps every byte value in the range 0-255 to the Unicode character with the same value, which is convenient for this scenario. Any of the following will get this encoding:

Encoding.GetEncoding(28591)
Encoding.GetEncoding("Latin1")
Encoding.GetEncoding("ISO-8859-1")
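A short sketch (illustrative names) of that property: under Latin-1 the JPEG bytes decode to U+00FF U+00D8, which is exactly the "\xFF\xD8" pattern from the question, and all 256 byte values survive a round trip.

```csharp
using System;
using System.Text;

// Illustrative wrapper around the Latin-1 encoding (code page 28591).
public static class Latin1Demo
{
    static readonly Encoding Latin1 = Encoding.GetEncoding(28591);

    // Byte n becomes char U+00nn.
    public static string Decode(byte[] data)
    {
        return Latin1.GetString(data);
    }

    // Char U+00nn becomes byte n.
    public static byte[] Encode(string text)
    {
        return Latin1.GetBytes(text);
    }
}
```

With this decoding, the asker's `ascii.StartsWith("\xFF\xD8")` check would succeed, although comparing bytes directly is still the cleaner option.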

Joe 2009-05-31 15:36:30

Cheers for the suggestions. Why would you compare the byte arrays?

ListenToRick 2009-05-31 15:48:28

Because it's binary data. JPEGs aren't text, so shouldn't be converted to text.

Jon Skeet 2009-05-31 18:32:36

Answer 6

A:

I once wrote a custom encoder/decoder that mapped bytes 0-255 to Unicode characters 0-255 and back again.

It was only really useful for using string functions on something that isn't actually a string.
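The original encoder isn't shown, so this is only a minimal sketch of the same mapping written as two plain functions (hypothetical names) rather than an `Encoding` subclass. It produces the same characters as Latin-1 for bytes 0-255.

```csharp
using System;

// Hypothetical helpers; not the poster's original encoder.
public static class ByteCharMapping
{
    // Byte n -> char U+00nn. Every byte value maps to exactly one char.
    public static string BytesToString(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        char[] chars = new char[data.Length];
        for (int i = 0; i < data.Length; i++)
        {
            chars[i] = (char)data[i];
        }
        return new string(chars);
    }

    // Char U+00nn -> byte n. Chars above U+00FF have no byte to map to.
    public static byte[] StringToBytes(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        byte[] bytes = new byte[text.Length];
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c > '\u00FF')
            {
                throw new ArgumentException("Character above U+00FF at index " + i, nameof(text));
            }
            bytes[i] = (byte)c;
        }
        return bytes;
    }
}
```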

Joshua 2009-05-31 20:25:20


c# and Encoding.ASCII.GetString
