I have a SQL file added to my VS.NET 2008 project as an embedded resource. Whenever I use the following code to read the file's content, the string returned always starts with three junk characters and then the text I expect. I assume this has something to do with the Encoding.Default I am using, but that is just a guess. Why does this text keep showing up? Should I just trim off the first three characters or is there a more informed approach?

public string GetUpdateRestoreSchemaScript()
{
    var type = GetType();
    var a = Assembly.GetAssembly(type);
    var script = "UpdateRestoreSchema.sql";
    var resourceName = String.Concat(type.Namespace, ".", script);
    using(Stream stream = a.GetManifestResourceStream(resourceName))
    {
        byte[] buffer = new byte[stream.Length];
        stream.Read(buffer, 0, buffer.Length);
        // UPDATE: Should be Encoding.UTF8
        return Encoding.Default.GetString(buffer);
    }
}

Update: My code works as expected if I change the last line to decode the buffer as UTF-8. That holds for this embedded file, but will it hold for every file? Is there a way to test an arbitrary buffer to determine its encoding?

+2  A: 

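Probably the file is saved as UTF-8 with a byte-order mark, while Encoding.Default is the system's ANSI code page, so the three BOM bytes (EF BB BF) are decoded as junk characters. Why not specify the encoding explicitly?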
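A minimal sketch of reading with an explicit encoding, assuming the resource really is UTF-8. Wrapping the stream in a StreamReader decodes the text and also skips a leading UTF-8 byte-order mark, so nothing needs trimming. The class and method names (ResourceText, Read, ReadUtf8) are hypothetical, not part of the question's code.

```csharp
using System.IO;
using System.Reflection;
using System.Text;

public static class ResourceText
{
    // Hypothetical helper: reads an embedded resource as UTF-8 text.
    public static string Read(Assembly assembly, string resourceName)
    {
        using (Stream stream = assembly.GetManifestResourceStream(resourceName))
        {
            // GetManifestResourceStream returns null when the name does not match
            if (stream == null)
                throw new FileNotFoundException("Embedded resource not found.", resourceName);
            return ReadUtf8(stream);
        }
    }

    // Decodes the whole stream as UTF-8 (assumption: the file was saved as UTF-8).
    public static string ReadUtf8(Stream stream)
    {
        // StreamReader skips a UTF-8 BOM at the start of the stream if one is present
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

In the question's method this would be called as ResourceText.Read(a, resourceName) in place of the manual buffer read.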

Edit to answer a comment:

To guess a file's encoding, you can look for a BOM (http://en.wikipedia.org/wiki/Byte-order_mark) at the start of the stream. If one is present, it identifies the encoding; if not, you can only guess or ask the user.
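The BOM check could be sketched roughly as follows. The class name BomSniffer and its methods are hypothetical. It only recognizes the UTF-8, UTF-16 and little-endian UTF-32 marks, and returns null when no BOM is found.

```csharp
using System.Text;

public static class BomSniffer
{
    // Returns the encoding indicated by a leading BOM, or null when there is none.
    public static Encoding Detect(byte[] buffer)
    {
        // Check UTF-32 LE before UTF-16 LE: FF FE 00 00 also starts with FF FE
        if (StartsWith(buffer, 0xFF, 0xFE, 0x00, 0x00)) return Encoding.UTF32;
        if (StartsWith(buffer, 0xEF, 0xBB, 0xBF)) return Encoding.UTF8;
        if (StartsWith(buffer, 0xFF, 0xFE)) return Encoding.Unicode;
        if (StartsWith(buffer, 0xFE, 0xFF)) return Encoding.BigEndianUnicode;
        return null; // no BOM: the caller has to guess or ask the user
    }

    // True when buffer begins with exactly the given bytes.
    private static bool StartsWith(byte[] buffer, params byte[] prefix)
    {
        if (buffer == null || buffer.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (buffer[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

A null result does not mean the data is ASCII. Many UTF-8 files are saved without a BOM, so a fallback encoding is still needed in that case.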

Alex Reitbort
Is there a way to test the encoding? Since I didn't specify an encoding when I saved the file, I assumed it was the default.
flipdoubt
Different editors use different default encodings.
Alex Reitbort