ansaurus

Question

In C#, how do I prepend a text string to a file without changing its original encoding?

Answer 1

A:

The code at the bottom of this page appears to let you read a file in, retain its encoding and then write it back out using the same encoding. You should be able to modify it to do what you need.

Jacob G 2010-02-08 19:56:07

Also, special thanks to Daniel Brückner.

Chicago 2010-02-09 15:59:52

Answer 2

+1 A:

Just read the file as raw bytes without ever decoding them. You still have to choose an encoding for your header; ASCII seems to be the safest choice in your situation.

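using System;
using System.IO;
using System.Text;

const String fileName = "test.file";
const String header = "MyHeader";

// Encode only the header; the existing content is copied as raw bytes and never decoded.
var headerBytes = Encoding.ASCII.GetBytes(header);
var fileContent = File.ReadAllBytes(fileName);

// OpenWrite does not truncate the file, which is fine here because the
// new content (header + old content) is never shorter than the old file.
using (var stream = File.OpenWrite(fileName))
{
    stream.Write(headerBytes, 0, headerBytes.Length);
    stream.Write(fileContent, 0, fileContent.Length);
}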

Note that the code operates directly on the stream and does not use a TextReader or TextWriter (these are the abstract base classes of StreamReader and StreamWriter). You can use a BinaryReader and BinaryWriter to simplify accessing the stream if you have to deal with tasks more complex than just reading and writing arrays.

Daniel Brückner 2010-02-08 20:02:21
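
One caveat with the byte-level approach above: if the file begins with a UTF-8 byte order mark (EF BB BF), writing the header first pushes the BOM into the middle of the file, and readers will no longer detect it. A minimal sketch that keeps a UTF-8 BOM at the front, assuming the header is pure ASCII and the file is UTF-8 or another ASCII-compatible encoding (the class and method names are hypothetical):

```csharp
using System;
using System.IO;
using System.Text;

static class BomAwarePrepend
{
    static readonly byte[] Utf8Bom = { 0xEF, 0xBB, 0xBF };

    // Prepends an ASCII header at the byte level, keeping a leading UTF-8 BOM
    // (if present) as the first three bytes of the file.
    public static void PrependAsciiHeader(string fileName, string header)
    {
        byte[] headerBytes = Encoding.ASCII.GetBytes(header);
        byte[] content = File.ReadAllBytes(fileName);

        bool hasBom = content.Length >= 3
            && content[0] == Utf8Bom[0]
            && content[1] == Utf8Bom[1]
            && content[2] == Utf8Bom[2];
        int bomLength = hasBom ? Utf8Bom.Length : 0;

        // File.Create truncates the file, so no stale bytes can remain.
        using (var stream = File.Create(fileName))
        {
            stream.Write(content, 0, bomLength);                          // BOM, if any
            stream.Write(headerBytes, 0, headerBytes.Length);             // new header
            stream.Write(content, bomLength, content.Length - bomLength); // original body
        }
    }
}
```

UTF-16 and UTF-32 files are not handled here: an ASCII header would still be wrong in those encodings, which is the concern raised in the comments.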

Note that if the original file has a header, this may not be what you want. Given an original file formatted as `OLDHEADER+OLDDATA`, you may want to end up with `OLDHEADER+NEWDATA+OLDDATA` rather than `NEWHEADER+NEWDATA+OLDHEADER+OLDDATA` or `NEWDATA+OLDHEADER+OLDDATA`.

Brian 2010-02-08 20:07:55

Doesn't the text added to the file need to be encoded in the same manner as the original file? This won't do that.

Jacob G 2010-02-08 20:12:25

I might need to replace the old header with a new header.

Chicago 2010-02-08 20:21:11

The original poster didn't state that this is a requirement. If the same encoding must be used, he has to solve a very hard problem: detecting the encoding of the file. There is no simple solution in the general case; one has to use statistical methods and heuristics to detect the correct encoding with high probability.

Daniel Brückner 2010-02-08 20:21:51
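
Full detection needs the statistical methods mentioned above, but one cheap heuristic is available directly in .NET: check whether the bytes are valid UTF-8 using a `UTF8Encoding` constructed with `throwOnInvalidBytes: true`. A minimal sketch (the class and method names are hypothetical); note that it only separates "valid UTF-8" from "not valid UTF-8" and cannot name the actual encoding otherwise:

```csharp
using System;
using System.Text;

static class Utf8Heuristic
{
    // A strict UTF-8 decoder: throws instead of substituting U+FFFD for invalid bytes.
    static readonly UTF8Encoding StrictUtf8 = new UTF8Encoding(false, true);

    // Returns true if the bytes decode cleanly as UTF-8. This is only a heuristic:
    // pure-ASCII data passes for every ASCII-compatible encoding, and some
    // single-byte text can happen to form valid UTF-8 sequences.
    public static bool LooksLikeUtf8(byte[] data)
    {
        try
        {
            StrictUtf8.GetString(data);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }
}
```

For example, "héllo" saved as UTF-8 passes, while the same word saved as ISO-8859-1 (where é is the single byte E9) fails, because E9 starts a multi-byte UTF-8 sequence that the following "l" does not continue.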

How can you extract the old header without knowing the encoding? Is it fixed-size or terminated by a special code? Or is it a binary header?

Daniel Brückner 2010-02-08 20:24:24

The header is contained within predefined start and end tags; that's how I find it.

Chicago 2010-02-08 20:30:28
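
If the header sits between fixed start and end tags, it can be located and replaced without decoding the rest of the file, keeping the `OLDHEADER` position Brian describes. A minimal sketch, assuming the tags and the new header text are ASCII and the file uses an ASCII-compatible encoding such as UTF-8 or ISO-8859-1 (in UTF-16 the tags would be encoded differently and would not be found). The class, method and tag names are hypothetical:

```csharp
using System;
using System.IO;
using System.Text;

static class HeaderReplacer
{
    // Returns the index of the first occurrence of pattern in data at or after start, or -1.
    static int IndexOf(byte[] data, byte[] pattern, int start)
    {
        for (int i = start; i <= data.Length - pattern.Length; i++)
        {
            int j = 0;
            while (j < pattern.Length && data[i + j] == pattern[j]) j++;
            if (j == pattern.Length) return i;
        }
        return -1;
    }

    // Replaces the text between startTag and endTag (keeping both tags) with newHeader.
    // Returns false and leaves the file untouched if either tag is missing.
    public static bool ReplaceHeader(string path, string startTag, string endTag, string newHeader)
    {
        byte[] data = File.ReadAllBytes(path);
        byte[] startTagBytes = Encoding.ASCII.GetBytes(startTag);
        byte[] endTagBytes = Encoding.ASCII.GetBytes(endTag);

        int startPos = IndexOf(data, startTagBytes, 0);
        if (startPos < 0) return false;
        int contentStart = startPos + startTagBytes.Length;
        int endPos = IndexOf(data, endTagBytes, contentStart);
        if (endPos < 0) return false;

        byte[] replacement = Encoding.ASCII.GetBytes(newHeader);
        using (var stream = File.Create(path))
        {
            stream.Write(data, 0, contentStart);                 // anything before the header, plus the start tag
            stream.Write(replacement, 0, replacement.Length);    // new header text
            stream.Write(data, endPos, data.Length - endPos);    // end tag and the rest of the file
        }
        return true;
    }
}
```

Used as `HeaderReplacer.ReplaceHeader(path, "<HDR>", "</HDR>", "new")`, a file containing `<HDR>old</HDR>body` would become `<HDR>new</HDR>body`.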

So you know the encoding of the header?

Daniel Brückner 2010-02-08 20:34:45

I was really looking for a way to do this without decoding or encoding anything, but now I see that it is almost impossible. Can you take a look at my answer at the bottom and let me know if that is a workable solution?

Chicago 2010-02-08 20:43:42

Answer 3

+1 A:

Text encoding is a funny thing. If you are writing code that treats bytes as encoded characters, your program must be told (or must assume) how the bytes are arranged to represent particular characters. If your program is agnostic of the encoding of the files it will be working with, how do you intend to tell it how to interpret the array of bytes in those files?

You could consider reading the file as a stream of bytes instead of a stream of characters if you don't know how to tell your program to interpret those bytes (i.e. you don't know what encoding will be used at run time). In that case, you have to hope that whatever basic characters you choose to put at the beginning of the file are recognized the same way in any encoding, because you will be prepending bytes, not characters.

Ben McCormack 2010-02-08 20:09:32
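
The "hope" in the answer above can be made concrete: ASCII header bytes read back unchanged under ASCII-compatible encodings, but not under UTF-16. A minimal sketch (the class and method names are hypothetical) that decodes the same prepended bytes the way a reader using each encoding would:

```csharp
using System;
using System.Text;

static class HeaderDecodingDemo
{
    // The header encoded the way the byte-level approach does it: as ASCII bytes.
    public static readonly byte[] HeaderBytes = Encoding.ASCII.GetBytes("MyHeader");

    // Decodes the prepended bytes as a reader of the file would,
    // assuming that reader uses the given encoding.
    public static string ReadBackAs(Encoding encoding) => encoding.GetString(HeaderBytes);

    public static void Print()
    {
        // ASCII-compatible encodings see the header unchanged.
        Console.WriteLine(ReadBackAs(Encoding.UTF8));                        // MyHeader
        Console.WriteLine(ReadBackAs(Encoding.GetEncoding("iso-8859-1")));   // MyHeader
        // UTF-16 pairs the 8 bytes up, so the header becomes 4 unrelated characters.
        Console.WriteLine(ReadBackAs(Encoding.Unicode).Length);              // 4
    }
}
```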

Answer 4

A:

I found this, which lets me get the correct encoding of the file: http://www.personalmicrocosms.com/Pages/dotnettips.aspx?c=15&t=17

Then I can use that encoding in the StreamWriter.

Is it safe to use?

Chicago 2010-02-08 20:35:30
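
The same plan can be sketched with StreamReader's built-in byte order mark detection rather than the code on the linked page (which is not reproduced here): read with `detectEncodingFromByteOrderMarks` enabled, take `CurrentEncoding` after reading, and pass it to a StreamWriter. This is a minimal sketch, assuming that files without a BOM are UTF-8 without a BOM (the `fallback` parameter is that assumption made explicit); any other BOM-less encoding would be misread. The class and method names are hypothetical:

```csharp
using System;
using System.IO;
using System.Text;

static class EncodingPreservingPrepend
{
    // Assumes BOM-less files are UTF-8 without a BOM.
    public static Encoding PrependText(string path, string text)
    {
        return PrependText(path, text, new UTF8Encoding(false));
    }

    // Prepends text and rewrites the file with the encoding detected from its BOM,
    // or with the fallback encoding if the file has no BOM. Returns the encoding used.
    public static Encoding PrependText(string path, string text, Encoding fallback)
    {
        string original;
        Encoding detected;
        using (var reader = new StreamReader(path, fallback, detectEncodingFromByteOrderMarks: true))
        {
            original = reader.ReadToEnd();
            // CurrentEncoding reflects the BOM only after the first read.
            detected = reader.CurrentEncoding;
        }

        // StreamWriter writes the encoding's preamble (BOM), if it has one, at the start of the file.
        using (var writer = new StreamWriter(path, false, detected))
        {
            writer.Write(text);
            writer.Write(original);
        }
        return detected;
    }
}
```

Unlike the byte-level answer, this decodes and re-encodes the whole file, so it only preserves the encoding when detection is right.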

That's the page I linked to in my answer. If your files are in a Unicode encoding, it should work.

Jacob G 2010-02-08 20:46:42

Thanks, Jacob, for your time.

Chicago 2010-02-08 20:59:03

It will work for only a very limited number of encodings: Unicode encodings with a byte order mark. That is three encodings out of more than one hundred. You can have a look at http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html, but I doubt it is worth the effort to implement it (though it would be great, because I am not aware of any .NET library providing this kind of functionality). The way to go is probably to simply require that the file be in Unicode; if it is not, the user (who knows the encoding) must convert it. Not the best possible usability, but given how complex the problem is, acceptable.

Daniel Brückner 2010-02-08 21:15:11
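
The "require Unicode, otherwise let the user convert" policy above could look roughly like this: accept files that start with a UTF-8 or UTF-16 BOM as they are, and otherwise decode with an encoding the user names and rewrite the file as UTF-8 with a BOM. A minimal sketch; the class and method names, and the choice of UTF-8 as the target, are assumptions:

```csharp
using System;
using System.IO;
using System.Text;

static class UnicodeRequirement
{
    // True if the data starts with a UTF-8, UTF-16 LE or UTF-16 BE byte order mark.
    // (A UTF-32 LE BOM also starts with FF FE, so it is accepted too.)
    public static bool HasUnicodeBom(byte[] data)
    {
        if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF) return true;
        if (data.Length >= 2 && data[0] == 0xFF && data[1] == 0xFE) return true;
        if (data.Length >= 2 && data[0] == 0xFE && data[1] == 0xFF) return true;
        return false;
    }

    // If the file has no Unicode BOM, decodes it with the encoding the user says it is in
    // and rewrites it as UTF-8 with a BOM. Returns true if the file was converted.
    public static bool EnsureUnicode(string path, Encoding userSuppliedEncoding)
    {
        byte[] data = File.ReadAllBytes(path);
        if (HasUnicodeBom(data)) return false;

        string text = userSuppliedEncoding.GetString(data);
        var utf8 = new UTF8Encoding(true);
        byte[] bom = utf8.GetPreamble();   // EF BB BF
        byte[] body = utf8.GetBytes(text);
        using (var stream = File.Create(path))
        {
            stream.Write(bom, 0, bom.Length);
            stream.Write(body, 0, body.Length);
        }
        return true;
    }
}
```

After conversion, the BOM-based detection discussed above works on the file.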
