Is there a way I can convert a .txt file into Unicode using C#?
Only if you know the original encoding used to produce the .txt file. That's not a restriction of C# or the .NET languages; it's a general problem.
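To see why the source encoding matters, here is a minimal sketch (the class name `EncodingGuessDemo` is just for illustration) that decodes the same two bytes, 0xC3 0xA9, once as UTF-8 and once as ISO-8859-1 (Latin-1). Both decodes succeed, but they produce different text, and nothing in the bytes themselves says which one the writer meant.

```csharp
using System.Text;

// Hypothetical demo class: the same bytes give different text depending on
// which encoding you assume when reading them.
static class EncodingGuessDemo
{
    // 0xC3 0xA9 is how UTF-8 encodes U+00E9 (LATIN SMALL LETTER E WITH ACUTE).
    static readonly byte[] Bytes = { 0xC3, 0xA9 };

    // Read as UTF-8: the two bytes form one character, "é".
    public static string AsUtf8() => Encoding.UTF8.GetString(Bytes);

    // Read as ISO-8859-1: each byte maps to its own character, giving "Ã©".
    public static string AsLatin1() =>
        Encoding.GetEncoding("iso-8859-1").GetString(Bytes);
}
```

If the file was saved as UTF-8, only `AsUtf8()` gives the intended text; guessing Latin-1 produces the familiar "Ã©" mojibake.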
Read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) to learn why "plain text" is meaningless if you don't know the encoding.
Provided your text file contains only ASCII characters, it's already Unicode: ASCII is a subset of UTF-8, so those bytes are also valid UTF-8.
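A quick way to convince yourself: a minimal sketch (the `AsciiUtf8Check` helper is hypothetical) that encodes a string both ways and compares the bytes. They match for characters in the range U+0000 to U+007F and diverge as soon as anything else appears.

```csharp
using System.Linq;
using System.Text;

// Hypothetical helper: true when the ASCII and UTF-8 encodings of a string
// produce exactly the same byte sequence.
static class AsciiUtf8Check
{
    public static bool SameBytes(string s) =>
        Encoding.ASCII.GetBytes(s).SequenceEqual(Encoding.UTF8.GetBytes(s));
}
```

For example, `SameBytes("Hello, world")` is true, while `SameBytes("\u03A0")` is false: ASCII replaces Π with a single '?' byte, whereas UTF-8 encodes it as two bytes.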
If you want a different encoding of the characters (UTF-16/UCS-2, etc.), any language that supports Unicode should be able to read text in one encoding and write it out in another.
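In C# the read-in-one, write-out-another approach can be as short as the following minimal sketch. The `TextFileTranscoder` name is hypothetical, and the caller must already know the source encoding. `File.ReadAllText` will honour a byte order mark if one is present, but otherwise nothing here detects the encoding.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper: re-encodes a text file from one encoding to another.
static class TextFileTranscoder
{
    public static void Convert(string inputPath, Encoding source,
                               string outputPath, Encoding target)
    {
        // Decode the file's bytes into a .NET string (UTF-16 in memory).
        string text = File.ReadAllText(inputPath, source);
        // Encode that string with the target encoding and write it out.
        // The target's preamble (BOM), if it has one, is written too.
        File.WriteAllText(outputPath, text, target);
    }
}
```

For instance, converting a Latin-1 file to UTF-8 without a BOM would be `TextFileTranscoder.Convert("in.txt", Encoding.GetEncoding("iso-8859-1"), "out.txt", new UTF8Encoding(false))`; pass `Encoding.Unicode` instead if you want UTF-16 little-endian with a BOM.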
The System.Text.Encoding classes will do it, as in the following example: it converts a UTF-16 string to both UTF-8 and ASCII bytes and then back again (code gratuitously stolen from here).
using System;
using System.IO;
using System.Text;
class Test {
    public static void Main() {
        using (StreamWriter output = new StreamWriter("practice.txt")) {
            string srcString = "Area = \u03A0r^2"; // PI.R.R
            // Convert the UTF-16 encoded source string to UTF-8 and ASCII.
            // ASCII has no Π, so Encoding.ASCII substitutes '?' (0x3F) for it.
            byte[] utf8String = Encoding.UTF8.GetBytes(srcString);
            byte[] asciiString = Encoding.ASCII.GetBytes(srcString);
            // Write the UTF-8 and ASCII encoded byte arrays.
            output.WriteLine("UTF-8 Bytes: {0}",
                BitConverter.ToString(utf8String));
            output.WriteLine("ASCII Bytes: {0}",
                BitConverter.ToString(asciiString));
            // Convert UTF-8 and ASCII encoded bytes back to UTF-16 encoded
            // string and write.
            output.WriteLine("UTF-8 Text : {0}",
                Encoding.UTF8.GetString(utf8String));
            output.WriteLine("ASCII Text : {0}",
                Encoding.ASCII.GetString(asciiString));
            // Whether Π displays correctly here depends on the console's font and encoding.
            Console.WriteLine(Encoding.UTF8.GetString(utf8String));
            Console.WriteLine(Encoding.ASCII.GetString(asciiString));
        }
    }
}
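Note that the ASCII round trip in that example is lossy: Encoding.ASCII cannot represent Π, so by default it silently substitutes '?'. If you would rather be told, here is a minimal sketch (the `StrictAscii` helper is hypothetical) that builds an ASCII encoding with exception fallbacks, so unrepresentable characters raise `EncoderFallbackException` instead of being replaced.

```csharp
using System;
using System.Text;

// Hypothetical helper: ASCII encoding that refuses to substitute '?'.
static class StrictAscii
{
    // Exception fallbacks make GetBytes/GetString throw on characters or
    // bytes that ASCII cannot represent.
    static readonly Encoding Ascii = Encoding.GetEncoding(
        "us-ascii",
        EncoderFallback.ExceptionFallback,
        DecoderFallback.ExceptionFallback);

    public static bool TryEncode(string s, out byte[] bytes)
    {
        try
        {
            bytes = Ascii.GetBytes(s);
            return true;
        }
        catch (EncoderFallbackException)
        {
            // At least one character had no ASCII equivalent.
            bytes = Array.Empty<byte>();
            return false;
        }
    }
}
```

With this, `TryEncode("Area = \u03A0r^2", out _)` returns false rather than quietly producing "Area = ?r^2".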
If you really do need to change the encoding (see Pax's answer about UTF-8 being valid Unicode), then yes, you can do that quite easily. Check out the System.Text.Encoding class.
There is a nice page on MSDN about this, including a whole example:
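// Specify the code page to correctly interpret byte values.
// On .NET Core / .NET 5+, legacy code pages such as 737 need
// Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) first.
Encoding encoding = Encoding.GetEncoding(737); // (DOS) Greek code page
byte[] codePageValues = System.IO.File.ReadAllBytes(@"greek.txt");
// Same content is now a .NET string, held in memory as UTF-16
string unicodeValues = encoding.GetString(codePageValues);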
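The snippet above stops once the bytes are decoded into a string. To finish the job and get a UTF-8 file, one option is `Encoding.Convert`, which decodes and re-encodes in one step. This is a minimal sketch, assuming .NET Core 3.0 or later, where code page 737 only becomes available after registering `CodePagesEncodingProvider`; the `Cp737ToUtf8` class and the file paths are hypothetical.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper: converts a file saved in DOS code page 737 (Greek)
// to UTF-8 without a byte order mark.
static class Cp737ToUtf8
{
    // Assumption: .NET Core 3.0+ / .NET 5+, where legacy code pages come
    // from CodePagesEncodingProvider. Registered once, on first use.
    static Cp737ToUtf8()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static void Convert(string inputPath, string outputPath)
    {
        Encoding cp737 = Encoding.GetEncoding(737);
        byte[] sourceBytes = File.ReadAllBytes(inputPath);
        // Decode with code page 737, then re-encode the text as UTF-8.
        byte[] utf8Bytes = Encoding.Convert(cp737, new UTF8Encoding(false), sourceBytes);
        File.WriteAllBytes(outputPath, utf8Bytes);
    }
}
```

On .NET Framework the registration step is unnecessary, because `Encoding.GetEncoding(737)` works out of the box there.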