



Do you know of any way I could convert a set of text files saved in ANSI character encoding to Unicode, either programmatically or via a script?

I would like to get the same result as when I open a file in Notepad and choose to save it as a Unicode file.


Use the System.IO.StreamReader class (to read the file contents) together with the System.Text.Encoding base class (to create the Encoding object that performs the conversion).


pseudo code...

Dim system, file, contents, newFile, oldFile

Const ForReading = 1, ForWriting = 2, ForAppending = 3

Const AnsiFile = -2, UnicodeFile = -1

Set system = CreateObject("Scripting.FileSystemObject...

Set file = system.GetFile("text1.txt")

Set oldFile = file.OpenAsTextStream(ForReading, AnsiFile)

contents = oldFile.ReadAll()

oldFile.Close


system.CreateTextFile "text1.txt"

Set file = system.GetFile("text1.txt")

Set newFile = file.OpenAsTextStream(ForWriting, UnicodeFile)

newFile.Write contents

newFile.Close


Hope this approach works.

+2  A: 

You can use iconv. On Windows you can use it under Cygwin.

iconv -f from_encoding -t to_encoding file
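
As a rough sketch of how that could be applied to a whole set of files: this assumes the sources are in Windows-1252 (what "ANSI" means on Western European and US systems) and that the target is what Notepad calls "Unicode", i.e. UTF-16 little-endian with a byte order mark. The function names to_notepad_unicode and convert_all and the converted output folder are made up for illustration; adjust the -f encoding to match your files.

```shell
# Convert one file from a single-byte Windows code page to UTF-16LE with a BOM.
# to_notepad_unicode is a hypothetical helper; CP1252 is an assumed source encoding.
to_notepad_unicode() {
    src=$1
    dest=$2
    {
        printf '\377\376'                     # byte order mark FF FE
        iconv -f CP1252 -t UTF-16LE "$src"    # the actual re-encoding
    } > "$dest"
}

# Convert every .txt file in the current directory into ./converted,
# leaving the originals untouched.
convert_all() {
    mkdir -p converted
    for f in *.txt; do
        [ -f "$f" ] || continue               # skip if the glob matched nothing
        to_notepad_unicode "$f" "converted/$f"
    done
}
```

Running convert_all from the folder that holds the text files should leave the converted copies in the converted subfolder. Writing to a separate folder avoids reading and overwriting the same file in one pipeline.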
Why's the accepted answer related to Cygwin? The question is tagged as powershell...
Yes, at the beginning I was looking for a PowerShell solution, but it turns out that this worked really well for me, and I could also use Cygwin. Anyway, all the responses given seem to be valid approaches.
+3  A: 

The easiest way would be Get-Content 'path/to/text/file' | out-file 'name/of/file'.

Out-File has an -Encoding parameter, which defaults to Unicode (UTF-16 little-endian) in Windows PowerShell.

If you wanted to script a batch of them, you could do something like

$files = Get-ChildItem 'directory/of/text/files'
foreach ($file in $files) {
  # Parentheses make Get-Content finish reading before Out-File reopens the same file
  (Get-Content $file.FullName) | Out-File $file.FullName
}
Steven Murawski

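You could create a new text file and write the bytes from the original file into it, adding a '\0' byte for each original byte (assuming the original text is plain English, i.e. ASCII only). Placing the '\0' before each byte produces big-endian UTF-16; Notepad's "Unicode" option is little-endian UTF-16, where the '\0' comes after each byte and the file starts with the byte order mark FF FE.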
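
If iconv and od are available (for example under Cygwin), the byte layout can be checked with a small sketch like the one below. The helper name utf16_bytes and the sample text "Hi" are only for illustration.

```shell
# Print the hex bytes that iconv produces for the ASCII text "Hi"
# in the encoding named by $1 (e.g. UTF-16LE or UTF-16BE).
utf16_bytes() {
    printf 'Hi' | iconv -f ASCII -t "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

utf16_bytes UTF-16LE   # prints 48006900: the zero byte follows each character
utf16_bytes UTF-16BE   # prints 00480069: the zero byte precedes each character
```

Neither call emits a byte order mark, because the byte order is named explicitly in the encoding.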

Danny Varod
+5  A: 

This could work for you, but notice that it'll grab every file in the current folder:

Get-ChildItem | ForEach-Object { $c = (Get-Content $_.FullName); `
Set-Content -Encoding UTF8 $c -Path ($_.FullName + "u") }

Same thing using aliases for brevity:

gci | %{ $c = (gc $_.FullName); sc -Encoding UTF8 $c -Path ($_.FullName + "u") }

Steven Murawski suggests using Out-File instead. The differences between both cmdlets are the following:

  • Out-File will attempt to format the input it receives.
  • Out-File's default encoding is Unicode-based, whereas Set-Content uses the system's default.

Here's an example assuming the file test.txt doesn't exist in either case:

PS> [system.string] | Out-File test.txt
PS> Get-Content test.txt

IsPublic IsSerial Name                                     BaseType          
-------- -------- ----                                     --------          
True     True     String                                   System.Object     

# test.txt encoding is Unicode-based with BOM

PS> [system.string] | Set-Content test.txt
PS> Get-Content test.txt
System.String

# test.txt encoding is "ANSI" (Windows character set)

In fact, if you don't need any specific Unicode encoding, you could as well do the following to convert a text file to Unicode:

PS> Get-Content sourceASCII.txt > targetUnicode.txt

Out-File is a "redirection operator with optional parameters" of sorts.
