Questions about file-encodings

C#: "Swedish" characters in XPath when parsing Latin-1 encoded docs.

I have a set of HTML docs that I need to parse. They are encoded in Latin-1. I'm using the Html Agility Pack for "parsing". I have an XPath query (with Swedish characters) that I can't get to work, possibly because of a mismatch between the encoding of the docs and the encoding VS stores the XPath query in. XPath query: doc.DocumentNode.SelectNodes(@"/...

c#

xpath

latin1

file-encodings
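A minimal sketch of the underlying principle, written in Java rather than C# (the idea carries over): once the document bytes are decoded with the charset they were actually written in, the text is plain Unicode and a Swedish literal in the query compares equal to it. Decoding the same bytes with the wrong charset silently corrupts the "ö". The sample word "Göteborg" and the class and method names are hypothetical; the literal uses a `\u` escape so the result does not depend on how the source file itself is encoded.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: correct vs. incorrect decoding of Latin-1 bytes.
public class Latin1Decode {

    // Decode raw document bytes, assuming they really are ISO-8859-1 (Latin-1).
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // The typical mistake: decoding Latin-1 bytes as UTF-8. The Latin-1 byte
    // for U+00F6 (0xF6) is not valid UTF-8 and becomes U+FFFD.
    static String decodeWronglyAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Goteborg" with U+00F6; escaped so the source-file encoding is irrelevant.
        String word = "G\u00f6teborg";
        // As stored in a Latin-1 file: U+00F6 is the single byte 0xF6.
        byte[] raw = word.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(decodeLatin1(raw).equals(word));        // true
        System.out.println(decodeWronglyAsUtf8(raw).equals(word)); // false
    }
}
```

The same check applies to the query string: if the editor saved the source with a different encoding than the compiler assumes, the literal itself can be wrong before any parsing happens, which is why escaping non-ASCII characters is a common workaround.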

How do I set file.encoding for a JUnit test in Ant?

I'm not quite done with file.encoding and Ant yet. How do I set file.encoding for JUnit tests in Ant? The junit Ant task doesn't support the encoding attribute the way the javac task does. I've tried running «ant -Dfile.encoding=UTF-8» and «ANT_OPTS="-Dfile.encoding=UTF-8" ant» without success. System.getProperty("file.encoding") wit...
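Because the JVM generally settles on its default charset at startup, the setting has to reach whichever JVM actually runs the tests. One commonly suggested approach for Ant's junit task is to fork the test JVM (fork="true") and pass a nested `<jvmarg value="-Dfile.encoding=UTF-8"/>`; it is worth confirming against the junit task documentation for your Ant version. The Java sketch below is only a diagnostic (the class and method names are hypothetical): calling it from inside a test shows what the test JVM really received.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic helper: report the encoding settings of the
// JVM this code is running in (e.g. a forked JUnit JVM).
public class EncodingProbe {

    // The raw system property as passed at JVM startup (may be null).
    static String fileEncodingProperty() {
        return System.getProperty("file.encoding");
    }

    // The charset actually used by String.getBytes(), new String(byte[]), etc.
    // when no explicit charset is given.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + fileEncodingProperty());
        System.out.println("default charset = " + defaultCharsetName());
    }
}
```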

Java Charset problem on Linux

Hi. Problem: I have a string containing special characters, which I convert to bytes and vice versa. The conversion works properly on Windows, but on Linux the special character is not converted properly. The default charset on Linux is UTF-8, as seen with Charset.defaultCharset().displayName(); however, if I run on Linux with option -Dfil...

java

charset

file-encodings
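A minimal sketch of the usual remedy, assuming the goal is a byte round trip that behaves the same on every platform: pass an explicit charset to both `getBytes` and `new String` instead of relying on the platform default. The class and method names are hypothetical; the last lines show how decoding with a different charset than the one used to encode produces exactly the kind of broken "special character" described.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical example: platform-independent String <-> byte[] conversion.
public class ExplicitCharsetRoundTrip {

    // Encode with an explicit charset, so Windows and Linux produce the same bytes.
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same explicit charset that was used to encode.
    static String fromBytes(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";        // "cafe" with U+00E9, escaped for portability
        byte[] bytes = toBytes(original);      // hex 63 61 66 C3 A9
        System.out.println(Arrays.toString(bytes));             // [99, 97, 102, -61, -87]
        System.out.println(fromBytes(bytes).equals(original));  // true

        // Mismatch: UTF-8 bytes decoded as ISO-8859-1 give "caf\u00c3\u00a9".
        String mismatched = new String(bytes, StandardCharsets.ISO_8859_1);
        System.out.println(mismatched.equals("caf\u00c3\u00a9")); // true
    }
}
```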

Perl and reading files with different encodings

I am using a Perl script to read in a file, but I'm not sure what encoding the file is in. Basically, my file is a list of book titles, but each book has other info associated with it (author, publication date, etc.), so each book title is within a discrete chunk of data for the book. So I iterate through the file line by line until I fin...

perl

input

file-encodings
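When the encoding is unknown, one common heuristic is to try a strict UTF-8 decode first and fall back to a legacy single-byte charset if the bytes are not well-formed UTF-8. The sketch below shows that heuristic in Java rather than Perl (Perl's Encode module offers comparable strict decoding); the fallback to windows-1252 is an assumption, and the class and method names are hypothetical. It is a guess, not detection: plain ASCII, for instance, is valid under both charsets.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: guess between UTF-8 and a legacy fallback charset.
public class GuessEncoding {

    // True if the bytes are well-formed UTF-8 (malformed input raises an exception).
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Prefer UTF-8; otherwise assume windows-1252 (an assumption, adjust as needed).
    static Charset guess(byte[] data) {
        return isValidUtf8(data) ? StandardCharsets.UTF_8 : Charset.forName("windows-1252");
    }

    static String decode(byte[] data) {
        return new String(data, guess(data));
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] legacy = "caf\u00e9".getBytes(Charset.forName("windows-1252"));
        System.out.println(guess(utf8).name());   // UTF-8
        System.out.println(guess(legacy).name()); // windows-1252
    }
}
```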

File encodings with Ruby

Hi, I'm having a bit of a problem with file encodings. I receive a URL-encoded string like "sometext%C3%B3+more+%26+andmore", unescape it, process the data, and save it with windows-1252 encoding. The conversions are these:

    irb(main) >> value
    => "sometext%C3%B3+more+%26+andmore"
    irb(main) >> CGI::unescape(value)
    => "sometext\303\263 ...

ruby

file-encodings
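The two steps described (percent-decode, then write as windows-1252) can be sketched in Java rather than Ruby, assuming the `%XX` escapes encode UTF-8 bytes, which `%C3%B3` (the UTF-8 form of "ó") suggests. The point is that unescaping yields text, and only the final write should pick the target encoding, where "ó" becomes the single byte 0xF3. Class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: URL-decode as UTF-8, then re-encode as windows-1252.
public class UrlDecodeToCp1252 {

    // Percent-decode, assuming the escaped bytes are UTF-8; '+' becomes a space.
    static String unescape(String value) {
        try {
            return URLDecoder.decode(value, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Encode the decoded text as windows-1252 bytes, e.g. for writing to a file.
    static byte[] toCp1252(String text) {
        return text.getBytes(Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        String text = unescape("sometext%C3%B3+more+%26+andmore");
        byte[] out = toCp1252(text);
        // U+00F3 is two bytes (C3 B3) in UTF-8 but one byte (F3) in windows-1252.
        System.out.printf("chars=%d bytes=%d byte[8]=%02X%n",
                text.length(), out.length, out[8] & 0xFF); // chars=24 bytes=24 byte[8]=F3
    }
}
```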

File.listFiles() mangles unicode names with JDK 6 (Unicode Normalization issues)

I'm struggling with a strange file name encoding issue when listing directory contents in Java 6 on both OS X and Linux: the File.listFiles() and related methods seem to return file names in a different encoding than the rest of the system. Note that it is not merely the display of these file names that is causing me problems. I'm mainl...
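One likely explanation, assuming the symptom is Unicode normalization as the title says: the same visible name can be stored either composed (NFC, "é" as one code point) or decomposed (NFD, "e" plus a combining accent); OS X's HFS+, for example, stores names in a decomposed form. Such strings look identical but are not `equals`. A minimal Java sketch using `java.text.Normalizer` to compare names after normalizing both to NFC (the helper names are hypothetical):

```java
import java.text.Normalizer;

// Hypothetical helpers: compare file names independent of normalization form.
public class NormalizeNames {

    // Convert to NFC (composed form) before comparing or looking names up.
    static String nfc(String name) {
        return Normalizer.normalize(name, Normalizer.Form.NFC);
    }

    static boolean sameName(String a, String b) {
        return nfc(a).equals(nfc(b));
    }

    public static void main(String[] args) {
        String composed = "\u00e9.txt";     // e-acute as a single code point (NFC)
        String decomposed = "e\u0301.txt";  // 'e' followed by a combining acute accent (NFD)
        System.out.println(composed.equals(decomposed));                        // false
        System.out.println(sameName(composed, decomposed));                     // true
        System.out.println(composed.length() + " vs " + decomposed.length());   // 5 vs 6
    }
}
```

Normalizing to NFC is a design choice, not a requirement: any single form works as long as every name is normalized the same way before comparison.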

File encoding generating a blank character in Ruby: why?

I'm using this little bit of Ruby:

    File.open(ARGV[0], "r").each_line do |line|
      puts "encoding: #{line.encoding}"
      line.chomp.split(//).each do |char|
        puts "[#{char}]"
      end
    end

And I have a sample file that I'm feeding in; the file just contains three periods and a newline. When I save this file with a fileencoding of utf-8 ...
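The excerpt stops before the answer, but one common source of an invisible extra character at the start of a file saved as UTF-8 is a byte order mark (U+FEFF, stored as the bytes EF BB BF) that some editors write. Assuming that is the cause here, the Java sketch below shows the effect: Java's UTF-8 decoder keeps the BOM as a leading character, so it has to be stripped explicitly. The helper name is hypothetical; in Ruby itself, an open mode such as "r:bom|utf-8" is meant to drop the BOM, which is worth checking against the IO documentation for your Ruby version.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: detect and remove a leading byte order mark.
public class StripBom {

    // U+FEFF, the byte order mark; in UTF-8 it is the three bytes EF BB BF.
    static final char BOM = '\uFEFF';

    static String stripBom(String text) {
        return (!text.isEmpty() && text.charAt(0) == BOM) ? text.substring(1) : text;
    }

    public static void main(String[] args) {
        // Hypothetical file contents: UTF-8 BOM, three periods, a newline.
        byte[] raw = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '.', '.', '.', '\n'};
        String decoded = new String(raw, StandardCharsets.UTF_8);
        System.out.println(decoded.length());           // 5: BOM + "..." + newline
        System.out.println(stripBom(decoded).length()); // 4
    }
}
```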