I'm using this little bit of Ruby:
File.open(ARGV[0], "r").each_line do |line|
  puts "encoding: #{line.encoding}"
  line.chomp.split(//).each do |char|
    puts "[#{char}]"
  end
end
The sample file I'm feeding in contains just three periods and a newline.
When I save this file with a fileencoding of utf-8 (in vim: set fileencoding=utf-8) and run this script on it, I get this output:
encoding: UTF-8
[]
[.]
[.]
[.]
If I then change the fileencoding to latin1 (in vim: set fileencoding=latin1) and run the script, I don't get that first blank char:
encoding: UTF-8
[.]
[.]
[.]
What's going on here? I understand that saving as UTF-8 can put some bytes at the start of the file to mark it as UTF-8 encoded, but I thought they were supposed to be invisible when processing the text (i.e. the Ruby runtime was supposed to consume them). What am I missing?
btw:
ubuntu:~$ ruby --version
ruby 1.9.2p0 (2010-08-18 revision 29034) [i686-linux]
Thanks!
Update:
Hex dump of the file with the extra char (the BOM):
ubuntu:~$ hexdump new.board
0000000 bbef 2ebf 2e2e 0a0d 0a0d
000000a
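Reading the dump: hexdump prints 16-bit little-endian words by default, so "bbef 2ebf" is the byte sequence EF BB BF 2E on disk, and EF BB BF is the UTF-8 BOM. Below is a minimal sketch of how that could be checked from Ruby, and of how the first line looks when read with and without a BOM-aware mode. The helper names (starts_with_bom?, first_line_chars, with_sample_file) are made up for illustration, and I'm assuming the "r:BOM|UTF-8" open mode is available in the Ruby being used.

```ruby
require "tempfile"

# The ten bytes from the hex dump, in on-disk order:
# EF BB BF (BOM), three periods, then CR LF twice.
SAMPLE_BYTES = [0xEF, 0xBB, 0xBF, 0x2E, 0x2E, 0x2E,
                0x0D, 0x0A, 0x0D, 0x0A].pack("C*")

# True if the raw bytes begin with the UTF-8 byte order mark.
def starts_with_bom?(bytes)
  bytes.bytes.first(3) == [0xEF, 0xBB, 0xBF]
end

# Characters of the first line of the file, read with the given open mode.
def first_line_chars(path, mode)
  File.open(path, mode) { |f| f.gets.chomp.chars.to_a }
end

# Writes SAMPLE_BYTES to a temporary file, yields its path, then cleans up.
def with_sample_file
  file = Tempfile.new("bom_sample")
  file.binmode
  file.write(SAMPLE_BYTES)
  file.close
  yield file.path
ensure
  file.unlink if file
end

if __FILE__ == $0
  puts "BOM present: #{starts_with_bom?(SAMPLE_BYTES)}"
  with_sample_file do |path|
    # Plain UTF-8 mode keeps the BOM as a leading U+FEFF character.
    puts "r:UTF-8     -> #{first_line_chars(path, 'r:UTF-8').length} chars"
    # BOM-aware mode (assumed available) skips the BOM before decoding.
    puts "r:BOM|UTF-8 -> #{first_line_chars(path, 'r:BOM|UTF-8').length} chars"
  end
end
```

If that assumption holds, opening the file in the original script with "r:BOM|UTF-8" instead of "r" might be enough to make the empty-looking first character go away.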