views:

847

answers:

1

I need to get the ASCII character for every character in a string. Actually its every character in a (small) file. The following first 3 lines successfully pull all a file's contents into a string (per this recipe):

set fp [open "store_order_create_ddl.sql" r]
set data [read $fp]
close $fp

I believe I am correctly discerning the ASCII code for the characters (see http://wiki.tcl.tk/1497). However I'm having a problem figuring out how to loop over every character in the string.

First of all I don't think the following is an especially idiomatic way of looping over characters in a string with Tcl. Second and more importantly, it behaves incorrectly, inserting an extra element between every character.

Below is the code I've written to act on the contents of the "data" variable set above, followed by some sample output.

CODE:

for {set i 0} {$i < [string length $data]} {incr i} {
  set char [string index $data $i]
  scan $char %c ascii
  puts "char: $char (ascii: $ascii)"
}

OUTPUT:

char: C (ascii: 67)
char:  (ascii: 0)
char: R (ascii: 82)
char:  (ascii: 0)
char: E (ascii: 69)
char:  (ascii: 0)
char: A (ascii: 65)
char:  (ascii: 0)
char: T (ascii: 84)
char:  (ascii: 0)
char: E (ascii: 69)
char:  (ascii: 0)
char:   (ascii: 32)
char:  (ascii: 0)
char: T (ascii: 84)
char:  (ascii: 0)
char: A (ascii: 65)
char:  (ascii: 0)
char: B (ascii: 66)
char:  (ascii: 0)
char: L (ascii: 76)
char:  (ascii: 0)
char: E (ascii: 69)
+4  A: 

The following code should work:

set data {CREATE TABLE}
foreach char [split $data ""] {
    lappend output [scan $char %c]
}
set output ;# 67 82 69 65 84 69 32 84 65 66 76 69

As far as the extra characters in your output, it seems like the problem is with your input data from the file. Is there some reason there would be null characters (\0) in between every character in the file?

RHSeeger
I'd begun to suspect that it might be an issue with the input, though there is no good reason for null characters between every character, except that it was generated with a Microsoft (SQL Server) tool ;)
George Jempty
Then that's your answer. Most Microsoft tools (as well as Apple's, by the way), use UTF-16 as their internal encoding; UTF-16LE being far more widespread because that's the native Intel endianness. You need to tell Tcl to interpret the input file as UTF-16. Again, no idea how to do that, sorry, but you should look for keywords like “encoding” or “character set” or, generally speaking, Unicode, in the docs.
Arthur Reutenauer
Think you might want to do: fconfigure $fp -encoding unicodeafter opening the file but before reading from it.
Colin Macleod