I need to save some information about a few files. Nothing too fancy, so I thought I would go with a simple text file, one line per item. Something like this:

# write
io.print "%i %s %s\n" % [File.mtime(fname), fname, Digest::SHA1.file(fname).hexdigest]
# read
io.each do |line|
  mtime, name, hash = line.scanf "%i %s %s"
end

Of course this doesn't work because a file name can contain spaces (breaking scanf) and line breaks (breaking IO#each).

The line break problem can be avoided by dropping each in favour of a series of gets(' ') calls:

while not io.eof?
  mtime = Time.at(io.gets(" ").to_i)
  name = io.gets " "
  hash = io.gets "\n"
end

Dealing with spaces in the names is another matter. Now we need to do some escaping.
Note: I like space as a field delimiter, but I'd have no issue changing it for one that is easier to use. For file names, though, the only one that would help is ASCII NUL ("\0"), and a NUL-delimited file isn't really a text file anymore...

I initially had a wall of text detailing the iterations of my struggle to make a correct escaping function and its reciprocal but it was just boring and not really useful. I'll just give you the final result:

def write_name(io, val)
  io << val.gsub(/([\\ ])/, "\\\\\\1") # yes, that's 6 backslashes!
end

def read_name(io)
  name, continued = "", true
  while continued
    continued = false
    name += io.gets(' ').gsub(/\\(.)/) do |c|
      if c=="\\\\"
        "\\"
      elsif c=="\\ "
        continued=true
        " "
      else
        raise "unexpected backslash escape  : %p (%s %i)" % [c, io.path, io.pos]
      end
    end
  end
  return name.chomp(' ')
end

I'm not happy at all with read_name. It is far too long and awkward; I feel it shouldn't be that hard.
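One way to shorten the reading side, sketched here under the assumption that the file format may change slightly: escape line breaks as well as spaces and backslashes, so that every record fits on one line and no field contains a raw space. Reading then reduces to a split plus a single gsub. The names escape_name, unescape_name, write_record and read_records are hypothetical, not part of the code above.

```ruby
require 'stringio'

# Hypothetical escape table: backslash => \\, space => \s, newline => \n
ESCAPES   = { "\\" => "\\\\", " " => "\\s", "\n" => "\\n" }.freeze
UNESCAPES = ESCAPES.invert.freeze

def escape_name(name)
  name.gsub(/[\\ \n]/) { |char| ESCAPES[char] }
end

def unescape_name(field)
  # Scan left to right so that an escaped backslash is consumed as a pair.
  field.gsub(/\\(.)/m) do |seq|
    UNESCAPES.fetch(seq) { raise ArgumentError, "unexpected escape: #{seq.inspect}" }
  end
end

def write_record(io, mtime, name, hash)
  io.puts "#{mtime.to_i} #{escape_name(name)} #{hash}"
end

def read_records(io)
  io.each_line.map do |line|
    # Split on single spaces only; escaped names never contain one.
    mtime, name, hash = line.chomp("\n").split(/ /)
    [Time.at(mtime.to_i), unescape_name(name), hash]
  end
end

# Usage: a name containing a space, a backslash and a line break.
io = StringIO.new
write_record(io, Time.at(0), "my file\\name\nv2.txt", "abc123")
io.rewind
records = read_records(io)
```

The file stays editable by hand, at the cost of names looking slightly different on disk (my\sfile instead of my file).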

While trying to make this work I tried to come up with other ways :

  • the BitTorrent bencode / PHP serialize way: prefix the file name with its length, then just io.read(name_len.to_i). It works, but it's a real PITA to edit the file by hand. At this point we're halfway to a binary format.

  • String#inspect : This one looks expressly made for that purpose ! Except it seems like the only way to get the value back is through eval. I hate the idea of eval-ing a string I didn't generate from trusted data.
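Both alternatives can be sketched briefly. For the second one, Ruby 2.5 and later provide String#undump, which reverses String#dump without going through eval; dump produces a quoted, escaped literal similar to the output of inspect. The helper names below are hypothetical, and the record layout (name last on the line, UTF-8 file names) is an assumption chosen to keep the parsing simple.

```ruby
require 'stringio'

# Length-prefixed name: "<byte length>:<raw bytes>"
def write_prefixed(io, name)
  io.write("#{name.bytesize}:#{name}")
end

def read_prefixed(io)
  length = io.gets(":").to_i # "7:" => 7
  # read(n) returns a binary string; assume the names are UTF-8.
  io.read(length).force_encoding(Encoding::UTF_8)
end

# Dumped name: the quoted literal contains no raw line break,
# so one record is always exactly one line.
def write_dumped(io, mtime, hash, name)
  io.puts "#{mtime.to_i} #{hash} #{name.dump}"
end

def read_dumped(line)
  # Limit of 3 keeps any spaces inside the quoted name intact.
  mtime, hash, quoted = line.chomp.split(" ", 3)
  [Time.at(mtime.to_i), hash, quoted.undump]
end

tricky = "my file\nv2 .txt"

prefixed_io = StringIO.new
write_prefixed(prefixed_io, tricky)
prefixed_io.rewind
from_prefix = read_prefixed(prefixed_io)

dumped_io = StringIO.new
write_dumped(dumped_io, Time.at(0), "abc123", tricky)
dumped_io.rewind
from_dump = read_dumped(dumped_io.gets)
```

The dumped form remains readable and editable in a text editor, which is the main drawback of the length prefix.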

So. Opinions ? Isn't there some lib which can do all this ? Am I missing something obvious ? How would you do that ?

+1  A: 

When you say "save" do you mean store the information in a file?

You could use the CSV module from the Ruby Standard Library. This would mean that your delimiter is comma rather than space but it would handle all the escaping and unescaping for you.

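  • If a value contains the delimiter (a comma), a line break or a quote character, that value is enclosed in "quotes", so spaces and line breaks in file names survive a round trip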

  • If a value contains quotes then a quote character is escaped as 2 quote characters e.g. "hello" would become """hello"""
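A small in-memory check of these rules, as a sketch using the CSV class of Ruby 1.9 and later (the sample values are made up for illustration):

```ruby
require 'csv'

row = ['plain', 'with space', 'with,comma', 'say "hello"', "line\nbreak"]

line = CSV.generate_line(row) # one CSV record as a String
back = CSV.parse_line(line)   # parse it back into an Array of fields

puts line
p back == row
```

The quoted line break means a single record can span two physical lines in the file, which is why the file should be read through the CSV parser rather than line by line.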

To write the details to a file:

require 'csv'
require 'digest/sha1'

# CSV::Writer and CSV::Reader are the Ruby 1.8 API;
# CSV.open and CSV.foreach are the equivalents on Ruby 1.9 and later.
CSV.open('csvout', 'wb') do |csv|
  csv << [File.mtime(fname), fname, Digest::SHA1.file(fname).hexdigest]
end

To read them back:

CSV.foreach('csvout') do |row|
  p row
end
mikej
Ah, CSV! Hadn't thought about it. Probably because I'm no fan of the format and its baroque (and many times redefined) escaping rules. But it's there, and it works. And since I get to pick the writer and reader I won't have any compatibility problems, right? Seriously though, I'm testing the module and it seems pretty solid. I'd need to rewrite all the logic around the reader, but that's to be expected when doing this kind of framework switch.
module managed to handle everything I threw at it.
+1  A: 

CSV, as mentioned, is a good choice. Another is YAML ("YAML Ain't Markup Language"), which can handle more arbitrary data than CSV can. Here's some data:

require 'pp'
require 'yaml'

h = {
  :first_name => 'Fred',
  :last_name => 'Flinstone',
  :children => ['Bam Bam', 'Pebbles'],
  :exclamation => 'Yabba Dabba Doo',
}

Let's write the data to a file in YAML format:

File.open('/tmp/foo.yaml', 'w') do |file|
  file.write h.to_yaml
end

Now let's see what the YAML looks like:

$ cat /tmp/foo.yaml
---
:exclamation: Yabba Dabba Doo
:first_name: Fred
:last_name: Flinstone
:children:
- Bam Bam
- Pebbles

And finally let's reconstitute the data from the YAML file:

pp YAML.load_file('/tmp/foo.yaml')
# => {:exclamation=>"Yabba Dabba Doo",
# =>  :first_name=>"Fred",
# =>  :last_name=>"Flinstone",
# =>  :children=>["Bam Bam", "Pebbles"]}
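Applied to the flat per-file records from the question, the same round trip might look like the sketch below. The field names mtime, name and sha1 are assumptions, and the digests are computed from placeholder strings rather than real files.

```ruby
require 'yaml'
require 'digest/sha1'

# One hash per file; YAML quotes or folds awkward names on its own.
records = [
  { 'mtime' => 0, 'name' => 'my file.txt',     'sha1' => Digest::SHA1.hexdigest('a') },
  { 'mtime' => 1, 'name' => "line\nbreak.txt", 'sha1' => Digest::SHA1.hexdigest('b') },
]

yaml     = records.to_yaml
restored = YAML.safe_load(yaml) # plain strings and integers need no extra permissions

puts yaml
p restored == records
```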
Wayne Conrad
Hmmm, I like YAML and I've even used it once in a situation where JSON wasn't going to cut it (damn those graph cycles!). But it's not really what I need here. It's a pretty poor solution for a one-table database in text format. Actually, I don't really see any point in using YAML for non-hierarchical data.