I'm currently working with extremely large fixed-width files, sometimes well over a million lines. I have written a method that can write over the files based on a set of parameters, but I think there has to be a more efficient way to accomplish this. The current code I'm using is:

def self.writefiles(file_name, positions, update_value)
  @file_name = file_name
  @positions = positions.to_i
  @update_value = update_value

  line_number = 0
  @file_contents = File.open(@file_name, 'r').readlines

  while line_number < @file_contents.length
    @read_file_contents = @file_contents[line_number]
    @read_file_contents[@positions] = @update_value
    @file_contents[line_number] = @read_file_contents
    line_number += 1
  end

  write_over_file = File.new(@file_name, 'w')
  line_number = 0

  while line_number < @file_contents.length
    write_over_file.write @file_contents[line_number]
    line_number += 1
  end

  write_over_file.close
end

For example, if position 25 in the file indicated that it is an original file, the value would be set to "O", and if I wanted to replace that value I would use ClassName.writefiles(filename, 140, "X") to change this position on each line. Any help on making this method more efficient would be greatly appreciated!

Thanks

A: 
#!/usr/bin/ruby
# replace_at_pos.rb
pos, char, infile, outfile = $*
pos = pos.to_i
File.open(outfile, 'w') do |f|
  File.foreach(infile) do |line|
    line[pos] = char
    f.puts line
  end
end

and you use it as:

replace_at_pos.rb 140 X inputfile.txt outputfile.txt

For replacing a set of values, you can use a hash:

replace = {
  100 => 'a',
  155 => 'c',
  151 => 't'
}
. . .
replace.each do |k, v|
  line[k] = v
end
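Folded into the same foreach loop as above, that might look roughly like this. This is a sketch only, not tested code; the positions and characters are the example hash from above, and the invocation mirrors the first script:

#!/usr/bin/ruby
# replace_many.rb -- sketch only, not tested code
replace = {
  100 => 'a',
  155 => 'c',
  151 => 't'
}

infile, outfile = $*

File.open(outfile, 'w') do |f|
  File.foreach(infile) do |line|
    # overwrite each configured position with its replacement character
    replace.each { |pos, char| line[pos] = char }
    f.puts line
  end
end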
Mladen Jablanović
Great, I'll have to try this out and see what kind of performance boost I receive. Just one more quick question for you: how would I modify this if more than one position needed to be changed? E.g. I need to update the date in positions 100..107. Thank you once again for the help!
Ruby Novice
Hmm, I just used the code you provided and all it appears to do is delete every line in the file.
Ruby Novice
First part or the second? I tried the first, works ok. The second is just an idea, not working code.
Mladen Jablanović
The first part (I'm working from Windows)
Ruby Novice
I changed the code (and invoking syntax), try it now. Not sure why it wouldn't work in Windows.
Mladen Jablanović
+1  A: 

If it's a fixed-width file, you can open the file for read/write, use seek to move to the start of the data you want to change, and write only the data you're changing rather than the whole line. This would probably be more efficient than rewriting the entire file to replace one field.

Here's a crude example. It reads the last field (10, 20, 30), increments it by 1, and writes it back:

tha_file (10 characters per line, including newline)

12 3 x 10
23 4 x 20
78 9 x 30

seeker.rb

#!/usr/bin/env ruby
fh=open("tha_file", "r+")

$RECORD_WIDTH=10
$POS=8
$FIELD_WIDTH=2

# seek to first field
fh.seek($POS - 1, IO::SEEK_CUR)

while !fh.eof?

  cur_val=fh.read($FIELD_WIDTH).to_i
  puts "read #{cur_val}"
  fh.seek(-1 * $FIELD_WIDTH, IO::SEEK_CUR)
  cur_val = cur_val + 1

  fh.write(cur_val)
  puts "wrote #{cur_val}"

  # Move to start of next field in the middle of next record
  fh.seek($RECORD_WIDTH - $FIELD_WIDTH, IO::SEEK_CUR)
end
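For the original question (stamping a constant value at a known column on every line), roughly the same pattern applies. The following is a hedged sketch, not tested code; the file name, record width, and position are placeholders that would need to match the real layout:

#!/usr/bin/env ruby
# stamp_field.rb -- sketch only. The record width must include the line
# terminator, and on Windows the file should be opened in binary mode
# ("r+b") so the offsets aren't thrown off by newline translation.

RECORD_WIDTH = 141   # assumed bytes per line, newline included
POS          = 140   # 0-based index, as in the question's writefiles call
NEW_VALUE    = "X"   # must be exactly as wide as the field it replaces

fh = open("fixed_width.txt", "r+b")

# move to the field in the first record
fh.seek(POS, IO::SEEK_CUR)

while !fh.eof?
  fh.write(NEW_VALUE)   # overwrite just this field, in place
  # skip the rest of this record plus the lead-in of the next one
  fh.seek(RECORD_WIDTH - NEW_VALUE.length, IO::SEEK_CUR)
end

fh.close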
Shin
I attempted this before going with the method used above and it unfortunately caused all kinds of problems. I suppose I had only been using Ruby for a week or so at that point though, so maybe I'll give it another shot.
Ruby Novice
Could you possibly give me an example of what the code would look like? I can't seem to stop seek from changing the formatting of the file any time I insert new values. I've tried looking around for some more in depth guides on how to use it, but every site seems to give the same example. Thanks
Ruby Novice
The problem is you have to always remember /exactly/ where you are in the file and must make sure to write the fields in the same width. My code above doesn't check width and will break when going from 99 to 100.
Shin
Awesome, this gives me a much better idea about how to go about implementing this approach. Thanks for taking the time to write up a more comprehensive example, I'll tinker with it a bit and see if I can't get it working =D
Ruby Novice
So I managed to get this method working with the files that I'm using (it was actually much simpler than expected), but I've found an odd problem. The original method is still completing faster than the IO method (I tried flushing the buffer, etc.) and I cannot figure out why. I haven't had time to do intensive benchmarking to find what's causing it yet, so I'm just curious if you have any idea what could be slowing it down? Thanks!
Ruby Novice
It really depends on your data and what you're trying to do. In your example, you read the whole file at once, and write a whole new file. Roughly N IO operations. My example does a read, seek, write, seek, so 4*N IO operations. One or the other might be faster depending on the size of your data. Another alternative would be to write the file processing logic in a faster language (C, Java, etc).
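If it helps to measure rather than guess, Ruby's standard Benchmark module can time both approaches. A rough sketch, where "data.txt" is a placeholder file name, ClassName.writefiles is the method from the question, and seek_and_write is a hypothetical wrapper around the seek-based version:

require 'benchmark'

Benchmark.bm(12) do |b|
  # both method names below are stand-ins for your own implementations
  b.report("rewrite:")    { ClassName.writefiles("data.txt", 140, "X") }
  b.report("seek/write:") { ClassName.seek_and_write("data.txt", 140, "X") }
end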
Shin
A: 

You will certainly save some time, and quite a lot of memory, by reworking the program to read the file one line at a time (you are currently reading the whole file into memory). You then write to a backup copy of the file within the loop and rename the file at the end. Something like this:

  def self.writefiles2(file_name, positions, update_value)
    @file_name = file_name
    @new_file = file_name + ".bak"
    @positions = positions.to_i
    @update_value = update_value

    reader = File.open(@file_name, 'r')
    writer = File.open(@new_file, 'w')

    while line = reader.gets
      line[@positions] = @update_value
      writer.puts(line)
    end
    reader.close
    writer.close
    # Rename the file
  end

This would of course need some error handling around the rename step, since a failure there could result in the loss of your input data.
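One possible way to guard that rename, as a sketch (the .orig suffix and the rescue choice are just illustrative):

# Sketch only: keep the original file until the swap has succeeded, so a
# failure leaves either the original or the rewritten copy on disk.
begin
  backup = @file_name + ".orig"
  File.rename(@file_name, backup)       # set the original aside first
  File.rename(@new_file, @file_name)    # promote the rewritten copy
  File.delete(backup)
rescue SystemCallError => e
  warn "Rename failed, nothing was deleted: #{e.message}"
end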

Steve Weet
Just benchmarked both methods, and unfortunately the original method I have been using is quite a bit faster. Thanks for the input though!
Ruby Novice
Well, that's odd, as mine showed exactly the opposite, i.e. mine ran in about 2/3 of the time (100 iterations over a 256k-line file gave 102s vs 161s). Did you run them within the same process? I tried that, but there was very little memory left after the first run, so I ran them in separate processes.
Steve Weet
Hmmm I'll have to try it once again then, sorry I missed the update to your post yesterday. Thanks!
Ruby Novice
You will almost certainly find that Shin's solution is the quickest. The version above should still end up faster than yours, as it does a single scan of the input file and a sequential write of the output, whereas your original reads the whole input file into an array and then iterates over the array to write the output.
Steve Weet