IO#gets works the same as if you were capturing input from the command line: the "Enter" isn't sent as part of the input; neither is it passed when #gets is called on a File or other subclass of IO, so the numbers are definitely not going to match up.
See the relevant Pickaxe section
May I enquire why you're so concerned about the line lengths summing to the file size? You may be solving a harder problem than is necessary...
Aha. I think I get it now.
Lacking a handy iPod (or any other sort, for that matter), I don't know if you want exactly 4K chunks, in which case IO#read(4000) would be your friend (4000 or 4096?) or if you're happier to break by line, in which case something like this ought to work:
class Chunkifier
def Chunkifier.to_chunks(path)
chunks, current_chunk_size = [""], 0
File.readlines(path).each do |line|
line.chomp! # strips off \n, \r or \r\n depending on OS
if chunks.last.size + line.size >= 4_000 # 4096?
chunks.last.chomp! # remove last line terminator
chunks << ""
end
chunks.last << line + "\n" # or whatever terminator you need
end
chunks
end
end
if __FILE__ == $0
require 'test/unit'
class TestFile < Test::Unit::TestCase
def test_chunking
chs = Chunkifier.to_chunks(PATH)
chs.each do |chunk|
assert 4_000 >= chunk.size, "chunk is #{chunk.size} bytes long"
end
end
end
end
Note the use of IO#readlines to get all the text in one slurp: #each or #each_line would do as well. I used String#chomp! to ensure that whatever the OS is doing, the byts at the end are removed, so that \n or whatever can be forced into the output.
I would suggest using File#write, rather than #print or #puts for the output, as the latter have a tendency to deliver OS-specific newline sequences.
If you're really concerned about multi-byte characters, consider taking the each_byte or unpack(C*) options and monkey-patching String, something like this:
class String
def size_in_bytes
self.unpack("C*").size
end
end
The unpack version is about 8 times faster than the each_byte one on my machine, btw.