views:

106

answers:

5

I have an array of integers

a = [1,2,3,4]

When I do

a.join

Ruby internally calls the to_s method 4 times, which is too slow for my needs.

What is the fastest way to output a big array of integers to the console?

I mean:

a = [1,2,3,4........,1,2,3,9], should be: 

1234........1239

A: 

The slowness in your program does not come from to_s being called 4 times, but from printing to the console. Console output is slow, and you can't really do anything about it.

FRotthowe
Not really. Sure, console output is slow, but there are many tricks to improve it; have a look, for example, at this: http://stackoverflow.com/questions/3334294/ruby-fast-reading-from-std
astropanic
I would have to agree with FRotthowe on this one. The limitation is the console printing itself, not how the information gets to the console. By the looks of both posts, are you trying to send data from one script to another via input/output redirection?
Anh-Kiet Ngo
@Anh-Kiet Ngo, yes, I manipulate an array of integers and I need to write it to stdout. The arrays are really big, so the to_s operations take most of the time (ruby-prof)
astropanic
Of course there are ways to improve the console output speed, but the OP's question was how to avoid using `join` because he thought that's what causes the slowness.
FRotthowe
As I clearly said, the many to_s calls are the problem; ruby-prof shows me exactly that, not that IO#write is the bottleneck here. Sure, when the array is small, the bottleneck will be stdout, but with a big array the overhead comes from the many to_s calls
astropanic
I see. You're not really writing to the console but just to stdout for consumption in another program. Then of course the console speed isn't an issue.
FRotthowe
If you are concerned about the speed of a to_s call then perhaps ruby is not the correct tool for this job.
Steve Weet
A: 

For single digits you can do this

[1,2,3,4,5].map{|x|(x+48).chr}.join

If you need to speed up larger numbers you could try memoizing the result of to_s
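The memoization idea could be sketched like this (a hypothetical example, not from the answer itself); note it only pays off when the array contains many repeated values:

```ruby
# Cache Integer#to_s results in a Hash with a default block, so each
# distinct value is converted only once.
memo = Hash.new { |h, n| h[n] = n.to_s }

a = [1, 2, 3, 2, 1]
output = a.map { |n| memo[n] }.join
# Only 3 conversions happened for 5 elements.
```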

gnibbler
I doubt very much that this is actually faster. And as a quick benchmark shows, map+chr+join takes about 1.75 times as long as a plain join.
sepp2k
+2  A: 

As stated in the comments above, if Fixnum#to_s is not performing quickly enough for you, then you really need to consider whether Ruby is the correct tool for this particular task.

However, there are a couple of things you could do that may or may not be applicable for your situation.

If building the array happens outside the time-critical section, then build the array, or a copy of it, with strings instead of integers. In my small test with 10,000 integers this was approximately 5 times faster.
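A minimal sketch of the convert-ahead-of-time approach (names here are illustrative):

```ruby
a = (1..10_000).to_a

# Pay the to_s cost up front, outside the time-critical section.
strings = a.map(&:to_s)

# Later, in the hot path, outputting is a plain string join with no
# per-element to_s calls.
output = strings.join
```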

If you control both the reading and the writing process, then use Array#pack to write the output and String#unpack to read the result. This may not be quicker, as pack seems to call Fixnum#to_int even when the elements are already Integers.
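Assuming both sides agree on the format string, the pack/unpack round trip looks roughly like this ("l*" means native-endian signed 32-bit integers):

```ruby
a = [1, 2, 3, 9]

binary   = a.pack("l*")        # Array#pack -> binary String, 4 bytes per element
restored = binary.unpack("l*") # String#unpack on the reading side
```

The writer and reader must use the same directive; mixing, say, "l*" and "n*" would silently corrupt the values.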

I expect these figures would be different with each version of Ruby so it is worth checking for your particular target version.

Steve Weet
Indeed, I saved over 2 seconds with an array of over 1 million elements (Ruby 1.9.1): puts [1,2,3.....n].map{|c| c+48}.pack("c*")
astropanic
+3  A: 

If you want to print an integer to stdout, you need to convert it to a string first, since that's all stdout understands. If you want to print two integers to stdout, you need to convert both of them to a string first. If you want to print three integers to stdout, you need to convert all three of them to a string first. If you want to print one billion integers to stdout, you need to convert all one billion of them to a string first.

There's nothing you, we, or Ruby, or really any programming language can do about that.

You could try interleaving the conversion with the I/O by doing a lazy stream implementation. You could try to do the conversion and the I/O in parallel, by doing a lazy stream implementation and separating the conversion and the I/O into two separate threads. (Be sure to use a Ruby implementation which can actually execute threads in parallel; not all of them can: MRI, YARV and Rubinius can't, for example.)
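The producer/consumer split could be sketched with a Queue (a simplified illustration; on MRI the threads won't run truly in parallel, but conversion and I/O can still overlap):

```ruby
a = (1..1_000).to_a
q = Queue.new

# Producer: convert integers to strings in chunks and hand them off.
producer = Thread.new do
  a.each_slice(100) { |chunk| q << chunk.join }
  q << :done                # sentinel to stop the consumer
end

# Consumer: "write" each converted chunk as it arrives. A real version
# would use $stdout.write instead of appending to a string.
out = +""
while (chunk = q.pop) != :done
  out << chunk
end
producer.join
```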

You can parallelize the conversion, by converting separate chunks in the array in separate threads in parallel. You can even buy a billion core machine and convert all billion integers at the same time in parallel.

But even then, the fact of the matter remains: every single integer needs to be converted. Whether you do that one after the other first, and then print them or do it one after the other interleaved with the I/O or do it one after the other in parallel with the I/O or even convert all of them at the same time on a billion core CPU: the number of needed conversions does not magically decrease. A large number of integers means a large number of conversions. Even if you do all billion conversions in a billion core CPU in parallel, it's still a billion conversions, i.e. a billion calls to to_s.

Jörg W Mittag
A: 

Unless you really need to see the numbers on the console (and it sounds like you do not), write them to a file in binary form - that should be much faster.

And you can pipe binary files into other programs, not just text files, if that is what you need to do.
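A hedged sketch of the binary-file approach, combining it with Array#pack from the earlier answer (the temp file here stands in for whatever real output file you would use):

```ruby
require "tempfile"

a = [1, 2, 3, 9]

# Write the packed integers to a binary file.
file = Tempfile.create("ints")
file.binmode
file.write(a.pack("l*"))
file.close

# The consuming program reads the bytes back with the matching directive.
restored = File.binread(file.path).unpack("l*")
```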

Rob Jones