There are two large text files (millions of lines each) that my program uses. These files are parsed and loaded into hashes so that the data can be accessed quickly. The problem I face is that, currently, the parsing and loading is the slowest part of the program. Below is the code where this is done.

database = extractDatabase(@type).chomp("fasta") + "yml"
revDatabase = extractDatabase(@type + "-r").chomp("fasta.reverse") + "yml"
@proteins = Hash.new
@decoyProteins = Hash.new

# Each line has the form "KEY: value"; split once and keep the pair.
# File.foreach closes the handle when done, unlike a block-less File.open.
File.foreach(database) do |line|
  parts = line.split(": ")
  @proteins[parts[0]] = parts[1]
end

File.foreach(revDatabase) do |line|
  parts = line.split(": ")
  @decoyProteins[parts[0]] = parts[1]
end

The files look like the example below. They started off as YAML, but the format was modified to increase parsing speed.

MTMDK: P31946   Q14624  Q14624-2    B5BU24  B7ZKJ8  B7Z545  Q4VY19  B2RMS9  B7Z544  Q4VY20
MTMDKSELVQK: P31946 B5BU24  Q4VY19  Q4VY20
....

I've messed around with different ways of setting up the files and parsing them, and so far this is the fastest way, but it's still awfully slow.

Is there a way to improve the speed of this, or is there a whole other approach I can take?

List of things that don't work:

  • YAML.
  • Standard Ruby threads.
  • Forking off processes and then retrieving the hash through a pipe.
A: 

I don't know too much about Ruby, but I have had to deal with this problem before. I found the best way was to split the file up into chunks or separate files, then spawn threads to read the chunks in at the same time. Once the partitioned files are in memory, combining the results should be fast. Here is some information on threads in Ruby:

http://rubylearning.com/satishtalim/ruby_threads.html
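
As a rough sketch of the idea in Ruby (the database.part* chunk names are hypothetical placeholders, assuming the big file has already been split): one thread per chunk, each building its own hash, merged at the end.

# Hypothetical pre-split chunk files, e.g. database.part0, database.part1, ...
chunk_files = Dir.glob("database.part*")

# One thread per chunk; each builds a private hash so no locking is needed.
threads = chunk_files.map do |path|
  Thread.new do
    local = {}
    File.foreach(path) do |line|
      key, value = line.chomp.split(": ", 2)
      local[key] = value
    end
    local
  end
end

# Thread#value waits for each thread and returns its result; merge them all.
@proteins = threads.map(&:value).reduce({}) { |acc, h| acc.merge!(h) }

(Note that on MRI the interpreter lock means the parsing itself won't run in parallel, which may be why threading hasn't helped here.)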

Hope that helps.

Michael Bazos
Would splitting it up really help? Because, as I kind of mentioned, when I used a thread for each file it only went slower.
Jesse J
+2  A: 

Why not use the solution devised through decades of experience: a database, say SQLite3?
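
A minimal sketch of that route with the sqlite3 gem (the proteins.db file name and the table/column names are hypothetical): import the file once inside a transaction, then query on demand instead of holding everything in a hash.

require "sqlite3"

# One-time import; file, table, and column names here are hypothetical.
db = SQLite3::Database.new("proteins.db")
db.execute("CREATE TABLE IF NOT EXISTS proteins (peptide TEXT PRIMARY KEY, ids TEXT)")

# Wrapping the bulk insert in a single transaction makes it far faster.
db.transaction do
  stmt = db.prepare("INSERT OR REPLACE INTO proteins VALUES (?, ?)")
  File.foreach(database) do |line|
    key, value = line.chomp.split(": ", 2)
    stmt.execute(key, value)
  end
  stmt.close
end

# Afterwards, a lookup replaces @proteins[key]:
ids = db.get_first_value("SELECT ids FROM proteins WHERE peptide = ?", "MTMDK")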

Marc-André Lafortune
+1, although this might not fare better after the "once-loaded" phase for simple key/values. Another option is a Berkeley DB (BDB)-style back-end, if it's just a simple key/value store that doesn't need SQL relationships and joins.
pst
+1  A: 

(To be different, although I'd first recommend looking at (Ruby) BDB and other "NoSQL" back-end engines, if they fit your needs.)

If fixed-size records with a deterministic index are used, then you can perform a lazy load of each item through a proxy object; this would be a suitable candidate for mmap. However, this will not speed up the total access time; it merely amortizes the loading throughout the life-cycle of the program (at least until first use, and if some data is never used, you get the benefit of never loading it). Without fixed-size records or deterministic index values, this problem is more complex and starts to look more like a traditional "index" store (e.g. a B-tree in an SQL back-end, or whatever BDB uses :-).
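
A rough sketch of that lazy-load idea, assuming hypothetical fixed-width records so that record i starts at byte i * RECORD_SIZE:

# Sketch only: assumes records are padded to a fixed width, so record i
# lives at byte offset i * RECORD_SIZE. The width of 128 is a placeholder.
class LazyRecordStore
  RECORD_SIZE = 128

  def initialize(path)
    @file = File.open(path, "rb")
    @cache = {}
  end

  # Seek to and read a record only on first access, then memoize it.
  def [](i)
    @cache[i] ||= begin
      @file.seek(i * RECORD_SIZE)
      @file.read(RECORD_SIZE).rstrip
    end
  end
end

Nothing is read up front, so startup is instant; each record costs one seek-and-read the first time it's touched.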

The general problems with threading here are:

  1. The IO will likely be your bottleneck, given Ruby's "green" threads
  2. You still need all the data before use

You may be interested in the Wide Finder Project, which is, in general, about "trying to get faster IO processing".

pst
The time it took to create a database was unbearable.
Jesse J
+1  A: 

In my usage, reading all or part of the file into memory before parsing usually goes faster. If the database files are small enough, this could be as simple as

buffer = File.readlines(database)
buffer.each do |line|
  ...
end

If they're too big to fit into memory, it gets more complicated: you have to set up block reads of the data followed by parsing, or use threading with separate read and parse threads. A sketch of the block-read approach is below.
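
A rough sketch of the block-read variant, assuming the same "KEY: value" line format (the 16 MB chunk size is an arbitrary placeholder):

CHUNK = 16 * 1024 * 1024  # arbitrary chunk size; tune to taste

File.open(database, "r") do |f|
  leftover = ""
  while (chunk = f.read(CHUNK))
    lines = (leftover + chunk).split("\n", -1)
    leftover = lines.pop  # the last piece may be a partial line
    lines.each do |line|
      next if line.empty?
      key, value = line.split(": ", 2)
      @proteins[key] = value
    end
  end
  # handle the final line if the file doesn't end with a newline
  unless leftover.empty?
    key, value = leftover.split(": ", 2)
    @proteins[key] = value
  end
end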

Digikata
This did shave roughly 30 seconds off, but it still takes over 2 minutes.
Jesse J
I found that after doing this, making other alterations to the method decreased the time further. With this and the other improvements, it's now down to an acceptable time.
Jesse J