views: 61
answers: 2

If not, what's the maximum while still remaining efficient?

I'm creating 14 threads, each of which opens a list of about 500 URLs and creates a new thread for each one; each of those threads downloads the URL and adds it to a MySQL DB. The MySQL pool size is set to 50.

This is a rake task in RoR

Would this work better using Kernel#fork or some other method?

+2  A: 

With Ruby 1.8, the limit is practically how much memory you have. You can create tens of thousands of threads per process. The Ruby interpreter handles thread management itself, and only one or two native threads are created. It isn't true multitasking where the CPU switches between threads.

Ruby 1.9 uses native threads. The limit seems to be whatever the OS allows. Just for testing, I can create over 2,000 threads on my Mac with Ruby 1.9 before the OS disallows any more.

Note that having thousands of threads for a process isn't a good idea. Thread scheduling becomes a burden long before that.
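The thread-limit experiment described above can be sketched like this. The cap of 100 threads is arbitrary so the script terminates quickly; raising it lets you probe the real limit on your own system:

```ruby
# Keep spawning sleeping threads until Thread.new raises ThreadError
# (what Ruby 1.9 does when the OS refuses another native thread).
# Capped at 100 here so the script finishes quickly.
count = 0
threads = []
begin
  100.times do
    threads << Thread.new { sleep }  # park the thread forever
    count += 1
  end
rescue ThreadError
  # hit the OS limit before reaching the cap
end
threads.each(&:kill)
puts count
```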

Alkaline
OK, thanks! Maybe I'll just use those first 14 and not all the sub-threads. Thanks!
loosecannon
You mean practically unlimited? (And yes, having more threads makes your app run slower on 1.8.6 because of the shared refs for the GC, though I suppose you could use REE to avoid those.)
rogerdpack
No, it's certainly not unlimited. I'm sure you agree that it's "limited in practice" by how much RAM is available.
Alkaline
A: 

Well, since your threads are going to be IO bound, the good news is that both Ruby 1.8 and 1.9 threads will work for this. Ruby 1.8 uses "userspace threads," meaning no real new OS threads are created when you create new threads in Ruby. This is bad for CPU multitasking, since only one Ruby thread is actually running at a time, but good for IO multitasking. Ruby 1.9 uses real threads, and will be good for either.
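That IO-bound point is easy to demonstrate. In this sketch, `sleep` stands in for a blocking network read; ten 0.1-second "requests" run concurrently finish in far less than the ~1 second a sequential loop would take:

```ruby
require 'benchmark'

# sleep stands in for a blocking network read. Ten 0.1 s "requests"
# overlap when threaded, so the wall-clock time stays near 0.1 s
# instead of ~1 s -- even on 1.8's green threads, because the
# interpreter switches threads while one is waiting on IO.
elapsed = Benchmark.realtime do
  10.times.map { Thread.new { sleep 0.1 } }.each(&:join)
end
puts format('%.2f s', elapsed)
```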

The number of threads you can create really depends on your system. There are of course practical limits, but you probably don't want to get anywhere near them. First, unless the servers you're downloading from are very slow and your connection is very fast, just a few threads are going to saturate your Internet connection. Also, if you're grabbing a lot of pages from a single server, throwing 500 requests at it at once from 500 threads isn't going to do any good either.

I'd start pretty small: 10 or 20 threads running at once. Increase or decrease this depending on server load, your bandwidth, etc. There's also the issue of concurrent connections to the MySQL database. Depending on how your tables are set up and how large they are, trying to insert too much data at the same time isn't going to work very well.
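A small fixed-size pool along those lines might look like the sketch below. The URL list is hypothetical, and the actual download and MySQL insert are stubbed out as comments; only the pooling pattern is shown:

```ruby
require 'thread'  # Queue (part of core in modern Rubies)

# A minimal fixed-size worker pool: POOL_SIZE threads drain a shared
# queue of URLs instead of spawning one thread per URL.
POOL_SIZE = 10
urls = (1..50).map { |i| "http://example.com/page#{i}" }  # hypothetical URLs

queue = Queue.new
urls.each { |u| queue << u }
results = Queue.new  # Queue is thread-safe, unlike Array

workers = POOL_SIZE.times.map do
  Thread.new do
    loop do
      url = begin
        queue.pop(true)    # non-blocking pop; raises ThreadError when empty
      rescue ThreadError
        break              # no work left, let this worker exit
      end
      # body = Net::HTTP.get(URI(url))  # real download would go here,
      # followed by the MySQL insert (at most one connection per worker)
      results << url
    end
  end
end
workers.each(&:join)
puts results.size
```

Keeping the pool size at or below your MySQL pool size (50 here) also sidesteps the concurrent-connection issue mentioned above.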

AboutRuby