views:

235

answers:

3

A prof once told us in class that Windows, Linux, OS X and UNIX scale on threads and not processes, so threads would likely benefit your application even on a single processor because your application would be getting more time on the CPU.

I tried with the following code on my machine (which only has one CPU).

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

pthread_t xs[10];

void *nop(void *ptr) {
    unsigned long n = 1UL << 30UL;
    while(n--);
    return NULL;
}

void test_one() {

    size_t len = (sizeof xs) / (sizeof *xs);
    while(len--)
        if(pthread_create(xs+len, NULL, nop, NULL))
            exit(EXIT_FAILURE);

    len = (sizeof xs) / (sizeof *xs);
    while(len--)
        if(pthread_join(xs[len], NULL))
            exit(EXIT_FAILURE);

}

void test_two() {

    size_t len = (sizeof xs) / (sizeof *xs);
    while(len--) nop(NULL);

}

int main(int argc, char *argv[]) {

    test_one();
//  test_two();
    printf("done\n");
    return 0;
}

Both tests were identical in terms of speed.

real    0m49.783s
user    0m48.023s
sys     0m0.224s

real    0m49.792s
user    0m49.275s
sys     0m0.192s

This me think, "Wow, threads suck". But, repeating the test on a university server with four processors close to quadrupled the speed.

real    0m7.800s
user    0m30.170s
sys     0m0.006s

real    0m30.190s
user    0m30.165s
sys     0m0.004s

Am I overlooking something when interpreting the results from my home machine?

+4  A: 

A prof once told us in class that Windows, Linux, OS X and UNIX scale on threads and not processes, so threads would likely benefit your application even on a single processor because your application would be getting more time on the CPU.

Not necessarily. If your app is the only CPU-intensive thing running, more threads won't magically make more CPU time available - all that will result is more CPU time wasted in context switches.

This me think, "Wow, threads suck". But, repeating the test on a university server with four processors close to quadrupled the speed.

That's because with four threads, it can use all four processors.

Anon.
+1. What you want to test is that, given that some other CPU-intensive process is running (say with only 1 thread) your test program gets more CPU time with 4 threads than with 1 on your home machine. The total time for both processes may increase overall; but (if your professor's claim is correct) the extra threads will bias the division of CPU time towards the process with more threads, so that it finishes faster.
RMorrisey
+15  A: 

In order to understand within the bowels of tasks/threads...lets look at this toy kernel code...

struct regs{
  int eax, ebx, ecx, edx, es, ds, gs, fs, cs, ip, flags;
  struct tss *task_sel;
}

struct thread{
    struct regs *regs;
    int parent_id;
    struct thread *next;
}
struct task{
   struct regs *regs;
   int *phys_mem_begin;
   int *phys_mem_end;
   int *filehandles;
   int priority;
   int *num_threads;
   int quantum;
   int duration;
   int start_time, end_time;
   int parent_id;
   struct thread *task_thread;
     /* ... */
   struct task *next;
}

Imagine the kernel allocates memory for that structure task, which is a linked-list, look closely at the quantum field, that is the timeslice of the processor-time based on the priority field. There will always be a task of id 0, which never sleeps, just idle, perhaps issuing nops (No OPerationS)...the scheduler spins around ad nauseum until infinity (that is when the power gets unplugged), if the quantum field determines the task runs for 20ms, sets the start_time and end_time + 20ms, when that end_time is up, the kernel saves the state of the cpu registers into a regs pointer. Goes onto the next task in the chain, loads the cpu registers from the pointer to regs and jumps into the instruction, sets the quantum and time duration, when the duration reaches zero, goes on to the next...effectively context-switching...this is what gives it an illusion that is running simultaneously on a single cpu.

Now look at the thread struct which is a linked-list of threads...within that task structure. The kernel allocates threads for that said task, sets up the cpu states for that thread and jumps into the threads...now the kernel has to manage the threads as well as the tasks themselves...again context switching between a task and thread...

Move on to a multi-cpu, the kernel would have been set up to be scalable, and what the scheduler would do, load one task onto one cpu, load another onto another cpu (dual core), and both jump into where the instruction pointer is pointing at...now the kernel is genuinely running both tasks simultaneously on both cpu's. Scale up to 4 way, same thing, additional tasks loaded onto each CPU, scale up again, to n-way...you get the drift.

As you can see the notion how the threads would not be perceived as scalable, as the kernel has quite frankly a mammoth job in keeping track of what cpu is running what, and on top of that, what task is running which threads, which fundamentally explains why I think threads are not exactly scalable...Threads consumes a lot of resources...

If you really want to see what is happening, take a look at the source code for Linux, specifically in the scheduler. No hang on, forget about the 2.6.x kernel releases, look at the prehistoric version 0.99, the scheduler would be more simpler to understand and easier to read, sure, its a bit old, but worth looking at, this will help you understand why and hopefully my answer also, in why threads are not scalable..and shows how the toy-os uses time division based on processes. I have strived to not to get into the technical aspects of modern-day cpu's that can do more then just what I have described...

Hope this helps, Best regards, Tom.

tommieb75
Whoever downvoted and not bothered to comment is against the spirit of SO!
tommieb75
+1  A: 

I'm not sure exactly what you're asking, but here is an answer which may help.

Under Linux, processes and threads are essentially exactly the same. The scheduler understands things called "tasks" which it doesn't really care whether they share address space or not. Sharing or not sharing things really depends on how they were created.

Whether to use threads or processes is a key design decision and should not be taken lightly, but performance of the scheduler is probably not a factor (of course things like IPC requirements will vary the design wildly)

MarkR