ansaurus

Question

Multithreading not faster than single thread (simple loop test)

Answer 1

+1 A:

You don't do anything with i, so your loop is probably just optimised away.

Adrian Mouat 2010-09-29 10:26:12

Actually, I printed the value of i at the bottom (but it's not shown in the code).

RemiX 2010-09-29 11:13:35

The times are consistent with the loop being optimised, but not optimised away. I'd like to see the test repeated (without restarting the process). One issue threads can have in this context is that the HotSpot compiler runs in a different thread, and the additional thread may end up running the unoptimised code for some time.

Tom Hawtin - tackline 2010-09-29 22:21:57
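
A minimal sketch of what "repeating the test without restarting the process" could look like. The original test code is not shown in this thread, so the class name, method names, and iteration count below are all hypothetical; the only point is that the same timed run is executed several times in one JVM, so later rounds are more likely to run compiled code.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical harness; none of these names come from the question.
class RepeatedTiming {
    // Work under test: count up to `iterations` and return the result,
    // so the value is actually used by the caller.
    static long countTo(long iterations) {
        long counter = 0;
        for (long n = 0; n < iterations; n++) {
            counter++;
        }
        return counter;
    }

    // Runs the work on two threads and returns the elapsed wall time in ms.
    static long timeTwoThreads(long iterationsPerThread) throws InterruptedException {
        long[] results = new long[2];
        Thread a = new Thread(() -> results[0] = countTo(iterationsPerThread));
        Thread b = new Thread(() -> results[1] = countTo(iterationsPerThread));
        long start = System.nanoTime();
        a.start();
        b.start();
        a.join();
        b.join();
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        // Consume the results so the work cannot be treated as dead code.
        if (results[0] + results[1] != 2 * iterationsPerThread) {
            throw new IllegalStateException("unexpected result");
        }
        return elapsedMs;
    }

    public static void main(String[] args) throws InterruptedException {
        // Same measurement, several rounds, one process.
        for (int round = 1; round <= 5; round++) {
            System.out.println("round " + round + ": " + timeTwoThreads(100_000_000L) + " ms");
        }
    }
}
```

If the first round is much slower than the following ones, that would be consistent with the warm-up effect described in the comment above. A loop this simple may also be reduced heavily by the JIT, so the absolute numbers should not be read too closely.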

Another Thread doing exactly the same as t2 (only then 10000x10000) is completed in 107 ms (faster than t1 and t2 together), or isn't that what you meant?

RemiX 2010-09-30 10:46:53

Answer 2

+2 A:

I'm not at all surprised at the difference. You are using Java's concurrency framework to create your threads (although I don't see any guarantee that two threads are even created, since the first job might complete before the second even starts).

There's probably all sorts of locking and synchronisation going on behind the scenes which you don't actually need for your simple test. In short I do think the problem is the overhead of multithreading.

JeremyP 2010-09-29 10:34:10

I also tested it with just two Threads and using thread1.start(), showing the same result. Also, one Runnable in the ExecutorService works very quickly and finally, another machine works fine with this code.

RemiX 2010-09-29 11:15:10

Answer 3

+4 A:

Try increasing the size of the array somewhat. No, really.

Small objects allocated sequentially in the same thread will tend to be placed next to each other in memory. That's probably in the same cache line. If you have two cores accessing the same cache line (and the micro-benchmark is essentially just doing a sequence of writes to the same address) then they will have to fight for access.

There's a class in java.util.concurrent that has a bunch of unused long fields. Their purpose is to separate objects that may be frequently used by different threads into different cache lines.

Tom Hawtin - tackline 2010-09-29 11:07:49
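
A rough sketch of the cache-line effect described in the answer above, not the code from the question. Two threads each increment their own slot of a `long[]`; the slots are either adjacent or spaced apart. The class name, the `STRIDE` value, and the 64-byte cache line it assumes are all assumptions made for illustration.

```java
// Hypothetical demo; names and constants are illustrative only.
class SpacedCounters {
    // Assumption: 64-byte cache lines. 16 longs = 128 bytes, which keeps
    // the two slots on different lines with some margin.
    static final int STRIDE = 16;

    // Two threads increment slots[0] and slots[stride] respectively.
    // Returns the two final counts.
    static long[] run(int stride, long iterations) throws InterruptedException {
        long[] slots = new long[stride + 1];
        Thread a = new Thread(() -> {
            for (long n = 0; n < iterations; n++) slots[0]++;
        });
        Thread b = new Thread(() -> {
            for (long n = 0; n < iterations; n++) slots[stride]++;
        });
        a.start();
        b.start();
        a.join();
        b.join();
        return new long[] { slots[0], slots[stride] };
    }

    // Wall time in ms for one run with the given spacing.
    static long timeMs(int stride, long iterations) throws InterruptedException {
        long start = System.nanoTime();
        run(stride, iterations);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        long iterations = 200_000_000L;
        System.out.println("adjacent slots: " + timeMs(1, iterations) + " ms");
        System.out.println("spaced slots:   " + timeMs(STRIDE, iterations) + " ms");
    }
}
```

Whether the two timings differ depends on the hardware and on what the JIT does with the loops (it may keep a counter in a register and write it back rarely, which hides the effect), so this is something to try rather than a guaranteed reproduction.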

I'm using a different array for each Thread, so I don't think they have to fight for access... or did I misunderstand?

RemiX 2010-09-29 11:17:06

@RemiX: they're both allocated on the heap, and i2 is allocated right after i1. There's a pretty high probability of them ending up in the same cache line.

snemarch 2010-09-29 11:38:40

+1 - 2200 ms to 280 ms by just increasing the size of the arrays to 10. Unfortunately, using your other suggestions the effect isn't that great anymore. Good to remember, though.

RemiX 2010-09-30 10:40:01

Answer 4

+9 A:

You definitely don't want to keep polling Thread.isAlive() - this burns a lot of CPU cycles for no good reason. Use Thread.join() instead.

Also, it's probably not a good idea having the threads increment the result arrays directly, cache lines and all. Update local variables, and do a single store when the computations are done.

EDIT:

Totally overlooked that you're using a Pentium 4. As far as I know, there are no multi-core versions of the P4 - to give the illusion of multicore, it has Hyper-Threading: two logical cores share the execution units of one physical core. If your threads depend on the same execution units, your performance will be the same as (or worse than!) single-threaded performance. You'd need, for instance, floating-point calculations in one thread and integer calcs in another to gain performance improvements.

The P4 HT implementation has been criticized a lot; newer implementations (recent Core i7) should be better.

snemarch 2010-09-29 11:35:14
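
The two suggestions in the answer above (wait with Thread.join() rather than polling Thread.isAlive(), and accumulate in a local variable with a single store at the end) might look roughly like this. It is a sketch only: the question's code is not reproduced in this thread, so the class name, method name, and workload here are hypothetical.

```java
// Hypothetical sketch; names and workload are not taken from the question.
class JoinAndLocalSum {
    // Starts `workers` threads, each counting to `iterations`, and returns
    // one result per worker.
    static long[] runWorkers(int workers, long iterations) throws InterruptedException {
        long[] results = new long[workers];
        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            final int slot = w;
            threads[w] = new Thread(() -> {
                long local = 0;              // accumulate in a local variable
                for (long n = 0; n < iterations; n++) {
                    local++;
                }
                results[slot] = local;       // single store when the work is done
            });
            threads[w].start();
        }
        for (Thread t : threads) {
            t.join();                        // blocks until t finishes, no busy-waiting
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        long[] results = runWorkers(2, 100_000_000L);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(results[0] + " + " + results[1] + " in " + elapsedMs + " ms");
    }
}
```

join() also gives the main thread a well-defined point after which each worker's store to the results array is visible, which a bare isAlive() polling loop makes harder to reason about.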

+1 - The first paragraph is probably where most of the difference is.

Stephen C 2010-09-29 11:59:22

+1 - Actually, both suggestions speed up the process significantly, thanks. But there's something strange: using Thread.isAlive() in combination with incrementing arrays directly is faster (800 ms) than using Thread.join() (2200 ms), but using isAlive() in combination with your second suggestion is slower (190 ms) than join() (114 ms). Anyway, using both your suggestions speeds the system up from 2200 ms to 114 :D. However, your second suggestion also speeds up the single thread to about 110 ms, so there is still no real difference between the two.

RemiX 2010-09-30 10:35:39

A difference of less than 10ms doesn't really tell you anything when running on a multitasking OS - you'll need to increase the iterations to measure speed difference more reliably :)

snemarch 2010-09-30 10:53:54

I know, that's why I said there's no difference. But I'll look into the Pentium 4 issue. So what you're saying is that, even though dxdiag says there are 2 cores, it actually has one physical core and it cannot really speed up by multithreading? Sounds like a good explanation why there was in fact a speed increase on that other machine.

RemiX 2010-09-30 11:25:16

Yes, unless there's a P4 version I don't know about, all you get is HyperThreading. The normal APIs don't report logical cores any differently from physical ones. If you want to detect HT, you'll need to go pretty specific - CPUID x86 instruction or the win32 NUMA APIs.

snemarch 2010-09-30 11:45:06

Answer 5

+1 A:

Have you checked the number of available cores on your PC with Runtime.getRuntime().availableProcessors() ?

Damien 2010-09-29 11:35:27
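
For reference, a minimal way to run the check suggested above. The class and method names are made up for this example. Note that availableProcessors() reports the processors available to the JVM, which are logical ones, so a single Hyper-Threading core would show up as 2.

```java
// Hypothetical helper; only Runtime.availableProcessors() is a real JDK call here.
class CoreCount {
    // Number of processors the JVM reports as available (logical, not physical).
    static int logicalProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("available processors: " + logicalProcessors());
    }
}
```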

Just did, and it says 2 processors. Also, I can see them working in the Task Manager.

RemiX 2010-09-30 10:42:19
