A: 

Sorry if this is complete nonsense, but I'm pretty sure Java hasn't used green threads since Java 1.1. At least, Wikipedia says so too.

That would leave you with thread priorities only, but in most cases I couldn't get any observable performance improvement from them either.
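For reference, here is a minimal sketch of what setting priorities looks like. The class and method names are made up for illustration; only `Thread.setPriority` and the `MIN_PRIORITY`/`MAX_PRIORITY` constants are standard API. Priorities are only a hint to the scheduler, so the two threads may well finish in either order.

```java
// Sketch only: thread priorities are hints, and the JVM/OS is free to ignore them.
class PriorityHintDemo {
    // Creates (but does not start) a thread with the requested priority.
    static Thread withPriority(Runnable work, String name, int priority) {
        Thread t = new Thread(work, name);
        // Must lie within Thread.MIN_PRIORITY..Thread.MAX_PRIORITY (1..10).
        t.setPriority(priority);
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        // Some CPU-bound busy work, identical for both threads.
        Runnable spin = () -> {
            long sum = 0;
            for (int i = 0; i < 50_000_000; i++) {
                sum += i;
            }
            System.out.println(Thread.currentThread().getName() + " done, sum=" + sum);
        };

        Thread low = withPriority(spin, "low", Thread.MIN_PRIORITY);
        Thread high = withPriority(spin, "high", Thread.MAX_PRIORITY);
        low.start();
        high.start();
        low.join();
        high.join();
    }
}
```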

Marcel J.
No, I agree - I didn't make any progress that way! Is there any alternative to priorities?
Paul Morrison
+2  A: 

Green threads are gone (maybe Solaris still supports them, but I doubt it). Additionally, Java does not switch threads; the OS does that. The only thing Java does is signal to the OS, via OS functions, that a thread is idle, waiting, or blocked. So if your program hits a synchronisation point, or calls Object.wait or Thread.sleep, it signals that it does not need the CPU anymore.

Besides that, the OS maintains time slices and will take the CPU away from a thread, even though it could still run, when other threads are waiting for the CPU.
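To make the blocking case concrete, here is a small sketch (names are hypothetical, and it uses a plain `java.util.concurrent` queue rather than anything from your framework). A consumer blocks on an empty queue, and the JVM reports it as `WAITING`, meaning it is parked and not asking for CPU time until something arrives.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class BlockedThreadDemo {
    // Starts a consumer that blocks on an empty queue and returns the
    // thread state observed while it is blocked.
    static Thread.State stateWhileBlocked() {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Thread consumer = new Thread(() -> {
            try {
                queue.take(); // blocks here because the queue is empty
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "consumer");
        consumer.start();

        try {
            // Poll (for at most ~5 s) until the consumer has parked itself.
            long deadline = System.nanoTime() + 5_000_000_000L;
            while (consumer.getState() != Thread.State.WAITING
                    && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
            Thread.State observed = consumer.getState();

            queue.put("wake up"); // unblock the consumer so it can finish
            consumer.join();
            return observed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while observing", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Consumer state while blocked: " + stateWhileBlocked());
    }
}
```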

Can you publish some more code here?

ReneS
I forgot to mention: acquiring memory is always a point at which the OS might push your thread off the CPU...
ReneS
Is there any way to increase the length of OS time slices?
Paul Morrison
Yes, it depends on your OS and the mode it is running in. For instance, Ubuntu Server uses different time slices than Ubuntu Desktop (ask at serverfault.com). But first try binding your process to one CPU and check whether it runs better.
ReneS
A: 

Hi ReneS, thanks for asking! I have been discussing the same question on the Sun forum, and here is my last post on that forum:

Our best guess right now is that this effect results from Windows' scheduling logic.

Microsoft seems to be acknowledging that this area needs some improvement as it is introducing UMS - I quote: "UMS is recommended for applications with high performance requirements that need to efficiently run many threads concurrently on multiprocessor or multicore systems. ... UMS is available starting with 64-bit versions of Windows 7 and Windows Server 2008 R2. This feature is not available on 32-bit versions of Windows." Hopefully, Java will take advantage of UMS in some later release.

Thanks for your help!

Paul Morrison
A: 

The plot thickens! I thought of editing my previous answer, but thought that would be confusing.

As ReneS suggested, we ran a test on Linux, and got the same result! The behavior seems to be exactly the same as on Windows: no significant difference between small and big buffer sizes. This means that it is either a Java problem, or Windows and Linux use the same algorithm. But I also saw the same behaviour with C#FBP.

However, when I tried the same test on my laptop, with only 1 processor, I got a fairly consistent difference depending on the size of the connection. With size = 10, it took 6.6 secs; with size = 100, it took 5.8 secs, so that is a 12% difference. Which is not to be sneezed at!

My gut feeling is that some piece of software is having trouble allocating work among multiple processors. Clearly, this is not a simple task - but FBP networks already balance work, so whatever software is doing it is second-guessing FBP. Does anyone out there have any idea what could be doing it? Thanks in advance.

Paul Morrison
A: 

I'm a bit embarrassed - it suddenly occurred to me this afternoon that maybe the network whose performance I was worried about was just too simple, as I only had two process**es**, and two process**ors**. So Windows may have been trying too hard to keep the processors balanced! So I wondered what would happen if I gave Windows lots of processes.

I set up two networks:

a) 50 Generate components feeding 50 Discard components - i.e. highly parallel network - so that's 100 threads in total

b) 50 Generate components feeding 1 Discard component - i.e. highly "funnelled" network - so that's 51 threads

I ran each one 6 times with a connection capacity of 10, and 6 times with a connection capacity of 100. Every run generated a total of 50 * 20,000 information packets, for a total of 1,000,000 packets, and ran for about 1 minute.
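For anyone who wants to try something similar without JavaFBP, here is a rough stand-in I would sketch with plain `java.util.concurrent`. This is not the JavaFBP code: `ArrayBlockingQueue` plays the role of a bounded connection, the class and method names are invented, and the `END` marker is just one assumed way to signal end of stream. Absolute timings will not be comparable with the runs described here; only the shape of the two networks is the same.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

class ConnectionCapacityBench {
    private static final Object END = new Object(); // end-of-stream marker

    // Runs `generators` producer threads. If `funnel` is true they all feed one
    // consumer over a shared queue; otherwise each producer gets its own queue
    // and consumer. Returns the number of packets discarded.
    static long run(int generators, boolean funnel, int packetsEach, int capacity) {
        AtomicLong discarded = new AtomicLong();
        List<Thread> threads = new ArrayList<>();
        BlockingQueue<Object> shared = funnel ? new ArrayBlockingQueue<Object>(capacity) : null;

        for (int g = 0; g < generators; g++) {
            BlockingQueue<Object> q = funnel ? shared : new ArrayBlockingQueue<Object>(capacity);
            threads.add(new Thread(() -> generate(q, packetsEach)));
            if (!funnel) {
                threads.add(new Thread(() -> discard(q, 1, discarded)));
            }
        }
        if (funnel) {
            threads.add(new Thread(() -> discard(shared, generators, discarded)));
        }

        threads.forEach(Thread::start);
        for (Thread t : threads) {
            join(t);
        }
        return discarded.get();
    }

    private static void generate(BlockingQueue<Object> out, int count) {
        try {
            for (int i = 0; i < count; i++) {
                out.put(Integer.valueOf(i)); // blocks when the queue is full
            }
            out.put(END);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void discard(BlockingQueue<Object> in, int producers, AtomicLong counter) {
        try {
            int ended = 0;
            while (ended < producers) { // stop once every producer has sent END
                if (in.take() == END) {
                    ended++;
                } else {
                    counter.incrementAndGet();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        for (boolean funnel : new boolean[] {false, true}) {
            for (int capacity : new int[] {10, 100}) {
                long start = System.nanoTime();
                long packets = run(50, funnel, 20_000, capacity);
                double secs = (System.nanoTime() - start) / 1e9;
                System.out.printf("%s network, capacity %d: %d packets in %.3f secs%n",
                        funnel ? "funnelled" : "parallel", capacity, packets, secs);
            }
        }
    }
}
```

As with the real test, a single run says little; repeating each case several times and averaging is the only way to see a consistent difference.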

Here are the averages of the 4 cases:

a) with connection capacity of 10 - 59.151 secs.

a) with connection capacity of 100 - 52.008 secs.

b) with connection capacity of 10 - 76.745 secs.

b) with connection capacity of 100 - 60.667 secs.
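To put those averages in relative terms, here is a small helper (the class and method names are just for illustration) that computes how much shorter the run is with the larger capacity. With the figures above it comes out at roughly 12% for network a) and roughly 21% for network b).

```java
class CapacitySpeedup {
    // Percentage by which the elapsed time drops when going from the slower
    // run to the faster run, relative to the slower run.
    static double percentFaster(double slowerSecs, double fasterSecs) {
        return (slowerSecs - fasterSecs) / slowerSecs * 100.0;
    }

    public static void main(String[] args) {
        // Averages reported above, capacity 10 versus capacity 100.
        System.out.printf("a) parallel:  %.1f%% less time%n", percentFaster(59.151, 52.008));
        System.out.printf("b) funnelled: %.1f%% less time%n", percentFaster(76.745, 60.667));
    }
}
```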

So it looks like the connection capacity does make a difference! And, it looks like JavaFBP performs reasonably well... I apologize for being a bit hasty - but maybe it made us all think a bit more deeply about multithreading in a multicore machine... ;-)

Apologies again, and thanks to everyone who contributed thoughts on this topic!

Paul Morrison