Does anyone know roughly the minimum amount of work needed for a goroutine to be worthwhile (assuming there are free cores for the work to be offloaded to)?
I've been plodding through Project Euler with Go. While I don't have a definite answer for you, I found the goroutine-based prime sieve in the Go docs to be an order of magnitude slower than simply checking each number for primality. Setting GOMAXPROCS to a higher value didn't help, either.
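If you want to put a rough number on this for your own machine, one option is to time the same tiny task run inline versus run in its own goroutine, and grow the task until the goroutine version stops losing. The following is only a sketch: `work`, `runInline`, `runSpawned`, the task count, and the sizes tried are all made up for illustration, and wall-clock timings like these are noisy, so treat the output as a hint rather than a measurement.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// work is a deliberately small, hypothetical unit of work:
// it sums the integers 0..n-1.
func work(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i
	}
	return sum
}

// runInline calls work `tasks` times on the calling goroutine.
func runInline(tasks, n int) int {
	total := 0
	for t := 0; t < tasks; t++ {
		total += work(n)
	}
	return total
}

// runSpawned starts one goroutine per task, waits for all of them,
// and then adds up the per-task results.
func runSpawned(tasks, n int) int {
	results := make([]int, tasks)
	var wg sync.WaitGroup
	for t := 0; t < tasks; t++ {
		wg.Add(1)
		go func(t int) {
			defer wg.Done()
			results[t] = work(n) // each goroutine writes only its own slot
		}(t)
	}
	wg.Wait()

	total := 0
	for _, r := range results {
		total += r
	}
	return total
}

func main() {
	const tasks = 1000 // arbitrary choice for this sketch
	for _, n := range []int{10, 1000, 100000} {
		start := time.Now()
		a := runInline(tasks, n)
		inline := time.Since(start)

		start = time.Now()
		b := runSpawned(tasks, n)
		spawned := time.Since(start)

		fmt.Printf("n=%d inline=%v spawned=%v same=%v\n", n, inline, spawned, a == b)
	}
}
```

For anything you intend to rely on, the benchmark support in the standard `testing` package is a better tool than hand-rolled timing like this.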
A goroutine is an abstraction that you use if it helps you model your application better. You're doing concurrency-oriented programming, so think about the parts of your application that have concurrency within them.
Think about an OO system and imagine asking the same question about whether you should instantiate an object.
Do the thing that makes sense first.
Goroutines are lightweight and don't take up many resources. You should use them wherever they are appropriate to the problem. Currently Go doesn't seem to be exceptionally good at using multiple cores (it seems there is a bit too much overhead in allocating additional host threads).
I think the real question is when to use multiple cores rather than when to use goroutines. The answer to that is probably the same as for other languages and for additional host threads or processes. (Unfortunately you can't easily specify when a goroutine should occupy a new host thread or which thread it should occupy.)
Using goroutines isn't just about hardware efficiency. Sometimes they make the software easier to write and make it easier to keep bugs out. The language allows the programmer to express concurrency naturally and simply. That's worth a lot to me.
My own experience with problems that are natural candidates for concurrency is that Go easily allows me to max out all the available cores on CPU-bound problems using a trivial "scatter/gather" approach. Your mileage may vary.
Hotei
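For anyone wondering what a trivial "scatter/gather" might look like, here is a minimal sketch, not the poster's actual code. It assumes a CPU-bound job (counting primes by trial division, in keeping with the Project Euler theme above) that is split into one contiguous chunk per core. The names `isPrime`, `countPrimes`, and `scatterGather` are hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// isPrime checks n for primality by trial division.
func isPrime(n int) bool {
	if n < 2 {
		return false
	}
	for d := 2; d*d <= n; d++ {
		if n%d == 0 {
			return false
		}
	}
	return true
}

// countPrimes counts the primes in the half-open range [lo, hi).
func countPrimes(lo, hi int) int {
	count := 0
	for n := lo; n < hi; n++ {
		if isPrime(n) {
			count++
		}
	}
	return count
}

// scatterGather splits [0, limit) into `workers` contiguous chunks,
// counts the primes in each chunk in its own goroutine (scatter),
// then sums the per-chunk counts once every goroutine is done (gather).
func scatterGather(limit, workers int) int {
	if workers < 1 {
		workers = 1
	}
	counts := make([]int, workers)
	chunk := (limit + workers - 1) / workers // ceiling division

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		lo := w * chunk
		hi := lo + chunk
		if hi > limit {
			hi = limit
		}
		wg.Add(1)
		go func(w, lo, hi int) {
			defer wg.Done()
			counts[w] = countPrimes(lo, hi) // each goroutine owns one slot
		}(w, lo, hi)
	}
	wg.Wait()

	total := 0
	for _, c := range counts {
		total += c
	}
	return total
}

func main() {
	// One worker per logical CPU is an assumption, not a tuned value.
	fmt.Println(scatterGather(1000000, runtime.NumCPU()))
}
```

Equal-sized chunks are the simplest split but not the fairest one here, since trial division costs more for larger numbers. Using more, smaller chunks than cores would likely balance the load better.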