Because distcc keeps no state, all it can do is send jobs and headers and have the servers preprocess and compile using only the data just sent. I think this gives the latest distcc a scalability problem.
In my local build environment, which has approx. 10,000 C/C++ files to build, 20 build servers made the build only 2 times faster than plain make -j without distcc.
What do you think is the problem?

If anyone has achieved a speedup of more than 10 to 20 times using make -j and distcc, please let me know.

The following product claims that it is impossible to speed up a build by more than 5 times with make -j and distcc: http://www.electric-cloud.com/products/electricaccelerator.php

I think this can be improved by:

  • Letting the distccd server maintain sessions
  • Having each session cache its own header directories
  • Having the distccd server do the preprocessing on demand
  • Doing this through an LD_PRELOADed library, libdistcc.so, which replaces the stat/open syscalls and fetches the header files over the network. ...

Has anyone done this kind of thing?

A: 

Yes, distcc can scale up well above 5x.

We have to work out what the limiting factor is in your environment.

  1. One common problem is that your makefiles won't allow it to actually dispatch more than a couple of files at a time. You can just have a look at how many compiler processes are running. If this is the problem you may need to debug your makefiles to allow more parallelism.

  2. Perhaps many of the jobs the client is running can't be distributed for some reason. The distcc client log will tell you if this is the case.

  3. Perhaps for some reason the client is overloaded and not able to pass out jobs fast enough; however, it's very likely you would get above 2 jobs before hitting this.

  4. Perhaps the servers are overloaded and can't accept any more jobs. But if you have 20 servers they should be able to take at least one each.
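
To check point 1, you can count the compiler processes on the client while the build runs. Here is a minimal Python sketch of that check. It is not part of distcc: the function names and the set of process names in COMPILER_NAMES are assumptions for illustration, so adjust them to your toolchain. It only reads the output of the standard `ps -e -o comm=` command.

```python
import subprocess

# Assumed executable names to count; adjust to match your toolchain.
COMPILER_NAMES = {"cc", "gcc", "g++", "c++", "cc1", "cc1plus",
                  "clang", "clang++", "distcc"}


def count_compilers(ps_output, names=COMPILER_NAMES):
    """Count lines of `ps -e -o comm=` output that name a compiler process."""
    count = 0
    for line in ps_output.splitlines():
        # Some systems print a full path in comm; keep only the basename.
        name = line.strip().rsplit("/", 1)[-1]
        if name in names:
            count += 1
    return count


def snapshot():
    """Run ps once and return how many compiler processes are alive now."""
    result = subprocess.run(
        ["ps", "-e", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return count_compilers(result.stdout)
```

Call snapshot() a few times during the build. If the count stays at one or two even though you passed a large -j value, the makefiles are the likely bottleneck rather than distcc.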

Thinking about hacks to keep sessions open is premature until you know that the limiting factor is starting the sessions.

It's probably #1 or #2. Post an excerpt from your log.

poolie