One suggestion I have relates to Hadoop, and I think it is in a similar vein to the answer by Eric J.
When you submit a job (i.e. a MapReduce job) to Hadoop, the default behaviour is to queue it and run it as resources on the cluster become available. Scheduling is an active area of research and development, because a simple scheme like this will not always meet users' requirements.
For example, your cluster might have a number of critical jobs that need to run at certain times, while you also use the cluster for ad-hoc analysis of the data. How does the scheduler deal with situations like this? A FIFO-style queue might mean that your critical jobs never run on time, simply because of their position in the queue, while less important jobs tie up the cluster's resources (the sketch below illustrates this).
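Here is a minimal sketch of that starvation problem in plain Java. It is a toy simulation, not Hadoop code, and the job names and durations are made up for illustration:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy illustration (not Hadoop code): with a strict FIFO queue, a critical
// job submitted after long-running ad-hoc jobs cannot run until they finish.
public class FifoStarvation {

    record Job(String name, int minutes, boolean critical) {}

    public static void main(String[] args) {
        Queue<Job> queue = new ArrayDeque<>();
        queue.add(new Job("ad-hoc-analysis-1", 120, false));
        queue.add(new Job("ad-hoc-analysis-2", 90, false));
        queue.add(new Job("nightly-billing", 30, true)); // the critical job

        int clock = 0; // minutes since the queue started draining
        while (!queue.isEmpty()) {
            Job job = queue.poll(); // FIFO: submission order, nothing else
            System.out.printf("t=%3d min: starting %s%s%n",
                    clock, job.name(), job.critical() ? " (critical)" : "");
            clock += job.minutes();
        }
        // The critical job waits 210 minutes behind the ad-hoc work, which is
        // exactly the behaviour that fair/capacity schedulers try to address.
    }
}
```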
Two organizations that have found this to be a problem, and have contributed their own schedulers as a result, are:

- Yahoo!, with the Capacity Scheduler
- Facebook, with the Fair Scheduler
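On classic (pre-YARN) Hadoop the scheduler was a pluggable component of the JobTracker, so trying one of these out is mostly a configuration change. As a sketch, switching to the Fair Scheduler in `mapred-site.xml` would look roughly like this (property and class names are from Hadoop 1.x; check them against your version):

```xml
<!-- mapred-site.xml: tell the JobTracker to use the Fair Scheduler
     instead of the default FIFO JobQueueTaskScheduler (Hadoop 1.x). -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
```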
This isn't a simple problem to solve, and I doubt any one solution will ever meet every requirement, but there is a lot of scope for different approaches. Facebook's Fair Scheduler, for example, is modelled on the Completely Fair Scheduler in the Linux kernel.
However, you would have a number of existing schedulers to work from, as it's easy to get the code and see how they work.
Just a thought, hope it helps.