I am creating a tool for predicting the time and cost of software projects based on past data. The tool uses a neural network to do this and so far the results are promising, but I think I can do a lot more optimisation just by changing the properties of the network. There don't seem to be any rules, or even many best practices, when it comes to these settings, so if anyone with experience could help me I would greatly appreciate it.

The input data is made up of a series of integers with no fixed upper bound (users can enter whatever they like), but I would expect most values to be under 100,000, and some will be as low as 1. They are details like the number of people on a project and the cost of a project, as well as details about database entities and use cases.

There are 10 inputs in total and 2 outputs (the time and cost). I am using Resilient Propagation to train the network. Currently it has: 10 input nodes, 1 hidden layer with 5 nodes and 2 output nodes. I am training to get under a 5% error rate.

The algorithm must run on a webserver so I have put in a measure to stop training when it looks like it isn't going anywhere. This is set to 10,000 training iterations.
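
In case it helps, here is roughly how the network and trainer are set up. This is a simplified sketch rather than my exact code; the class names assume the Encog 3 style API of the Java framework we're using (Encog), and the training arrays are whatever past-project data the user has entered.

```java
import org.encog.engine.network.activation.ActivationLinear;
import org.encog.ml.data.MLDataSet;
import org.encog.ml.data.basic.BasicMLDataSet;
import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.layers.BasicLayer;
import org.encog.neural.networks.training.propagation.resilient.ResilientPropagation;

public class EstimatorTrainer {

    /**
     * Builds and trains the 10-5-2 network on past-project data.
     * input: one row per past project, 10 values each.
     * ideal: the known time and cost for each of those projects.
     */
    public static BasicNetwork train(double[][] input, double[][] ideal) {
        // 10 inputs -> 1 hidden layer of 5 neurons -> 2 outputs (time, cost).
        // We currently use linear activations throughout.
        BasicNetwork network = new BasicNetwork();
        network.addLayer(new BasicLayer(null, true, 10));                   // input layer
        network.addLayer(new BasicLayer(new ActivationLinear(), true, 5));  // hidden layer
        network.addLayer(new BasicLayer(new ActivationLinear(), false, 2)); // output layer
        network.getStructure().finalizeStructure();
        network.reset();

        MLDataSet trainingSet = new BasicMLDataSet(input, ideal);

        // Resilient propagation, stopping at 5% error or 10,000 iterations,
        // whichever comes first, so a stuck run can't hang the web server.
        ResilientPropagation rprop = new ResilientPropagation(network, trainingSet);
        int iteration = 0;
        do {
            rprop.iteration();
            iteration++;
        } while (rprop.getError() > 0.05 && iteration < 10000);
        rprop.finishTraining();

        return network;
    }
}
```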

Currently, when I train it on data that is somewhat varied but well within the limits of what we expect users to enter, it takes a long time to train and hits the 10,000-iteration limit over and over again.

This is the first time I have used a neural network and I don't really know what to expect. If you could give me some hints on what sort of settings I should be using for the network and for the iteration limit I would greatly appreciate it.

Thank you!

+1  A: 

First of all, thanks for providing so much information about your network! Here are a few pointers that should give you a clearer picture.

  • You need to normalize your inputs. If one input typically sees values around 100,000 and another around 0.5, the two will not have an equal impact on the network, which is why normalization matters (see the sketch after this list).
  • Only 5 hidden neurons for 10 input nodes? I remember reading somewhere that you need at least double the number of inputs; try 20+ hidden neurons. This gives the network the capacity to fit a more complex model. However, with too many neurons the network will simply memorize the training data set.
  • Resilient backpropagation is fine. Just remember that there are other training algorithms out there like Levenberg-Marquardt.
  • How many training sets do you have? Neural networks usually need a large dataset to be good at making useful predictions.
  • Consider adding a momentum factor to your weight-training algorithm to speed things up if you haven't done so already.
  • Online training tends to be better for making generalized predictions than batch training. The former updates the weights after each individual training example, while the latter only updates them after the entire data set has been passed through. It's your call.
  • Is your data discrete or continuous? Neural networks tend to do a better job with 0s and 1s than with continuous functions. If it is the former, I'd recommend using the sigmoid activation function. A combination of tanh and linear activation functions for the hidden and output layers tends to do a good job with continuously varying data.
  • Do you need another hidden layer? It may help if your network has to learn a complex input-output mapping.
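
On the normalization point, something along these lines is usually enough: scale each input column to [0, 1] using the minimum and maximum observed in your training data. This is a plain-Java sketch, independent of any framework (your framework may also ship its own normalization helpers); for inputs with no natural upper bound, pick a generous maximum or clamp, as the code does here.

```java
/**
 * Min-max normalization sketch: scales every column of the training data
 * to the range [0, 1] using that column's observed minimum and maximum.
 * Out-of-range values at prediction time are clamped into [0, 1].
 */
public final class MinMaxNormalizer {

    private final double[] min;
    private final double[] max;

    /** Learns the per-column ranges from the training data. */
    public MinMaxNormalizer(double[][] data) {
        int columns = data[0].length;
        min = new double[columns];
        max = new double[columns];
        java.util.Arrays.fill(min, Double.POSITIVE_INFINITY);
        java.util.Arrays.fill(max, Double.NEGATIVE_INFINITY);
        for (double[] row : data) {
            for (int c = 0; c < columns; c++) {
                min[c] = Math.min(min[c], row[c]);
                max[c] = Math.max(max[c], row[c]);
            }
        }
    }

    /** Maps a raw input row into [0, 1] per column, clamping anything outside the training range. */
    public double[] normalize(double[] row) {
        double[] out = new double[row.length];
        for (int c = 0; c < row.length; c++) {
            double range = max[c] - min[c];
            double scaled = (range == 0) ? 0.0 : (row[c] - min[c]) / range;
            out[c] = Math.max(0.0, Math.min(1.0, scaled));
        }
        return out;
    }
}
```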
Zaid
Thanks for all the info! 1. I thought about normalising the inputs, but I don't know how to do that when a value has no maximum and some values can differ so much. 2. I tried with 20 hidden neurons and it was taking 5+ hours instead of seconds for some data; it seemed like anything over 10 was not good. 4. The number of datasets depends on how many the user puts in; we have been using around 5. 5. We are using a neural network framework called Encog. I will look into changing the momentum, but I don't know if it is possible.
danpalmer
6. Again, I don't know how the framework deals with online training; I will look into this. 7. The data is all continuous and I have set linear activations on everything, because I would imagine that is how the outputs should vary with the inputs. 8. Like point 2, I found that with 2 hidden layers it was hitting my imposed limit a lot more and taking a LOT longer to run.
danpalmer
Lol. I should be in bed too, college tomorrow. I have added a bit of normalisation. It's not great, but all dataset values will be between 0 and 1 and predictions won't be that much higher. I got an increase in training speed of 18000% on one thing. I will keep on testing and try implementing some other measures, but for now it's a great result. Thank you very much for your help!
danpalmer
@danpalmer : (1), (7) It may be easy to rationalize in your head why you should use the linear activation function, but I doubt you'll get good results using it. Try sigmoid and you may be surprised! It should also sort out your issue with the inputs that have no fixed maximum (`f(100)` and `f(100,000)` are essentially the same!). (4) Are you sure you can't get any more datasets? Sometimes networks don't get sufficient training even with hundreds or thousands of datasets!
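A quick standalone check of that squashing behaviour, in plain Java (nothing Encog-specific, just the logistic sigmoid):

```java
// Logistic sigmoid: large raw inputs all saturate towards 1.0, which is
// why un-normalized values like 100 and 100,000 become indistinguishable.
public class SigmoidSquash {

    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    public static void main(String[] args) {
        System.out.println(sigmoid(0.5));      // ~0.622
        System.out.println(sigmoid(100.0));    // 1.0 to double precision
        System.out.println(sigmoid(100000.0)); // 1.0, same as sigmoid(100)
    }
}
```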
Zaid
I will stick with linear activation for now and see how it goes; I might experiment with sigmoid activation in the future. For the moment, our results are looking great. Predictions are exactly what we are expecting them to be (or within less than 1%). We are now normalising all of the data to between 0 and 1, and after doing this we saw it train 18000 times faster. I didn't think this would be the case, but it has really solved the issues and we will be fine until we get a considerable number of users. Thank you for your help! If you would like to have a go with the site, message me.
danpalmer
@danpalmer : Time is the fire in which we all burn... I doubt I'll have time to try it out. Thanks for the offer though!
Zaid