I've had an idea for a non-linear regression algorithm that I haven't seen before:
We fit a simple parametric function, such as a radial basis function, to the data using gradient descent. We then compute the residual and fit another function to it, repeating the process to reduce the error and build up a collection of superimposed functions. (I'm assuming the search can be persuaded to find the functions that fit the most points first.)
As presented, this algorithm would overfit. I think there are a few ways to overcome that, but probably the most obvious is to limit the number of functions that are fitted.
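To make the idea concrete, here is a minimal sketch of what I have in mind, assuming a Gaussian RBF as the simple parametric function, plain gradient descent on its three parameters (centre, width, amplitude), and a cap on the number of terms as the overfitting control. The names (`fit_one_rbf`, `residual_fit`, `max_terms`) and the toy data are just illustrative, not a settled design:

```python
import numpy as np

def rbf(x, c, w, a):
    """Gaussian radial basis function with centre c, width w, amplitude a."""
    return a * np.exp(-((x - c) ** 2) / (2.0 * w ** 2))

def fit_one_rbf(x, r, lr=0.05, steps=2000):
    """Fit a single RBF to the current residual r by gradient descent on squared error."""
    # Initialise at the point of largest residual, so the search tends to
    # pick up the dominant unexplained structure first.
    i = np.argmax(np.abs(r))
    c, w, a = x[i], np.std(x) / 4 + 1e-6, r[i]
    for _ in range(steps):
        g = np.exp(-((x - c) ** 2) / (2.0 * w ** 2))
        err = a * g - r
        # Gradients of 0.5 * mean(err^2) with respect to a, c, w
        da = np.mean(err * g)
        dc = np.mean(err * a * g * (x - c) / w ** 2)
        dw = np.mean(err * a * g * (x - c) ** 2 / w ** 3)
        a -= lr * da
        c -= lr * dc
        w = max(w - lr * dw, 1e-2)  # keep the width strictly positive
    return c, w, a

def residual_fit(x, y, max_terms=5):
    """Greedy stagewise fit: repeatedly fit an RBF to the current residual."""
    terms = []
    r = y.copy()
    for _ in range(max_terms):           # cap on the number of functions
        c, w, a = fit_one_rbf(x, r)
        terms.append((c, w, a))
        r = r - rbf(x, c, w, a)          # update the residual
    return terms

def predict(x, terms):
    """Sum of the superimposed fitted functions."""
    return sum(rbf(x, c, w, a) for c, w, a in terms)

# Toy usage on noisy 1-D data
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)
y = np.sin(2 * x) + 0.1 * rng.standard_normal(x.size)
terms = residual_fit(x, y, max_terms=5)
print("training MSE:", np.mean((y - predict(x, terms)) ** 2))
```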
I think it should be faster than a neural or RBF network because it doesn't have to tune so many parameters at once, and there is no network architecture to choose. It should be more accurate than a decision-tree algorithm such as M5 because it can follow a continuous curve more closely and doesn't have to choose attributes on which to split.
Has it been tried before? If so, why wasn't it successful?