Sounds to me like you are struggling with backprop. What you describe above doesn't quite match how I understand it to work, and your description is a bit ambiguous.
You calculate the output error term to backpropagate as the difference between the actual value and the prediction, multiplied by the derivative of the transfer function. It is that error value which you then propagate backwards. The derivative of a sigmoid is calculated quite simply as y(1 - y), where y is your output value. There are lots of proofs of that available on the web.
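For concreteness, here's a minimal sketch in Python (the names are mine, not from your code):

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(y):
    # y is the sigmoid's *output*, not its input, so this really is just y(1 - y)
    return y * (1.0 - y)

def output_error(target, prediction):
    # Delta rule at the output layer: the raw error scaled by the slope of
    # the transfer function at the prediction
    return (target - prediction) * sigmoid_derivative(prediction)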
For a node in the inner (hidden) layer, you multiply each output error by the weight connecting the two nodes and sum all those products; that sum is the total error propagated back from the outer layer to the inner node. That total is then multiplied by the derivative of the transfer function evaluated at the inner node's own output value. Here's some pseudocode:
total_error = sum(output_errors * weights)
node_error = sigmoid_derivative(node_output) * total_error
This error is then propagated backwards in the same manner, layer by layer, right back to the input layer weights.
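In runnable form, one hidden node's error might look like this (a sketch, reusing sigmoid_derivative from above):

def hidden_error(node_output, downstream_errors, weights_from_node):
    # Each downstream error is weighted by the strength of its connection
    total_error = sum(e * w for e, w in zip(downstream_errors, weights_from_node))
    # Scale by the transfer function's derivative at this node's own output
    return sigmoid_derivative(node_output) * total_error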
The weights are adjusted using these error terms and the output values of the nodes:
weight_change = outer_error * inner_output_value
The learning rate is important because a weight change is calculated for every pattern/row/observation in the input data. You want to moderate the change for each row so that the weights aren't unduly swayed by any single row and so that every row has an effect. The learning rate gives you that: you scale the weight change by multiplying by it:
weight_change = outer_error * inner_output_value * learning_rate
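To see what that moderation looks like over the data, here's one pass for a single output node (forward_pass, training_data and weights are stand-ins I've made up, not from your code):

for row_inputs, row_target in training_data:
    prediction = forward_pass(row_inputs)  # forward_pass is a hypothetical helper
    error = output_error(row_target, prediction)
    for i, x in enumerate(row_inputs):
        # A small, rate-moderated nudge per row, so no single row dominates
        weights[i] += error * x * learning_rate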
It is also normal to remember each weight's change between epochs (iterations) and to add a fraction of the previous change to the new one. The fraction added is called momentum, and it is supposed to speed you up through regions of the error surface where there is not much change and slow you down where there is detail:
weight_change = (outer_error * inner_output_value * learning_rate) + (last_change * momentum)
There are algorithms for adjusting the learning rate and momentum as the training proceeds.
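I won't cover those here, but as one trivial illustration (my own, not a named algorithm), you might decay the rate a little each epoch:

def decayed_learning_rate(initial_rate, epoch, decay=0.99):
    # Shrink the rate slightly every epoch; the 0.99 default is just for illustration
    return initial_rate * (decay ** epoch)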
The weight is then updated by adding the change:
new_weight = old_weight + weight_change
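Pulling those last few formulas together into one sketch (the names are mine; last_change is whatever you stored from the previous update of this weight):

def update_weight(old_weight, outer_error, inner_output_value,
                  learning_rate, last_change, momentum):
    # Gradient step plus a fraction of the previous change (momentum)
    weight_change = (outer_error * inner_output_value * learning_rate
                     + last_change * momentum)
    new_weight = old_weight + weight_change
    # Return the change too, so it can be remembered for the next epoch
    return new_weight, weight_change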
I had a look through your code, but rather than correct it and post that, I thought it was better to describe backprop for you so you can code it up yourself. If you understand it, you'll be able to tune it for your circumstances too.
HTH and good luck.