The first thing to realize is that, as far as the numbers go, there is no difference between soldiers lost and soldiers left behind. So we can reduce the castle properties to soldiers lost and required.
The second thing to realize is that if you go down a branch of the tree, you must complete the whole branch before returning. This allows us to reduce the entire branch to a single "mega castle" with aggregate soldiers required and lost.
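To make the two reductions concrete, here is a minimal sketch. It assumes a castle is represented as a `(required, lost)` tuple; the helper names `simplify` and `combine` are mine, not part of the puzzle.

```python
def simplify(required, killed, left_behind):
    """Fold 'killed' and 'left behind' into one 'lost' figure (assumed castle format)."""
    return required, killed + left_behind


def combine(first, second):
    """Aggregate castle equivalent to attacking `first` and then `second`."""
    r1, l1 = first
    r2, l2 = second
    # We need r1 to attack the first castle, and after losing l1 there
    # we must still have r2 left for the second one.
    return max(r1, l1 + r2), l1 + l2


a = simplify(10, 1, 0)   # needs 10, loses 1
b = simplify(6, 2, 3)    # needs 6, loses 5 in total
print(combine(a, b))     # attack a, then b
print(combine(b, a))     # attack b, then a
```

The lost total is the same in both orders, but the required total is not, which is why the order of descent matters.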
So, assuming we can compute the costs of branches, we're left with two problems: where to start, and how to choose which branch to descend first. I'm just going to brute force the start position, but it might be possible to do better. Choosing which branch to descend is a bit harder. The number of soldiers lost is trivial (it's the same sum whatever the order), but the number required depends on the order. With n branches there are n! possible orders, so we can't just try them all.
Instead of thinking about how many soldiers are lost/required at each castle going forwards, I'm going to go backwards. Start with 0 soldiers at the end, and add back the soldiers lost each time you (un)do a castle, ensuring that you end up with at least the required amount for that attack. There are two cases: either there is a castle whose requirement we meet, or there is not. If there is, (un)do that castle (this is optimal, because we used the minimum number of soldiers). If there isn't, add an additional soldier and try again (this is optimal, because we must add a soldier to continue). Now it should become obvious: going backwards, we want to (un)do the castles whose requirement is closest to the number lost first. Just sort by (required minus lost): ascending when going backwards, which means descending in the actual attack order.
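As a sanity check on the ordering rule, here is a small sketch that compares the sorted order against every permutation. The `(required, lost)` tuples and the function names `army_needed` and `best_order` are assumptions for illustration only.

```python
from itertools import permutations


def army_needed(order):
    """Smallest starting army that can attack the castles in the given order."""
    need = 0
    lost_so_far = 0
    for required, lost in order:
        # Before this attack we have already lost `lost_so_far` soldiers.
        need = max(need, lost_so_far + required)
        lost_so_far += lost
    return need


def best_order(castles):
    """Attack order: descending (required minus lost)."""
    return sorted(castles, key=lambda c: c[0] - c[1], reverse=True)


castles = [(3, 3), (6, 5), (10, 1)]
greedy = army_needed(best_order(castles))
brute = min(army_needed(p) for p in permutations(castles))
print(greedy, brute)  # the two values should agree
```

The brute force is only feasible for a handful of castles, but it is a cheap way to convince yourself that the sort gives the same answer.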
So the final algorithm looks like this:
- Brute force the starting point
- Recursively reduce branches into aggregate castles (memoize this result, for the other starting points)
- Visit branches in descending (required minus lost) order.
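The steps above can be sketched roughly as follows. This is a sketch under assumptions, not a reference solution: I assume the castles are given as a list of `(required, lost)` tuples (with soldiers left behind already folded into `lost`), that the map is a tree given as an edge list of index pairs, and that the starting castle itself must be attacked first. The names `min_army` and `branch` are made up for this example.

```python
from functools import lru_cache


def min_army(castles, edges):
    """castles: list of (required, lost); edges: list of (u, v) pairs forming a tree."""
    neighbours = [[] for _ in castles]
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)

    @lru_cache(maxsize=None)
    def branch(node, parent):
        # Aggregate (required, lost) for conquering `node` and everything
        # reachable from it without going back through `parent`.
        subs = [branch(n, node) for n in neighbours[node] if n != parent]
        # Descend into sub-branches in descending (required minus lost) order.
        subs.sort(key=lambda c: c[0] - c[1], reverse=True)
        required, lost = castles[node]
        for sub_required, sub_lost in subs:
            required = max(required, lost + sub_required)
            lost += sub_lost
        return required, lost

    # Brute force the starting castle; -1 means "no parent".
    return min(branch(start, -1)[0] for start in range(len(castles)))


# Three castles in a line: 0 - 1 - 2
print(min_army([(3, 3), (10, 1), (6, 5)], [(0, 1), (1, 2)]))
```

The memoization is keyed on the directed edge `(node, parent)`, so a branch computed for one starting point is reused for the others. The recursion is fine for small inputs; a deep tree would need an explicit stack or a raised recursion limit.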
The running time is O(n * c^2 * lg(c)), where n is the number of castles and c is the maximum connectivity of any single castle. This holds because there are at most n*c 'branches', and a node takes at most c*lg(c) time to evaluate after its branches have been evaluated. [The branches and nodes are computed at most once thanks to memoization]
I think it's possible to do better, but I'm not sure how.