The following assumes the sets are stored in a sorted container (as they are in std::set).
There's a common algorithm for merging two ordered lists to produce a third. The idea is that when you look at the heads of the two lists, you can determine which is the lower, extract that, and add it to the tail of the output, then repeat.
There are variants which detect the case where the two heads are equal, and treat this specially. Set intersections and unions are examples of this.
With an asymmetric set difference, the key point is that for A-B, when you extract the head of B, you discard it. When you extract the head of A, you add it to the output unless the head of B is equal, in which case you extract that too and discard both.
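For example, here's a minimal sketch of that loop for A-B over two sorted ranges, assuming no duplicates within each input (the ordered_difference name and the iterator-style signature are just for illustration):

```
#include <algorithm>

// A - B over two sorted ranges: look at the two heads, advance whichever is
// lower, keep A-only items, discard B-only items, discard matched pairs.
template <class InIt1, class InIt2, class OutIt>
OutIt ordered_difference(InIt1 a, InIt1 a_end, InIt2 b, InIt2 b_end, OutIt out)
{
    while (a != a_end && b != b_end)
    {
        if (*a < *b)      { *out++ = *a++; }  // head of A has no match - keep it
        else if (*b < *a) { ++b; }            // head of B has no match - discard it
        else              { ++a; ++b; }       // equal heads - discard both
    }
    return std::copy(a, a_end, out);          // leftover A items can't have matches
}
```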
Although this approach was designed for sequential-access data structures (and tape storage etc), it's sometimes very useful to do the same thing with a random-access data structure, so long as it's reasonably efficient to access it sequentially anyway. And you don't necessarily have to extract things for real - you can copy and step instead.
The key point is that you step through the inputs sequentially, always looking at the lowest remaining value next, so that (if the inputs have no duplicates) you encounter matched items at the same time. You therefore always know whether your next lowest value to handle is an item from A with no match in B, an item from B with no match in A, or an item that's equal in both A and B.
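This is exactly what std::set_difference in &lt;algorithm&gt; implements for sorted ranges. For instance (the values here are just for illustration):

```
#include <algorithm>
#include <iterator>
#include <set>

int main()
{
    std::set<int> a, b, result;
    int av[] = {1, 2, 3, 5, 8};  a.insert(av, av + 5);
    int bv[] = {2, 3, 4};        b.insert(bv, bv + 3);

    // Walks both sorted sets in one pass, copying the A-only items into result.
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::inserter(result, result.end()));
    // result now contains 1, 5, 8
}
```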
More generally, the algorithm for the set difference depends on the representation of the set. For example, if the set is represented as a bit-vector, the above would be overcomplicated and slow - you'd just loop through the vectors doing bitwise operations. If the set is represented as a hashtable (as in the tr1 unordered_set), the above won't work at all, as it requires ordered inputs.
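For the bit-vector case, the whole thing collapses to a loop of bitwise AND-NOTs - something along these lines, assuming both vectors use the same word/bit layout (the names are made up for illustration):

```
#include <cstddef>
#include <cstdint>
#include <vector>

// A - B where bit j of word i represents element i * 64 + j.
// A shorter B is treated as zero-padded.
std::vector<std::uint64_t> bitset_difference(const std::vector<std::uint64_t>& a,
                                             const std::vector<std::uint64_t>& b)
{
    std::vector<std::uint64_t> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = a[i] & ~(i < b.size() ? b[i] : 0);  // keep bits set in A but not in B
    return out;
}
```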
If you have your own binary tree code that you're using for the sets, one good option is to convert both trees into linked lists, work on the lists, then convert the resulting list to a perfectly balanced tree. The linked-list set-difference is very simple, and the two conversions are re-usable for other similar operations.
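As a sketch of the linked-list difference, assuming a node type along these lines (illustrative, not from any particular library), with the list threaded through the right pointers and the surviving A nodes reused for the result:

```
struct Node { int value; Node* left; Node* right; };  // right doubles as the list's "next" pointer

// Destructive A - B over two sorted lists: surviving A nodes are relinked into
// the result, matched A nodes are deleted, and B is left untouched.
Node* list_difference(Node* a, Node* b)
{
    Node* head = 0;
    Node** tail = &head;
    while (a && b)
    {
        if (a->value < b->value)        // in A only - keep it
        {
            *tail = a;
            tail = &a->right;
            a = a->right;
        }
        else if (b->value < a->value)   // in B only - skip it
        {
            b = b->right;
        }
        else                            // in both - discard the A node too
        {
            Node* dead = a;
            a = a->right;
            b = b->right;
            delete dead;
        }
    }
    *tail = a;                          // any remaining A items have no possible match
    return head;
}
```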
EDIT
On the complexity: these ordered merge-like algorithms are O(n), provided you can do the in-order traversals in O(n). Converting to a list and back is also O(n), as each of the three steps - tree-to-list, set-difference and list-to-tree - is O(n).
Tree-to-list basically does a depth-first traversal, deconstructing the tree as it goes. There's a trick for making this iterative, storing the "stack" in part-handled nodes - changing a left-child pointer into a parent-pointer just before you step to the left child. This is a good idea if the tree may be large and unbalanced.
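Here's a sketch of that trick, reusing the Node struct from the list-difference sketch above - the left pointers temporarily hold the chain of pending ancestors, so no explicit stack or recursion is needed:

```
// Destructively flattens a BST into a sorted singly-linked list (threaded
// through the right pointers), using the left pointers as a makeshift stack.
Node* tree_to_list(Node* root)
{
    Node* head = 0;
    Node** tail = &head;   // where the next emitted node gets linked
    Node* cur = root;
    Node* parent = 0;      // top of the ancestor "stack", threaded through left pointers

    while (cur)
    {
        if (cur->left)
        {
            // Step to the left child, turning the left-child pointer into a parent pointer.
            Node* child = cur->left;
            cur->left = parent;
            parent = cur;
            cur = child;
        }
        else
        {
            // Left subtree done (or absent): emit cur, then deal with its right subtree.
            Node* right = cur->right;
            cur->right = 0;
            *tail = cur;
            tail = &cur->right;

            if (right)
            {
                cur = right;              // the pending ancestor is unchanged
            }
            else
            {
                // This subtree is finished: pop an ancestor off the threaded stack.
                cur = parent;
                if (cur)
                {
                    parent = cur->left;   // the ancestor saved when we stepped left
                    cur->left = 0;        // its left subtree has now been fully emitted
                }
            }
        }
    }
    return head;
}
```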
Converting a list to a tree basically involves a depth-first traversal of an imaginary tree (based on the size, known from the start) building it for real as you go. If a tree has 5 nodes, for instance, you can say that the root will be node 3. You recurse to build a two-node left subtree, then grab the next item from the list for that root, then recurse to build a two-node right subtree.
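Again reusing the same Node struct, a sketch of that rebuild - the list is consumed left to right while the recursion lays out the balanced shape, with subtree sizes decided from n:

```
// Builds a perfectly balanced BST from the first n nodes of a sorted list
// (threaded through right pointers), advancing head past the consumed nodes.
Node* list_to_tree(Node*& head, int n)
{
    if (n == 0) return 0;
    Node* left = list_to_tree(head, n / 2);           // build the left subtree first
    Node* root = head;                                // next list item becomes this subtree's root
    head = head->right;                               // step past it
    root->left = left;
    root->right = list_to_tree(head, n - n / 2 - 1);  // then the right subtree
    return root;
}
```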
The list-to-tree conversion shouldn't need to be implemented iteratively - recursive is fine as the result is always perfectly balanced. If you can't handle the log n recursion depth, you almost certainly can't handle the full tree anyway.