A lot of this is going to depend on what type of data you have. You mention sorting, so I take it the elements are comparable. With sets of size m
and n
, this will take O(m lg m + n lg n)
to sort, and that will dominate. (Asymptotically, it won't matter if you do binary search or walk both lists. Walking both lists should be O( m + n)
.) Of course, if you are using data with a better sort algorithm available, such as integers with radix-sort, you should be able to get down to O( m + n)
.
Using sets (as others are suggesting) implicitly suggests using hashing, which will definitely make your problem easier. If you hash all the elements in A ( O(m)
) and store all the hashes in a hash set in memory, then hash B ( O(n)
) and detect where collisions may occur in the hash set. This becomes a matter for optimization: you have to evaluate a classic speed-memory trade-off. The larger your hash set, the quicker the collision-checks will be. This will run in O( m + n )
.
It's worth noting that you can prove that any algorithm that does what you ask will run in at least m + n
time, since all the inputs need to be looked at.