You already have your basic list of tests: the ones you just gave us. At a bare minimum, you should test that every functional requirement has been met (for example, the four points you listed in the question).
On top of that come the edge cases: empty lists (on one side and on both), identical lists and so forth.
The simplest way to start is to add the following:
- empty list on both sides.
- identical lists.
- empty left list with a one-element right list to add.
- one-element left list with an empty right list to remove.
- the previous two tests, but with a five-element list on the non-empty side.
- replacing one element in a one-element left list.
- replacing one element in a five-element left list.
- replacing three elements in a five-element left list.
- checking that no replacements are done when the version tags are identical.
and then add more as you strike individual problems.
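As a sketch of how that list might look once automated, here it is as a Python `unittest` suite. The function under test, `sync_plan`, is hypothetical: it assumes each list holds (name, version tag) pairs and reports which names to add, remove or replace. Substitute your real function and its actual return shape.

```python
import unittest


# HYPOTHETICAL stand-in for the function being tested. Assumes each list holds
# (name, version_tag) pairs; "left" is what you have, "right" is what you want.
def sync_plan(left, right):
    """Return (to_add, to_remove, to_replace) as sorted lists of names."""
    left_map = dict(left)
    right_map = dict(right)
    to_add = sorted(n for n in right_map if n not in left_map)
    to_remove = sorted(n for n in left_map if n not in right_map)
    # Replace only when a name is on both sides with a different version tag.
    to_replace = sorted(
        n for n in left_map if n in right_map and left_map[n] != right_map[n]
    )
    return to_add, to_remove, to_replace


FIVE = [("a", "1"), ("b", "1"), ("c", "1"), ("d", "1"), ("e", "1")]
FIVE_NAMES = ["a", "b", "c", "d", "e"]


class SyncPlanTests(unittest.TestCase):
    def check(self, left, right, add=(), remove=(), replace=()):
        """Assert the exact add/remove/replace plan for one left/right pair."""
        self.assertEqual(
            sync_plan(left, right), (list(add), list(remove), list(replace))
        )

    def test_both_empty(self):
        self.check([], [])

    def test_identical_lists(self):
        self.check(FIVE, FIVE)

    def test_add_one_to_empty_left(self):
        self.check([], [("a", "1")], add=["a"])

    def test_remove_one_with_empty_right(self):
        self.check([("a", "1")], [], remove=["a"])

    def test_add_five_to_empty_left(self):
        self.check([], FIVE, add=FIVE_NAMES)

    def test_remove_five_with_empty_right(self):
        self.check(FIVE, [], remove=FIVE_NAMES)

    def test_replace_one_in_one_element_left(self):
        self.check([("a", "1")], [("a", "2")], replace=["a"])

    def test_replace_one_in_five_element_left(self):
        self.check(FIVE, [("a", "2")] + FIVE[1:], replace=["a"])

    def test_replace_three_in_five_element_left(self):
        right = [("a", "2"), ("b", "2"), ("c", "2")] + FIVE[3:]
        self.check(FIVE, right, replace=["a", "b", "c"])

    def test_no_replace_on_identical_tags(self):
        self.check([("a", "1")], [("a", "1")])
```

Saved as, say, `test_sync.py` (a hypothetical file name), the whole suite runs with `python -m unittest test_sync`. Each new bug then becomes one more `test_...` method calling `check`.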
And I cannot stress this enough: automate your testing! You will find that testing is a lot easier when you can just press a button and look over the results. Every time you hit a bug, add a test that would have caught it to the suite above, then press the button to verify the fix.
We have our testing down to a fine art. One command kicks off an entire process that blows away the databases, reloads them with known data, runs our tests, compares the output against the last successful run and so forth.
If we had to do all that manually whenever we made a change, we'd soon give up on the whole idea. Because everything is automated, testing is a breeze.
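The reset/load/test/compare cycle can be sketched in a few lines. This is an illustration under assumptions, not our actual setup: an in-memory SQLite database stands in for the real databases, `KNOWN_DATA` is a hypothetical fixture, `run_checks` is a placeholder for the real tests, and a "golden" text file holds the output of the last successful run.

```python
import sqlite3
from pathlib import Path

# HYPOTHETICAL known data set, loaded fresh on every run.
KNOWN_DATA = [("a", "1"), ("b", "2"), ("c", "3")]


def fresh_database():
    """Discard all previous state by building a brand-new in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (name TEXT PRIMARY KEY, tag TEXT)")
    conn.executemany("INSERT INTO items VALUES (?, ?)", KNOWN_DATA)
    conn.commit()
    return conn


def run_checks(conn):
    """Placeholder for the real tests: must produce deterministic text output."""
    rows = conn.execute("SELECT name, tag FROM items ORDER BY name").fetchall()
    return "\n".join(f"{name} {tag}" for name, tag in rows) + "\n"


def regression_run(golden: Path) -> bool:
    """Reset, load, test, then compare with the last known-good output."""
    output = run_checks(fresh_database())
    if not golden.exists():
        # First run: record the baseline (review it by hand before trusting it).
        golden.write_text(output)
        return True
    return output == golden.read_text()
```

Calling `regression_run(Path("expected_output.txt"))` from a single script or build target turns the whole cycle into one button press. Update the golden file only when a change in output is intentional and has been reviewed.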