The first part of your algorithm is counting the characters - this is just generating keys to sort by.
If you know that you are only using alphabetical characers [A-Za-z ]* then you could optimize your algorithm by reducing the number of buckets used, but this is only a minor tweak.
The second part is just a stable sort - there are many ways of doing this - the wikipedia page on sorting gives a good summary. If you are only interested in the character that occurs least, then the ("Phase 2") method you describe is probably as efficient as you can get.
The only other way I can think of improving this is if you can split your letters in to a fixed number of buckets (say, 16) evenly across the character range, and then recursing on each bucket. Any buckets with no characters can be thrown away, saving some time in the scanning/sorting phase. Similarly if a bucket has one character then it is done. You also want to make sure that you only split a bucket into 16 more when you know that there is more than one different character in it.
Using the word test as an example (assuming 4 buckets and only lower case characters:
- generate 4 buckets (A-G, H-M, N-T, U-Z)
- split the word test:
- A-G: e,
- H-M:
- N-T: tst
- U-Z:
- recurse to the other buckets - (A-G has one character - this must be the least so we can stop
- If this wasn't the case (As for the word "testes"), we can see H-M and U-Z are empty, so we would only need to check N-T (which would contain tsts).
- We create 4 buckets (N-O, P-Q, R-S and T).
- Split up the letters
- etc.
The advantage of this method is that we have not had to scan every letter. If the character range is the same size, then both these methods are at best O(n) where n is the length of the string (this is inevitable since we always have to look at every character), although constructing the lists of characters in my example may make the algorithm as bad as O(n^2). However as the character range increases, especially for short strings, using sub buckets will greatly increase the performance. For a unicode string you could use a hybrid approach - for example seperating all the non-ascii characters in the first phase, and using your simpler method for the ascii portion.