views:

188

answers:

3

hi, i googled around and see lots of discussion about radix sort on binary string, but they are all with same lenght, how aobut binary string with arbitrary lenght?

say i have {"001", "10101", "011010", "10", "111"}, how do i do radix sort on them ? Thanks!

+2  A: 

Find the max length and pad them all to that length. Should still perform well provided there's some upper bound on the length of the longest string.

Kyle Butt
In principle the same thing, but... convert the strings to integers?
Steve314
+2  A: 

You could pad them all to be the same length, but there's no real reason to run a sorting algorithm to determine that a length 5 number in binary is larger than a length 2 one. You would likely get better performance by grouping the numbers by length and running your radix sort within each group. Of course, that's dependent upon how you group them and then on how you sort your groups.

An example of how you might do this would be to run through all the items once and throw them all into a hash table (length --> numbers of that length). This takes linear time, and then let's say nlogn time to access them in order. A radix sort runs in O(nk) time where n is the number of items and k is their average length. If you've got a large k, then the difference between O(nk) and O(nlogn) would be acceptable.

karenc
Not bad but...Wouldn't re-grouping them require a pre-sort operation to sort all of the strings into the proper group?
FrustratedWithFormsDesigner
Yes, and for a small k it may not be worth it. Radix sort is a "linear" time sort algorithm if you assume k is a constant or at least small. But for a big k the pre-sort would be worthwhile. And there's probably a better way to do the pre-sort than what I suggested above, but that was the first reasonable way that came to mind.
karenc
A: 

If creating a ton of new string instances leaves a nasty taste, write the comparison yourself.

Compare what the lengths of the strings would be without the leading 0's (ie. find the firstIndexOf("1")); the longer string is larger.
If both are the same length, just continue comparing them, character-by-character, until you find two characters that differ - the string with the "1" is the larger.

BlueRaja - Danny Pflughoeft