views:

337

answers:

7

I found out about this question recently.

Given a N number range Eg. [1 to 100], sort the numbers in digit order (i.e) For the numbers 1 to 100, the sorted output wound be 1 10 100 11 12 13 . . . 19 2 20 21..... 99

This is just like Radix Sort but just that the digits are sorted in reversed order to what would be done in a normal Radix Sort.

I tried to store all the digits in each number as a linked list for faster operation but it results in a large Space Complexity.

I need a working algorithm for the question. Thanks.

From all the answers, "Converting to Strings" is an option, But is there no other way this can be done??? Also An algorithm for Sorting Strings as mentioned above can also be given. Thanks again.

+11  A: 

Use any sorting algorithm you like, but compare the numbers as strings, not as numbers. This is basically lexiographic sorting of regular numbers. Here's an example gnome sort in C:

#include <stdlib.h>
#include <string.h>

void sort(int* array, int length) {
    int* iter = array;
    char buf1[12], buf2[12];
    while(iter++ < array+length) {
        if(iter == array || (strcmp(itoa(*iter, &buf1, 10), itoa(*(iter-1), &buf2, 10) >= 0) {
            iter++;
        } else {
            *iter ^= *(iter+1);
            *(iter+1) ^= *iter;
            *iter ^= *(iter+1);
            iter--;
        }
    }
}

Of course, this requires the non-standard itoa function to be present in stdlib.h. A more standard alternative would be to use sprintf, but that makes the code a little more cluttered. You'd possibly be better off converting the whole array to strings first, then sort, then convert it back.

Edit: For reference, the relevant bit here is strcmp(itoa(*iter, &buf1, 10), itoa(*(iter-1), &buf2, 10) >= 0, which replaces *iter >= *(iter-1).

You
+1 Thats it, lexicographical ordering.
AraK
Can anyone give an algorithm for it??? And also, is there no other way this can be done other than converting to strings???
Shyam
You could compare the numbers digit-by-digit as well, but that's pretty tedious.
You
That wa what I was trying to do all this time!! :-)
Shyam
`itoa` doesn't act the way you use it here.
Steve Jessop
Right, forgot the `base` argument. Fixed.
You
No, it doesn't return a new string. You have to pass in a buffer, and it writes to that buffer and returns it. So declare a couple of char arrays as local variables, then `strcmp(itoa(iter[0],bufone,10), itoa(iter[-1],buftwo,10))`.
Steve Jessop
True, was looking over http://www.cplusplus.com/reference/clibrary/cstdlib/itoa/ a bit to quickly. Fixed.
You
+2  A: 

i think if you convert numbers to string, you can use string comparison to sort them. you can use anny sorting alghorighm for it.

"1" < "10" < "100" < "11" ...

mohammad shamsi
is there no other way this can be done other than converting to strings???
Shyam
+4  A: 

I have a solution but not exactly an algorithm.. All you need to do is converts all the numbers to strings & sort them as strings..

Shady M. Najib
is there no other way this can be done other than converting to strings???
Shyam
I guess "You"'s solution above is the best you can get.. if not may be be you need to do the same using your own own way of storing your numbers, bun you shouldn't keep them as integers, may be you better keep them as integer arrays.. for eg 100 will be save in a int[3] as {1,0,0}All the answers hear seem reasonable (didn't have the time to read them fully.. but "overriding" comparison operator, if your PL supports that, would be more readable (I'm talking about your code not your algorithm here)
Shady M. Najib
So you'll still sort them as strings but you'd implement the string sorting (for eg Radix) yourself
Shady M. Najib
+1  A: 

Edit: I missed that it's a contiguous range. That being the case, all the answers which talk about sorting an array are wrong (including your idea stated in the question that it's like a radix sort), and True Soft's answer is right.

just like Radix Sort but just that the digits are sorted in reversed order

Well spotted :-) If you actually do it that way, funnily enough, it's called an MSD radix sort.

http://en.wikipedia.org/wiki/Radix_sort#Most_significant_digit_radix_sorts

You can implement one very simply, or with a lot of high technology and fanfare. In most programming languages, your particular example faces a slight difficulty. Extracting decimal digits from the natural storage format of an integer, isn't an especially fast operation. You can ignore this and see how long it ends up taking (recommended), or you can add yet more fanfare by converting all the numbers to decimal strings before sorting.

Of course you don't have to implement it as a radix sort: you could use a comparison sort algorithm with an appropriate comparator. For example in C, the following is suitable for use with qsort (unless I've messed it up):

int lex_compare(void *a, void *b) {
    char a_str[12];  // assuming 32bit int
    char b_str[12];
    sprintf(a_str, "%d", *(int*)a);
    sprintf(b_str, "%d", *(int*)b);
    return strcmp(a_str,b_str);
}

Not terribly efficient, since it does a lot of repeated work, but straightforward.

Steve Jessop
Extracting the Digits and suitably organizing them for Searching is a problem. Thats where I tried to Use Linked Lists to store each and every digit in a number and then use them for searching because instead of calling a function to get the digit for comparing each time, I thought that would be simpler... Can U suggest an efficient way for this to be done.
Shyam
I wouldn't use linked lists of digits - too many problems with memory use, extra indirections, and non-locality of reference. What programming language are you using? Just storing them as strings should be pretty good. But actually, extracting a specific digit is just a modulus and a division, so if you already know radix sort, by all means just look at that Wikipedia article and modify what you've done before very slightly. Performance will not be bad, because for each number, you only have to pick each digit out of it once in a radix sort.
Steve Jessop
I am using C... But the tought of Converting to Strings didnt even cross my mind!!
Shyam
yes, but for each and every time we need to perform the modulus and division to extract the character. I thought it wud add to the complexity since each of the numbers can be of varied length(no. of digits) and so decided to go wit linked lists..
Shyam
Well, assuming we're talking about `int` here rather than arbitrary-precision integers, to get the "first" digit of a number, you first get the order of magnitude (perhaps with a set of comparisons), in order to work out whether the first digit is the thousands digit, 100k digit, or whatever. Then you do the mod and division. That's not an enormous amount of work, and it certainly doesn't add to the complexity since it's all constant-time. Linked lists aren't usually fast. You could easily find that a single memory allocation takes longer than extracting all 12 digits one at a time.
Steve Jessop
Also bear in mind that as long as the basic algorithm is sound, a bit of arithmetic per item will rarely make your program run so slow that you'll notice. Radix sort is O(n), so the algorithm is sound. Write the simplest thing that works, test it, and if it's too slow change it. Personally, I would first write the code to use qsort as above, and only bother changing if I was unhappy with the resulting speed. Which may be quite likely in this case, but there's no point spending more time worrying about it in advance, than I would spend re-writing it if the first effort's no good.
Steve Jessop
I think U r correct.. I should stop thinking about accessing individual digits and using linked lists instead. Thank U Steve Jessop...
Shyam
@Steve: "you first get the order of magnitude (perhaps with a set of comparisons)" — `log10` would be a better choice there.
You
@Steve.. y r all the other solutions wrong??? they work fine for me!!
Shyam
@You: maybe, but it deals in doubles. This introduces a set of uncertainties that I don't want to deal with in an integer problem - for example it's legal for `log10(1000)` to return `2.99999999`. Why bother handling that kind of nonsense? @Shyam: well, perhaps "wrong" is overstating the case. But you said you're worried about memory usage, and True Soft provides a solution which outputs the correct results as a stream. If you don't need the result as an array, it therefore offers the possibility of O(1) memory use.
Steve Jessop
+3  A: 

Here is how you can do it with a recursive function (the code is in Java):

void doOperation(List<Integer> list, int prefix, int minimum, int maximum) {
    for (int i = 0; i <= 9; i++) {
        int newNumber = prefix * 10 + i;
        if (newNumber >= minimum && newNumber <= maximum) {
            list.add(newNumber);
        }
        if (newNumber > 0 && newNumber <= maximum) {
            doOperation(list, newNumber, minimum, maximum);
        }
    }
}

You call it like this:

List<Integer> numberList = new ArrayList<Integer>();
int min=1, max =100;
doOperation(numberList, 0, min, max);
System.out.println(numberList.toString());

EDIT:

I translated my code in C++ here:

#include <stdio.h> 

void doOperation(int list[], int &index, int prefix, int minimum, int maximum) {
    for (int i = 0; i <= 9; i++) {
        int newNumber = prefix * 10 + i;
        if (newNumber >= minimum && newNumber <= maximum) {
            list[index++] = newNumber;
        }
        if (newNumber > 0 && newNumber <= maximum) {
            doOperation(list, index, newNumber, minimum, maximum);
        }
    }
}

int main(void) { 
        int min=1, max =100;
        int* numberList = new int[max-min+1];
        int index = 0;
        doOperation(numberList, index, 0, min, max);
        printf("["); 
        for(int i=0; i<max-min+1; i++) {
                printf("%d ", numberList[i]); 
        }
        printf("]"); 
        return 0; 
}

Basically, the idea is: for each digit (0-9), I add it to the array if it is between minimum and maximum. Then, I call the same function with this digit as prefix. It does the same: for each digit, it adds it to the prefix (prefix * 10 + i) and if it is between the limits, it adds it to the array. It stops when newNumber is greater than maximum.

True Soft
+1 Good point. I missed that it's a contiguous range. Your way potentially uses a lot less memory in cases where you can replace "list.add" with "System.out.println", or any other op which means you don't need the whole list at once.
Steve Jessop
Yes, there is no initial list of values. To write the algorithm in C, the OP could replace the List with an array of ints, and add the current index as a parameter to the function.
True Soft
A better initial value for `prefix` might be `min/10` rather than `0`.
J.F. Sebastian
True Soft... I am sorry but I am not able to comprehend the concept behind this. I would be very thankful if u can clearly explain the logic behind this. Sorry and Thanks in Advance...
Shyam
@J.F. Sebastian: It's not working if I set `prefix` to `min/10`, because, for `[40, 100]` the first number in array would be 100, which is done in the second step of the loop (at `i`=1); if the initial value of `prefix` would be 4, it would look through 4, 40, 400, then 5, 50, 500... and it will never get to 100.
True Soft
Thanks a lot True Soft.... U have given a very efficient Algorithm!!
Shyam
+1  A: 

If you do not want to convert them to strings, but have enough space to store an extra copy of the list I would store the largest power of ten less than the element in the copy. This is probably easiest to do with a loop. Now call your original array x and the powers of ten y.

int findPower(int x) {
   int y = 1;
   while (y * 10 < x) {
      y = y * 10;
   }
   return y;
}

You could also compute them directly

y = exp10(floor(log10(x)));

but I suspect that the iteration may be faster than the conversions to and from floating point.

In order to compare the ith and jth elements

bool compare(int i, int j) {
  if (y[i] < y[j]) {
    int ti = x[i] * (y[j] / y[i]);
    if (ti == x[j]) {
      return (y[i] < y[j]);  // the compiler will optimize this
    } else {
      return (ti < x[j]);
    }
  } else if (y[i] > y[j]) {
    int tj = x[j] * (y[i] / y[j]);
    if (x[i] == tj) {
      return (y[i] < y[j]);  // the compiler will optimize this
    } else {
      return (x[i] < tj);
    }
  } else {
     return (x[i] < x[j];
  }
}

What is being done here is we are multiplying the smaller number by the appropriate power of ten to make the two numbers have an equal number of digits, then comparing them. if the two modified numbers are equal, then compare the digit lengths.

If you do not have the space to store the y arrays you can compute them on each comparison.

In general, you are likely better off using the preoptimized digit conversion routines.

deinst
+2  A: 

Optimize the way you are storing the numbers: use a binary-coded decimal (BCD) type that gives simple access to a specific digit. Then you can use your current algorithm, which Steve Jessop correctly identified as most significant digit radix sort.

I tried to store all the digits in each number as a linked list for faster operation but it results in a large Space Complexity.

Storing each digit in a linked list wastes space in two different ways:

  1. A digit (0-9) only requires 4 bits of memory to store, but you are probably using anywhere from 8 to 64 bits. A char or short type takes 8 bits, and an int can take up to 64 bits. That's using 2X to 16X more memory than the optimal solution!
  2. Linked lists add additional unneeded memory overhead. For each digit, you need an additional 32 to 64 bits to store the memory address of the next link. Again, this increases the memory required per digit by 8X to 16X.

A more memory-efficient solution stores BCD digits contiguously in memory:

  1. BCD only uses 4 bits per digit.
  2. Store the digits in a contiguous memory block, like an array. This eliminates the need to store memory addresses. You don't need linked lists' ability to easily insert/delete from the middle. If you need the ability to grow the numbers to an unknown length, there are other abstract data types that allow that with much less overhead. For example, a vector.

One option, if other operations like addition/multiplication are not important, is to allocate enough memory to store each BCD digit plus one BCD terminator. The BCD terminator can be any combination of 4 bits that is not used to represent a BCD digit (like binary 1111). Storing this way will make other operations like addition and multiplication trickier, though.

Note this is very similar to the idea of converting to strings and lexicographically sorting those strings. Integers are internally stored as binary (base 2) in the computer. Storing in BCD is more like base 10 (base 16, actually, but 6 combinations are ignored), and strings are like base 256. Strings will use about twice as much memory, but there are already efficient functions written to sort strings. BCD's will probably require developing a custom BCD type for your needs.

Leftium
Wonsungi.. Thank U very much identifying the drawbacks in my idea. I ll probably use strings to solve the problem...
Shyam