views:

255

answers:

7

Given two numbers a, b such that 1 <= a , b <= 10000000000 (10^10). My problem is to check whether the digits in them are permutation of each other or not. What is the fastest way of doing it? I was thinks of using hashing but unable to find any suitable hash function. Any suggestions?

For e.g - 123 is a valid permutation of 312

Also I don't want to sort the digits in the numbers.

+4  A: 

Is it homework?

Calculate number of appearances of each digit and compare them, if they are same then one number can be converted to other using permutation.

Artyom
no it is not .... i am a software developer and trying to learn new things.
Ravi Gupta
Artyom: sounds more like an appetizer problem of a programming contest like ICPC.
Joey
+17  A: 

If you mean the characters of the numbers (such as 1927 and 9721), one approach is to simply sprintf them to two buffers, sort the characters in the buffers, then see if the strings are equal.


Another alternative (that matches your new non-sorting requirement) would be to set up a ten-element array, with all elements initially set to zero, then process each digit in the first number, incrementing the relevant element.

Then do the same with the second number but decrementing.

If, at the end, it's still all zeros, the numbers were a permutation of each other.

This is efficient in that it's an O(n) algorithm where n is the number of digits in the two numbers.

In C, the following complete program illustrates how this can be done:

#include <stdio.h>
#include <stdlib.h>

#define FALSE (1==0)
#define TRUE  (1==1)

int hasSameDigits (long num1, long num2) {
    int digits[10];
    int i;

    for (i = 0; i < 10; i++)      // Init all counts to zero.
        digits[i] = 0;

    while (num1 != 0) {           // Process all digits.
        digits[num1%10]++;        // Increment for least significant digit.
        num1 /= 10;               // Get next digit in sequence.
    }

    while (num2 != 0) {           // Same for num2 except decrement.
        digits[num2%10]--;
        num2 /= 10;
    }

    for (i = 0; i < 10; i++)
        if (digits[i] != 0)       // Any count different, not a permutation.
            return FALSE;

    return TRUE;                  // All count identical, was a permutation.
}

 

int main (int c, char *v[]) {
    long v1, v2;

    if (c != 3) {
        printf ("Usage: %s <number1> <number2>\n", v[0]);
        return 1;
    }

    v1 = atol (v[1]);
    v2 = atol (v[2]);
    if (hasSameDigits (v1, v2)) {
        printf ("%d and %d are permutations\n", v1, v2);
    } else {
        printf ("%d and %d are not permutations\n", v1, v2);
    }

    return 0;
}

Simply pass it two (positive) numbers and, assuming they fit in a long, it'll tell you whether they have the same digit counts.

paxdiablo
is it possible to do it without doing any type of sorting?
Ravi Gupta
Yes, @Ravi, see the update.
paxdiablo
can you also suggest something on the line of hashing as well?
Ravi Gupta
I'd say this is similar to hashing already. Each digit is stored into a separate bucket based on its hash value (using a trivial hash function).
jalf
@Ravi, there's no need for a hash. As already stated by jalf, it's a simple but perfect hashing function already (of sorts), given the limited number of buckets required. Any attempt to try and add more complicated code will more than likely slow it down, other than obvious optimisations of course (such as `if (num1 == num2) return TRUE;` at the start).
paxdiablo
@paxdiablo, hey, you fooled Ravi by doing radix sort!
Pavel Shved
@Ravi: This is both sorting and hashing. When you index an array by a function of a subset of the bits of a key, that's hashing. When the function preserves order, that's a form of index or radix sorting. The fastest way to sort a set of integers O(N) is to use them as indices into a table.
Mike Dunlavey
@paxdiablo "This is efficient in that it's an O(n) algorithm where n is the number of digits in the two numbers." Or, to look at it another way, it's O(log n) on the numbers themselves, right? Because the number of digits in n is (floor(log10(n)) + 1)... :)
Cowan
+1  A: 

Create an array:

int digitOccurances[2][10];

In digitOccruances[X][N] store the number of times that the digit N appears in the number X. So if you were comparing 8675309 to 9568733, the array would end up looking like:

{ { 1, 0, 0, 1, 0, 1, 1, 1, 1, 1 } , { 0, 0, 0, 2, 0, 1, 1, 1, 1, 1 } }

If the two arrays are equal, then the numbers are permutations.

This is an O(n) algorithm, so asymptotically speaking this is the most efficient it's going to get (you can't solve this problem without examining all of the digits at least once.

You can immediately return false if the numbers have different lengths, so assume that both of are of length n. It will take 2n operations to fill the array, and then exactly 10 comparisons to read the array. 2n + 10 is O(n).

Tyler McHenry
is there any O(1) solution?
Ravi Gupta
Here `n` in the `O(n)` is the number of digits. Many people might call that `O(log x)` where `x` is the number. There is no way to do this in less than `O(log x)` time, but in practice if `x` fits in an integer variable type, `O(log x)` is constant. Anyway, if `n` is large (talking about integer strings not single C variables) there is definitely no algorithm better than `O(n)` since you have to examine each digit at least once...
R..
@R..: What's more, any O(f(n)) becomes O(1) if n is fixed.
Mike Dunlavey
R, I believe those people would be mistaken. Big-O notation is meant to be a function of the data set _size_ (`n`), not the data set _value_ (`x`). I think you're right that `O(n)` is as good as you're going to do, short of trading time for space :-)
paxdiablo
@paxdiablo You're correct in that strictly speaking `n` in big-O is input size and here the input is the digits, not the number itself. However there is precedent for classifying these sorts of problems based on the input value `x` instead. This is normally done when in terms of `n` an algorithm is super-polynomial, but in terms of `x` it is polynomial. These algorithms, often for NP-Complete problems, are said to run in "pseudo-polynomial time" ( http://en.wikipedia.org/wiki/Pseudo-polynomial_time ). So I suppose by analogy we could say that this algorithm runs in "pseudo-logarithmic time".
Tyler McHenry
@Tyler: although note that the title of the AKS paper is "PRIMES is in P", not "PRIMES is pseudo-polynomial-logarithmic" ;-) I'd rather see/say "O(something involving X), where X is ..." than rely on everybody using the same jargon, unless in an academic context.
Steve Jessop
+13  A: 

a and b are anagrams if they have the same number of each digit. So basically the fastest way seems to be, counting the digits for a and b:

int c[10]={0,0,0,0,0,0,0,0,0,0}

while (a) { c[a%10]++; a/=10; }
while (b) { c[b%10]--; b/=10; }

int res=1;
for (int i=0;i<10;i++) res &= c[i]==0;
printf(res?"yes":"no");
Luther Blissett
This is why they say C has all the power of assembly language with all the readability of, well, assembly language :-)
paxdiablo
+1 for best answer. Counting the digits is optimal. Sorting is slow.
R..
By the way, `int c[10] = {0};` is just as good for initializing the array. And you could memcmp with a constant-0 array to check the results with less code. :-)
R..
@R: or use `find_if` - probably more code than the memcmp with 0-array, but more leet.
Steve Jessop
@R: Counting the digits is a type of sorting.
jamesdlin
@jamesdlin: counting the digits is (the main part of) one method of sorting. If you don't actually assemble/output the sorted digits, though, then (1) you haven't sorted them, and (2) you've saved some time. Whether that time is measurable is another issue, but the fact is that this code does *not* sort the digits of `a` and `b`. It does have an intermediate state in which it would be easy to output the sorted digits of `a`.
Steve Jessop
A: 

Well if you can build an 80GB table, you could always do:

int64 table[10000000000] = {0, blah blah..., 9999999999};

if (table[a] == table[b]) ...
Mike Dunlavey
I think there are few enough equivalence classes that you don't need 64 bits. I should give you -1 for writing `int64` (some nonstandard weirdness..?!) instead of `int64_t` (ISO C)...
R..
@R..: Gosh, sorry for being nonstandard (let alone weird). BillG forbid! Anyway, you're right about fewer equivalence classes. In fact, I just did a brute-force count - 92378.
Mike Dunlavey
A: 

{Edited to add additional test)

Assuming you are in the domain of digits, how about

if 
(
    ('1' ^ '2' ^ '3' == '3' ^ '1' ^ '2') &&
    ('1' + '2' + '3' == '3' + '1' + '2')
)
{
    cout << "Yes\n";
}
else
{
    cout << "No\n";
}
EvilTeach
I don't think this works...
R..
Nope. Note that '0' ^ '3' == '1' ^ '2' for example.
Walter Mundt
Nice idea, though. If the digit sums are different, then they are not equivalent.
Mike Dunlavey
A: 

If what i understood from your question correctly a permutation is a combination of the elements, which do not repeat. So if 123 is a valid permutation of 312 then so does

123,
213,
132,
321,
213, 

and so on.

So based on this assumption lets say you got two integers 123456789 and 129837456. (For simplicity i am also assuming that both numbers have equal length). If you understood the point then you might be able to check for different permutations and combination as well.

for that all you need to do is to get the integers of units out of the given number, e.g:

Number 123456789 is
1 * 100000000 + 
2 * 10000000  +
3 * 1000000   +
4 * 100000    +
5 * 10000     +
6 * 1000      +
7 * 100       +
8 * 10        +
9 

or

1 * power(10, 8) +
2 * power(10, 7) +
3 * power(10, 6) +
4 * power(10, 5) +
5 * power(10, 4) +
6 * power(10, 3) +
7 * power(10, 2) +
8 * power(10, 1) +
9 * power(10, 0) 

i have literally given you algorithmic hint of how to do that so this can easily be done. once done you will end up with separate integers (better save these values in an array)

1, 2, 3, 4, 5, 6, 7, 8, 9

Now

do the same for the other given integer so you will end up with another array of integers

1, 2, 9, 8, 3, 7, 4, 5, 6

so now all you need to check is that if all of the integers of the second array are present in the first array of integers, if yes then they are a permutation of the integers of the first array or the first number.

I hope this helps.

Orochi
your solutions is one of the possible solution but not efficient as compare to other solutions posted in this problem.
Ravi Gupta
Well my solution wasn't actually a solution but rather a hint that will point to the solution or how we can achieve it. As i am an advocate of learn by your own practice and by others experience. So if i provide the whole solution i will be forcing him not to use his mind at all. And yes definitely there can be a number of optimization for calculation. But optimizations come only after we know what to do.
Orochi