A: 

it's a permutation if and only if there are no duplicate values in the array, should be easy to check that in O(N)

Chris Card
And how do I do that in O(n) with the restrictions above?:)
Iulian Şerbănoiu
This reminds me of Fermat's Last Theorem :P
Rubys
sorry, I missed the space restriction
Chris Card
+4  A: 

I doubt you would be able to prove that ;) Consider this counterexample:

  (1, 2, 4, 4, 4, 5, 7, 9, 9)

Note that both its sum (45 = 9·10/2) and even its product (362880 = 9!) match those of the permutation 1..9, yet it is not a permutation.

I think that, more generally, this problem isn't solvable by processing the numbers in order. Suppose you are processing the elements in order and you are halfway through the array. Now the state of your program has to somehow reflect which numbers you've encountered so far, and that requires on the order of n bits to store.

Jules
Thanks! Rules that solution out.
Iulian Şerbănoiu
This is more for a comment than an answer, as it doesn't actually answer the question.
Joren
I agree, but it does rule out half of the "answers" further down as well as the approach that the OP was taking. So I believe it does solve part of the problem: you don't have to keep looking for a way to solve it by processing the elements in order.
Jules
A: 

Depending on how much space you have, relative to N, you might try using hashing and buckets.

That is, iterate over the entire list, hash each element, and store it in a bucket. You'll need to find a way to reduce bucket collisions from the hashes, but that is a solved problem.

If an element tries to go into a bucket that already holds an identical item, you have found a duplicate, so the array is not a permutation.

This type of solution would be O(N) as you touch each element only once.

However, the problem with this is whether space M is larger than N or not. If M > N, this solution will be fine, but if M < N, then you will not be able to solve the problem with 100% accuracy.
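A minimal sketch of this bucket idea in Python, using the built-in set as the hash buckets (this is the M ≥ N case: the set itself takes O(N) space, which is exactly the tradeoff discussed below):

```python
def is_permutation(a):
    """Check whether a contains each of 1..len(a) exactly once.
    O(N) time, O(N) space: a hash set plays the role of the buckets."""
    n = len(a)
    seen = set()
    for x in a:
        if not 1 <= x <= n:  # out-of-range value: cannot be a permutation
            return False
        if x in seen:        # collision with an identical item: duplicate
            return False
        seen.add(x)
    return True

print(is_permutation([2, 1, 4, 6, 5, 7, 9, 3, 8]))  # True
print(is_permutation([2, 1, 4, 6, 8, 7, 9, 3, 8]))  # False: 8 twice, 5 missing
```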

samoz
Given that the question is O(N) time complexity with O(1) space complexity, there is by definition a big enough N where M < N.
Ants Aasma
@Ants Agreed, but maybe that O(1) space is on the order of gigabytes and N is much smaller. If this is known, he could use my solution. But agreed, this does require knowing a lot of information at the start of things.
samoz
The whole point of the big-O concept is that N is large enough that the complexity class dominates everything else. Big O is always a theoretical exercise; practical considerations like how many gigabytes are available matter when solving real instances of a problem.
Ants Aasma
+1  A: 

I don't know how to do it in O(N), or even whether it can be done in O(N). I know that it can be done in O(N log N) if you use an appropriate sort and then compare.

That being said, there are several O(N) checks that can show that the array is NOT a permutation of 1..N.

  1. Check the length. If unequal, it is obviously not a permutation.
  2. Create an XOR fingerprint. If the XOR of all the elements does not match the XOR of 1..N, it cannot be a permutation. A match, however, is inconclusive.
  3. Find the sum of all elements. The result may overflow, but that is not a worry when matching this 'fingerprint'. If, however, you used a checksum that involved multiplying, then overflow would be an issue.
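A small Python sketch of checks 2 and 3 (hedged: these can only prove "not a permutation"; passing both is inconclusive, as the counterexample array elsewhere on this page shows for the sum test):

```python
def quick_rejects(a):
    """O(1)-space fingerprint tests against 1..n.
    Returning False is definitive; returning True is inconclusive."""
    n = len(a)
    xor_fp = sum_fp = 0
    for x in a:
        xor_fp ^= x
        sum_fp += x                 # overflow is harmless for an equality test
    expected_xor = 0
    for i in range(1, n + 1):
        expected_xor ^= i
    expected_sum = n * (n + 1) // 2
    return xor_fp == expected_xor and sum_fp == expected_sum

print(quick_rejects([3, 1, 2]))                    # True: a real permutation
# The counterexample has the right sum (45) but the wrong XOR (5, not 1):
print(quick_rejects([1, 2, 4, 4, 4, 5, 7, 9, 9]))  # False
```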

Hope this helps.

Sparky
+1  A: 

This isn't going to work due to the complexity being given as a function of N rather than M, implying that N >> M

This was my shot at it, but for a bloom filter to be useful, you need a big M, at which point you may as well use simple bit toggling for something like integers

http://en.wikipedia.org/wiki/Bloom_filter

For each element in the array:

  1. Run the k hash functions
  2. Check for inclusion in the bloom filter
  3. If it is there, you have probably seen the element before
  4. If it isn't, add it

When you are done, you may as well compare it to the results of a 1..N array in order, as that'll only cost you another N.

In case I haven't put in enough caveats: this isn't 100% accurate, or even close, since you specified the complexity in N, which implies that N >> M; so fundamentally it won't work as you have specified it.

BTW, the false positive rate for an individual item should be roughly e = 2^(-(m/n)·ln 2) for an optimally tuned filter.

Playing around with that will give you an idea of how big M would need to be to be acceptable.
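A rough Python sketch of the loop above (hedged: the bit-array size and hash count are illustrative choices, not tuned values, and the hash functions are simply salted SHA-256 digests):

```python
import hashlib

class BloomFilter:
    def __init__(self, m_bits=4096, k=3):
        self.m, self.k = m_bits, k
        self.bits = 0                       # one big integer as a bit array

    def _positions(self, item):
        # k hash functions, derived by salting a single hash with i
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def maybe_contains(self, item):
        return all(self.bits >> p & 1 for p in self._positions(item))

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

def bloom_is_permutation(a):
    """Probabilistic: a 'duplicate' report may be a false positive,
    so False is only probably correct; True is definitive."""
    bf = BloomFilter()
    for x in a:
        if not 1 <= x <= len(a):
            return False
        if bf.maybe_contains(x):            # probably seen before
            return False
        bf.add(x)
    return True

print(bloom_is_permutation([3, 1, 2, 5, 4]))   # almost certainly True
print(bloom_is_permutation([1, 2, 2]))         # False: exact repeats always hit
```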

McBeth
Wouldn't that be O(n^2)? You say 'For each element ... compare it to the results ... that'll only cost you another N'. So N elements and then an additional N cost per element, N^2?
samoz
You skipped the "When you are done" bit. The final check is totally optional and would occur after the loop
McBeth
+8  A: 

I'm very slightly skeptical that there is a solution. Your problem seems to be very close to one posed several years ago in the mathematical literature, with a summary given here ("The Duplicate Detection Problem", S. Kamal Abdali, 2003) that uses cycle-detection -- the idea being the following:

If there is a duplicate, there exists a number j between 1 and N such that the following would lead to an infinite loop:

x := j;
do
{
   x := a[x];
}
while (x != j);

because a permutation consists of one or more subsets S of distinct elements s_0, s_1, ..., s_(k-1) where s_j = a[s_(j-1)] for all j between 1 and k-1, and s_0 = a[s_(k-1)], so all elements are involved in cycles -- one of the duplicates would not be part of such a subset.

e.g. if the array = [2, 1, 4, 6, 8, 7, 9, 3, 8]

then the element 8 at position 5 is a duplicate because all the other elements form cycles: { 2 -> 1, 4 -> 6 -> 7 -> 9 -> 8 -> 3}. Whereas the arrays [2, 1, 4, 6, 5, 7, 9, 3, 8] and [2, 1, 4, 6, 3, 7, 9, 5, 8] are valid permutations (with cycles { 2 -> 1, 4 -> 6 -> 7 -> 9 -> 8 -> 3, 5 } and { 2 -> 1, 4 -> 6 -> 7 -> 9 -> 8 -> 5 -> 3 } respectively).

Abdali describes a way of finding duplicates. Basically the following algorithm (using Floyd's cycle-finding algorithm) works if you happen across one of the duplicates in question:

function is_duplicate(a, N, j)
{
     /* assume we've already scanned the array to make sure all elements
        are integers between 1 and N */
     x1 := j;
     x2 := j;
     do
     {             
         x1 := a[x1];
         x2 := a[x2];
         x2 := a[x2];
     } while (x1 != x2);

     /* stops when it finds a cycle; x2 has gone around it twice, 
        x1 has gone around it once.
        If j is part of that cycle, both will be equal to j. */
     return (x1 != j);
}
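A direct Python translation of that pseudocode (hedged: as a comment below points out, the paper's setting guarantees a duplicate exists, so this check alone does not decide the OP's problem):

```python
def is_duplicate(a, j):
    """Floyd's tortoise-and-hare on x -> a[x], started at j.
    Returns True iff j is NOT on a cycle, i.e. j heads into a 'rho' tail,
    which can only happen when a duplicate is present.
    Assumes every element of a is an integer between 1 and len(a)."""
    x1 = x2 = j
    while True:
        x1 = a[x1 - 1]             # tortoise: one step (1-based values)
        x2 = a[a[x2 - 1] - 1]      # hare: two steps
        if x1 == x2:
            break
    # If j lies on a cycle of length L, the pointers first meet after
    # exactly L iterations, at which point x1 == j again.
    return x1 != j

a = [2, 1, 4, 6, 8, 7, 9, 3, 8]    # the example above: 8 is duplicated
print(is_duplicate(a, 5))           # True: position 5 is not on any cycle
print(is_duplicate(a, 1))           # False: 1 -> 2 -> 1 is a cycle
```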

The difficulty is I'm not sure your problem as stated matches the one in his paper, and I'm also not sure if the method he describes runs in O(N) or uses a fixed amount of space. A potential counterexample is the following array:

[3, 4, 5, 6, 7, 8, 9, 10, ... N-10, N-9, N-8, N-7, N-2, N-5, N-5, N-3, N-5, N-1, N, 1, 2]

which is basically the identity permutation shifted by 2, with the elements [N-6, N-4, and N-2] replaced by [N-2, N-5, N-5]. This has the correct sum (not the correct product, but I reject taking the product as a possible detection method since the space requirements for computing N! with arbitrary precision arithmetic are O(N) which violates the spirit of the "fixed memory space" requirement), and if you try to find cycles, you will get cycles { 3 -> 5 -> 7 -> 9 -> ... N-7 -> N-5 -> N-1 } and { 4 -> 6 -> 8 -> ... N-10 -> N-8 -> N-2 -> N -> 2}. The problem is that there could be up to N cycles, (identity permutation has N cycles) each taking up to O(N) to find a duplicate, and you have to keep track somehow of which cycles have been traced and which have not. I'm skeptical that it is possible to do this in a fixed amount of space. But maybe it is.

This is a heavy enough problem that it's worth asking on mathoverflow.net (despite the fact that most of the time mathoverflow.net is cited on stackoverflow it's for problems which are too easy)


edit: I did ask on mathoverflow, there's some interesting discussion there.

Jason S
The algorithm in the paper requires an array of size n+1, so that it always contains at least one duplicate. That is not the same problem as the OP's. Perhaps the algorithm can be adapted, but it cannot be used verbatim.
Jules
+5  A: 

This is impossible to do in O(1) space, at least with a single-scan algorithm.

Proof

Suppose you have processed N/2 of the N elements. Assuming the sequence is a permutation, then, given the state of the algorithm, you should be able to figure out the set of N/2 remaining elements. If you can't figure out the remaining elements, then the algorithm can be fooled by repeating some of the old elements.

There are N choose N/2 possible remaining sets. Each of them must be represented by a distinct internal state of the algorithm, because otherwise you couldn't figure out the remaining elements. However, distinguishing X states requires BigTheta(log X) bits, so it takes BigTheta(log(N choose N/2)) space to store N choose N/2 states. That value grows with N, and therefore the algorithm's internal state cannot fit in O(1) space.

More Formal Proof

You want to create a program P which, given the final N/2 elements and the internal state of the linear-time-constant-space algorithm after it has processed N/2 elements, determines if the entire sequence is a permutation of 1..N. There is no time or space bound on this secondary program.

Assuming P exists we can create a program Q, taking only the internal state of the linear-time-constant-space algorithm, which determines the necessary final N/2 elements of the sequence (if it was a permutation). Q works by passing P every possible final N/2 elements and returning the set for which P returns true.

However, because Q has N choose N/2 possible outputs, it must have at least N choose N/2 possible inputs. That means the internal state of the original algorithm must store at least N choose N/2 states, requiring BigTheta(log N choose N/2), which is greater than constant size.

Therefore the original algorithm, which does have time and space bounds, also can't work correctly if it has constant-size internal state.

[I think this idea can be generalized, but thinking isn't proving.]

Consequences

BigTheta(log(N choose N/2)) is equal to BigTheta(N). Therefore just using a boolean array and ticking values as you encounter them is (probably) space-optimal, and time-optimal too since it takes linear time.
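The boolean-array approach from that last paragraph, sketched in Python:

```python
def ticklist_check(a):
    """O(N) time, O(N) space: tick off each value of 1..n as it is seen.
    N distinct in-range values => a permutation."""
    n = len(a)
    seen = [False] * (n + 1)        # index 0 unused; values are 1..n
    for x in a:
        if not 1 <= x <= n or seen[x]:
            return False            # out of range, or a duplicate
        seen[x] = True
    return True

print(ticklist_check([2, 1, 4, 6, 5, 7, 9, 3, 8]))  # True
print(ticklist_check([1, 2, 4, 4, 4, 5, 7, 9, 9]))  # False
```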

Strilanc
I disagree with your approach. The phrases "you should be able to figure out the set of N/2 remaining elements" and "the algorithm can be fooled by repeating some of the old elements." are vague... if by the former you mean producing a set of the N/2 remaining elements, that is not a requirement of the problem.
Jason S
Why should you be able to figure out the set of N/2 remaining elements? All you need to say is that you have membership in the set of permutations (at the end) within the set of {1..N}^N.
Rex Kerr
What I meant is that, given the internal state of the algorithm, a program without bounded time and space must be able to determine the final N/2 elements. Equivalently, some program given the internal state and the final N/2 elements of the sequence must be able to determine if the entire sequence forms a permutation. [I removed the bounds to create that equivalence.] If an unbounded program can't do it when given the constant-sized internal state, then clearly the original bounded program also can't do it.
Strilanc
@JasonS I clarified the post.
Strilanc
You have proven that the problem is _not subdividable_, but not that it can't be solved in `O(N)` time. How do you know that there exists no strategy where at `N/2` of the way through the list, you might still need to revisit the earlier part of the list to process the rest? As long as you do it rarely enough, it could still be `O(N)`.
Rex Kerr
[NitPick: I'm not trying to prove it's not linear time. That would be silly.] It's true that the proof doesn't naively generalize to non-scanning algorithms, but it does generalize. First, if the algorithm is doing any unnecessary list accesses, those must be removed. It must only read a list item if it requires it for the result. Second, instead of considering the final N/2 items, you consider the last N/2 items which were accessed. So, if the algorithm finishes by reading all the odd elements, P is given the odd items and the algorithm's state just before it would have started reading them.
Strilanc
@RexKerr If I understand what you mean by 'set of {1..N}^N', then you're wrong. {1..N}^N contains sequences which aren't permutations of 1..N, such as 1 repeated N times. [Referring to your first comment]
Strilanc
@Strilanc - The generalization doesn't work, because your last `N/2` items have to be the same the last time you visit them as the first time (and all other times). Thus, depending on what's already happened, you may not have all `C(N,N/2)` possibilities open to you. And I just meant that "you need to show that you live in a small set of size `N!` that is a subset of a larger set of size `N^N` in which you can trivially determine membership. But you've fixed this in your proof (which I like, incidentally--I just think it only covers a subset of algorithms).
Rex Kerr
@RexKerr I agree that the proof, as stated, only applies to a subset of programs. I also agree that not all possibilities may be open when there are N/2 items left to read. I don't know how to properly generalize the proof, but I think it can be done. I feel that if you use state to store consistency information about the last N/2 items, you've wasted state you could have used to help P. P already knows the last N/2 items; it needs help with the other N/2.
Strilanc
Alright, there are definitely some algorithms that can do it in O(1) space. For example, doing the naive N^2 search for duplicates. The proof definitely doesn't generalize.
Strilanc
The problem clearly cannot be solved in a single pass with finite space. I don't think it can be solved without writing to the array. The interesting question is whether it can be solved in linear time, without any spare space, where one is allowed to write numbers to the array, but (1) the original array contents must be restored before the algorithm finishes, and (2) the maximum number that can be written to an array element is O(N).
supercat
A: 

First, an information theoretic reason why this may be possible. We can trivially check that the numbers in the array are in bounds in O(N) time and O(1) space. To specify any such array of in-bounds numbers requires N log N bits of information. But to specify a permutation requires approximately (N log N) - N bits of information (Stirling's approximation). Thus, if we could acquire N bits of information during testing, we might be able to know the answer. This is trivial to do in N time (in fact, with M static space we can pretty easily acquire log M information per step, and under special circumstances we can acquire log N information).

On the other hand, we only get to store something like M log N bits of information in our static storage space, which is presumably much less than N, so it depends greatly what the shape of the decision surface is between "permutation" and "not".

I think that this is almost possible but not quite given the problem setup. I think one is "supposed" to use the cycling trick (as in the link that Iulian mentioned), but the key assumption of having a tail in hand fails here because you can index the last element of the array with a permutation.

Rex Kerr
A: 

You might be able to do this in randomized O(n) time and constant space by computing sum(x_i) and product(x_i) modulo a bunch of different randomly chosen constants C of size O(n). This basically gets you around the problem that product(x_i) gets too large.

There are still a lot of open questions, though, like whether sum(x_i)=N(N+1)/2 and product(x_i)=N! are sufficient conditions to guarantee a permutation, and what the chance is that a non-permutation generates a false positive (I would hope ~1/C for each C you try, but maybe not).
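A hedged sketch of this randomized idea in Python (the choice of random moduli just above n is my illustrative reading of "randomly chosen constants C of size O(n)"):

```python
import random

def probably_permutation(a, trials=5):
    """Compare sum(a) and the product of a modulo random constants
    against the values for 1..n. Never a false negative (a true
    permutation always passes); false positives are possible."""
    n = len(a)
    if sum(a) != n * (n + 1) // 2:
        return False
    for _ in range(trials):
        c = random.randint(n + 1, 2 * n + 1)   # a constant of size O(n)
        prod_a = prod_ref = 1
        for x, r in zip(a, range(1, n + 1)):
            prod_a = prod_a * x % c            # product of a, mod c
            prod_ref = prod_ref * r % c        # product of 1..n, mod c
        if prod_a != prod_ref:
            return False
    return True
```

Interestingly, the counterexample (1, 2, 4, 4, 4, 5, 7, 9, 9) from an answer above has both the correct sum (45) and the correct product (362880 = 9!), so this check is fooled by it; that settles the first open question: the two conditions together are not sufficient.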

Keith Randall
A: 

The sum and the product will not guarantee the correct answer, since these hashes are subject to collisions, i.e. different inputs might potentially produce identical results. If you want a perfect hash, a single-number result that actually fully describes the numerical composition of the array, it might be the following.

Imagine that for any number i in [1, N] range you can produce a unique prime number P(i) (for example, P(i) is the i-th prime number). Now all you need to do is calculate the product of all P(i) for all numbers in your array. The product will fully and unambiguously describe the composition of your array, disregarding the ordering of values in it. All you need to do is to precalculate the "perfect" value (for a permutation) and compare it with the result for a given input :)
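A Python sketch of this perfect-hash idea (hedged: the prime generator here is my own naive trial-division helper, and the product grows huge, which is exactly the space objection raised elsewhere on this page):

```python
from itertools import count

def first_n_primes(n):
    """Naive trial division; fine for illustration."""
    primes = []
    for cand in count(2):
        if all(cand % p for p in primes):
            primes.append(cand)
        if len(primes) == n:
            return primes

def prime_fingerprint(a, primes):
    """Product of P(x) over the array, with P(i) = the i-th prime.
    By unique factorization, this identifies the multiset exactly."""
    fp = 1
    for x in a:
        fp *= primes[x - 1]
    return fp

n = 9
primes = first_n_primes(n)
expected = prime_fingerprint(range(1, n + 1), primes)
print(prime_fingerprint([2, 1, 4, 6, 5, 7, 9, 3, 8], primes) == expected)  # True
print(prime_fingerprint([1, 2, 4, 4, 4, 5, 7, 9, 9], primes) == expected)  # False
```

Note that unlike the plain sum/product fingerprints, this one has no collisions at all: two arrays hash equal exactly when they are permutations of each other.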

Of course, an algorithm like this does not immediately satisfy the posted requirements. But at the same time it is intuitively too generic: it allows you to detect a permutation of absolutely any numerical combination in an array. In your case you need to detect a permutation of the specific combination 1, 2, ..., N. Maybe this can somehow be used to simplify things... Probably not.

AndreyT
A: 

Alright, this is different, but it appears to work!

I ran this test program (C#):

    static void Main(string[] args) {
        for (int j = 3; j < 100; j++) {
            int x = 0;
            for (int i = 1; i <= j; i++) {
                x ^= i;
            }
            Console.WriteLine("j: " + j + "\tx: " + x + "\tj%4: " + (j % 4));
        }
    }

Short explanation: x is the result of all the XORs for a single list, i is the element in a particular list, and j is the size of the list. Since all I'm doing is XOR, the order of the elements doesn't matter. But I'm looking at what correct permutations look like when this is applied.

If you look at j%4, you can do a switch on that value and get something like this:

    bool IsPermutation = false;
    switch (j % 4) {
        case 0:
            IsPermutation = (x == j);
            break;
        case 1:
            IsPermutation = (x == 1);
            break;
        case 2:
            IsPermutation = (x == j + 1);
            break;
        case 3:
            IsPermutation = (x == 0);
            break;
    }

Now I acknowledge that this probably requires some fine tuning. It's not 100%, but it's a good easy way to get started. Maybe with some small checks running throughout the XOR loop, this could be perfected. Try starting somewhere around there.
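For what it's worth, the j % 4 pattern in the switch is the well-known closed form for XOR of 1..j, so that part needs no fine tuning; what can't be fixed is that XOR alone can't distinguish different multisets with the same fingerprint. A quick Python check of both points:

```python
def xor_1_to_j(j):
    """Closed form for 1 ^ 2 ^ ... ^ j, matching the switch above."""
    return [j, 1, j + 1, 0][j % 4]

# The closed form agrees with the brute-force loop for every j:
for j in range(1, 1000):
    acc = 0
    for i in range(1, j + 1):
        acc ^= i
    assert acc == xor_1_to_j(j)

# ...but XOR alone cannot certify a permutation:
fake = [1, 2, 2, 4, 4]          # not a permutation of 1..5
acc = 0
for x in fake:
    acc ^= x
print(acc == xor_1_to_j(5))     # True: the XOR test is fooled
```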

Corey Ogburn