ansaurus

Question

Determing if two lists contain the same numeric items without sorting

Answer 1

A:

Make a copy of the first list. Then loop through the second and as you do remove each item from the copy. If you get all the way through the second list and found all elements in the copy, then the lists have the same elements. This is a lot of looping, but with only max 8 elements in the list, you won't get a performance gain by using a different type of collection.

If you had a lot more items, then has a Dictionary/Hashtable for the copy. Keep a unique key of values with a count of how many times they were found in the first list. This will give you a performance boost on larger lists.

Sam 2010-10-08 01:03:48

Answer 2

A:

Given 2 equal sized lists. If the products (multiplying all values together) of the lists are equal then the lists contain the same values, so long as the values are integers greater than 0.

No. Consider the following lists

(9, 9)
(3, 27)

They are the same size and the product of the elements are the same.

Greg 2010-10-08 01:05:46

If the products are different, then the lists are different.

EvilTeach 2010-10-08 01:41:35

@EvilTeach - True but if the products are the same, you don't know if the lists are the same.

Greg 2010-10-08 02:11:03

Answer 3

A:

How fast do you need to process 8 integers? Sorting 8 things in any modern processor is going to take almost no time.

The easy thing is to just use an array of size 66 where index 0 represents value -2. Then you just increment counts across both arrays, and then you just iterate across them afterwards.

Anon 2010-10-08 01:07:01

It's not the single case that I'm worried about. Its because what I need it for will be processing 100k+ lists.

Justin 2010-10-08 02:50:52

Answer 4

+1 A:

Since you only have 66 possible numbers, you can create a bit vector (3 32-bit words or 2 64-bit words) and compare those. You can do it all with just shifts and adds. Since there are no comparisons required until the end (to find out if they are equal), it can run fast because there won't be many branches.

Gabe 2010-10-08 01:09:31

Answer 5

+6 A:

Assuming you know the range ahead of time, you can use a variation on counting sort. Just scan through each array and keep track of how many times each integer occurs.

Procedure Compare-Lists(A, B, min, max)
  domain := max - min
  Count := new int[domain]
  for i in A:
    Count[i - min] += 1
  for i in B:
    Count[i - min] -= 1
    if Count[i - min] < 0:
      // Something was in B but not A
      return "Different"
  for i in A:
    if Count[i - min] > 0:
      // Something was in A but not B
      return "Different"
  return "Same"

This is linear in O(len(A) + len(B))

jleedev 2010-10-08 01:10:30

Note that you can stop early in the `B` part if `Count` gets negative.

schnaader 2010-10-08 01:13:56

@schnaader Good call; I’ve added that.

jleedev 2010-10-08 01:14:42

Couldn't we change `for k in Count{...}` to `for i in A{...},for i in B{...}` for large `domain` values and small arrays like the OP's example? This way, we'd skip untouched entries in `Count`, this should give `O((len(A)+len(B))*2)`.

schnaader 2010-10-08 01:22:45

Scan through each array twice? That sounds quite sane. In fact, you can just loop through `A`, `B`, and then `A` again.

jleedev 2010-10-08 01:25:23

The runtime is O(2*len(A)+len(B)) right? And memory would be O(n). At that point I'd think that just sorting would be faster?

Justin 2010-10-08 02:47:47

@Justin Big-O notation is somewhat irrelevant when you know that there are 8 elements of 66 possible values. But yeah, sorting could be faster.

jleedev 2010-10-08 03:02:50

@Justin: That was my thought, too (I commented on the question.) But as @jleedev pointed out, with so few items, big O becomes less relevant.

JoshD 2010-10-08 22:47:47

Answer 6

A:

If your list has only 8 items then sorting is hardly a performance hit. If you want to do this without sorting you can do so using a hashmap.

Iterate over the first array and for each value N in the array Hash(N) = 1.
Iterate over the second array and for each value M, Hash(M) = Hash(M) + 1.
Iterate over the hash and find all keys K for which Hash(K) = 2.

Sagar V 2010-10-08 01:16:29

Answer 7

+2 A:

You could do this with primes. Keep a prime table for the first 66 primes and use the elements of your arrays (offset by +2) to index into the prime table.

The identity of an array is then just the product of the primes represented by the elements in the array.

Unfortunately, the product must be represented with at least 67 bits:

The 66^th prime is 317, and 317⁸ = 101,970,394,089,246,452,641
log₂(101,970,394,089,246,452,641) = 66.47 (rounded up) is 67 bits

Example pseudocode for doing this (assuming the existence of an int128 data type):

int primes[] = 
{
      2,   3,   5,   7,  11,  13,  17,  19,  23,  29,
     31,  37,  41,  43,  47,  53,  59,  61,  67,  71,
     73,  79,  83,  89,  97, 101, 103, 107, 109, 113,
    127, 131, 137, 139, 149, 151, 157, 163, 167, 173,
    179, 181, 191, 193, 197, 199, 211, 223, 227, 229,
    233, 239, 241, 251, 257, 263, 269, 271, 277, 281,
    283, 293, 307, 311, 313, 317
};

// Assumes:
// Each xs[i] is [-2, 63]
// length is [1, 8]
int128 identity(int xs[], int length)
{
    int128 product = 1;

    for (int i = 0; i < length; ++i)
    {
        product *= primes[xs[i] + 2];
    }

    return product;
}

bool equal(int a[], int b[], int size)
{
    return identity(a, size) == identity(b, size);
}

You might be able to use a long double on GCC to store the product since it is defined as an 80-bit data type, but I'm not sure if the floating-point multiplication error would cause collisions between lists. I haven't verified this.

My previous solution below does not work, see the comments below.

For each list:

Compute the sum of all elements
Compute the product of all elements
Store the length of the list (in your case, since the length is guaranteed to be the same for two lists, you can ignore it entirely)

As you compute the sum and product, each element needs to be adjusted by +3, so your range is now [1, 66].

The (sum, product, length) tuple is the identity for your list. Any lists with the same identity are equal.

You can fit this (sum, product, length) tuple into a single 64-bit number:

For the product: 66⁸ = 360,040,606,269,696, log₂(360,040,606,269,696) = 48.36 (rounded up) is 49 bits
For the sum: 66 * 8 = 528, log₂(528) = 9.04 (rounded up) is 10 bits
Length is in the range [1, 8], log₂(8) = 3 bits
49 + 10 + 3 = 62 bits for representing the identity

Then, you can do direct 64-bit comparisons to determine equality.

Running-time is linear in the size of the arrays with a single pass over each. Memory usage is O(1).

Example code:

#include <cstdint>
#include <stdlib.h>

// Assumes:
// Each xs[i] is [-2, 63]
// length is [1, 8]
uint64_t identity(int xs[], int length)
{
    uint64_t product = 1;
    uint64_t sum = 0;

    for (int i = 0; i < length; ++i)
    {
        int element = xs[i] + 3;
        product *= element;
        sum += element;
    }

    return (uint64_t)length << 59 | (sum << 49) | product;
}

bool equal(int a[], int b[], int size)
{
    return identity(a, size) == identity(b, size);
}

void main()
{
    int a[] = { 23, 0, -2,  6,  3, 23, -1 };
    int b[] = { 0, -1,  6, 23, 23, -2,  3 };

    printf("%d\n", equal(a, b, _countof(a)));
}

Chris Schmich 2010-10-08 02:16:54

This won't work. You can see for yourself that 1*6*6=2*2*9 and 1+6+6=2+2+9. Therefore, your algorithm will declare lists {-2,3,3,-2,-2,-2,-2,-2} and {-1,-1,6,-2,-2,-2,-2,-2} equal to each other.

2010-10-08 03:00:23

@user434507: you're absolutely correct, I'll post an alternate solution shortly.

Chris Schmich 2010-10-08 03:11:43

ansaurus

tags:

views:

answers:

Determing if two lists contain the same numeric items without sorting

My previous solution below does not work, see the comments below.

related questions