views:

226

answers:

5

I tried my hand at this Google Codejam Africa problem (the contest is already finished, I just did it to improve my programming skills).

The Problem:

You are hosting a party with G guests and notice that there is an odd number of guests! When planning the party you deliberately invited only couples and gave each couple a unique number C on their invitation. You would like to single out whoever came alone by asking all of the guests for their invitation numbers.

The Input:

The first line of input gives the number of cases, N. N test cases follow. For each test case there will be:

  • One line containing the value G the number of guests.
  • One line containing a space-separated list of G integers. Each integer C indicates the invitation code of a guest. Output

For each test case, output one line containing "Case #x: " followed by the number C of the guest who is alone.

The Limits:

  • 1 ≤ N ≤ 50
  • 0 < C ≤ 2147483647

Small dataset

3 ≤ G < 100

Large dataset

3 ≤ G < 1000

Sample Input:

3
3
1 2147483647 2147483647
5
3 4 7 4 3
5
2 10 2 10 5

Sample Output:

Case #1: 1
Case #2: 7
Case #3: 5

This is the solution that I came up with:

with open('A-large-practice.in') as f:
    lines = f.readlines()

with open('A-large-practice.out', 'w') as output:
    N = int(lines[0])
    for testcase, i in enumerate(range(1,2*N,2)):
        G = int(lines[i])
        for guest in range(G):
            codes = map(int, lines[i+1].split(' '))
            alone = (c for c in codes if codes.count(c)==1)
        output.write("Case #%d: %d\n" % (testcase+1, alone.next()))

It runs in 12 seconds on my machine with the large input.

Now, my question is, can this solution be improved in Python to run in a shorter time or use less memory? The analysis of the problem gives some pointers on how to do this in Java and C++ but I can't translate those solutions back to Python.

Edit:

Incorporating the tips from Alex Martelli and IVlad below I now have this solution which runs in 0.079s:

with open('A-large-practice.in') as f:
    lines = f.readlines()

with open('A-large-practice.out', 'w') as output:
    N = int(lines[0])
    for testcase, i in enumerate(range(1,2*N,2)):
        codes = map(int, lines[i+1].split(' '))
        alone = 0
        for c in codes: alone ^= c
        output.write("Case #%d: %d" % (testcase+1, alone))
A: 

Consider this:

codes = map(int, lines[i+1].split(' '))

Why resplit the line for each value of guest? The line won't change.

Also, consider this:

for guest in range(G):
    codes = map(int, lines[i+1].split(' '))
    alone = (c for c in codes if codes.count(c)==1)

None of the code in this loop depends on guest. Why is it in a loop?

S.Lott
+5  A: 

I don't know about python, but the problem itself is a classic. Given 2K - 1 numbers, each except one appearing an even number of times, find the one appearing an odd number of times.

Needed formulas:

  1. x xor x == 0 for all x
  2. x xor y == y xor x for all x and y
  3. x xor (y xor z) == (x xor y) xor z (associativity)
  4. x xor 0 == x for all x

So xor all the numbers. The result of xor-ing all the numbers will be your answer. I don't know how you'd do this in python, in the C languages the xor operator is ^.

This is really the most efficient way as you're only doing a simple bitwise operation and you don't even have to store the given numbers.

You can check that: 3 ^ 4 ^ 7 ^ 4 ^ 3 == 7, 2 ^ 10 ^ 2 ^ 10 ^ 5 == 5 etc.

IVlad
Associativity of `xor` is also important here.
Michał Marczyk
@Michal Marczyk: no, xor associativity is not important. For every permutation of n1, n2, …nn, the result of xoring them is the same.
ΤΖΩΤΖΙΟΥ
A: 

This finishes instantly for me with the same results:

src = open('A-large-practice.in')
dst = open('A-large-practice.out', 'w')

n = int(src.readline().rstrip())
for i in xrange(n):
    src.readline() # skip g
    nums = src.readline().split()
    seen = set()
    for n in nums:
        if n in seen:
            seen.remove(n)
        else:
            seen.add(n)
    dst.write("Case #%d: %s\n" % (i+1, tuple(seen)[0]))

src.close()
dst.close()

Damn, I love hash maps!

Max Shawabkeh
+3  A: 

My machine's slower than yours -- your code (put in a main function) took 19 seconds on my machine. The obvious optimization of removing the useless inner for cuts this to 0.46 seconds; changing the heart of the code to

      alone = 0
      for c in codes: alone ^= c

cuts the time to 0.08 seconds. I'm sure further optimizations are possible, but with the code already 235 times faster I think this may be enough;-).

Alex Martelli
+1  A: 

Yes, as the analysis page explains, you can use constant memory, although any approach will run in O(n) time. Here's how the constant memory solution would work in Python:

numbers = [1, 1, 2, 2, 3, 3, 4, 4, 12345]
missingOne = 0
for number in numbers:
    missingOne = missingOne ^ number
assert missingOne == 12345

The key is understanding what the xor operator does. Any number xor'ed with zero is itself. And any number xor'ed with itself is zero. If you look at the truth table for xor, you should be able to convince yourself of both of those facts and then prove that xoring the numbers in a list, starting from zero will produce the non-duplicated number.

msalib