views:

161

answers:

7

I found this on an "interview questions" site and have been pondering it for a couple of days. I will keep churning, but am interested what you guys think

"10 Gbytes of 32-bit numbers on a magnetic tape, all there from 0 to 10G in random order. You have 64 32 bit words of memory available: design an algorithm to check that each number from 0 to 10G occurs once and only once on the tape, with minimum passes of the tape by a read head connected to your algorithm."

A: 

Perform an in-place mergesort or quicksort, using tape for storage? Then iterate through the numbers in sequence, tracking to see that each number = previous+1.

Requires cleverly implemented sort, and is fairly slow, but achieves the goal I believe.

Edit: oh bugger, it's never specified you can write.

Here's a second approach: scan through trying to build up to 30-ish ranges of contiginous numbers. IE 1,2,3,4,5 would be one range, 8,9,10,11,12 would be another, etc. If ranges overlap with existing, then they are merged. I think you only need to make a limited number of passes to either get the complete range or prove there are gaps... much less than just scanning through in blocks of a couple thousand to see if all digits are present.

It'll take me a bit to prove or disprove the limits for this though.

BobMcGee
It fails if you've only got a read head, though.
Anon.
Oh, bugger. Let me fix that then.
BobMcGee
A: 

well you could always find out what 1 + 2 + 3 + ...10gs equals. Sum everything on the tape and see if you get the expected result. I think 64 32bit words would be plenty to store the expected total and keep a running result.

RedDeckWins
1+1+4+4 = 1+2+3+4
Jean-Bernard Pellerin
Stuff sure gets modded up quick around here. What happens if you have 1 + 1 + 3 + 5 + 5 + 6 + 7 + ... + 10g?
danben
good point, I guess when I usually hear/read this interview question there is a condition that says there is either 1 duplicate or no duplicates.
RedDeckWins
Incidentally, how many permutations of 10G unique numbers are there?
Anon.
A: 

The utterly naive algorithm, which takes as many passes as there are numbers to check, would be to walk through and verify that the lowest number is there. Then do it again checking that the next lowest is there. And so on.

This requires one word of storage to keep track of where you are - you could cut down the number of passes by a factor of 64 by using all 64 words to keep track of where you're up to in several different locations in the search space - checking all of your current ones on each pass. Still O(n) passes, of course.

You could probably cut it down even more by using portions of the words - given that your search space for each segment is smaller, you won't need to keep track of the full 32-bit range.

Anon.
You have 2048 bits, so you can check a batch of 2048 at a time. Not as bad, but still pretty slow when you have to make 5242880 passes.
S.Lott
A: 

Do 2 reduces on the numbers, a sum and a bitwise XOR.

The sum should be (10G + 1) * 10G / 2
The XOR should be ... something

RichN
You better be careful doing your sum .. what about overflow?
Michael Anderson
Implementation wise, we can cast the number into double word first before summing, of course the sum is also in double word. On the other hand, this is an algorithm. Algorithm never overflows, implementation does.
RichN
You've got 2048 bits of memory; you can easily do the sum as a 65-bit long-long-long.
S.Lott
+1  A: 

Hi

It's a trick question, as Michael Anderson and I have figured out. You can't store 10G 32b numbers on a 10G tape. The interviewer (a) is messing with you and (b) is trying to find out how much you think about a problem before you start solving it.

Cheers

Mark

High Performance Mark
+1 but i am wondering whether there is a good algorithm for small problems, i.e., say 1G numbers.
Yin Zhu
Could they could be asking if you have a contiginous number set of the size that will fit?
BobMcGee
Well they could be asking anything I suppose, but the OP seemed quite definite about the question.
High Performance Mark
+1  A: 

Apparently I need to earn some sort of points in order to comment, so I'm quoting here instead. Yay, internet!

RichN wrote:

Do 2 reduces on the numbers, a sum and a bitwise XOR.

The sum should be (10G + 1) * 10G / 2

The XOR should be ... something

Care to explain the idea behind this method? Obviously any permutation of the same values will produce identical sums, but how do you know that you won't get any false positives?

At any rate the four-element sequence (0, 0, 3, 3) seems to be a counter-example to the algorithm since it has the same sums as (0, 1, 2, 3).

doynax
A: 

Hi everybody,

It looks like there is a catch in the question that no one has talked about so far; the interviewer has only asked the interviewee to write a program that CHECKS

(i) if each number that makes up the 10G is present once and only once--- what should the interviewee do if the numbers in the given list are present multple times? should he assume that he should stop execting the programme and throw exception or should he assume that he should correct the mistake by removing the repeating number and replace it with another (this may actually be a costly excercise as this involves complete reshuffle of the number set)? correcting this is required to perform the second step in the question, i.e. to verify that the data is stored in the best possible way that it requires least possible passes.

(ii) When the interviewee was asked to only check if the 10G weight data set of numbers are stored in such a way that they require least paases to access any of those numbers; what should the interviewee do? should he stop and throw exception the moment he finds an issue in the algorithm they were stored in, or correct the mistake and continue till all the elements are sorted in the order of least possible passes?

If the intension of the interviewer is to ask the interviewee to write an algorithm that finds the best combinaton of numbers that can be stored in 10GB, given 64 32 Bit registers; and also to write an algorithm to save these chosen set of numbers in the best possible way that require least number of passes to access each; he should have asked this directly, woudn't he?

I suppose the intension of the interviewer may be to only see how the interviewee is approaching the problem rather than to actually extract a working solution from the interviewee; wold any buy this notion?

Regards, Samba

Saasira