views:

362

answers:

5

I came across a little problem that looks very easy but requires a bit more thought than I first assumed. It can also be solved in many different ways and, I think, would be a perfect interview question. So what solution would you write down?

Your input is a stream of pairs (x, y), with each x and y on a separate line. A short example, starting with the pair (fruit, apple):

fruit
apple
book
Harry Potter
book
Bible
vehicle
car

You're asked to print the first n pairs, where adjacent pairs that all have the same x are counted as one. So, for n=2 you should output these lines:

fruit
apple
book
Harry Potter
book
Bible

What's your most elegant way to solve this problem (in a language of your choice)?

+4  A: 

Unless I'm missing something, this seems to be a very simple algorithm. I don't have the time to write up code for it, but assuming you know n ahead of time:

  1. Set i = 0
  2. Read in x
  3. Output x
  4. If x != last x, increment i
  5. Read in y
  6. Output y
  7. If not EOF and ((read in x) == last x or i < n), goto 3

I wouldn't even bother reading in input you're not going to use. You might want to handle the EOF case before step 5 in case of invalid input, however.

Edit: I was missing something - the case when the last key is repeated. I modified the loop condition and loop point to reflect it.

lc
That was my first take too, but it won't work on the example I gave. That's why I said it looks easier than it is! If the nth pair has adjacent duplicates it will print only the first one, so for n=2 it won't print the second book in the example input.
You're right. I just changed the loop condition to fix it. Of course good testing would point that out as well.
lc
+2  A: 

In Python, a (relatively) simple piece of code:

# Open file.
infile = open ("infile.txt","r")

# Only do two groups.
n = 2

# Loop until groups exhausted.
lastkey = None
while  True:
    # Get key and value, exit loop if EOF.
    key = infile.readline()
    val = infile.readline()
    if val == "":
        break

    # Remove newlines.
    if key[-1:] == "\n":
        key = key[:-1]
    if val[-1:] == "\n":
        val = val[:-1]

    # Special handling for first key.
    if lastkey == None:
        lastkey = key

    # New key means new group, print an decrement group counter.
    # Exit loop here if groups exhausted.
    if lastkey != key:
        lastkey = key
        n = n - 1
        if n == 0:
            break

    # Same group or not all groups finished, just print it.
    print "%s"%(key)
    print "%s"%(val)

infile.close()

For the input file:

fruit
Apple
fruit
Banana
fruit
Pear
book
Harry Potter
book
Bible
vehicle
Car
book
Earthsea

it produces:

fruit
Apple
fruit
Banana
fruit
Pear
book
Harry Potter
book
Bible
paxdiablo
It's a bit verbose for python...
Phil H
But it works, as opposed to the simpler ones posted so far.
+1  A: 

I'd adapt the container to the purpose. Don't try and find which language this is, it's a very personal combo (not that far from C++ though) ;-)

1) create a container that associates a string (first item of a pair) to a list of strings (successive second items of the pairs with same first item)

vector<pair<string, list<string> > > container;

int currentIndex = 0;
string currentKey;

forever
{
  string newKey;
  cin >> newKey;
  if( newKey == "" )
  {
    break;
  }

  if( newKey != currentKey )
  {
    ++currentIndex;
    container[currentIndex].first = newKey;
    currentKey = newKey;
  }
  string value;
  cin >> value;
  container[currentIndex].second.push_back( value );
}

2) iterate through the vector to count the number of printed strings (the internal lists won't be counted as more than one)

foreach( index in container ) until (index > n )
{
   foreach( seconditem in container[index].second )
   {
     print( container[index].first, seconditem );
   }
}

Is this elegant enough for you ? :)

Benoît
I have to -1 this as it's overcomplicated and memory inefficient. Why store the entire input dataset in memory when we may only use the first few items? Why store *anything* in memory -- at no point do we need to access an entry older than the previous one.
j_random_hacker
There is no mention of efficiency in the original question. So i did not try and make this program fast (i did not even try to make it compilable !). I consider my answer is elegant, but that's only my point of view I guess !
Benoît
A: 
import os
fp=open(myfile)
n=2 # or whatever
previous = None
count = n
for line in fp:
    print line
    print fp.readline()
    if line!=previous:
        count -= 1
    if count<0:
        break
Phil H
What's the import for?
recursive
Ther should be a previous=line somewhere? And are you sure you want to break if count<0? Or if count==0?
+2  A: 
Svante