tags:

views:

287

answers:

8

I have this, and it works:

# E. Given two lists sorted in increasing order, create and return a merged
# list of all the elements in sorted order. You may modify the passed in lists.
# Ideally, the solution should work in "linear" time, making a single
# pass of both lists.
def linear_merge(list1, list2):
  finalList = []
  for item in list1:
    finalList.append(item)
  for item in list2:
    finalList.append(item)
  finalList.sort()
  return finalList
  # +++your code here+++
  return

But, I'd really like to learn this stuff well. :) What does 'linear' time mean?

+1  A: 

Linear time means that the runtime of the program is proportional to the length of the input. In this case the input consists of two lists. If the lists are twice as long, then the program will run approximately twice as long. Technically, we say that the algorithm should be O(n), where n is the size of the input (in this case the length of the two input lists combined).

This appears to be homework, so I will no supply you with an answer. Even though this is not homework, I am of the opinion that you will be best served by taking a pen and a piece of paper, construct two smallish example lists which are sorted, and figure out how you would merge those two lists, by hand. Once you figured that out, implementing the algorithm is a piece of cake.

(If all goes well, you will notice that you need to iterate over each list only once, in a single direction. That means that the algorithm is indeed linear. Good luck!)

Stephan202
This isn't homework.
Serg
@Sergio Tapia But it's a nice answer anyway.
Li0liQ
Useful for theory, but it doesn't explain whatsoever in code how to implement something keeping that in mind.
Serg
@Sergio: I'm sorry, that wasn't clear yet when I answered. Regardless, I think that this is the kind of exercise that is best figured out on one's own. Doing that is much more rewarding and beneficial in the long run.
Stephan202
A: 

'Linear time' means that time is an O(n) function, where n - the number of items input (items in the lists).
f(n) = O(n) means that that there exist constants x and y such that x * n <= f(n) <= y * n.

def linear_merge(list1, list2):
  finalList = []
  i = 0
  j = 0
  while i < len(list1):
    if j < len(list2):
      if list1[i] < list2[j]:
        finalList.append(list1[i])
        i += 1
      else:
        finalList.append(list2[j])
        j += 1
    else:
      finalList.append(list1[i])
      i += 1
  while j < len(list2):
    finalList.append(list2[j])
    j += 1
  return finalList
Li0liQ
This code does not work.
Mike Graham
@Mike Graham Fixed.
Li0liQ
A: 

It means: "making any constant number of passes of any of the two lists". If your algorithm's complexity is linear (look up big O notation on Wikipedia and O(n)), it means that the performance degradation caused by increasing the list size is directly proportional to the increase in the number of elements.

Alex Ciminian
...that is not true, he can do k passes for any constant k. Your interpretation of complexity is oversimplified and not correct.
Gabriel Ščerbák
You're right, I edited my answer. I answered in a hurry and didn't stop to think that the quote I took from his question isn't actually correct. Thanks for pointing it out. I tried to put it as simply as possible for someone who doesn't know anything about this.
Alex Ciminian
A: 

Linear time means O(n) complexity. You can read something about algorithmn comlexity and big-O notation here: http://en.wikipedia.org/wiki/Big_O_notation . You should try to combine those lists not after getting them in the finalList, try to merge them gradually - adding an element, assuring the result is sorted, then add next element... this should give you some ideas.

Gabriel Ščerbák
A: 

Linear means O(n) in Big O notation, while your code uses a sort() which is most likely O(nlogn).

The question is asking for the standard merge algorithm. A simple Python implementation would be:

def merge(l, m):
    result = []
    i = j = 0
    total = len(l) + len(m)
    while len(result) != total:
        if len(l) == i:
            result += m[j:]
            break
        elif len(m) == j:
            result += l[i:]
            break
        elif l[i] < m[j]:
            result.append(l[i])
            i += 1
        else:
            result.append(m[j])
            j += 1
    return result

>>> merge([1,2,6,7], [1,3,5,9])
[1, 1, 2, 3, 5, 6, 7, 9]
Max Shawabkeh
Note that since you use `.pop(0)`, your algorithm is actually quadratic (if `l` and `m` are lists). A linear replacement would keep track of indeces or use `iter(l)` and `iter(m)` and use `next` instead of `pop(0)`.
Mike Graham
@Mike: You are right. Fix edited in. Having a convenient API for common data structures can really desensitize one to obvious lower-level issues.
Max Shawabkeh
+3  A: 

Linear time means that the time taken is bounded by some undefined constant times (in this context) the number of items in the two lists you want to merge. Your approach doesn't achieve this - it takes O(n log n) time.

When specifying how long an algorithm takes in terms of the problem size, we ignore details like how fast the machine is, which basically means we ignore all the constant terms. We use "asymptotic notation" for that. These basically describe the shape of the curve you would plot in a graph of problem size in x against time taken in y. The logic is that a bad curve (one that gets steeper quickly) will always lead to a slower execution time if the problem is big enough. It may be faster on a very small problem (depending on the constants, which probably depends on the machine) but for small problems the execution time isn't generally a big issue anyway.

The "big O" specifies an upper bound on execution time. There are related notations for average execution time and lower bounds, but "big O" is the one that gets all the attention.

  • O(1) is constant time - the problem size doesn't matter.
  • O(log n) is a quite shallow curve - the time increases a bit as the problem gets bigger.
  • O(n) is linear time - each unit increase means it takes a roughly constant amount of extra time. The graph is (roughly) a straight line.
  • O(n log n) curves upwards more steeply as the problem gets more complex, but not by very much. This is the best that a general-purpose sorting algorithm can do.
  • O(n squared) curves upwards a lot more steeply as the problem gets more complex. This is typical for slower sorting algorithms like bubble sort.

The nastiest algorithms are classified as "np-hard" or "np-complete" where the "np" means "non-polynomial" - the curve gets steeper quicker than any polynomial. Exponential time is bad, but some are even worse. These kinds of things are still done, but only for very small problems.

EDIT the last paragraph is wrong, as indicated by the comment. I do have some holes in my algorithm theory, and clearly it's time I checked the things I thought I had figured out. In the mean time, I'm not quite sure how to correct that paragraph, so just be warned.

For your merging problem, consider that your two input lists are already sorted. The smallest item from your output must be the smallest item from one of your inputs. Get the first item from both and compare the two, and put the smallest in your output. Put the largest back where it came from. You have done a constant amount of work and you have handled one item. Repeat until both lists are exhausted.

Some details... First, putting the item back in the list just to pull it back out again is obviously silly, but it makes the explanation easier. Next - one input list will be exhausted before the other, so you need to cope with that (basically just empty out the rest of the other list and add it to the output). Finally - you don't actually have to remove items from the input lists - again, that's just the explanation. You can just step through them.

Steve314
NP does NOT mean "non-polynomial". It stands for "non-deterministic polynomial" and means that a non-deterministic turing-machine can solve the problem in polynomial time. Not every problem that is not in P, is in NP (also every problem that is in P is also in NP), i.e. there are problems which can not be solved in polynomial time by a non-deterministic turing-machine. (+1 anyway because the part about NP wasn't really relevant to the problem at hand and the rest of the post is excellent).
sepp2k
+1  A: 

This thread contains various implementations of a linear-time merge algorithm. Note that for practical purposes, you would use heapq.merge.

Mike Graham
A: 

If you build the result in reverse sorted order, you can use pop() and still be O(N)
pop() from the right end of the list does not require shifting the elements, so is O(1)
Reversing the list before we return it is O(N)

>>> def merge(l, r):
...     result = []
...     while l and r:
...         if l[-1] > r[-1]:
...             result.append(l.pop())
...         else:
...             result.append(r.pop())
...     result+=(l+r)[::-1]
...     result.reverse()
...     return result
... 
>>> merge([1,2,6,7], [1,3,5,9])
[1, 1, 2, 3, 5, 6, 7, 9]
gnibbler