views:

524

answers:

4

Hi,

I'm a newbie to Haskell, and I'm trying to write an elegant function to merge an arbitrary number of sorted lists into a single sorted list... Can anyone provide an elegant and efficient reference implementation?

Thanks!

+6  A: 

Something like this should work:

merge2 pred xs [] = xs
merge2 pred [] ys = ys
merge2 pred (x:xs) (y:ys) =
  case pred x y of
      True  -> x: merge2 pred xs (y:ys)
      False -> y: merge2 pred (x:xs) ys

merge pred [] = []
merge pred (x:[]) = x
merge pred (x:xs) = merge2 pred x (merge pred xs)

Here, the function merge2 merges 2 lists. The function merge merges a list of lists. The pred is predicate you use for sorting.

Example:

merge (<) [[1, 3, 9], [2, 3, 4], [7, 11, 15, 22]]

should return

[1,2,3,3,4,7,9,11,15,22]
Igor Krivokon
Just wondering, why do you use `case .. of {True -> ..; False -> ..}` when `if .. then .. else ..` would do the same?
ephemient
ephemient: both work fine. you could also use guards.
Martijn
Right, but applying pattern-matching on a boolean when if-then-else would be more idiomatic seems strange. It can't be for the sake of strictness, so I'm wondering if Igor just has odd style (relative to what I use and commonly see) or some other motivation.
ephemient
@ephemient: I wouldn't claim pattern matching is more idiomatic here. If-then-else would be equally good or better. In similar situations, I've seen pattern matching or guards more often than if-then-else, this is why I used it. But, again, I'm not that experienced with Haskell and might be looking at odd haskell code :) As a side note, this is matter of style and doesn't affect performance, since (for the compiler) if-then-else is syntax sugar around the pattern-matching case.
Igor Krivokon
I was trying to say that if-then-else is more idiomatic (in my experience) than case-of-True-False. But it really doesn't make a difference here... it just piqued my curiosity.
ephemient
You should "merge pred = foldl (merge2 pred) []" instead... it would be clearer.
Dietrich Epp
Epp: But what about infinite lists?
trinithis
I'd use guards and `@`-patterns in order to avoid re-concatenations.
Dario
+1  A: 

Since I like taking advantage of infix operators and higher-order functions where it makes sense to, I would write

infixr 5 @@
(@@) :: (Ord a) => [a] -> [a] -> [a]
-- if one side is empty, the merges can only possibly go one way
[] @@ ys = ys
xs @@ [] = xs
-- otherwise, take the smaller of the two heads out, and continue with the rest
(x:xs) @@ (y:ys) = case x `compare` y of
    LT -> x : xs @@ (y:ys)
    EQ -> x : xs @@ ys
    GT -> y : (x:xs) @@ ys

-- a n-way merge can be implemented by a repeated 2-way merge
merge :: (Ord a) => [[a]] -> [a]
merge = foldr1 (@@)

Here, xs @@ ys merges two lists by their natural ordering (and drops duplicates), while merge [xs, ys, zs..] merges any number of lists.

This leads to the very natural definition of the Hamming numbers:

hamming :: (Num a, Ord a) => [a]
hamming = 1 : map (2*) hamming @@ map (3*) hamming @@ map (5*) hamming
hamming = 1 : merge [map (n*) hamming | n <- [2, 3, 5]] -- alternative

-- this generates, in order, all numbers of the form 2^i * 3^j * 5^k
-- hamming = [1,2,3,4,5,6,8,9,10,12,15,16,18,20,24,25,27,30,32,36,40,45,48,50,..]


Stealing yairchu's unimplemented idea:

{-# LANGUAGE ViewPatterns #-}

import qualified Data.Map as M
import Data.List (foldl', unfoldr)
import Data.Maybe (mapMaybe)

-- merge any number of ordered lists, dropping duplicate elements
merge :: (Ord a) => [[a]] -> [a]
-- create a map of {n: [tails of lists starting with n]}; then
-- repeatedly take the least n and re-insert the tails
merge = unfoldr ((=<<) step . M.minViewWithKey) . foldl' add M.empty where
    add m (x:xs) = M.insertWith' (++) x [xs] m; add m _ = m
    step ((x, xss), m) = Just (x, foldl' add m xss)

-- merge any number of ordered lists, preserving duplicate elements
mergeDup :: (Ord a) => [[a]] -> [a]
-- create a map of {(n, i): tail of list number i (which starts with n)}; then
-- repeatedly take the least n and re-insert the tail
-- the index i <- [0..] is used to prevent map from losing duplicates
mergeDup = unfoldr step . M.fromList . mapMaybe swap . zip [0..] where
    swap (n, (x:xs)) = Just ((x, n), xs); swap _ = Nothing
    step (M.minViewWithKey -> Just (((x, n), xs), m)) =
        Just (x, case xs of y:ys -> M.insert (y, n) ys m; _ -> m)
    step _ = Nothing

where merge, like my original, eliminates duplicates, while mergeDup preserves them (like Igor's answer).

ephemient
A: 

if efficiency wasn't a concern I'd go with

merge = sort . concat

otherwise:

merge :: Ord a => [[a]] -> [a]
merge [] = []
merge lists =
  minVal : merge nextLists
  where
    heads = map head lists
    (minVal, minIdx) = minimum $ zip heads [0..]
    (pre, ((_:nextOfMin):post)) = splitAt minIdx lists
    nextLists =
      if null nextOfMin
      then pre ++ post
      else pre ++ nextOfMin : post

note however that this implementation always linearly searches for the minimum (while for a large number of list one may wish to maintain a heap etc.)

yairchu
`merge = sort . concat` won't work with infinite lists.
trinithis
A: 

Unlike the other posts, I would have merge :: [a] -> [a] -> [a]

type SortedList a = [a]

merge :: (Ord a) => SortedList a -> SortedList a -> SortedList a
merge []     ys     = ys
merge xs     []     = xs
merge (x:xs) (y:ys)
  | x < y     = x : merge xs (y : ys)
  | otherwise = y : merge (x : xs) ys

mergeAll :: (Ord a) => [SortedList a] -> SortedList a
mergeAll = foldr merge []
trinithis