views:

244

answers:

7

Hi,

I write the following Haskell code which take a triplet (x,y,z) and a list of triplets [(Int,Int,Int)] and look if there is a triplet (a,b,c) in the list such that x == a and y == b if it is a case i just need to update c = c + z, if there is not a such of triplet in the list I just add the triplet in the list

-- insertEdge :: (Int,Int,Int) -> [(Int, Int, Int)] -> [(Int, Int, Int)]

insertEdge (x,y,z) cs = 

if (length [(a,b,c) | (a,b,c) <- cs, a /= x || b /= y]) == (length cs) 

 then ((x,y,z):cs)) 

   else [if (a == x && b == y) then (a,b,c+1) else (a,b,c) | (a,b,c) <- cs]

after profilling my code it appears that this fonction take 65% of the execution time

how can i re-write my code to be more efficient ?

thanks for replying

+2  A: 

The main reason your function is slow is that it traverses the list at least twice, maybe three times. The function can be rewritten to to traverse the list only once using a fold. This will transform the list into a tuple (Bool,[(Int,Int,Int)]) where the Bool indicates if there was a matching element in the list and the list is the transformed list

insertEdge (x,y,z) cs = case foldr f (False,[]) cs of
                          (False,cs') -> (x,y,z):cs'
                          (True,cs')  -> cs' 
  where f (a,b,c) (e,cs') = if (a,b) == (x,y) then (True,(a,b,c+z):cs') else (e,(a,b,c):cs')

If you haven't seen foldr before, it has type

foldr :: (a -> b -> b) -> b -> [a] -> b

foldr embodies a pattern of recursive list processing of defining a base case and combining the current list element with the result from the rest of the list. Writing foldr f b xs is the same as writing a function g with definition

g [] = b
g (x:xs) = f x (g xs)
Geoff Reedy
+4  A: 

The first thing that jumps out at me is the conditional: length examines the entire list, so in the worst-case scenario (updating the last element) your function traverses the list three times: Once for the length of the filtered list, once for the length of cs, and once to find the element to update.

However, even getting rid of the extra traversals, the best you can do with the function as written will usually require a traversal of most of the list. From the name of the function and how much time was being spent in it, I'm guessing you're calling this repeatedly to build up a data structure? If so, you should strongly consider using a more efficient representation.

For instance, a quick and easy improvement would be to use Data.Map, the first two elements of the triplet in a 2-tuple as the key, and the third element as the value. That way you can avoid making so many linear-time lookups/redundant traversals.

As a rule of thumb, lists in Haskell are only an appropriate data structure when all you do is either walk sequentially down the list a few times (ideally, just once) or add/remove from the head of the list (i.e., using it like a stack). If you're searching, filtering, updating elements in the middle, or--worst of all--indexing by position, using lists will only end in tears.


Here's a quick example, if that helps:

import qualified Data.Map as M

incEdge :: M.Map (Int, Int) Int -> ((Int, Int), Int) -> M.Map (Int, Int) Int
incEdge cs (k,v) = M.alter f k cs
    where f (Just n) = Just $ n + v
          f Nothing  = Just v

The alter function is just insert/update/delete all rolled into one. This inserts the key into the map if it's not there, and sums the values if the key does exist. To build up a structure incrementally, you can do something like foldl incEdge M.empty edgeList. Testing this out, for a few thousand random edges your version with a list takes several seconds, whereas the Data.Map version is pretty much immediate.

camccann
Addendum: Yep, I forgot about `insertWith`. ADEpt has the right idea there.
camccann
Exactly. Whenever I see the word "optimize" and the data structure in use is a regular list, I cringe a little.
jrockway
+6  A: 

Other answers are correct, so I want to offer some unasked-for advice instead: how about using Data.Map (Int,Int) Int instead of list?

Then your function becomes insertWith (+) (a,b) c mymap

ADEpt
+2  A: 

Sticking with your data structure, you might

type Edge = (Int,Int,Int)

insertEdge :: Edge -> [Edge] -> [Edge]
insertEdge t@(x,y,z) es =
  case break (abx t) es of
    (_, []) -> t : es
    (l, ((_,_,zold):r)) -> l ++ (x,y,z+zold) : r
  where abx (a1,b1,_) (a2,b2,_) = a1 == a2 && b1 == b2

No matter what language you're using, searching lists is always a red flag. When searching you want sublinear complexity (think: hashes, binary search trees, and so on). In Haskell, an implementation using Data.Map is

import Data.Map

type Edge = (Int,Int,Int)

type EdgeMap = Map (Int,Int) Int
insertEdge :: Edge -> EdgeMap -> EdgeMap
insertEdge (x,y,z) es = alter accumz (x,y) es
  where accumz Nothing = Just z
        accumz (Just zold) = Just (z + zold)

You may not be familiar with alter:

alter :: Ord k => (Maybe a -> Maybe a) -> k -> Map k a -> Map k a

O(log n). The expression (alter f k map) alters the value x at k, or absence thereof. alter can be used to insert, delete, or update a value in a Map. In short: lookup k (alter f k m) = f (lookup k m).

let f _ = Nothing
alter f 7 (fromList [(5,"a"), (3,"b")]) == fromList [(3, "b"), (5, "a")]
alter f 5 (fromList [(5,"a"), (3,"b")]) == singleton 3 "b"

let f _ = Just "c"
alter f 7 (fromList [(5,"a"), (3,"b")]) == fromList [(3, "b"), (5, "a"), (7, "c")]
alter f 5 (fromList [(5,"a"), (3,"b")]) == fromList [(3, "b"), (5, "c")]

But as ADEpt shows in another answer, this is a bit of overengineering.

Greg Bacon
N.B. -- `Data.Map` isn't a hash table, it's a binary search tree. `Data.IntMap` isn't a hash table either but has similar performance characteristics as far as pure data structures go, since it uses a radix tree (meaning time complexity depends mostly on key length, not tree size).
camccann
@camccann Good point, which I did consider while writing my answer. I opted for the oversimplification because the context was meant to be a rule of thumb. I'll clarify my answer.
Greg Bacon
+3  A: 

It's always a good idea to benchmark (and Criterion makes it so easy). Here are the results for the original solution (insertEdgeO), Geoff's foldr (insertEdgeF), and Data.Map (insertEdgeM):

benchmarking insertEdgeO...
mean: 380.5062 ms, lb 379.5357 ms, ub 381.1074 ms, ci 0.950

benchmarking insertEdgeF...
mean: 74.54564 ms, lb 74.40043 ms, ub 74.71190 ms, ci 0.950

benchmarking insertEdgeM...
mean: 18.12264 ms, lb 18.03029 ms, ub 18.21342 ms, ci 0.950

Here's the code (I compiled with -O2):

module Main where
import Criterion.Main
import Data.List (foldl')
import qualified Data.Map as M

insertEdgeO :: (Int, Int, Int) -> [(Int, Int, Int)] -> [(Int, Int, Int)]
insertEdgeO (x, y, z) cs =
  if length [(a, b, c) | (a, b, c) <- cs, a /= x || b /= y] == length cs
    then (x, y, z) : cs
    else [if (a == x && b == y) then (a, b, c + z) else (a, b, c) | (a, b, c) <- cs]

insertEdgeF :: (Int, Int, Int) -> [(Int, Int, Int)] -> [(Int, Int, Int)]
insertEdgeF (x,y,z) cs =
  case foldr f (False, []) cs of
    (False, cs') -> (x, y, z) : cs'
    (True, cs')  -> cs'
  where
    f (a, b, c) (e, cs')
      | (a, b) == (x, y) = (True, (a, b, c + z) : cs')
      | otherwise        = (e, (a, b, c) : cs')

insertEdgeM :: (Int, Int, Int) -> M.Map (Int, Int) Int -> M.Map (Int, Int) Int
insertEdgeM (a, b, c) = M.insertWith (+) (a, b) c

testSet n = [(a, b, c) | a <- [1..n], b <- [1..n], c <- [1..n]]

testO = foldl' (flip insertEdgeO) [] . testSet
testF = foldl' (flip insertEdgeF) [] . testSet
testM = triplify . M.toDescList . foldl' (flip insertEdgeM) M.empty . testSet
  where
    triplify = map (\((a, b), c) -> (a, b, c))

main = let n = 25 in defaultMain
  [ bench "insertEdgeO" $ nf testO n
  , bench "insertEdgeF" $ nf testF n
  , bench "insertEdgeM" $ nf testM n
  ]

You can improve insertEdgeF a bit by using foldl' (55.88634 ms), but Data.Map still wins.

Travis Brown
I always like your answers, Travis, but it's important he realizes what his old code did wrong too. Something like: the old code performed two length calls at O(n) each and two list comprehensions that are O(n) each (assuming no fancy optimizations). insertEdgeM does a single O(log n) insert and insertEdgeF does a single O(n) traversal. (I think I got that right, its still early).
TomMD
Thanks, Tom, and I certainly didn't mean to offer this as some kind of complete story–it's just a little supplement to the answers by camccann and Geoff.
Travis Brown
A: 

Very small optimisation: Use an as-pattern, this avoids multiple reconstructions of the same tuple. Like this:

insertEdge xyz@(x,y,z) cs =
  if (length [abc | abc@(a,b,c) <- cs, a /= x || b /= y]) == (length cs) 
    then (xyz:cs)) 
    else [if (a == x && b == y) then (a,b,c+1) else abc' | abc'@(a,b,c) <- cs]

You should apply the other optimization hionts first, but this may save a very small amount of time, since the tuple doesn't have to be reconstructed again and again. At least in the last at-pattern (The first two patterns are not important, since the tuple never gets evaluated in the first case and the as-pattern is only applied once in the second case).

FUZxxl
A: 

In

insertEdgeM :: (Int, Int, Int) -> M.Map (Int, Int) Int -> M.Map (Int, Int) Int
insertEdgeM (a, b, c) = M.insertWith (+) (a, b) c

you want to use the strict version of insertWith, namely insertWith'.

tibbe