views:

177

answers:

2

First of all, it's great. However, I came across a situation where my benchmarks turned up weird results. I am new to Haskell, and this is first time I've gotten my hands dirty with mutable arrays and Monads. The code below is based on this example.

I wrote a generic monadic for function that takes numbers and a step function rather than a range (like forM_ does). I compared using my generic for function (Loop A) against embedding an equivalent recursive function (Loop B). Having Loop A is noticeably faster than having Loop B. Weirder, having both Loop A and B together is faster than having Loop B by itself (but slightly slower than Loop A by itself).

Some possible explanations I can think of for the discrepancies. Note that these are just guesses:

  • Something I haven't learned yet about how Haskell extracts results from monadic functions.
  • Loop B faults the array in a less cache efficient manner than Loop A. Why?
  • I made a dumb mistake; Loop A and Loop B are actually different.
    • Note that in all 3 cases of having either or both Loop A and Loop B, the program produces the same output.

Here is the code. I tested it with ghc -O2 for.hs using GHC version 6.10.4 .

import Control.Monad
import Control.Monad.ST
import Data.Array.IArray
import Data.Array.MArray
import Data.Array.ST
import Data.Array.Unboxed

for :: (Num a, Ord a, Monad m) => a -> a -> (a -> a) -> (a -> m b) -> m ()
for start end step f = loop start where
    loop i
        | i <= end   = do
            f i
            loop (step i)
        | otherwise  = return ()

primesToNA :: Int -> UArray Int Bool
primesToNA n = runSTUArray $ do
    a <- newArray (2,n) True :: ST s (STUArray s Int Bool)
    let sr = floor . (sqrt::Double->Double) . fromIntegral $ n+1

    -- Loop A
    for 4 n (+ 2) $ \j -> writeArray a j False

    -- Loop B
    let f i
        | i <= n     = do
            writeArray a i False
            f (i+2)
        | otherwise  = return ()
        in f 4

    forM_ [3,5..sr] $ \i -> do
        si <- readArray a i
        when si $
            forM_ [i*i,i*i+i+i..n] $ \j -> writeArray a j False
    return a

primesTo :: Int -> [Int]
primesTo n = [i | (i,p) <- assocs . primesToNA $ n, p]

main = print $ primesTo 30000000
+2  A: 

Perhaps compare and contrast with the Shootout nsieve program? in any case, the only way to know what really is happening is to look at the core (e.g. with the ghc-core tool).

{-# OPTIONS -O2 -optc-O -fbang-patterns -fglasgow-exts -optc-march=pentium4 #-}
--
-- The Computer Language Shootout
-- http://shootout.alioth.debian.org/
--
-- Contributed by Don Stewart 2005
-- nsieve over an ST monad Bool array
--

import Control.Monad.ST
import Data.Array.ST
import Data.Array.Base
import System
import Control.Monad
import Data.Bits
import Text.Printf

main = do
    n <- getArgs >>= readIO . head :: IO Int
    mapM_ (\i -> sieve (10000 `shiftL` (n-i))) [0, 1, 2]

sieve n = do
   let r = runST (do a <- newArray (2,n) True :: ST s (STUArray s Int Bool)
                     go a n 2 0)
   printf "Primes up to %8d %8d\n" (n::Int) (r::Int) :: IO ()

go !a !m !n !c
    | n == m    = return c
    | otherwise = do
          e <- unsafeRead a n
          if e then let loop j
                          | j < m     = do
                              x <- unsafeRead a j
                              when x $ unsafeWrite a j False
                              loop (j+n)

                          | otherwise = go a m (n+1) (c+1)
                    in loop (n `shiftL` 1)
               else go a m (n+1) c
Don Stewart
+1  A: 

I just tried benchmarking this with Criterion and GHC 6.12.1, and Loop A looks only slightly faster for me. I definitely don't get the weird "both together are faster than B alone" effect.

Also, if your step function really is just a step and doesn't do anything wacky with its argument, the following version of for seems a bit faster, especially for smaller arrays:

for' :: (Enum a, Num a, Ord a, Monad m) => a -> a -> (a -> a) -> (a -> m b) -> m ()
for' start end step = forM_ $ enumFromThenTo start (step start) end

Here are the results from Criterion, where loopA' is your loop A using my for', and where loopC is both A and B together:

benchmarking loopA...
mean: 2.372893 s, lb 2.370982 s, ub 2.374914 s, ci 0.950
std dev: 10.06753 ms, lb 8.820194 ms, ub 11.66965 ms, ci 0.950

benchmarking loopA'...
mean: 2.368167 s, lb 2.354312 s, ub 2.381413 s, ci 0.950
std dev: 69.50334 ms, lb 65.94236 ms, ub 73.17173 ms, ci 0.950

benchmarking loopB...
mean: 2.423160 s, lb 2.419131 s, ub 2.427260 s, ci 0.950
std dev: 20.78412 ms, lb 18.06613 ms, ub 24.99021 ms, ci 0.950

benchmarking loopC...
mean: 4.308503 s, lb 4.304875 s, ub 4.312110 s, ci 0.950
std dev: 18.48732 ms, lb 16.19325 ms, ub 21.32299 ms, ci 0.950<

And here's the code:

module Main where

import Control.Monad
import Control.Monad.ST
import Data.Array.ST
import Data.Array.Unboxed

import Criterion.Main

for :: (Num a, Ord a, Monad m) => a -> a -> (a -> a) -> (a -> m b) -> m ()
for start end step f = loop start where
    loop i
        | i <= end   = do
            f i
            loop (step i)
        | otherwise  = return ()

for' :: (Enum a, Num a, Ord a, Monad m) => a -> a -> (a -> a) -> (a -> m b) -> m ()
for' start end step = forM_ $ enumFromThenTo start (step start) end

loopA  arr n = for  4 n (+ 2) $ flip (writeArray arr) False
loopA' arr n = for' 4 n (+ 2) $ flip (writeArray arr) False

loopB arr n =
  let f i | i <= n     = do writeArray arr i False
                            f (i+2)
          | otherwise  = return ()
  in f 4

loopC arr n = do
  loopA arr n
  loopB arr n

runPrimes loop n = do
    let sr = floor . (sqrt::Double->Double) . fromIntegral $ n+1
    a <- newArray (2,n) True :: (ST s (STUArray s Int Bool))

    loop a n

    forM_ [3,5..sr] $ \i -> do
        si <- readArray a i
        when si $
            forM_ [i*i,i*i+i+i..n] $ \j -> writeArray a j False
    return a

primesA  n = [i | (i,p) <- assocs $ runSTUArray $ runPrimes loopA  n, p]
primesA' n = [i | (i,p) <- assocs $ runSTUArray $ runPrimes loopA' n, p]
primesB  n = [i | (i,p) <- assocs $ runSTUArray $ runPrimes loopB  n, p]
primesC  n = [i | (i,p) <- assocs $ runSTUArray $ runPrimes loopC  n, p]

main = let n = 10000000 in
  defaultMain [ bench "loopA"  $ nf primesA  n
              , bench "loopA'" $ nf primesA' n
              , bench "loopB"  $ nf primesB  n
              , bench "loopC"  $ nf primesC  n ]
Travis Brown
I chose to implement the loop manually rather than use forM_ because it was a little faster when I tried it. I guess `for` versus `for'` depends on a lot of things.
Joey Adams
@Travis Brown: do you mean GHC 6.12.1?
yairchu
@yairchu: I'm sorry—I mistyped and have corrected. If I ever suddenly discover that I'm able to use compilers from the future I'll make a point of being clearer about it.
Travis Brown