views:

321

answers:

3

This is homework :-)

Write a function that counts the number of elements in the list that are larger than or equal to the average (using integer division for simplicity).
Using just a single traversal of the list structure!


I already have a solution to this, BUT it involves ref variable changed from closure foo'.

I'm interested in a way how to functionally pass value when [] is met?


My naïve solution using ref:

let foo ls =
    let avg = ref 0
    let rec foo' xs sumAcc lenAcc  =
        match xs with
        | x'::xs'   ->
            let s = foo' xs' (x' + sumAcc) (1 + lenAcc)
            if x' < !avg then s else s + 1
        | []        ->
            avg := (sumAcc / lenAcc) //? how to change THIS to funtional code ?
            0
    foo' ls 0 0



EDIT(3):

I was interested in performance... on list [1..11000]

`(my solution with REF) 5501: elapsed <00.0108708>`  
`(nlucaroni)            5501: elapsed <00.0041484>`  
`(kvb)                  5501: elapsed <00.0029200>`  <-- continuation is fastest
`(two pass solution)    5501: elapsed <00.0038364>`

since 1. and 3. solutions are non-tail-recursive,


// simple two-pass solution
let foo2pass (xs : System.Numerics.BigInteger list) =
    let len = System.Numerics.BigInteger.Parse(xs.Length.ToString())
    let avg = List.sum xs / len
    (List.filter (fun x -> x >= avg) xs).Length


two pass and kvb's version works on big lists, ie: list [1I .. 10 000 000I]:

(two-pass solution)   5000001: elapsed <00:00:12.3200438>     <-- 12 first time
(two-pass solution)   5000001: elapsed <00:00:06.7956307>     <-- 6
(two-pass solution)   5000001: elapsed <00:00:09.1390587>     <-- 9? WHY IS THAT
(two-pass solution)   5000001: elapsed <00:00:06.8345791>     <-- 6
(two-pass solution)   5000001: elapsed <00:00:09.1071856>     <-- 9? WHY IS THAT

5 times for each solution

(kvb tail-recursive) 5000001I: elapsed <00:00:21.1825866>   <-- 21 first time
(kvb tail-recursive) 5000001I: elapsed <00:00:14.8113939>   <-- stable
(kvb tail-recursive) 5000001I: elapsed <00:00:14.8335997>
(kvb tail-recursive) 5000001I: elapsed <00:00:14.8418234>
(kvb tail-recursive) 5000001I: elapsed <00:00:14.8331327>

and for list [1I .. 1 000 000I], kvb's solution is faster

(two-pass solution) 500001I: elapsed <00:00:01.8975782>
(kvb tail-recursive) 500001: elapsed <00:00:00.6004453>
+7  A: 

You just need to pass the average up the stack with the return value:

let foo ls =
    let rec foo xs sumAcc lenAcc  = match xs with
        | x::xs -> let avg,s = foo xs (x + sumAcc) (1 + lenAcc) in
                   if x < avg then (avg,s) else (avg,s+1)
        | []    -> (sumAcc / lenAcc),0
    in
    let avg,res = foo ls 0 0 in
    res
nlucaroni
+1: took me a few moments to figure out what you're doing there, but its actually pretty clever :)
Juliet
Assuming the OP is using light syntax, you can drop the `in` keywords to make things a bit cleaner.
bcat
#light? *shiver*, no thanks. Call me formal but that `in` means something in the readability of the code.
nlucaroni
+7  A: 

Here's another option:

let foo =
  let rec helper sum ct getCt = function
  | x::xs -> 
      helper (sum+x) (ct+1) (fun avg -> getCt(avg) + (if avg <= x then 1 else 0)) xs
  | [] -> getCt(sum/ct)
  helper 0 0 (fun avg -> 0)

To help clarify what's going on here, I'll describe the parameters for the helper function:

  • sum: the sum of all items seen so far
  • ct: the count of all items seen so far
  • getCt: a function taking a single parameter and which returns the tally of the number of items seen so far which are at least as large as that parameter
  • the final list parameter which is pattern matched
    • if it's empty, then calculate the average of all items by dividing the total by the count, and then pass this to the getCt function to determine how many items were greater than it.
    • otherwise, recurse into the tail of the list, passing in an updated total and count. The new getCt function should call the previous getCt function to see how many items prior to this one are greater than the average, and then increment that total if this item was also greater.

It's also possible to create a modified version that uses only tail calls, so it won't cause a stack overflow even on lists of arbitrary size. To do this, our getCt function now needs an accumulator parameter representing the count so far:

let foo =
  let rec helper sum ct getCt = function
  | x::xs -> 
      helper (sum+x) (ct+1) (fun avg n -> getCt avg (if avg <= x then n+1 else n)) xs
  | [] -> getCt (sum/ct) 0
  helper 0 0 (fun avg n -> n)
kvb
+1: continuation passing occasionally makes people claw their eyes out, but this example is particularly easy to read.
Juliet
+1: I've only misty idea how that works... But like it. Can You elaborate, please? This is interesting!
DinGODzilla
+100 Thank You, I didn't know that before. I can imagine lot of another cases where this is usefull.
DinGODzilla
@kvb: This version actually causes stack overflow. I tested it...
DinGODzilla
@DinGODzilla: Really? I can run `foo [1 .. 10000000]` and get a result (which exhibits arithmetic overflow, but that's a different issue), and when I try `foo [1 .. 100000000]` I get an out of memory exception, not a stack overflow...
kvb
@kvb: Oddly, in Visual Studio it overflows even on `[1 .. 12000]`, but in FSI it just works.. :-/ No idea why, but will test it in FSI. Thanks kvb!
DinGODzilla
@DinGODzilla: Check your Visual Studio project options and make sure tail calls are enabled. I believe they're not in the default debug configuration.
bcat
@DinGODzilla: most likely the out-of-memory exception results because you're using [1 .. 100000000], not because of the kvb's code. Remember, [ n .. m ] constructs all the elements in a list on demand, so a list of 100,000,000 integers consumes 4 bytes per integer + 4 bytes per Cons pointer + 4 bytes per tail pointer. So, you're looking at around 12 bytes per element, or around 1.1 GB of memory *just to construct the list*.
Juliet
A: 

Haskell's lazy evaluation really shines in "tying the knot":

avgList t = let (final, sum, count) = go t 0 0 0
                avg = sum `div` count
                go [] finalA sumA countA = (finalA, sumA, countA)
                go (x:xs) finalA sumA countA = go xs (x' + finalA) (sumA + x) (countA + 1)
                    where x' = if x >= avg then 1 else 0
            in final
Wei Hu