views:

268

answers:

4

I have a function that looks as follows:

let isInSet setElems normalize p = 
        normalize p |> (Set.ofList setElems).Contains

This function can be used to quickly check whether an element is semantically part of some set; for example, to check if a file path belongs to an html file:

let getLowerExtension p = (Path.GetExtension p).ToLowerInvariant()
let isHtmlPath = isInSet [".htm"; ".html"; ".xhtml"] getLowerExtension

However, when I use a function such as the above, performance is poor since evaluation of the function body as written in "isInSet" seems to be delayed until all parameters are known - in particular, invariant bits such as (Set.ofList setElems).Contains are reevaluated each execution of isHtmlPath.

How can best I maintain F#'s succint, readable nature while still getting the more efficient behavior in which the set construction is preevaluated.

The above is just an example; I'm looking for a general approach that avoids bogging me down in implementation details - where possible I'd like to avoid being distracted by details such as the implementation's execution order since that's usually not important to me and kind of undermines a major selling point of functional programming.

+2  A: 
  1. Currying does not hurt. Currying sometimes introduces closures as well. They are usually efficient too. refer to this question I asked before. You can use inline to boost performance if necessary.

  2. However, your performance problem in the example is mainly due to your code:

    normalize p |> (Set.ofList setElems).Contains

here you need to perform Set.ofList setElems even you curry it. It costs O(n log n) time. You need to change the type of setElems to F# Set, not List now. Btw, for small set, using lists is faster than sets even for querying.

Yin Zhu
I realize I can change my function semantics to work around performance issues - but that's not always easy nor desirable. The current code is easy to read; if I need to pass in sets the code's already getting longer - each new `isHtmlPath`-like function is getting more boilerplate. So, I'd like a better design pattern: where precomputable bits are precomputed without me (the lazy programmer) needing to give it much case-by-case thought.
Eamon Nerbonne
@Eamon I see your points. I don't think F# does this kind of optimization now. So we programmers are responsible to design efficient interfaces.
Yin Zhu
+4  A: 

As long as F# doesn't differentiate between pure and impure code, I doubt we'll see optimisations of that kind. You can, however, make the currying explicit.

let isInSet setElems =
    let set = Set.ofList setElems
    fun normalize p -> normalize p |> set.Contains

isHtmlSet will now call isInSet only once to obtain the closure, at the same time executing ofList.

Kha
I think this is what I'll do - and it can be written even shorter, too: `let isInSet setElems normalize = normalize >> (Set.ofList setElems).Contains`. Stylistically, however, would you prefer the more general, more flexible "return a lamba expression" or the less flexible, less general but shorter function composition version, in normal code?
Eamon Nerbonne
+3  A: 

@Kha's answer is spot on. F# cannot rewrite

// effects of g only after both x and y are passed
let f x y =
    let xStuff = g x
    h xStuff y

into

// effects of g once after x passed, returning new closure waiting on y
let f x =
    let xStuff = g x
    fun y -> h xStuff y

unless it knows that g has no effects, and in the .NET Framework today, it's usually impossible to reason about the effects of 99% of all expressions. Which means the programmer is still responsible for explicitly coding evaluation order as above.

Brian
+3  A: 

The answer from Kha shows how to optimize the code manually by using closures directly. If this is a frequent pattern that you need to use often, it is also possible to define a higher-order function that constructs the efficient code from two functions - the first one that does pre-processing of some arguments and a second one which does the actual processing once it gets the remaining arguments.

The code would look like this:

let preProcess finit frun preInput =  
  let preRes = finit preInput
  fun input -> frun preRes input

let f : string list -> ((string -> string) * string) -> bool =
  preProcess 
    Set.ofList                           // Pre-processing of the first argument
    (fun elemsSet (normalize, p) ->      // Implements the actual work to be 
      normalize p |> elemsSet.Contains) // .. done once we get the last argument

It is a question whether this is more elegant though...

Another (crazy) idea is that you could use computation expressions for this. The definition of computation builder that allows you to do this is very non-standard (it is not something that people usually do with them and it isn't in any way related to monads or any other theory). However, it should be possible to write this:

type CurryBuilder() = 
  member x.Bind((), f:'a -> 'b) = f
  member x.Return(a) = a
let curry = new CurryBuilder()

In the curry computation, you can use let! to denote that you want to take the next argument of the function (after evaluating the preceeding code):

let f : string list -> (string -> string) -> string -> bool = curry {
  let! elems = ()
  let elemsSet = Set.ofList elems
  printf "elements converted"
  let! normalize = ()
  let! p = ()
  printf "calling"
  return normalize p |> elemsSet.Contains }

let ff = f [ "a"; "b"; "c" ] (fun s -> s.ToLower()) 
// Prints 'elements converted' here
ff "C"
ff "D"
// Prints 'calling' two times

Here are some resources with more information about computation expressions:

Tomas Petricek
Do you have a link you'd recommend reading about computation expressions? I get the idea but don't quite grasp the details...
Eamon Nerbonne
@Eamon: I added two links to the end of my answer.
Tomas Petricek
Thanks for the great example and the somewhat mind-bending usage of computation expressions! This finally convinced me to look more closely at them - even if I'm not sure they're really needed here.
Eamon Nerbonne
@Eamon: I agree that this isn't the right use for computation expressions :-) nevertheless, they are quite interesting and powerful!
Tomas Petricek