views:

278

answers:

1

We're gettin' hairy here. I've tested a bunch of tree-synchronizing code on concrete representations of data, and now I need to abstract it so that it can run with any source and target that support the right methods. [In practice, this will be sources like Documentum, SQL hierarchies, and filesystems; with destinations like Solr and a custom SQL cross-reference store.]

The tricky part is that when I'm recursing down a tree of type T and synchronizing into a tree of type U, at certain files I need to do a "sub-sync" of a second type V to that type U at the current node. (V represents hierarchal structure inside a file...) And the type inference engine in F# is driving me around in circles on this, as soon as I try to add the sub-syncing to V.

I'm representing this in a TreeComparison<'a,'b>, so the above stuff results in a TreeComparison<T,U> and a sub-comparison of TreeComparison<V,U>.

The problem is, as soon as I supply a concrete TreeComparison<V,'b> in one of the class methods, the V type propagates through all of the inferring, when I want that first type parameter to stay generic (when 'a :> ITree). Perhaps there is some typing I can do on the TreeComparison<V,'b> value? Or, more likely, the inference is actually telling me something is inherently broken in the way I'm thinking about this problem.

This was really tricky to compress, but I want to give working code you can paste into a script and experiment with, so there are a ton of types at the beginning... core stuff is right at the end if you want to skip. Most of the actual comparison and recursion across the types via ITree has been chopped because it's unnecessary to see the inference problem that I'm banging my head against.

open System

type TreeState<'a,'b> = //'
  | TreeNew of 'a
  | TreeDeleted of 'b
  | TreeBoth of 'a * 'b

type TreeNodeType = TreeFolder | TreeFile | TreeSection

type ITree =
  abstract NodeType: TreeNodeType
  abstract Path: string
      with get, set

type ITreeProvider<'a when 'a :> ITree> = //'
  abstract Children : 'a -> 'a seq
  abstract StateForPath : string -> 'a

type ITreeWriterProvider<'a when 'a :> ITree> = //'
  inherit ITreeProvider<'a> //'
  abstract Create: ITree -> 'a //'
  // In the real implementation, this supports:
  // abstract AddChild : 'a -> unit
  // abstract ModifyChild : 'a -> unit
  // abstract DeleteChild : 'a -> unit
  // abstract Commit : unit -> unit

/// Comparison varies on two types and takes a provider for the first and a writer provider for the second.
/// Then it synchronizes them. The sync code is added later because some of it is dependent on the concrete types.
type TreeComparison<'a,'b when 'a :> ITree and 'b :> ITree> =
  {
    State: TreeState<'a,'b> //'
    ATree: ITreeProvider<'a> //'
    BTree: ITreeWriterProvider<'b> //'
  }

  static member Create(
                        atree: ITreeProvider<'a>,
                        apath: string,
                        btree: ITreeWriterProvider<'b>,
                        bpath: string) =
      { 
        State = TreeBoth (atree.StateForPath apath, btree.StateForPath bpath)
        ATree = atree
        BTree = btree
      }

  member tree.CreateSubtree<'c when 'c :> ITree>
    (atree: ITreeProvider<'c>, apath: string, bpath: string)
      : TreeComparison<'c,'b> = //'
        TreeComparison.Create(atree, apath, tree.BTree, bpath)

/// Some hyper-simplified state types: imagine each is for a different kind of heirarchal database structure or filesystem
type T( data, path: string ) = class
  let mutable path = path
  let rand = (new Random()).NextDouble
  member x.Data = data
  // In the real implementations, these would fetch the child nodes for this state instance
  member x.Children() = Seq.empty<T>

  interface ITree with
    member tree.NodeType = 
      if rand() > 0.5 then TreeFolder
      else TreeFile
    member tree.Path
      with get() = path
      and set v = path <- v
end

type U(data, path: string) = class
  inherit T(data, path)
  member x.Children() = Seq.empty<U>
end

type V(data, path: string) = class
  inherit T(data, path)
  member x.Children() = Seq.empty<V>
  interface ITree with
    member tree.NodeType = TreeSection
end


// Now some classes to spin up and query for those state types [gross simplification makes these look pretty stupid]
type TProvider() = class
  interface ITreeProvider<T> with
    member this.Children x = x.Children()
    member this.StateForPath path = 
      new T("documentum", path)
end

type UProvider() = class
  interface ITreeProvider<U> with
    member this.Children x = x.Children()
    member this.StateForPath path = 
      new U("solr", path)
  interface ITreeWriterProvider<U> with
    member this.Create t =
      new U("whee", t.Path)
end

type VProvider(startTree: ITree, data: string) = class
  interface ITreeProvider<V> with
    member this.Children x = x.Children()
    member this.StateForPath path = 
      new V(data, path)
end


type TreeComparison<'a,'b when 'a :> ITree and 'b :> ITree> with
  member x.UpdateState (a:'a option) (b:'b option) = 
      { x with State = match a, b with
                        | None, None -> failwith "No state found in either A and B"
                        | Some a, None -> TreeNew a
                        | None, Some b -> TreeDeleted b
                        | Some a, Some b -> TreeBoth(a,b) }

  member x.ACurrent = match x.State with TreeNew a | TreeBoth (a,_) -> Some a | _ -> None
  member x.BCurrent = match x.State with TreeDeleted b | TreeBoth (_,b) -> Some b | _ -> None

  member x.CreateBFromA = 
    match x.ACurrent with
      | Some a -> x.BTree.Create a
      | _ -> failwith "Cannot create B from null A node"

  member x.Compare() =
    // Actual implementation does a bunch of mumbo-jumbo to compare with a custom IComparable wrapper
    //if not (x.ACurrent.Value = x.BCurrent.Value) then
      x.SyncStep()
    // And then some stuff to move the right way in the tree


  member internal tree.UpdateRenditions (source: ITree) (target: ITree) =
    let vp = new VProvider(source, source.Path) :> ITreeProvider<V>
    let docTree = tree.CreateSubtree(vp, source.Path, target.Path)
    docTree.Compare()

  member internal tree.UpdateITree (source: ITree) (target: ITree) =
    if not (source.NodeType = target.NodeType) then failwith "Nodes are incompatible types"
    if not (target.Path = source.Path) then target.Path <- source.Path
    if source.NodeType = TreeFile then tree.UpdateRenditions source target

  member internal tree.SyncStep() =
    match tree.State with
    | TreeNew a     -> 
        let target = tree.CreateBFromA
        tree.UpdateITree a target
        //tree.BTree.AddChild target
    | TreeBoth(a,b) ->
        let target = b
        tree.UpdateITree a target
        //tree.BTree.ModifyChild target
    | TreeDeleted b -> 
        ()
        //tree.BTree.DeleteChild b

  member t.Sync() =
    t.Compare()
    //t.BTree.Commit()


// Now I want to synchronize between a tree of type T and a tree of type U

let pt = new TProvider()
let ut = new UProvider()

let c = TreeComparison.Create(pt, "/start", ut , "/path")
c.Sync()

The problem likely revolves around CreateSubtree. If you comment out either:

  1. The docTree.Compare() line
  2. The tree.UpdateITree calls

and replace them with (), then the inference stays generic and everything is lovely.

This has been quite a puzzle. I've tried moving the "comparison" functions in the second chunk out of the type and defining them as recursive functions; I've tried a million ways of annotating or forcing the typing. I just don't get it!

The last solution I'm considering is making a completely separate (and duplicated) implementation of the comparison type and functions for the sub-syncing. But that's ugly and terrible.

Thanks if you read this far! Sheesh!

+10  A: 

I have not analyzed the code enough to figure out why, but adding

  member internal tree.SyncStep() : unit =
                             //   ^^^^^^

seems to fix it.

EDIT

See also

http://stackoverflow.com/questions/1134647/why-does-f-infer-this-type

http://stackoverflow.com/questions/1131456/understanding-f-value-restriction-errors

http://stackoverflow.com/questions/2255602/unknown-need-for-type-annotation-or-cast

It takes experience to get a very deep understanding of the F# type inference algorithm's capabilities and limitations. But this example seems to be in a class of issues people run into when they do very advanced things. For members of a class, the F# inference algorithm does something like

  1. Look at all the member explicit signatures to set up an initial type environment for all the members
  2. For any members that have fully explicit signatures, fix their types to the explicit signature
  3. Start reading the method bodies top to bottom, left to right (you'll encounter some 'forward references' that may involved unsolved type variables when doing this, and that can cause trouble, because...)
  4. Solve all the member bodies concurrently (... but we have not done any 'generalization' yet, the part that would 'infer type parameters' rather than 'fix' what in theory could be a function of 'a to be whatever concrete type its first call site used)
  5. Generalize (any remaining unsolved type variables become actual inferred type variables of generic methods)

That may not be exactly right; I don't know it well enough to describe the algorithm, I just have a sense of it. You can always go read the language spec.

What often happens is you get as far as bullet 3 and forcing the inferencer to start trying to concurrently solve/constrain all the method bodies when in fact it's unnecessary because, e.g. maybe some function has an easy concrete fixed type. Like SyncStep is unit->unit, but F# doesn't know it yet in step 3, since the signature was not explicit, it just says ok SyncStep has type "unit -> 'a" for some yet-unsolved type 'a and then now SyncStep is now unnecessarily complicating all the solving by introducing an unnecessary variable.

The way I found this was, the first warning (This construct causes code to be less generic than indicated by the type annotations. The type variable 'a has been constrained to be type 'V') was on the last line of the body of UpdateRenditions at the call to docTree.Compare(). Now I know that Compare() should be unit -> unit. So how could I possibly be getting a warning about generic-ness there? Ah, ok, the compiler doesn't know the return type is unit at that point, so it must thing that something is generic that's not. In fact, I could have added the return type annotation to Compare instead of SyncStep - either one works.

Anyway, I'm being very long-winded. To sum up

  • if you have a well-type program, it should 'work'
  • sometimes the details of the inference algorithm will require some 'extra' annotations... in the worst case you can 'add them all' and then 'subtract away the unnecessary ones'
  • by using the compiler warnings and some mental model of the inference algorithm, you can quickly steer towards the missing annotation with experience
  • very often the 'fix' is just to add one full type signature (including return type) to some key method that is 'declared late' but 'called early' (introducing a forward reference among the set of members)

Hope that helps!

Brian
OK.I'm hyperventilating here. You are seriously AWESOME, Brian. I owe you a beer. Actually by this point I probably owe you dinner and a movie. Sheeeeesh.But I have a question: HOW THE HECK DID YOU FIGURE THAT OUT?Haha. I think I tried forcibly annotating EVERY OTHER FUNCTION in many different ways, but I never thought that a unit -> unit function would benefit from an annotation.
Dan Fitch
Yup, that does it, tests pass on the mock providers and everything. Now I just need to steal your brain.
Dan Fitch
Ok, I added a lot to the answer to suggest strategies you can use to help yourself out next time :)
Brian
Mega major thumbs up. Thank you so much. Thinking about the type resolution from the compiler's perspective is still fairly new to me. (I come from the scary scripty world...) Now the problem and the solution actually make sense. THANK YOU!
Dan Fitch
This answer just helped me /even more/ with a generic-typing issue. And thanks to your heuristic there, I was able to figure out what was going on all by my lil' self! I definitely think you should expand or repost this on your blog. It's kind of an addendum to http://lorgonblog.spaces.live.com/blog/cns!701679AD17B6D310!1526.entry anyway, for how inference is resolved in classes...
Dan Fitch