tags:

views:

117

answers:

3

I find myself running into a problem commonly, when writing larger programs in Haskell. I find myself often wanting multiple distinct types that share an internal representation and several core operations.

There are two relatively obvious approaches to solving this problem.

One is using a type class and the GeneralizedNewtypeDeriving extension. Put enough logic into a type class to support the shared operations that the use case desires. Create a type with the desired representation, and create an instance of the type class for that type. Then, for each use case, create wrappers for it with newtype, and derive the common class.

The other is to declare the type with a phantom type variable, and then use EmptyDataDecls to create distinct types for each different use case.

My main concern is not mixing up values that share internal representation and operations, but have different meanings in my code. Both of those approaches solve that problem, but feel significantly clumsy. My second concern is reducing the amount of boilerplate required, and both approaches do well enough at that.

What are the advantages and disadvantages of each approach? Is there a technique that comes closer to doing what I want, providing type safety without boilerplate code?

+3  A: 

There's another straightforward approach.

data MyGenType = Foo | Bar

op :: MyGenType -> MyGenType
op x = ...

op2 :: MyGenType -> MyGenType -> MyGenType
op2 x y = ...

newtype MySpecialType {unMySpecial :: MyGenType}

inMySpecial f = MySpecialType . f . unMySpecial
inMySpecial2 f x y = ...

somefun = ... inMySpecial op x ...
someOtherFun = ... inMySpecial2 op2 x y ...

Alternately,

newtype MySpecial a = MySpecial a
instance Functor MySpecial where...
instance Applicative MySpecial where...

somefun = ... fmap op x ...
someOtherFun = ... liftA2 op2 x y ...

I think these approaches are nicer if you want to use your general type "naked" with any frequency, and only sometimes want to tag it. If, on the other hand, you generally want to use it tagged, then the phantom type approach more directly expresses what you want.

sclv
Really liking that shift key today, huh?
TomMD
Oops. Formatting fixed.
sclv
btw, you can auto-generate inMySpecial1,2,3../withMySpecial etc with template haskell. see http://github.com/yairchu/peakachu/blob/master/src/Data/Newtype.hs
yairchu
And in the second version, Functor can be automatically derived by the new GHC (but sadly not Applicative).
sclv
+1  A: 

I've benchmarked toy examples and not found a performance difference between the two approaches, but usage does typically differ a bit.

For instance, in some cases you have a generic type whose constructors are exposed and you want to use newtype wrappers to indicate a more semantically specific type. Using newtypes then leads to call sites like,

s1 = Specific1 $ General "Bob" 23
s2 = Specific2 $ General "Joe" 19

Where the fact that the internal representations are the same between the different specific newtypes is transparent.

The type tag approach almost always goes along with representation constructor hiding,

data General2 a = General2 String Int

and the use of smart constructors, leading to a data type definition and call sites like,

mkSpecific1 "Bob" 23

Part of the reason being that you want some syntactically light way of indicating which tag you want. If you didn't provide smart constructors, then client code would often pick up type annotations to narrow things down, e.g.,

myValue = General2 String Int :: General2 Specific1

Once you adopt smart constructors, you can easily add extra validation logic to catch misuses of the tag. A nice aspect of the phantom type approach is that pattern matching isn't changed at all for internal code that has access to the representation.

internalFun :: General2 a -> General2 a -> Int
internalFun (General2 _ age1) (General2 _ age2) = age1 + age2

Of course you can use the newtypes with smart constructors and an internal class for accessing the shared representation, but I think a key decision point in this design space is whether you want to keep your representation constructors exposed. If the sharing of representation should be transparent, and client code should be free to use whatever tag it wishes with no extra validation, then newtype wrappers with GeneralizedNewtypeDeriving work fine. But if you are going to adopt smart constructors for working with opaque representations, then I usually prefer phantom types.

Anthony
If memory serves me, `data Foo a = Foo a`, `data Foo a b = Foo a`, and `newtype Bar a = Bar (Foo a)` (with the first `Foo`) should all compile to the same runtime representation, so finding a non-trivial difference in performance would be somewhat unexpected.
camccann
@camccann The beauty of ghc-core and Criterion is empirical proof to supplement memory! :) I think the performance question has more to do with whether or not the operations coming from a class impacts their performance as opposed to the runtime representation of the value itself. Polymorphic functions go from `(General a, General b) => a -> b -> Int` to `General2 a -> General2 b -> Int`.
Anthony
+1  A: 

Put enough logic into a type class to support the shared operations that the use case desires. Create a type with the desired representation, and create an instance of the type class for that type. Then, for each use case, create wrappers for it with newtype, and derive the common class.

This presents some pitfalls, depending on the nature of the type and what kind of operations are involved.

First, it forces a lot of functions to be unnecessarily polymorphic--even if in practice every instance does the same thing for different wrappers, the open world assumption for type classes means the compiler has to account for the possibility of other instances. While GHC is definitely smarter than the average compiler, the more information you can give it the more it can do to help you.

Second, this can create a bottleneck for more complicated data structures. Any generic function on the wrapped types will be constrained to the interface presented by the type class, so unless that interface is exhaustive in terms of both expressivity and efficiency, you run the risk of either hobbling algorithms that use the type or altering the type class repeatedly as you find missing functionality.

On the other hand, if the wrapped type is already kept abstract (i.e., it doesn't export constructors) the bottleneck issue is irrelevant, so a type class might make good sense. Otherwise, I'd probably go with the phantom type tags (or possibly the identity Functor approach that sclv described).

camccann