tags:

views:

469

answers:

8

I'm a C# developer who is working through "Real World Haskell" in order to truly understand functional programming, so that when I learn F#, I'll really grok it and not just "write C# code in F#", so to speak.

Well, today I came across an example which I thought I understood 3 different times, only to then see something I missed, update my interpretation, and recurse (and curse too, believe me).

Now I believe that I do actually understand it, and I have written a detailed "English interpretation" below. Can you Haskell gurus please confirm that understanding, or point out what I have missed?

Note: The Haskell code snippet (quoted directly from the book) is defining a custom type that is meant to be isomorphic to the built in Haskell list type.

The Haskell code snippet

data List a = Cons a (List a)
              | Nil
              defining Show

EDIT: After some responses, I see one misunderstanding I made, but am not quite clear on the Haskell "parsing" rules that would correct that mistake. So I've included my original (incorrect) interpretation below, followed by a correction, followed by the question that still remains unclear to me.

EDIT: Here is my original (incorrect) "English interpretation" of the snippet

  1. I am defining a type called "List".
  2. The List type is parameterised. It has a single type parameter.
  3. There are 2 value constructors which can be used to make instances of List. One value constructor is called "Nil" and the other value constructor is called "Cons".
  4. If you use the "Nil" value constructor, then there are no fields.
  5. The "Cons" value constructor has a single type parameter.
  6. If you use the "Cons" value constructor, there are 2 fields which must be provided. The first required field is an instance of List. The second required field is an instance of a.
  7. (I have intentionally omitted anything about "defining Show" because it is not part of what I want to focus on right now).

The corrected interpretation would be as follows (changes in BOLD)

  1. I am defining a type called "List".
  2. The List type is parameterised. It has a single type parameter.
  3. There are 2 value constructors which can be used to make instances of List. One value constructor is called "Nil" and the other value constructor is called "Cons".
  4. If you use the "Nil" value constructor, then there are no fields.

    5. (this line has been deleted ... it is not accurate) The "Cons" value constructor has a single type parameter.

  5. If you use the "Cons" value constructor, there are 2 fields which must be provided. The first required field is an instance of a. The second required field is an instance of "List-of-a".

  6. (I have intentionally omitted anything about "defining Show" because it is not part of what I want to focus on right now).

The question which is still unclear

The initial confusion was regarding the portion of the snippet that reads "Cons a (List a)". In fact, that is the part that is still unclear to me.

People have pointed out that each item on the line after the "Cons" token is a type, not a value. So that means this line says "The Cons value constructor has 2 fields: one of type 'a' and the other of type 'list-of-a'."

That is very helpful to know. However, something is still unclear. When I create instances using the Cons value constructor, those instances "interpret" the first 'a' as meaning "place the value passed in here." But they do not interpret the second 'a' the same way.

For example, consider this GHCI session:

*Main> Cons 0 Nil
Cons 0 Nil
*Main> Cons 1 it
Cons 1 (Cons 0 Nil)
*Main>

When I type "Cons 0 Nil", it uses the "Cons" value constructor to create an instance of List. From 0, it learns that the type parameter is "Integer". So far, no confusion.

However, it also determines that the value of the first field of the Cons is 0. Yet it determines nothing about the value of the second field ... it only determines that the second field has a type of "List Integer".

So my question is, why does "a" in the first field mean "the type of this field is 'a' and the value of this field is 'a'", while "a" in the second field means only "the type of this field is 'List of a'"?

EDIT: I believe I have now seen the light, thanks to several of the responses. Let me articulate it here. (And if somehow it is still incorrect in some fashion, please by all means let me know!)

In the snippet "Cons a (List a)", we are saying that the "Cons" value constructor has two fields, and that the first field is of type 'a', and that the second field is of type 'List of a'.

That is all we are saying! In particular, we are saying NOTHING about values! This is a key point I was missing.

Later, we want to create an instance, using the "Cons" value constructor. We type this into the interpreter: "Cons 0 Nil". This explicitly tells the Cons value constructor to use 0 for the value of the first field, and to use Nil as the value for the second field.

And that's all there is to it. Once you know that the value constructor definition specifies nothing but types, everything becomes clear.

Thanks everyone for the helpful responses. And as I said, if anything is still off, please by all means tell me about it. Thanks.

+8  A: 
  • The "Cons" value constructor has a single type parameter.

Nope: you've already parametrized it when you declared data List a. One effective property of this is that if I have a Nil :: List Int, I can't interchange it with a Nil :: List Char.

  • If you use the "Cons" value constructor, there are 2 fields which must be provided. The first required field is an instance of List. The second required field is an instance of a.

You've got it swapped: the first required field is an instance of a, the second field is an instance of List.

This chapter of Real World Haskell may be of interest.

Thanks. That is the chapter I'm on right now. So ... when the code says "Cons a (List a)", I thought the "Cons a" part of that was stating that the Cons value constructor was parameterised. They haven't yet covered the syntax for parameterised types, so I guessed that the syntax must require re-stating "a" if you intend to use a. But you're saying that's not necessary? And therefore that's not what that "a" means?

Nope. Once we declare a parameter in our type, we get to reuse it otherwise to say "that type should be used there." It's a little bit like an a -> b -> a type signature: a is parameterizing the type, but then I have to use the same a as the return value.

OK, but this is confusing. It seems that the first "a" means "the first field is an instance of a",

Nope, that is not true. It just means that the data type parametrizes over some type a.

and it ALSO means "the first field has the same value as the value they passed in for a". In other words, it specifies type AND value.

No, that is also not true.

Here's an instructive example, the syntax of which you may or may not have seen before:

foo :: Num a => a -> a

This is a fairly standard signature for a function that takes a number and does something to it and gives you another number. What I actually mean by "a number" in Haskell-speak, though, is some arbitrary type "a" that implements the "Num" class.

Thus, this parses to the English:

Let a indicate a type implementing the Num typeclass, then the signature of this method is one parameter with the type a, and the return value of the type a

Something similar happens with data.

It also occurs to me that the instance of List in the specification of Cons is also confusing you: be really careful when parsing that: whereas Cons is specifying a constructor, which is basically a pattern that Haskell is going to wrap the data into, (List a) looks like a constructor but is actually simply a type, like Int or Double. a is a type, NOT a value in any sense of the term.

Edit: In response to the most recent edit.

I think a dissection is first called for. Then I'll deal with your questions point by point.

Haskell data constructors are a little weird, because you define the constructor signature, and you don't have to make any other scaffolding. Datatypes in Haskell don't have any notion of member variable. (Note: there's an alternate syntax that this way of thinking is more amenable to, but let's ignore that for now).

Another thing is that Haskell code is dense; its type signature are like that to. So expect to see the same symbol reused in different contexts. Type inferencing also plays a big role here.

So, back to your type:

data List a = Cons a (List a)
              | Nil

I chunk this into several pieces:

data List a

This defines the name of the type, and any parameterized types that it will have later on. Note that you will only see this show up in other type signatures.

Cons a (List a) |
Nil

This is the name of the data constructor. This IS NOT a type. We can, however, pattern match for it, ala:

foo :: List a -> Bool
foo Nil = True

Notice how List a is the type in the signature, and Nil is both the data constructor and the "thing" we pattern match for.

Cons a (List a)

These are the types of the values we slot into the constructor. Cons has two entries, one is of type a, and one is of type List a.

So my question is, why does "a" in the first field mean "the type of this field is 'a' and the value of this field is 'a'", while "a" in the second field means only "the type of this field is 'List of a'"?

Simple: don't think of it as us specifying the type; think of it has Haskell is inferencing the type out of it. So, for our intents and purposes, we're simply sticking a 0 in there, and a Nil in the second section. Then, Haskell looks at our code and thinks:

  • Hmm, I wonder what the type of Cons 0 Nil is
  • Well, Cons is a data constructor for List a. I wonder what the type of List a is
  • Well, a is used in the first parameter, so since the first parameter is an Int (another simplification; 0 is actually a weird thing that is typeclassed as Num), so that's mean a is a Num
  • Hey, well, that also means that the type of Nil is List Int, even though there's nothing there that would actually say that

(Note, that's not actually how it's implemented. Haskell can do a lot of weird things while inferencing types, which is partially why the error messages suck.)

Edward Z. Yang
Thanks. That is the chapter I'm on right now. So ... when the code says "Cons a (List a)", I thought the "Cons a" part of that was stating that the Cons value constructor was parameterised. They haven't yet covered the syntax for parameterised types, so I guessed that the syntax must require re-stating "a" if you intend to use a. But you're saying that's not necessary? And therefore that's not what that "a" means?
Charlie Flowers
See my edit. We restate a because we're using the parameter.
Edward Z. Yang
OK, but this is confusing. It seems that the first "a" means "the first field is an instance of a", and it ALSO means "the first field has the same value as the value they passed in for a". In other words, it specifies type AND value. But the *second* "a" specifies only *type*. It says, "The second field must be of type list-of-a". And it doesn't say anything about value. Is that right? If so, what rule determines when a means value-and-type versus type-only?
Charlie Flowers
See my edit. When we're specifying this a "data" declaration, we're specifying a sort of pattern that, when we call the data constructor latter, the data will "slot itself into". Part of this pattern is the types of the inner bits.
Edward Z. Yang
OK, one final bit of confusion remains. It seems that a is taken as a value for the first field. See my edited question. Thanks.
Charlie Flowers
See my response. Type inferencing is the answer!
Edward Z. Yang
I get it now, thanks to pretty much a combination of many answers here. "a" and "List a" are NOTHING but types. And then when I type "Cons 0 Nil", I'm providing two EXPLICIT values. Those 2 values get assigned to the 2 fields, end of story. I understand how the type inferencing "carries over" from field one to field 2, because C# generics do the same thing (though I hear Haskell is better at it). Thanks.
Charlie Flowers
Great! Now you should mark an answer on this question ;-)
Edward Z. Yang
+2  A: 
Cons a (List a)

The first field of a Cons is a value of type "a". The second is a value of type "List a", i.e. a List parameterized with the same type as the parameter of the current list.

newacct
Thanks for the response. I now know that what you say is correct, but I have questions about why that is the case. Please see my comments on the other response (from Ambush Commander). Basically, it seems the first "a" specifies type AND value, but the second "a" specifies only type. Why is that?
Charlie Flowers
No, first `a` specifies that first parameter to `Cons` constructor is of type `a` and second one is of type `List a`.
Hynek -Pichi- Vychodil
OK, one final bit of confusion remains. It seems that a is taken as a value for the first field. See my edited question. Thanks.
Charlie Flowers
+3  A: 

5 is wrong, and I'd say 6 as follows to replace both:

Cons{1} a{2} (List a){3} is a constructor called Cons (part before {1}) for a value of type List a (the data List a part) that needs two values: one of type a (the part between {1} and {2}) and one of type List a(the part between {2} and {3}).

To help you with an apparent source of confusion: in Haskell you hardly ever have to give explicit type parameters - type inference infers the types from your values. So in a sense, yes, when you pass in a value to a function or constructor, you also specify a type, viz. the type of the passed-in value.

Kurt Schelfthout
OK, one final bit of confusion remains. It seems that a is taken as a value for the first field. See my edited question. Thanks.
Charlie Flowers
+2  A: 

Yeah the data syntax is a little confusing, because it puns names and types and doesn't really make a syntactic distinction between them. In particular, in a constructor definition like:

Cons a (List a)

The first word is the name of the constructor; every other word is the name of some predeclared type. So both a and List a are already in scope (the a was brought in to scope by the a in "data List a", and you're saying that those are the types of the parameters. Their role might be better demonstrated by stating the same thing using record syntax:

Cons { head :: a, tail :: List a }

I.e. a value of type List Int, if it was constructed with the Cons constructor, has two fields: an Int and a List Int. If it was constructed with Nil, it has no fields.

luqui
OK, one final bit of confusion remains. It seems that a is taken as a value for the first field. See my edited question. Thanks.
Charlie Flowers
+2  A: 

When I type "Cons 0 Nil", it uses the "Cons" value constructor to create an instance of List. From 0, it learns that the type parameter is "Integer". So far, no confusion.

However, it also determines that the value of the first field of the Cons is 0. Yet it determines nothing about the value of the second field ... it only determines that the second field has a type of "List Integer".

No, it determines that the second field's value is Nil. Given your definition, Nil is a value of the type List a. Therefore so is Cons 0 Nil. And in Cons 1 it the second field's value is it; i.e., Cons 0 Nil. This is precisely what the REPL shows: Cons 1 (Cons 0 Nil).

Alexey Romanov
I am starting to see it, from your answer and the other ones here. The values must always be *explicitly* supplied (at least as far as I know :). The *types* are what can be inferred. So "a" and "List a" specify nothing but the *type*. And when I call it, I am giving it (explicitly) a value for field one and a value for field 2.
Charlie Flowers
+3  A: 

Analogies are usually lacking in all sorts of ways, but since you know C# I thought this might be helpful.

This is how I would describe the List a definition in C#, maybe this clears some things up (or more likely, confuses you even more).

class List<A>
{
}

class Nil<A> : List<A>
{
    public Nil() {}
}

class Cons<A> : List<A>
{
    public A Head;
    public List<A> Tail;

    public Cons(A head, List<A> tail)
    {
     this.Head = head;
     this.Tail = tail;
    }
}

As you can see;

  • the List type as a single type parameter (<A>),
  • the Nil constructor doesn't have any parameters,
  • and the Cons constructor has two parameters, a value of type A and a list of type List<A>

Now, in Haskell the Nil and Cons are just constructors for the List a data type, in C# they are also types in and of themselves, so that's where the analogy fails.

But I hope this gives you some intuitive sense of what the different A's represent.

(And please do comment on how this horrible comparison doesn't do justice to Haskell's data types.)

Tom Lokhorst
Yes, this helps. It emphasizes the fact that "a" and "List a" are just TYPES. I come along later and provide EXPLICIT VALUES when I type "Cons 0 Nil". The first field becomes "0" because I told it to use that value, not because "a" was used in the definition of the value constructor. Your response plus several others have helped me get this.
Charlie Flowers
+1  A: 

I looked at your edited question.

When I create instances using the Cons value constructor, those instances "interpret" the first 'a' as meaning "place the value passed in here.

In "Cons a (List a)", both "a" and "List a" are types. I don't understand what "value" has to with it.

When I type "Cons 0 Nil", it uses the "Cons" value constructor to create an instance of List. From 0, it learns that the type parameter is "Integer". So far, no confusion.

However, it also determines that the value of the first field of the Cons is 0. Yet it determines nothing about the value of the second field ... it only determines that the second field has a type of "List Integer".

The value of the second field is Nil.

So my question is, why does "a" in the first field mean "the type of this field is 'a' and the value of this field is 'a'", while "a" in the second field means only "the type of this field is 'List of a'"?

"a" in the first field means "the type of this field is 'a'". "List a" in the second field means "the type of this field is 'List a'". In the case of "Cons 0 Nil" above, 'a' is inferred to be "Integer". So "Cons a (List a)" becomes "Cons Integer (List Integer)". The 0 is a value of type Integer. The Nil is a value of type "List Integer".

the value of this field is 'a'

I don't understand what you mean by this. 'a' is a type variable; what does it have to do with values?

newacct
Thanks. Your response plus several others has helped me get it. In "Cons a (List a)", both "a" and "List a" are NOTHING but types. When I later type "Cons 0 Nil", it infers that a is Integer (close enough for now), and sets first field to zero and second field to Nil. I was missing the fact that, while it will infer types, it is NOT inferring values ... those are provided explicitly.
Charlie Flowers
A: 

Just to give you some extra "help", in case you are still watching this thread. Haskell has a couple of conventions that mess up others' ideas of how things ought to be done - in Haskell, a parameterized type is so commonly accepted, that it is usually thought of as being a type-level function. Likewise, the value constructors are thought of as "special" functions, that also allow pattern matching, in addition to their "take a value (or more) and produce a value as a result".

Another "funny" characteristic of Haskell is that it does not explicitly (or implicitly) evaluate the arguments to a function, even if that argument is in parentheses. Let me state that slightly differently: Haskell functions do not evaluate arguments in parentheses before other arguments. Arguments are put in parentheses for grouping purposes only, not to have them evaluated "first". Haskell assigns arguments to ("applies") a function at a higher precedence than any other operation - even higher than an implicit function application of one of its own arguments. This is why the Cons constructor has parens around the second argument, (List a) - to tell the compiler that Cons has two arguments, and not three. The parentheses are for grouping only, and not for precedence!

As somewhat of a side topic, be careful with the types in F#. Since F# has its roots in ML, its parameterized types have the parameters in front - int list, not (List Int) in back! Haskell does it the other way, because it's the same way that Haskell does functions - first the function, then arguments to the function. This encourages a common use pattern, and explains why Haskell types and value constructors are Capitalized - to remind you that you're dealing with a type/class-related thing.

Okay, I'm done; thanks for letting me put this huge Wall O' Text on your property...

BMeph