views:

451

answers:

6

Hi,

From what I understand, the list type in Haskell is implemented internally using a linked list. However, the user of the language does not get to see the details of the implementation, nor does he have the ability to modify the "links" that make up the linked list to allow it to point to a different memory address. This, I suppose, is done internally.

How then, can the list type be qualified as in Haskell ? Is it a "data type" or an "abstract data type"? And what of the linked list type of the implementation ?

Additionally, since the list type provided by the Prelude is not a linked list type, how can the basic linked list functions be implemented ?

Take, for example, this piece of code designed to add an element a at the index n of a list :

add [] acc _ _ = reverse acc
add (x:xs) acc 0 a = add xs (x:a:acc) (-1) a 
add (x:xs) acc n a = add xs (x:acc) (n-1) a

Using a "real" linked list, adding an element would just consist of modifying a pointer to a memory address. This is not possible in Haskell (or is it ?), thus the question : is my implementation of adding an element to a list the best possible one, or am I missing something (the use of the reverse function is, I think, particularly ugly, but is it possible to do without ?)

Please, do not hesitate to correct me if anything I have said is wrong, and thank you for your time.

+7  A: 

Haskell is a purely functional programming language. This means no change can be done at all.

The lists are non-abstract types, it's just a linked list.

You can think of them defined in this way:

data [a] = a : [a] | []

which is exactly the way a linked list is defined - A head element and (a pointer to) the rest.

Note that this is not different internally - If you want to have more efficient types, use Sequence or Array. (But since no change is allowed, you don't need to actually copy lists in order to distinguish between copies so, which might be a performance gain as opposed to imperative languages)

Dario
Indeed, the internal module `GHC.Types` defines it as `infixr 5 :; data [] a = [] | a : [a]`
ephemient
I still don't get why the list type you defined above is a linked list. Since Haskell is purely functional, I understand that data is immutable, but I was trying to oppose the actual lists the programmer uses while coding in Haskell, and the GHC implementation of those lists when it compiles to a lower level language. Sorry if my question was not clear enough.
CharlieP
CharlieP, you are looking for the pointers but are not finding any. Consider Java, which is also missing pointers. Is it impossible to make a linked list in pure Java? No, of course not. You would have reference to a head object, which itself would have a value and a reference to the next object in the list. Add a sentinel object to indicate the end of the list, and you're done. That is exactly what the given type definition says. You don't need explicit pointers to make a linked list. You just need links. Who cares what GHC does under the hood?
Daniel Yankowsky
+8  A: 

You're confusing mutability with data structure. It is a proper list — just not one you're allowed to modify. Haskell is purely functional, meaning values are constant — you can't change an item in a list any more than you could turn the number 2 into 3. Instead, you perform calculations to create new values with the changes you desire.

You could define that function most simply this way:

add ls idx el = take idx ls ++ el : drop idx ls

The list el : drop idx ls reuses the tail of the original list, so you only have to generate a new list up to idx (which is what the take function does). If you want to do it using explicit recursion, you could define it like so:

add ls 0 el   = el : ls
add (x:xs) idx el
  | idx < 0   = error "Negative index for add"
  | otherwise = x : add xs (idx - 1) el
add [] _ el   = [el]

This reuses the tail of the list in the same way (that's the el : ls in the first case).

Since you seem to be having trouble seeing how this is a linked list, let's be clear about what a linked list is: It's a data structure consisting of cells, where each cell has a value and a reference to the next item. In C, it might be defined as:

struct ListCell {
void *value; /* This is the head */
struct ListCell *next; /* This is the tail */
}

In Lisp, it's defined as (head . tail), where head is the value and tail is the reference to the next item.

In Haskell, it's defined as data [] a = [] | a : [a], where a is the value and [a] is the reference to the next item.

As you can see, these data structures are all equivalent. The only difference is that in C and Lisp, which are not purely functional, the head and tail values are things you can change. In Haskell, you can't change them.

Chuck
+4  A: 

Your code might work, but it's definitely not optimal. Take the case where you want to insert an item at index 0. An example:

add [200, 300, 400] [] 0 100

If you follow the derivation for this, you end up with:

add [200, 300, 400] [] 0 100
add [300, 400] (200:100:[]) (-1) 100 
add [400] (300:[200, 100]) (-2) 300 
add [] (400:[300, 200, 100]) (-3) 400 
reverse [400, 300, 200, 100]
[100, 200, 300, 400]

But we are only adding an item to the beginning of the list! Such an operation is simple! It's (:)

add [200, 300, 400] [] 0 100
100:[200, 300, 400]
[100, 200, 300, 400]

Think about how much of the list really needs to be reversed.

You ask about whether the runtime modifies the pointers in the linked list. Because lists in Haskell are immutable, nobody (not even the runtime) modifies the pointers in the linked list. This is why, for example, it is cheap to append an item to the front of a list, but expensive to append an element at the back of a list. When you append an item to the front of the list, you can re-use all of the existing list. But when you append an item at the end, it has to build a completely new linked list. The immutability of data is required in order for operations at the front of a list to be cheap.

Daniel Yankowsky
Also notice what that algorithm will do if you give it an infinite list. Kaboom!
Chuck
A very good point!
Daniel Yankowsky
+3  A: 

Re: adding an element to the end of a List, I'd suggest using the (++) operator and splitAt function:

add xs a n = beg ++ (a : end)
  where
    (beg, end) = splitAt n xs

The List is a linked-list, but it's read-only. You can't modify a List in place - you instead create a new List structure which has the elements you want. I haven't read it, but this book probably gets at your underlying question.

HTH

Aidan Cully
+4  A: 

In Haskell, "data type" and "abstract type" are terms of art:

  • A "data type" (which is not abstract) has visible value constructors which you can pattern-match on in case expressions or function definitions.

  • An "abstract type" does not have visible value constructors, so you cannot pattern match on values of the type.

Given a type a, [a] (list of a) is a data type because you can pattern match on the visible constructors cons (written :) and nil (written []). An example of an abstract type would be IO a, which you cannot deconstruct by pattern matching.

Norman Ramsey
+1  A: 

The compiler is free to choose any internal representation it wants for a list. And in practice it does actually vary. Clearly the list "[1..]" is not implemented as a classical series of cons cells.

In fact a lazy list is stored as a thunk which evaluates to a cons cell containing the next value and the next thunk (a thunk is basically a function pointer plus the arguments for the function, which gets replaced by the actual value once the function is called). On the other hand if the strictness analyser in the compiler can prove that the entire list will always be evaluated then the compiler does just create the entire list as a series of cons cells.

Paul Johnson