views:

355

answers:

6

I've got the following problem: I have a tree of objects of different classes where an action in the child class invalidates the parent. In imperative languages, it is trivial to do. For example, in Java:

public class A {
    private List<B> m_children = new LinkedList<B>();
    private boolean m_valid = true;

    public void invalidate() {
        m_valid = false;
    }

    public void addChild(B child) {
        m_children.add(child);
        child.m_parent = this;
    }
}

public class B {
    public A m_parent = null;
    private int m_data = 0;

    public void setData(int data) {
        m_data = 0;
        m_parent.invalidate();
    }
}

public class Main {
    public static void main(String[] args) {
        A a = new A();
        B b = new B();
        b.setData(0); //invalidates A
    }
}

How do I do the above in Haskell? I cannot wrap my mind around this, since once I construct an object in Haskell, it cannot be changed.

I would be much obliged if the relevant Haskell code is posted.

EDIT: the problem I am trying to solve is the following:

I have an application that edits documents. A document is a hierarchy of objects. When properties of children objects are modified, the document needs to be set to an invalid state, so as that the user knows that the document needs to be validated.

+2  A: 

I don't have much experience with Haskell, but as far as I know it's not possible to have circles in the reference graph in pure functional languages. That means that:

  1. You can't have a 2-way lists, children in trees pointing to their parents, etc.*
  2. It is usually not enough to change just one node. Any node that is changed requires changes in every node starting from the "root" of the data structures all the way to the node you wish to change.

The bottom line is, I wouldn't try to take a Java (or any other imperative language) algorithm and try to convert it to Haskell. Instead, try to find a more functional algorithm (and maybe even a different data structure) to solve the problem.

EDIT:

From your clarification it's not entirely clear whether or not you need to invalidate only the direct parent of the object that changed or all its ancestors in the hierarchy, but that doesn't actually matter that much. Since invalidating an object basically means changing it and that's not possible, you basically have to create a modified duplicate of that object, and then you have to make its parent point to it to, so you have to create a new object for that as well. This goes on until you get to the root. If you have some recursion to traverse the tree in order to "modify" your object, then you can recreate the path from that object to the root on your way out of the recursion.

Hope that made sense. :s

*As pointed out in the comments by jberryman and in other answers, it is possible to create circular reference graphs in Haskell using lazy evaluation.

Tal Pressman
Actually that is not true. In haskell laziness allows us to define a data structure in terms of itself, even in very complex ways (google "tying the knot"). So a doubly-linked list is no problem. But you get into trouble when you try to modify one of the list elements :)
jberryman
Hmm, good to know. I'll edit my answer to fix it.
Tal Pressman
+3  A: 

To answer the question in your title: Yes, you can create nodes which have links to their parents as well as their children. Example:

--               parent       children
data Tree = Node (Maybe Tree) [Tree]
root = Node Nothing [a,b] -- I can "forward reference" a and b because haskell is lazy
a = Node (Just root) []
b = Node (Just root) []

The question is whether that's useful for your particular use-case (often times it isn't).

Now the question in your body: You're right, you can't change a value after it's been created. So once you have a valid tree, you'll always have a valid tree as long as the variable referencing that tree is in scope.

You didn't really describe what problem you're trying to solve, so I can't tell you how to functionally model what you're trying to do, but I'm sure there's a way without mutating the tree.

sepp2k
The problem I am trying to solve is the following: I have an application that edits documents. A document is a hierarchy of objects. When properties of children objects are modified, the document needs to be set to an invalid state, so as that the user knows that the document needs to be validated.
axilmar
A: 

Look into using the Functor instance of the Maybe type.

For example, maybe your problem is something like this: you want to insert an element into a binary tree, but only if it isn't already present. You could do that with something like:

data Tree a = Node a (Tree a) (Tree a)
            | Tip

maybeInsert :: a -> Tree a -> Maybe (Tree a)
maybeInsert a Tip = Just $ Node a Tip Tip
maybeInsert a (Node a' l r)
    | a == a' = Nothing
    | a < a'  = fmap (\l'-> Node a' l' r) (maybeInsert a l)
    | a > a'  = fmap (\r'-> Node a' l r') (maybeInsert a r)

So the function will return Nothing if we found the element to be already present, or return Just the new tree with the element inserted.

Hopefully that is relevant to whatever you are trying to do.

jberryman
+6  A: 

Modifying a tree which might require frequent excursions up the path to the root and back seems like the perfect job for a variant of the Zipper data structure with "scars", in the terminology of the original paper by Huet; the code samples from the paper also suggest a name of "memorising zipper". Of course, with some care, a regular zipper could also be used, but the augmented version might be more convenient and/or efficient to use.

The basic idea is the same as that behind a regular zipper, which already allows one to move up and down a tree in a purely functional manner (without any explicit back-pointers), but a "go up" operation followed by a "go down" operation becomes a no-op, leaving the focus at the original node (whereas with the regular zipper it would move it to the leftmost sibling of the original node).

Here's a link to the paper: Gérard Huet, Functional Pearl: The Zipper. It's just six pages, but the ideas contained therein are of great usefulness to any functional programmer.

Michał Marczyk
an article on zippers in haskell http://scienceblogs.com/goodmath/2010/01/zippers_making_functional_upda.php
stonemetal
Thank you for the document, I've read about the zipper structure before. I don't understand how to use it in my program though for the task at hand. For example, it's not possible to go up and set a new parent to the child with the "invalid" property set to true, because the invalidation must be valid for all children. If I use the zipper structure, I would have to change the parent object for all children, and since my tree is 6 layers deep, this change will ripple to the root object.
axilmar
There is also another problem: I understand how to use the given functions ("go_up", "go_left" etc), and I also understand the concept of the zipper (essentially use the local arguments as destructively updated variables), but I don't understand how to combine it with the rest of the program.Suppose I use the function 'go_up'. Where do I store the result? do I store it in a list? and if so, then where do I store the list? should I make a record that contains all the 'current data', like the data of the zipper structure? Any solutions to this greatly appreciated.
axilmar
I'm not sure I understand your first comment. If you update a node's parent, than all the node's siblings will also have an updated parent from this point on. If you need to go further up -- to the grandparent node perhaps -- then any update there will be visible as an update to the grandparent node to any "cousin nodes" of the node the focus was originally on. And so on. That's the whole idea behind the zipper: you make changes locally, then at any point you can extract the root node from the zipper to obtain the original tree with all your updates performed.
Michał Marczyk
As for storing the result, this is the same question as "when I cons a new entry onto a list, where do I store the resulting longer list?". :-) All zipper manipulation functions (this includes functions for moving across the zipper as well as the insert / update / delete functions) act upon so-called "zipper locations" (locs for short) and return new locs; then you can take those new locs and manipulate them further, or if you are done updating your tree, climb up the path to the root and extract the root node (= the whole tree) from the zipper in its newly updated form. Hope this helps.
Michał Marczyk
If the above is not clear (and I'm not entirely happy with my phrasing -- it's just my first shot...), I've just one question before I make another attempt: after updating an "object node", you want to invalidate the whole document (setting some flag at the root node most likely), right? Or is it something below the root that needs to be updated? Either way is ok with a zipper, just asking to adjust any examples I may think of to your use case... Also, where do you want the focus to be after the update and invalidation are performed? (The originally focused child node? The root?)
Michał Marczyk
After updating the object node, I want to invalidate the whole document by setting some flag in the root node.Regarding the focus, having the root node would be nice.
axilmar
In this case, start by constructing a zipper for the whole tree, then move down to the node you're interested in, perform your update, `go_up` a couple of times until you are at the root (you can discover whether you are at the root by pattern matching -- only root locs have the path field set to `Top`), update the validation flag at the root, then extract your tree. Et voilà, you now have a tree exactly as the original, but with the child node modified and the validation flag reset at the root.
Michał Marczyk
Note that Huet's paper describes zippers on tree which only hold data at the leaves, however adapting the idea for trees which have "decorated" interior nodes is straightforward. You'll probably have to read his section on arity-respecting zippers on heterogeneous trees, though.
Michał Marczyk
Isn't that terribly inefficient, though? my root node that I need to change to 'invalid' contains lots of things. Each instance has 30 fields. If I replicate the root instance each time I change a detail of it, my program will be terribly inefficient. This is also true for child nodes that contain a lot of information.
axilmar
Zippers can be extremely efficient, see e.g. http://stackoverflow.com/questions/2531753/how-well-do-zippers-perform-in-practice-and-when-should-they-be-used (and maybe follow the link to the paper there). But ultimately you'll have to perform some measurements to discover what's efficient in your particular case. It might be the case that replacing the 30 fields means simply replacing 30 pointers... And then if you perform your updates in batches, only marking the root as invalid once per batch, there really shouldn't be a problem. You'll just have to profile to be sure.
Michał Marczyk
+1  A: 

Here is some zipper code that demonstrates easy modification of the data a cursor points at as well as a "global" property of the tree. We build a tree, move the cursor to the node initially containing a 1, change it to a 3, and are left with a cursor pointing at that node in a fully updated tree.

import Data.Maybe (fromJust)
import Data.Tree
import Data.Tree.Zipper

type NodeData = Either Bool Int
type TreePath a = [TreePos Full a -> TreePos Full a]

firstChild' = fromJust . firstChild
parent'     = fromJust . parent
prev'       = fromJust . prev
next'       = fromJust . next

-- Determine the path from the root of the tree to the cursor.
pathToMe :: TreePos Full NodeData -> TreePath NodeData
pathToMe t | isRoot t  = []
           | isFirst t = firstChild' : pathToMe (parent' t)
           | otherwise = next' : pathToMe (prev' t)

-- Mark a tree as invalid, but leave the cursor in the same place.
invalidate :: TreePos Full NodeData -> TreePos Full NodeData
invalidate t = foldr ($) (setLabel (Left False) (root t)) (pathToMe t)

-- Set a node's internal data.
setData :: Int -> TreePos Full NodeData -> TreePos Full NodeData
setData = (invalidate . ) . setLabel . Right

main = let tree1 = Node (Left True) [Node (Right 1) [], Node (Right 2) []]
           Just cursor = firstChild (fromTree tree1)
           tree2 = setData 3 cursor
       in do putStrLn (drawTree (fmap show tree1))
             putStrLn (drawTree (fmap show (toTree tree2)))
             putStrLn $ "Cursor at "++show (label tree2)

Output:

Left True
|
+- Right 1
|
`- Right 2

Left False
|
+- Right 3
|
`- Right 2

Cursor at Right 3
Anthony
Thank you. From what I understand, the above code does the following:1) creates a tree in main.2) gets a child by getting a cursor.3) sets the data, invalidates the tree and returns a new root tree.The invalidation happens like this: a new root node is constructed out of the new valid flag and the rest of the tree.Have I understood the above correctly?
axilmar
That is correct, with the caveat that setData doesn't just give you the root to the new tree, but a new tree with the cursor set at the same location as in the original tree (so you can continue making subsequent local changes).
Anthony
Another note is that this code could be made more efficient by not rebuilding the tree when the root is already invalid.
Anthony
What if the root node invokes a callback when it is invalidated? in Java, I can have a list of objects to be notified when the object is invalidated. How can I do that with Haskell? will I need to rebuilt the object each time a new callback is added to the object? what about those references to previous roots that are not updated? With Java, one member update in the root is reflected to all the objects that refer to this object.
axilmar
It's hard to provide totally general answers to those questions. Bear in mind that rebuilding the root is a very cheap operation as the sub-trees dangling off of it are unaffected. Also, using a zipper as done here means that there are no issues of persistent references to old roots left dangling. So adding a new callback involves rebuilding the root node, but that's not a big deal.If you have another specific problem case, I'd encourage you to start from something like this code and post it as another question so you can get some concrete responses.
Anthony
A: 

Couldn't laziness take care of making sure validation doesn't happen too often? That way, you don't need to store the m_valid field.

For example, if you only validate on save, then you can edit the objects to your hearts content, without revalidating all the time; only when the user presses the 'Save' button is the value of validateDoc computed. Since I don't know for sure what your notion of valid means and what you need it for, I might be totally of the mark.

Untried & incomplete code:

data Document = Document { subDocs :: [SubDoc] }
data SubDoc = SubDoc { content :: String }

addSubDoc :: SubDoc -> (Document -> Document)
addSubDoc = error "not yet implemented: addSubDoc"

modifySubDoc :: Int -> (SubDoc -> SubDoc) -> (Document -> Document)
modifySubDoc = error "not yet implemented: modifySubDoc"


validateDoc :: Document -> Bool
validateDoc = all validateSubDoc . subDocs

validateSubDoc :: SubDoc -> Bool
validateSubDoc = not . null . contents

I'm assuming the overall validity of the document depends only on the subdocuments (simulated here by ensuring that they contain a non-empty string).

By the way, I think you forgot a a.addChild(b); in main.

yatima2975
Valid document means it passes a certain test. The idea here is that the user edits the document, then validates it.
axilmar