It's a pretty normal binary tree, except that either child of a node may be empty.

I'd like to find a way to output it in a horizontal way (that is, the root node is on the left and expands to the right).

I've had some experience expanding trees vertically (root node at the top, expanding downwards), but I'm not sure where to start, in this case.

Preferably, it would follow these couple of rules:

  • If a node has only one child, it can be skipped as redundant (an "end node", with no children, is always displayed)
  • All nodes of the same depth must be aligned vertically; all nodes must be to the right of all less-deep nodes and to the left of all deeper nodes.
  • Nodes have a string representation which includes their depth.
  • Each "end node" has its own unique line; that is, the number of lines is the number of end nodes in the tree, and when an end node is on a line, there may be nothing else on that line after that end node.
  • As a consequence of the last rule, the root node might be better off in either the top left or the bottom left corner; top left is preferred.

For example, this is a valid tree, with six end nodes (node is represented by a name, and its depth): EDIT: Please see bottom of question for an alternative, easier rendering

        
[a0]-----------[b3]------[c5]------[d8]
    \              \         \----------[e9]
     \              \----[f5]
      \-[g1]--------[h4]------[i6]
            \           \--------------------[j10]
             \-[k3]

Which represents the vertical, explicit binary tree:

0              a
              / \
1            g   *
            / \   \
2          *   *   *
          /     \   \
3        k       *   b
                /   / \
4              h   *   *
              / \   \   \
5            *   *   f   c
            /     \     / \
6          *       i   *   *
          /           /     \
7        *           *       *
        /           /         \
8      *           *           d
      /           /
9    *           e
    /
10 j

(branches folded for compactness; * representing redundant, one-child nodes; note that *'s are actual nodes, storing one child each, just with names omitted here for presentation sake)

(also, to clarify, I'd like to generate the first, horizontal tree; not this vertical tree)

I say language-agnostic because I'm just looking for an algorithm; I say ruby because I'm eventually going to have to implement it in ruby anyway.

Assume that each Node data structure stores only its id, a left node, and a right node.

A master Tree class keeps track of all nodes and has adequate algorithms to find:

  • A node's nth ancestor
  • A node's nth descendant
  • All end-node descendants of a node, and their count
  • The generation of a node
  • The lowest common ancestor of two given nodes

I already know:

  • The number of end nodes

Does anyone have any ideas about where I could start? Should I go for the recursive approach? Iterative? Some pseudocode would be pretty cool too, and much appreciated =)


progress

As per walkytalky's suggestion, I decided to see what it would look like to map each "relevant" or significant node to a grid, with the columns being the depth and the rows identifiable by their end nodes. Here is what happens (skipping column 7 because there are no significant nodes in depth 7):

depth: 0  1  2  3  4  5  6  8  9  10
       a        b     c     d
                               e
                      f
          g        h     i
                                  j
                k

It should be easy enough to generate this grid, with either breadth-first or depth-first searches. Perhaps most trivially by simply keeping a 2D array and placing every significant node found into it, inserting a row for every "second child".
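A minimal Ruby sketch of that grid-building step, using a depth-first walk. The `Node` struct, the `leaf`, `chain` and `example_tree` helpers, the `[depth, id]` row format, and the choice of the right child as the same-row child (which is how the example above is drawn) are all assumptions made for illustration, not part of the Tree class described in the question.

```ruby
Node = Struct.new(:id, :left, :right)

# Hypothetical helper: an end node.
def leaf(id)
  Node.new(id, nil, nil)
end

# Hypothetical helper: wraps +node+ in +n+ one-child nodes (the *'s above).
def chain(n, node)
  n.times { node = Node.new("*", nil, node) }
  node
end

# Returns one row per end node; each row lists [depth, id] for the
# significant nodes on it, left to right.
def build_rows(node, depth = 0, rows = [[]])
  kids = [node.right, node.left].compact # same-row child first
  case kids.size
  when 0
    rows.last << [depth, node.id]        # end node: always displayed
  when 1
    build_rows(kids[0], depth + 1, rows) # redundant one-child node: skipped
  else
    rows.last << [depth, node.id]
    build_rows(kids[0], depth + 1, rows)
    rows << []                           # the second child starts a new row
    build_rows(kids[1], depth + 1, rows)
  end
  rows
end

# The example tree from the question (Node.new(id, left, right)).
def example_tree
  c = Node.new("c", chain(3, leaf("e")), chain(2, leaf("d")))
  b = Node.new("b", chain(1, leaf("f")), chain(1, c))
  h = Node.new("h", chain(5, leaf("j")), chain(1, leaf("i")))
  g = Node.new("g", chain(1, leaf("k")), chain(2, h))
  Node.new("a", g, chain(2, b))
end

build_rows(example_tree).each do |row|
  puts row.map { |d, id| "#{id}#{d}" }.join(" ")
end
```

Each printed line corresponds to one row of the grid above; the depth stored with each node is what later decides its column.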

Now, knowing these facts:

  • The last node in a row must be an end node
  • Children are always to the right, and on the same row or lower, of their parent node.
  • All non-end nodes must have exactly two children
  • Therefore, every non-end node's children are the first nodes to the right of its column: the first child on the same row, the second child n rows below, where n is the number of end nodes descended from the first child.

We can see that, given any valid grid, there is one unambiguous way to "connect the dots", so to speak; there is one unambiguous tree being represented.
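To illustrate why the grid alone is enough, here is a small Ruby sketch that recovers the parent/child links from the rows of the grid shown earlier. The `GRID` constant, the `[depth, id]` row format and the `connect` method are hypothetical names chosen for this example. Note that only the order of nodes within each row and the order of the rows are needed; the depths do not take part in the reconstruction.

```ruby
# Rows of the grid above: the significant nodes on each row, left to right.
GRID = [
  [[0, "a"], [3, "b"], [5, "c"], [8, "d"]],
  [[9, "e"]],
  [[5, "f"]],
  [[1, "g"], [4, "h"], [6, "i"]],
  [[10, "j"]],
  [[3, "k"]]
]

# Records the two children of rows[r][i] in +links+ and returns the number
# of end nodes below (or at) that node.
def connect(rows, r, i, links)
  return 1 if i == rows[r].size - 1    # last node in a row is an end node
  id = rows[r][i][1]
  first = rows[r][i + 1]               # first child: next node on the same row
  n = connect(rows, r, i + 1, links)   # n = end nodes under the first child
  second = rows[r + n][0]              # second child: starts the row n below
  m = connect(rows, r + n, 0, links)
  links[id] = [first[1], second[1]]
  n + m
end

links = {}
total = connect(GRID, 0, 0, links)
puts "end nodes: #{total}"
links.each { |parent, (first, second)| puts "#{parent} -> #{first}, #{second}" }
```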

Now, the "connecting the dots" is no longer a binary-tree-structure question...it's simply a decoration question. We just need to build an algorithm to properly place the right -'s and \'s where they can go, perhaps following only simple grid/lexicographical rules, instead of binary-tree-structure rules.

Basically, this means that the problem of rendering a tree is now the much simpler problem of rendering a grid, with fancy decorations.

Can anyone suggest any way of formulating these rules? Or maybe a completely different method altogether?


edit

I have conceived of a much, much easier final rendering:

--d0----d1----d3----d4----d5----d6----d8----d9----d10-- => guide line (not rendered)

 [a0 ]-------[b3 ]-------[c5 ]-------[d8 ]
   |           |           \---------------[e9 ]
   |           \---------[f5 ]
   \---[g1 ]-------[h4 ]-------[i6 ]
         |           \---------------------------[j10]
         \---[k3 ]

--d0----d1----d3----d4----d5----d6----d8----d9----d10-- => guide line (not rendered)

It might be easier to try to create this one, instead of the one I had posted earlier. For one, it preserves a pretty grid shape, and you don't have to fiddle with diagonal lines. The rows are all mapped along clearly visible column lines. Unfortunately, it is nowhere near as pretty as the first.

+1  A: 

Looks like an interesting problem; I'd be happy to give it a try, if I had more time.

I'd probably go with the following approach:

  1. Start rendering "right" (or in your case, "top") nodes, until I reach the end. (i.e.: render a, b, c, and d)
  2. Go back to the last node with a child (i.e.: c), and do the same thing recursively

You would have to keep a global variable indicating which row you are printing on. Each recursive call increases this variable.

edit: ok, couldn't resist trying to write some untested pseudo-code, hope it works:

function print_tree(Node n) {
    print "\n" // begin on a fresh new line
    childs = new Array();
    do {
        if (n.hasLeftChild) {
            childs.push(n.leftChild)
        }
        print "---" + n.id    //this needs a lot of tweaking, but you get the idea
    } while(n = n.rightChild)
    childs.reverse()
    foreach(child in childs) {
        print_tree(child);
    }
}
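
Since the question mentions Ruby, here is a rough Ruby port of the pseudo-code above, offered only as a sketch. It assumes a hypothetical `Node` struct with `id`, `left` and `right`, and it collects the lines in an array instead of printing directly. Like the pseudo-code, it does no indentation or alignment and does not skip one-child nodes, so the example tree below contains none.

```ruby
Node = Struct.new(:id, :left, :right)

# Follows the right children along one line, remembering every left child
# met on the way, then renders those left children deepest first.
def print_tree(node, out = [])
  line = String.new
  pending = []
  while node
    pending << node.left if node.left
    line << "---" << node.id           # needs tweaking for real alignment
    node = node.right
  end
  out << line
  pending.reverse_each { |child| print_tree(child, out) }
  out
end

# Small hypothetical tree: Node.new(id, left, right).
b = Node.new("b", Node.new("e", nil, nil), Node.new("d", nil, nil))
g = Node.new("g", Node.new("k", nil, nil), Node.new("h", nil, nil))
SMALL_TREE = Node.new("a", g, b)

puts print_tree(SMALL_TREE)
```

The line order already matches the rules in the question (one line per end node, deepest branch first); the padding and connectors are the part that still "needs a lot of tweaking".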
fraktal
A: 

You'd probably need to perform a depth-first search, if not a search of the entire tree, in order to properly size it for output along 2 dimensions.

btreat
+1  A: 

If there are N end nodes, there must be N-1 internal nodes with 2 children. (There can be any number of internal nodes with 1 child, which we will have to count to get the depths but otherwise ignore.) Generating the tree is thus equivalent to positioning these nodes on a grid, where:

  • the number of rows in the grid is N
  • I think the number of columns is between 1+ceil(log2(N)) and 2*N-1, depending on how much overlap there is; this probably doesn't matter much for our purposes, though
  • each endpoint appears on a different row
  • all nodes at the same depth appear in the same column
  • all internal nodes appear on the same row as their rightmost descendant endpoint
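
The counting claim at the start of this answer (N end nodes imply N-1 two-child nodes, with any number of one-child nodes) can be checked with a short Ruby sketch. The `Node` struct, the `tally` method and the small tree are hypothetical and exist only for this illustration.

```ruby
Node = Struct.new(:id, :left, :right)

# Counts nodes by number of children: counts[0] are end nodes,
# counts[1] one-child nodes, counts[2] two-child nodes.
def tally(node, counts = Hash.new(0))
  kids = [node.left, node.right].compact
  counts[kids.size] += 1
  kids.each { |kid| tally(kid, counts) }
  counts
end

# Hypothetical tree: a has a leaf and a one-child node; that node leads to b.
b = Node.new("b", Node.new("y", nil, nil), Node.new("z", nil, nil))
COUNT_TREE = Node.new("a", Node.new("x", nil, nil), Node.new("*", nil, b))

counts = tally(COUNT_TREE)
puts "end nodes: #{counts[0]}, one-child: #{counts[1]}, two-child: #{counts[2]}"
```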

So, let's see:

  • Walk the tree depth-first, right-to-left.
  • For each endpoint, record its depth and label.
  • For each 2-child internal, record its depth, label and the indices of both rightmost and leftmost child endpoints.
  • Sort the whole lot by depth -- this gives you the column ordering, with the number of distinct depths giving the actual number of columns. (The row ordering should come out automatically from the walk, I think, but the column ordering does not, because any branch can be any depth.)
  • Place all the nodes in the grid.
  • Mark empty cells to the right of each non-endpoint node as horizontal branches.
  • Mark empty cells down from each internal node to the row above its left child as vertical branches, and the cell at the level of the left child as a junction.

  • Print with appropriate ASCII decoration.
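
The steps above could be sketched in Ruby roughly as follows. This is one possible reading, not a definitive implementation: the `Node` struct, the helper names, and the cell encoding (`[depth, id]` for nodes, `:horiz`, `:vert`, `:junction`, `nil` for empty) are assumptions. The sketch also marks the cells between a junction and the child it leads to as horizontal, which the list does not state explicitly.

```ruby
Node = Struct.new(:id, :left, :right)

def leaf(id)
  Node.new(id, nil, nil)
end

# Wraps +node+ in +n+ one-child nodes.
def chain(n, node)
  n.times { node = Node.new("*", nil, node) }
  node
end

# Depth-first, same-row (right) child first. Records [row, depth, id,
# junction_row] per significant node (junction_row is nil for end nodes)
# and returns the number of end nodes under +node+.
def collect(node, depth, row, found)
  kids = [node.right, node.left].compact
  return collect(kids[0], depth + 1, row, found) if kids.size == 1
  if kids.empty?
    found << [row, depth, node.id, nil]
    return 1
  end
  n = collect(kids[0], depth + 1, row, found)
  m = collect(kids[1], depth + 1, row + n, found)
  found << [row, depth, node.id, row + n]
  n + m
end

def build_grid(root)
  found = []
  height = collect(root, 0, 0, found)
  depths = found.map { |_, d, _, _| d }.uniq.sort  # one column per distinct depth
  grid = Array.new(height) { Array.new(depths.size) }
  found.each { |r, d, id, _| grid[r][depths.index(d)] = [d, id] }
  found.each do |r, d, _, jr|
    next unless jr
    c = depths.index(d)
    (r + 1...jr).each { |v| grid[v][c] = :vert }
    grid[jr][c] = :junction
  end
  grid.each do |cells|                             # horizontal branches
    first = cells.index { |x| x.is_a?(Array) || x == :junction }
    last = cells.rindex { |x| x.is_a?(Array) }
    (first..last).each { |c| cells[c] ||= :horiz }
  end
  [depths, grid]
end

# One character per cell, just to inspect the marks.
def sketch(grid)
  marks = { horiz: "-", vert: "|", junction: "\\" }
  grid.map { |cells| cells.map { |x| x.is_a?(Array) ? x[1] : marks.fetch(x, ".") }.join }
end

def example_tree
  c = Node.new("c", chain(3, leaf("e")), chain(2, leaf("d")))
  b = Node.new("b", chain(1, leaf("f")), chain(1, c))
  h = Node.new("h", chain(5, leaf("j")), chain(1, leaf("i")))
  g = Node.new("g", chain(1, leaf("k")), chain(2, h))
  Node.new("a", g, chain(2, b))
end

depths, grid = build_grid(example_tree)
puts sketch(grid)
```

The `sketch` output uses a single character per cell, so it shows the structure of the marks rather than the final column widths.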

Update:

As you say, the positioning is enough to unambiguously determine the connections, but you still need to do some bottom-up work to get that right, so I'd probably still do the "mark" steps during the grid building.

I sort of thought the printing was trivial enough to gloss over, but:

  • Iterate down each column and determine the column width as size of fixed elements + max label length + number of digits in the depth, i.e. floor(log10(depth)) + 1 for depth ≥ 1 and 1 for depth 0. (Fixed elements might be [ and ]-, for example. We can substitute ]\n as the suffix for endpoints.)
  • For each row
    • for each column
      • if cell contains a node or endpoint
        • print fixed prefix
        • print label
        • print depth
        • print fill spaces (max label length - current label length)
        • print appropriate suffix
        • if node is an endpoint, skip to next row
      • if cell is empty, print fill spaces to width of column
      • if cell contains a vertical, print some chosen prefix number of spaces, a bar, and fill with spaces
      • if cell contains a junction, print some chosen prefix number of spaces, a backslash, and fill with hyphens
      • if cell contains a horizontal, print full column width of hyphens
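
A minimal Ruby sketch of this printing loop, under some assumptions: cells are encoded as `[depth, label]` for nodes, `:horiz`, `:vert`, `:junction`, or `nil` for empty; the fixed elements are `[`, `]` and `-`; and `indent` stands for the "chosen prefix number of spaces". The `DEPTHS` and `GRID` constants describe a small hand-built, hypothetical grid, not the tree from the question. The digit count is taken as `depth.to_s.length`.

```ruby
DEPTHS = [0, 1, 2, 3, 4]
GRID = [
  [[0, "a"], :horiz, [2, "b"], [3, "c"], nil],
  [:vert, nil, :junction, :horiz, [4, "d"]],
  [:junction, [1, "e"], nil, nil, nil]
]

def render(depths, grid, indent = 1)
  # Column width: fixed elements + longest label in the column + depth digits.
  widths = depths.each_with_index.map do |d, c|
    longest = grid.map { |row| row[c].is_a?(Array) ? row[c][1].length : 0 }.max
    3 + longest + d.to_s.length
  end
  grid.map do |row|
    last = row.rindex { |cell| cell.is_a?(Array) } # the endpoint of this row
    line = String.new
    row.each_with_index do |cell, c|
      w = widths[c]
      case cell
      when Array
        text = "[#{cell[1]}#{cell[0]}".ljust(w - 2) + "]"
        line << (c == last ? text : text + "-")
        break if c == last                         # nothing follows an endpoint
      when :horiz    then line << "-" * w
      when :vert     then line << (" " * indent + "|").ljust(w)
      when :junction then line << (" " * indent + "\\").ljust(w, "-")
      else                line << " " * w
      end
    end
    line
  end
end

puts render(DEPTHS, GRID)
```

With `indent = 1` the bars and backslashes land under the first label character of the node they hang from; a different prefix only shifts them within the column.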

Converting this to print diagonals might be easiest if you generate the straight version first and then do some substitutions in the character array -- otherwise you can get cases where you're rendering a long vertical branch in a different column than the one in which it originated.

At some point I may try to put this into code, but it probably won't be today -- stuff to do!

walkytalky
I like the approach to positioning each relevant node on a grid; I'll look more into this :)
Justin L.
Interesting to note: Given the proposed grid, without any knowledge of any connections, there is one unambiguous way to connect the pieces together. I'll edit my post to include this.
Justin L.
All of the answers were really good; yours was the approach I ended up taking, because it let me see and understand everything in a whole other angle, which I could easily attack.
Justin L.
+1  A: 

Below is fully functional C# code that does exactly what you want. How it does it:

  • The tree is represented as objects from classes that inherit from Node
  • First compute the number of leaves and create an array of that many lines
  • Then for each level:
    • find out which lines we are going to write on
    • for those lines, compute the maximum length of what is already on them
    • write all the nodes at column max(number from previous step, end of previous level)+1; prepend with - to get to that column
    • write diagonal lines from all binary nodes up to the line of their right child (in my program first child is left, second is right, you have it the other way around)
    • advance one level

The algorithm makes sure that each level starts only after the previous one ends. That is probably a good choice for short names, but for longer names it probably shouldn't be enforced.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace SO_ASCII_tree
{
    class Program
    {
        static void Main()
        {
            Node root = …;

            StringBuilder[] lines = Enumerable.Range(0, root.Leaves).Select(i => new StringBuilder()).ToArray();

            Node[] currentLevel = new Node[] { root };
            int level = 0;
            int min = 0;
            int max = 0;
            while (currentLevel.Any())
            {
                // Only named nodes (binary nodes and leaves) are drawn; unary nodes are skipped.
                NamedNode[] namedNodes = currentLevel.OfType<NamedNode>().ToArray();
                if (namedNodes.Any())
                {
                    // Column for this level: past everything already written on the
                    // affected lines and past the end of the previous level.
                    min = namedNodes.Select(node => lines[node.Line].Length).Max();
                    min = Math.Max(min, max);
                    if (min != 0)
                        min++;
                    foreach (NamedNode namedNode in namedNodes)
                        WriteAtPosition(lines[namedNode.Line], namedNode.Write(level), min, '-');
                    max = namedNodes.Select(node => lines[node.Line].Length).Max();
                    // change to max = min + 1; for long names
                }

                // Assign output lines to the children of every node on this level.
                foreach (Node node in currentLevel)
                    node.SetChildLines();

                // Draw the diagonal from each binary node down to the line of its right child.
                Binary[] binaries = namedNodes.OfType<Binary>().ToArray();
                foreach (Binary binary in binaries)
                    GoDown(lines, binary.Line, binary.Right.Line);

                currentLevel = currentLevel.SelectMany(node => node.Children).ToArray();
                level++;
            }

            foreach (StringBuilder line in lines)
                Console.WriteLine(line.ToString());
        }

        static void WriteAtPosition(StringBuilder line, string message, int position, char prepend = ' ')
        {
            if (line.Length > position)
                throw new ArgumentException();
            line.Append(prepend, position - line.Length);
            line.Append(message);
        }

        static void GoDown(StringBuilder[] lines, int from, int to)
        {
            int line = from + 1;
            int position = lines[from].Length;
            for (; line <= to; line++, position++)
                WriteAtPosition(lines[line], "\\", position);
        }
    }

    abstract class Node
    {
        public int Line
        { get; set; }

        public abstract int Leaves
        { get; }

        public abstract IEnumerable<Node> Children
        { get; }

        public virtual void SetChildLines()
        { }
    }

    abstract class NamedNode : Node
    {
        public string Name
        { get; set; }

        public string Write(int level)
        {
            return '[' + Name + level.ToString() + ']';
        }
    }

    class Binary : NamedNode
    {
        public Node Left
        { get; set; }
        public Node Right
        { get; set; }

        int? leaves;
        public override int Leaves
        {
            get
            {
                if (leaves == null)
                    leaves = Left.Leaves + Right.Leaves;
                return leaves.Value;
            }
        }

        public override IEnumerable<Node> Children
        {
            get
            {
                yield return Left;
                yield return Right;
            }
        }

        public override void SetChildLines()
        {
            Left.Line = Line;
            Right.Line = Line + Left.Leaves;
        }
    }

    class Unary : Node
    {
        public Node Child
        { get; set; }

        int? leaves;
        public override int Leaves
        {
            get
            {
                if (leaves == null)
                    leaves = Child.Leaves;
                return leaves.Value;
            }
        }

        public override IEnumerable<Node> Children
        {
            get
            {
                yield return Child;
            }
        }

        public override void SetChildLines()
        {
            Child.Line = Line;
        }
    }

    class Leaf : NamedNode
    {
        public override int Leaves
        {
            get
            {
                return 1;
            }
        }

        public override IEnumerable<Node> Children
        {
            get
            {
                yield break;
            }
        }
    }
}

EDIT: Your example tree gets rendered exactly the same as your rendering:

[a0]-----------[b3]------[c5]------[d8]
    \              \         \----------[e9]
     \              \----[f5]
      \-[g1]--------[h4]------[i6]
            \           \--------------------[j10]
             \-[k3]
svick
Do you have the output of the given tree? Actually, I've been meaning to get around to installing .NET and a C# compiler...doing it now...
Justin L.
+1  A: 

Start with a label width for each level (not including the [] characters), equal to the longest label on that level (in this example the widths are mostly 2, except for j10, which is 3, and levels 2 and 7, which are 0).

Space the levels that have a non-zero max label width evenly, with one - character between each level, so you can calculate initial level locations.

Give each node its line number.

Then adjust the level locations based on the maximum number of lines between children for a level.

Added 2 to level 1 for a0 to g1
Added 1 to level 2 for g1 to k3
Added 1 to level 4 for b3 to [ ]

Use \ and ` characters for diagonals.

[a0]---------[b3]-------[c5]------[d8]
    \            \          `----------[e9]
     \            `-----[f5]
      `[g1]--------[h4]------[i6]
           \           `--------------------[j10]
            `[k3]
Stephen Denne
I like this approach; however, I'm not sure I completely understand it. Your "line initialization" seems to only cover the first, top line...how would you initially render, say, line 2? 3?
Justin L.