views:

63

answers:

2

Careless use of templates can cause bloat. One way to avoid that bloat is to have a thin typesafe template that wraps non-typesafe non-template code. To do this, the wrapper needs to provide some way for the non-template code to access things it knows nothing about.

For example, in a data structure, the wrapper defines the node structs. The unsafe code needs to read and write to the nodes, but must do so indirectly, through some kind of interface which is specified by the wrapper.

One way to implement this interface is to fill in a struct (defined by the unsafe code) with details such as function-pointers and constants determined by the wrapper. And one relevant kind of constant is the offset (within some structure) of a particular field. The unsafe code can use that offset (and some pointer arithmetic) to access that field directly.

This is getting problematic, though - as optimisers get more aggressive, this can lead to pointer alias issues. This is particularly the case if nodes can escape the library. For example, nodes may be extracted from a binary tree and relinked to form a linked list. Another example, annoyingly, happens when unit testing.

I currently have a container library written along these lines, and it causes none of these problems at present - but it will shortly. The reason it avoids these problems is because all the unit testing is applied to containers (not the underlying code), and because the nodes never escape the containers. That is, nodes are always accessed the same pointer-arithmetic way, so the pointer alias optimization issue never arises.

Unfortunately, I'm shortly going to need to allow nodes to be extracted from the containers, and I'm probably going to need unit tests on the underlying unsafe code as well.

Rather than deal with this specific library, I have a much simpler extract from an old binary tree library here that suffers the same problems. In VC++9 it just works. Using MinGW GCC 4.4.0, a debug build works, but a release build fails. The problem is a mixture of inlining and failure of the optimizer to spot pointer aliasses.

Just to be clear - I don't want to here "WTF - GOTO!!!" or whatever. The issue is resolving the optimization/pointer problem. Though if you can find a way to write Tree_To_List that is properly structured and doesn't use hidden/disguised gotos to achieve it, I'm interested.

Also, a layer of template-based abstraction is missing (the template c_Bin_Tree_Tool doesn't do the whole job - c_Tool finishes the wrapping, but in a per-use way rather than in re-usable form. That's just a side-effect of extracting the relevant code.

What this code does is create an unbalanced binary tree by inserting nodes one-by-one, then balance that tree. The balancing works by converting the tree to a list (which in a way it already is), then converting the list back to a tree. The tree is dumped to stdio both before and after the balance.

bintree.h...

inline void* Ptr_Add  (void* p1, std::ptrdiff_t p2)  {  return (void*) (((std::ptrdiff_t) p1) + p2);  }

struct c_Bin_Tree_Closure
{
  typedef int (*c_Node_Comp) (void* p_Node1, void* p_Node2);

  c_Node_Comp    m_Node_Comp;
  std::ptrdiff_t m_Left, m_Right;
};

class c_BT_Policy_Closure
{
  private:
    const c_Bin_Tree_Closure* m_Closure;

  protected:
    void** Left_Of  (void* p)  {  return ((void**) Ptr_Add (p, m_Closure->m_Left ));  }
    void** Right_Of (void* p)  {  return ((void**) Ptr_Add (p, m_Closure->m_Right));  }

    int Compare_Node (void* p_Node1, void* p_Node2) const
    {
      return m_Closure->m_Node_Comp (p_Node1, p_Node2);
    }

  public:
    c_BT_Policy_Closure ()
    {
      m_Closure = 0;
    }

    void Set_Closure (const c_Bin_Tree_Closure& p_Closure)
    {
      m_Closure = &p_Closure;
    }
};

template<class tc_Policy>
class c_Bin_Tree_Tool : public tc_Policy
{
  public:
    c_Bin_Tree_Tool ()
    {
    }

    void *List_To_Tree (size_t p_Size, void* &p_List);
    void  Tree_To_List (void* p_Root, void* &p_First, void* &p_Last, size_t &p_Size);
    void  Balance      (void* &p);
    void  Insert       (void* &p_Tree, void* p_Node);
};

template<class tc_Policy>
void* c_Bin_Tree_Tool<tc_Policy>::List_To_Tree (size_t p_Size, void* &p_List)
{
  if (p_Size == 0)  return 0;

  size_t l_Size = p_Size / 2;
  void*  l_Ptr  = List_To_Tree (l_Size, p_List);

  void* l_This = p_List;
  p_List = *tc_Policy::Right_Of (l_This);

  *tc_Policy::Left_Of (l_This) = l_Ptr;

  l_Size = p_Size - (l_Size + 1);
  *tc_Policy::Right_Of (l_This) = List_To_Tree (l_Size, p_List);

  return l_This;
}

template<class tc_Policy>
void c_Bin_Tree_Tool<tc_Policy>::Tree_To_List (void* p_Root, void* &p_First, void* &p_Last, size_t &p_Size)
{
  //  Use left links as prev links and right links as next links

  void*  l_Start    = 0;         //  first-item-in-list pointer
  void*  l_Prev     = 0;         //  previous node in list
  void** l_Prev_Ptr = &l_Start;  //  points to the next (ie right) pointer for the next node.
  void*  l_Pos      = p_Root;
  void*  l_Next;
  void*  l_Parent   = 0;
  size_t l_Count    = 0;

  p_Last = 0;

  TOP_OF_LOOP:
    l_Next = *tc_Policy::Left_Of (l_Pos);

    if (l_Next != 0)
    {
      *tc_Policy::Left_Of (l_Pos) = l_Parent;  //  So we can find our way back up the tree
      l_Parent = l_Pos;
      l_Pos    = l_Next;
      goto TOP_OF_LOOP;
    }

  AFTER_STEP_PARENT:
    l_Next = *tc_Policy::Right_Of (l_Pos);

    *tc_Policy::Left_Of (l_Pos) = l_Prev;

    p_Last      = l_Pos;
    l_Prev      = l_Pos;
    *l_Prev_Ptr = l_Pos;
    l_Prev_Ptr  = tc_Policy::Right_Of (l_Pos);
    l_Count++;

    if (l_Next != 0)
    {
      l_Pos = l_Next;
      goto TOP_OF_LOOP;
    }

    if (l_Parent != 0)
    {
      l_Pos    = l_Parent;
      l_Parent = *tc_Policy::Left_Of (l_Pos);
      goto AFTER_STEP_PARENT;
    }

  *l_Prev_Ptr = 0;
  p_First     = l_Start;
  p_Size      = l_Count;
}

template<class tc_Policy>
void c_Bin_Tree_Tool<tc_Policy>::Balance (void* &p)
{
  void   *l_First, *l_Last;
  size_t l_Count;

  Tree_To_List (p, l_First, l_Last, l_Count);
  p = List_To_Tree (l_Count, l_First);
}

template<class tc_Policy>
void c_Bin_Tree_Tool<tc_Policy>::Insert (void* &p_Tree, void* p_Node)
{
  void** l_Tree = &p_Tree;

  while (*l_Tree != 0)
  {
    int l_Compare = tc_Policy::Compare_Node (*l_Tree, p_Node);
    l_Tree = ((l_Compare < 0) ? tc_Policy::Right_Of (*l_Tree) : tc_Policy::Left_Of (*l_Tree));
  }

  *l_Tree = p_Node;
  *tc_Policy::Right_Of (p_Node) = 0;
  *tc_Policy::Left_Of  (p_Node) = 0;
};

test_bintree.cpp...

#include <iostream>

#include "bintree.h"

struct c_Node
{
  c_Node *m_Left, *m_Right;
  int    m_Data;
};

c_Node g_Node0001 = { 0, 0,  1 };  c_Node g_Node0002 = { 0, 0,  2 };
c_Node g_Node0003 = { 0, 0,  3 };  c_Node g_Node0004 = { 0, 0,  4 };
c_Node g_Node0005 = { 0, 0,  5 };  c_Node g_Node0006 = { 0, 0,  6 };
c_Node g_Node0007 = { 0, 0,  7 };  c_Node g_Node0008 = { 0, 0,  8 };
c_Node g_Node0009 = { 0, 0,  9 };  c_Node g_Node0010 = { 0, 0, 10 };

int Node_Compare (void* p1, void* p2)
{
  return (((c_Node*) p1)->m_Data - ((c_Node*) p2)->m_Data);
}

c_Bin_Tree_Closure  g_Closure =
{
  (c_Bin_Tree_Closure::c_Node_Comp) Node_Compare,
  offsetof (c_Node, m_Left ),  offsetof (c_Node, m_Right)
};

class c_Tool : public c_Bin_Tree_Tool<c_BT_Policy_Closure>
{
  protected:
    typedef c_Bin_Tree_Tool<c_BT_Policy_Closure> c_Base;
  public:
    c_Tool ()  {  Set_Closure (g_Closure);  }
    void Insert  (c_Node* &p_Tree, c_Node* p_Node)  {  c_Base::Insert ((void*&) p_Tree, p_Node);  }
    void Balance (c_Node* &p)  {  c_Base::Balance ((void*&) p);  }
};

void BT_Dump (size_t p_Depth, c_Node* p_Node)
{
  if (p_Node != 0)
  {
    BT_Dump (p_Depth + 1, p_Node->m_Left);
    for (size_t i = 0; i < p_Depth; i++)  std::cout << " .";
    std::cout << " " << p_Node->m_Data << std::endl;
    BT_Dump (p_Depth + 1, p_Node->m_Right);
  }
}

int main (int argc, char* argv[])
{
  c_Tool  l_Tool;
  c_Node *l_Root = 0;

  l_Tool.Insert (l_Root, &g_Node0001);
  l_Tool.Insert (l_Root, &g_Node0002);
  l_Tool.Insert (l_Root, &g_Node0003);
  l_Tool.Insert (l_Root, &g_Node0004);
  l_Tool.Insert (l_Root, &g_Node0005);
  l_Tool.Insert (l_Root, &g_Node0006);
  l_Tool.Insert (l_Root, &g_Node0007);
  l_Tool.Insert (l_Root, &g_Node0008);
  l_Tool.Insert (l_Root, &g_Node0009);
  l_Tool.Insert (l_Root, &g_Node0010);

  BT_Dump (0, l_Root);  std::cout << "----------" << std::endl;

  l_Tool.Balance (l_Root);
  BT_Dump (0, l_Root);

  return 0;
}

The expected results are...

 1
 . 2
 . . 3
 . . . 4
 . . . . 5
 . . . . . 6
 . . . . . . 7
 . . . . . . . 8
 . . . . . . . . 9
 . . . . . . . . . 10
----------
 . . . 1
 . . 2
 . 3
 . . . 4
 . . 5
 6
 . . . 7
 . . 8
 . 9
 . . 10

What actually happens (MinGW GCC 4.4.0, optimized release build)...

 1
 . 2
 . . 3
 . . . 4
 . . . . 5
 . . . . . 6
 . . . . . . 7
 . . . . . . . 8
 . . . . . . . . 9
 . . . . . . . . . 10
----------
 1

As far as I can tell, the Balance operation runs correctly, but the BT_Dump function can't see all the changes to the m_Left and m_Right fields.

EDIT That's wrong - otherwise why am I seeing node 1 as the new root. I guess that's what happens when you rely too much on memory of an investigation that was done months ago.

EDIT Actually, node 1 as root is an issue, but since it was the old root - well, best to just ignore this what-the-problem-is stuff and work out your own theories ;-)

There are a number of issues, in terms of being standards-undefined, in the code. I think the biggest problem is that the links in the node struct are c_Node*, but since the unsafe code knows nothing about c_Node it accesses them (via pointer arithmetic) as void*.

One fix would be for the unsafe code to do all reads and writes through getter and setter function pointers, avoiding all the pointer arithmetic and ensuring that all access to c_Node instances is done through c_Node* pointers. Even better - the interface becomes a class with getter/setter methods etc. In the complete binary tree library, I have alternate policy classes that do this, though to be honest my real fix when the problem arose was to throw all the code in a "junk" folder on the basis that I rarely use it, and should probably be using boost intrusive lists anyway.

However, this still leaves the other much more complex and heavily used container library, which is not going away. I think I'm just going to have to do the very painful refactoring to get rid of the offsetof and pointer arithmetic stuff, but...

What exactly are the C++ rules - when precisely is the compiler allowed to fail to spot a possible pointer alias? And could the binary tree code above be rewritten so that it still uses pointer arithmetic, still allows the nodes to be accessed/modified both inside and outside the library, and yet is safe from this optimization issue?

A: 

Careless use of templates CAN cause bloat. But you're totally missing the point here.

  • Templates cause bloat when used carelessly, not carefully.
  • The quantity of runtime errors avoided by templates is massive.
  • The speed of templated code is far greater than non-templated code.
  • The size of the executable is absolutely trivial unless you run on an embedded system.
  • The STL provides a map container (which is a binary search tree) for your use.

You just haven't thought this through properly at all. The advantages templates offer far outweigh a few kb in executable size.

It's also worth noting that the code works as expected on Visual Studio 2010.

DeadMG
As I said in the question, my reaction on spotting the problem was to chuck the binary tree code in a junk folder - in the rare cases when I need this kind of thing I should probably be using boost intrusive lists (but not std::map which cannot support it). Note - the point of the library isn't just to have unbalanced trees with a balance operation, but rather (e.g.) to convert a tree to a list, perform some operation that works better on lists, and then convert back to a tree when finished.
Steve314
The much more important code I want to fix is an in-memory multiway (B+tree variant) library - more efficient (in many but not all cases) that std::map etc, plus handles some things that the standard RB tree containers cannot. Note - I wasn't criticising the use of templates, only looking for advice on keeping my templates as portable as possible. And the technique I describe for managing bloat is a well known technique, heavily used in the past if less so these days, which I've seen make three orders of magnitude difference to executable code size.
Steve314
+2  A: 

Did you switch off warnings? You should have got some "dereferencing type punned pointers violates strict aliasing", because thats exactly what you do at (void**) Ptr_Add(...

The compiler is free to assume that pointers to different types do not alias (with a few execpitions), and will produce optimized code which caches the targets of pointers in registers. To avoid that, you have to use unions to convert between different pointers types. Quoting from http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Optimize-Options:

In particular, an object of one type is assumed never to reside at the same address as an object of a different type, unless the types are almost the same. For example, an unsigned int can alias an int, but not a void* or a double. A character type may alias any other type.

Pay special attention to code like this:

     union a_union {
        int i;
        double d;
      };

     int f() {
        union a_union t;
        t.d = 3.0;
        return t.i;
      }

The practice of reading from a different union member than the one most recently written to (called “type-punning”) is common. Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type. So, the code above will work as expected. See Structures unions enumerations and bit-fields implementation. However, this code might not:

     int f() {
        union a_union t;
        int* ip;
        t.d = 3.0;
        ip = &t.i;
        return *ip;
      }

Similarly, access by taking the address, casting the resulting pointer and dereferencing the result has undefined behavior, even if the cast uses a union type, e.g.:

     int f() {
        double d = 3.0;
        return ((union a_union *) &d)->i;
      }

The -fstrict-aliasing option is enabled at levels -O2, -O3, -Os.

In your case you could use something like

union {
    void** ret_ptr;
    ptrdiff_t in_ptr;
}

but the code in ptr_add just looks horrible ;-)

Or just disable this specific optimization with "fno-strict-aliasing". Better fix your code though ;-)

drhirsch
I specified -Wno-invalid-offsetof for obvious reasons. Didn't seen the punned pointers warning - maybe it gets turned off along with the offsetof warning?
Steve314
Neither I nor the compiler may always be able to detect cases of pointer abuse :-) Try -Wstrict-aliasing. There is an __attribute__((__may_alias__)) which can be attached to variable definitions for a quick fix (hack). Maybe you should risk some nloat and just use a templated, nice version of Ptr_Add().
drhirsch
That is a very very interesting section of the MinGW manual. Thanks. Will probably be accepting, but I'll see what other answers I get first.
Steve314
@drhirsch - I don't think a template Ptr_Add will cause bloat - it'll be a big surprise if it doesn't inline as something smaller than a call anyway - but I think there are more problems (of the same nature) than just the Ptr_Add. The attribute is interesting, but I assume GCC-specific, so feels like saving up problems for later. I think you're right about unions - finding the right places to apply that should be a relatively quick fix.
Steve314
@drhisrsch - Applying the union-based punning in Ptr_Add, c_Tool::Insert and c_Tool::Balance seems to be enough to fix that code - I'm pretty sure it'll work as an easy fix. Of course I may still want to get rid of all the offsetof and pointer arithmetic anyway, but - well, there's the force of laziness to consider.
Steve314
You are right, a templated version will most likely cause no bloat, I should have put that in quotes, because it was meant to be ironic. Which of course is hard to see in written text ;-)
drhirsch
Instead to compiler will probably inline such a simple function and completly optimize it away. Maybe some non inlined template versions of the functions are stored somewhere in the executable, but never actually executed, which may make the executable somewhat bigger, but this is proably not harmful ;-)
drhirsch
@drhirsch - That's what function level linking is for - though for a template function defined without a prior prototype and (belt and braces) with an explicit "inline" keyword, I don't think the function will be included in object files. There's no need for a just-in-case thing because the compiler should know that everywhere it's referenced it'll be able to see the full code. Of course that's a guess, but I think a reasonable one - and you'd need a lot of unused Ptr_Add instantiations to cause noticable bloat anyway. And if all else fails, I could always rewrite as a macro.
Steve314