views:

169

answers:

4

Hi all,

I am having a little dilemma with a C++ code. Its actually a performance issue. I am trying to traverse through two hash_maps which is causing a lot of slowness. Heres the hash map class code. Let me know if I am missing something from this.

  template<class Handle, class Object, class HashFunc, vector<Object *> (*initFunc)()>
  class ObjMapping: public BaseCache
  {

  public:
    typedef ObjMapping<Handle, Object, HashFunc, initFunc> ObjMappingType;
    typedef InvalidObjectException<Handle> InvalidHandle;
    typedef typename ReferenceCounted<Object>::ObjRef ObjRef;

  protected:
    typedef dense_hash_map<Handle, pair<int, ObjRef> , HashFunc> ObjHashMap;
    typedef typename dense_hash_map<Handle, pair<int, ObjRef> , HashFunc>::iterator ObjHashMapIterator;
    typedef typename dense_hash_map<Handle, pair<int, ObjRef> , HashFunc>::const_iterator ObjHashMapConstIterator;

  public:

    class iterator
    {

    public:
      typedef Object &reference;
      typedef ObjRef pointer;

      iterator(ObjMappingType &container, ObjHashMapIterator start) :
        base(start), container_(&container)
      {
        incIterCount_();
      }

      iterator(const iterator &rhs) :
        base(rhs.base), container_(rhs.container_)
      {
        Monitor crit(container_->getIterMutex());
        incIterCount_();
      }

      void operator =(const iterator &rhs)
      {
        if (this != &rhs)
        {
          Monitor crit(container_->getIterMutex());
          decIterCount_();
          base = rhs.base;
          container_ = rhs.container_;
          incIterCount_();
        }
      }

      ~iterator()
      {
        Monitor crit(container_->getIterMutex());
        decIterCount_();
      }

      reference operator *() const
      {
        return *base->second.second;
      }
      pointer operator ->() const
      {
        return base->second.second;
      }

      iterator operator ++()
      {
        {
          Monitor crit(container_->getIterMutex());
          decIterCount_();
          ++base;
          incIterCount_();
        }
        return *this;
      }
      iterator operator ++(int)
      {
        iterator result = *this;
        {
          Monitor crit(container_->getIterMutex());
          decIterCount_();
          ++base;
          incIterCount_();
        }
        return result;
      }
      iterator operator --()
      {
        {
          Monitor crit(container_->getIterMutex());
          decIterCount_();
          --base;
          incIterCount_();
        }
        return *this;
      }
      iterator operator --(int)
      {
        iterator result = *this;
        {
          Monitor crit(container_->getIterMutex());
          decIterCount_();
          --base;
          incIterCount_();
        }
        return result;
      }

      bool operator ==(const iterator &i) const
      {
        return (base == i.base);
      }
      bool operator !=(const iterator &i) const
      {
        //return !(*this == i);
        return (base != i.base);
      }

    private:
      void incIterCount_()
      {
        if (!container_->endIterator(base))
        {
          ++base->second.first;
        }
      }
      void decIterCount_()
      {
        if (!container_->endIterator(base) && --base->second.first == 0)
        {
          container_->wake();
        }
      }

      ObjHashMapIterator base;
      ObjMappingType *container_;
    };

    ~ObjMapping()
    {
    }

    bool validObj(const Handle &id) const
    {
      Monitor crit(mutex);
      MethodTracker track ("ObjMapping::validObj");
      return objs.find(id) != objs.end();
    }

    ObjRef getObj(const Handle &id) const
    {
      Monitor crit(mutex);
      MethodTracker track ("ObjMapping::getObj");
      if (!validObj(id))
      {
        throw InvalidHandle(id);
      }
      return objs.find(id)->second.second;
    }

    void addObj(auto_ptr<Object> obj)
    {
      Monitor crit(mutex);
      Handle h(obj->getID());

      // Stop iterator changes while container is being altered
      Monitor iter(iterMutex_);
      objs.insert(typename ObjHashMap::value_type(h, make_pair(0, ReferenceCounted<Object>::alloc(
          obj))));
    }

    // Will remove the given object from the cache
    // NOTE: This is a dangerous operation: it will block until there are no references to the
    // object other than the one in the cache, which opens many possibilities for deadlocks, 
    // and means that it is not safe to store references from the cache outside it.
    void removeObj(const Handle &id)
    {
      Monitor crit(mutex);
      ObjHashMapIterator entry = objs.find(id);
      if (entry != objs.end())
      {
        // If there are other references to the object wait for them to be released
        entry->second.second.ensureUnique();

        // Wait until no further iterators for this entry
        Monitor crit(iterMutex_);
        while (entry->second.first != 0)
        {
          iterBlock_.wait(iterMutex_);
        }

        objs.erase(entry);
      }
    }

    // Will remove the given object from the cache if the cache contains the only reference to it,
    // returns true only if the object is not in the cache
    bool releaseObj(const Handle &id)
    {
      Monitor crit(mutex);
      ObjHashMapIterator entry = objs.find(id);
      if (entry != objs.end())
      {
        Monitor crit(iterMutex_);
        if (entry->second.first != 0 || entry->second.second.references() != 1)
        {
          return false;
        }

        objs.erase(entry);
      }
      return true;
    }

    size_t size() const
    {
      return objs.size();
    }

    iterator begin()
    {
      Monitor crit(iterMutex_);
      MethodTracker track ("ObjMapping::begin");
      return iterator(*this, objs.begin());
    }
    iterator end()
    {
      Monitor crit(iterMutex_);
      MethodTracker track ("ObjMapping::end");
      return iterator(*this, objs.end());
    }

    void wake()
    {
      iterBlock_.broadcast();
    }

    Mutex &getIterMutex()
    {
      return iterMutex_;
    }

    void dump(ostream &out)
    {
      Monitor crit(mutex);
      out << "Mapping cache contains " << objs.size() << " base objects" << endl;
    }

    // Will reload *all* objects from the cache
    // NOTE: This is a *VERY* dangerous operation: see comments above for removeObj
    void reload()
    {
      Monitor crit(mutex);

      // Delete all objects in cache
      ObjHashMapIterator i = objs.begin();
      while (i != objs.end())
      {
        // If there are other references to the object wait for them to be released
        i->second.second.ensureUnique();

        // Wait until no further iterators for this entry
        Monitor crit(iterMutex_);
        while (i->second.first != 0)
        {
          iterBlock_.wait(iterMutex_);
        }

        objs.erase(i++);
      }

      // Reload all objects from DB
      vector<Object *> base = initFunc();
      for (typename vector<Object *>::const_iterator i = base.begin(); i != base.end(); ++i)
      {
        Handle id = (*i)->getID();
        objs.insert(make_pair(id, make_pair(0, ReferenceCounted<Object>::alloc(
            auto_ptr<Object> (*i)))));
      }
    }

    static ObjMapping<Handle, Object, HashFunc, initFunc> &getTable()
    {
      static bool created = false;
      static Mutex createMutex;
      MethodTracker track ("ObjMapping::getTable");
      static auto_ptr<ObjMapping<Handle, Object, HashFunc, initFunc> > theTable;
      if (!created)
      {
        Monitor crit(createMutex);
        if (!created)
        {
          theTable.reset(new ObjMapping<Handle, Object, HashFunc, initFunc> );
          created = true;
        }
      }
      return *theTable;
    }

  protected:
    friend class iterator;
    bool endIterator(ObjHashMapIterator &it)
    {
      return it == objs.end();
    }

    ObjMapping() :
      mutex(Mutex::Recursive)
    {
      vector<Object *> base = initFunc();
      objs.set_empty_key(0);
      for (typename vector<Object *>::const_iterator i = base.begin(); i != base.end(); ++i)
      {
        Handle id = (*i)->getID();
        objs.insert(make_pair(id, make_pair(0, ReferenceCounted<Object>::alloc(
            auto_ptr<Object> (*i)))));
      }
    }

  private:
    ObjMapping(const ObjMapping &);
    const ObjMapping &operator =(const ObjMapping &);

    mutable Mutex mutex;
    ObjHashMap objs;
    Mutex iterMutex_;
    Condition iterBlock_;
  };

And I have created two objects out of this like,

typedef ObjMapping<RosterID, Roster, __gnu_cxx::hash<RosterID>, Roster::readAllRosters> RosterTable;
typedef ObjMapping<RosterHeaderID, RosterHeader, __gnu_cxx::hash<RosterID>, RosterHeader::readAllRosterHeaders> RosterHeaderTable;

Two methods Roster::readAllRosters & RosterHeader::readAllRosterHeaders are database queries which extracts the data returns as a vector.

The traversal code that is causing the slowness is given below,

for (RosterTable::iterator it = RosterTable::getTable().begin(); it != RosterTable::getTable().end(); ++it) {
  if (RosterHeaderTable::getTable().getObj( it->getHeader() )->getEmployee() == getID())
  {
    // Adding a roster to a map.
  }
}

Can anyone see anything that can be done to improve this code ? Also note that if I do comment out the if statement in the traversal code it runs fine. And you can also see in the hash map class that I have mutexd most of the methods that might be causing dead locks. Please help !

+4  A: 

Do the methods ++() and --() really need to be protected? When are two threads going to use same iterator? You need to protect the hash map, not the iterator. Locking and unlocking are expensive operations. Removing this will certainly improve the performance.

Manoj R
this is a multi threaded application hence the locking. but i can see where you are coming from interms of it being expensive.
kuzyt
bdw the iteration is happens at the very beginning the during the program start to build up the cache. just so you know coz i forgot to mention it before.
kuzyt
@kuzyt: People often recommend not making containers into monitors; the locks should be on a higher-level object than the raw containers. The reason is that other threads can invalidate iterators while you're in the middle of an iteration loop (say, by calling `reload` in your case), making your program crash due to an unpredictable race condition anyway. And if you say "well, I won't call reload while another thread might be reading from it", then container is de-facto immutable while you're iterating, and you don't need locks on every operation.
Doug
A: 

You have hash maps but you want to go through all the elements, right?

Then why not add another container that hold data in a linear way? Like a vector or a list?

That way you just provide iterators that go through this container when the user want to go through all the elements.

It have (low) a cost, but it makes things fast in a simple way.

Klaim
i could give that a go as well.
kuzyt
That's how I fixed a lot of similar problems. It's a really simple solution and if you don't have hard memory restrictions, it should really help. Just make sure all the containers are synchronized - not hard to achieve.
Klaim
A: 

Coming at it from a different angle: I don't know where you store your employee IDs, but could you perhaps do all this stuff on the DB already, before filling your RosterTable?

Christian Severin
you are right. i can get this stuff from db. but the reason for caching is to make things less expensive in terms of db queries. we have multiple dbs that we talk to at the same time.
kuzyt
+2  A: 

Can anyone see anything that can be done to improve this code ?

Your threading/locking model is naïve. There are at least four problems that I can see.

  • Recursive locking;
  • locking in every member function;
  • missing locking in some functions;
  • the way you use function local static variables is not thread safe.

The recursive locking is suboptimal. It costs more both because it more complicated and because you are obtaining the exclusive access more than once in one thread. Recursive locking can be avoided by splitting the implementation to public/front end and internal/back end functions. Front end function do the locking and call back end function that do not contain any locking and never call any front end functions.

However, locking every member function is suboptimal, again. Many operations using the container will have to lock/unlock the container several times during what should be (semantically) atomic operation. It is also likely incorrect. While individual operations over the container are exclusive, the state of the container can change between these operations in manner that the code does not expect. Unless you know/can prove that this is not the case for you, use external locking over the whole container.

Also, at least size() is missing a lock.

wilx
trust me when i say in still a novice at this. but ur tips are excellent. thank you. so do you think if i can actually get rid of the mutexes iterator class it mite solve the issue ? i access the hash map object via the getTable method singleton. should i think about having a mutex there perhaps ???
kuzyt