views:

152

answers:

6

hi,

I am trying to implement LRU Cache using C++ . I would like to know what is the best design for implementing them. I know LRU should provide find(), add an element and remove an element. The remove should remove the LRU element. what is the best ADTs to implement this For ex: If I use a map with element as value and time counter as key I can search in O(logn) time, Inserting is O(n), deleting is O(logn).

Thanks & Regards,

Brett.

A: 

I suggest a heap and maybe a Fibonacci Heap

rano
+2  A: 

When you say priority, I think "heap" which naturally leads to increase-key and delete-min.

msw
+1  A: 

I would not make the cache visible to the outside world at all if I could avoid it. I'd just have a collection (of whatever) and handle the caching invisibly, adding and removing items as needed, but the external interface would be exactly that of the underlying collection.

As far as the implementation goes, a heap is probably the most obvious. It has complexities roughly similar to a map, but instead of building a tree from linked nodes, it arranges items in an array and the "links" are implicit based on array indices. This increases the storage density of your cache and improves locality in the "real" (physical) processor cache.

Jerry Coffin
A: 

I'd go with a normal heap in C++.

With the std::make_heap (guaranteed by the standard to be O(n)), std::pop_heap, and std::push_heap in #include, implementing it would be absolutely cake. You only have to worry about increase-key.

Dragontamer5788
+1  A: 

The best way to implement an LRU is to use the combination of a std::list and stdext::hash_map (want to use only std then std::map).

  • Store the data in the list so that the least recently used in at the last and use the map to point to the list items.
  • For "get" use the map to get the list addr and retrieve the data and move the current node to the
    first(since this was used now) and update the map.
  • For "insert" remove the last element from the list and add the new data to the front and update the map.

This is the fastest u can get, If u are using a hash_map u should almost have all the operations done in O(1). If using std::map it should take O(logn) in all cases.

A very good implementation is available here

aeh
I looked at the code of `lru_cache`. It's based on a `std::map` which means `O(n log n)` complexity.
Matthieu M.
@Matthieu: its not O(nlogn) but O(logn) coz each insert/erase/get is O(logn) according to stl doc. However just replacing the std::map with stdext::hash_map with proper hash function will surely get you O(1)
aeh
@aeh: yep sorry, got a bit carried away here. `O(log n)` of course. And we both agree that a hash map would probably be better.
Matthieu M.
+1  A: 

One major issue with LRU caches is that there is little "const" operations, most will change the underlying representation (if only because they bump the element accessed).

This is of course very inconvenient, because it means it's not a traditional STL container, and therefore any idea of exhibiting iterators is quite complicated: when the iterator is dereferenced this is an access, which should modify the list we are iterating on... oh my.

And there are the performances consideration, both in term of speed and memory consumption.

It is unfortunate, but you'll need some way to organize your data in a queue (LRU) (with the possibility to remove elements from the middle) and this means your elements will have to be independant from one another. A std::list fits, of course, but it's more than you need. A singly-linked list is sufficient here, since you don't need to iterate the list backward (you just want a queue, after all).

However one major drawback of those is their poor locality of reference, if you need more speed you'll need to provide your own custom (pool ?) allocator for the nodes, so that they are kept as close together as possible. This will also alleviate heap fragmentation somewhat.

Next, you obviously need an index structure (for the cache bit). The most natural is to turn toward a hash map. std::tr1::unordered_map, std::unordered_map or boost::unordered_map are normally good quality implementation, some should be available to you. They also allocate extra nodes for hash collision handling, you might prefer other kinds of hash maps, check out Wikipedia's article on the subject and read about the characteristics of the various implementation technics.

Continuing, there is the (obvious) threading support. If you don't need thread support, then it's fine, if you do however, it's a bit more complicated:

  • As I said, there is little const operation on such a structure, thus you don't really need to differentiate Read/Write accesses
  • Internal locking is fine, but you might find that it doesn't play nice with your uses. The issue with internal locking is that it doesn't support the concept of "transaction" since it relinquish the lock between each call. If this is your case, transform your object into a mutex and provide a std::unique_ptr<Lock> lock() method (in debug, you can assert than the lock is taken at the entry point of each method)
  • There is (in locking strategies) the issue of reentrance, ie the ability to "relock" the mutex from within the same thread, check Boost.Thread for more information about the various locks and mutexes available

Finally, there is the issue of error reporting. Since it is expected that a cache may not be able to retrieve the data you put in, I would consider using an exception "poor taste". Consider either pointers (Value*) or Boost.Optional (boost::optional<Value&>). I would prefer Boost.Optional because its semantic is clear.

Matthieu M.