views:

218

answers:

3

I would like to find and reuse (if possible) a map implementation which has the following attributes:

  1. While the number of entries is small, say < 32, the underlying storage should be a single array laid out as [key0, val0, key1, val1, ...]. This scheme avoids allocating many small Entry objects and makes lookups extremely fast (even though they are sequential scans!) on modern CPUs, thanks to cache locality and the absence of pointer indirection into the heap.

  2. The map should maintain insertion order for key/value pairs regardless of the number of entries, similar to LinkedHashMap.
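Roughly, I'm picturing something like the following sketch (all names hypothetical; this is an illustration of the two requirements, not an existing library): a flat interleaved array for small maps that spills into a LinkedHashMap past a threshold, preserving insertion order in both modes.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: flat [key0, val0, key1, val1, ...] storage that
 *  spills into a LinkedHashMap once the entry count crosses a threshold. */
final class AdaptiveMap<K, V> {
    private static final int THRESHOLD = 32;   // entries, not array slots
    private Object[] flat = new Object[8];     // interleaved keys and values
    private int size = 0;                      // number of entries
    private Map<K, V> spill = null;            // non-null once we outgrow the array

    @SuppressWarnings("unchecked")
    public V get(K key) {
        if (spill != null) return spill.get(key);
        for (int i = 0; i < size * 2; i += 2)  // linear scan: cache-friendly for small n
            if (key.equals(flat[i])) return (V) flat[i + 1];
        return null;
    }

    @SuppressWarnings("unchecked")
    public V put(K key, V value) {
        if (spill != null) return spill.put(key, value);
        for (int i = 0; i < size * 2; i += 2) {
            if (key.equals(flat[i])) {         // overwrite keeps the original position
                V old = (V) flat[i + 1];
                flat[i + 1] = value;
                return old;
            }
        }
        if (size == THRESHOLD) {               // migrate, preserving insertion order
            spill = new LinkedHashMap<>();
            for (int i = 0; i < size * 2; i += 2)
                spill.put((K) flat[i], (V) flat[i + 1]);
            flat = null;
            return spill.put(key, value);
        }
        if (size * 2 == flat.length)
            flat = Arrays.copyOf(flat, flat.length * 2);
        flat[size * 2] = key;
        flat[size * 2 + 1] = value;
        size++;
        return null;
    }

    public int size() { return spill != null ? spill.size() : size; }
}
```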

We are working on in-memory representations of huge (millions of nodes/edges) graphs in Scala. Such a map would let us store node/edge attributes, as well as the edges per node, much more efficiently for the 99%+ of nodes and edges that have few attributes or neighbors, while preserving chronological insertion order for both attributes and edges.

If anyone knows of a Scala or Java map with such characteristics I would be much obliged.

Thanks

A: 

In Java you can maintain a 2D array (a spreadsheet). I wrote a program that defines a 2D array with three columns of data and three columns for looking up the data: TestID, SubtestID, and Mode. This lets me look up a value by TestID and Mode, or any combination, or reference a cell by static placement. The table is loaded into memory at startup and referenced by the program. It is infinitely expandable, and new values can be added as needed.

If you are interested, I can post a code source example tonight.

Another idea may be to maintain a database in your program. Databases are designed to organize large amounts of data.

Adam Outler
This answer does not address my specific, narrow question about an adaptive map. We did consider other graph representations, but for lots of technical reasons I can't go into, we must maintain a "localized" design where graph nodes, edges, etc. (all Atoms, really) have to have their own attribute map objects. Again, I want to avoid the common pattern of many tiny Map.Entry-like objects for small (< 32 entries) maps, both to save memory and to maintain on-CPU cache locality (i.e., scanning through a small array is faster in practice than following a chain of heap pointers).
Alex Kravets
+1  A: 

While I'm not aware of any implementations that exactly fit your requirements, you may be interested in peeking at Flat3Map (source) in the Jakarta Commons library.

Unfortunately, the Jakarta libraries are rather outdated (e.g., no support for generics in the latest stable release, although it is promising to see that this is changing in trunk) and I usually prefer the Google Collections, but it might be worth your time to see how Apache implemented things.

Flat3Map doesn't preserve the order of the keys, unfortunately, but I do have a suggestion regarding your original post. Instead of storing the keys and values in a single array like [key0, val0, key1, val1, ...], I recommend using parallel arrays; that is, one array with [key0, key1, ...] and another with [val0, val1, ...]. Normally I am not a proponent of parallel arrays, but at least this way you can have one array of type K, your key type, and another of type V, your value type. At the Java level this has its own wart: you cannot write K[] keys = new K[32]; instead you'll need a bit of typecasting.
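The typecasting workaround alluded to above looks roughly like this (a minimal sketch with a fixed capacity, not a full map; the class name is made up). The unchecked cast is safe here only because the arrays never escape the class:

```java
import java.util.Objects;

/** Sketch of the parallel-array layout with the unchecked-cast idiom. */
final class FlatPairs<K, V> {
    @SuppressWarnings("unchecked")
    private final K[] keys = (K[]) new Object[32]; // new K[32] is illegal: generic array creation
    @SuppressWarnings("unchecked")
    private final V[] vals = (V[]) new Object[32];
    private int size = 0;                          // fixed capacity kept for brevity

    public void put(K key, V value) {
        for (int i = 0; i < size; i++) {
            if (Objects.equals(keys[i], key)) {    // overwrite in place
                vals[i] = value;
                return;
            }
        }
        keys[size] = key;                          // append preserves insertion order
        vals[size] = value;
        size++;
    }

    public V get(K key) {
        for (int i = 0; i < size; i++)
            if (Objects.equals(keys[i], key)) return vals[i];
        return null;
    }
}
```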

ide
Now this *is* the type of answer I was looking for. In my previous work, I found that "flat" maps (as the Apache people call them) become slower than standard hash maps only after 32 or even 64 entries, probably because modern CPUs have very good on-core caches and pointer indirection into the heap causes memory stalls. Ideally the switch from a "flat" to a standard map would happen at a configurable threshold. I would upvote this answer, but that would remove the question from the Unanswered queue :-) I want to keep the question prominent for a little while longer. Thanks for your answer.
Alex Kravets
+1  A: 

Have you measured with a profiler whether LinkedHashMap is too slow for you? Maybe you don't need that new map; premature optimization is the root of all evil. In any case, when processing millions or more pieces of data per second, even the best-optimized map can be too slow, because at that scale every method call also costs performance. Then all you can do is rewrite your algorithms from Java collections to arrays (i.e., int -> object maps).
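As a hypothetical illustration of that last point (class and method names made up): when keys are dense integers, a plain array can replace a Map<Integer, V> outright, eliminating boxing and Entry objects entirely:

```java
import java.util.Arrays;

/** Hypothetical example: dense int IDs let a plain array replace a Map<Integer, String>. */
class IntKeyedStore {
    private String[] byId;                           // index is the key

    IntKeyedStore(int capacity) { byId = new String[capacity]; }

    void put(int id, String value) {
        if (id >= byId.length)                       // grow on demand
            byId = Arrays.copyOf(byId, Math.max(id + 1, byId.length * 2));
        byId[id] = value;
    }

    String get(int id) {                             // O(1), no boxing, no Entry objects
        return id < byId.length ? byId[id] : null;
    }
}
```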

iirekm
The issue is not the speed, or rather not just the speed; it's also the number of small Entry objects allocated, retained, and GC'ed.
Alex Kravets
But allocation time adds to the slowness: the more objects you allocate, the slower the program is, so it all reduces to performance measurement with the profiler.
iirekm
Today, when most computers have 4 GB of memory, memory-usage optimizations rarely make sense. When they do, it's usually best to use the Flyweight pattern; an example can be found in TreeModel from Java Swing. Instead of node.getAttribute(key) = node.attributeMap.get(key), use something like node.getAttribute(key) = graph.attributeModel.getAttribute(node).
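The flyweight idea suggested here could look roughly like this (a sketch with made-up names, not Swing's actual TreeModel API): attribute storage moves out of the nodes into one graph-level model, so nodes themselves stay lightweight:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical flyweight sketch: nodes carry no attribute map of their own. */
final class Node {
    final int id;
    Node(int id) { this.id = id; }
}

/** One graph-level store; the per-node structure lives here, not in Node. */
final class AttributeModel {
    private final Map<Node, Map<String, Object>> table = new HashMap<>();

    void setAttribute(Node node, String key, Object value) {
        table.computeIfAbsent(node, n -> new HashMap<>()).put(key, value);
    }

    Object getAttribute(Node node, String key) {
        Map<String, Object> attrs = table.get(node);
        return attrs == null ? null : attrs.get(key);
    }
}
```

Note this only relocates the maps rather than eliminating them; the win is that the millions of attribute-free nodes pay nothing.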
iirekm
Allocation adds to the slowness, I agree; that's why I don't want to allocate all those unnecessary Map.Entry objects that simply hold tree pointers :-)
Alex Kravets
+1 for the "premature optimization is the root of all evil". :D
Eric-Karl