views:

129

answers:

3

I am looking for an efficient indexed persistent data structure. I typically work in .NET and am aware of FSharp's Map however that implementation and most others I am aware of only provide a single 'index', the left side of the mapping.

Basically here is the scenario

public class MyObject
    public int Id { get; }
    public int GroupId { get; }
    public string Name { get; }

Where the Id of an object will be globally unique set of items added. GroupId may have duplicate values, and I want to be able to query for all values with a matching GroupId and within a GroupId names will be unique but may be duplicated across different GroupId's. This not a situation where I can simply create a composite key of the 3 fields as I need independent access to groups of the items based on particular field values.

I can do this, and have in the past, using dictionaries of dictionaries, which has been recommended in other posts here on STackoverflow...however, I also want the data structure to be 1) Fully Persistent and everything that means 2) efficient in memory - meaning that versions need to share as many nodes as possible 3) efficient in modifcations - I would like it to be fast

I realize that I am asking for quite a bit here but I wanted to ask to avoid even trying to re-invent the wheel if it has already been done.

Thanks

A: 

Just as you could use a Dictionary of Dictionaries, I expect that e.g. an F# Map of Maps may be what you want, e.g.

Map<int, Map<string, MyObject> >  // int is groupid, string is name

maybe? I am unclear if you also need fast access by integer id.

You might also check out Clojure's library; I don't know much about Clojure, but a range of efficient persistent data structures seems to be one of Clojure's strengths.

Brian
I am aware of clojure. What I am looking for is an integrated data structure that handles the multi-key lookup for me. A map of maps will work but it means that managing all the map replacements has to be done by me to keep the persistent nature of the structure.
mwatts42
A: 

It seems that you are trying to apply OOP principles to your FP application.

If you think in terms of functions, what is it you are trying to do?

If you use a List, for example, you can just tell it you want to pull all the objects that have a certain group value.

If you need fast access by group you could have a Map of Lists so you can pull up all the objects in a group.

There are different data structures and many functions that work on each, but you should first think about your problem from a functional, not object-oriented, POV.

James Black
Thanks for your comment but I think I am looking at it from a functional point of view. I need something a little smarter than just lists and pulling from them based on a criteria because my structure can have potentially about 4-5 million items in it and they all need to be in memory at once. In addition I need get as close to O(1) access as I can get, which means simply using a list and filter is not efficient enough
mwatts42
+2  A: 

I am not sure why elsewhere, and in existing replies to your question, people recommend to imbricate existing structures. Imbricating structures (maps of maps, maps of lists, dictionaries of dictionaries, ...) only works for two indexes if one is looser than the other (two values having the same index for Index1 implies these two values have the same index for Index2), which is an unnecessary constraint.

I would use a record of maps, as many of them as you want different indexes, and I would maintain the invariant that every value that is present in a map is present in all the others in the same record. Adding a value obviously requires adding it to all maps in the record. Similarly for removal. The invariant can be made impossible to transgress from the outside through encapsulation.

If you worry that the values stored in your data structure would be duplicated, don't. Each map would only contain a pointer. They would all point to the same single representation of the value. Sharing will be as good as it already is with simple single-indexed maps.

Pascal Cuoq
Pascal, thanks for your comment. I am actually using something very close to what you describe already, I was just looking to see if something better already existed in the functional world. I guess what I was hoping for was basically a data structure that is a functional database table that would let me do very fast efficient lookup on contents.
mwatts42