views:

239

answers:

13

I'm looking for a data structure similar to a dictionary that returns the set of all related items to a key.

For example, I would use it like this:

var data = new FancyDataStructure();

data.Add(new string[] {"Elizabeth", "Liz", "Betty"});
data.Add(new string[] {"Bob", "Robert", "Rob"});

string[] alternateNames1 = data["Betty"];
string[] alternateNames2 = data["Liz"]

In this instance, alternateNames1 would be an array containing "Liz" and "Elizabeth", and alternateNames2 would be an array containing "Elizabeth" and "Betty."

I don't want to reinvent this, but I couldn't find any examples of such a structure.

Update

Thank you to those that have written back with suggestions. Many people have suggested using some version of Dictionary<string, IEnumerable<string>>. Currently I am using this approach, but it doesn't actually fulfill the requirement without being horribly difficult to maintain. Every value in every list needs to be able to function as a key to every other value ever added to it in a set.

Thus, given the following:

data.Add(new string[] {"Elizabeth", "Liz"}
data.Add(new string[] {"Liz", "Betty"}
alternates = data["Betty"];

I would expect alternates to now contain "Elizabeth," and "Liz."

It looks as though I might just have to build such a structure to suit my needs. Keep the ideas coming though!

Brian

+1  A: 

System.Collections.Generic namespace and the System.Collections are loaded with KeyValue pair dictionaries, sorted dictionaries, List Objects and much more.

System.Collections.Generic.Dictionary<int, string> dic = new Dictionary<int, string>();
        dic.Add(1, test);

or a nested list inside a dictionary

Dictionary<string, List<string>> dic = new Dictionary<string, List<string>>();
List<string> alternatives = new List<string>();
alternatives.Add("Brenda");
dic.Add("Betty", alternatives);
sadboy
A: 

Something like this seems simple enough.

var data = new List<string[]>();

data.Add(new string[] {"Elizabeth", "Liz", "Betty"});
data.Add(new string[] {"Bob", "Robert", "Rob"});

var alternateNames1 = data.Where(x =>x.Contains("Betty")).Select(x => x.Where(y => y != "Betty"));
Paul Creasey
This won't scale for large "sets", you will still have O(n) lookups; if that is OK then go for it.
Jared Updike
A: 

The de facto alt.net standard is in Iesi.Collections, but the base class library only has HashSet<T> in dotnet 3.5 or above.

I've used "group by" like clauses in linq to easily remove duplicates from arbitrary IEnumerable<T> collections, but that doesn't quite give you set semantics.

HashSet<> is close to what you want.

Based on your requirements, I don't think there's something off the shelf that would map strings to pre-existing collections; basically, you'd have to write a class that takes a method like StoreAssociations<<T>>(IEnumerable<<T>> names), converts IEnumerable to HashSet, and iterates over each item in the HashSet to add a mapping in an IDictionary<string,HashSet<T>> to the newly-created hashset.

JasonTrue
A: 

I use this:

It has a generic Set<a> type and implements all of the lovely iterators, .Contains, .Count, etc.

Jared Updike
what does the C5 Set have over the BCL HashSet?
Jimmy
-1: OP's title might be a little misleading, but his question is not. OP doesn't want a set class, he wants a special kind of dictionary which maps multiple keys to the same value.
Juliet
HashSet doesn't exist on older frameworks :-) (before the relatively new 3.5)
Jared Updike
A: 

all you need for storage is a Dictionary.

Dictionary<string,string[]> data;

Add(string[] list) { 
    foreach(var name in list)
       alternates[name] = list;
}
Add(new string[] {"Elizabeth", "Liz", "Betty"});
Add(new string[] {"Bob", "Robert", "Rob"});
Jimmy
That's not quite what the questioner wants, because alternates["Bob"] includes "Bob".
Dominic Cooney
mmmm. I'd add that filter on the output side though. This way, you use O(N) memory instead of O(N^N)
Jimmy
A: 

I would just use the type Dictionary<string, IEnumerable<string>>. To build this structure from a list of lists, you could have code like this:

var alternateNames = new string[][] {
    new string[] { "Elizabeth", "Liz", "Betty" },
    new string[] { "Bob", "Robert", "Rob" }, };
var altNameLookup = 
    (
        from nameList in alternateNames
        from name in nameList
        select new { 
            Name = name, NameList = nameList.Except(new string[] { name } ) }
    ).ToDictionary(o => o.Name, o => o.NameList);
Jacob
A: 

Just a thought in another direction - strongly typed datasets seem to have a lot going for them. And serialized as byte arrays they are pretty fast for moving multidimensionally structured data around.

Iteration and Linq capability are sort of built in.

Maybe overkill for a lot of stuff, but I have a number of places where I stored the whole dataset in one varbinary(max) columnn in SQL.

Charles Hankey
A: 

You basically have a dictionary where multiple keys map to the same value. There's no built-in data structure which supports the operation you want, but its easy to represent as a Dictionary{string, HashSet{string}} in .NET:

static void AddNames(Dictionary<string, HashSet<string>> map, params string[] names)
{
    for (int i = 0; i < names.Length; i++)
    {
        HashSet<string> value;
        if (!map.TryGetValue(names[i], out value))
        {
            value = new HashSet<string>();
            map.Add(names[i], value);
        }

        for (int j = 0; j < names.Length; j++)
        {
            value.Add(names[j]);
        }
    }
}

static void Main(string[] args)
{
    Dictionary<string, HashSet<string>> names = new Dictionary<string,HashSet<string>>();
    AddNames(names, "Chris", "Christopher");
    AddNames(names, "Christina", "Chrissy", "Chris");

    HashSet<string> relatedToChris = names["Chris"];                // gets "Chris", "Christina", "Chrissy", "Christopher";
    HashSet<string> namesRelatedToChristinia = names["Christina"];  // gets "Christina", "Chrissy", "Chris";
}

You can think of your datastructure as a directed graph where each node has an edge connected to its related name. Since there are n^2 edges, the dictionary requires O(n^2) time to insert and memory. Its not possible to reduce the lookup time to anything better.

Fortunately, since its implemented as a dictionary, lookups as still O(1). Deletes are O(m) where m is the number of values related to a key.

Juliet
It seems that you have a distinct hashset here for every key, even though for several related keys the contents of their hashsets is the same. Consequently you have such high insert and delete complexity. Wouldn't it be better to have a single shared hashset for all related keys? Then insert would be O(m) (assuming O(1) key lookups), and delete would be O(1).
Pavel Minaev
A: 

Try using a dictionary, something like:

Dictionary<string, List<string>>

So a dictionary of string keys with values of List

Blankman
A: 

How about a pair of data structures: Dictionary<string, Guid>and Dictionary<Guid, List<string>>

To add a pair of keys (a, b) [you can decompose a larger add into pairs (1+2, 2+3, ...] proceed as follows:-

Lookup a and b in the first dictionary.
If neither exists, create a new Guid and add (a,g) and (b,g) to first dictionary and (g,List{a}) and (g,List{b}) to second dictionary.

If one exists, say a, grab the guid from it (g) and add the other (b, g) to the first dictionary and tack b onto the end of the list found at [g] in the second dictionary.

If both exist AND they have the same guid - nothing to do.

If both exist and they have different guids you need to merge the two sets // This is something most of the other proposed solutions seem to be missing // so pick a Guid to eliminate, go get it from the second dictionary, add the list of strings to the other entry and then remove this entry. Finally mark all the words in the first dictionary that were in that list.

To get all the related words, lookup the Guid in the first dictionary and grab the list from the second dictionary.

Of course a static incrementing long value would probably work better than a Guid.

Hightechrider
You could call that a "relational solution" :) worth expanding on algorithmic complexity of lookup/insert/delete in your solution, though.
Pavel Minaev
A: 

Or, since List is a reference type you could do the following ...

Dictionary<string, List<string>> dict = new ...

Proceed as follows:-

To add a single association (a = b) {decomposed from a list of equivalences}

Lookup a and b in the Dictionary

If neither exists

dict.Add(a, new List<string>(){a}); dict.Add(b, new List<string>(){b});

If one exists, say, a

var list = dict[a];
list.Add(b);
dict.Add(b, list);

If both exist and the lists are the same (object compare) you are done.

If both exists and the lists are different:

var list1 = dict[a];
var list2 = dict[b];
list1.AddRange(list2);
dict.Remove(b);
dict.Add(b, list1);
Hightechrider
A: 

I wrote some code, I don't know how efficient it, but it i think that it does what you want.

It is your structure

class FancyDataStructure
{
    private IDictionary<string, HashSet<string>> dictionary 
        = new Dictionary<string, HashSet<string>>();

    public void Add(params string[] names)
    {
        HashSet<string> set = new HashSet<string>(names);
        for (int i = 0; i < names.Length; i++)
        {
            if (!dictionary.ContainsKey(names[i]))
            {
                dictionary.Add(names[i], set);
            }
            else
            {
                HashSet<string> union = 
                new HashSet<string>(set.Union<string>(dictionary[names[i]]));
                set = union;
                foreach (string oldName in dictionary[names[i]])
                {
                    dictionary[oldName] = union;
                }
                for (int j = 0; j < i; j++)
                {
                    if (!dictionary.ContainsKey(names[j]))
                    {
                        dictionary.Add(names[j], union);
                    }
                }
            }
        }
    }

    public string[] this[string key]
    {
        get
        {
            List<string> result = dictionary[key].ToList<string>();
            result.Remove(key);
            return result.ToArray();
        }
    }
}

and you can use it, like this

    static void Main(string[] args)
    {

        FancyDataStructure data = new FancyDataStructure();

        data.Add("Elizabeth", "Liz");
        data.Add("Liz", "Betty");

        string[] alternates = data["Betty"];
        foreach (var item in alternates)
        {
            Console.WriteLine(item);
        }
    }
+1  A: 

Your problem sounds like it is really a graphing problem. Think of the names as nodes and membership in the set as the edges. From this standpoint, you would want a data structure that handles sparse graphs well, such as an adjacency list. This is, of course, similary to what you are already doing with a Dictionary<string, IEnumerable<string>> but thinking about it in this way might lead you to some helpful implementations and algorithms.

Dolphin
I think you're likely right in treating this as something that can't be easily composed from existing structures. I've already come up with a workaround to my needs, but if I have some more time I will actually try to build out this new structure and post here if I come up with something usable.
Brian Vallelunga