views:

325

answers:

2

Hi All --

I am converting a VB6 to C# and I want to make my data structure that holds values and relationships more efficient. In VB I have a collection of values and another collection of relationships between those values with priorities for those relationships. I also have an algorithm that when a set of values is passed to it all relationships required to join those values together is returned. For example, say the values collection contains 1-10 and the relationship collection contains

1,2
3,2
5,2
2,8
8,10
9,10

If the input was 1,9,10 the returned relationships would be --

1,2
2,8
8,10
9,10

Since there may be multiple paths the least amount of relationships would be returned but there is a caveat of relationship priorities. If a relationship has a higher priority then that relationship would be added and the rest of the relationships would be added from there. I am thinking of using a Disjoint-set data structure but I am not sure.

Any ideas?

Thanks

More information --

The number of values would normally be less than 100 and the relationships less than 500. The collections are static and the algorithm will be used again and again to find paths. Also, I did not ask this but would the algorithm in Disjoint-set data structure be the most efficient?

+2  A: 

You need to ask yourself (and tell us) what kind of pattern of use you expect. Do these relations get added in order or randomly, do yours queries come in order (as you show them) or randomly, and is this essentially a batch process -- load them up, read off the queries -- or do you expect to do it "on line" in the sense that you may add some, then query some, then add some more and query some more?

Will you know how many you want to store beforehand, and how many do you expect to store? Dozens? Thousands? Tens of millions?

Here are some suggestions:

  • if you know beforehand how many you expect to store, it's not a really big number, you don't expect to add them after first loading up, there aren't any duplicates in the left-hand side of the pair, and they're reasonably "dense" in the sense that there aren't big gaps between numbers in the left-hand one of the pair, then you probably want an array. Insertion is O(1), access is O(1), but can't have duplicate indices and expanding it after you build it is a pain.
  • if the number is really large, like > 108, you probably want some kind of database. Databases are relatively very slow -- 4 to 5 orders of magnitude greater than in-memory data structures -- but handle really big data.
  • If you have insertions after the first load, and you care about order, you're going to want some sort of tree, like a 2-3 tree. Insertion and access both O(lg n). You'd probably find an implmentation under a name like "ordered list" (I'm not a C# guy.)
  • Most any other case, you probably want a hash. Average insertion and access both O(1), like an array; worst case [which you won't hit with this data] is O(n)
Charlie Martin
+5  A: 

It sounds like what you have is a Graph. This is a structure with Nodes and Edges. There are many many libraries and tools that deal with Graphs. Microsoft even put out a paper on how to deal with them. I think graphs are great and extremely useful in many situations.

One big benefit with graphs is the ability to assign priorities to the edges between the nodes. Then when you want to find the path between two nodes, boom, the graph can choose the path with the ideal priority.

In your situation, your values are the nodes and your relationships are the edges.

thanks, I believe this is the way to go. I actually read parts 1-4 of that article but never got back to read the last 2. Now I guess I have to :)
Dan R.