views:

280

answers:

3

Hi,

The deadline for this project is closing in very quickly and I don't have much time to deal with what it's left. So, instead of looking for the best (and probably more complicated/time consuming) algorithms, I'm looking for the easiest algorithms to implement a few operations on a Graph structure.

The operations I'll need to do is as follows:

  • List all users in the graph network given a distance X
  • List all users in the graph network given a distance X and the type of relation
  • Calculate the shortest path between 2 users on the graph network given a type of relation
  • Calculate the maximum distance between 2 users on the graph network
  • Calculate the most distant connected users on the graph network

A few notes about my Graph implementation:

  • The edge node has 2 properties, one is of type char and another int. They represent the type of relation and weight, respectively.
  • The Graph is implemented with linked lists, for both the vertices and edges. I mean, each vertex points to the next one and each vertex also points to the head of a different linked list, the edges for that specific vertex.

What I know about what I need to do:

  • I don't know if this is the easiest as I said above, but for the shortest path between 2 users, I believe the Dijkstra algorithm is what people seem to recommend pretty often so I think I'm going with that.
    • I've been searching and searching and I'm finding it hard to implement this algorithm, does anyone know of any tutorial or something easy to understand so I can implement this algorithm myself? If possible, with C source code examples, it would help a lot. I see many examples with math notations but that just confuses me even more.
    • Do you think it would help if I "converted" the graph to an adjacency matrix to represent the links weight and relation type? Would it be easier to perform the algorithm on that instead of the linked lists? I could easily implement a function to do that conversion when needed. I'm saying this because I got the feeling it would be easier after reading a couple of pages about the subject, but I could be wrong.
  • I don't have any ideas about the other 4 operations, suggestions?
+5  A: 

This assumes O(E log V) is an acceptable running time, if you're doing something online, this might not be, and it would require some higher powered machinery.

  • List all users in the graph network given a distance X

Djikstra's algorithm is good for this, for one time use. You can save the result for future use, with a linear scan through all the vertices (or better yet, sort and binary search).

  • List all users in the graph network given a distance X and the type of relation

Might be nearly the same as above -- just use some function where the weight would be infinity if it is not of the correct relation.

  • Calculate the shortest path between 2 users on the graph network given a type of relation

Same as above, essentially, just determine early if you match the two users. (Alternatively, you can "meet in the middle", and terminate early if you find someone on both shortest path spanning tree)

  • Calculate the maximum distance between 2 users on the graph network

Longest path is an NP-complete problem.

  • Calculate the most distant connected users on the graph network

This is the diameter of the graph, which you can read about on Math World.

As for the adjacency list vs adjacency matrix question, it depends on how densely populated your graph is. Also, if you want to cache results, then the matrix might be the way to go.

Larry
It's like ~18000 vertices with ~520000 links on the data sample provided.
Nazgulled
In that case, O(E log V) should be good enough, you can sort of store the matrix (maybe you can, but it's a bit much) ~18000^2 = ~324 mil, and it'll allow things to converge faster, but 18000^3 is for something like Floyd-Warshall is approaching too much.
Larry
+6  A: 

List all users in the graph network given a distance X

A distance X from what? from a starting node or a distance X between themselves? Can you give an example? This may or may not be as simple as doing a BF search or running Dijkstra.

Assuming you start at a certain node and want to list all nodes that have distances X to the starting node, just run BFS from the starting node. When you are about to insert a new node in the queue, check if the distance from the starting node to the node you want to insert the new node from + the weight of the edge from the node you want to insert the new node from to the new node is <= X. If it's strictly lower, insert the new node and if it is equal just print the new node (and only insert it if you can also have 0 as an edge weight).

List all users in the graph network given a distance X and the type of relation

See above. Just factor in the type of relation into the BFS: if the type of the parent is different than that of the node you are trying to insert into the queue, don't insert it.

Calculate the shortest path between 2 users on the graph network given a type of relation

The algorithm depends on a number of factors:

  • How often will you need to calculate this?
  • How many nodes do you have?

Since you want easy, the easiest are Roy-Floyd and Dijkstra's.

  • Using Roy-Floyd is cubic in the number of nodes, so inefficient. Only use this if you can afford to run it once and then answer each query in O(1). Use this if you can afford to keep an adjacency matrix in memory.
  • Dijkstra's is quadratic in the number of nodes if you want to keep it simple, but you'll have to run it each time you want to calculate the distance between two nodes. If you want to use Dijkstra's, use an adjacency list.

Here are C implementations: Roy-Floyd and Dijkstra_1, Dijkstra_2. You can find a lot on google with "<algorithm name> c implementation".

Edit: Roy-Floyd is out of the question for 18 000 nodes, as is an adjacency matrix. It would take way too much time to build and way too much memory. Your best bet is to either use Dijkstra's algorithm for each query, but preferably implementing Dijkstra using a heap - in the links I provided, use a heap to find the minimum. If you run the classical Dijkstra on each query, that could also take a very long time.

Another option is to use the Bellman-Ford algorithm on each query, which will give you O(Nodes*Edges) runtime per query. However, this is a big overestimate IF you don't implement it as Wikipedia tells you to. Instead, use a queue similar to the one used in BFS. Whenever a node updates its distance from the source, insert that node back into the queue. This will be very fast in practice, and will also work for negative weights. I suggest you use either this or the Dijkstra with heap, since classical Dijkstra might take a long time on 18 000 nodes.

Calculate the maximum distance between 2 users on the graph network

The simplest way is to use backtracking: try all possibilities and keep the longest path found. This is NP-complete, so polynomial solutions don't exist.

This is really bad if you have 18 000 nodes, I don't know any algorithm (simple or otherwise) that will work reasonably fast for so many nodes. Consider approximating it using greedy algorithms. Or maybe your graph has certain properties that you could take advantage of. For example, is it a DAG (Directed Acyclic Graph)?

Calculate the most distant connected users on the graph network

Meaning you want to find the diameter of the graph. The simplest way to do this is to find the distances between each two nodes (all pairs shortest paths - either run Roy-Floyd or Dijkstra between each two nodes and pick the two with the maximum distance).

Again, this is very hard to do fast with your number of nodes and edges. I'm afraid you're out of luck on these last two questions, unless your graph has special properties that can be exploited.

Do you think it would help if I "converted" the graph to an adjacency matrix to represent the links weight and relation type? Would it be easier to perform the algorithm on that instead of the linked lists? I could easily implement a function to do that conversion when needed. I'm saying this because I got the feeling it would be easier after reading a couple of pages about the subject, but I could be wrong.

No, adjacency matrix and Roy-Floyd are a very bad idea unless your application targets supercomputers.

IVlad
1) Distance X, I assume it's like the total weight between 2 nodes. 2) I don't know how often, the operations above are options in the UI menu. 3) It's like ~18000 vertices with ~520000 links on the data sample provided. 4) Also, thanks for the links, I'll investigate...
Nazgulled
I updated my answer, and also added some more questions :).
IVlad
There are no special properties... Well, if you say there are no special algorithms and that it will take too long with the usual ones, than I don't understand the point of these operations being part of the project. Last year (which I didn't complete because of personal issues) the project was similar but only the shortest path was asked :S
Nazgulled
Maybe the tutor just wants to see if you can implement the algorithms, and won't test the ones that CAN'T be fast on big graphs. Shortest paths can be fast, it's the last two that can't be. In my opinion, you should implement them anyway and, if the tutor asks why they're slow, explain that those two requirements cannot be done more efficiently.
IVlad
Just did a quick estimate using `clock()` from `time.h` and computing the diameter of the graph takes like ~6 hours on a friend's Mac (I'm on a Linux VM, so it takes almost double). Is this normal? The sample data has exactly 18484 nodes.
Nazgulled
Sounds normal, yes. It involves finding the distance between all pairs of nodes. What algorithm did you use? The fastest is to either use Dijktras's implemented with heaps or the Bellman-Ford I told you about. It won't be a lot faster than Roy-Floyd, but it will be a bit faster.
IVlad
I used Dijkstra's, but not with heaps... I'll implement the heap if I have time left. I'm still missing the longest path question (that I'm not sure what to do) and a long report to write about the whole thing.
Nazgulled
+1  A: 

The simplest algorithm to compute shortest path between two nodes is Floyd-Warshall. It's just triple-nested for loops; that's it.

It computes ALL-pairs shortest path in O(N^3), so it may do more work than necessary, and will take a while if N is huge.

polygenelubricants
I don't want it that easy lol. It would probably take a lot of time and I don't think the instructors would like that.
Nazgulled
Yeah, I just read about the 18K vertices now; that's definitely out of the question.
polygenelubricants