Hi,

After implementing most of the common and needed functions for my Graph implementation, I realized that a couple of functions (remove vertex, search vertex and get vertex) don't have the "best" implementation.

I'm using adjacency lists with linked lists for my Graph implementation, and I was checking one vertex after the other until I found the one I wanted. Like I said, I realized I was not using the "best" implementation. I can have 10000 vertices and need to search for the last one, but that vertex could have a link to the first one, which would speed things up considerably. But that's just a hypothetical case; it may or may not happen.

So, what algorithm do you recommend for vertex lookup? Our teachers talked mostly about breadth-first and depth-first search (and Dijkstra's algorithm, but that's a completely different subject). Between those two, which one do you recommend?

It would be perfect if I could implement both, but I don't have time for that. I need to pick one and implement it, as the first phase deadline is approaching...

My guess is to go with depth-first: it seems easier to implement and, looking at the way the two work, it seems the better bet. But that really depends on the input.

But what do you guys suggest?

+4  A: 

If you’ve got an adjacency list, searching for a vertex simply means traversing that list. You could perhaps even order the list to decrease the needed lookup operations.

A graph traversal (such as DFS or BFS) won’t improve this from a performance point of view.
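
As a rough illustration of "traversing that list", here is a minimal sketch in Python. The `Vertex` class and `find_vertex` function are hypothetical names, not taken from the question, and a plain Python list stands in for the linked list of vertices.

```python
from typing import List, Optional


class Vertex:
    """Hypothetical vertex record: a key plus the keys of its neighbours."""

    def __init__(self, key, neighbours=None):
        self.key = key
        self.neighbours = list(neighbours or [])


def find_vertex(vertices: List[Vertex], key) -> Optional[Vertex]:
    """Linear scan over the vertex list: O(V) in the worst case."""
    for vertex in vertices:
        if vertex.key == key:
            return vertex
    return None


vertices = [Vertex(i) for i in range(5)]
found = find_vertex(vertices, 3)
missing = find_vertex(vertices, 9)
```

The scan never looks at the edges, which is why the shape of the graph has no effect on how long it takes.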

Konrad Rudolph
Are you saying that using DFS/BFS or traversing the linked list will probably be the same as performance goes?
Nazgulled
@Nazgulled If you are searching for something, and you stop when you find it, then there may be a difference based on the input data. If you are searching the whole graph for all nodes with a property, then there probably won't be a significant difference. But without specifics of your data it is not something we can predict.
Pete Kirkham
A: 

Depth-first search is best because

  1. It uses much less memory
  2. Easier to implement
Peter Alexander
How does DFS use less memory than BFS? Recursion also uses memory, and the iterative version requires a stack anyway. If anything, DFS uses AT LEAST the same amount of memory if implemented recursively and THE SAME amount of memory if done iteratively.
IVlad
IVlad - let's look at a potentially contrived example - a graph with a vertex that has 1 million adjacent vertices. A DFS is going to contain at most 2 vertices on the stack at any given time, whereas a BFS is going to contain a million on the queue at the start.
Niki Yoshiuchi
@Niki - and let's look at a linked list graph with 1 million nodes. The exact opposite is true. Generally, the amount of memory used by DFS and BFS is the same. If DFS is implemented recursively, the memory used by DFS might be more (though still O(V)), because we might have multiple parameters for the function plus the function call overhead. If the DFS is implemented iteratively, then the memory used is generally the same even in practice. So saying DFS (or BFS) "uses much less memory" is completely wrong.
IVlad
@IVlad - You're right, there are cases where BFS uses less memory than DFS. But claiming that "DFS uses AT LEAST the same amount of memory" is false as I've shown, and "It uses much less memory" is false as you've shown. I never made any claims that DFS always uses less memory.
Niki Yoshiuchi
A: 

The depth-first and breadth-first algorithms are almost identical, except for the use of a stack in one (DFS), a queue in the other (BFS), and a few required member variables. Implementing them both shouldn't take you much extra time.
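
To make the stack-versus-queue point concrete, here is a minimal sketch in Python. It assumes the graph is a dict mapping each vertex to a list of its neighbours, and the `search` function and its `breadth_first` flag are hypothetical names chosen for this example.

```python
from collections import deque


def search(graph, start, target, breadth_first):
    """Return (found, visit_order) for a traversal from start.

    graph is assumed to be a dict: vertex -> list of neighbours.
    """
    frontier = deque([start])
    seen = set()
    order = []
    while frontier:
        # The only difference between the two algorithms:
        # BFS takes the oldest entry (queue), DFS the newest (stack).
        vertex = frontier.popleft() if breadth_first else frontier.pop()
        if vertex in seen:
            continue
        seen.add(vertex)
        order.append(vertex)
        if vertex == target:
            return True, order
        for neighbour in graph.get(vertex, []):
            if neighbour not in seen:
                frontier.append(neighbour)
    return False, order


graph = {1: [2, 3], 2: [4], 3: [], 4: []}
bfs_result = search(graph, 1, 4, breadth_first=True)
dfs_result = search(graph, 1, 4, breadth_first=False)
```

Both calls reach vertex 4; only the order in which the other vertices are visited differs.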

Additionally, if you have an adjacency list of the vertices, then your lookup will be O(V) anyway. So little to nothing will be gained by using either of the two other searches.

Mimisbrunnr
Well, I actually have a simple stack library implemented already, but not a queue one (but it shouldn't be that hard anyway, I just want to minimize the time wasted, I can bother with it later, if I still have time before the deadline). Anyway, are you saying (second paragraph) that using DFS/BFS or traversing the linked list will probably be the same?
Nazgulled
A: 

I'd comment on Konrad's post but I can't comment yet so... I'd like to second that it doesn't make a difference in performance if you implement DFS or BFS over a simple linear search through your list. Your search for a particular node in the graph doesn't depend on the structure of the graph, hence it's not necessary to confine yourself to graph algorithms. In terms of coding time, the linear search is the best choice; if you want to brush up your skills in graph algorithms, implement DFS or BFS, whichever you feel like.

saramah
+1  A: 

I think BFS would usually be faster on average. Read the wiki pages for DFS and BFS.

The reason I say BFS is faster is that it reaches nodes in order of their distance from your starting node. So if your graph has N nodes, you want to search for node N, and node 1 (the node you start your search from) is linked to N, then you will find it immediately. DFS, however, might expand the whole graph before this happens. DFS will only be faster if you get lucky, while BFS will be faster if the nodes you search for are close to your starting node. In short, they both depend on the input, but I would choose BFS.

DFS is also harder to code without recursion, which makes BFS a bit faster in practice, since it is an iterative algorithm.

If you can normalize your nodes (number them from 1 to 10 000 and access them by number), then you can easily keep Exists[i] = true if node i is in the graph and false otherwise, giving you O(1) lookup time. Otherwise, consider using a hash table if normalization is not possible or you don't want to do it.
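
A minimal sketch of the `Exists[i]` idea in Python, assuming vertices are numbered from 1 to 10 000. The helper names (`add_vertex`, `remove_vertex`, `has_vertex`) are made up for this example.

```python
MAX_VERTICES = 10_000

# exists[i] is True when vertex i is in the graph; index 0 is unused
# because vertices are assumed to be numbered 1..MAX_VERTICES.
exists = [False] * (MAX_VERTICES + 1)


def add_vertex(i):
    exists[i] = True


def remove_vertex(i):
    exists[i] = False


def has_vertex(i):
    """O(1) membership test by vertex number."""
    return 1 <= i <= MAX_VERTICES and exists[i]


add_vertex(10_000)
```

The cost is one flag per possible vertex number, whether or not that vertex is ever added.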

IVlad
Actually, I have already implemented a hash table that will also hold the same data as the Graph (the data will be shared through pointers). However, one of the requirements of this uni project is that when we implement our data structures, we also implement the most common operations. I probably won't even search for a specific node in the Graph; I'll use the hash table instead.
Nazgulled
A: 

If you are searching for a specific vertex and terminating when you find it, I would recommend using A*, which is a best-first search.

The idea is that you calculate the distance from the source vertex to the current vertex you are processing, and then "guess" the distance from the current vertex to the target.

You start at the source, calculate the distance (0) plus the guess (whatever that might be) and add it to a priority queue where the priority is distance + guess. At each step, you remove the element with the smallest distance + guess, do the calculation for each vertex in its adjacency list and stick those in the priority queue. Stop when you find the target vertex.

If your heuristic (your "guess") is admissible, that is, if it's always an under-estimate, then you are guaranteed to find the shortest path to your target vertex the first time you visit it. If your heuristic is not admissible, then you will have to run the algorithm to completion to find the shortest path (although it sounds like you don't care about the shortest path, just any path).

It's not really any more difficult to implement than a breadth-first search (you just have to add the heuristic, really) but it will probably yield faster results. The only hard part is figuring out your heuristic. For vertices that represent geographical locations, a common heuristic is to use an "as-the-crow-flies" (direct distance) heuristic.
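
The steps above could be sketched roughly as follows in Python. This is an illustration under assumptions rather than a reference implementation: the graph is assumed to be a dict mapping each vertex to a list of `(neighbour, edge_cost)` pairs, `heuristic` is a caller-supplied function estimating the remaining distance to the target, and vertices are assumed to be comparable (for example strings or integers) so that ties in the priority queue can be broken.

```python
import heapq


def a_star(graph, source, target, heuristic):
    """Return the cost of a path from source to target, or None.

    graph: dict vertex -> list of (neighbour, edge_cost) pairs (assumed).
    heuristic(v): guess of the remaining cost from v to target.
    """
    best = {source: 0}  # cheapest known distance from source
    queue = [(heuristic(source), source)]  # priority = distance + guess
    while queue:
        _, vertex = heapq.heappop(queue)
        if vertex == target:
            return best[vertex]
        for neighbour, cost in graph.get(vertex, []):
            distance = best[vertex] + cost
            if distance < best.get(neighbour, float("inf")):
                best[neighbour] = distance
                heapq.heappush(queue, (distance + heuristic(neighbour), neighbour))
    return None


graph = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 1), ("D", 5)],
    "C": [("D", 1)],
    "D": [],
}
# Hypothetical under-estimates of the remaining cost to "D".
guess = {"A": 3, "B": 2, "C": 1, "D": 0}
shortest = a_star(graph, "A", "D", guess.get)
```

With a heuristic that always returns 0, the same loop behaves like Dijkstra's algorithm.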

Niki Yoshiuchi
+2  A: 

Finding and deleting nodes in a graph is a "search" problem, not a graph problem. Linear search, BFS and DFS are all O(n), so to do better you need to store your nodes in a different data structure optimized for searching, or keep them sorted. This gives you O(log n) find and delete operations with a sorted or tree-based structure, and typically O(1) on average with a hash table. Candidates are tree structures such as B-trees, or hash tables. If you want to code the stuff yourself, I would go for a hash table, which normally gives very good performance and is reasonably easy to implement.
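
A minimal sketch of that approach in Python, using the built-in `dict` (a hash table) as the vertex index. The `Graph` class and its method names are hypothetical, and the graph is assumed to be undirected.

```python
class Graph:
    """Hypothetical undirected graph indexed by a hash table."""

    def __init__(self):
        self.adj = {}  # vertex key -> set of neighbour keys

    def add_edge(self, u, v):
        self.adj.setdefault(u, set()).add(v)
        self.adj.setdefault(v, set()).add(u)

    def has_vertex(self, key):
        # Average O(1) lookup instead of an O(n) scan.
        return key in self.adj

    def remove_vertex(self, key):
        # Finding the vertex is O(1) on average; unlinking it from
        # its neighbours still costs time proportional to its degree.
        for neighbour in self.adj.pop(key, set()):
            if neighbour != key:
                self.adj[neighbour].discard(key)


g = Graph()
g.add_edge("a", "b")
g.add_edge("b", "c")
g.remove_vertex("b")
```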

Arno