I have a data set which is a large unweighted cyclic graph. The cycles occur in loops of about 5-6 paths. It consists of about 8000 nodes, and each node has from 1-6 (usually about 4-5) connections. I'm doing single-pair shortest-path calculations and have implemented the following code to do a breadth-first search.

from Queue import Queue

q = Queue()
parent = {}
fromNode = 'E1123'
toNode = 'A3455'

# path finding
q.put(fromNode)
parent[fromNode] = 'Root'

while not q.empty():
  # get the next node and add its neighbours to queue
  current = q.get()
  for i in getNeighbours(current):
    # note parent and only continue if not already visited
    if i[0] not in parent:
      parent[i[0]] = current
      q.put(i[0])

  # check if destination
  if current == toNode:
    print 'arrived at', toNode
    break

The above code uses the Python 2.6 Queue module, and getNeighbours() is simply a subroutine that makes a single MySQL call and returns the neighbours as a list of tuples, e.g. (('foo',), ('bar',)). The SQL call itself is quick.
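For illustration, a minimal sketch of what such a subroutine could look like -- the MySQLdb driver, the edges table, its fromNode/toNode columns, and the connection details here are all illustrative placeholders, not my actual schema:

import MySQLdb

# hypothetical connection details -- adjust to your setup
conn = MySQLdb.connect(host='localhost', user='user',
                       passwd='secret', db='graphdb')

def getNeighbours(node):
    # one round trip to MySQL per node visited -- this runs inside the BFS loop
    cursor = conn.cursor()
    cursor.execute("SELECT toNode FROM edges WHERE fromNode = %s", (node,))
    rows = cursor.fetchall()  # tuple of 1-tuples, e.g. (('foo',), ('bar',))
    cursor.close()
    return rows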

The code works OK; however, testing down to depths of about 7 layers takes about 20 seconds to run (2.5 GHz Intel, 4 GB RAM, OS X 10.6).

I'd welcome any comments about how to improve the run time of this code.

A: 

I'll bet that machine has more than one core, doesn't it? Run it in parallel.

Python Threading

stimms
I think you mean Python Multiprocessing.
Jed Smith
+1  A: 

Hmm, doesn't BFS involve marking nodes you've already seen so you don't visit them again?

Carl Smotricz
It does. If the node has already been placed in parent[], then it has been seen before, and so it is not put in the queue to act on again.
timbo
Oh, I see. I missed that, sorry. Still, it would be cheaper to put that flag into each node than to manipulate an ever-growing set.
Carl Smotricz
+8  A: 

Well, given the upvotes on the comment, I'll make it an answer now.

The SQL in the tight loop is definitely slowing you down. I don't care how fast the call is. Think about it -- you're asking for a query to be parsed, a lookup to be run -- as fast as that is, it's still in a tight loop. What does your data set look like? Can you just SELECT the entire data set into memory, or at least work with it outside of MySQL?

If you work with that data in memory, you will see a significant performance gain.
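For example, a rough sketch of the one-shot load, assuming the MySQLdb driver and a hypothetical edges table with fromNode/toNode columns (swap in whatever your schema actually uses):

import MySQLdb
from collections import defaultdict

def load_graph(conn):
    # pull every edge in a single query and build an adjacency dict,
    # so the BFS loop never has to touch MySQL again
    cursor = conn.cursor()
    cursor.execute("SELECT fromNode, toNode FROM edges")
    graph = defaultdict(list)
    for fromNode, toNode in cursor.fetchall():
        graph[fromNode].append(toNode)
    cursor.close()
    return graph

With ~8000 nodes at 4-5 connections each, the whole adjacency dict is tiny, and each getNeighbours() call becomes a plain dictionary lookup.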

Jed Smith
Seconded; 8000 nodes will easily fit in memory.
Carl Smotricz
Good call! The table with the node information is simply rows of fromNode, toNode. I will investigate loading it into memory... maybe just a big dictionary structure.
timbo
As a simple test, I have loaded the MySQL table into memory by simply using ENGINE=MEMORY on the CREATE TABLE definition. The same code now completes in about 2.5 seconds!
timbo
@timbo: See there?
Jed Smith
A bit late to the party, but are you using prepared statements for your query? Should be able to shave a little SQL processing time with that, especially in the in-memory variation.
Carl Smotricz
+1  A: 

Something like this:

#!/usr/bin/env python

from Queue import Queue

def traverse_path(fromNode, toNode, nodes):
    def getNeighbours(current, nodes):
        return nodes[current] if current in nodes else []

    def make_path(toNode, graph):
        # walk the parent links back to the root, then reverse
        # so the path reads fromNode -> toNode
        result = []
        while 'Root' != toNode:
            result.append(toNode)
            toNode = graph[toNode]
        result.reverse()
        return result

    q = Queue()
    q.put(fromNode)
    graph = {fromNode: 'Root'}

    while not q.empty():
        # get the next node and add its neighbours to the queue
        current = q.get()
        for neighbor in getNeighbours(current, nodes):
            # record the parent and only continue if not already visited
            if neighbor not in graph:
                graph[neighbor] = current
                q.put(neighbor)

        # check if destination
        if current == toNode:
            return make_path(toNode, graph)
    return []

if __name__ == '__main__':
    nodes = {
        'E1123': ['D111', 'D222', 'D333', 'D444'],
        'D111': ['C01', 'C02', 'C04'],
        'D222': ['C11', 'C03', 'C05'],
        'D333': ['C01'],
        'C02': ['B1'],
        'B1': ['A3455']
    }
    result = traverse_path('E1123', 'A3455', nodes)
    print result

Output:

['E1123', 'D111', 'C02', 'B1', 'A3455']

If you replace your SQL queries with a dictionary of lists (and that would be the tricky part), you will get this performance.

hughdbrown
Awesome, Hugh. I've been getting good performance with an in-memory table. I will try the next step. There are about 14K connections, and this will eventually be a web app, so I need to think carefully about how the load into memory works...
timbo
I've found that my code was similar to yours, Hugh, though yours is certainly more elegant. Apart from the locale-dependent spelling of 'neighbour/neighbor', the only extra suggestion I have is to call result.reverse() in make_path, as this returns the list in the order fromNode -> toNode.
timbo
Hmmmm. That's a good idea. I'll add that.
hughdbrown