views: 440

answers: 2
I have implemented an unweighted random walk function for a graph that I built in Python using NetworkX. Below is the snippet of my program that deals with the random walk. Elsewhere in my program, I have a method that creates the graph, and I have a method that runs various custom graph tests that I've written. One of these tests picks two nodes at random from the graph and runs a random walk between them. Two quantities are calculated from this random walk: the hitting time (the number of edges traversed from the starting point to the ending point) and the commute time (the number of edges traversed from the starting point to the ending point and back).

import random

def unweighted_random_walk(starting_point, ending_point, graph):
    '''
    starting_point: String that represents the starting point in the graph
    ending_point: String that represents the ending point in the graph
    graph: A NetworkX Graph object
    '''
    ##Begin the random walk at the starting point
    current_point = starting_point
    hitting_time = 0

    #Determine the hitting time to get from the starting point to the
    #ending point
    while current_point != ending_point:
        #pick one of the edges out of the current node with equal probability
        #(list() is needed because graph.neighbors() returns an iterator
        #in recent NetworkX versions)
        current_point = random.choice(list(graph.neighbors(current_point)))
        hitting_time += 1
    return hitting_time
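
For reference, I get the commute time by summing two hitting times, one in each direction. A minimal sketch built on the function above (commute_time is just an illustrative name, not my exact code):

def commute_time(starting_point, ending_point, graph):
    #Commute time = hitting time from start to end, plus the hitting
    #time of an independent walk back from end to start
    return (unweighted_random_walk(starting_point, ending_point, graph) +
            unweighted_random_walk(ending_point, starting_point, graph))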

My code for the random walk is pretty straightforward: I just keep stepping to random neighbors until the ending point is reached. However, this implementation is very slow when I run many random walks (I think I will need to run a million at some point).

My question is: is there any way I can use Hadoop MapReduce to parallelize some of the operations going on here for this random walk? Is there a better way for me to do my random walk?

+3  A: 

I don't see how map-reduce can help you. It's used where you have a two-part operation: the first part is a calculation that can be performed independently on many different data elements, and the second part is somehow combining all those results. Perhaps there's a clever way to use map-reduce to help with this random walk, but I don't see it.
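
That said, running many independent walks and then aggregating their hitting times does have that two-part shape, and you don't need Hadoop for it. Here's a minimal sketch using multiprocessing on a single machine (not a definitive implementation; it reuses the walk from your question, with a Petersen graph standing in for your real graph):

import random
from multiprocessing import Pool

import networkx as nx

def unweighted_random_walk(start, end, graph):
    #Same walk as in the question: step to a uniform random neighbor
    #until the ending point is reached
    steps = 0
    while start != end:
        start = random.choice(list(graph.neighbors(start)))
        steps += 1
    return steps

def one_walk(args):
    #"Map" part: each walk is independent of all the others
    start, end, graph = args
    return unweighted_random_walk(start, end, graph)

if __name__ == '__main__':
    graph = nx.petersen_graph()  #stand-in for the real graph
    nodes = list(graph.nodes())
    jobs = [tuple(random.sample(nodes, 2)) + (graph,) for _ in range(1000)]
    with Pool() as pool:
        hitting_times = pool.map(one_walk, jobs)
    #"Combine" part: aggregate the independent results
    print(sum(hitting_times) / len(hitting_times))

Each worker pickles its own copy of the graph, which is wasteful for large graphs but fine for a sketch; the point is that the walks never need to talk to each other.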

Your random walk is completely random: it could end up with many loops, even hopping back and forth between the same two nodes before continuing on. Perhaps you want to somehow constrain it so you don't have so large a space to search?
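
For example, here's a sketch of a non-backtracking variant that refuses to immediately return to the node it just left (note this is a different process than a uniform walk, so the counts it returns are no longer the standard hitting times):

import random

def non_backtracking_walk(start, end, graph):
    #Step to a random neighbor, but avoid hopping straight back to
    #the node we just came from unless it's the only option
    previous = None
    current = start
    steps = 0
    while current != end:
        neighbors = [n for n in graph.neighbors(current) if n != previous]
        if not neighbors:  #dead end: backtracking is the only move
            neighbors = list(graph.neighbors(current))
        previous, current = current, random.choice(neighbors)
        steps += 1
    return steps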

Ned Batchelder
A: 
gotgenes
Thanks for the advice. I refactored my function after your advice. I'm new to the whole distributed computing thing, and I recently got access to a Hadoop cluster, so I thought I could use it in some way to perform a larger number of random walks.
rohanbk