I have implemented an unweighted random walk function for a graph that I built in Python using NetworkX. Below is the snippet of my program that deals with the random walk. Elsewhere in the program I have a method that creates the graph, and a method that runs various custom graph tests that I've written. One of these tests picks two nodes at random from the graph and runs a random walk between them. Two quantities are calculated from this random walk: the hitting time (the number of links traversed from the starting point to the ending point) and the commute time (the number of links traversed from the starting point to the ending point and back again).
import random

def unweighted_random_walk(starting_point, ending_point, graph):
    '''
    starting_point: String that represents the starting point in the graph
    ending_point: String that represents the ending point in the graph
    graph: A NetworkX Graph object
    '''
    # Begin the random walk at the starting point
    current_point = starting_point
    hitting_time = 0
    # Walk until the ending point is reached, counting each traversed link
    while current_point != ending_point:
        # Pick one of the current node's neighbors with equal probability.
        # list() is needed because graph.neighbors() returns an iterator
        # in NetworkX 2.x rather than a list.
        neighbors = list(graph.neighbors(current_point))
        current_point = random.choice(neighbors)
        hitting_time += 1
    return hitting_time
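The commute time is then just the hitting time there plus the hitting time of the return walk, roughly like this (a minimal sketch; the helper name commute_time is only for illustration and isn't in my actual program):

    def commute_time(node_a, node_b, graph):
        # Commute time = hitting time from node_a to node_b
        # plus the hitting time of the walk back to node_a
        return (unweighted_random_walk(node_a, node_b, graph)
                + unweighted_random_walk(node_b, node_a, graph))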
My code for the random walk is pretty straightforward because I'm just stepping to random neighbors until the ending point is reached. However, this implementation is very slow when I run many random walks (I will eventually need to run about a million).
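For context, the testing method that calls this function does roughly the following (a simplified sketch; build_graph and the exact walk count stand in for my actual code):

    import random

    graph = build_graph()  # placeholder for my graph-creation method
    nodes = list(graph.nodes())
    for _ in range(1000000):  # on the order of the walks I need to run
        start, end = random.sample(nodes, 2)  # two distinct random nodes
        hitting_time = unweighted_random_walk(start, end, graph)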
My question is: is there any way I can use Hadoop MapReduce to parallelize some of the operations going on in this random walk? Is there a better way for me to do my random walk?