Multiprocessing hangs after PageRank computation with graph-tool
My goal is to compute the PageRank of the vertices in the graph and then select some vertices with a high PageRank to compute the shortest distances from these vertices to the other vertices in the graph.
I have a graph with more than 100 nodes, e.g. 200, and compute the PageRank of all vertices with graph_tool.centrality.pagerank. Then I select some of them as sources for the shortest-distance computation. To make this fast, the shortest-distance computation for each selected vertex should run in a separate process. The problem is that after calling pagerank, the shortest-distance computation in a new process no longer works: it just hangs forever, seemingly in an infinite loop.
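To make the intended workflow concrete, here is a minimal sketch in plain Python that does not use graph-tool at all. The helper names top_k_by_score and bfs_distances, the toy adjacency list, and the score values are all made up for illustration. In my real code the scores come from gt.pagerank and the distances from gt.shortest_distance, so this only shows the shape of the computation, not the failing call.

```python
import heapq
from collections import deque
from multiprocessing import Pool


def top_k_by_score(scores, k):
    """Return the indices of the k highest scores, best first."""
    return heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)


def bfs_distances(args):
    """Hop distances from `source`, not expanding past `cutoff` hops."""
    adjacency, source, cutoff = args
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if dist[v] >= cutoff:
            continue  # vertices at the cutoff distance are kept but not expanded
        for w in adjacency[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return source, dist


if __name__ == '__main__':
    # Toy data: a path graph 0-1-2-3-4 with made-up scores (not real PageRank values).
    adjacency = [[1], [0, 2], [1, 3], [2, 4], [3]]
    scores = [0.1, 0.4, 0.2, 0.3, 0.1]
    sources = top_k_by_score(scores, 2)
    jobs = [(adjacency, s, 2) for s in sources]
    # One task per selected source vertex, spread over the worker processes.
    with Pool(processes=2) as pool:
        for source, dist in pool.map(bfs_distances, jobs):
            print(source, dist)
```

Each selected vertex becomes one task for the pool, which is the same structure as the graph-tool version, where the task is a call to gt.shortest_distance with max_dist as the cutoff.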
Here is some code to reproduce the error:
from multiprocessing import Pool

import graph_tool.all as gt
from numpy.random import randint, poisson


def retrieval_process(retrieval_data):
    # Runs in the worker process: shortest distances from one entry point.
    print('Start')
    graph = retrieval_data[0]
    entry_point = retrieval_data[1]
    cutoff_distance = retrieval_data[2]
    result = gt.shortest_distance(graph, source=graph.vertex(entry_point),
                                  max_dist=cutoff_distance)
    print('Finish')
    return list(result.a)


if __name__ == '__main__':
    def corr(a, b):
        if a == b:
            return 0.999
        else:
            return 0.001

    # Random block-model graph with 200 vertices.
    g, bm = gt.random_graph(200, lambda: poisson(10), directed=False,
                            model="blockmodel-traditional",
                            block_membership=lambda: randint(10),
                            vertex_corr=corr)

    # Removing this line makes the worker below finish normally.
    pr = gt.pagerank(g)

    po = Pool(processes=1)
    cutoff_distance = 5.0
    entry_point = 0
    r_data = (g, entry_point, cutoff_distance)
    res = po.apply_async(retrieval_process, (r_data,))
    result = res.get()  # hangs here after the PageRank call above
    print(result)
If I remove the line with the PageRank computation (pr = gt.pagerank(g)), everything works fine.