distance_histogram seems to be broken
Bug reports:
When I calculate the all pairs shortest path distances of a graph using shortest_distance()
and histogram them using np.histogram
, there is a hefty mismatch to distance_histogram()
's output just for larger graphs.
Please follow the general troubleshooting steps first:
-
Are you running the latest graph-tool
version? -
Do you observe the problem with the current git version? -
Are you using Macports or Homebrew? If yes, please submit an issue there instead: https://github.com/Homebrew/brew/issues and https://trac.macports.org/newticket -
Did you compile graph-tool
manually?
Please replace this section with a brief summary of your issue.
Do not forget to supply the following information:
-
A minimal and self-contained example that shows the problem. -
Your operating system. -
The Python version you are using. -
If you compiled graph-tool manually: Your compiler version, as well as the version of Boost being used.
Setup
gt.__version__ == 2.43 (commit 5778eb10, )
- Python version: 3.9.7
-
uname -a
Linux hostname 5.11.0-38-generic #42~20.04.1-Ubuntu SMP Tue Sep 28 20:41:07 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
- Installed / automatically compiled via conda-forge
MWE
import graph_tool.all as gt
import numpy as np
G = gt.random_graph(300, lambda: np.random.randint(1,10,2))
G_weight = G.new_edge_property('float')
for e in G.edges():
G_weight[e] = 100 * np.random.random()
apsp_true = [ d for l in gt.shortest_distance(G, weights=G_weight) for d in l if 0 < d <= 1e50 ]
counts, bins = gt.distance_histogram(G, G_weight)
counts_np, _ = np.histogram(apsp_true, bins=bins)
print(np.where(counts!=counts_np))
Up to 300 nodes, this returns (array([], dtype=int64),)
as expected. From 301 onward there is a great mismatch:
(array([ 2, 3, 4, 6, 9, 12, 13, 14, 16, 17, 20, 22, 23,
24, 25, 26, 27, 28, 29, 30, 31, 33, 35, 36, 37, 41,
42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 54, 56,
57, 61, 62, 65, 66, 67, 68, 69, 70, 71, 72, 73, 75,
76, 77, 78, 79, 80, 81, 83, 84, 85, 86, 87, 88, 89,
90, 91, 92, 94, 95, 96, 97, 98, 100, 101, 102, 103, 104,
105, 106, 107, 108, 110, 111, 112, 113, 114, 115, 116, 117, 118,
119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131,
132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144,
145, 146, 147, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158,
159, 161, 162, 163, 166, 167, 168, 169, 170, 171, 172, 173, 174,
175, 176, 177, 180, 181, 182, 183, 184, 185, 187, 188, 189, 190,
191, 192, 193, 194, 195, 196, 197, 199, 200, 201, 202, 203, 205,
206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218,
219, 220, 221, 222, 223, 224, 225, 226, 228, 229, 230, 231, 233,
234, 236, 237, 238, 240, 241, 242, 243, 245, 246, 247, 248, 249,
250, 253, 254, 255, 256, 257, 258, 259, 261, 262, 263, 265, 266,
267, 268, 271, 272, 273, 275, 276, 277, 279, 281, 284, 288, 305,
306, 308, 310, 313, 321, 326, 333, 334, 348, 361, 365, 424]),)
For some graphs even sum(counts) != sum(counts_np)
.
What's happening here?