Potential bug in the clustering.motif_significance function
Please follow the general troubleshooting steps first:
- Are you running the latest graph-tool version?
- Do you observe the problem with the current git version?
- Are you using Macports or Homebrew? If yes, please submit an issue there instead: https://github.com/Homebrew/brew/issues and https://trac.macports.org/newticket
- Did you compile graph-tool manually?
- If you answered yes above, did you use the exact same compiler to build graph-tool, boost-python and Python?
Report:
I am analyzing the motifs in a network with self-loops. I ran the clustering.motifs function, which identifies the motifs and returns the count for each one. I then ran the clustering.motif_significance function, whose full output also includes motifs and counts (along with z-scores, sample counts and sample standard deviations).
However, the motif arrays produced by the two functions have different lengths, even though the documentation states that both functions produce the same motif output. Additionally, the counts array returned by clustering.motif_significance contains 0 values at the end, but I think those values correspond to motifs at a different position in the motif array (and should probably not be there at all if the motif occurs 0 times).
There are also nan values in the zscores array, potentially caused by the self-loops.
In the minimal working example at the end of this report, I added a short for loop that checks the motifs generated by the two functions for isomorphism; at some points the indices of the isomorphic pairs do not coincide.
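As a side note on the nan values: one possible mechanism, which I have not confirmed, is that the z-score is computed as (count - sample mean) / sample sd, so any motif whose sample standard deviation is 0 would give nan (0/0) or inf. The sketch below is plain Python and not part of graph-tool; the helper name undefined_zscores and the numbers in the example are made up for illustration.

```python
import math

def undefined_zscores(counts, sample_means, sample_sds):
    """Return (index, kind) for entries where (count - mean) / sd is not finite."""
    flagged = []
    for i, (n, mean, sd) in enumerate(zip(counts, sample_means, sample_sds)):
        if sd == 0 or math.isnan(sd):
            # Zero spread in the shuffled samples: 0/0 gives nan, x/0 gives inf.
            kind = "nan" if n == mean or math.isnan(sd) else "inf"
            flagged.append((i, kind))
    return flagged

# Made-up numbers for illustration only (not output from graph-tool).
print(undefined_zscores([12, 0, 3], [10.0, 0.0, 1.0], [2.0, 0.0, 0.0]))
```

With the variables from my example this would be called as undefined_zscores(counts_2, s_counts, s_dev), assuming those three outputs are plain sequences of numbers of equal length.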
Your exact graph-tool version: 2.58
Your operating system: MacOS
Python Version: 3.11.6 | packaged by conda-forge
A minimal working example that shows the problem:
from graph_tool import all as gt
gt.seed_rng(42)
g = gt.random_graph(100, lambda: (5,5), self_loops=True)
motifs_1, counts_1 = gt.motifs(g, 3)
motifs_2, zscores, counts_2, s_counts, s_dev = gt.motif_significance(g, 3, self_loops = True, full_output = True)
# Print motif array lengths and counts
print(f"Motif_1 array length: {len(motifs_1)}")
print(f"Motif_2 array length: {len(motifs_2)}")
print(counts_1)
print(counts_2)
# Print z-scores
print(f"Z-Scores:{zscores}")
# The motif at index 18 differs between the two motif arrays, but its count is the same
gt.graph_draw(motifs_1[18], vertex_font_size=12, edge_pen_width=1.5,
              output_size=(1000, 1000), vertex_color="black",
              edge_font_size=10, edge_text_color="red")
print(counts_1[18])
gt.graph_draw(motifs_2[18], vertex_font_size=12, edge_pen_width=1.5,
              output_size=(1000, 1000), vertex_color="black",
              edge_font_size=10, edge_text_color="red")
print(counts_2[18])
# Collect the index pairs (motifs_1 index, motifs_2 index) of isomorphic motifs
isomorphic_pairs = []
# Iterate through the graphs in motifs_1
for index, graph in enumerate(motifs_1):
    # Iterate through the graphs in motifs_2
    for s_index, s_graph in enumerate(motifs_2):
        # Check whether the current graph in motifs_2 is isomorphic to the current graph in motifs_1
        if gt.isomorphism(graph, s_graph):
            isomorphic_pairs.append((index, s_index))
# Print the isomorphic pairs
if isomorphic_pairs:
    for motif_index, s_index in isomorphic_pairs:
        print(f"motif_1 index: {motif_index}, motif_2 index: {s_index}")
else:
    print("No isomorphic pairs found.")
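To compare the two outputs without relying on index order, the motifs could be paired by isomorphism first and the counts compared per pair. The following is only a sketch: align_by_match and count_mismatches are hypothetical helper names, not graph-tool functions, and the toy example uses strings with a sorted-letters comparison as a stand-in for graphs and gt.isomorphism.

```python
def align_by_match(items_a, items_b, same):
    """Pair each index of items_a with the first unused index of items_b
    for which same(a, b) is true. Returns (pairs, only_in_a, only_in_b)."""
    pairs, only_a = [], []
    unused_b = list(range(len(items_b)))
    for i, a in enumerate(items_a):
        j = next((j for j in unused_b if same(a, items_b[j])), None)
        if j is None:
            only_a.append(i)
        else:
            pairs.append((i, j))
            unused_b.remove(j)
    return pairs, only_a, unused_b

def count_mismatches(pairs, counts_a, counts_b):
    """Return the matched index pairs whose counts differ."""
    return [(i, j) for i, j in pairs if counts_a[i] != counts_b[j]]

# Toy stand-in: strings are "the same" if they contain the same letters.
pairs, only_a, only_b = align_by_match(
    ["ab", "cd"], ["dc", "ba", "ef"], lambda x, y: sorted(x) == sorted(y))
print(pairs, only_a, only_b)
print(count_mismatches(pairs, [5, 7], [7, 4, 0]))
```

With the variables from the example above, the call would be align_by_match(motifs_1, motifs_2, gt.isomorphism), followed by count_mismatches(pairs, counts_1, counts_2). The indices returned in only_in_b would then show which motif_significance entries (for instance the trailing 0 counts) have no counterpart in the motifs output.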