diff --git a/doc/demos/inference/_background.rst b/doc/demos/inference/_background.rst index 0ec8c3218f769e41fb9dd598a5b76d1745b04bfd..403977e5b72ccd01ff0c59777a1fc3de066fd628 100644 --- a/doc/demos/inference/_background.rst +++ b/doc/demos/inference/_background.rst @@ -161,10 +161,10 @@ figure. .. note:: - We emphasize that no constraints are imposed on what kind of - modular structure is allowed, as the matrix of edge counts :math:e - is unconstrained. Hence, we can detect the putatively typical pattern - of "community structure" + With the SBM no constraints are imposed on what kind of modular + structure is allowed, as the matrix of edge counts :math:e is + unconstrained. Hence, we can detect the putatively typical pattern of + "community structure" _, i.e. when nodes are connected mostly to other nodes of the same group, if it happens to be the most likely network description, but we can also diff --git a/doc/demos/inference/_reconstruction.rst b/doc/demos/inference/_reconstruction.rst index 613caa63c07e85835fe80352a7aff120e20f47f0..49259acbcaf32a06512d771cfb7945bbf3addf14 100644 --- a/doc/demos/inference/_reconstruction.rst +++ b/doc/demos/inference/_reconstruction.rst @@ -13,6 +13,7 @@ the observed data, the network can be reconstructed according to the posterior distribution, .. math:: + :label: posterior-reconstruction P(\boldsymbol A, \boldsymbol b | \boldsymbol{\mathcal{D}}) = \frac{P(\boldsymbol{\mathcal{D}} | \boldsymbol A)P(\boldsymbol A, \boldsymbol b)}{P(\boldsymbol{\mathcal{D}})} @@ -111,14 +112,14 @@ In this situation the priors :math:P(p|\alpha=1,\beta=1) and .. note:: - It is important to emphasize that since this approach makes use of - the *correlations* between edges to inform the reconstruction, as - described by the inferred SBM, this means it can also be used when - only single measurements have been performed, :math:n_{ij}=1, and - the error magnitudes :math:p and :math:q are unknown. 
Since every - arbitrary adjacency matrix can be cast in this setting, this method - can be used to reconstruct networks for which no error assessments of - any kind have been provided. + Since this approach also makes use of the *correlations* between + edges to inform the reconstruction, as described by the inferred SBM, + this means it can also be used when only single measurements have + been performed, :math:n_{ij}=1, and the error magnitudes :math:p + and :math:q are unknown. Since every arbitrary adjacency matrix can + be cast in this setting, this method can be used to reconstruct + networks for which no error assessments of any kind have been + provided. Below, we illustrate how the reconstruction can be performed with a simple example, using @@ -173,7 +174,8 @@ simple example, using global pv, u, cs u = s.collect_marginal(u) bstate = s.get_block_state() - pv = bstate.levels[0].collect_vertex_marginals(pv) + b = gt.perfect_prop_hash([bstate.levels[0].b])[0] + pv = bstate.levels[0].collect_vertex_marginals(pv, b=b) cs.append(gt.local_clustering(s.get_graph()).fa.mean()) gt.mcmc_equilibrate(state, force_niter=10000, mcmc_args=dict(niter=10), @@ -310,9 +312,9 @@ Which yields: .. testoutput:: measured - Posterior probability of edge (11, 36): 0.515651... - Posterior probability of non-edge (15, 73): 0.009000... - Estimated average local clustering: 0.571673 ± 0.003228... + Posterior probability of edge (11, 36): 0.967896... + Posterior probability of non-edge (15, 73): 0.038703... + Estimated average local clustering: 0.572129 ± 0.005409... 
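The ``perfect_prop_hash`` calls introduced in the hunks above exist because ``collect_vertex_marginals`` accumulates histograms indexed by group label, and the sampled partition labels can be arbitrary non-contiguous integers. A minimal plain-Python sketch of the idea (illustrative names only, not the graph-tool implementation):

```python
def perfect_hash(labels):
    # Bijectively map each distinct label to a contiguous index
    # 0..B-1, in order of first appearance.
    index = {}
    for x in labels:
        index.setdefault(x, len(index))
    return [index[x] for x in labels]

def collect_vertex_marginals(pv, b):
    # pv[v][r] counts how often vertex v was seen in group r
    # across the collected partition samples.
    for v, r in enumerate(b):
        hist = pv.setdefault(v, {})
        hist[r] = hist.get(r, 0) + 1
    return pv

pv = {}
for sample in ([7, 7, 42], [7, 42, 42]):   # two partition samples
    pv = collect_vertex_marginals(pv, perfect_hash(sample))
print(pv)   # {0: {0: 2}, 1: {0: 1, 1: 1}, 2: {1: 2}}
```

In the patched doctests the analogous step is ``b = gt.perfect_prop_hash([...])[0]`` followed by ``collect_vertex_marginals(pv, b=b)``, which keeps the marginal bins contiguous regardless of the raw label values.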
The results are very similar to the ones obtained with the uniform model in this case, but can be quite different in situations where a large @@ -423,7 +425,8 @@ inference: global pv, u, cs u = s.collect_marginal(u) bstate = s.get_block_state() - pv = bstate.levels[0].collect_vertex_marginals(pv) + b = gt.perfect_prop_hash([bstate.levels[0].b])[0] + pv = bstate.levels[0].collect_vertex_marginals(pv, b=b) cs.append(gt.local_clustering(s.get_graph()).fa.mean()) gt.mcmc_equilibrate(state, force_niter=10000, mcmc_args=dict(niter=10), diff --git a/doc/demos/inference/_reconstruction_dynamics.rst b/doc/demos/inference/_reconstruction_dynamics.rst index 4263fc1b306d312603756c47b4c2386398be4108..0bd822c95a7279dc0a619323500bda905e032243 100644 --- a/doc/demos/inference/_reconstruction_dynamics.rst +++ b/doc/demos/inference/_reconstruction_dynamics.rst @@ -3,7 +3,76 @@ Reconstruction from dynamics ++++++++++++++++++++++++++++ +In some cases direct measurements of the edges of a network are either +impossible, or can be done only at significant experimental +cost. In such situations, we are required to infer the network of +interactions from the observed functional behavior +[peixoto-network-2019]_. In graph-tool this can be done for epidemic +spreading (via +:class:`~graph_tool.inference.uncertain_blockmodel.EpidemicsBlockState`), +for the kinetic Ising model (via +:class:`~graph_tool.inference.uncertain_blockmodel.IsingGlauberBlockState` +and +:class:`~graph_tool.inference.uncertain_blockmodel.CIsingGlauberBlockState`), +and for the equilibrium Ising model (via +:class:`~graph_tool.inference.uncertain_blockmodel.PseudoIsingBlockState` +and +:class:`~graph_tool.inference.uncertain_blockmodel.PseudoCIsingBlockState`). +We consider the general reconstruction framework outlined above, where +the observed data :math:`\mathcal{D}` in +Eq. :eq:`posterior-reconstruction` consist of a time series generated by one +of the supported processes.
Just like before, the posterior distribution +includes not only the adjacency matrix, but also the parameters of the +dynamical model and of the SBM that is used as a prior. +For example, in the case of an SIS epidemic, where :math:`\sigma_i(t)=1` +means node :math:`i` is infected at time :math:`t`, and :math:`\sigma_i(t)=0` +otherwise, the likelihood for a time series :math:`\boldsymbol\sigma` is + +.. math:: + + P(\boldsymbol\sigma|\boldsymbol A,\boldsymbol\beta,\gamma)=\prod_t\prod_iP(\sigma_i(t)|\boldsymbol\sigma(t-1)), + +where + +.. math:: + + P(\sigma_i(t)|\boldsymbol\sigma(t-1)) = + f(e^{m_i(t-1)}, \sigma_i(t))^{1-\sigma_i(t-1)} \times f(\gamma,\sigma_i(t))^{\sigma_i(t-1)} + +is the transition probability for node :math:`i` at time :math:`t`, with +:math:`f(p,\sigma) = (1-p)^{\sigma}p^{1-\sigma}`, and where + +.. math:: + + m_i(t) = \sum_jA_{ij}\ln(1-\beta_{ij})\sigma_j(t) + +is the contribution from all +neighbors of node :math:`i` to its infection probability at time :math:`t`. In the +equations above the value :math:`\beta_{ij}` is the probability of an infection +via an existing edge :math:`(i,j)`, and :math:`\gamma` is the :math:`1\to 0` +recovery probability. With these additional parameters, the full +posterior distribution for the reconstruction becomes + +.. math:: + + P(\boldsymbol A,\boldsymbol b,\boldsymbol\beta|\boldsymbol\sigma) = + \frac{P(\boldsymbol\sigma|\boldsymbol A,\boldsymbol\beta,\gamma)P(\boldsymbol A|\boldsymbol b)P(\boldsymbol b)P(\boldsymbol\beta)}{P(\boldsymbol\sigma|\gamma)}. + +Since :math:`\beta_{ij}\in[0,1]` we use the uniform prior +:math:`P(\boldsymbol\beta)=1`. Note also that the recovery probability +:math:`\gamma` plays no role in the reconstruction algorithm, since its +term in the likelihood does not involve :math:`\boldsymbol A` (and +hence cancels out in the denominator +:math:`P(\boldsymbol\sigma|\gamma)=P(\gamma|\boldsymbol\sigma)P(\boldsymbol\sigma)/P(\gamma)`).
This +means the above posterior only depends on the infection events +:math:`0\to 1`, and thus is also valid, without any modifications, for all +epidemic variants SI, SIR, SEIR, etc., since the infection events occur +with the same probability for all these models. + +The example below shows how to perform reconstruction from an +epidemic process. + .. testsetup:: dynamics import os try: os.chdir("demos/inference") except FileNotFoundError: pass gt.seed_rng(42) np.random.seed(44) .. testcode:: dynamics - g = gt.collection.data["dolphins"] - - ss = [] - for i in range(1000): - si_state = gt.SIState(g, beta=1) - s = [] - for j in range(10): - si_state.iterate_sync() - s.append(si_state.get_state().copy()) - s = gt.group_vector_property(s) - ss.append(s) + # We will first simulate the dynamics with a given network + g = gt.collection.data["dolphins"] - u = g.copy() - u.clear_edges() - ss = [u.own_property(s) for s in ss] + # The algorithm accepts multiple independent time-series for the + # reconstruction. We will generate 100 SI cascades starting from a + # random node each time, and a uniform infection probability of 0.7. - rstate = gt.EpidemicsBlockState(u, s=ss, beta=None, r=1e-6, global_beta=.99, - state_args=dict(B=1), nested=False) + ss = [] + for i in range(100): + si_state = gt.SIState(g, beta=.7) + s = [si_state.get_state().copy()] + for j in range(10): + si_state.iterate_sync() + s.append(si_state.get_state().copy()) + # Each time series should be represented as a single vector-valued + # vertex property map with the states for each node at each time.
+ s = gt.group_vector_property(s) + ss.append(s) - # Now we collect the marginals for exactly 100,000 sweeps, at - # intervals of 10 sweeps: + # Prepare the initial state of the reconstruction as an empty graph + u = g.copy() + u.clear_edges() + ss = [u.own_property(s) for s in ss] # time series properties need to be 'owned' by graph u + # Create reconstruction state + rstate = gt.EpidemicsBlockState(u, s=ss, beta=None, r=1e-6, global_beta=.1, + state_args=dict(B=1), nested=False, aE=g.num_edges()) + + # Now we collect the marginals for exactly 100,000 sweeps, at + # intervals of 10 sweeps: + gm = None bm = None - - def collect_marginals(s): - global gm, bm - gm = s.collect_marginal(gm) - b = gt.perfect_prop_hash([s.bstate.b])[0] + betas = [] + + def collect_marginals(s): + global gm, bm + gm = s.collect_marginal(gm) + b = gt.perfect_prop_hash([s.bstate.b])[0] bm = s.bstate.collect_vertex_marginals(bm, b=b) + betas.append(s.params["global_beta"]) + + gt.mcmc_equilibrate(rstate, force_niter=10000, mcmc_args=dict(niter=10, xstep=0), + callback=collect_marginals) + + print("Posterior similarity: ", gt.similarity(g, gm, g.new_ep("double", 1), gm.ep.eprob)) + print("Inferred infection probability: %g ± %g" % (mean(betas), std(betas))) + + gt.graph_draw(gm, gm.own_property(g.vp.pos), vertex_shape="pie", vertex_color="black", + vertex_pie_fractions=gm.own_property(bm), vertex_pen_width=1, + edge_pen_width=gt.prop_to_size(gm.ep.eprob, 0, 5), + eorder=gm.ep.eprob, output="dolphins-posterior.svg") + + +The reconstruction can accurately recover the hidden network and the infection probability: + +.. testoutput:: dynamics - gt.mcmc_equilibrate(rstate, force_niter=10000, mcmc_args=dict(niter=10, xstep=0, p=0, h=0), - callback=collect_marginals) + Posterior similarity: 0.978108... 
+ Inferred infection probability: 0.687213 ± 0.054126 - b = bm.new_vp("int", vals=[bm[v].a.argmax() for v in bm.vertices()]) +The figure below shows the reconstructed network and the inferred community structure. + +.. figure:: dolphins-posterior.* :align: center :width: 400px - graph_draw(gm, gm.own_property(g.vp.pos), vertex_shape="square", vertex_color="black", - vertex_fill_color=b, vertex_pen_width=1, - edge_pen_width=prop_to_size(gm.ep.eprob, 0, 5), eorder=gm.ep.eprob, output="dolphins") - - eprob = u.ep.eprob - print("Posterior probability of edge (11, 36):", eprob[u.edge(11, 36)]) - print("Posterior probability of non-edge (15, 73):", eprob[u.edge(15, 73)]) - print("Estimated average local clustering: %g ± %g" % (np.mean(cs), np.std(cs))) + Reconstructed network of associations between 62 dolphins, from the + dynamics of an SI epidemic model, using the degree-corrected SBM as a + latent prior. The edge thickness corresponds to the marginal + posterior probability of each edge, and the node pie charts to the + marginal posterior distribution of the node partition. diff --git a/doc/demos/inference/_sampling.rst b/doc/demos/inference/_sampling.rst index d8cbd4f8f38eb0635c6e362f9fa71d96ce789fbd..34f1da9b465e55ebfe2c6608c7739f1cca57d4be 100644 --- a/doc/demos/inference/_sampling.rst +++ b/doc/demos/inference/_sampling.rst @@ -169,7 +169,8 @@ i.e.
the posterior probability that a node belongs to a given group: def collect_marginals(s): global pv - pv = s.collect_vertex_marginals(pv) + b = gt.perfect_prop_hash([s.b])[0] + pv = s.collect_vertex_marginals(pv, b=b) # Now we collect the marginals for exactly 100,000 sweeps, at # intervals of 10 sweeps: diff --git a/doc/gt_theme/static/gt_style.css b/doc/gt_theme/static/gt_style.css index a2bb9d6c17cef4d87635f00e7ff34d1c30bd8801..5d8639cbd75022d2ed354f1d963ec4c49160c7c5 100644 --- a/doc/gt_theme/static/gt_style.css +++ b/doc/gt_theme/static/gt_style.css @@ -53,6 +53,7 @@ table.docutils.figure { margin-left:auto; margin-right:auto; + display: table; } /* stupid workaround to hide ugly c++ signature stuff from sphinx*/ @@ -172,4 +173,17 @@ div.sphinxsidebar div.bodywrapper { margin: 0 0 0 310px; +} + +div.sidebar { + margin: 1em 0 0.5em 1em; + border: 1px solid #eee; + padding: 7px 7px 0 7px; + background-color: #eee; + width: 40%; + float: right; +} + +div.sidebar p { + margin: 0px 7px 7px 7px; } \ No newline at end of file diff --git a/doc/pyenv.py b/doc/pyenv.py index eb7ce43d6f484ee48fa9ca9569688dbd50f409c7..4109c424a74b193627bdefec759024554b277c3c 100644 --- a/doc/pyenv.py +++ b/doc/pyenv.py @@ -13,7 +13,7 @@ font_size=14 rcParams["backend"] = "PDF" rcParams["figure.figsize"] = (4, 3) rcParams["font.family"] = "Serif" -rcParams["font.serif"] = ["Times"] +#rcParams["font.serif"] = ["Times"] rcParams["font.size"] = font_size rcParams["axes.labelsize"] = font_size rcParams["xtick.labelsize"] = font_size - 2 @@ -36,7 +36,7 @@ rcParams["ps.usedistiller"] = "xpdf" rcParams["pdf.compression"] = 9 rcParams["ps.useafm"] = True rcParams["path.simplify"] = True -rcParams["text.latex.preamble"] = [r"\usepackage{times}", +rcParams["text.latex.preamble"] = [#r"\usepackage{times}", #r"\usepackage{euler}", r"\usepackage{amssymb}", r"\usepackage{amsmath}"] diff --git a/src/graph_tool/__init__.py b/src/graph_tool/__init__.py index 
e5a33f964c598debc1135d2ab0f3a4e54b3777d6..89a562d5e94626755e3a49071f989821f1d72773 100644 --- a/src/graph_tool/__init__.py +++ b/src/graph_tool/__init__.py @@ -2245,8 +2245,7 @@ class Graph(object): -------- >>> g = gt.collection.data["pgp-strong-2009"] >>> g.get_out_degrees([42, 666]) - array([20, 38]) - + array([20, 39], dtype=uint64) """ return libcore.get_degree_list(self.__graph, numpy.asarray(vs, dtype="uint64"), @@ -2261,8 +2260,7 @@ class Graph(object): -------- >>> g = gt.collection.data["pgp-strong-2009"] >>> g.get_in_degrees([42, 666]) - array([20, 39]) - + array([20, 38], dtype=uint64) """ return libcore.get_degree_list(self.__graph, numpy.asarray(vs, dtype="uint64"), diff --git a/src/graph_tool/centrality/__init__.py b/src/graph_tool/centrality/__init__.py index a3aa710c6057b0438ac50e36a7230dd5fb9a0d1e..10b46decc96041ad7ca77dfb32ab85e8eabc94b3 100644 --- a/src/graph_tool/centrality/__init__.py +++ b/src/graph_tool/centrality/__init__.py @@ -620,7 +620,7 @@ def eigenvector(g, weight=None, vprop=None, epsilon=1e-6, max_iter=None): >>> w.a = np.random.random(len(w.a)) * 42 >>> ee, x = gt.eigenvector(g, w) >>> print(ee) - 724.302745922... + 724.302745... >>> gt.graph_draw(g, pos=g.vp["pos"], vertex_fill_color=x, ... vertex_size=gt.prop_to_size(x, mi=5, ma=15), ... vcmap=matplotlib.cm.gist_heat, diff --git a/src/graph_tool/clustering/__init__.py b/src/graph_tool/clustering/__init__.py index 7d8333089c7bfd03127363984883a1bb63af7899..579cdfb253a7d9b29376fabab4ea612b33b0c10e 100644 --- a/src/graph_tool/clustering/__init__.py +++ b/src/graph_tool/clustering/__init__.py @@ -165,10 +165,10 @@ def global_clustering(g, weight=None): c = 3 \times \frac{\text{number of triangles}} {\text{number of connected triples}} - If weights are given, the following definition is used + If weights are given, the following definition is used: .. 
math:: - c = \frac{\operatorname{Tr}{{\boldsymbol A}^3}}{\sum_{i\ne j}[{\boldsymbol A}^2]_{ij}}, + c = \frac{\mathrm{Tr}{{\boldsymbol A}^3}}{\sum_{i\ne j}[{\boldsymbol A}^2]_{ij}}, where :math:`\boldsymbol A` is the weighted adjacency matrix, and it is assumed that the weights are normalized, i.e. :math:`A_{ij} \le 1`. diff --git a/src/graph_tool/inference/latent_multigraph.py b/src/graph_tool/inference/latent_multigraph.py index 0516e8913edec9d6aa6c19fda60f0ddb2d3108fe..fb9c0d5b5575300e7f9305d78837f2f3b732826a 100644 --- a/src/graph_tool/inference/latent_multigraph.py +++ b/src/graph_tool/inference/latent_multigraph.py @@ -25,27 +25,33 @@ dl_import("from . import libgraph_tool_inference as libinference") from numpy import sqrt -def latent_multigraph(g, epsilon=1e-8, max_niter=10000): - r""" +def latent_multigraph(g, epsilon=1e-8, max_niter=0): + r"""Infer latent Poisson multigraph model given an "erased" simple graph. + Parameters ---------- g : :class:`~graph_tool.Graph` Graph to be used. + epsilon : float (optional, default: 1e-8) + Convergence criterion. + max_niter : int (optional, default: 0) + Maximum number of iterations allowed (if 0, no maximum is assumed). Returns ------- - - Notes - ----- + u : :class:`~graph_tool.Graph` + Latent graph. + w : :class:`~graph_tool.EdgePropertyMap` + Edge property map with inferred multiplicity parameter. Examples -------- - >>> g = gt.collection.data["football"] - >>> gt.modularity(g, g.vp.value_tsevans) - 0.5744393497... - - References - ---------- + >>> g = gt.collection.data["as-22july06"] + >>> gt.scalar_assortativity(g, "out") + (-0.198384..., 0.001338...) + >>> u, w = gt.latent_multigraph(g) + >>> gt.scalar_assortativity(u, "out", eweight=w) + (-0.048426..., 0.034526...)
""" g = g.copy() diff --git a/src/graph_tool/topology/__init__.py b/src/graph_tool/topology/__init__.py index e512f8936fcac6b1d4251418e26c810cb99b1c29..b8247b66fa860e34c1d16d25ad184f4ca97989e6 100644 --- a/src/graph_tool/topology/__init__.py +++ b/src/graph_tool/topology/__init__.py @@ -1394,7 +1394,7 @@ def vertex_percolation(g, vertices, second=False): Text(...) >>> ylabel("Size of largest component") Text(...) - >>> legend(loc="lower right") + >>> legend(loc="upper left") <...> >>> savefig("vertex-percolation.svg")