Commit 265e9635 authored by Tiago Peixoto

Documentation improvements and fixes

parent be6f450d
@@ -161,10 +161,10 @@ figure.
 .. note::
 
-   We emphasize that no constraints are imposed on what `kind` of
-   modular structure is allowed, as the matrix of edge counts :math:`e`
-   is unconstrained. Hence, we can detect the putatively typical pattern
-   of `"community structure"
+   With the SBM no constraints are imposed on what `kind` of modular
+   structure is allowed, as the matrix of edge counts :math:`e` is
+   unconstrained. Hence, we can detect the putatively typical pattern of
+   `"community structure"
    <https://en.wikipedia.org/wiki/Community_structure>`_, i.e. when
    nodes are connected mostly to other nodes of the same group, if it
    happens to be the most likely network description, but we can also
...
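The note above is easy to try in practice: fitting an SBM with graph-tool makes no assumption about the kind of mixing pattern. Below is a minimal illustrative sketch; the dataset and output filename are our own choices, not part of this commit:

```python
import graph_tool.all as gt

g = gt.collection.data["football"]

# Fit a degree-corrected SBM by minimizing the description length.
# No particular mixing pattern is assumed: assortative "communities"
# are recovered only if they are the most likely description.
state = gt.minimize_blockmodel_dl(g)
print("number of groups found:", state.get_B())
state.draw(output="football-sbm.svg")
```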
@@ -13,6 +13,7 @@ the observed data, the network can be reconstructed according to the
 posterior distribution,
 
 .. math::
+   :label: posterior-reconstruction
 
    P(\boldsymbol A, \boldsymbol b | \boldsymbol{\mathcal{D}}) =
    \frac{P(\boldsymbol{\mathcal{D}} | \boldsymbol A)P(\boldsymbol A, \boldsymbol b)}{P(\boldsymbol{\mathcal{D}})}
@@ -111,14 +112,14 @@ In this situation the priors :math:`P(p|\alpha=1,\beta=1)` and
 .. note::
 
-   It is important to emphasize that since this approach makes use of
-   the *correlations* between edges to inform the reconstruction, as
-   described by the inferred SBM, this means it can also be used when
-   only single measurements have been performed, :math:`n_{ij}=1`, and
-   the error magnitudes :math:`p` and :math:`q` are unknown. Since every
-   arbitrary adjacency matrix can be cast in this setting, this method
-   can be used to reconstruct networks for which no error assessments of
-   any kind have been provided.
+   Since this approach also makes use of the *correlations* between
+   edges to inform the reconstruction, as described by the inferred SBM,
+   this means it can also be used when only single measurements have
+   been performed, :math:`n_{ij}=1`, and the error magnitudes :math:`p`
+   and :math:`q` are unknown. Since every arbitrary adjacency matrix can
+   be cast in this setting, this method can be used to reconstruct
+   networks for which no error assessments of any kind have been
+   provided.
 
 Below, we illustrate how the reconstruction can be performed with a
 simple example, using
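For readers following along, the setup that the hunks in this file refer to can be sketched with `MeasuredBlockState`, which samples from the joint posterior of networks and partitions. This is only a sketch under our own assumptions (dataset, single positive measurement per observed edge), not part of this commit:

```python
import graph_tool.all as gt

g = gt.collection.data["lesmis"]

# One measurement per observed edge (n=1), each positive (x=1); for
# unobserved node pairs we assume one measurement with no positive
# outcome (n_default=1, x_default=0).
n = g.new_ep("int", 1)   # number of measurements per edge
x = g.new_ep("int", 1)   # number of positive measurements per edge
state = gt.MeasuredBlockState(g, n=n, x=x, n_default=1, x_default=0)

# Sample networks from the posterior and collect the edge marginals
u = None
def collect_marginals(s):
    global u
    u = s.collect_marginal(u)

gt.mcmc_equilibrate(state, force_niter=100, mcmc_args=dict(niter=10),
                    callback=collect_marginals)
print(u.ep.eprob.fa.mean())  # average posterior edge probability
```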
@@ -173,7 +174,8 @@ simple example, using
         global pv, u, cs
         u = s.collect_marginal(u)
         bstate = s.get_block_state()
-        pv = bstate.levels[0].collect_vertex_marginals(pv)
+        b = gt.perfect_prop_hash([bstate.levels[0].b])[0]
+        pv = bstate.levels[0].collect_vertex_marginals(pv, b=b)
         cs.append(gt.local_clustering(s.get_graph()).fa.mean())
 
     gt.mcmc_equilibrate(state, force_niter=10000, mcmc_args=dict(niter=10),
@@ -310,9 +312,9 @@ Which yields:
 .. testoutput:: measured
 
-   Posterior probability of edge (11, 36): 0.515651...
-   Posterior probability of non-edge (15, 73): 0.009000...
-   Estimated average local clustering: 0.571673 ± 0.003228...
+   Posterior probability of edge (11, 36): 0.967896...
+   Posterior probability of non-edge (15, 73): 0.038703...
+   Estimated average local clustering: 0.572129 ± 0.005409...
 
 The results are very similar to the ones obtained with the uniform model
 in this case, but can be quite different in situations where a large
@@ -423,7 +425,8 @@ inference:
         global pv, u, cs
         u = s.collect_marginal(u)
         bstate = s.get_block_state()
-        pv = bstate.levels[0].collect_vertex_marginals(pv)
+        b = gt.perfect_prop_hash([bstate.levels[0].b])[0]
+        pv = bstate.levels[0].collect_vertex_marginals(pv, b=b)
         cs.append(gt.local_clustering(s.get_graph()).fa.mean())
 
     gt.mcmc_equilibrate(state, force_niter=10000, mcmc_args=dict(niter=10),
...
@@ -3,7 +3,76 @@
 Reconstruction from dynamics
 ++++++++++++++++++++++++++++
 
+In some cases direct measurements of the edges of a network are either
+impossible, or can be done only at significant experimental cost. In
+such situations, we are required to infer the network of interactions
+from the observed functional behavior [peixoto-network-2019]_. In
+graph-tool this can be done for epidemic spreading (via
+:class:`~graph_tool.inference.uncertain_blockmodel.EpidemicsBlockState`),
+for the kinetic Ising model (via
+:class:`~graph_tool.inference.uncertain_blockmodel.IsingGlauberBlockState`
+and
+:class:`~graph_tool.inference.uncertain_blockmodel.CIsingGlauberBlockState`),
+and for the equilibrium Ising model (via
+:class:`~graph_tool.inference.uncertain_blockmodel.PseudoIsingBlockState`
+and
+:class:`~graph_tool.inference.uncertain_blockmodel.PseudoCIsingBlockState`).
+
+We consider the general reconstruction framework outlined above, where
+the observed data :math:`\mathcal{D}` in
+Eq. :eq:`posterior-reconstruction` are a time series generated by one
+of the supported processes. Just like before, the posterior distribution
+includes not only the adjacency matrix, but also the parameters of the
+dynamical model and of the SBM that is used as a prior.
+
+For example, in the case of SIS epidemics, where :math:`\sigma_i(t)=1`
+if node :math:`i` is infected at time :math:`t`, and
+:math:`\sigma_i(t)=0` otherwise, the likelihood for a time series
+:math:`\boldsymbol\sigma` is
+
+.. math::
+
+   P(\boldsymbol\sigma|\boldsymbol A,\boldsymbol\beta,\gamma)=\prod_t\prod_iP(\sigma_i(t)|\boldsymbol\sigma(t-1)),
+
+where
+
+.. math::
+
+   P(\sigma_i(t)|\boldsymbol\sigma(t-1)) =
+   f(e^{m_i(t-1)}, \sigma_i(t))^{1-\sigma_i(t-1)} \times f(\gamma,\sigma_i(t))^{\sigma_i(t-1)}
+
+is the transition probability for node :math:`i` at time :math:`t`, with
+:math:`f(p,\sigma) = (1-p)^{\sigma}p^{1-\sigma}`, and where
+
+.. math::
+
+   m_i(t) = \sum_jA_{ij}\ln(1-\beta_{ij})\sigma_j(t)
+
+is the contribution from all neighbors of node :math:`i` to its
+infection probability at time :math:`t`. In the equations above the
+value :math:`\beta_{ij}` is the probability of an infection via an
+existing edge :math:`(i,j)`, and :math:`\gamma` is the :math:`1\to 0`
+recovery probability. With these additional parameters, the full
+posterior distribution for the reconstruction becomes
+
+.. math::
+
+   P(\boldsymbol A,\boldsymbol b,\boldsymbol\beta|\boldsymbol\sigma) =
+   \frac{P(\boldsymbol\sigma|\boldsymbol A,\boldsymbol\beta,\gamma)P(\boldsymbol A|\boldsymbol b)P(\boldsymbol b)P(\boldsymbol\beta)}{P(\boldsymbol\sigma|\gamma)}.
+
+Since :math:`\beta_{ij}\in[0,1]`, we use the uniform prior
+:math:`P(\boldsymbol\beta)=1`. Note also that the recovery probability
+:math:`\gamma` plays no role in the reconstruction algorithm, since its
+term in the likelihood does not involve :math:`\boldsymbol A` (and
+hence gets cancelled out in the denominator
+:math:`P(\boldsymbol\sigma|\gamma)=P(\gamma|\boldsymbol\sigma)P(\boldsymbol\sigma)/P(\gamma)`).
+This means the above posterior depends only on the infection events
+:math:`0\to 1`, and is thus also valid, without any modification, for
+the epidemic variants SI, SIR, SEIR, etc., since the infection events
+occur with the same probability in all of these models.
+
+The example below shows how to perform reconstruction from an epidemic
+process.
+
 .. testsetup:: dynamics
 
    import os
@@ -16,47 +85,75 @@ Reconstruction from dynamics
 .. testcode:: dynamics
 
-   g = gt.collection.data["dolphins"]
-
-   ss = []
-   for i in range(1000):
-       si_state = gt.SIState(g, beta=1)
-       s = []
-       for j in range(10):
-           si_state.iterate_sync()
-           s.append(si_state.get_state().copy())
-       s = gt.group_vector_property(s)
-       ss.append(s)
-
-   u = g.copy()
-   u.clear_edges()
-   ss = [u.own_property(s) for s in ss]
-
-   rstate = gt.EpidemicsBlockState(u, s=ss, beta=None, r=1e-6, global_beta=.99,
-                                   state_args=dict(B=1), nested=False)
-
-   # Now we collect the marginals for exactly 100,000 sweeps, at
-   # intervals of 10 sweeps:
-
-   gm = None
-   bm = None
-
-   def collect_marginals(s):
-       global gm, bm
-       gm = s.collect_marginal(gm)
-       b = gt.perfect_prop_hash([s.bstate.b])[0]
-       bm = s.bstate.collect_vertex_marginals(bm, b=b)
-
-   gt.mcmc_equilibrate(rstate, force_niter=10000, mcmc_args=dict(niter=10, xstep=0, p=0, h=0),
-                       callback=collect_marginals)
-
-   b = bm.new_vp("int", vals=[bm[v].a.argmax() for v in bm.vertices()])
-
-   graph_draw(gm, gm.own_property(g.vp.pos), vertex_shape="square", vertex_color="black",
-              vertex_fill_color=b, vertex_pen_width=1,
-              edge_pen_width=prop_to_size(gm.ep.eprob, 0, 5), eorder=gm.ep.eprob, output="dolphins")
-
-   eprob = u.ep.eprob
-   print("Posterior probability of edge (11, 36):", eprob[u.edge(11, 36)])
-   print("Posterior probability of non-edge (15, 73):", eprob[u.edge(15, 73)])
-   print("Estimated average local clustering: %g ± %g" % (np.mean(cs), np.std(cs)))
+   # We will first simulate the dynamics with a given network
+   g = gt.collection.data["dolphins"]
+
+   # The algorithm accepts multiple independent time series for the
+   # reconstruction. We will generate 100 SI cascades starting from a
+   # random node each time, with a uniform infection probability of 0.7.
+   ss = []
+   for i in range(100):
+       si_state = gt.SIState(g, beta=.7)
+       s = [si_state.get_state().copy()]
+       for j in range(10):
+           si_state.iterate_sync()
+           s.append(si_state.get_state().copy())
+       # Each time series should be represented as a single vector-valued
+       # vertex property map with the states for each node at each time.
+       s = gt.group_vector_property(s)
+       ss.append(s)
+
+   # Prepare the initial state of the reconstruction as an empty graph
+   u = g.copy()
+   u.clear_edges()
+   ss = [u.own_property(s) for s in ss]  # time series properties need to be 'owned' by graph u
+
+   # Create the reconstruction state
+   rstate = gt.EpidemicsBlockState(u, s=ss, beta=None, r=1e-6, global_beta=.1,
+                                   state_args=dict(B=1), nested=False, aE=g.num_edges())
+
+   # Now we collect the marginals for exactly 100,000 sweeps, at
+   # intervals of 10 sweeps:
+   gm = None
+   bm = None
+   betas = []
+
+   def collect_marginals(s):
+       global gm, bm
+       gm = s.collect_marginal(gm)
+       b = gt.perfect_prop_hash([s.bstate.b])[0]
+       bm = s.bstate.collect_vertex_marginals(bm, b=b)
+       betas.append(s.params["global_beta"])
+
+   gt.mcmc_equilibrate(rstate, force_niter=10000, mcmc_args=dict(niter=10, xstep=0),
+                       callback=collect_marginals)
+
+   print("Posterior similarity: ", gt.similarity(g, gm, g.new_ep("double", 1), gm.ep.eprob))
+   print("Inferred infection probability: %g ± %g" % (mean(betas), std(betas)))
+
+   gt.graph_draw(gm, gm.own_property(g.vp.pos), vertex_shape="pie", vertex_color="black",
+                 vertex_pie_fractions=gm.own_property(bm), vertex_pen_width=1,
+                 edge_pen_width=gt.prop_to_size(gm.ep.eprob, 0, 5),
+                 eorder=gm.ep.eprob, output="dolphins-posterior.svg")
+
+The reconstruction can accurately recover the hidden network and the
+infection probability:
+
+.. testoutput:: dynamics
+
+   Posterior similarity: 0.978108...
+   Inferred infection probability: 0.687213 ± 0.054126
+
+The figure below shows the reconstructed network and the inferred
+community structure.
+
+.. figure:: dolphins-posterior.*
+   :align: center
+   :width: 400px
+
+   Reconstructed network of associations between 62 dolphins, from the
+   dynamics of an SI epidemic model, using the degree-corrected SBM as a
+   latent prior. The edge thickness corresponds to the marginal
+   posterior probability of each edge, and the node pie charts to the
+   marginal posterior distribution of the node partition.
@@ -169,7 +169,8 @@ i.e. the posterior probability that a node belongs to a given group:
     def collect_marginals(s):
         global pv
-        pv = s.collect_vertex_marginals(pv)
+        b = gt.perfect_prop_hash([s.b])[0]
+        pv = s.collect_vertex_marginals(pv, b=b)
 
     # Now we collect the marginals for exactly 100,000 sweeps, at
     # intervals of 10 sweeps:
...
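The pattern changed in the hunk above, of hashing the partition before collecting vertex marginals, recurs throughout this commit. A self-contained sketch of the full collection loop might look as follows; the dataset and number of groups are our own choices:

```python
import graph_tool.all as gt

g = gt.collection.data["lesmis"]
state = gt.BlockState(g, B=4)

pv = None
def collect_marginals(s):
    global pv
    # perfect_prop_hash maps the (possibly non-contiguous) group labels
    # to a contiguous range before the marginals are accumulated
    b = gt.perfect_prop_hash([s.b])[0]
    pv = s.collect_vertex_marginals(pv, b=b)

gt.mcmc_equilibrate(state, force_niter=1000, mcmc_args=dict(niter=10),
                    callback=collect_marginals)
print(pv[g.vertex(0)])  # posterior group membership counts for vertex 0
```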
@@ -53,6 +53,7 @@ table.docutils.figure
 {
     margin-left:auto;
     margin-right:auto;
+    display: table;
 }
 
 /* stupid workaround to hide ugly c++ signature stuff from sphinx*/
@@ -172,4 +173,17 @@ div.sphinxsidebar
 div.bodywrapper {
     margin: 0 0 0 310px;
-}
\ No newline at end of file
+}
+
+div.sidebar {
+    margin: 1em 0 0.5em 1em;
+    border: 1px solid #eee;
+    padding: 7px 7px 0 7px;
+    background-color: #eee;
+    width: 40%;
+    float: right;
+}
+
+div.sidebar p {
+    margin: 0px 7px 7px 7px;
+}
@@ -13,7 +13,7 @@ font_size=14
 rcParams["backend"] = "PDF"
 rcParams["figure.figsize"] = (4, 3)
 rcParams["font.family"] = "Serif"
-rcParams["font.serif"] = ["Times"]
+#rcParams["font.serif"] = ["Times"]
 rcParams["font.size"] = font_size
 rcParams["axes.labelsize"] = font_size
 rcParams["xtick.labelsize"] = font_size - 2
@@ -36,7 +36,7 @@ rcParams["ps.usedistiller"] = "xpdf"
 rcParams["pdf.compression"] = 9
 rcParams["ps.useafm"] = True
 rcParams["path.simplify"] = True
-rcParams["text.latex.preamble"] = [r"\usepackage{times}",
+rcParams["text.latex.preamble"] = [#r"\usepackage{times}",
                                    #r"\usepackage{euler}",
                                    r"\usepackage{amssymb}",
                                    r"\usepackage{amsmath}"]
...
@@ -2245,8 +2245,7 @@ class Graph(object):
         --------
         >>> g = gt.collection.data["pgp-strong-2009"]
         >>> g.get_out_degrees([42, 666])
-        array([20, 38])
-
+        array([20, 39], dtype=uint64)
         """
         return libcore.get_degree_list(self.__graph,
                                        numpy.asarray(vs, dtype="uint64"),
@@ -2261,8 +2260,7 @@ class Graph(object):
         --------
         >>> g = gt.collection.data["pgp-strong-2009"]
         >>> g.get_in_degrees([42, 666])
-        array([20, 39])
-
+        array([20, 38], dtype=uint64)
         """
         return libcore.get_degree_list(self.__graph,
                                        numpy.asarray(vs, dtype="uint64"),
...
@@ -620,7 +620,7 @@ def eigenvector(g, weight=None, vprop=None, epsilon=1e-6, max_iter=None):
     >>> w.a = np.random.random(len(w.a)) * 42
     >>> ee, x = gt.eigenvector(g, w)
     >>> print(ee)
-    724.302745922...
+    724.302745...
     >>> gt.graph_draw(g, pos=g.vp["pos"], vertex_fill_color=x,
     ...               vertex_size=gt.prop_to_size(x, mi=5, ma=15),
     ...               vcmap=matplotlib.cm.gist_heat,
...
@@ -165,10 +165,10 @@ def global_clustering(g, weight=None):
        c = 3 \times \frac{\text{number of triangles}}
                          {\text{number of connected triples}}
 
-    If weights are given, the following definition is used
+    If weights are given, the following definition is used:
 
     .. math::
 
-       c = \frac{\operatorname{Tr}{{\boldsymbol A}^3}}{\sum_{i\ne j}[{\boldsymbol A}^2]_{ij}},
+       c = \frac{\mathrm{Tr}{{\boldsymbol A}^3}}{\sum_{i\ne j}[{\boldsymbol A}^2]_{ij}},
 
     where :math:`\boldsymbol A` is the weighted adjacency matrix, and it is
     assumed that the weights are normalized, i.e. :math:`A_{ij} \le 1`.
...
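The weighted definition in the hunk above can be checked directly with numpy. This small sketch (our own illustration) evaluates Tr(A³)/Σ_{i≠j}[A²]_{ij} for a toy weighted graph:

```python
import numpy as np

# Weighted adjacency matrix of a triangle plus one dangling edge,
# with weights normalized so that A_ij <= 1
A = np.array([[0. , 0.5, 0.5, 0. ],
              [0.5, 0. , 0.5, 0. ],
              [0.5, 0.5, 0. , 0.2],
              [0. , 0. , 0.2, 0. ]])

A2 = A @ A
A3 = A2 @ A

# c = Tr(A^3) / sum_{i != j} [A^2]_{ij}
c = np.trace(A3) / (A2.sum() - np.trace(A2))
print(c)
```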
@@ -25,27 +25,33 @@ dl_import("from . import libgraph_tool_inference as libinference")
 from numpy import sqrt
 
-def latent_multigraph(g, epsilon=1e-8, max_niter=10000):
-    r"""
+def latent_multigraph(g, epsilon=1e-8, max_niter=0):
+    r"""Infer latent Poisson multigraph model given an "erased" simple graph.
 
     Parameters
     ----------
     g : :class:`~graph_tool.Graph`
        Graph to be used.
+    epsilon : ``float`` (optional, default: ``1e-8``)
+       Convergence criterion.
+    max_niter : ``int`` (optional, default: ``0``)
+       Maximum number of iterations allowed (if ``0``, no maximum is assumed).
 
     Returns
     -------
-
-    Notes
-    -----
+    u : :class:`~graph_tool.Graph`
+       Latent graph.
+    w : :class:`~graph_tool.EdgePropertyMap`
+       Edge property map with inferred multiplicity parameter.
 
     Examples
    --------
-    >>> g = gt.collection.data["football"]
-    >>> gt.modularity(g, g.vp.value_tsevans)
-    0.5744393497...
-
-    References
-    ----------
+    >>> g = gt.collection.data["as-22july06"]
+    >>> gt.scalar_assortativity(g, "out")
+    (-0.198384..., 0.001338...)
+    >>> u, w = gt.latent_multigraph(g)
+    >>> gt.scalar_assortativity(u, "out", eweight=w)
+    (-0.048426..., 0.034526...)
     """
     g = g.copy()
...
@@ -1394,7 +1394,7 @@ def vertex_percolation(g, vertices, second=False):
     Text(...)
     >>> ylabel("Size of largest component")
     Text(...)
-    >>> legend(loc="lower right")
+    >>> legend(loc="upper left")
     <...>
     >>> savefig("vertex-percolation.svg")
...