_reconstruction_dynamics.rst 6.29 KB
Newer Older
1
2
3
4
5
.. _reconstruction_dynamics:

Reconstruction from dynamics
++++++++++++++++++++++++++++

6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
In some cases direct measurements of the edges of a network are either
impossible to be done, or can be done only at significant experimental
cost. In such situations, we are required to infer the network of
interactions from the observed functional behavior
[peixoto-network-2019]_. In graph-tool this can be done for epidemic
spreading (via
:class:`~graph_tool.inference.uncertain_blockmodel.EpidemicsBlockState`),
for the kinetic Ising model (via
:class:`~graph_tool.inference.uncertain_blockmodel.IsingGlauberBlockState`
and
:class:`~graph_tool.inference.uncertain_blockmodel.CIsingGlauberBlockState`),
and for the equilibrium Ising model (via
:class:`~graph_tool.inference.uncertain_blockmodel.PseudoIsingBlockState`
and
:class:`~graph_tool.inference.uncertain_blockmodel.PseudoCIsingBlockState`).
We consider the general reconstruction framework outlined above, where
the observed data :math:`\mathcal{D}` in
Eq. :eq:`posterior-reconstruction` are a time-series generated by some
of the supported processes. Just like before, the posterior distribution
includes not only the adjacency matrix, but also the parameters of the
dynamical model and of the SBM that is used as a prior.
27

28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
For example, in the case of a SIS epidemics, where :math:`\sigma_i(t)=1`
means node :math:`i` is infected at time :math:`t`, or :math:`0`
otherwise, the likelihood for a time-series :math:`\boldsymbol\sigma` is

.. math::

   P(\boldsymbol\sigma|\boldsymbol A,\boldsymbol\beta,\gamma)=\prod_t\prod_iP(\sigma_i(t)|\boldsymbol\sigma(t-1)),

where

.. math::

   P(\sigma_i(t)|\boldsymbol\sigma(t-1)) =
   f(e^{m_i(t-1)}, \sigma_i(t))^{1-\sigma_i(t-1)} \times f(\gamma,\sigma_i(t))^{\sigma_i(t-1)}

is the transition probability for node :math:`i` at time :math:`t`, with
:math:`f(p,\sigma) = (1-p)^{\sigma}p^{1-\sigma}`, and where

.. math::

   m_i(t) = \sum_jA_{ij}\ln(1-\beta_{ij})\sigma_j(t)

is the contribution from all
neighbors of node :math:`i` to its infection probability at time :math:`t`. In the
equations above the value :math:`\beta_{ij}` is the probability of an infection
via an existing edge :math:`(i,j)`, and :math:`\gamma` is the :math:`1\to 0`
recovery probability. With these additional parameters, the full
posterior distribution for the reconstruction becomes

.. math::

   P(\boldsymbol A,\boldsymbol b,\boldsymbol\beta|\boldsymbol\sigma) =
   \frac{P(\boldsymbol\sigma|\boldsymbol A,\boldsymbol b,\gamma)P(\boldsymbol A|\boldsymbol b)P(\boldsymbol b)P(\boldsymbol\beta)}{P(\boldsymbol\sigma|\gamma)}.

Since :math:`\beta_{ij}\in[0,1]` we use the uniform prior
:math:`P(\boldsymbol\beta)=1`. Note also that the recovery probably
:math:`\gamma` plays no role on the reconstruction algorithm, since its
term in the likelihood does not involve :math:`\boldsymbol A` (and
hence, gets cancelled out in the denominator
:math:`P(\boldsymbol\sigma|\gamma)=P(\gamma|\boldsymbol\sigma)P(\boldsymbol\sigma)/P(\gamma)`). This
means the above posterior only depends on the infection events
:math:`0\to 1`, and thus is also valid without any modifications to all
epidemic variants SI, SIR, SEIR, etc, since the infection events occur
with the same probability for all these models.

In the example below is shown how to perform reconstruction from an
epidemic process.
       
76
77
78
79
80
81
82
83
84
85
86
87
.. testsetup:: dynamics

   import os
   try:
      os.chdir("demos/inference")
   except FileNotFoundError:
       pass
   np.random.seed(42)
   gt.seed_rng(44)

.. testcode:: dynamics

88
89
   # We will first simulate the dynamics with a given network
   g = gt.collection.data["dolphins"] 
90

91
92
93
   # The algorithm accepts multiple independent time-series for the
   # reconstruction. We will generate 100 SI cascades starting from a
   # random node each time, and uniform infection probability 0.7.
94

95
96
97
98
99
100
101
102
103
104
105
   ss = [] 
   for i in range(100): 
       si_state = gt.SIState(g, beta=.7) 
       s = [si_state.get_state().copy()] 
       for j in range(10): 
           si_state.iterate_sync() 
           s.append(si_state.get_state().copy())
       # Each time series should be represented as a single vector-valued
       # vertex property map with the states for each note at each time.
       s = gt.group_vector_property(s) 
       ss.append(s)
106

107
108
109
110
   # Prepare the initial state of the reconstruction as an empty graph
   u = g.copy() 
   u.clear_edges() 
   ss = [u.own_property(s) for s in ss]   # time series properties need to be 'owned' by graph u 
111

112
113
   # Create reconstruction state
   rstate = gt.EpidemicsBlockState(u, s=ss, beta=None, r=1e-6, global_beta=.1,  
Tiago Peixoto's avatar
Tiago Peixoto committed
114
                                   nested=False, aE=g.num_edges()) 
115
116
117
118
 
   # Now we collect the marginals for exactly 100,000 sweeps, at 
   # intervals of 10 sweeps: 
 
119
120
   gm = None
   bm = None
121
122
123
124
125
126
   betas = []
 
   def collect_marginals(s): 
      global gm, bm 
      gm = s.collect_marginal(gm) 
      b = gt.perfect_prop_hash([s.bstate.b])[0] 
127
      bm = s.bstate.collect_vertex_marginals(bm, b=b)
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
      betas.append(s.params["global_beta"])
 
   gt.mcmc_equilibrate(rstate, force_niter=10000, mcmc_args=dict(niter=10, xstep=0), 
                       callback=collect_marginals) 

   print("Posterior similarity: ", gt.similarity(g, gm, g.new_ep("double", 1), gm.ep.eprob))
   print("Inferred infection probability: %g ± %g" % (mean(betas), std(betas)))
   
   gt.graph_draw(gm, gm.own_property(g.vp.pos), vertex_shape="pie", vertex_color="black",
                 vertex_pie_fractions=gm.own_property(bm), vertex_pen_width=1,
                 edge_pen_width=gt.prop_to_size(gm.ep.eprob, 0, 5),
                 eorder=gm.ep.eprob, output="dolphins-posterior.svg")


The reconstruction can accurately recover the hidden network and the infection probability:
                 
.. testoutput:: dynamics
145

Tiago Peixoto's avatar
Tiago Peixoto committed
146
147
   Posterior similarity:  0.990587...
   Inferred infection probability: 0.692324 ± 0.0496223
148

149
150
151
152
153
The figure below shows the reconstructed network and the inferred community structure.   
                 
.. figure:: dolphins-posterior.*
   :align: center
   :width: 400px
154

155
156
157
158
159
   Reconstructed network of associations between 62 dolphins, from the
   dynamics of a SI epidemic model, using the degree-corrected SBM as a
   latent prior. The edge thickness corresponds to the marginal
   posterior probability of each edge, and the node pie charts to the
   marginal posterior distribution of the node partition.