one, and is generally considered to be superior. However, formally, it
depends on the graph being sufficiently locally "tree-like", and on the
posterior being strongly correlated with the adjacency matrix itself
--- two characteristics which do not hold in general. Although the
approximation often gives reasonable results even when these conditions
do not strictly hold, in situations where they are strongly violated
this approach can yield meaningless values, such as a negative
entropy. Therefore, it is useful to compare both approaches whenever
possible.

With these approximations, it is possible to estimate the full model
evidence efficiently, as we show below, using
:meth:`~graph_tool.inference.BlockState.collect_vertex_marginals`,
:meth:`~graph_tool.inference.BlockState.collect_edge_marginals`,
:meth:`~graph_tool.inference.mf_entropy` and
:meth:`~graph_tool.inference.bethe_entropy`.

.. testcode:: model-evidence

   g = gt.collection.data["lesmis"]

   for deg_corr in [True, False]:
       state = gt.minimize_blockmodel_dl(g, deg_corr=deg_corr)     # Initialize the Markov
                                                                   # chain from the "ground
                                                                   # state"
       state = state.copy(B=g.num_vertices())

       dls = []         # description length history
       vm = None        # vertex marginals
       em = None        # edge marginals

       def collect_marginals(s):
           global vm, em
           vm = s.collect_vertex_marginals(vm)
           em = s.collect_edge_marginals(em)
           dls.append(s.entropy())

       # Now we collect the marginal distributions for exactly 200,000 sweeps
       gt.mcmc_equilibrate(state, force_niter=20000, mcmc_args=dict(niter=10),
                           callback=collect_marginals)

       S_mf = gt.mf_entropy(g, vm)
       S_bethe = gt.bethe_entropy(g, em)[0]
       L = -mean(dls)

       print("Model evidence for deg_corr = %s:" % deg_corr,
             L + S_mf, "(mean field),", L + S_bethe, "(Bethe)")

.. testoutput:: model-evidence

   Model evidence for deg_corr = True: -568.863857217 (mean field), -816.50712153 (Bethe)
   Model evidence for deg_corr = False: -589.634482135 (mean field), -734.454531863 (Bethe)

If we consider the more accurate approximation, the outcome shows a
preference for the non-degree-corrected model.

When using the nested model, the approach is entirely analogous. The
only difference now is that we have a hierarchical partition
:math:`\{\boldsymbol b_l\}` in the equations above, instead of simply
:math:`\boldsymbol b`. In order to make the approach tractable, we
assume the factorization

.. math::

  q(\{\boldsymbol b_l\}) \approx \prod_lq_l(\boldsymbol b_l)

where :math:`q_l(\boldsymbol b_l)` is the marginal posterior for the
partition at level :math:`l`. For :math:`q_0(\boldsymbol b_0)` we may
again use either the mean-field or Bethe approximations; however, for
:math:`l>0` only the mean-field approximation is applicable, since the
adjacency matrix of the higher levels is not constant. We show below the
approach for the same network, using the nested model.


.. testcode:: model-evidence

   g = gt.collection.data["lesmis"]

   nL = 10

   for deg_corr in [True, False]:
       state = gt.minimize_nested_blockmodel_dl(g, deg_corr=deg_corr)     # Initialize the Markov
                                                                          # chain from the "ground
                                                                          # state"
       bs = state.get_bs()                     # Get hierarchical partition.
       bs += [np.zeros(1)] * (nL - len(bs))    # Augment it to nL = 10 with
                                               # single-group levels. (We use
                                               # "nL", since "L" is reassigned
                                               # below inside the loop.)

       state = state.copy(bs=bs, sampling=True)

       dls = []                               # description length history
       vm = [None] * len(state.get_levels())  # vertex marginals
       em = None                              # edge marginals

       def collect_marginals(s):
           global vm, em
           levels = s.get_levels()
           vm = [sl.collect_vertex_marginals(vm[l]) for l, sl in enumerate(levels)]
           em = levels[0].collect_edge_marginals(em)
           dls.append(s.entropy())

       # Now we collect the marginal distributions for exactly 200,000 sweeps
       gt.mcmc_equilibrate(state, force_niter=20000, mcmc_args=dict(niter=10),
                           callback=collect_marginals)

       S_mf = [gt.mf_entropy(sl.g, vm[l]) for l, sl in enumerate(state.get_levels())]
       S_bethe = gt.bethe_entropy(g, em)[0]
       L = -mean(dls)

       print("Model evidence for deg_corr = %s:" % deg_corr,
             L + sum(S_mf), "(mean field),", L + S_bethe + sum(S_mf[1:]), "(Bethe)")


.. testoutput:: model-evidence

   Model evidence for deg_corr = True: -339.011500645 (mean field), -665.93103635 (Bethe)
   Model evidence for deg_corr = False: -395.273662985 (mean field), -562.873911796 (Bethe)

The results are similar: if we consider the most accurate approximation,
the non-degree-corrected model possesses the largest evidence. Note also
that the nested models yield better evidence than their non-nested
counterparts, which is not surprising, since the non-nested model is a
special case of the nested one.

Edge layers and covariates
--------------------------

In many situations, the edges of the network may possess discrete
covariates, or they may be distributed in discrete "layers". Extensions
to the SBM can be defined for such data, and they can be inferred using
the exact same interface shown above, except that one should use the
:class:`~graph_tool.inference.LayeredBlockState` class instead of
:class:`~graph_tool.inference.BlockState`. This class takes two
additional parameters: the ``ec`` parameter, which must correspond to an
edge :class:`~graph_tool.PropertyMap` with the layer/covariate values on
the edges, and the Boolean ``layers`` parameter which, if ``True``,
specifies a layered model, otherwise one with edge covariates.

If we use :func:`~graph_tool.inference.minimize_blockmodel_dl`, this can
be achieved simply by passing the option ``layers=True`` as well as the
appropriate value of ``state_args``, which will be propagated to
:class:`~graph_tool.inference.LayeredBlockState`'s constructor.

For example, consider again the Les Misérables network, where we
consider the number of co-appearances between characters as edge
covariates.

.. testsetup:: layered-model

   import os
   try:
       os.chdir("demos/inference")
   except FileNotFoundError:
       pass

.. testcode:: layered-model

   g = gt.collection.data["lesmis"]

   # Note the different meaning of the two 'layers' parameters below: The
   # first enables the use of LayeredBlockState, and the second selects
   # the 'edge covariates' version.

   state = gt.minimize_blockmodel_dl(g, deg_corr=False, layers=True,
                                     state_args=dict(ec=g.ep.value, layers=False))

   state.draw(pos=g.vp.pos, edge_color=g.ep.value, edge_gradient=None,
              output="lesmis-sbm-edge-cov.svg")

.. figure:: lesmis-sbm-edge-cov.*
   :align: center
   :width: 350px

   Best fit of the non-degree-corrected SBM with edge covariates for the
   network of characters in the novel Les Misérables, using the number
   of co-appearances as edge covariates. The edge colors correspond to
   the edge covariates.


In the case of the nested model, we should still use the
:class:`~graph_tool.inference.NestedBlockState` class, but it must be
initialized with the parameter ``base_type = LayeredBlockState``. But if
we use :func:`~graph_tool.inference.minimize_nested_blockmodel_dl`, it
works identically to the above:

.. testcode:: layered-model

   state = gt.minimize_nested_blockmodel_dl(g, deg_corr=False, layers=True,
                                            state_args=dict(ec=g.ep.value, layers=False))

   state.draw(eprops=dict(color=g.ep.value, gradient=None),
              output="lesmis-nested-sbm-edge-cov.svg")

.. figure:: lesmis-nested-sbm-edge-cov.*
   :align: center
   :width: 350px

   Best fit of the nested non-degree-corrected SBM with edge covariates
   for the network of characters in the novel Les Misérables, using the
   number of co-appearances as edge covariates. The edge colors
   correspond to the edge covariates.

It is possible to perform model averaging of all layered variants in the
same manner as for the regular SBMs, as was shown above.
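
For instance, the following minimal sketch (reusing ``g`` and the edge
covariates ``g.ep.value`` from the example above; the callback name is
only illustrative) collects the description length history of a layered
state during the MCMC, exactly as was done for the regular SBM:

.. code-block:: python

   state = gt.minimize_blockmodel_dl(g, deg_corr=False, layers=True,
                                     state_args=dict(ec=g.ep.value, layers=False))

   dls = []  # description length history

   def collect_dls(s):
       # record the description length of the current sample
       dls.append(s.entropy())

   # LayeredBlockState supports the same mcmc_sweep() interface, so
   # mcmc_equilibrate() can be used unchanged
   gt.mcmc_equilibrate(state, force_niter=1000, mcmc_args=dict(niter=10),
                       callback=collect_dls)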

Predicting spurious and missing edges
-------------------------------------

An important application of generative models is their ability to
generalize from observations and make predictions that go beyond what
is seen in the data. This is particularly useful when the network we
observe is incomplete or contains errors, i.e. some of its edges are
either missing or are the outcome of mistakes in measurement. In this
situation, the fit we make of the observed network can help us predict
missing or spurious edges in the network
[clauset-hierarchical-2008]_ [guimera-missing-2009]_.

We do so by dividing the edges into two sets :math:`\boldsymbol G` and
:math:`\delta \boldsymbol G`, where the former corresponds to the
observed network and the latter either to the missing or spurious
edges. In the case of missing edges, we may compute the posterior of
:math:`\delta \boldsymbol G` as

.. math::
   :label: posterior-missing

   P(\delta \boldsymbol G | \boldsymbol G) = \frac{\sum_{\boldsymbol b}P(\boldsymbol G+\delta \boldsymbol G | \boldsymbol b)P(\boldsymbol b | \boldsymbol G)}{P_{\delta}(\boldsymbol G)}

where

.. math::

   P_{\delta}(\boldsymbol G) = \sum_{\delta \boldsymbol G}\sum_{\boldsymbol b}P(\boldsymbol G+\delta \boldsymbol G | \boldsymbol b)P(\boldsymbol b | \boldsymbol G)

is a normalization constant. Although the value of
:math:`P_{\delta}(\boldsymbol G)` is difficult to obtain in general
(since we need to perform a sum over all possible spurious/missing
edges), the numerator of Eq. :eq:`posterior-missing` can be computed by
sampling partitions from the posterior, and then inserting or deleting
edges from the graph and computing the new likelihood. This means that
we can easily compare alternative predictive hypotheses
:math:`\{\delta \boldsymbol G_i\}` via their likelihood ratios

.. math::

   \lambda_i = \frac{P(\delta \boldsymbol G_i | \boldsymbol G)}{\sum_j P(\delta \boldsymbol G_j | \boldsymbol G)}
             = \frac{\sum_{\boldsymbol b}P(\boldsymbol G+\delta \boldsymbol G_i | \boldsymbol b)P(\boldsymbol b | \boldsymbol G)}
                    {\sum_j \sum_{\boldsymbol b}P(\boldsymbol G+\delta \boldsymbol G_j | \boldsymbol b)P(\boldsymbol b | \boldsymbol G)}

which do not depend on the value of :math:`P_{\delta}(\boldsymbol G)`.

The values :math:`P(\boldsymbol G+\delta \boldsymbol G | \boldsymbol b)`
can be computed with
:meth:`~graph_tool.inference.BlockState.get_edges_prob`. Hence, we can
compute spurious/missing edge probabilities just as if we were
collecting marginal distributions when doing model averaging.

Below is an example of the prediction of two edges in the football
network, using the nested model (for which we need to replace
:math:`\boldsymbol b` by :math:`\{\boldsymbol b_l\}` in the equations
above).

.. testcode:: missing-edges
   :hide:

   g = gt.collection.data["football"].copy()
   color = g.new_vp("string", val="#cccccc")
   ecolor = g.new_ep("string", val="#cccccc")
   e = g.add_edge(101, 102)
   ecolor[e] = "#a40000"
   e = g.add_edge(17, 56)
   ecolor[e] = "#a40000"
   eorder = g.edge_index.copy("int")

   gt.graph_draw(g, pos=g.vp.pos, vertex_color=color,
                 vertex_fill_color=color, edge_color=ecolor,
                 eorder=eorder,
                 output="football_missing.svg")

.. figure:: football_missing.*
   :align: center
   :width: 350px

   Two non-existing edges in the football network (in red):
   :math:`(101,102)` in the middle, and :math:`(17,56)` in the upper
   right region of the figure.

.. testcode:: missing-edges

   g = gt.collection.data["football"]

   missing_edges = [(101, 102), (17, 56)]
   
   L = 10

   state = gt.minimize_nested_blockmodel_dl(g, deg_corr=True)

   bs = state.get_bs()                     # Get hierarchical partition.
   bs += [np.zeros(1)] * (L - len(bs))     # Augment it to L = 10 with
                                           # single-group levels.

   state = state.copy(bs=bs, sampling=True)

   probs = ([], [])

   def collect_edge_probs(s):
       # joint log-probability of each candidate edge, conditioned on the
       # current partition
       p1 = s.get_edges_prob([missing_edges[0]], entropy_args=dict(partition_dl=False))
       p2 = s.get_edges_prob([missing_edges[1]], entropy_args=dict(partition_dl=False))
       probs[0].append(p1)
       probs[1].append(p2)

   # Now we collect the probabilities for exactly 10,000 sweeps
   gt.mcmc_equilibrate(state, force_niter=1000, mcmc_args=dict(niter=10),
                       callback=collect_edge_probs)


   # average of the probabilities, computed in log space to avoid overflow
   def get_avg(p):
      p = np.array(p)
      pmax = p.max()
      p -= pmax
      return pmax + log(exp(p).mean())

   p1 = get_avg(probs[0])
   p2 = get_avg(probs[1])

   p_sum = get_avg([p1, p2]) + log(2)  # log(exp(p1) + exp(p2))
   
   l1 = p1 - p_sum
   l2 = p2 - p_sum

   print("likelihood-ratio for %s: %g" % (missing_edges[0], exp(l1)))
   print("likelihood-ratio for %s: %g" % (missing_edges[1], exp(l2)))


.. testoutput:: missing-edges

   likelihood-ratio for (101, 102): 0.365264
   likelihood-ratio for (17, 56): 0.634736

From this we can conclude that edge :math:`(17, 56)` is around twice as
likely as :math:`(101, 102)` to be a missing edge.

The prediction using the non-nested model can be performed in an
entirely analogous fashion.
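
For illustration, a minimal sketch of the non-nested version (reusing
``g`` and ``missing_edges`` from above) samples instead from the
posterior of a flat :class:`~graph_tool.inference.BlockState`:

.. code-block:: python

   state = gt.minimize_blockmodel_dl(g, deg_corr=True)
   state = state.copy(B=g.num_vertices())   # allow B to vary during sampling

   probs = ([], [])

   def collect_edge_probs(s):
       # BlockState provides the same get_edges_prob() method
       probs[0].append(s.get_edges_prob([missing_edges[0]],
                                        entropy_args=dict(partition_dl=False)))
       probs[1].append(s.get_edges_prob([missing_edges[1]],
                                        entropy_args=dict(partition_dl=False)))

   gt.mcmc_equilibrate(state, force_niter=1000, mcmc_args=dict(niter=10),
                       callback=collect_edge_probs)

   # the likelihood ratios are then computed from probs exactly as before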

References
----------

.. [holland-stochastic-1983] Paul W. Holland, Kathryn Blackmond Laskey,
   Samuel Leinhardt, "Stochastic blockmodels: First steps", Social Networks
   Volume 5, Issue 2, Pages 109-137 (1983), :doi:`10.1016/0378-8733(83)90021-7`

.. [karrer-stochastic-2011] Brian Karrer, M. E. J. Newman "Stochastic
   blockmodels and community structure in networks", Phys. Rev. E 83,
   016107 (2011), :doi:`10.1103/PhysRevE.83.016107`, :arxiv:`1008.3926`
   
.. [peixoto-nonparametric-2016] Tiago P. Peixoto, "Nonparametric
   Bayesian inference of the microcanonical stochastic block model",
   :arxiv:`1610.02703`

.. [peixoto-parsimonious-2013] Tiago P. Peixoto, "Parsimonious module
   inference in large networks", Phys. Rev. Lett. 110, 148701 (2013),
   :doi:`10.1103/PhysRevLett.110.148701`, :arxiv:`1212.4794`.

.. [peixoto-hierarchical-2014] Tiago P. Peixoto, "Hierarchical block
   structures and high-resolution model selection in large networks",
   Phys. Rev. X 4, 011047 (2014), :doi:`10.1103/PhysRevX.4.011047`,
   :arxiv:`1310.4377`.

.. [peixoto-model-2016] Tiago P. Peixoto, "Model selection and hypothesis
   testing for large-scale network models with overlapping groups",
   Phys. Rev. X 5, 011033 (2015), :doi:`10.1103/PhysRevX.5.011033`,
   :arxiv:`1409.3059`.

.. [peixoto-efficient-2014] Tiago P. Peixoto, "Efficient Monte Carlo and
   greedy heuristic for the inference of stochastic block models", Phys.
   Rev. E 89, 012804 (2014), :doi:`10.1103/PhysRevE.89.012804`,
   :arxiv:`1310.4378`

.. [clauset-hierarchical-2008] Aaron Clauset, Cristopher
   Moore, M. E. J. Newman, "Hierarchical structure and the prediction of
   missing links in networks", Nature 453, 98-101 (2008),
   :doi:`10.1038/nature06830`

.. [guimera-missing-2009] Roger Guimerà, Marta Sales-Pardo, "Missing and
   spurious interactions and the reconstruction of complex networks", PNAS
   vol. 106 no. 52 (2009), :doi:`10.1073/pnas.0908366106`
          
.. [mezard-information-2009] Marc Mézard, Andrea Montanari, "Information,
   Physics, and Computation", Oxford University Press, 2009,
   :doi:`10.1093/acprof:oso/9780198570837.001.0001`