Commit c0b1794f authored by Tiago Peixoto

inference: update cookbook and docstrings

parent 100f01d4
......@@ -34,7 +34,7 @@ which yields
<Graph object, undirected, with 115 vertices and 613 edges, 4 internal vertex properties, 2 internal graph properties, at 0x...>
We then fit the degree-corrected model by calling:
.. testcode:: football
......@@ -52,129 +52,65 @@ The computation of the posterior entropy :math:`\mathcal{S}`, however,
is significantly more difficult, since it involves measuring the precise
value of :math:`q(\boldsymbol b)`. A direct "brute force" computation of
:math:`\mathcal{S}` is implemented via
:meth:`~graph_tool.inference.blockmodel.BlockState.collect_partition_histogram`
and :func:`~graph_tool.inference.blockmodel.microstate_entropy`, however
this is only feasible for very small networks. For larger networks, we
are forced to perform approximations. One possibility is to employ the
method described in [peixoto-revealing-2020]_, based on fitting a
mixture "random label" model to the posterior distribution, which allows
us to compute its entropy. In graph-tool this is done by using
:class:`~graph_tool.inference.partition_modes.PartitionModeState`, as we
show in the example below.
.. testcode:: model-evidence
from scipy.special import gammaln

g = gt.collection.data["lesmis"]

for deg_corr in [True, False]:
    state = gt.minimize_blockmodel_dl(g, deg_corr=deg_corr)  # Initialize the Markov
                                                              # chain from the "ground
                                                              # state"
    state = state.copy(B=g.num_vertices())

    dls = []         # description length history
    bs = []          # partitions

    def collect_partitions(s):
        global bs, dls
        bs.append(s.get_state().a.copy())
        dls.append(s.entropy())

    # Now we collect 2000 partitions; but the larger this is, the
    # more accurate will be the calculation
    gt.mcmc_equilibrate(state, force_niter=2000, mcmc_args=dict(niter=10),
                        callback=collect_partitions)

    # Infer partition modes
    pmode = gt.ModeClusterState(bs)

    # Minimize the mode state itself
    gt.mcmc_equilibrate(pmode, wait=1, mcmc_args=dict(niter=1, beta=np.inf))

    # Posterior entropy
    H = pmode.posterior_entropy()

    # log(B!) term
    logB = mean(gammaln(np.array([len(np.unique(b)) for b in bs]) + 1))

    # Evidence
    L = -mean(dls) + logB + H

    print(f"Model log-evidence for deg_corr = {deg_corr}: {L}")
.. testoutput:: model-evidence

   Model log-evidence for deg_corr = True: -678.785176...
   Model log-evidence for deg_corr = False: -672.643870...
The outcome shows a preference for the non-degree-corrected model.
When using the nested model, the approach is entirely analogous, as we
show below for the same network.
......@@ -182,49 +118,50 @@ approach for the same network, using the nested model.
g = gt.collection.data["lesmis"]
nL = 10
for deg_corr in [True, False]:
    state = gt.NestedBlockState(g, deg_corr=deg_corr)

    # Equilibrate
    gt.mcmc_equilibrate(state, force_niter=1000, mcmc_args=dict(niter=10))

    dls = []         # description length history
    bs = []          # partitions

    def collect_partitions(s):
        global bs, dls
        bs.append(s.get_state())
        dls.append(s.entropy())

    # Now we collect 2000 partitions; but the larger this is, the
    # more accurate will be the calculation
    gt.mcmc_equilibrate(state, force_niter=2000, mcmc_args=dict(niter=10),
                        callback=collect_partitions)

    # Infer partition modes
    pmode = gt.ModeClusterState(bs, nested=True)

    # Minimize the mode state itself
    gt.mcmc_equilibrate(pmode, wait=1, mcmc_args=dict(niter=1, beta=np.inf))

    # Posterior entropy
    H = pmode.posterior_entropy()

    # log(B!) term
    logB = mean([sum(gammaln(len(np.unique(bl))+1) for bl in b) for b in bs])

    # Evidence
    L = -mean(dls) + logB + H

    print(f"Model log-evidence for deg_corr = {deg_corr}: {L}")
.. testoutput:: model-evidence
Model log-evidence for deg_corr = True: -660.331060...
Model log-evidence for deg_corr = False: -657.839574...
The results are similar: the non-degree-corrected model possesses the
largest evidence. Note also that the evidence for the nested model is
larger than for the non-nested one --- which is not quite surprising,
since the non-nested model is a special case of the nested one.
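Log-evidence values such as the ones above translate directly into
posterior odds ratios between the two model classes. A minimal
illustration in plain Python, using the values just printed and assuming
equal prior probabilities for both classes:

.. code-block:: python

   from math import exp

   L_dc  = -660.331060   # log-evidence, degree-corrected (from the output above)
   L_ndc = -657.839574   # log-evidence, non-degree-corrected

   # Posterior odds ratio in favor of the degree-corrected model class
   Lambda = exp(L_dc - L_ndc)
   print(Lambda)         # ≈ 0.083, i.e. roughly 12 to 1 against degree correction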
......@@ -25,8 +25,8 @@ we have
.. testoutput:: model-selection
:options: +NORMALIZE_WHITESPACE
Non-degree-corrected DL: 8553.474528...
Degree-corrected DL: 8266.554118...
Since it yields the smallest description length, the degree-corrected
fit should be preferred. The statistical significance of the choice can
......@@ -52,7 +52,7 @@ fits. In our particular case, we have
.. testoutput:: model-selection
:options: +NORMALIZE_WHITESPACE
ln Λ: -286.920410...
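The code that produces this value lies outside the excerpt shown here,
but the relation to the description lengths is direct: assuming equal
prior odds, the posterior odds ratio between the two hypotheses is
:math:`\Lambda = \mathrm{e}^{-\Delta\Sigma}`, where :math:`\Delta\Sigma`
is the description-length difference. A minimal sketch using the values
printed earlier:

.. code-block:: python

   from math import exp

   S_ndc = 8553.474528   # non-degree-corrected description length
   S_dc  = 8266.554118   # degree-corrected description length

   # Odds ratio against the hypothesis with the larger description length
   ln_Lambda = -abs(S_ndc - S_dc)
   print("ln Λ:", ln_Lambda)   # ≈ -286.92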
The precise threshold that should be used to decide when to `reject a
hypothesis <https://en.wikipedia.org/wiki/Hypothesis_testing>`_ is
......@@ -80,11 +80,11 @@ example, for the American football network above, we have:
.. testoutput:: model-selection
:options: +NORMALIZE_WHITESPACE
Non-degree-corrected DL: 1738.138494...
Degree-corrected DL: 1780.576716...
ln Λ: -42.438221...
Hence, with a posterior odds ratio of :math:`\Lambda \approx \mathrm{e}^{-42} \approx
10^{-19}` in favor of the non-degree-corrected model, we conclude that the
degree-corrected variant is an unnecessarily complex description for
this network.
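The order-of-magnitude conversion quoted above can be verified with a
one-line check:

.. code-block:: python

   import numpy as np

   print(np.exp(-42.438221))   # ≈ 3.7e-19, i.e. on the order of 10^-19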
......@@ -164,15 +164,14 @@ simple example, using
# intervals of 10 sweeps:
u = None # marginal posterior edge probabilities
bs = [] # partitions
cs = [] # average local clustering coefficient
def collect_marginals(s):
global u, bs, cs
u = s.collect_marginal(u)
bstate = s.get_block_state()
bs.append(bstate.levels[0].b.a.copy())
cs.append(gt.local_clustering(s.get_graph()).fa.mean())
gt.mcmc_equilibrate(state, force_niter=10000, mcmc_args=dict(niter=10),
......@@ -223,7 +222,10 @@ reconstructed network:
bstate = state.get_block_state()
bstate = bstate.levels[0].copy(g=u)
# Disambiguate partitions and obtain marginals
pmode = gt.PartitionModeState(bs, converge=True)
pv = pmode.get_marginal(u)
edash = u.new_ep("vector<double>")
edash[u.edge(15, 73)] = [.1, .1, 0]
bstate.draw(pos=u.own_property(g.vp.pos), vertex_shape="pie", vertex_pie_fractions=pv,
......@@ -293,7 +295,7 @@ with uniform error rates, as we see with the same example:
# intervals of 10 sweeps:
u = None # marginal posterior edge probabilities
bs = [] # partitions
cs = [] # average local clustering coefficient
gt.mcmc_equilibrate(state, force_niter=10000, mcmc_args=dict(niter=10),
......@@ -412,15 +414,14 @@ inference:
# intervals of 10 sweeps:
u = None # marginal posterior edge probabilities
bs = [] # partitions
cs = [] # average local clustering coefficient
def collect_marginals(s):
global bs, u, cs
u = s.collect_marginal(u)
bstate = s.get_block_state()
bs.append(bstate.levels[0].b.a.copy())
cs.append(gt.local_clustering(s.get_graph()).fa.mean())
gt.mcmc_equilibrate(state, force_niter=10000, mcmc_args=dict(niter=10),
......@@ -465,7 +466,11 @@ the same measurement probability. The reconstructed network is visualized below.
bstate = state.get_block_state()
bstate = bstate.levels[0].copy(g=u)
# Disambiguate partitions and obtain marginals
pmode = gt.PartitionModeState(bs, converge=True)
pv = pmode.get_marginal(u)
bstate.draw(pos=u.own_property(g.vp.pos), vertex_shape="pie", vertex_pie_fractions=pv,
edge_color=ecolor, edge_dash_style=edash, edge_gradient=None,
output="lesmis-uncertain-reconstruction-marginals.svg")
......@@ -516,14 +521,13 @@ latent multiedges of a network of political books:
# intervals of 10 sweeps:
u = None # marginal posterior multigraph
bs = [] # partitions
def collect_marginals(s):
global bs, u
u = s.collect_marginal_multigraph(u)
bstate = state.get_block_state()
bs.append(bstate.levels[0].b.a.copy())
gt.mcmc_equilibrate(state, force_niter=10000, mcmc_args=dict(niter=10),
callback=collect_marginals)
......@@ -538,7 +542,11 @@ latent multiedges of a network of political books:
bstate = state.get_block_state()
bstate = bstate.levels[0].copy(g=u)
# Disambiguate partitions and obtain marginals
pmode = gt.PartitionModeState(bs, converge=True)
pv = pmode.get_marginal(u)
bstate.draw(pos=u.own_property(g.vp.pos), vertex_shape="pie", vertex_pie_fractions=pv,
edge_pen_width=gt.prop_to_size(ew, .1, 8, power=1), edge_gradient=None,
output="polbooks-erased-poisson.svg")
......
......@@ -117,14 +117,13 @@ epidemic process.
# intervals of 10 sweeps:
gm = None
bs = []
betas = []
def collect_marginals(s):
global gm, bs
gm = s.collect_marginal(gm)
bs.append(s.bstate.b.a.copy())
betas.append(s.params["global_beta"])
gt.mcmc_equilibrate(rstate, force_niter=10000, mcmc_args=dict(niter=10, xstep=0),
......@@ -132,9 +131,13 @@ epidemic process.
print("Posterior similarity: ", gt.similarity(g, gm, g.new_ep("double", 1), gm.ep.eprob))
print("Inferred infection probability: %g ± %g" % (mean(betas), std(betas)))
# Disambiguate partitions and obtain marginals
pmode = gt.PartitionModeState(bs, converge=True)
pv = pmode.get_marginal(gm)
gt.graph_draw(gm, gm.own_property(g.vp.pos), vertex_shape="pie", vertex_color="black",
vertex_pie_fractions=pv, vertex_pen_width=1,
edge_pen_width=gt.prop_to_size(gm.ep.eprob, 0, 5),
eorder=gm.ep.eprob, output="dolphins-posterior.svg")
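Beyond the mean and standard deviation reported above, the samples
collected in ``betas`` give access to other posterior summaries as well,
e.g. a credible interval (a small sketch, assuming ``numpy`` is available
as ``np``, as in the other examples):

.. code-block:: python

   # 95% posterior credible interval for the global infection probability
   print(np.percentile(betas, [2.5, 97.5]))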
......@@ -81,37 +81,41 @@ Note that the value of ``wait`` above was made purposefully low so that
the output would not be overly long. The most appropriate value requires
experimentation, but a typically good value is ``wait=1000``.
The function :func:`~graph_tool.inference.mcmc.mcmc_equilibrate` accepts
a ``callback`` argument that takes an optional function to be invoked
after each call to
:meth:`~graph_tool.inference.blockmodel.BlockState.multiflip_mcmc_sweep`. This
function should accept a single parameter which will contain the actual
:class:`~graph_tool.inference.blockmodel.BlockState` instance. We will
use this in the example below to collect the posterior vertex marginals
(via :class:`~graph_tool.inference.partition_modes.PartitionModeState`,
which disambiguates group labels [peixoto-revealing-2020]_), i.e. the
posterior probability that a node belongs to a given group:
.. testcode:: model-averaging
# We will first equilibrate the Markov chain
gt.mcmc_equilibrate(state, wait=1000, mcmc_args=dict(niter=10))
bs = [] # collect some partitions
def collect_partitions(s):
global bs
bs.append(s.b.a.copy())
# Now we collect partitions for exactly 100,000 sweeps, at intervals
# of 10 sweeps:
gt.mcmc_equilibrate(state, force_niter=10000, mcmc_args=dict(niter=10),
callback=collect_partitions)
# Disambiguate partitions and obtain marginals
pmode = gt.PartitionModeState(bs, converge=True)
pv = pmode.get_marginal(g)
# Now the node marginals are stored in property map pv. We can
# visualize them as pie charts on the nodes:
state.draw(pos=g.vp.pos, vertex_shape="pie", vertex_pie_fractions=pv,
output="lesmis-sbm-marginals.svg")
.. figure:: lesmis-sbm-marginals.*
:align: center
......@@ -135,8 +139,8 @@ itself, as follows.
B = s.get_nonempty_B()
h[B] += 1
# Now we collect partitions for exactly 100,000 sweeps, at intervals
# of 10 sweeps:
gt.mcmc_equilibrate(state, force_niter=10000, mcmc_args=dict(niter=10),
callback=collect_num_groups)
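The initialization of ``h`` falls outside the lines shown here; a
minimal sketch of how such a histogram can be set up and normalized
afterwards (the variable names mirror the excerpt, but the initialization
itself is an assumption):

.. code-block:: python

   import numpy as np

   h = np.zeros(g.num_vertices() + 1)   # histogram over the number of groups

   def collect_num_groups(s):
       B = s.get_nonempty_B()
       h[B] += 1

   # after equilibrating with callback=collect_num_groups:
   # P_B = h / h.sum()   # posterior distribution of the number of groups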
......@@ -194,7 +198,6 @@ network as above.
Change in description length: -73.716766...
Number of accepted vertex moves: 366160
.. warning::
When using
......@@ -212,28 +215,34 @@ Similarly to the non-nested case, we can use
:func:`~graph_tool.inference.mcmc.mcmc_equilibrate` to do most of the boring
work, and we can now obtain vertex marginals on all hierarchical levels:
.. testcode:: nested-model-averaging
# We will first equilibrate the Markov chain
gt.mcmc_equilibrate(state, wait=1000, mcmc_args=dict(niter=10))
# collect nested partitions
bs = []
def collect_partitions(s):
global bs
bs.append(s.get_bs())
# Now we collect the marginals for exactly 100,000 sweeps
gt.mcmc_equilibrate(state, force_niter=10000, mcmc_args=dict(niter=10),
callback=collect_partitions)
# Disambiguate partitions and obtain marginals
pmode = gt.PartitionModeState(bs, nested=True, converge=True)
pv = pmode.get_marginal(g)
# Get consensus estimate
bs = pmode.get_max_nested()
state = state.copy(bs=bs)
# We can visualize the marginals as pie charts on the nodes:
state.draw(vertex_shape="pie", vertex_pie_fractions=pv,
output="lesmis-nested-sbm-marginals.svg")
.. figure:: lesmis-nested-sbm-marginals.*
:align: center
......@@ -316,3 +325,79 @@ distribution.
:width: 200px
.. image:: lesmis-partition-sample-9.svg
:width: 200px
Characterizing the posterior distribution
+++++++++++++++++++++++++++++++++++++++++
The posterior distribution of partitions can have an elaborate
structure, containing multiple possible explanations for the data. In
order to summarize it, we can infer the modes of the distribution using
:class:`~graph_tool.inference.partition_modes.ModeClusterState`, as
described in [peixoto-revealing-2020]_. This amounts to identifying
clusters of partitions that are very similar to each other, but
sufficiently different from those that belong to other
clusters. Collectively, such "modes" represent the different stories that
the data is telling us through the model. Here is an example using again
the Les Misérables network:
.. testcode:: partition-modes
g = gt.collection.data["lesmis"]
state = gt.NestedBlockState(g)
# Equilibration
gt.mcmc_equilibrate(state, force_niter=1000, mcmc_args=dict(niter=10))
bs = []
def collect_partitions(s):
global bs
bs.append(s.get_bs())
# We will collect only 1000 partitions. For more accurate
# results, this number should be increased.
gt.mcmc_equilibrate(state, force_niter=1000, mcmc_args=dict(niter=10),
callback=collect_partitions)
# Infer partition modes
pmode = gt.ModeClusterState(bs, nested=True)
# Minimize the mode state itself
gt.mcmc_equilibrate(pmode, wait=1, mcmc_args=dict(niter=1, beta=np.inf))
# Get inferred modes
modes = pmode.get_modes()
for i, mode in enumerate(modes):
b = mode.get_max_nested() # mode's maximum
pv = mode.get_marginal(g) # mode's marginal distribution
print(f"Mode {i} with size {mode.get_M()/len(bs)}")
state = state.copy(bs=b)
state.draw(vertex_shape="pie", vertex_pie_fractions=pv,
output="lesmis-partition-mode-%i.svg" % i)
Running the above code gives us the relative size of each mode,
corresponding to their collective posterior probability.
.. testoutput:: partition-modes
Mode 0 with size 0.389389...
Mode 1 with size 0.352352...
Mode 2 with size 0.129129...
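The mode sizes above do not sum to one, suggesting that a fraction of
the sampled partitions belongs to smaller modes that are not listed. A
quick check, using only quantities defined in the example:

.. code-block:: python

   covered = sum(mode.get_M() for mode in modes) / len(bs)
   print(f"Fraction of the posterior covered by the listed modes: {covered}")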