Commit 7477469e authored by Tiago Peixoto

Fix problem with and improve BlockState.get_edges_prob()

parent 666ccd25
@@ -21,43 +21,44 @@ by `inferring <https://en.wikipedia.org/wiki/Statistical_inference>`_
the model parameters from data. More precisely, given the partition
:math:`\boldsymbol b = \{b_i\}` of the network into :math:`B` groups,
where :math:`b_i\in[0,B-1]` is the group membership of node :math:`i`,
we define a model that generates a network :math:`G` with a probability
we define a model that generates a network :math:`\boldsymbol G` with a
probability
.. math::
:label: model-likelihood
P(G|\theta, \boldsymbol b)
P(\boldsymbol G|\boldsymbol\theta, \boldsymbol b)
where :math:`\theta` are additional model parameters. Therefore, if we
observe a network :math:`G`, the likelihood that it was generated by a
where :math:`\boldsymbol\theta` are additional model parameters. Therefore, if we
observe a network :math:`\boldsymbol G`, the likelihood that it was generated by a
given partition :math:`\boldsymbol b` is obtained via the `Bayesian
<https://en.wikipedia.org/wiki/Bayesian_inference>`_ posterior
.. math::
:label: model-posterior-sum
P(\boldsymbol b | G) = \frac{\sum_{\theta}P(G|\theta, \boldsymbol b)P(\theta, \boldsymbol b)}{P(G)}
P(\boldsymbol b | \boldsymbol G) = \frac{\sum_{\boldsymbol\theta}P(\boldsymbol G|\boldsymbol\theta, \boldsymbol b)P(\boldsymbol\theta, \boldsymbol b)}{P(\boldsymbol G)}
where :math:`P(\theta, \boldsymbol b)` is the `prior likelihood` of the
where :math:`P(\boldsymbol\theta, \boldsymbol b)` is the `prior likelihood` of the
model parameters, and
.. math::
:label: model-evidence
P(G) = \sum_{\theta,\boldsymbol b}P(G|\theta, \boldsymbol b)P(\theta, \boldsymbol b)
P(\boldsymbol G) = \sum_{\boldsymbol\theta,\boldsymbol b}P(\boldsymbol G|\boldsymbol\theta, \boldsymbol b)P(\boldsymbol\theta, \boldsymbol b)
is called the `model evidence`. The particular types of model that will
be considered here have "hard constraints", such that there is only one
choice for the remaining parameters :math:`\theta` that is compatible
choice for the remaining parameters :math:`\boldsymbol\theta` that is compatible
with the generated network, so that Eq. :eq:`model-posterior-sum` simplifies to
.. math::
:label: model-posterior
P(\boldsymbol b | G) = \frac{P(G|\theta, \boldsymbol b)P(\theta, \boldsymbol b)}{P(G)}
P(\boldsymbol b | \boldsymbol G) = \frac{P(\boldsymbol G|\boldsymbol\theta, \boldsymbol b)P(\boldsymbol\theta, \boldsymbol b)}{P(\boldsymbol G)}
with :math:`\theta` above being the only choice compatible with
:math:`G` and :math:`\boldsymbol b`. The inference procedures considered
with :math:`\boldsymbol\theta` above being the only choice compatible with
:math:`\boldsymbol G` and :math:`\boldsymbol b`. The inference procedures considered
here will consist in either finding a network partition that maximizes
Eq. :eq:`model-posterior`, or sampling different partitions according to
their posterior probability.
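
As a minimal sketch of the first approach (assuming graph-tool is
imported as ``gt`` and that ``minimize_blockmodel_dl()`` is available,
which heuristically searches for a partition of maximal posterior
probability; the example network is arbitrary)::

   import graph_tool.all as gt

   g = gt.collection.data["polbooks"]     # an example network
   state = gt.minimize_blockmodel_dl(g)   # heuristic maximization of the posterior P(b|G)
   b = state.get_blocks()                 # inferred group membership b_i of each node
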
@@ -73,22 +74,22 @@ We note that Eq. :eq:`model-posterior` can be written as
.. math::
P(\boldsymbol b | G) = \frac{\exp(-\Sigma)}{P(G)}
P(\boldsymbol b | \boldsymbol G) = \frac{\exp(-\Sigma)}{P(\boldsymbol G)}
where
.. math::
:label: model-dl
\Sigma = -\ln P(G|\theta, \boldsymbol b) - \ln P(\theta, \boldsymbol b)
\Sigma = -\ln P(\boldsymbol G|\boldsymbol\theta, \boldsymbol b) - \ln P(\boldsymbol\theta, \boldsymbol b)
is called the **description length** of the network :math:`G`. It
is called the **description length** of the network :math:`\boldsymbol G`. It
measures the amount of `information
<https://en.wikipedia.org/wiki/Information_theory>`_ required to
describe the data, if we `encode
<https://en.wikipedia.org/wiki/Entropy_encoding>`_ it using the
particular parametrization of the generative model given by
:math:`\theta` and :math:`\boldsymbol b`, as well as the parameters
:math:`\boldsymbol\theta` and :math:`\boldsymbol b`, as well as the parameters
themselves. Therefore, if we choose to maximize the posterior likelihood
of Eq. :eq:`model-posterior`, it will be fully equivalent to the
so-called `minimum description length
@@ -109,7 +110,7 @@ The `stochastic block model
the simplest generative process based on the notion of groups of
nodes [holland-stochastic-1983]_. The `microcanonical
<https://en.wikipedia.org/wiki/Microcanonical_ensemble>`_ formulation
[peixoto-entropy-2012]_ of the basic or "traditional" version takes
[peixoto-nonparametric-2016]_ of the basic or "traditional" version takes
as parameters the partition of the nodes into groups
:math:`\boldsymbol b` and a :math:`B\times B` matrix of edge counts
:math:`\boldsymbol e`, where :math:`e_{rs}` is the number of edges
@@ -181,7 +182,7 @@ degree distributions. A better model for such networks is called the
it is defined just like the traditional model, with the degree
sequence :math:`\boldsymbol k = \{k_i\}` of the graph included as an
additional set of parameters (assuming again a microcanonical
formulation [peixoto-entropy-2012]_).
formulation [peixoto-nonparametric-2016]_).
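
As a rough sketch (assuming the ``gt`` import and graph ``g`` from the
sketch above, and assuming a ``deg_corr`` switch to
``minimize_blockmodel_dl()``), both variants can be fitted and compared
via their description lengths::

   from math import exp

   state_ndc = gt.minimize_blockmodel_dl(g, deg_corr=False)  # traditional SBM
   state_dc  = gt.minimize_blockmodel_dl(g, deg_corr=True)   # degree-corrected SBM

   print("non-degree-corrected DL:", state_ndc.entropy())
   print("degree-corrected DL:    ", state_dc.entropy())

   # posterior odds ratio in favor of the non-degree-corrected model,
   # Lambda = exp(-(Sigma_ndc - Sigma_dc)), as discussed further below
   print(exp(-(state_ndc.entropy() - state_dc.entropy())))
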
The nested stochastic block model
@@ -458,14 +459,13 @@ case of the `C. elegans` network we have
Since it yields the smallest description length, the degree-corrected
fit should be preferred. The statistical significance of the choice can
be assessed by inspecting the posterior odds ratio (or more precisely,
the `Bayes factor <https://en.wikipedia.org/wiki/Bayes_factor>`_)
[peixoto-model-2016]_
be assessed by inspecting the posterior odds ratio
[peixoto-nonparametric-2016]_
.. math::
\Lambda &= \frac{P(\boldsymbol b, \mathcal{H}_\text{NDC} | G)}{P(\boldsymbol b, \mathcal{H}_\text{DC} | G)} \\
&= \frac{P(G, \boldsymbol b | \mathcal{H}_\text{NDC})}{P(G, \boldsymbol b | \mathcal{H}_\text{DC})}\times\frac{P(\mathcal{H}_\text{NDC})}{P(\mathcal{H}_\text{DC})} \\
\Lambda &= \frac{P(\boldsymbol b, \mathcal{H}_\text{NDC} | \boldsymbol G)}{P(\boldsymbol b, \mathcal{H}_\text{DC} | \boldsymbol G)} \\
&= \frac{P(\boldsymbol G, \boldsymbol b | \mathcal{H}_\text{NDC})}{P(\boldsymbol G, \boldsymbol b | \mathcal{H}_\text{DC})}\times\frac{P(\mathcal{H}_\text{NDC})}{P(\mathcal{H}_\text{DC})} \\
&= \exp(-\Delta\Sigma)
where :math:`\mathcal{H}_\text{NDC}` and :math:`\mathcal{H}_\text{DC}`
@@ -733,7 +733,7 @@ itself, as follows.
bar(Bs[idx] - .5, h[idx] / h.sum(), width=1, color="#ccb974")
gca().set_xticks([6,7,8,9])
xlabel("$B$")
ylabel("$P(B|G)$")
ylabel(r"$P(B|\boldsymbol G)$")
savefig("lesmis-B-posterior.svg")
.. figure:: lesmis-B-posterior.*
@@ -858,14 +858,14 @@ itself, as follows.
:hide:
figure()
f, ax = plt.subplots(1, 5, figsize=(10, 3/2))
f, ax = plt.subplots(1, 5, figsize=(10, 3))
for i, h_ in enumerate(h[:5]):
Bs = np.arange(len(h_))
idx = h_ > 0
ax[i].bar(Bs[idx] - .5, h_[idx] / h_.sum(), width=1, color="#ccb974")
ax[i].set_xticks(Bs[idx])
ax[i].set_xlabel("$B_{%d}$" % i)
ax[i].set_ylabel("$P(B_{%d}|G)$" % i)
ax[i].set_ylabel(r"$P(B_{%d}|\boldsymbol G)$" % i)
locator = MaxNLocator(prune='both', nbins=5)
ax[i].yaxis.set_major_locator(locator)
tight_layout()
@@ -913,12 +913,13 @@ Model class selection
+++++++++++++++++++++
When averaging over partitions, we may be interested in evaluating which
**model class** provides a better fit of the data, considering all possible
parameter choices. This is done by evaluating the model evidence
**model class** provides a better fit of the data, considering all
possible parameter choices. This is done by evaluating the model
evidence [peixoto-nonparametric-2016]_
.. math::
P(G) = \sum_{\theta,\boldsymbol b}P(G,\theta, \boldsymbol b) = \sum_{\boldsymbol b}P(G,\boldsymbol b).
P(\boldsymbol G) = \sum_{\boldsymbol\theta,\boldsymbol b}P(\boldsymbol G,\boldsymbol\theta, \boldsymbol b) = \sum_{\boldsymbol b}P(\boldsymbol G,\boldsymbol b).
This quantity is analogous to a `partition function
<https://en.wikipedia.org/wiki/Partition_function_(statistical_mechanics)>`_
@@ -930,14 +931,14 @@ its logarithm
.. math::
:label: free-energy
\ln P(G) = \underbrace{\sum_{\boldsymbol b}q(\boldsymbol b)\ln P(G,\boldsymbol b)}_{-\left<\Sigma\right>}\;
\ln P(\boldsymbol G) = \underbrace{\sum_{\boldsymbol b}q(\boldsymbol b)\ln P(\boldsymbol G,\boldsymbol b)}_{-\left<\Sigma\right>}\;
\underbrace{- \sum_{\boldsymbol b}q(\boldsymbol b)\ln q(\boldsymbol b)}_{\mathcal{S}}
where
.. math::
q(\boldsymbol b) = \frac{P(G,\boldsymbol b)}{\sum_{\boldsymbol b'}P(G,\boldsymbol b')}
q(\boldsymbol b) = \frac{P(\boldsymbol G,\boldsymbol b)}{\sum_{\boldsymbol b'}P(\boldsymbol G,\boldsymbol b')}
is the posterior likelihood of partition :math:`\boldsymbol b`. The
first term of Eq. :eq:`free-energy` (the "negative energy") is minus the
@@ -976,7 +977,7 @@ where
.. math::
q_i(r) = P(b_i = r | G)
q_i(r) = P(b_i = r | \boldsymbol G)
is the marginal group membership distribution of node :math:`i`. This
yields an entropy value given by
@@ -1003,7 +1004,7 @@ degree of node :math:`i`, and
.. math::
q_{ij}(r, s) = P(b_i = r, b_j = s|G)
q_{ij}(r, s) = P(b_i = r, b_j = s|\boldsymbol G)
is the joint group membership distribution of nodes :math:`i` and
:math:`j` (a.k.a. the `edge marginals`). This yields an entropy value
@@ -1240,43 +1241,41 @@ situation, the fit we make of the observed network can help us
predict missing or spurious edges in the network
[clauset-hierarchical-2008]_ [guimera-missing-2009]_.
We do so by dividing the edges into two sets :math:`G` and :math:`\delta
G`, where the former corresponds to the observed network and the latter
We do so by dividing the edges into two sets :math:`\boldsymbol G` and :math:`\delta
\boldsymbol G`, where the former corresponds to the observed network and the latter
either to the missing or spurious edges. In the case of missing edges,
we may compute the posterior of :math:`\delta G` as
we may compute the posterior of :math:`\delta \boldsymbol G` as
.. math::
:label: posterior-missing
P(\delta G | G) = \frac{\sum_{\boldsymbol b}P(G\cup\delta G | \boldsymbol b)P(\boldsymbol b | G)}{P(G)}
P(\delta \boldsymbol G | \boldsymbol G) = \frac{\sum_{\boldsymbol b}P(\boldsymbol G+\delta \boldsymbol G | \boldsymbol b)P(\boldsymbol b | \boldsymbol G)}{P_{\delta}(\boldsymbol G)}
where
.. math::
P(G) = \sum_{\delta G}\sum_{\boldsymbol b}P(G\cup\delta G | \boldsymbol b)P(\boldsymbol b | G)
P_{\delta}(\boldsymbol G) = \sum_{\delta \boldsymbol G}\sum_{\boldsymbol b}P(\boldsymbol G+\delta \boldsymbol G | \boldsymbol b)P(\boldsymbol b | \boldsymbol G)
is a normalization constant. For spurious edges, we have analogous
expressions by replacing :math:`P(G\cup\delta G | \boldsymbol b)` by
:math:`P(G\setminus\delta G | \boldsymbol b)`. Although the value of
:math:`P(G)` is difficult to obtain in general (since we need to perform
a sum over all possible spurious/missing edges), the numerator of
is a normalization constant. Although the value of :math:`P_{\delta}(\boldsymbol G)` is
difficult to obtain in general (since we need to perform a sum over all
possible spurious/missing edges), the numerator of
Eq. :eq:`posterior-missing` can be computed by sampling partitions from
the posterior, and then inserting or deleting edges from the graph and
computing the new likelihood. This means that we can easily compare
alternative predictive hypotheses :math:`\{\delta G_i\}` via their
alternative predictive hypotheses :math:`\{\delta \boldsymbol G_i\}` via their
likelihood ratios
.. math::
\lambda_i = \frac{P(\delta G_i | G)}{\sum_j P(\delta G_j | G)}
= \frac{\sum_{\boldsymbol b}P(G\cup\delta G_i | \boldsymbol b)P(\boldsymbol b | G)}
{\sum_j \sum_{\boldsymbol b}P(G\cup\delta G_j | \boldsymbol b)P(\boldsymbol b | G)}
\lambda_i = \frac{P(\delta \boldsymbol G_i | \boldsymbol G)}{\sum_j P(\delta \boldsymbol G_j | \boldsymbol G)}
= \frac{\sum_{\boldsymbol b}P(\boldsymbol G+\delta \boldsymbol G_i | \boldsymbol b)P(\boldsymbol b | \boldsymbol G)}
{\sum_j \sum_{\boldsymbol b}P(\boldsymbol G+\delta \boldsymbol G_j | \boldsymbol b)P(\boldsymbol b | \boldsymbol G)}
which do not depend on the value of :math:`P(G)`.
which do not depend on the value of :math:`P_{\delta}(\boldsymbol G)`.
The values :math:`P(G\cup\delta G | \boldsymbol b)` and
:math:`P(G\setminus\delta G | \boldsymbol b)` can be computed with
The values :math:`P(\boldsymbol G+\delta \boldsymbol G | \boldsymbol b)`
can be computed with
:meth:`~graph_tool.inference.BlockState.get_edges_prob`. Hence, we can
compute spurious/missing edge probabilities just as if we were
collecting marginal distributions when doing model averaging.
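
For instance, a single such query could look like the following sketch
(assuming the ``gt`` import from the earlier sketches; the example
network, the candidate edge and the ``entropy_args`` value are purely
illustrative)::

   g = gt.collection.data["football"]
   state = gt.minimize_blockmodel_dl(g)

   # ln P(G + dG | b): log-probability of the observed graph with one
   # candidate missing edge added, conditioned on the partition in `state`
   p = state.get_edges_prob(missing=[(10, 20)],
                            entropy_args=dict(partition_dl=False))
   print(p)

Spurious edges are passed analogously through the ``spurious`` argument,
as ``(source, target)`` tuples of edges already present in the graph.
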
@@ -1338,8 +1337,8 @@ above).
probs = ([], [])
def collect_edge_probs(s):
p1 = s.get_edges_prob([missing_edges[0]], missing=True, entropy_args=dict(partition_dl=False))
p2 = s.get_edges_prob([missing_edges[1]], missing=True, entropy_args=dict(partition_dl=False))
p1 = s.get_edges_prob([missing_edges[0]], entropy_args=dict(partition_dl=False))
p2 = s.get_edges_prob([missing_edges[1]], entropy_args=dict(partition_dl=False))
probs[0].append(p1)
probs[1].append(p2)
@@ -1368,8 +1367,8 @@ above).
.. testoutput:: missing-edges
likelihood-ratio for (101, 102): 0.36398
likelihood-ratio for (17, 56): 0.63602
likelihood-ratio for (101, 102): 0.365264
likelihood-ratio for (17, 56): 0.634736
From this we can conclude that edge :math:`(17, 56)` is around twice as
likely as :math:`(101, 102)` to be a missing edge.
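
The ratios above can be obtained from the collected log-probabilities by
averaging over the sampled partitions and normalizing, along the lines
of the following sketch (assuming the ``probs`` and ``missing_edges``
objects from the snippet above)::

   import numpy as np

   def log_mean(lp):
       # logarithm of the mean of probabilities supplied as log-values
       lp = np.asarray(lp)
       m = lp.max()
       return m + np.log(np.exp(lp - m).mean())

   l = [log_mean(p) for p in probs]   # ln <P(G + dG_i | b)> over sampled b
   norm = np.logaddexp(l[0], l[1])    # ln sum_j <P(G + dG_j | b)>

   for edge, li in zip(missing_edges, l):
       print("likelihood-ratio for %s: %g" % (str(edge), np.exp(li - norm)))
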
@@ -1388,9 +1387,9 @@ References
blockmodels and community structure in networks", Phys. Rev. E 83,
016107 (2011), :doi:`10.1103/PhysRevE.83.016107`, :arxiv:`1008.3926`
.. [peixoto-entropy-2012] Tiago P. Peixoto, "Entropy of stochastic
blockmodel ensembles", Phys. Rev. E 85 5 056122 (2012),
:doi:`10.1103/PhysRevE.85.056122`, :arxiv:`1112.6028`
.. [peixoto-nonparametric-2016] Tiago P. Peixoto, "Nonparametric
Bayesian inference of the microcanonical stochastic block model",
:arxiv:`1610.02703`
.. [peixoto-parsimonious-2013] Tiago P. Peixoto, "Parsimonious module
inference in large networks", Phys. Rev. Lett. 110, 148701 (2013),
@@ -33,6 +33,7 @@ from numpy import *
import numpy
import copy
import collections
import itertools
from . util import *
@@ -977,84 +978,91 @@ class BlockState(object):
else:
return self._state.get_move_prob(int(v), s, self.b[v], c, True)
def get_edges_prob(self, edge_list, missing=True, entropy_args={}):
"""Compute the unnormalized log-probability of the missing (or spurious if
``missing=False``) edges given by ``edge_list`` (a list of ``(source,
target)`` tuples, or :meth:`~graph_tool.Edge` instances). The values in
``entropy_args`` are passed to :meth:`graph_tool.BlockState.entropy()`
to calculate the log-probability.
def get_edges_prob(self, missing, spurious=[], entropy_args={}):
r"""Compute the joint log-probability of the missing and spurious edges given by
``missing`` and ``spurious`` (a list of ``(source, target)``
tuples, or :meth:`~graph_tool.Edge` instances), together with the
observed edges.
More precisely, the log-likelihood returned is
.. math::
\ln P(\boldsymbol G + \delta \boldsymbol G | \boldsymbol b)
where :math:`\boldsymbol G + \delta \boldsymbol G` is the modified graph
(with missing edges added and spurious edges deleted).
The values in ``entropy_args`` are passed to
:meth:`graph_tool.BlockState.entropy()` to calculate the
log-probability.
"""
pos = {}
for u, v in edge_list:
for u, v in itertools.chain(missing, spurious):
pos[u] = self.b[u]
pos[v] = self.b[v]
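# With the endpoint memberships recorded, the vertices involved are
# temporarily removed, the edge modifications are applied (missing edges
# added, spurious edges deleted), the entropy of the modified graph is
# measured, and all changes are finally undone, leaving the state intact.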
Si = self.entropy(**entropy_args)
self.remove_vertex(pos.keys())
try:
if missing:
new_es = []
for u, v in edge_list:
if not self.is_weighted:
new_es = []
for u, v in missing:
if not self.is_weighted:
e = self.g.add_edge(u, v)
else:
e = self.g.edge(u, v)
if e is None:
e = self.g.add_edge(u, v)
else:
e = self.g.edge(u, v)
if e is None:
e = self.g.add_edge(u, v)
self.eweight[e] = 0
self.eweight[e] += 1
new_es.append(e)
else:
old_es = []
for e in edge_list:
u, v = e
if isinstance(e, tuple):
e = self.g.edge(u, v)
if e is None:
raise ValueError("edge not found: (%d, %d)" % (int(u),
int(v)))
if self.is_weighted:
self.eweight[e] -= 1
if self.eweight[e] == 0:
self.g.remove_edge(e)
else:
self.eweight[e] = 0
self.eweight[e] += 1
new_es.append(e)
old_es = []
for e in spurious:
u, v = e
if isinstance(e, tuple):
e = self.g.edge(u, v)
if e is None:
raise ValueError("edge not found: (%d, %d)" % (int(u),
int(v)))
if self.is_weighted:
self.eweight[e] -= 1
if self.eweight[e] == 0:
self.g.remove_edge(e)
old_es.append((u, v))
else:
self.g.remove_edge(e)
old_es.append((u, v))
self.add_vertex(pos.keys(), pos.values())
Sf = self.entropy(**entropy_args)
Sf = self.entropy(**overlay(dict(partition_dl=False),
**entropy_args))
self.remove_vertex(pos.keys())
finally:
if missing:
if self.is_weighted:
for e in reversed(new_es):
self.eweight[e] -= 1
if self.eweight[e] == 0:
self.g.remove_edge(e)
else:
for e in reversed(new_es):
if self.is_weighted:
for e in reversed(new_es):
self.eweight[e] -= 1
if self.eweight[e] == 0:
self.g.remove_edge(e)
else:
for u, v in old_es:
if self.is_weighted:
e = self.g.edge(u, v)
if e is None:
e = self.g.add_edge(u, v)
self.eweight[e] = 0
self.eweight[e] += 1
else:
self.g.add_edge(u, v)
for e in reversed(new_es):
self.g.remove_edge(e)
for u, v in old_es:
if self.is_weighted:
e = self.g.edge(u, v)
if e is None:
e = self.g.add_edge(u, v)
self.eweight[e] = 0
self.eweight[e] += 1
else:
self.g.add_edge(u, v)
self.add_vertex(pos.keys(), pos.values())
L = Si - Sf
L = -Sf
if _bm_test():
state = self.copy()
@@ -644,16 +644,28 @@ class LayeredBlockState(OverlapBlockState, BlockState):
u = self.vmap[v][i]
return u
def get_edges_prob(self, edge_list, missing=True, entropy_args={}):
"""Compute the log-probability of the missing (or spurious if ``missing=False``)
edges given by ``edge_list`` (a list of ``(source, target, ec)`` tuples, or
:meth:`~graph_tool.Edge` instances). The values in ``entropy_args`` are
passed to :meth:`graph_tool.LayeredBlockState.entropy()` to calculate the
def get_edges_prob(self, missing, spurious=[], entropy_args={}):
r"""Compute the joint log-probability of the missing and spurious edges given by
``missing`` and ``spurious`` (a list of ``(source, target, layer)``
tuples, or :meth:`~graph_tool.Edge` instances), together with the
observed edges.
More precisely, the log-likelihood returned is
.. math::
\ln P(\boldsymbol G + \delta \boldsymbol G | \boldsymbol b)
where :math:`\boldsymbol G + \delta \boldsymbol G` is the modified graph
(with missing edges added and spurious edges deleted).
The values in ``entropy_args`` are passed to
:meth:`graph_tool.LayeredBlockState.entropy()` to calculate the
log-probability.
"""
pos = {}
nes = []
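# Normalize the given edges, which may be (source, target, layer) tuples
# or Edge descriptors, and record the group membership of every endpoint
# before the graph is modified.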
for e in edge_list:
for e in itertools.chain(missing, spurious):
try:
u, v = e
l = self.ec[e]
@@ -669,87 +681,83 @@ class LayeredBlockState(OverlapBlockState, BlockState):
edge_list = nes
Si = self.entropy(**entropy_args)
self.remove_vertex(pos.keys())
agg_state = self.agg_state
try:
if missing:
new_es = []
for u, v, l in edge_list:
if not l[1]:
state = self.agg_state
else:
state = self.layer_states[l[0]]
e = state.g.add_edge(u, v)
if not l[1]:
self.ec[e] = l[0]
if state.is_weighted:
state.eweight[e] = 1
new_es.append((e, l))
else:
old_es = []
for u, v, l in edge_list:
if not l[1]:
state = self.agg_state
es = state.g.edge(u, v, all_edges=True)
es = [e for e in es if self.ec[e] == l[0]]
if len(es) > 0:
e = es[0]
else:
e = None
else:
state = self.layer_states[l[0]]
e = state.g.edge(u, v)
if e is None:
raise ValueError("edge not found: (%d, %d, %d)" % \
(int(u), int(v), l[0]))
if state.is_weighted:
state.eweight[e] -= 1
if state.eweight[e] == 0:
state.g.remove_edge(e)
new_es = []
for u, v, l in missing:
if not l[1]:
state = self.agg_state
else:
state = self.layer_states[l[0]]
e = state.g.add_edge(u, v)
if not l[1]:
self.ec[e] = l[0]
if state.is_weighted:
state.eweight[e] = 1
new_es.append((e, l))
old_es = []
for u, v, l in spurious:
if not l[1]:
state = self.agg_state
es = state.g.edge(u, v, all_edges=True)
es = [e for e in es if self.ec[e] == l[0]]
if len(es) > 0:
e = es[0]
else:
e = None
else:
state = self.layer_states[l[0]]
e = state.g.edge(u, v)
if e is None:
raise ValueError("edge not found: (%d, %d, %d)" % \
(int(u), int(v), l[0]))
if state.is_weighted:
state.eweight[e] -= 1
if state.eweight[e] == 0:
state.g.remove_edge(e)
old_es.append((u, v, l))
else:
state.g.remove_edge(e)
old_es.append((u, v, l))
self.add_vertex(pos.keys(), pos.values())
Sf = self.entropy(**entropy_args)
Sf = self.entropy(**overlay(dict(partition_dl=False),
**entropy_args))
self.remove_vertex(pos.keys())
finally:
if missing:
for e, l in new_es:
if not l[1]:
state = self.agg_state
else:
state = self.layer_states[l[0]]
state.g.remove_edge(e)
else:
for u, v, l in old_es:
if not l[1]:
state = self.agg_state