Commit 7041c30f authored by Tiago Peixoto

Inference HOWTO minor text improvements

parent 7849a23e
......@@ -7,22 +7,22 @@ structures in simple terms, often by dividing the nodes into modules or
A principled approach to perform this task is to formulate `generative
models <https://en.wikipedia.org/wiki/Generative_model>`_ that include
the idea of "modules" in their descriptions, which then can be detected
the idea of modules in their descriptions, which then can be detected
by `inferring <https://en.wikipedia.org/wiki/Statistical_inference>`_
the model parameters from data. More precisely, given the partition
:math:`\boldsymbol b = \{b_i\}` of the network into :math:`B` groups,
where :math:`b_i\in[0,B-1]` is the group membership of node :math:`i`,
-we define a model that generates a network :math:`\boldsymbol G` with a
+we define a model that generates a network :math:`\boldsymbol A` with a
probability
.. math::
:label: model-likelihood
-P(\boldsymbol G|\boldsymbol\theta, \boldsymbol b)
+P(\boldsymbol A|\boldsymbol\theta, \boldsymbol b)
where :math:`\boldsymbol\theta` are additional model parameters that
control how the node partition affects the structure of the
-network. Therefore, if we observe a network :math:`\boldsymbol G`, the
+network. Therefore, if we observe a network :math:`\boldsymbol A`, the
likelihood that it was generated by a given partition :math:`\boldsymbol
b` is obtained via the `Bayesian
<https://en.wikipedia.org/wiki/Bayesian_inference>`_ posterior probability
......@@ -30,7 +30,7 @@ b` is obtained via the `Bayesian
.. math::
:label: model-posterior-sum
-P(\boldsymbol b | \boldsymbol G) = \frac{\sum_{\boldsymbol\theta}P(\boldsymbol G|\boldsymbol\theta, \boldsymbol b)P(\boldsymbol\theta, \boldsymbol b)}{P(\boldsymbol G)}
+P(\boldsymbol b | \boldsymbol A) = \frac{\sum_{\boldsymbol\theta}P(\boldsymbol A|\boldsymbol\theta, \boldsymbol b)P(\boldsymbol\theta, \boldsymbol b)}{P(\boldsymbol A)}
where :math:`P(\boldsymbol\theta, \boldsymbol b)` is the `prior
probability <https://en.wikipedia.org/wiki/Prior_probability>`_ of the
......@@ -39,7 +39,7 @@ model parameters, and
.. math::
:label: model-evidence
-P(\boldsymbol G) = \sum_{\boldsymbol\theta,\boldsymbol b}P(\boldsymbol G|\boldsymbol\theta, \boldsymbol b)P(\boldsymbol\theta, \boldsymbol b)
+P(\boldsymbol A) = \sum_{\boldsymbol\theta,\boldsymbol b}P(\boldsymbol A|\boldsymbol\theta, \boldsymbol b)P(\boldsymbol\theta, \boldsymbol b)
is called the `evidence`, and corresponds to the total probability of
the data summed over all model parameters. The particular types of model
......@@ -51,10 +51,10 @@ Eq. :eq:`model-posterior-sum` simplifies to
.. math::
:label: model-posterior
-P(\boldsymbol b | \boldsymbol G) = \frac{P(\boldsymbol G|\boldsymbol\theta, \boldsymbol b)P(\boldsymbol\theta, \boldsymbol b)}{P(\boldsymbol G)}
+P(\boldsymbol b | \boldsymbol A) = \frac{P(\boldsymbol A|\boldsymbol\theta, \boldsymbol b)P(\boldsymbol\theta, \boldsymbol b)}{P(\boldsymbol A)}
with :math:`\boldsymbol\theta` above being the only choice compatible with
-:math:`\boldsymbol G` and :math:`\boldsymbol b`. The inference procedures considered
+:math:`\boldsymbol A` and :math:`\boldsymbol b`. The inference procedures considered
here will consist of either finding a network partition that maximizes
Eq. :eq:`model-posterior`, or sampling different partitions according
to their posterior probability.
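For instance, a minimal sketch of the first procedure with graph-tool could look as follows (the ``football`` dataset and the defaults of ``minimize_blockmodel_dl`` are illustrative choices, not prescriptions):

import graph_tool.all as gt

# an example network from graph-tool's collection module
g = gt.collection.data["football"]

# heuristically find a partition that maximizes Eq. :eq:`model-posterior`,
# i.e. that minimizes the description length discussed below
state = gt.minimize_blockmodel_dl(g)

print(state.get_B())      # number of groups in the fit
print(state.entropy())    # description length of the fit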
......@@ -70,14 +70,14 @@ We note that Eq. :eq:`model-posterior` can be written as
.. math::
-P(\boldsymbol b | \boldsymbol G) = \frac{\exp(-\Sigma)}{P(\boldsymbol G)}
+P(\boldsymbol b | \boldsymbol A) = \frac{\exp(-\Sigma)}{P(\boldsymbol A)}
where
.. math::
:label: model-dl
-\Sigma = -\ln P(\boldsymbol G|\boldsymbol\theta, \boldsymbol b) - \ln P(\boldsymbol\theta, \boldsymbol b)
+\Sigma = -\ln P(\boldsymbol A|\boldsymbol\theta, \boldsymbol b) - \ln P(\boldsymbol\theta, \boldsymbol b)
is called the **description length** of the network :math:`\boldsymbol
G`. It measures the amount of `information
......
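As a minimal sketch of how these quantities appear in practice (reusing the ``state`` object from the previous snippet; the number of sweeps is purely illustrative), ``entropy()`` returns the description length of Eq. :eq:`model-dl`, while MCMC sweeps at :math:`\beta=1` sample partitions from the posterior:

print("Sigma =", state.entropy())       # description length of the current partition

for _ in range(100):                    # illustrative number of sweeps
    state.multiflip_mcmc_sweep(niter=10, beta=1)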
......@@ -13,17 +13,17 @@ partition [aicher-learning-2015]_ [peixoto-weighted-2017]_, i.e.
.. math::
-P(\boldsymbol x,\boldsymbol G|\boldsymbol b) =
-P(\boldsymbol x|\boldsymbol G,\boldsymbol b) P(\boldsymbol G|\boldsymbol b),
+P(\boldsymbol x,\boldsymbol A|\boldsymbol b) =
+P(\boldsymbol x|\boldsymbol A,\boldsymbol b) P(\boldsymbol A|\boldsymbol b),
-where :math:`P(\boldsymbol G|\boldsymbol b)` is the likelihood of the
+where :math:`P(\boldsymbol A|\boldsymbol b)` is the likelihood of the
unweighted SBM described previously, and :math:`P(\boldsymbol
-x|\boldsymbol G,\boldsymbol b)` is the integrated likelihood of the edge
+x|\boldsymbol A,\boldsymbol b)` is the integrated likelihood of the edge
weights
.. math::
-P(\boldsymbol x|\boldsymbol G,\boldsymbol b) =
+P(\boldsymbol x|\boldsymbol A,\boldsymbol b) =
\prod_{r\le s}\int P({\boldsymbol x}_{rs}|\gamma)P(\gamma)\,\mathrm{d}\gamma,
where :math:`P({\boldsymbol x}_{rs}|\gamma)` is some model for the weights
......@@ -36,9 +36,9 @@ partition distribution is then simply
.. math::
-P(\boldsymbol b | \boldsymbol G,\boldsymbol x) =
-\frac{P(\boldsymbol x|\boldsymbol G,\boldsymbol b) P(\boldsymbol G|\boldsymbol b)
-P(\boldsymbol b)}{P(\boldsymbol G,\boldsymbol x)},
+P(\boldsymbol b | \boldsymbol A,\boldsymbol x) =
+\frac{P(\boldsymbol x|\boldsymbol A,\boldsymbol b) P(\boldsymbol A|\boldsymbol b)
+P(\boldsymbol b)}{P(\boldsymbol A,\boldsymbol x)},
which can be sampled from, or maximized, just as in the unweighted
case, but now using the information in the weights to guide the partitions.
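A minimal sketch of this, assuming the covariates are stored in an edge property map named ``weight`` and choosing (purely for illustration) an exponential weight model:

# a state whose entropy() gives the joint description length of the
# weighted model; the property name "weight" and the "real-exponential"
# weight model are illustrative assumptions
state = gt.BlockState(g, recs=[g.ep.weight], rec_types=["real-exponential"])

# sample partitions from the weighted posterior above via MCMC
for _ in range(100):
    state.multiflip_mcmc_sweep(niter=10, beta=1)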
......@@ -161,7 +161,7 @@ follows:
g = gt.collection.konect_data["foodweb-baywet"]
# This network contains an internal edge property map with name
# "weight" that contains the biomass flow between species. The values
# "weight" that contains the energy flow between species. The values
# are continuous in the range [0, infinity].
state = gt.minimize_nested_blockmodel_dl(g, state_args=dict(recs=[g.ep.weight],
......@@ -176,7 +176,7 @@ follows:
:width: 350px
Best fit of the exponential-weighted degree-corrected SBM for a food
-web, using the biomass flow as edge covariates (indicated by the edge
+web, using the energy flow as edge covariates (indicated by the edge
colors and widths).
Alternatively, we may consider a transformation of the type
......@@ -211,7 +211,7 @@ can fit this alternative model simply by using the transformed weights:
:width: 350px
Best fit of the log-normal-weighted degree-corrected SBM for a food
-web, using the biomass flow as edge covariates (indicated by the edge
+web, using the energy flow as edge covariates (indicated by the edge
colors and widths).
At this point, we ask ourselves which of the above models yields the
......@@ -222,8 +222,8 @@ incurred by the variable transformation, i.e.
.. math::
-P(\boldsymbol x | \boldsymbol G, \boldsymbol b) =
-P(\boldsymbol y(\boldsymbol x) | \boldsymbol G, \boldsymbol b)
+P(\boldsymbol x | \boldsymbol A, \boldsymbol b) =
+P(\boldsymbol y(\boldsymbol x) | \boldsymbol A, \boldsymbol b)
\prod_{ij}\left[\frac{\mathrm{d}y_{ij}}{\mathrm{d}x_{ij}}(x_{ij})\right]^{A_{ij}}.
In the particular case of Eq. :eq:`log_transform`, we have
......
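The comparison itself can be sketched as follows, assuming ``state_exp`` and ``state_ln`` are the two fits above (the names are placeholders) and that the transformation is :math:`y=\ln x`, so that :math:`\mathrm{d}y/\mathrm{d}x = 1/x`:

import numpy as np

S_exp = state_exp.entropy()                       # description length of the exponential fit

# description length of the fit on y = ln(x), corrected by the Jacobian
# term above: -sum_ij A_ij ln(dy_ij/dx_ij) = sum of ln(x_ij) over the edges
S_ln = state_ln.entropy() + np.log(g.ep.weight.a).sum()

print("Delta Sigma =", S_exp - S_ln)              # positive values favor the transformed model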
......@@ -8,7 +8,7 @@ evidence summed over all possible partitions [peixoto-nonparametric-2017]_:
.. math::
-P(\boldsymbol G) = \sum_{\boldsymbol\theta,\boldsymbol b}P(\boldsymbol G,\boldsymbol\theta, \boldsymbol b) = \sum_{\boldsymbol b}P(\boldsymbol G,\boldsymbol b).
+P(\boldsymbol A) = \sum_{\boldsymbol\theta,\boldsymbol b}P(\boldsymbol A,\boldsymbol\theta, \boldsymbol b) = \sum_{\boldsymbol b}P(\boldsymbol A,\boldsymbol b).
This quantity is analogous to a `partition function
<https://en.wikipedia.org/wiki/Partition_function_(statistical_mechanics)>`_
......@@ -20,14 +20,14 @@ its logarithm
.. math::
:label: free-energy
-\ln P(\boldsymbol G) = \underbrace{\sum_{\boldsymbol b}q(\boldsymbol b)\ln P(\boldsymbol G,\boldsymbol b)}_{-\left<\Sigma\right>}\;
+\ln P(\boldsymbol A) = \underbrace{\sum_{\boldsymbol b}q(\boldsymbol b)\ln P(\boldsymbol A,\boldsymbol b)}_{-\left<\Sigma\right>}\;
\underbrace{- \sum_{\boldsymbol b}q(\boldsymbol b)\ln q(\boldsymbol b)}_{\mathcal{S}}
where
.. math::
-q(\boldsymbol b) = \frac{P(\boldsymbol G,\boldsymbol b)}{\sum_{\boldsymbol b'}P(\boldsymbol G,\boldsymbol b')}
+q(\boldsymbol b) = \frac{P(\boldsymbol A,\boldsymbol b)}{\sum_{\boldsymbol b'}P(\boldsymbol A,\boldsymbol b')}
is the posterior probability of partition :math:`\boldsymbol b`. The
first term of Eq. :eq:`free-energy` (the "negative energy") is minus the
......@@ -66,7 +66,7 @@ where
.. math::
-q_i(r) = P(b_i = r | \boldsymbol G)
+q_i(r) = P(b_i = r | \boldsymbol A)
is the marginal group membership distribution of node :math:`i`. This
yields an entropy value given by
......@@ -93,7 +93,7 @@ degree of node :math:`i`, and
.. math::
-q_{ij}(r, s) = P(b_i = r, b_j = s|\boldsymbol G)
+q_{ij}(r, s) = P(b_i = r, b_j = s|\boldsymbol A)
is the joint group membership distribution of nodes :math:`i` and
:math:`j` (a.k.a. the `edge marginals`). This yields an entropy value
......
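The node marginals :math:`q_i(r)` themselves can be estimated by averaging over partitions sampled from the posterior; a minimal sketch, assuming graph-tool's ``collect_vertex_marginals`` and ``mf_entropy`` helpers (the sweep counts below are illustrative):

state = gt.minimize_blockmodel_dl(g)
vm = None                                         # accumulator for the marginals q_i(r)

def collect_marginals(s):
    global vm
    vm = s.collect_vertex_marginals(vm)

# sample partitions from the posterior and accumulate the node marginals
gt.mcmc_equilibrate(state, force_niter=1000, mcmc_args=dict(niter=10),
                    callback=collect_marginals)

S_mf = gt.mf_entropy(g, vm)                       # mean-field entropy estimate from the marginals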
......@@ -35,8 +35,8 @@ be accessed by inspecting the posterior odds ratio
.. math::
\Lambda &= \frac{P(\boldsymbol b, \mathcal{H}_\text{NDC} | \boldsymbol G)}{P(\boldsymbol b, \mathcal{H}_\text{DC} | \boldsymbol G)} \\
&= \frac{P(\boldsymbol G, \boldsymbol b | \mathcal{H}_\text{NDC})}{P(\boldsymbol G, \boldsymbol b | \mathcal{H}_\text{DC})}\times\frac{P(\mathcal{H}_\text{NDC})}{P(\mathcal{H}_\text{DC})} \\
\Lambda &= \frac{P(\boldsymbol b, \mathcal{H}_\text{NDC} | \boldsymbol A)}{P(\boldsymbol b, \mathcal{H}_\text{DC} | \boldsymbol A)} \\
&= \frac{P(\boldsymbol A, \boldsymbol b | \mathcal{H}_\text{NDC})}{P(\boldsymbol A, \boldsymbol b | \mathcal{H}_\text{DC})}\times\frac{P(\mathcal{H}_\text{NDC})}{P(\mathcal{H}_\text{DC})} \\
&= \exp(-\Delta\Sigma)
where :math:`\mathcal{H}_\text{NDC}` and :math:`\mathcal{H}_\text{DC}`
......
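Concretely, this ratio can be sketched by evaluating the same partition under both hypotheses (the degree-corrected model is graph-tool's default; the particular fit below is illustrative):

state_dc = gt.minimize_blockmodel_dl(g)                      # degree-corrected fit
state_ndc = gt.BlockState(g, b=state_dc.b, deg_corr=False)   # same partition, non-degree-corrected model

dS = state_ndc.entropy() - state_dc.entropy()                # Delta Sigma in the equation above
print("ln Lambda =", -dS)                                    # Lambda = exp(-Delta Sigma)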
......@@ -12,22 +12,22 @@ intrinsically assuming it to be uniform), and as a consequence of the
overall setup, only *relative probabilities* between individual missing
and spurious edges can be produced, instead of the full posterior
distribution considered in the last section. Since this limits the
-overall network reconstruction, and does not yields confidence
+overall network reconstruction, and does not yield confidence
intervals, it is a less powerful approach. Nevertheless, it is a popular
procedure, which can also be performed with graph-tool, as we describe
in the following.
We set up the classification task by dividing the edges/non-edges into
-two sets :math:`\boldsymbol G` and :math:`\delta \boldsymbol G`, where
+two sets :math:`\boldsymbol A` and :math:`\delta \boldsymbol A`, where
the former corresponds to the observed network and the latter either to
the missing or spurious edges. We may compute the posterior of
-:math:`\delta \boldsymbol G` as [valles-catala-consistency-2017]_
+:math:`\delta \boldsymbol A` as [valles-catala-consistency-2017]_
.. math::
:label: posterior-missing
-P(\delta \boldsymbol G | \boldsymbol G) \propto
-\sum_{\boldsymbol b}\frac{P(\boldsymbol G \cup \delta\boldsymbol G| \boldsymbol b)}{P(\boldsymbol G| \boldsymbol b)}P(\boldsymbol b | \boldsymbol G)
+P(\delta \boldsymbol A | \boldsymbol A) \propto
+\sum_{\boldsymbol b}\frac{P(\boldsymbol A \cup \delta\boldsymbol A| \boldsymbol b)}{P(\boldsymbol A| \boldsymbol b)}P(\boldsymbol b | \boldsymbol A)
up to a normalization constant [#prediction_posterior]_. Although the
normalization constant is difficult to obtain in general (since we need
......@@ -36,15 +36,15 @@ numerator of Eq. :eq:`posterior-missing` can be computed by sampling
partitions from the posterior, and then inserting or deleting edges from
the graph and computing the new likelihood. This means that we can
easily compare alternative predictive hypotheses :math:`\{\delta
-\boldsymbol G_i\}` via their likelihood ratios
+\boldsymbol A_i\}` via their likelihood ratios
.. math::
-\lambda_i = \frac{P(\delta \boldsymbol G_i | \boldsymbol G)}{\sum_j P(\delta \boldsymbol G_j | \boldsymbol G)}
+\lambda_i = \frac{P(\delta \boldsymbol A_i | \boldsymbol A)}{\sum_j P(\delta \boldsymbol A_j | \boldsymbol A)}
which do not depend on the normalization constant.
-The values :math:`P(\delta \boldsymbol G | \boldsymbol G, \boldsymbol b)`
+The values :math:`P(\delta \boldsymbol A | \boldsymbol A, \boldsymbol b)`
can be computed with
:meth:`~graph_tool.inference.blockmodel.BlockState.get_edges_prob`. Hence, we can
compute spurious/missing edge probabilities just as if we were
......
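A minimal sketch of this computation, for two hypothetical candidate missing edges (the endpoints and sweep counts are placeholders, ``state`` is assumed to be a fit of the observed network, and the exact signature of ``get_edges_prob`` may differ between graph-tool versions):

import numpy as np
from scipy.special import logsumexp

e1 = [(101, 102)]                                 # hypothetical candidate missing edges
e2 = [(17, 56)]
probs = ([], [])

def collect_edge_probs(s):
    # log-probability of each candidate edge set under the current partition
    probs[0].append(s.get_edges_prob(e1))
    probs[1].append(s.get_edges_prob(e2))

# average over partitions sampled from the posterior
gt.mcmc_equilibrate(state, force_niter=100, mcmc_args=dict(niter=10),
                    callback=collect_edge_probs)

L1, L2 = logsumexp(probs[0]), logsumexp(probs[1])
print("lambda_1 =", np.exp(L1 - logsumexp([L1, L2])))   # relative likelihood of e1 vs e2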
......@@ -245,7 +245,7 @@ Heterogeneous errors
In a more general scenario the measurement errors can be different for
each node pair, i.e. :math:`p_{ij}` and :math:`q_{ij}` are the missing
-and spurious edge probability for node pair :math:`(i,j)`. The
+and spurious edge probabilities for node pair :math:`(i,j)`. The
measurement likelihood then becomes
.. math::
......
......@@ -217,7 +217,7 @@ itself, as follows.
bar(Bs[idx], h[idx] / h.sum(), width=1, color="#ccb974")
gca().set_xticks([6,7,8,9])
xlabel("$B$")
ylabel(r"$P(B|\boldsymbol G)$")
ylabel(r"$P(B|\boldsymbol A)$")
savefig("lesmis-B-posterior.svg")
.. figure:: lesmis-B-posterior.*
......@@ -341,7 +341,7 @@ itself, as follows.
ax[i].bar(Bs[idx], h_[idx] / h_.sum(), width=1, color="#ccb974")
ax[i].set_xticks(Bs[idx])
ax[i].set_xlabel("$B_{%d}$" % i)
ax[i].set_ylabel(r"$P(B_{%d}|\boldsymbol G)$" % i)
ax[i].set_ylabel(r"$P(B_{%d}|\boldsymbol A)$" % i)
locator = MaxNLocator(prune='both', nbins=5)
ax[i].yaxis.set_major_locator(locator)
tight_layout()
......