Commit 7041c30f by Tiago Peixoto

Inference HOWTO minor text improvements

parent 7849a23e
@@ -7,22 +7,22 @@ structures in simple terms, often by dividing the nodes into modules or

 A principled approach to perform this task is to formulate `generative
 models`_ that include
-the idea of "modules" in their descriptions, which then can be detected
+the idea of modules in their descriptions, which then can be detected
 by `inferring`_ the model parameters from data. More precisely, given
 the partition :math:`\boldsymbol b = \{b_i\}` of the network into
 :math:`B` groups, where :math:`b_i\in[0,B-1]` is the group membership of
 node :math:`i`,
-we define a model that generates a network :math:`\boldsymbol G` with a
+we define a model that generates a network :math:`\boldsymbol A` with a
 probability

 .. math::
    :label: model-likelihood

-   P(\boldsymbol G|\boldsymbol\theta, \boldsymbol b)
+   P(\boldsymbol A|\boldsymbol\theta, \boldsymbol b)

 where :math:`\boldsymbol\theta` are additional model parameters that
 control how the node partition affects the structure of
-the network. Therefore, if we observe a network :math:`\boldsymbol G`, the
+the network. Therefore, if we observe a network :math:`\boldsymbol A`, the
 likelihood that it was generated by a given partition :math:`\boldsymbol
 b` is obtained via the `Bayesian`_ posterior probability

@@ -30,7 +30,7 @@ b is obtained via the Bayesian

 .. math::
    :label: model-posterior-sum

-   P(\boldsymbol b | \boldsymbol G) = \frac{\sum_{\boldsymbol\theta}P(\boldsymbol G|\boldsymbol\theta, \boldsymbol b)P(\boldsymbol\theta, \boldsymbol b)}{P(\boldsymbol G)}
+   P(\boldsymbol b | \boldsymbol A) = \frac{\sum_{\boldsymbol\theta}P(\boldsymbol A|\boldsymbol\theta, \boldsymbol b)P(\boldsymbol\theta, \boldsymbol b)}{P(\boldsymbol A)}

 where :math:`P(\boldsymbol\theta, \boldsymbol b)` is the `prior
 probability`_ of the

@@ -39,7 +39,7 @@ model parameters, and

 .. math::
    :label: model-evidence

-   P(\boldsymbol G) = \sum_{\boldsymbol\theta,\boldsymbol b}P(\boldsymbol G|\boldsymbol\theta, \boldsymbol b)P(\boldsymbol\theta, \boldsymbol b)
+   P(\boldsymbol A) = \sum_{\boldsymbol\theta,\boldsymbol b}P(\boldsymbol A|\boldsymbol\theta, \boldsymbol b)P(\boldsymbol\theta, \boldsymbol b)

 is called the evidence, and corresponds to the total probability of the
 data summed over all model parameters. The particular types of model

@@ -51,10 +51,10 @@ Eq. :eq:`model-posterior-sum` simplifies to

 .. math::
    :label: model-posterior

-   P(\boldsymbol b | \boldsymbol G) = \frac{P(\boldsymbol G|\boldsymbol\theta, \boldsymbol b)P(\boldsymbol\theta, \boldsymbol b)}{P(\boldsymbol G)}
+   P(\boldsymbol b | \boldsymbol A) = \frac{P(\boldsymbol A|\boldsymbol\theta, \boldsymbol b)P(\boldsymbol\theta, \boldsymbol b)}{P(\boldsymbol A)}

 with :math:`\boldsymbol\theta` above being the only choice compatible with
-:math:`\boldsymbol G` and :math:`\boldsymbol b`. The inference procedures considered
+:math:`\boldsymbol A` and :math:`\boldsymbol b`. The inference procedures considered
 here will consist in either finding a network partition that maximizes
 Eq. :eq:`model-posterior`, or sampling different partitions according to
 its posterior probability.

@@ -70,14 +70,14 @@ We note that Eq. :eq:`model-posterior` can be written as

 .. math::

-   P(\boldsymbol b | \boldsymbol G) = \frac{\exp(-\Sigma)}{P(\boldsymbol G)}
+   P(\boldsymbol b | \boldsymbol A) = \frac{\exp(-\Sigma)}{P(\boldsymbol A)}

 where

 .. math::
    :label: model-dl

-   \Sigma = -\ln P(\boldsymbol G|\boldsymbol\theta, \boldsymbol b) - \ln P(\boldsymbol\theta, \boldsymbol b)
+   \Sigma = -\ln P(\boldsymbol A|\boldsymbol\theta, \boldsymbol b) - \ln P(\boldsymbol\theta, \boldsymbol b)

 is called the **description length** of the network
 :math:`\boldsymbol G`. It measures the amount of information ...
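As a toy illustration of the relations above (not part of the commit itself), the description length :math:`\Sigma = -\ln P(\boldsymbol A|\boldsymbol\theta, \boldsymbol b) - \ln P(\boldsymbol\theta, \boldsymbol b)` and the posterior :math:`P(\boldsymbol b|\boldsymbol A) \propto \exp(-\Sigma)` can be sketched in plain Python with made-up log-likelihood and log-prior values (the partition labels and numbers are hypothetical):

```python
import math

# Hypothetical (ln P(A|theta,b), ln P(theta,b)) pairs for three candidate
# partitions of the same network; the numbers are made up for illustration.
partitions = {
    "b1": (-120.0, -15.0),
    "b2": (-118.0, -20.0),
    "b3": (-125.0, -12.0),
}

# Description length Sigma = -ln P(A|theta,b) - ln P(theta,b)
sigma = {b: -(ll + lp) for b, (ll, lp) in partitions.items()}

# Posterior P(b|A) = exp(-Sigma) / P(A), with P(A) the sum over partitions
Z = sum(math.exp(-s) for s in sigma.values())
posterior = {b: math.exp(-s) / Z for b, s in sigma.items()}

# Maximizing the posterior is the same as minimizing Sigma
best = min(sigma, key=sigma.get)
print(best, posterior[best])
```

This makes explicit why inference can proceed by minimizing :math:`\Sigma`: the normalization :math:`P(\boldsymbol A)` is the same for every partition, so it never affects which partition wins.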
@@ -13,17 +13,17 @@ partition [aicher-learning-2015]_ [peixoto-weighted-2017]_, i.e.

 .. math::

-   P(\boldsymbol x,\boldsymbol G|\boldsymbol b) = P(\boldsymbol x|\boldsymbol G,\boldsymbol b) P(\boldsymbol G|\boldsymbol b),
+   P(\boldsymbol x,\boldsymbol A|\boldsymbol b) = P(\boldsymbol x|\boldsymbol A,\boldsymbol b) P(\boldsymbol A|\boldsymbol b),

-where :math:`P(\boldsymbol G|\boldsymbol b)` is the likelihood of the
+where :math:`P(\boldsymbol A|\boldsymbol b)` is the likelihood of the
 unweighted SBM described previously, and :math:`P(\boldsymbol
-x|\boldsymbol G,\boldsymbol b)` is the integrated likelihood of the edge
+x|\boldsymbol A,\boldsymbol b)` is the integrated likelihood of the edge
 weights

 .. math::

-   P(\boldsymbol x|\boldsymbol G,\boldsymbol b) =
+   P(\boldsymbol x|\boldsymbol A,\boldsymbol b) =
    \prod_{r\le s}\int P({\boldsymbol x}_{rs}|\gamma)P(\gamma)\,\mathrm{d}\gamma,

 where :math:`P({\boldsymbol x}_{rs}|\gamma)` is some model for the weights

@@ -36,9 +36,9 @@ partition distribution is then simply

 .. math::

-   P(\boldsymbol b | \boldsymbol G,\boldsymbol x) = \frac{P(\boldsymbol x|\boldsymbol G,\boldsymbol b) P(\boldsymbol G|\boldsymbol b) P(\boldsymbol b)}{P(\boldsymbol G,\boldsymbol x)},
+   P(\boldsymbol b | \boldsymbol A,\boldsymbol x) = \frac{P(\boldsymbol x|\boldsymbol A,\boldsymbol b) P(\boldsymbol A|\boldsymbol b) P(\boldsymbol b)}{P(\boldsymbol A,\boldsymbol x)},

 which can be sampled from, or maximized, just like with the unweighted
 case, but will use the information on the weights to guide the
 partitions.

@@ -161,7 +161,7 @@ follows:

     g = gt.collection.konect_data["foodweb-baywet"]

     # This network contains an internal edge property map with name
-    # "weight" that contains the biomass flow between species. The values
+    # "weight" that contains the energy flow between species. The values
     # are continuous in the range [0, infinity].

     state = gt.minimize_nested_blockmodel_dl(g, state_args=dict(recs=[g.ep.weight], ...
@@ -176,7 +176,7 @@ follows:
    :width: 350px

    Best fit of the exponential-weighted degree-corrected SBM for a food
-   web, using the biomass flow as edge covariates (indicated by the edge
+   web, using the energy flow as edge covariates (indicated by the edge
    colors and widths).

 Alternatively, we may consider a transformation of the type

@@ -211,7 +211,7 @@ can fit this alternative model simply by using the transformed weights:
    :width: 350px

    Best fit of the log-normal-weighted degree-corrected SBM for a food
-   web, using the biomass flow as edge covariates (indicated by the edge
+   web, using the energy flow as edge covariates (indicated by the edge
    colors and widths).

 At this point, we ask ourselves which of the above models yields the

@@ -222,8 +222,8 @@ incurred by the variable transformation, i.e.

 .. math::

-   P(\boldsymbol x | \boldsymbol G, \boldsymbol b) =
-   P(\boldsymbol y(\boldsymbol x) | \boldsymbol G, \boldsymbol b)
+   P(\boldsymbol x | \boldsymbol A, \boldsymbol b) =
+   P(\boldsymbol y(\boldsymbol x) | \boldsymbol A, \boldsymbol b)
    \prod_{ij}\left[\frac{\mathrm{d}y_{ij}}{\mathrm{d}x_{ij}}(x_{ij})\right]^{A_{ij}}.

 In the particular case of Eq. :eq:`log_transform`, we have ...
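To make the Jacobian correction above concrete (this sketch is not part of the commit), assume for illustration a pure log transform :math:`y_{ij} = \ln x_{ij}`, so that :math:`\mathrm{d}y/\mathrm{d}x = 1/x`. The log-likelihood of the original weights is then the transformed-model log-likelihood plus :math:`\sum_{ij} A_{ij}\ln(1/x_{ij})`; the weight values and the transformed log-likelihood below are made up:

```python
import math

# Observed edge weights x_ij (hypothetical values)
x = [0.5, 2.0, 3.5]

# Assumed value of ln P(y(x) | A, b) under the transformed model
log_lik_y = -12.4

# Jacobian term: sum over existing edges of ln|dy/dx| with y = ln(x),
# i.e. sum of ln(1/x_ij) = -sum of ln(x_ij)
jacobian = sum(math.log(1.0 / xi) for xi in x)

# Log-likelihood of the original (untransformed) weights
log_lik_x = log_lik_y + jacobian
print(log_lik_x)
```

The actual transformation used in the HOWTO (labeled :eq:`log_transform`) may differ from this plain :math:`\ln x`; only the Jacobian bookkeeping is being illustrated, which is what makes the two models' description lengths comparable.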
@@ -8,7 +8,7 @@ evidence summed over all possible partitions [peixoto-nonparametric-2017]_:

 .. math::

-   P(\boldsymbol G) = \sum_{\boldsymbol\theta,\boldsymbol b}P(\boldsymbol G,\boldsymbol\theta, \boldsymbol b) = \sum_{\boldsymbol b}P(\boldsymbol G,\boldsymbol b).
+   P(\boldsymbol A) = \sum_{\boldsymbol\theta,\boldsymbol b}P(\boldsymbol A,\boldsymbol\theta, \boldsymbol b) = \sum_{\boldsymbol b}P(\boldsymbol A,\boldsymbol b).

 This quantity is analogous to a `partition function`_

@@ -20,14 +20,14 @@ its logarithm

 .. math::
    :label: free-energy

-   \ln P(\boldsymbol G) = \underbrace{\sum_{\boldsymbol b}q(\boldsymbol b)\ln P(\boldsymbol G,\boldsymbol b)}_{-\left<\Sigma\right>}\;
+   \ln P(\boldsymbol A) = \underbrace{\sum_{\boldsymbol b}q(\boldsymbol b)\ln P(\boldsymbol A,\boldsymbol b)}_{-\left<\Sigma\right>}\;
    \underbrace{- \sum_{\boldsymbol b}q(\boldsymbol b)\ln q(\boldsymbol b)}_{\mathcal{S}}

 where

 .. math::

-   q(\boldsymbol b) = \frac{P(\boldsymbol G,\boldsymbol b)}{\sum_{\boldsymbol b'}P(\boldsymbol G,\boldsymbol b')}
+   q(\boldsymbol b) = \frac{P(\boldsymbol A,\boldsymbol b)}{\sum_{\boldsymbol b'}P(\boldsymbol A,\boldsymbol b')}

 is the posterior probability of partition :math:`\boldsymbol b`. The
 first term of Eq. :eq:`free-energy` (the "negative energy") is minus the

@@ -66,7 +66,7 @@ where

 .. math::

-   q_i(r) = P(b_i = r | \boldsymbol G)
+   q_i(r) = P(b_i = r | \boldsymbol A)

 is the marginal group membership distribution of node :math:`i`. This
 yields an entropy value given by

@@ -93,7 +93,7 @@ degree of node :math:`i`, and

 .. math::

-   q_{ij}(r, s) = P(b_i = r, b_j = s|\boldsymbol G)
+   q_{ij}(r, s) = P(b_i = r, b_j = s|\boldsymbol A)

 is the joint group membership distribution of nodes :math:`i` and
 :math:`j` (a.k.a. the edge marginals). This yields an entropy value
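The free-energy decomposition in Eq. :eq:`free-energy` is an exact identity when :math:`q(\boldsymbol b)` is the true posterior, and it can be verified numerically on toy joint probabilities :math:`P(\boldsymbol A, \boldsymbol b)` (the three values below are invented purely for the check, not taken from any real model):

```python
import math

# Hypothetical joint probabilities P(A, b) for three partitions
joint = {"b1": 0.05, "b2": 0.03, "b3": 0.01}

# Evidence P(A) = sum_b P(A, b)
PA = sum(joint.values())

# Posterior q(b) = P(A, b) / P(A)
q = {b: p / PA for b, p in joint.items()}

# "Negative energy" -<Sigma> = sum_b q(b) ln P(A, b)
neg_energy = sum(q[b] * math.log(joint[b]) for b in joint)

# Entropy S = -sum_b q(b) ln q(b)
entropy = -sum(q[b] * math.log(q[b]) for b in joint)

# Identity of Eq. free-energy: ln P(A) = -<Sigma> + S
assert abs(neg_energy + entropy - math.log(PA)) < 1e-12
print(math.log(PA))
```

The check works because :math:`\sum_b q(b)\ln\frac{P(\boldsymbol A,\boldsymbol b)}{q(\boldsymbol b)} = \sum_b q(b)\ln P(\boldsymbol A) = \ln P(\boldsymbol A)`; with an approximate :math:`q` the same expression instead gives a lower bound on :math:`\ln P(\boldsymbol A)`.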
@@ -35,8 +35,8 @@ be accessed by inspecting the posterior odds ratio

 .. math::

-   \Lambda &= \frac{P(\boldsymbol b, \mathcal{H}_\text{NDC} | \boldsymbol G)}{P(\boldsymbol b, \mathcal{H}_\text{DC} | \boldsymbol G)} \\
-           &= \frac{P(\boldsymbol G, \boldsymbol b | \mathcal{H}_\text{NDC})}{P(\boldsymbol G, \boldsymbol b | \mathcal{H}_\text{DC})}\times\frac{P(\mathcal{H}_\text{NDC})}{P(\mathcal{H}_\text{DC})} \\
+   \Lambda &= \frac{P(\boldsymbol b, \mathcal{H}_\text{NDC} | \boldsymbol A)}{P(\boldsymbol b, \mathcal{H}_\text{DC} | \boldsymbol A)} \\
+           &= \frac{P(\boldsymbol A, \boldsymbol b | \mathcal{H}_\text{NDC})}{P(\boldsymbol A, \boldsymbol b | \mathcal{H}_\text{DC})}\times\frac{P(\mathcal{H}_\text{NDC})}{P(\mathcal{H}_\text{DC})} \\
            &= \exp(-\Delta\Sigma)

 where :math:`\mathcal{H}_\text{NDC}` and :math:`\mathcal{H}_\text{DC}` ...
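Since :math:`\Lambda = \exp(-\Delta\Sigma)` with :math:`\Delta\Sigma = \Sigma_\text{NDC} - \Sigma_\text{DC}` (assuming equal hypothesis priors), computing the odds ratio from two fitted description lengths is a one-liner; the :math:`\Sigma` values below are invented for illustration, not results from any actual fit:

```python
import math

# Hypothetical description lengths (in nats) for the same partition under
# the non-degree-corrected (NDC) and degree-corrected (DC) models
sigma_ndc = 8742.3
sigma_dc = 8712.8

# Posterior odds ratio, assuming equal priors P(H_NDC) = P(H_DC)
Lambda = math.exp(-(sigma_ndc - sigma_dc))
print(Lambda)  # much smaller than 1 here, favoring the DC model
```

A :math:`\Lambda` far below 1 means the data compress substantially better under the degree-corrected hypothesis; in graph-tool the two :math:`\Sigma` values would come from the `entropy()` of the respective states.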
@@ -12,22 +12,22 @@ intrinsically assuming it to be uniform), and as a consequence of the
 overall setup, only *relative probabilities* between individual missing
 and spurious edges can be produced, instead of the full posterior
 distribution considered in the last section. Since this limits the
-overall network reconstruction, and does not yields confidence
+overall network reconstruction, and does not yield confidence
 intervals, it is a less powerful approach. Nevertheless, it is a popular
 procedure, which can also be performed with graph-tool, as we describe
 in the following.

 We set up the classification task by dividing the edges/non-edges into
-two sets :math:`\boldsymbol G` and :math:`\delta \boldsymbol G`, where
+two sets :math:`\boldsymbol A` and :math:`\delta \boldsymbol A`, where
 the former corresponds to the observed network and the latter either to
 the missing or spurious edges. We may compute the posterior of
-:math:`\delta \boldsymbol G` as [valles-catala-consistency-2017]_
+:math:`\delta \boldsymbol A` as [valles-catala-consistency-2017]_

 .. math::
    :label: posterior-missing

-   P(\delta \boldsymbol G | \boldsymbol G) \propto \sum_{\boldsymbol b}\frac{P(\boldsymbol G \cup \delta\boldsymbol G| \boldsymbol b)}{P(\boldsymbol G| \boldsymbol b)}P(\boldsymbol b | \boldsymbol G)
+   P(\delta \boldsymbol A | \boldsymbol A) \propto \sum_{\boldsymbol b}\frac{P(\boldsymbol A \cup \delta\boldsymbol A| \boldsymbol b)}{P(\boldsymbol A| \boldsymbol b)}P(\boldsymbol b | \boldsymbol A)

 up to a normalization constant [#prediction_posterior]_. Although the
 normalization constant is difficult to obtain in general (since we need

@@ -36,15 +36,15 @@ numerator of Eq. :eq:`posterior-missing` can be computed by sampling
 partitions from the posterior, and then inserting or deleting edges from
 the graph and computing the new likelihood. This means that we can
 easily compare alternative predictive hypotheses
-:math:`\{\delta \boldsymbol G_i\}` via their likelihood ratios
+:math:`\{\delta \boldsymbol A_i\}` via their likelihood ratios

 .. math::

-   \lambda_i = \frac{P(\delta \boldsymbol G_i | \boldsymbol G)}{\sum_j P(\delta \boldsymbol G_j | \boldsymbol G)}
+   \lambda_i = \frac{P(\delta \boldsymbol A_i | \boldsymbol A)}{\sum_j P(\delta \boldsymbol A_j | \boldsymbol A)}

 which do not depend on the normalization constant.

-The values :math:`P(\delta \boldsymbol G | \boldsymbol G, \boldsymbol b)`
+The values :math:`P(\delta \boldsymbol A | \boldsymbol A, \boldsymbol b)`
 can be computed with
 :meth:`~graph_tool.inference.blockmodel.BlockState.get_edges_prob`. Hence, we
 can compute spurious/missing edge probabilities just as if we were ...
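The normalization-free ratios :math:`\lambda_i` can be computed directly from unnormalized log-probabilities such as those returned by `get_edges_prob`; the candidate edges and log-values below are hypothetical, and the log-sum-exp shift is only for numerical stability:

```python
import math

# Hypothetical unnormalized values ln P(delta A_i | A) for three
# candidate missing edges (e.g. as returned by get_edges_prob)
log_probs = {"(u, v)": -23.1, "(x, y)": -25.7, "(w, z)": -24.0}

# lambda_i = P(delta A_i | A) / sum_j P(delta A_j | A); subtracting the
# maximum before exponentiating avoids underflow without changing ratios
m = max(log_probs.values())
Z = sum(math.exp(lp - m) for lp in log_probs.values())
lambdas = {e: math.exp(lp - m) / Z for e, lp in log_probs.items()}

print(max(lambdas, key=lambdas.get))  # the most plausible candidate
```

Because the unknown normalization constant of Eq. :eq:`posterior-missing` cancels in the ratio, these :math:`\lambda_i` are exactly the quantities the text says can be compared across hypotheses.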
@@ -245,7 +245,7 @@ Heterogeneous errors

 In a more general scenario the measurement errors can be different for
 each node pair, i.e. :math:`p_{ij}` and :math:`q_{ij}` are the missing
-and spurious edge probability for node pair :math:`(i,j)`. The
+and spurious edge probabilities for node pair :math:`(i,j)`. The
 measurement likelihood then becomes

 .. math:: ...
@@ -217,7 +217,7 @@ itself, as follows.

     bar(Bs[idx], h[idx] / h.sum(), width=1, color="#ccb974")
     gca().set_xticks([6,7,8,9])
     xlabel("$B$")
-    ylabel(r"$P(B|\boldsymbol G)$")
+    ylabel(r"$P(B|\boldsymbol A)$")
     savefig("lesmis-B-posterior.svg")

 .. figure:: lesmis-B-posterior.*

@@ -341,7 +341,7 @@ itself, as follows.

     ax[i].bar(Bs[idx], h_[idx] / h_.sum(), width=1, color="#ccb974")
     ax[i].set_xticks(Bs[idx])
     ax[i].set_xlabel("$B_{%d}$" % i)
-    ax[i].set_ylabel(r"$P(B_{%d}|\boldsymbol G)$" % i)
+    ax[i].set_ylabel(r"$P(B_{%d}|\boldsymbol A)$" % i)
     locator = MaxNLocator(prune='both', nbins=5)
     ax[i].yaxis.set_major_locator(locator)
     tight_layout()