### Fix problem with and improve BlockState.get_edges_prob()

parent 666ccd25
@@ -21,43 +21,44 @@
 by inferring the model parameters from data. More precisely, given the
 partition :math:`\boldsymbol b = \{b_i\}` of the network into :math:`B`
 groups, where :math:`b_i\in[0,B-1]` is the group membership of node
-:math:`i`, we define a model that generates a network :math:`G` with a
-probability
+:math:`i`, we define a model that generates a network :math:`\boldsymbol G`
+with a probability

 .. math::
    :label: model-likelihood

-   P(G|\theta, \boldsymbol b)
+   P(\boldsymbol G|\boldsymbol\theta, \boldsymbol b)

-where :math:`\theta` are additional model parameters. Therefore, if we
-observe a network :math:`G`, the likelihood that it was generated by a
+where :math:`\boldsymbol\theta` are additional model parameters. Therefore,
+if we observe a network :math:`\boldsymbol G`, the likelihood that it was
+generated by a
 given partition :math:`\boldsymbol b` is obtained via the Bayesian
 posterior

 .. math::
    :label: model-posterior-sum

-   P(\boldsymbol b | G) = \frac{\sum_{\theta}P(G|\theta, \boldsymbol b)P(\theta, \boldsymbol b)}{P(G)}
+   P(\boldsymbol b | \boldsymbol G) = \frac{\sum_{\boldsymbol\theta}P(\boldsymbol G|\boldsymbol\theta, \boldsymbol b)P(\boldsymbol\theta, \boldsymbol b)}{P(\boldsymbol G)}

-where :math:`P(\theta, \boldsymbol b)` is the prior likelihood of the
+where :math:`P(\boldsymbol\theta, \boldsymbol b)` is the prior likelihood of the
 model parameters, and

 .. math::
    :label: model-evidence

-   P(G) = \sum_{\theta,\boldsymbol b}P(G|\theta, \boldsymbol b)P(\theta, \boldsymbol b)
+   P(\boldsymbol G) = \sum_{\boldsymbol\theta,\boldsymbol b}P(\boldsymbol G|\boldsymbol\theta, \boldsymbol b)P(\boldsymbol\theta, \boldsymbol b)

 is called the model evidence. The particular types of model that will
 be considered here have "hard constraints", such that there is only one
-choice for the remaining parameters :math:`\theta` that is compatible
+choice for the remaining parameters :math:`\boldsymbol\theta` that is compatible
 with the generated network, such that Eq. :eq:`model-posterior-sum`
 simplifies to

 .. math::
    :label: model-posterior

-   P(\boldsymbol b | G) = \frac{P(G|\theta, \boldsymbol b)P(\theta, \boldsymbol b)}{P(G)}
+   P(\boldsymbol b | \boldsymbol G) = \frac{P(\boldsymbol G|\boldsymbol\theta, \boldsymbol b)P(\boldsymbol\theta, \boldsymbol b)}{P(\boldsymbol G)}

-with :math:`\theta` above being the only choice compatible with
-:math:`G` and :math:`\boldsymbol b`. The inference procedures considered
+with :math:`\boldsymbol\theta` above being the only choice compatible with
+:math:`\boldsymbol G` and :math:`\boldsymbol b`. The inference procedures considered
 here will consist in either finding a network partition that maximizes
 Eq. :eq:`model-posterior`, or sampling different partitions according to
 its posterior probability.
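For orientation, the posterior maximization described in this hunk is a one-liner in graph-tool. A minimal sketch, where the dataset choice is an illustrative assumption and not part of this commit:

    import graph_tool.all as gt

    g = gt.collection.data["polbooks"]     # any example network
    # Find a partition b that (approximately) maximizes P(b|G)
    state = gt.minimize_blockmodel_dl(g)
    # entropy() returns Sigma, the description length discussed below (in nats)
    print("description length:", state.entropy())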
@@ -73,22 +74,22 @@
 We note that Eq. :eq:`model-posterior` can be written as

 .. math::

-   P(\boldsymbol b | G) = \frac{\exp(-\Sigma)}{P(G)}
+   P(\boldsymbol b | \boldsymbol G) = \frac{\exp(-\Sigma)}{P(\boldsymbol G)}

 where

 .. math::
    :label: model-dl

-   \Sigma = -\ln P(G|\theta, \boldsymbol b) - \ln P(\theta, \boldsymbol b)
+   \Sigma = -\ln P(\boldsymbol G|\boldsymbol\theta, \boldsymbol b) - \ln P(\boldsymbol\theta, \boldsymbol b)

-is called the **description length** of the network :math:`G`. It
+is called the **description length** of the network :math:`\boldsymbol G`. It
 measures the amount of information required to describe the data, if we
 encode it using the particular parametrization of the generative model
-given by :math:`\theta` and :math:`\boldsymbol b`, as well as the parameters
+given by :math:`\boldsymbol\theta` and :math:`\boldsymbol b`, as well as the parameters
 themselves. Therefore, if we choose to maximize the posterior likelihood
 of Eq. :eq:`model-posterior`, it will be fully equivalent to the
 so-called minimum description length

@@ -109,7 +110,7 @@
 The stochastic block model is the simplest generative process based on
 the notion of groups of nodes [holland-stochastic-1983]_. The
-microcanonical formulation [peixoto-entropy-2012]_ of the basic or
-"traditional" version takes
+microcanonical formulation [peixoto-nonparametric-2016]_ of the basic or
+"traditional" version takes
 as parameters the partition of the nodes into groups :math:`\boldsymbol b`
 and a :math:`B\times B` matrix of edge counts :math:`\boldsymbol e`, where
 :math:`e_{rs}` is the number of edges

@@ -181,7 +182,7 @@
 degree distributions. A better model for such networks is called the
 degree-corrected stochastic block model [karrer-stochastic-2011]_;
 it is defined just like the traditional model, with the addition of the
 degree sequence :math:`\boldsymbol k = \{k_i\}` of the graph as an
 additional set of parameters (assuming again a microcanonical
-formulation [peixoto-entropy-2012]_).
+formulation [peixoto-nonparametric-2016]_).

@@ -458,14 +459,13 @@
 case of the C. elegans network we have
 Since it yields the smallest description length, the degree-corrected
 fit should be preferred. The statistical significance of the choice can
-be assessed by inspecting the posterior odds ratio (or more precisely,
-the Bayes factor) [peixoto-model-2016]_
+be assessed by inspecting the posterior odds ratio
+[peixoto-nonparametric-2016]_

 .. math::

-   \Lambda &= \frac{P(\boldsymbol b, \mathcal{H}_\text{NDC} | G)}{P(\boldsymbol b, \mathcal{H}_\text{DC} | G)} \\
-           &= \frac{P(G, \boldsymbol b | \mathcal{H}_\text{NDC})}{P(G, \boldsymbol b | \mathcal{H}_\text{DC})}\times\frac{P(\mathcal{H}_\text{NDC})}{P(\mathcal{H}_\text{DC})} \\
+   \Lambda &= \frac{P(\boldsymbol b, \mathcal{H}_\text{NDC} | \boldsymbol G)}{P(\boldsymbol b, \mathcal{H}_\text{DC} | \boldsymbol G)} \\
+           &= \frac{P(\boldsymbol G, \boldsymbol b | \mathcal{H}_\text{NDC})}{P(\boldsymbol G, \boldsymbol b | \mathcal{H}_\text{DC})}\times\frac{P(\mathcal{H}_\text{NDC})}{P(\mathcal{H}_\text{DC})} \\
            &= \exp(-\Delta\Sigma)

 where :math:`\mathcal{H}_\text{NDC}` and :math:`\mathcal{H}_\text{DC}`

@@ -733,7 +733,7 @@ itself, as follows.
     bar(Bs[idx] - .5, h[idx] / h.sum(), width=1, color="#ccb974")
     gca().set_xticks([6,7,8,9])
     xlabel("$B$")
-    ylabel("$P(B|G)$")
+    ylabel(r"$P(B|\boldsymbol G)$")
     savefig("lesmis-B-posterior.svg")

 .. figure:: lesmis-B-posterior.*

@@ -858,14 +858,14 @@ itself, as follows.
    :hide:

    figure()
-   f, ax = plt.subplots(1, 5, figsize=(10, 3/2))
+   f, ax = plt.subplots(1, 5, figsize=(10, 3))
    for i, h_ in enumerate(h[:5]):
        Bs = np.arange(len(h_))
        idx = h_ > 0
        ax[i].bar(Bs[idx] - .5, h_[idx] / h_.sum(), width=1, color="#ccb974")
        ax[i].set_xticks(Bs[idx])
        ax[i].set_xlabel("$B_{%d}$" % i)
-       ax[i].set_ylabel("$P(B_{%d}|G)$" % i)
+       ax[i].set_ylabel(r"$P(B_{%d}|\boldsymbol G)$" % i)
        locator = MaxNLocator(prune='both', nbins=5)
        ax[i].yaxis.set_major_locator(locator)
    tight_layout()
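The posterior odds ratio :math:`\Lambda = \exp(-\Delta\Sigma)` changed earlier in this file can be reproduced directly from two fits. A hedged sketch, with the dataset assumed for illustration:

    import graph_tool.all as gt

    g = gt.collection.data["celegansneural"]
    state_ndc = gt.minimize_blockmodel_dl(g, deg_corr=False)
    state_dc = gt.minimize_blockmodel_dl(g, deg_corr=True)
    # Delta Sigma = Sigma_NDC - Sigma_DC, so ln Lambda = -Delta Sigma;
    # a negative value favors the degree-corrected hypothesis
    dS = state_ndc.entropy() - state_dc.entropy()
    print("ln Lambda:", -dS)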
@@ -913,12 +913,13 @@
 Model class selection
 +++++++++++++++++++++

 When averaging over partitions, we may be interested in evaluating which
 **model class** provides a better fit of the data, considering all
-possible parameter choices. This is done by evaluating the model evidence
+possible parameter choices. This is done by evaluating the model
+evidence [peixoto-nonparametric-2016]_

 .. math::

-   P(G) = \sum_{\theta,\boldsymbol b}P(G,\theta, \boldsymbol b) = \sum_{\boldsymbol b}P(G,\boldsymbol b).
+   P(\boldsymbol G) = \sum_{\boldsymbol\theta,\boldsymbol b}P(\boldsymbol G,\boldsymbol\theta, \boldsymbol b) = \sum_{\boldsymbol b}P(\boldsymbol G,\boldsymbol b).

 This quantity is analogous to a partition function

@@ -930,14 +931,14 @@
 its logarithm

 .. math::
    :label: free-energy

-   \ln P(G) = \underbrace{\sum_{\boldsymbol b}q(\boldsymbol b)\ln P(G,\boldsymbol b)}_{-\left<\Sigma\right>}\;
+   \ln P(\boldsymbol G) = \underbrace{\sum_{\boldsymbol b}q(\boldsymbol b)\ln P(\boldsymbol G,\boldsymbol b)}_{-\left<\Sigma\right>}\;
              \underbrace{- \sum_{\boldsymbol b}q(\boldsymbol b)\ln q(\boldsymbol b)}_{\mathcal{S}}

 where

 .. math::

-   q(\boldsymbol b) = \frac{P(G,\boldsymbol b)}{\sum_{\boldsymbol b'}P(G,\boldsymbol b')}
+   q(\boldsymbol b) = \frac{P(\boldsymbol G,\boldsymbol b)}{\sum_{\boldsymbol b'}P(\boldsymbol G,\boldsymbol b')}

 is the posterior likelihood of partition :math:`\boldsymbol b`. The
 first term of Eq. :eq:`free-energy` (the "negative energy") is minus the

@@ -976,7 +977,7 @@
 where

 .. math::

-   q_i(r) = P(b_i = r | G)
+   q_i(r) = P(b_i = r | \boldsymbol G)

 is the marginal group membership distribution of node :math:`i`. This
 yields an entropy value given by

@@ -1003,7 +1004,7 @@
 degree of node :math:`i`, and

 .. math::

-   q_{ij}(r, s) = P(b_i = r, b_j = s|G)
+   q_{ij}(r, s) = P(b_i = r, b_j = s|\boldsymbol G)

 is the joint group membership distribution of nodes :math:`i` and
 :math:`j` (a.k.a. the edge marginals). This yields an entropy value
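The marginals :math:`q_i(r)` above are what one collects during MCMC to estimate the posterior entropy. A sketch of the mean-field estimate, with dataset and iteration counts as illustrative assumptions:

    import graph_tool.all as gt

    g = gt.collection.data["lesmis"]
    state = gt.minimize_blockmodel_dl(g)

    pv = None
    def collect_marginals(s):
        global pv
        pv = s.collect_vertex_marginals(pv)   # accumulate q_i(r) for each node

    gt.mcmc_equilibrate(state, force_niter=1000, mcmc_args=dict(niter=10),
                        callback=collect_marginals)
    # Mean-field approximation of the posterior entropy S
    print("mean-field entropy:", gt.mf_entropy(g, pv))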
@@ -1240,43 +1241,41 @@
 situation, the fit we make of the observed network can help us predict
 missing or spurious edges in the network [clauset-hierarchical-2008]_
 [guimera-missing-2009]_.
-We do so by dividing the edges into two sets :math:`G` and
-:math:`\delta G`, where the former corresponds to the observed network
-and the latter
+We do so by dividing the edges into two sets :math:`\boldsymbol G` and
+:math:`\delta \boldsymbol G`, where the former corresponds to the
+observed network and the latter
 either to the missing or spurious edges. In the case of missing edges,
-we may compute the posterior of :math:`\delta G` as
+we may compute the posterior of :math:`\delta \boldsymbol G` as

 .. math::
    :label: posterior-missing

-   P(\delta G | G) = \frac{\sum_{\boldsymbol b}P(G\cup\delta G | \boldsymbol b)P(\boldsymbol b | G)}{P(G)}
+   P(\delta \boldsymbol G | \boldsymbol G) = \frac{\sum_{\boldsymbol b}P(\boldsymbol G+\delta \boldsymbol G | \boldsymbol b)P(\boldsymbol b | \boldsymbol G)}{P_{\delta}(\boldsymbol G)}

 where

 .. math::

-   P(G) = \sum_{\delta G}\sum_{\boldsymbol b}P(G\cup\delta G | \boldsymbol b)P(\boldsymbol b | G)
+   P_{\delta}(\boldsymbol G) = \sum_{\delta \boldsymbol G}\sum_{\boldsymbol b}P(\boldsymbol G+\delta \boldsymbol G | \boldsymbol b)P(\boldsymbol b | \boldsymbol G)

-is a normalization constant. For spurious edges, we have analogous
-expressions by replacing :math:`P(G\cup\delta G | \boldsymbol b)` by
-:math:`P(G\setminus\delta G | \boldsymbol b)`. Although the value of
-:math:`P(G)` is difficult to obtain in general (since we need to
-perform a sum over all possible spurious/missing edges), the numerator of
+is a normalization constant. Although the value of
+:math:`P_{\delta}(\boldsymbol G)` is difficult to obtain in general
+(since we need to perform a sum over all possible spurious/missing
+edges), the numerator of
 Eq. :eq:`posterior-missing` can be computed by sampling partitions from
 the posterior, and then inserting or deleting edges from the graph and
 computing the new likelihood. This means that we can easily compare
-alternative predictive hypotheses :math:`\{\delta G_i\}` via their
+alternative predictive hypotheses :math:`\{\delta \boldsymbol G_i\}` via their
 likelihood ratios

 .. math::

-   \lambda_i = \frac{P(\delta G_i | G)}{\sum_j P(\delta G_j | G)}
-             = \frac{\sum_{\boldsymbol b}P(G\cup\delta G_i | \boldsymbol b)P(\boldsymbol b | G)}
-                    {\sum_j \sum_{\boldsymbol b}P(G\cup\delta G_j | \boldsymbol b)P(\boldsymbol b | G)}
+   \lambda_i = \frac{P(\delta \boldsymbol G_i | \boldsymbol G)}{\sum_j P(\delta \boldsymbol G_j | \boldsymbol G)}
+             = \frac{\sum_{\boldsymbol b}P(\boldsymbol G+\delta \boldsymbol G_i | \boldsymbol b)P(\boldsymbol b | \boldsymbol G)}
+                    {\sum_j \sum_{\boldsymbol b}P(\boldsymbol G+\delta \boldsymbol G_j | \boldsymbol b)P(\boldsymbol b | \boldsymbol G)}

-which do not depend on the value of :math:`P(G)`.
+which do not depend on the value of :math:`P_{\delta}(\boldsymbol G)`.

-The values :math:`P(G\cup\delta G | \boldsymbol b)` and
-:math:`P(G\setminus\delta G | \boldsymbol b)` can be computed with
+The values :math:`P(\boldsymbol G+\delta \boldsymbol G | \boldsymbol b)` can be computed with
 :meth:`~graph_tool.inference.BlockState.get_edges_prob`. Hence, we can
 compute spurious/missing edge probabilities just as if we were
 collecting marginal distributions when doing model averaging.

@@ -1338,8 +1337,8 @@
    probs = ([], [])

    def collect_edge_probs(s):
-       p1 = s.get_edges_prob([missing_edges[0]], missing=True, entropy_args=dict(partition_dl=False))
-       p2 = s.get_edges_prob([missing_edges[1]], missing=True, entropy_args=dict(partition_dl=False))
+       p1 = s.get_edges_prob([missing_edges[0]], entropy_args=dict(partition_dl=False))
+       p2 = s.get_edges_prob([missing_edges[1]], entropy_args=dict(partition_dl=False))
        probs[0].append(p1)
        probs[1].append(p2)

@@ -1368,8 +1367,8 @@
 .. testoutput:: missing-edges

-   likelihood-ratio for (101, 102): 0.36398
-   likelihood-ratio for (17, 56): 0.63602
+   likelihood-ratio for (101, 102): 0.365264
+   likelihood-ratio for (17, 56): 0.634736

 From which we can conclude that edge :math:`(17, 56)` is around twice
 as likely as :math:`(101, 102)` to be a missing edge.

@@ -1388,9 +1387,9 @@ References
    blockmodels and community structure in networks", Phys. Rev. E 83,
    016107 (2011), :doi:`10.1103/PhysRevE.83.016107`, :arxiv:`1008.3926`

-.. [peixoto-entropy-2012] Tiago P. Peixoto, "Entropy of stochastic
-   blockmodel ensembles", Phys. Rev. E 85 5 056122 (2012),
-   :doi:`10.1103/PhysRevE.85.056122`, :arxiv:`1112.6028`
+.. [peixoto-nonparametric-2016] Tiago P. Peixoto, "Nonparametric Bayesian
+   inference of the microcanonical stochastic block model",
+   :arxiv:`1610.02703`

 .. [peixoto-parsimonious-2013] Tiago P. Peixoto, "Parsimonious module
    inference in large networks", Phys. Rev. Lett. 110, 148701 (2013),
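Putting the documentation changes together, a self-contained version of the missing-edges example under the updated get_edges_prob() signature; the dataset and candidate edges are assumptions for illustration:

    import numpy as np
    import graph_tool.all as gt

    g = gt.collection.data["football"]          # assumed dataset
    missing_edges = [(101, 102), (17, 56)]      # hypothetical candidate edges

    state = gt.minimize_blockmodel_dl(g)
    probs = ([], [])

    def collect_edge_probs(s):
        # ln P(G + delta G_i | b), omitting the partition prior
        p1 = s.get_edges_prob([missing_edges[0]],
                              entropy_args=dict(partition_dl=False))
        p2 = s.get_edges_prob([missing_edges[1]],
                              entropy_args=dict(partition_dl=False))
        probs[0].append(p1)
        probs[1].append(p2)

    gt.mcmc_equilibrate(state, force_niter=100, mcmc_args=dict(niter=10),
                        callback=collect_edge_probs)

    def log_mean_exp(p):                        # average over samples, in log space
        p = np.asarray(p)
        pmax = p.max()
        return pmax + np.log(np.exp(p - pmax).mean())

    p1, p2 = log_mean_exp(probs[0]), log_mean_exp(probs[1])
    norm = log_mean_exp([p1, p2]) + np.log(2)   # ln(exp(p1) + exp(p2))
    print("likelihood-ratio for %s: %g" % (missing_edges[0], np.exp(p1 - norm)))
    print("likelihood-ratio for %s: %g" % (missing_edges[1], np.exp(p2 - norm)))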
@@ -33,6 +33,7 @@
 from numpy import *
 import numpy
 import copy
 import collections
+import itertools

 from . util import *

@@ -977,84 +978,91 @@ class BlockState(object):
         else:
             return self._state.get_move_prob(int(v), s, self.b[v], c, True)

-    def get_edges_prob(self, edge_list, missing=True, entropy_args={}):
-        """Compute the unnormalized log-probability of the missing (or spurious
-        if ``missing=False``) edges given by ``edge_list`` (a list of
-        ``(source, target)`` tuples, or :meth:`~graph_tool.Edge` instances).
-        The values in ``entropy_args`` are passed to
-        :meth:`graph_tool.BlockState.entropy()` to calculate the
-        log-probability.
-        """
+    def get_edges_prob(self, missing, spurious=[], entropy_args={}):
+        """Compute the joint log-probability of the missing and spurious edges
+        given by ``missing`` and ``spurious`` (a list of ``(source, target)``
+        tuples, or :meth:`~graph_tool.Edge` instances), together with the
+        observed edges. More precisely, the log-likelihood returned is
+
+        .. math::
+
+            \ln P(\boldsymbol G + \delta \boldsymbol G | \boldsymbol b)
+
+        where :math:`\boldsymbol G + \delta \boldsymbol G` is the modified
+        graph (with missing edges added and spurious edges deleted).
+
+        The values in ``entropy_args`` are passed to
+        :meth:`graph_tool.BlockState.entropy()` to calculate the
+        log-probability.
+        """
         pos = {}
-        for u, v in edge_list:
+        for u, v in itertools.chain(missing, spurious):
             pos[u] = self.b[u]
             pos[v] = self.b[v]

         Si = self.entropy(**entropy_args)
         self.remove_vertex(pos.keys())

         try:
-            if missing:
-                new_es = []
-                for u, v in edge_list:
-                    if not self.is_weighted:
-                        e = self.g.add_edge(u, v)
-                    else:
-                        e = self.g.edge(u, v)
-                        if e is None:
-                            e = self.g.add_edge(u, v)
-                            self.eweight[e] = 0
-                        self.eweight[e] += 1
-                    new_es.append(e)
-            else:
-                old_es = []
-                for e in edge_list:
-                    u, v = e
-                    if isinstance(e, tuple):
-                        e = self.g.edge(u, v)
-                        if e is None:
-                            raise ValueError("edge not found: (%d, %d)" % (int(u), int(v)))
-                    if self.is_weighted:
-                        self.eweight[e] -= 1
-                        if self.eweight[e] == 0:
-                            self.g.remove_edge(e)
-                    else:
-                        self.g.remove_edge(e)
-                    old_es.append((u, v))
+            new_es = []
+            for u, v in missing:
+                if not self.is_weighted:
+                    e = self.g.add_edge(u, v)
+                else:
+                    e = self.g.edge(u, v)
+                    if e is None:
+                        e = self.g.add_edge(u, v)
+                        self.eweight[e] = 0
+                    self.eweight[e] += 1
+                new_es.append(e)
+
+            old_es = []
+            for e in spurious:
+                u, v = e
+                if isinstance(e, tuple):
+                    e = self.g.edge(u, v)
+                    if e is None:
+                        raise ValueError("edge not found: (%d, %d)" % (int(u), int(v)))
+                if self.is_weighted:
+                    self.eweight[e] -= 1
+                    if self.eweight[e] == 0:
+                        self.g.remove_edge(e)
+                else:
+                    self.g.remove_edge(e)
+                old_es.append((u, v))

             self.add_vertex(pos.keys(), pos.values())

-            Sf = self.entropy(**entropy_args)
+            Sf = self.entropy(**overlay(dict(partition_dl=False), **entropy_args))

             self.remove_vertex(pos.keys())
         finally:
-            if missing:
-                if self.is_weighted:
-                    for e in reversed(new_es):
-                        self.eweight[e] -= 1
-                        if self.eweight[e] == 0:
-                            self.g.remove_edge(e)
-                else:
-                    for e in reversed(new_es):
-                        self.g.remove_edge(e)
-            else:
-                for u, v in old_es:
-                    if self.is_weighted:
-                        e = self.g.edge(u, v)
-                        if e is None:
-                            e = self.g.add_edge(u, v)
-                            self.eweight[e] = 0
-                        self.eweight[e] += 1
-                    else:
-                        self.g.add_edge(u, v)
+            if self.is_weighted:
+                for e in reversed(new_es):
+                    self.eweight[e] -= 1
+                    if self.eweight[e] == 0:
+                        self.g.remove_edge(e)
+            else:
+                for e in reversed(new_es):
+                    self.g.remove_edge(e)
+
+            for u, v in old_es:
+                if self.is_weighted:
+                    e = self.g.edge(u, v)
+                    if e is None:
+                        e = self.g.add_edge(u, v)
+                        self.eweight[e] = 0
+                    self.eweight[e] += 1
+                else:
+                    self.g.add_edge(u, v)

             self.add_vertex(pos.keys(), pos.values())

-        L = Si - Sf
+        L = -Sf

         if _bm_test():
             state = self.copy()
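The new signature evaluates missing and spurious edges jointly in one call. A minimal usage sketch, where the graph and the specific edge tuples are placeholders chosen for illustration:

    import graph_tool.all as gt

    g = gt.collection.data["karate"]
    state = gt.minimize_blockmodel_dl(g)

    # ln P(G + delta G | b): hypothetically add edge (0, 33) and remove
    # edge (0, 1), which is assumed to exist in this dataset
    L = state.get_edges_prob(missing=[(0, 33)], spurious=[(0, 1)],
                             entropy_args=dict(partition_dl=False))
    print(L)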
@@ -644,16 +644,28 @@ class LayeredBlockState(OverlapBlockState, BlockState):
             u = self.vmap[v][i]
         return u

-    def get_edges_prob(self, edge_list, missing=True, entropy_args={}):
-        """Compute the log-probability of the missing (or spurious if
-        ``missing=False``) edges given by ``edge_list`` (a list of ``(source,
-        target, ec)`` tuples, or :meth:`~graph_tool.Edge` instances). The
-        values in ``entropy_args`` are passed to
-        :meth:`graph_tool.LayeredBlockState.entropy()` to calculate the
-        log-probability.
-        """
+    def get_edges_prob(self, missing, spurious=[], entropy_args={}):
+        """Compute the joint log-probability of the missing and spurious edges
+        given by ``missing`` and ``spurious`` (a list of ``(source, target,
+        layer)`` tuples, or :meth:`~graph_tool.Edge` instances), together
+        with the observed edges. More precisely, the log-likelihood returned
+        is
+
+        .. math::
+
+            \ln P(\boldsymbol G + \delta \boldsymbol G | \boldsymbol b)
+
+        where :math:`\boldsymbol G + \delta \boldsymbol G` is the modified
+        graph (with missing edges added and spurious edges deleted).
+
+        The values in ``entropy_args`` are passed to
+        :meth:`graph_tool.BlockState.entropy()` to calculate the
+        log-probability.
+        """
         pos = {}
         nes = []
-        for e in edge_list:
+        for e in itertools.chain(missing, spurious):
             try:
                 u, v = e
                 l = self.ec[e]

@@ -669,87 +681,83 @@
         edge_list = nes

         Si = self.entropy(**entropy_args)
         self.remove_vertex(pos.keys())

         agg_state = self.agg_state

         try:
-            if missing:
-                new_es = []
-                for u, v, l in edge_list:
-                    if not l:
-                        state = self.agg_state
-                    else:
-                        state = self.layer_states[l]
-                    e = state.g.add_edge(u, v)
-                    if not l:
-                        self.ec[e] = l
-                    if state.is_weighted:
-                        state.eweight[e] = 1
-                    new_es.append((e, l))
-            else:
-                old_es = []
-                for u, v, l in edge_list:
-                    if not l:
-                        state = self.agg_state
-                        es = state.g.edge(u, v, all_edges=True)
-                        es = [e for e in es if self.ec[e] == l]
-                        if len(es) > 0:
-                            e = es[0]
-                        else:
-                            e = None
-                    else:
-                        state = self.layer_states[l]
-                        e = state.g.edge(u, v)
-                    if e is None:
-                        raise ValueError("edge not found: (%d, %d, %d)" % \
-                                         (int(u), int(v), l))
-                    if state.is_weighted:
-                        state.eweight[e] -= 1
-                        if state.eweight[e] == 0:
-                            state.g.remove_edge(e)
-                    else:
-                        state.g.remove_edge(e)
-                    old_es.append((u, v, l))
+            new_es = []
+            for u, v, l in missing:
+                if not l:
+                    state = self.agg_state
+                else:
+                    state = self.layer_states[l]
+                e = state.g.add_edge(u, v)
+                if not l:
+                    self.ec[e] = l
+                if state.is_weighted:
+                    state.eweight[e] = 1
+                new_es.append((e, l))
+
+            old_es = []
+            for u, v, l in spurious:
+                if not l:
+                    state = self.agg_state
+                    es = state.g.edge(u, v, all_edges=True)
+                    es = [e for e in es if self.ec[e] == l]
+                    if len(es) > 0:
+                        e = es[0]
+                    else:
+                        e = None
+                else:
+                    state = self.layer_states[l]
+                    e = state.g.edge(u, v)
+                if e is None:
+                    raise ValueError("edge not found: (%d, %d, %d)" % \
+                                     (int(u), int(v), l))
+                if state.is_weighted:
+                    state.eweight[e] -= 1
+                    if state.eweight[e] == 0:
+                        state.g.remove_edge(e)
+                else:
+                    state.g.remove_edge(e)
+                old_es.append((u, v, l))

             self.add_vertex(pos.keys(), pos.values())

-            Sf = self.entropy(**entropy_args)
+            Sf = self.entropy(**overlay(dict(partition_dl=False), **entropy_args))

             self.remove_vertex(pos.keys())
         finally:
-            if missing:
-                for e, l in new_es:
-                    if not l:
-                        state = self.agg_state
-                    else:
-                        state = self.layer_states[l]
-                    state.g.remove_edge(e)
-            else:
-                for u, v, l in old_es
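For the layered variant, edges are specified as (source, target, layer) tuples. A hedged usage sketch, where the LayeredBlockState construction details are assumptions not taken from this commit:

    import graph_tool.all as gt

    g = gt.collection.data["karate"]
    ec = g.new_ep("int", val=0)            # put every edge in layer 0 (assumed setup)
    state = gt.LayeredBlockState(g, ec=ec, layers=True, deg_corr=True)

    # Joint probability of a hypothetical missing edge and a spurious one,
    # both in layer 0
    L = state.get_edges_prob(missing=[(0, 33, 0)], spurious=[(0, 1, 0)],
                             entropy_args=dict(partition_dl=False))
    print(L)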