Nearly-optimal prediction of missing links in networks presented by Aaron Clauset
Abstract: Predicting missing links in networks is a fundamental task in network analysis and modeling. However, current link prediction algorithms exhibit wide variations in their accuracy, and we lack a general understanding of which methods work better in which contexts. In this talk, I'll describe a novel meta-learning solution to this problem, which makes predictions that appear to be nearly optimal by learning to combine three classes of prediction methods: community detection algorithms, structural features like degrees and triangles, and network embeddings. We evaluate 203 component methods individually and in stacked generalization on (i) synthetic data with known structure, for which we analytically calculate the optimal link prediction performance, and (ii) a large corpus of 550 structurally diverse networks from social, biological, technological, information, economic, and transportation domains. Across settings, supervised stacking nearly always performs best and produces nearly-optimal performance on synthetic networks. Moreover, we show that accuracy saturates quickly, and near-optimal predictions typically requires only a handful of component methods. Applied to real data, we quantify the utility of each method on different types of networks, and then show that the difficulty of predicting missing links varies considerably across domains: it is easiest in social networks and hardest in technological networks. I'll close with forward-looking comments on the limits of predictability for missing links in complex networks and on the utility of stacked generalizations for achieving them.