Skip to main content

MadSystems Seminar

[Talk] Anvil: Verifying Liveness of Cluster Management Controllers

Event Details

Date
Tuesday, October 22, 2024
Time
4 p.m.
Location
Description

Xudong Sun (UIUC) will give an invited talk in person on his recent work Anvil, which received the OSDI '24 Best Paper Award.

Abstract: Cluster managers, such as Kubernetes, are among the most critical systems in modern clouds. Modern cluster managers are architected as a fleet of controllers that manage all the applications, services, and resources in the cloud, thus making controller correctness paramount. We have developed several testing techniques (including Acto for functional testing and Sieve for fault-tolerance testing) for cluster management controllers in the past few years. Besides many serious controller bugs we discovered through testing, we learned valuable lessons that controller bugs have diverse causes such as concurrency, asynchrony, and failures, and often lead to liveness violations. Formal verification is promising for avoiding bugs in system software, but most work so far focused on safety, instead of liveness.

In this talk, I will present Anvil, our effort for building provably correct cluster managers. Anvil is a framework for developing practical controller implementations and verifying that the controllers correctly implement liveness and safety properties. One key challenge is to develop a formal specification that precludes a broad range of controller bugs and is generally applicable to diverse controllers. I will present how we address this challenge with Eventually Stable Reconciliation, a general and powerful specification written as a concise temporal logic liveness property. I will also present how to use Anvil to verify three Kubernetes controllers for managing ZooKeeper, RabbitMQ, and FluentBit, which can readily be deployed in Kubernetes platforms and achieve feature parity and the same performance compared to widely used unverified controllers. The proof effort is reasonable compared to previous work due to Anvil’s rich features for assisting temporal reasoning. Lastly, I will talk about our ongoing work on extending Anvil to verify cross-controller interactions compositionally and reducing the trusted computing base by verifying Kubernetes core controllers using Anvil.

Bio: Xudong Sun is a final year CS PhD student at the University of Illinois Urbana-Champaign, advised by Professor Tianyin Xu. Xudong's research focuses on improving system reliability using systematic testing and formal verification. Xudong will be on the job market looking for research positions in academia and industry in the 2024 - 2025 cycle. https://marshtompsxd.github.io/

Cost
Free

Tags