Systems Seminar: Scaling Pure Storage's All-Flash Distributed Storage System by 5x
Event Details
Building a successful enterprise-grade distributed storage system requires a deep understanding of both computer science fundamentals and business objectives. At Pure, we build on decades of academic research into topics such as data redundancy, concurrency, distributed coordination, and fault tolerance, to name a few. We also often solve engineering problems that stem from our particular business requirements, such as elastic scaling, maintenance without downtime, and operational simplicity. Recently, we scaled our storage system by a factor of 5, from 700TB to 3PB of storage and from 17GB/s to 75GB/s of bandwidth. In this presentation, we will talk about our journey to petabyte scale, focusing on the early architectural decisions that enabled linear scaling, and discuss some interesting engineering challenges that we had to overcome along the way.

Bio: Oksana Aguilera is a software engineer at Pure Storage, working on a distributed storage system that uses flash as its storage medium. Prior to joining Pure, Oksana did academic research in the theory of distributed computing. She obtained her PhD from the University of Lisbon in 2014, followed by 2 years of postdoctoral studies at the University of Calgary. Oksana's research has been published in venues such as PODC, DISC, and ICDCS. She has also served on the program committees of PODC, DISC, OPODIS, and ICDCN.