
Automatic Clustering at Snowflake

Jiaqi Yan

Event Details

Date
Wednesday, November 11, 2020
Time
1-2 p.m.
Description

Snowflake is a database built on top of major cloud computing platforms. It provides reliable, low-cost data storage and makes it easy to load large volumes of data, so customers readily create very large tables for data analytics workloads. To speed up query processing on large tables, Snowflake automatically partitions incoming data and uses zonemap metadata to prune partitions that cannot match a query. For partitioned tables, maintaining good clustering is critical for query performance. However, both the size of these tables and the volume of data ingestion make efficient clustering maintenance challenging, and naive approaches could be prohibitively expensive on large tables. In this talk I will introduce Snowflake's approach to automatically maintaining clustering on clustered tables as they are modified by DML operations. I will focus on our incremental, approximate clustering mechanisms, which lower the cost of clustering maintenance while ensuring good query performance. I will also dive into our service-oriented approach, which significantly simplifies performance tuning and reduces management overhead.
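To illustrate why clustering matters for zonemap pruning, here is a minimal Python sketch. It assumes each partition's zonemap records only the min and max of a single integer column. The `Partition` class and `prune` function are hypothetical illustrations, not Snowflake's API or implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Partition:
    # Hypothetical zonemap entry: min/max of one integer column in this partition.
    name: str
    min_val: int
    max_val: int


def prune(partitions: List[Partition], lo: int, hi: int) -> List[str]:
    """Return names of partitions whose [min_val, max_val] range overlaps the
    predicate range [lo, hi]; all other partitions can be skipped unread."""
    return [p.name for p in partitions if p.max_val >= lo and p.min_val <= hi]


# Well clustered: value ranges do not overlap, so a narrow predicate hits one partition.
clustered = [Partition("p1", 0, 99), Partition("p2", 100, 199), Partition("p3", 200, 299)]
# Poorly clustered: every partition spans nearly the whole domain, so nothing is pruned.
unclustered = [Partition("p1", 0, 298), Partition("p2", 1, 299), Partition("p3", 2, 297)]

print(prune(clustered, 120, 130))    # ['p2']
print(prune(unclustered, 120, 130))  # ['p1', 'p2', 'p3']
```

With the same data volume, the clustered layout lets a query read one partition instead of three, which is why keeping tables clustered as new data arrives pays off in query performance.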

Cost
Free
