Automatic Clustering at Snowflake

Jiaqi Yan

Date

Wednesday, November 11, 2020

Time

1-2 p.m.

Location

Online

Description

Snowflake is a database built on top of major cloud computing platforms. It provides reliable, low-cost data storage and makes it easy to load large volumes of data, so customers can readily create very large tables for data analytics workloads. To speed up query processing on large tables, Snowflake automatically partitions incoming data and uses zonemap metadata for pruning. For partitioned tables, maintaining good clustering is critical for query performance. However, both the size of the data and the volume of ingestion present challenges for efficient clustering maintenance on large tables, where naive approaches could be prohibitively expensive. In this talk I will introduce Snowflake's approach for automatically maintaining clustering on clustered tables as they are modified by DML operations. I will focus on our incremental, approximate clustering mechanisms, which lower the cost of clustering maintenance while ensuring good query performance. I will also dive into our service-oriented approach, which significantly simplifies performance tuning and reduces management overhead.
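As a rough illustration of why clustering matters for zonemap pruning, the sketch below keeps a min/max pair per partition and skips any partition whose range cannot satisfy a range predicate. It is a generic textbook sketch of zone-map pruning, not Snowflake's implementation; the names `ZoneMap`, `split_into_partitions`, `build_zone_maps`, and `prune`, the partition size of 100, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ZoneMap:
    """Hypothetical per-partition metadata: min and max of one column."""
    min_value: int
    max_value: int


def split_into_partitions(values: Sequence[int], size: int) -> List[List[int]]:
    """Chop rows into fixed-size partitions in arrival (ingestion) order."""
    return [list(values[i:i + size]) for i in range(0, len(values), size)]


def build_zone_maps(partitions: Sequence[Sequence[int]]) -> List[ZoneMap]:
    """Record min/max for each partition; this is all the pruner consults."""
    return [ZoneMap(min(p), max(p)) for p in partitions]


def prune(zone_maps: Sequence[ZoneMap], low: int, high: int) -> List[int]:
    """Return indices of partitions that may hold rows with low <= value <= high.

    A partition is skipped only when its [min, max] range cannot overlap
    the predicate range, so pruning never drops a matching row.
    """
    return [
        i for i, z in enumerate(zone_maps)
        if z.max_value >= low and z.min_value <= high
    ]


# The same 1,000 keys, ingested in two different orders.
clustered_rows = list(range(1000))                       # sorted on the key
scattered_rows = [(i * 37) % 1000 for i in range(1000)]  # a permutation, poorly clustered

clustered_maps = build_zone_maps(split_into_partitions(clustered_rows, 100))
scattered_maps = build_zone_maps(split_into_partitions(scattered_rows, 100))

# Range predicate, in SQL terms: WHERE key BETWEEN 250 AND 299
clustered_scan = prune(clustered_maps, 250, 299)
scattered_scan = prune(scattered_maps, 250, 299)

print(f"clustered: scan {len(clustered_scan)} of {len(clustered_maps)} partitions")
print(f"scattered: scan {len(scattered_scan)} of {len(scattered_maps)} partitions")
```

With sorted ingestion the predicate touches only 1 of the 10 partitions. With the scattered order, every partition's min/max range spans nearly the whole key domain, so all 10 must be scanned even though the data is identical. Keeping partitions well clustered as new rows arrive is what preserves the first outcome, and doing that cheaply on very large tables is the problem the talk addresses.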

Cost

Free
