[Paper Notes] Building An Elastic Query Engine on Disaggregated Storage

Published on 2023-10-13

Author: Yushu


Brief

Snowflake decouples compute from persistent storage and provides data warehousing as a service. At the time of the paper, it had been in production for over seven years.

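Shared-nothing architectures cause a hardware-workload mismatch: compute and storage are coupled on each node, so a fixed node configuration is hard to balance against workloads whose CPU, memory, storage, and bandwidth demands differ.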

Design

Architecture

Intermediate data is generated by query operators (e.g., joins) and is usually consumed by the nodes participating in that query's execution. It is short-lived and benefits from low-latency, high-throughput access.

As nodes are added and removed, the ephemeral storage system does not require data repartitioning or reshuffling. Tasks executing query operations (e.g., joins) on a given compute node write intermediate data locally, and tasks consuming that data read it either locally or remotely over the network, depending on the node where the consuming task is scheduled.
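The write-locally, read-locally-or-remotely pattern can be sketched as a toy model. This is only an illustration under stated assumptions: `ComputeNode`, `EphemeralStore`, the `producer_of` directory, and the key naming are all hypothetical and not taken from the paper, and a remote read is modeled as a counter rather than a real network fetch.

```python
from dataclasses import dataclass, field


@dataclass
class ComputeNode:
    name: str
    # Node-local ephemeral storage: intermediate-data key -> rows.
    local: dict = field(default_factory=dict)


class EphemeralStore:
    """Hypothetical model: producers write locally, consumers read local or remote."""

    def __init__(self, node_names):
        self.nodes = {n: ComputeNode(n) for n in node_names}
        self.producer_of = {}  # assumed directory: key -> node that wrote it
        self.local_reads = 0
        self.remote_reads = 0

    def write(self, node, key, rows):
        # A task always writes its output to the node it runs on.
        self.nodes[node].local[key] = rows
        self.producer_of[key] = node

    def read(self, node, key):
        # The consumer reads locally if it is scheduled on the producer's node;
        # otherwise it fetches over the network (modeled as a counter here).
        producer = self.producer_of[key]
        if producer == node:
            self.local_reads += 1
        else:
            self.remote_reads += 1
        return self.nodes[producer].local[key]

    def add_node(self, name):
        # Scaling out touches no existing intermediate data.
        self.nodes[name] = ComputeNode(name)


store = EphemeralStore(["n1", "n2"])
store.write("n1", "join-0/part-0", [(1, "a"), (2, "b")])
store.add_node("n3")
same_node = store.read("n1", "join-0/part-0")   # local read
other_node = store.read("n3", "join-0/part-0")  # remote read
```

Adding `n3` changes only the set of nodes; the data written by `n1` stays where it is and remains readable from any node.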

Ephemeral storage system

Elasticity:

Persistent storage - easy, offloaded to S3
Compute - easy, pre-warmed pool of VMs
Ephemeral storage - challenging, due to co-location with compute

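Lazy consistent hashing - elasticity without data reshuffling. When the set of nodes changes, cached files are not moved; a file that now maps to a different node is read from persistent storage by that node on its next access, and the stale copy on the old node is eventually evicted.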
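A minimal sketch of lazy consistent hashing, assuming a hash ring with virtual nodes and a per-node cache in front of the persistent store. The names `Ring` and `Cluster`, the virtual-node count, and the dict standing in for S3 are hypothetical and are not Snowflake's implementation. The point it tries to show is that adding a node changes only the ring: no cached file is copied, and a remapped file is fetched from persistent storage by its new owner on first access.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable hash so that placement is reproducible across runs.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class Ring:
    """Consistent hash ring with virtual nodes (the count is an assumption)."""

    def __init__(self, vnodes: int = 64):
        self.vnodes = vnodes
        self._points = []  # sorted list of (hash, node)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def owner(self, key: str) -> str:
        # First ring point at or after the key's hash, wrapping around.
        i = bisect.bisect_left(self._points, (_hash(key),))
        return self._points[i % len(self._points)][1]


class Cluster:
    def __init__(self, persistent: dict):
        self.persistent = persistent  # stands in for S3
        self.ring = Ring()
        self.caches = {}              # node -> {file name: contents}
        self.persistent_reads = 0

    def add_node(self, node: str) -> None:
        # Lazy: only the ring changes; no cached data is moved.
        self.ring.add(node)
        self.caches[node] = {}

    def read(self, name: str):
        node = self.ring.owner(name)
        cache = self.caches[node]
        if name not in cache:
            # Miss on the (possibly new) owner: fetch from persistent storage.
            cache[name] = self.persistent[name]
            self.persistent_reads += 1
        return node, cache[name]


files = {f"file{i}": f"data{i}" for i in range(200)}
cluster = Cluster(files)
for n in ("n1", "n2", "n3"):
    cluster.add_node(n)
for f in files:
    cluster.read(f)  # warm the caches

owner_before = {f: cluster.ring.owner(f) for f in files}
cached_before = {n: set(c) for n, c in cluster.caches.items()}
reads_before = cluster.persistent_reads

cluster.add_node("n4")
cached_after_add = {n: set(c) for n, c in cluster.caches.items()}
remapped = [f for f in files if cluster.ring.owner(f) != owner_before[f]]
for f in files:
    cluster.read(f)  # only remapped files miss, and only on n4
```

In this sketch the old copies of remapped files simply stay on their previous nodes; a real cache would evict them over time as they stop being accessed.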

Key findings:

Customers submit a wide variety of query types.
Intermediate data sizes can vary over multiple orders of magnitude across queries.
Even with a small amount of local storage capacity, skewed access distributions and temporal access patterns common in data warehouses enable reasonably high average cache hit rates (60-80% depending on the type of query) for persistent data accesses.
Several customers exploit Snowflake's support for elasticity (about 20% of the clusters).
Although peak resource utilization can be high, average resource utilization is usually low.
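The cache hit rate finding above can be illustrated with a small simulation. This is a synthetic sketch, not a reproduction of the paper's measurements: the LRU eviction policy, the Zipf-like weights, the file count, and the 10% cache size are all assumptions chosen for illustration. It compares the hit rate of a small cache under skewed and uniform access to the same set of files.

```python
import random
from collections import OrderedDict


def lru_hit_rate(accesses, capacity):
    """Fraction of accesses served by an LRU cache holding `capacity` items."""
    cache = OrderedDict()
    hits = 0
    for f in accesses:
        if f in cache:
            hits += 1
            cache.move_to_end(f)           # mark as most recently used
        else:
            cache[f] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(accesses)


rng = random.Random(0)  # fixed seed for reproducibility
files = list(range(1000))
# Zipf-like skew: the file at rank r is accessed with weight 1 / (r + 1).
zipf_weights = [1.0 / (rank + 1) for rank in files]

skewed = rng.choices(files, weights=zipf_weights, k=20000)
uniform = rng.choices(files, k=20000)

capacity = 100  # cache holds 10% of the files
skewed_rate = lru_hit_rate(skewed, capacity)
uniform_rate = lru_hit_rate(uniform, capacity)
print(f"skewed: {skewed_rate:.2f}, uniform: {uniform_rate:.2f}")
```

Under uniform access the hit rate stays close to the fraction of files that fit in the cache, while skewed access lets the same small cache serve a much larger share of requests.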

Reference

M. Vuppalapati, J. Miron, R. Agarwal, D. Truong, A. Motivala, and T. Cruanes, “Building an elastic query engine on disaggregated storage,” in 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20), 2020, pp. 449–462. Accessed: Oct. 02, 2023.
