[Paper Sharing] Pando: Enhanced Data Skipping with Logical Data Partitioning

Published 2023-09-23

Author: Yushu

[Image: Slide 1]

[Image: Slide 2]

Learned Layouts

[Image: Slide 6]

[Image: Slide 12]

[Image: Slide 14]

[Image: Slide 15]

[Image: Slide 18]

[Image: Slide 19]

Qd-tree

[Image: Slide 26]

[Image: Slide 29]

[Image: Slide 34]

MTO

[Image: Slide 42]

[Image: Slide 44]

Pando

[Image: Slide 48]

[Image: Slide 50]

[Image: Slide 52]

[Image: Slide 53]

[Image: Slide 57]

Results

[Image: Slide 62]

*diP: A data-induced predicate (diP) is a database technique that uses data statistics to convert a predicate on one table into derived predicates that apply to the tables it is joined with. This can significantly speed up multi-relation queries, because the benefits of predicate pushdown then extend beyond the table that carries the original predicate.
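To make the footnote concrete, here is a minimal sketch of how a data-induced predicate could be derived from per-block min/max statistics. It is an illustration only, not the implementation from any of the papers: the `Block` class, the `orders` and `lineitem` tables, the column names, and all the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical per-block statistics: column name -> (min, max)."""
    name: str
    stats: dict


def may_match(block, column, lo, hi):
    """True if the block's [min, max] range for `column` overlaps [lo, hi]."""
    bmin, bmax = block.stats[column]
    return bmin <= hi and lo <= bmax


def induce_predicate(blocks, pred_col, lo, hi, join_col):
    """Collect the join-key ranges of every block that may satisfy the predicate."""
    return [b.stats[join_col] for b in blocks if may_match(b, pred_col, lo, hi)]


def blocks_to_read(blocks, join_col, ranges):
    """Keep only the blocks whose join-key range overlaps an induced range."""
    return [
        b.name
        for b in blocks
        if any(may_match(b, join_col, lo, hi) for lo, hi in ranges)
    ]


# Table with the original predicate (on `date`), joined on `order_id`.
orders = [
    Block("o0", {"date": (1, 10), "order_id": (0, 99)}),
    Block("o1", {"date": (11, 20), "order_id": (100, 199)}),
    Block("o2", {"date": (21, 30), "order_id": (200, 299)}),
]
# Joining table: it has no `date` column, only the join key.
lineitem = [
    Block("l0", {"order_id": (0, 149)}),
    Block("l1", {"order_id": (150, 249)}),
    Block("l2", {"order_id": (250, 299)}),
]

# Predicate on orders: 12 <= date <= 18.
induced = induce_predicate(orders, "date", 12, 18, "order_id")
survivors = blocks_to_read(lineitem, "order_id", induced)
```

In this toy data only block `o1` can satisfy the date predicate, so the induced predicate on `lineitem` is `100 <= order_id <= 199`, and block `l2` can be skipped even though the query has no direct predicate on `lineitem`.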

Summary

Pando is a metadata-rich data layout framework.

It significantly reduces the amount of I/O performed by jointly optimizing:

the physical layout of the data
multiple correlation-aware logical partitionings (not covered in this post)
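As a rough illustration of why extra per-block metadata about a logical partitioning can skip more data than min/max statistics alone, consider the sketch below. It is a simplified assumption of how such metadata might look, not Pando's actual format: the range boundaries, the `price` column, and the helper names are all hypothetical.

```python
import bisect


def partition_id(value, boundaries):
    """Index of the logical range partition that contains `value`."""
    return bisect.bisect_right(boundaries, value)


def build_block_metadata(blocks, column, boundaries):
    """For each physical block, record which logical partitions occur in it."""
    return [{partition_id(row[column], boundaries) for row in block} for block in blocks]


def query_partitions(lo, hi, boundaries):
    """Logical partitions that a range query lo <= column <= hi can touch."""
    return set(range(partition_id(lo, boundaries), partition_id(hi, boundaries) + 1))


def blocks_to_scan(metadata, wanted):
    """A block must be read only if it holds a row from a wanted partition."""
    return [i for i, present in enumerate(metadata) if present & wanted]


# Logical partitions on price: <10, [10, 20), [20, 30), >=30.
boundaries = [10, 20, 30]
blocks = [
    [{"price": 3}, {"price": 35}],   # min/max = 3..35
    [{"price": 12}, {"price": 15}],  # min/max = 12..15
    [{"price": 25}, {"price": 8}],   # min/max = 8..25
]

metadata = build_block_metadata(blocks, "price", boundaries)
to_scan = blocks_to_scan(metadata, query_partitions(11, 18, boundaries))
```

For the query `11 <= price <= 18`, the min/max ranges of all three blocks overlap the query range, so a plain zone map would read every block. The logical-partition sets show that only block 1 contains a row from the queried partition.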

References

Yang, Z. et al. 2020. Qd-tree: Learning data layouts for big data analytics. Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (2020), 193–208.
[⭐] Ding, J. et al. 2021. Instance-optimized data layouts for cloud analytics workloads. Proceedings of the 2021 International Conference on Management of Data (2021), 418–431.
Sudhir, S. et al. 2023. Pando: Enhanced Data Skipping with Logical Data Partitioning. Proceedings of the VLDB Endowment. 16, 9 (2023), 2316–2329.
