[Paper Sharing] Pando: Enhanced Data Skipping with Logical Data Partitioning

幻灯片1.JPG

幻灯片2.JPG

Learned Layouts

幻灯片6.JPG

幻灯片12.JPG

幻灯片14.JPG

幻灯片15.JPG

幻灯片18.JPG

幻灯片19.JPG

Qd-tree

幻灯片26.JPG

幻灯片29.JPG

幻灯片34.JPG

MTO

幻灯片42.JPG

幻灯片44.JPG

Pando

幻灯片48.JPG

幻灯片50.JPG

幻灯片52.JPG

幻灯片53.JPG

幻灯片57.JPG

Results

幻灯片62.JPG

*dip: A data-induced predicate (diP) is a technique from the database literature. It uses data statistics to turn a predicate on one table into derived predicates that apply to the tables joined with it. This can significantly speed up multi-relation queries, because the benefits of predicate pushdown now extend to tables other than the one that carries the original predicate.
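The idea in the note above can be illustrated with a minimal sketch. It assumes a fact table split into partitions and a dimension table joined on a key; the names `induce_key_range`, `partitions_to_scan`, `dates`, and `sales` are hypothetical and do not come from the paper. For simplicity the induced predicate is a single min/max range computed from rows, whereas a real system would derive it from precomputed partition statistics.

```python
def induce_key_range(dim_rows, predicate, join_key):
    """Derive a range predicate on the join key from a predicate on the dimension table."""
    keys = [row[join_key] for row in dim_rows if predicate(row)]
    return (min(keys), max(keys)) if keys else None


def partitions_to_scan(fact_partitions, join_key, key_range):
    """Keep only the fact partitions whose join-key range overlaps the induced range."""
    if key_range is None:
        return []  # no dimension row qualifies, so no fact row can join
    lo, hi = key_range
    selected = []
    for idx, rows in enumerate(fact_partitions):
        # Assumption: in practice min/max would be stored as partition metadata.
        keys = [row[join_key] for row in rows]
        if min(keys) <= hi and max(keys) >= lo:
            selected.append(idx)
    return selected


# Dimension table: date_id 0-3 is 2020, 4-7 is 2021, 8-11 is 2022.
dates = [{"date_id": i, "year": 2020 + i // 4} for i in range(12)]
# Fact table: four partitions, each holding three consecutive date_ids.
sales = [[{"date_id": d} for d in ids]
         for ids in ([0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11])]

key_range = induce_key_range(dates, lambda r: r["year"] == 2021, "date_id")
print(key_range)                                        # (4, 7)
print(partitions_to_scan(sales, "date_id", key_range))  # [1, 2]
```

The predicate `year == 2021` exists only on the dimension table, yet it lets the scan skip two of the four fact partitions.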

Summary

Pando: a metadata-rich data layout framework.

Significant reduction in the amount of I/O performed, achieved by jointly optimizing:

  • the physical layout of the data
  • multiple correlation-aware logical partitionings (not covered here)
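To make the "metadata-rich" idea more concrete, here is a simplified sketch of how per-block logical-partition metadata could enable skipping. It is an illustration under stated assumptions, not Pando's actual algorithm: the split points are fixed by hand rather than learned, and the names `logical_partition`, `build_block_metadata`, `blocks_to_read`, and `ship_day` are hypothetical.

```python
from bisect import bisect_right


def logical_partition(value, boundaries):
    """Map a value to a logical partition id, given sorted split points."""
    return bisect_right(boundaries, value)


def build_block_metadata(blocks, column, boundaries):
    """For each physical block, record which logical partitions have a row in it."""
    return [{logical_partition(row[column], boundaries) for row in block}
            for block in blocks]


def blocks_to_read(metadata, lo, hi, boundaries):
    """Blocks that may contain rows with lo <= column <= hi."""
    first = logical_partition(lo, boundaries)
    last = logical_partition(hi, boundaries)
    wanted = set(range(first, last + 1))
    return [i for i, present in enumerate(metadata) if present & wanted]


# Assumed split points: partition 0 is < 30, partition 1 is [30, 60), partition 2 is >= 60.
boundaries = [30, 60]
blocks = [
    [{"ship_day": 1}, {"ship_day": 2}, {"ship_day": 90}],    # outlier widens min/max
    [{"ship_day": 35}, {"ship_day": 45}, {"ship_day": 55}],
    [{"ship_day": 61}, {"ship_day": 70}, {"ship_day": 10}],
]

metadata = build_block_metadata(blocks, "ship_day", boundaries)
print(blocks_to_read(metadata, 40, 50, boundaries))  # [1]
```

With plain min/max statistics, the query `40 <= ship_day <= 50` would have to read all three blocks, because the outliers stretch the ranges of blocks 0 and 2 to 1..90 and 10..70. The partition-membership sets record that neither block holds a row in the middle partition, so both can be skipped.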

References

  1. Yang, Z. et al. 2020. Qd-tree: Learning data layouts for big data analytics. Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (2020), 193–208. [paper] [video]

  2. [⭐] Ding, J. et al. 2021. Instance-optimized data layouts for cloud analytics workloads. Proceedings of the 2021 International Conference on Management of Data (2021), 418–431. [paper] [video]

  3. Sudhir, S. et al. 2023. Pando: Enhanced Data Skipping with Logical Data Partitioning. Proceedings of the VLDB Endowment. 16, 9 (2023), 2316–2329. [paper]

This post is licensed by the author under CC BY 4.0.
