If you’ve ever designed or used SQL databases in your data science projects, perhaps you’ve cringed at the lack of relational structure and the data duplication common in big data storage and processing designs. On the other hand, if you’ve spent any considerable time getting your hands dirty with Polars’ vectorized, columnar processing, you’ll also know that this can be something of a moot point. So why bother?
Outline of the talk:
5 minutes: Introduction & origin story. What are Polars nested types? How do they work? Why do they matter?
5 minutes: Back to the future. Advanced queries on nested types, past & present.
5 minutes: Query structure. “Group by” forever baby, versus element-wise.
5 minutes: Storage comparison and the gigabyte scrooge. How a miser decides on a nested Polars structure.
5 minutes: Time is money. How performance stacks up.
5 minutes: Q&A
By the end of the talk, participants will have seen several straightforward examples, as well as more advanced illustrations, of nested structures in Polars using real-world data. They will be able to identify some key considerations informing their use of nested structures, including query logic, storage, and performance.