To nest, or not to nest? Nested data types in Polars with big data

Daniel Finnan

Data Handling & Data Engineering
Python Skill Intermediate
Domain Expertise Intermediate

If you’ve ever designed or used SQL databases in your data science projects, perhaps you’ve cringed at the lack of relational structure and the data duplication in the design of big data storage and processing. On the other hand, if you’ve spent any considerable time getting your hands dirty with Polars’ vectorized and columnar processing, you’ll also know that this can be something of a moot point. So why bother?

Outline of the talk:

5 minutes: Introduction & origin story. What are Polars nested types? How do they work? Why do they matter?
5 minutes: Back to the future. Advanced queries on nested types, past & present.
5 minutes: Query structure - “Group by” forever baby, versus element-wise.
5 minutes: Storage comparison and the gigabyte scrooge - how a miser decides on a nested Polars structure.
5 minutes: Time is money – How performance stacks up.
5 minutes: Q&A

By the end of the talk, participants will have seen several straightforward examples, as well as more advanced illustrations of nested structures in Polars using real-world data. They will be able to identify some key considerations informing their use of nested structures, including query logic, storage and performance.

Daniel Finnan

Daniel Finnan is a second-year PhD candidate at the Lirsa laboratory, Conservatoire national des arts et métiers (CNAM), in Paris. His thesis focuses on decentralized finance, specifically decentralized exchanges, applying a quantitative methodology using blockchain data, techniques in data science, and time series econometrics. He codes in Python, R, and occasionally Rust and JavaScript, specifically using Python to manage data pipelines. He has a professional certification in full-stack development and holds a Master’s degree in Economics, with a specialization in Economic, Digital and Data strategies from CNAM’s department of Economics, Finance, Insurance and Banking.