The core promise of open table formats is engine interoperability with ACID guarantees, mutability, and schema evolution for massive datasets stored on cheap, reliable cloud object storage. Yet modern data platforms demand far more than interoperable, analytical batch processing: engineers now expect native support for change data capture (CDC), incremental processing, streaming workloads, low-latency access, and point lookups, especially for AI-driven applications. Ideally, a single, unified solution would cover all of this.
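To make one of these workload classes concrete, here is a minimal sketch of an incremental read with Apache Iceberg on Spark, which pulls only the rows committed between two table snapshots instead of rescanning the whole table. The catalog name demo, the table demo.db.events, and the snapshot IDs are hypothetical, and the sketch assumes a Spark session already configured with the Iceberg runtime and catalog.

    from pyspark.sql import SparkSession

    # Assumes the Iceberg Spark runtime jar and a catalog named `demo` are
    # already configured on the session (all names here are hypothetical).
    spark = SparkSession.builder.appName("iceberg-incremental-read").getOrCreate()

    # Each commit to an Iceberg table produces a snapshot; the `snapshots`
    # metadata table lists them, so a consumer can checkpoint the last
    # snapshot it has processed.
    spark.sql(
        "SELECT snapshot_id, committed_at, operation "
        "FROM demo.db.events.snapshots"
    ).show()

    # Incremental scan: read only the data committed after `start-snapshot-id`
    # (exclusive) up to `end-snapshot-id` (inclusive).
    increment = (
        spark.read.format("iceberg")
        .option("start-snapshot-id", "1111111111111111111")  # hypothetical IDs
        .option("end-snapshot-id", "2222222222222222222")
        .load("demo.db.events")
    )
    increment.show()

Note that this style of incremental scan only covers appended data; surfacing row-level changes (true CDC) is exactly the kind of capability this talk examines.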
However, Parquet, the foundational format in which much of today's data is physically stored, predates both the AI boom and the era of unified batch and streaming systems. Likewise, Iceberg's original design DNA was firmly rooted in large-scale, batch-oriented analytical workloads. This raises an uncomfortable question: are Parquet and Iceberg truly up to the task?
This talk explores that question through real-world use cases and architectural constraints. While the focus is on conveying key ideas and practical insights, the session is aimed at an intermediate to advanced audience. If you are new to the topic, you may want to watch last year’s episode on Apache Parquet and Delta Lake, which provides a gentle introduction to the fundamentals of open table formats.
After this talk, attendees will understand where Parquet and Iceberg came from, where they stand today, and where they are headed:
The Past (10 min)
The Present (15 min)
The Future (15 min)