Description

The core promise of open table formats is engine interoperability with ACID guarantees, mutability, and schema evolution for massive datasets stored on cheap, reliable cloud object storage. Modern data platforms demand far more than just interoperable, analytical batch processing. Engineers now require native support for CDC, incremental processing, streaming workloads, low-latency access, and point lookups - especially for AI-driven applications. Ideally, all of this would be covered by a single, unified solution.

However, Parquet - the foundational format for physically storing much of today’s data - predates both the AI boom and the era of unified batch and streaming systems. Likewise, Iceberg’s original design DNA was firmly rooted in large-scale, batch-oriented analytical workloads. This raises an uncomfortable question: are Parquet and Iceberg truly up to the task?

This talk explores that question through real-world use cases and architectural constraints. While the focus is on conveying key ideas and practical insights, the session is aimed at an intermediate to advanced audience. If you are new to the topic, you may want to watch last year’s episode on Apache Parquet and Delta Lake, which provides a gentle introduction to the fundamentals of open table formats.

Takeaways

After this talk, attendees will:

Understand why incremental processing is not a native concept in Apache Iceberg
Recognize how Iceberg’s metadata model creates hard limits for low-latency streaming workloads
Learn why Parquet’s physical layout becomes a bottleneck for point lookups and AI-driven access patterns
Get an early look at DuckLake and Vortex as emerging alternatives

Agenda

The Past (10 min)

Rationale - The Idealized Model
Implications - The Engineering Trade-offs

The Present (15 min)

Incremental Processing - The Missing Primitive
Streaming Workloads - The Batch Inheritance
AI Applications & Point Lookups - The Access Wall

The Future (15 min)

DuckLake - The Return of Relational Databases
Vortex - The Parquet of Tomorrow

Franz Wöllert

Hi, my name is Franz and I’m an open source and Python enthusiast:

father of 3 girls
major in psychology
chess hobbyist
former competitive ultimate frisbee player
likes cooking and baking sourdough bread

Open Table Formats in the Wild™ - Reloaded: Vortexing Ducks over Floating Icebergs
