Open Table Formats in the Wild™ - Reloaded: Vortexing Ducks over Floating Icebergs

Franz Wöllert

Data Handling & Data Engineering
Python Skill: Novice
Domain Expertise: Intermediate

Description

The core promise of open table formats is engine interoperability with ACID guarantees, mutability, and schema evolution for massive datasets stored on cheap, reliable cloud object storage. Modern data platforms demand far more than just interoperable, analytical batch processing. Engineers now require native support for CDC, incremental processing, streaming workloads, low-latency access, and point lookups - especially for AI-driven applications. Ideally, all of this would be covered by a single, unified solution.

However, Parquet - the foundational format for physically storing much of today’s data - predates both the AI boom and the era of unified batch and streaming systems. Likewise, Iceberg’s original design DNA was firmly rooted in large-scale, batch-oriented analytical workloads. This raises an uncomfortable question: are Parquet and Iceberg truly up to the task?

This talk explores that question through real-world use cases and architectural constraints. While the focus is on conveying key ideas and practical insights, the session is aimed at an intermediate to advanced audience. If you are new to the topic, you may want to watch last year’s episode on Apache Parquet and Delta Lake, which provides a gentle introduction to the fundamentals of open table formats.

Takeaways

After this talk, attendees will:

  • Understand why incremental processing is not a native concept in Apache Iceberg
  • Recognize how Iceberg’s metadata model creates hard limits for low-latency streaming workloads
  • Learn why Parquet’s physical layout becomes a bottleneck for point lookups and AI-driven access patterns
  • Get an early look at DuckLake and Vortex as emerging alternatives

Agenda

The Past (10 min)

  • Rationale - The Idealized Model
  • Implications - The Engineering Trade-offs

The Present (15 min)

  • Incremental Processing - The Missing Primitive
  • Streaming Workloads - The Batch Inheritance
  • AI Applications & Point Lookups - The Access Wall

The Future (15 min)

  • DuckLake - The Return of Relational Databases
  • Vortex - The Parquet of Tomorrow

Franz Wöllert

Hi, my name is Franz and I’m an open-source and Python enthusiast:

  • father of 3 girls
  • major in psychology
  • chess hobbyist
  • former competitive ultimate frisbee player
  • likes cooking and baking sourdough bread