Your Data Is Leaking: A Hands-On Introduction to Differential Privacy with OpenDP

Shlomi Hod, Marcel Neunhoeffer

Track: Ethics & Privacy
Python Skill: Intermediate
Domain Expertise: None

Aggregate statistics feel safe to release: just counts, means, and totals, no individual records. But a long history of privacy failures has shown otherwise. From the AOL search data leak to the Netflix Prize re-identification attack to LLM memorization, "anonymized" data has repeatedly revealed more than intended.

Differential privacy offers a different approach: a mathematical framework that quantifies and bounds the information any release reveals about any individual. It has moved from theory to practice in recent years, with deployments at the US Census, Wikimedia, Israel's national birth registry, Google, Apple, LinkedIn, and more.

In this tutorial, we provide a hands-on introduction to differential privacy. We'll start by making the problem concrete, executing an attack on aggregate statistics ourselves, and then explore how differential privacy addresses it. The focus will be on practical implementation rather than underlying theory.

What You'll Learn

  1. Why traditional anonymization and aggregation fail to protect privacy
  2. The core ideas of differential privacy: what it guarantees, what epsilon means, and when DP is a suitable solution
  3. How to use OpenDP's building blocks
  4. How to build differentially private data analyses using OpenDP's Polars integration
  5. Where to go next: resources for AI/ML with DP, synthetic data, and further learning

Tutorial Outline

Part 1 - The Privacy Problem (20 minutes)

  • Real-world privacy failures (such as AOL search data, Netflix Prize, LLM memorization)
  • Hands-on: execute a reconstruction attack on aggregate statistics
  • Discussion: why traditional approaches fail
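To make the failure mode concrete before the session, here is a minimal sketch of one of the simplest attacks on aggregates, a differencing attack: two truthful summary statistics, each harmless on its own, combine to reveal one person's exact value. The names and salaries are entirely hypothetical; the tutorial's reconstruction attack is more involved, but rests on the same principle.

```python
# Hypothetical payroll data for four employees.
salaries = {"alice": 52_000, "bob": 61_000, "carol": 58_000, "dave": 90_000}

# Release 1: total payroll for the whole department.
total_all = sum(salaries.values())

# Release 2: total payroll after dave left the company.
# Each release alone looks like a harmless aggregate.
total_without_dave = sum(v for k, v in salaries.items() if k != "dave")

# An attacker who sees both releases subtracts them and recovers
# dave's exact salary, despite never seeing an individual record.
daves_salary = total_all - total_without_dave
print(daves_salary)  # 90000
```

No auxiliary data or machine learning is needed here; the leak follows from arithmetic alone, which is why purely procedural safeguards ("only release aggregates") are not enough.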

Part 2 - Introduction to Differential Privacy (20 minutes)

  • Core ideas: masking the contribution of any single individual through calibrated noise; protection against membership inference attacks
  • Learning by doing: exploring DP with OpenDP's building blocks
  • Tuning privacy protection with f-DP; the privacy-utility tradeoff
  • Real-world deployments (such as US Census, Israel birth registry, LinkedIn API)
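The central mechanism of Part 2 can be previewed in a few lines. The sketch below is conceptual only, using the standard library rather than OpenDP (whose vetted implementations also guard against floating-point side channels): a counting query has sensitivity 1, so adding Laplace noise with scale 1/epsilon yields an epsilon-DP count. The function names are ours, not OpenDP's.

```python
import math
import random

def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale) via inverse-CDF sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(true_count, epsilon, rng):
    """Counting queries have sensitivity 1 (one person changes the count
    by at most 1), so the calibrated noise scale is 1 / epsilon.
    Smaller epsilon -> more noise -> stronger privacy."""
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(0)
print(dp_count(1000, epsilon=1.0, rng=rng))   # close to 1000
print(dp_count(1000, epsilon=0.01, rng=rng))  # much noisier
```

This makes the privacy-utility tradeoff tangible: epsilon is a dial, and the cost of turning it down is visible in the output.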

Part 3 - Data Analysis with OpenDP (40 minutes)

  • OpenDP fundamentals: domains, transformations, measurements, chaining
  • Working with tabular data using OpenDP's Polars integration
  • Building a complete DP data analysis pipeline
  • Revisiting the attack: does it still work?
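OpenDP pipelines chain transformations (which establish a bound on each individual's contribution) with measurements (which add noise calibrated to that bound). The sketch below illustrates the chaining idea in plain Python; it deliberately does not reproduce OpenDP's actual API, and the clipping bounds and data are hypothetical.

```python
import math
import random

def clip(values, lo, hi):
    """Transformation: clamp each record into [lo, hi] so that one
    person's contribution to a sum is bounded, giving finite sensitivity."""
    return [min(max(v, lo), hi) for v in values]

def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale) via inverse-CDF sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_sum(values, lo, hi, epsilon, rng):
    """Measurement: clipped sum plus Laplace noise. After clipping,
    one person changes the sum by at most max(|lo|, |hi|)."""
    sensitivity = max(abs(lo), abs(hi))
    return sum(clip(values, lo, hi)) + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(42)
ages = [23, 35, 41, 29, 300]  # note the implausible outlier
print(dp_sum(ages, lo=0, hi=120, epsilon=1.0, rng=rng))
```

Note how clipping does double duty: it bounds the noise that must be added and tames outliers, a tradeoff the tutorial explores when building the full pipeline with OpenDP's Polars integration.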

Part 4 - What's Next (10 minutes)

  • Beyond the basics: AI/ML with differential privacy, synthetic data generation
  • Resources and community
  • Q&A

Prerequisites

  • Python: Comfortable writing functions and working with notebooks
  • Statistics: Basic familiarity with mean, counts, histograms
  • Differential privacy: No prior knowledge required

Materials

Participants will have access to interactive Jupyter notebooks with all code and exercises. Materials will be publicly available after the tutorial.

Shlomi Hod

Shlomi Hod is a researcher at the Weizenbaum Institute. His work focuses on creating tools for the real-world deployment of responsible computing systems, with a particular emphasis on differential privacy. He has led workshops on operationalizing Responsible AI for policymakers, regulators, and diplomats worldwide, including at the US Congress and the German Federal Foreign Office. Shlomi recently earned his PhD in Computer Science from Boston University; during his doctoral studies he completed an OpenDP fellowship at Harvard University and a one-year research visit at Columbia University.

Marcel Neunhoeffer