Your Data Is Leaking: A Hands-On Introduction to Differential Privacy with OpenDP

Shlomi Hod, Marcel Neunhoeffer

Track: Ethics & Privacy
Python Skill: Intermediate
Domain Expertise: None

Aggregate statistics feel safe to release: just counts, means, and totals, no individual records. But a long history of privacy failures has shown otherwise. From the AOL search data leak to the Netflix Prize re-identification attack to LLM memorization, "anonymized" data has repeatedly revealed more than intended.

Differential privacy offers a different approach: a mathematical framework that quantifies and bounds the information any release reveals about any individual. It has moved from theory to practice in recent years, with deployments at the US Census, Wikimedia, Israel's national birth registry, Google, Apple, LinkedIn, and more.

In this tutorial, we provide a hands-on introduction to differential privacy. We'll start by making the problem concrete, executing an attack on aggregate statistics ourselves, and then explore how differential privacy addresses it. The focus will be on practical implementation rather than underlying theory.

What You'll Learn

  1. Why traditional anonymization and aggregation fail to protect privacy
  2. The core ideas of differential privacy: what it guarantees, what epsilon means, and when DP is a suitable solution
  3. How to use OpenDP's building blocks
  4. How to build differentially private data analyses using OpenDP's Polars integration
  5. Where to go next: resources for AI/ML with DP, synthetic data, and further learning

Tutorial Outline

Part 1 - The Privacy Problem (20 minutes)

  • Real-world privacy failures (such as AOL search data, Netflix Prize, LLM memorization)
  • Hands-on: execute a reconstruction attack on aggregate statistics
  • Discussion: why traditional approaches fail
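To make the failure mode concrete before the session, here is a minimal sketch of one of the simplest attacks on aggregates, a differencing attack: two truthful summary statistics, each harmless on its own, combine to reveal one person's exact value. The names and salaries are entirely hypothetical; the tutorial's reconstruction attack is more involved, but rests on the same principle.

```python
# Hypothetical payroll data for four employees.
salaries = {"alice": 52_000, "bob": 61_000, "carol": 58_000, "dave": 90_000}

# Release 1: total payroll for the whole department.
total_all = sum(salaries.values())

# Release 2: total payroll after dave left the company.
# Each release alone looks like a harmless aggregate.
total_without_dave = sum(v for k, v in salaries.items() if k != "dave")

# An attacker who sees both releases subtracts them and recovers
# dave's exact salary, despite never seeing an individual record.
daves_salary = total_all - total_without_dave
print(daves_salary)  # 90000
```

No auxiliary data or machine learning is needed here; the leak follows from arithmetic alone, which is why purely procedural safeguards ("only release aggregates") are not enough.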

Part 2 - Introduction to Differential Privacy (20 minutes)

  • Core ideas: masking the contribution of any single individual through calibrated noise; protection against membership inference attacks
  • Learning by doing: exploring DP with OpenDP's building blocks
  • Tuning privacy protection with f-DP; the privacy-utility tradeoff
  • Real-world deployments (such as US Census, Israel birth registry, LinkedIn API)
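The central mechanism of Part 2 can be previewed in a few lines. The sketch below is conceptual only, using the standard library rather than OpenDP (whose vetted implementations also guard against floating-point side channels): a counting query has sensitivity 1, so adding Laplace noise with scale 1/epsilon yields an epsilon-DP count. The function names are ours, not OpenDP's.

```python
import math
import random

def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale) via inverse-CDF sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(true_count, epsilon, rng):
    """Counting queries have sensitivity 1 (one person changes the count
    by at most 1), so the calibrated noise scale is 1 / epsilon.
    Smaller epsilon -> more noise -> stronger privacy."""
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(0)
print(dp_count(1000, epsilon=1.0, rng=rng))   # close to 1000
print(dp_count(1000, epsilon=0.01, rng=rng))  # much noisier
```

This makes the privacy-utility tradeoff tangible: epsilon is a dial, and the cost of turning it down is visible in the output.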

Part 3 - Data Analysis with OpenDP (40 minutes)

  • OpenDP fundamentals: domains, transformations, measurements, chaining
  • Working with tabular data using OpenDP's Polars integration
  • Building a complete DP data analysis pipeline
  • Revisiting the attack: does it still work?
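OpenDP pipelines chain transformations (which establish a bound on each individual's contribution) with measurements (which add noise calibrated to that bound). The sketch below illustrates the chaining idea in plain Python; it deliberately does not reproduce OpenDP's actual API, and the clipping bounds and data are hypothetical.

```python
import math
import random

def clip(values, lo, hi):
    """Transformation: clamp each record into [lo, hi] so that one
    person's contribution to a sum is bounded, giving finite sensitivity."""
    return [min(max(v, lo), hi) for v in values]

def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale) via inverse-CDF sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_sum(values, lo, hi, epsilon, rng):
    """Measurement: clipped sum plus Laplace noise. After clipping,
    one person changes the sum by at most max(|lo|, |hi|)."""
    sensitivity = max(abs(lo), abs(hi))
    return sum(clip(values, lo, hi)) + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(42)
ages = [23, 35, 41, 29, 300]  # note the implausible outlier
print(dp_sum(ages, lo=0, hi=120, epsilon=1.0, rng=rng))
```

Note how clipping does double duty: it bounds the noise that must be added and tames outliers, a tradeoff the tutorial explores when building the full pipeline with OpenDP's Polars integration.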

Part 4 - What's Next (10 minutes)

  • Beyond the basics: AI/ML with differential privacy, synthetic data generation
  • Resources and community
  • Q&A

Prerequisites

  • Python: Comfortable writing functions and working with notebooks
  • Statistics: Basic familiarity with mean, counts, histograms
  • Differential privacy: No prior knowledge required

Materials

Participants will have access to interactive Jupyter notebooks with all code and exercises. Materials will be publicly available after the tutorial.

Shlomi Hod

Shlomi Hod is a researcher at the Weizenbaum Institute. His work focuses on creating tools for the real-world deployment of responsible computing systems, with a particular emphasis on differential privacy. He has led workshops on operationalizing Responsible AI for policymakers, regulators, and diplomats worldwide, including at the US Congress and the German Federal Foreign Office. Shlomi recently earned his PhD in Computer Science from Boston University; during his doctoral studies he completed an OpenDP fellowship at Harvard University and a one-year research visit at Columbia University.

Marcel Neunhoeffer