During the talk we will cover:
- Why Watermarking Matters
  - What can go wrong when AI-generated content becomes indistinguishable from human writing
  - Why provenance and transparency are becoming essential to trust and safety
- How LLM Watermarking Works
  - What counts as a watermark and what doesn't
  - The core idea behind statistical watermarking
- Two Key Algorithms, Implemented with Established Python Frameworks
  - EXP watermark: steering the sampling step with key-seeded pseudo-random numbers
  - KGW green-list watermark: partitioning the vocabulary into “green” and “red” lists to bias sampling
  - Python code walkthrough: implementing the KGW method and comparing it with the EXP method
- How You Can Use MarkLLM, an Open-Source Toolkit
  - Generating and detecting watermarked text
  - Using the toolkit for experiments in your own workflows
- Real-World Challenges and Limitations
  - How robust the current algorithms are, and how easily they can be evaded
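To give a flavor of the kind of code the walkthrough covers, here is a minimal sketch of the KGW green-list idea on a toy vocabulary (no real LLM involved; the function names, the SHA-256 seeding trick, and all parameter values are illustrative choices for this sketch, not the reference implementation):

```python
import hashlib
import numpy as np

def green_list(prev_token: int, vocab_size: int, gamma: float, key: int) -> set:
    """Pseudo-randomly pick a 'green' fraction gamma of the vocabulary,
    seeded by the previous token and a secret key."""
    seed = int(hashlib.sha256(f"{key}:{prev_token}".encode()).hexdigest(), 16) % 2**32
    perm = np.random.default_rng(seed).permutation(vocab_size)
    return set(perm[: int(gamma * vocab_size)].tolist())

def bias_logits(logits: np.ndarray, prev_token: int,
                delta: float, gamma: float, key: int) -> np.ndarray:
    """KGW generation step: add delta to every green-list logit."""
    biased = logits.copy()
    for t in green_list(prev_token, len(logits), gamma, key):
        biased[t] += delta
    return biased

def toy_generate(n, vocab_size=100, delta=4.0, gamma=0.5, key=42,
                 watermark=True, seed=0):
    """Sample a token sequence from a toy 'model' with flat logits."""
    rng = np.random.default_rng(seed)
    tokens = [0]
    for _ in range(n):
        logits = np.zeros(vocab_size)  # stand-in for real model logits
        if watermark:
            logits = bias_logits(logits, tokens[-1], delta, gamma, key)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        tokens.append(int(rng.choice(vocab_size, p=probs)))
    return tokens

def z_score(tokens, vocab_size=100, gamma=0.5, key=42):
    """Detection: under H0 (no watermark) the green-token count is
    Binomial(n, gamma), so a large z-score indicates a watermark."""
    hits = sum(cur in green_list(prev, vocab_size, gamma, key)
               for prev, cur in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    return (hits - gamma * n) / np.sqrt(n * gamma * (1 - gamma))
```

With a strong bias (delta=4), almost every sampled token lands on the green list, so watermarked text scores far above a typical detection threshold (z ≈ 4), while unwatermarked text stays near zero.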
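A matching sketch of the EXP idea, again on a toy vocabulary (the SHA-256-based pseudo-random function and the helper names are assumptions made for this sketch): instead of biasing logits, the sampler picks argmax of u_i^(1/p_i), where the uniforms u_i come from a keyed PRF, so the choice follows the model distribution yet is verifiable with the key.

```python
import hashlib
import numpy as np

def prf_uniforms(prev_token: int, vocab_size: int, key: int) -> np.ndarray:
    """Toy PRF: a vector of uniforms in [0, 1), fully determined
    by the previous token and a secret key."""
    seed = int(hashlib.sha256(f"{key}:{prev_token}".encode()).hexdigest(), 16) % 2**32
    return np.random.default_rng(seed).random(vocab_size)

def exp_sample(probs: np.ndarray, prev_token: int, key: int) -> int:
    """EXP generation step: argmax_i u_i**(1/p_i). The chosen token
    still follows the model distribution p, but the choice is pinned
    down by the key, which is what detection later exploits."""
    u = prf_uniforms(prev_token, len(probs), key)
    return int(np.argmax(np.log(u) / probs))  # maximizing u**(1/p)

def exp_toy_generate(n, vocab_size=100, key=7):
    """Generate from a toy 'model' with a flat distribution."""
    probs = np.full(vocab_size, 1.0 / vocab_size)
    tokens = [0]
    for _ in range(n):
        tokens.append(exp_sample(probs, tokens[-1], key))
    return tokens

def exp_detect(tokens, vocab_size=100, key=7):
    """Mean of -ln(1 - u) over the chosen tokens: about 1.0 for
    unrelated text (u is then just Uniform(0,1)), much larger for
    text generated with the same key."""
    scores = [-np.log(1.0 - prf_uniforms(prev, vocab_size, key)[cur])
              for prev, cur in zip(tokens, tokens[1:])]
    return float(np.mean(scores))
```

Note the trade-off this sketch exposes: EXP leaves the output distribution untouched but makes generation deterministic given the key and context, whereas KGW keeps sampling stochastic but shifts the distribution.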
Key Takeaways:
- Watermarking is a promising tool for provenance.
- Developers can implement and test watermarking fully in Python.
- Understanding these methods helps build more transparent and trustworthy AI systems.
This talk is for people who:
- Care about ethics and privacy in AI and want to understand what watermarking can (and cannot) solve.
- Build applications using LLMs and want mechanisms for verifying or auditing generated text.
- Are ML researchers or hobbyists interested in how watermarking algorithms function at a technical level.
- Work in AI safety, trust & transparency, or responsible AI and need practical tools for content provenance.
Note: No prior experience with LLM architecture is required; basic familiarity with probability is recommended, but no advanced math is needed.
Subhosri Basu
I am a GenAI researcher at the Fraunhofer Institute, Germany. Born in India, I decided to move to Germany in search of new challenges. My professional journey has been shaped by a passion for solving problems across domains. Academically, I hold a Master's degree from the department of electrical and computer science, with a focus on statistics. I have worked on projects in artificial intelligence and deep learning, especially in signal processing and imaging. With my experience, I want to guide the growth of the next generation of ML researchers. When I am not working, you will find me exploring Europe.