In this talk we share our experience from a project implemented in Q3 2025. We start with the motivation for the project, the stakeholders involved, and their needs. We then define the criteria for a successful data quality monitoring solution and share findings from our evaluation of existing frameworks, including why popular frameworks such as Great Expectations and SODA did not meet our requirements.
Next, we demonstrate our implementation based on DQX, a lightweight, open-source Python library designed for traceable, row-level data quality checks before and after data is persisted. DQX, developed and maintained by Databricks Labs, lets developers concentrate on the core implementation while giving business users YAML files for maintaining business rules. Furthermore, DQX's seamless integration with PySpark enables efficient and cost-effective quality monitoring within our IoT data lake.
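To make the idea concrete, here is a minimal, self-contained sketch of the pattern described above: business rules maintained declaratively (as a YAML file would deserialize to) and applied as row-level checks that annotate failing records rather than drop them. This is an illustration of the pattern only, not the actual DQX API; all rule names, fields, and the `_dq_failed` column are hypothetical.

```python
# Illustrative sketch of declarative, row-level data quality checks.
# NOTE: this mimics the pattern DQX enables; it is NOT DQX's real API.

# What a business-maintained YAML rules file might deserialize to:
rules = [
    {"name": "temp_in_range", "column": "temperature",
     "check": "in_range", "args": {"min": -40, "max": 85}},
    {"name": "device_id_set", "column": "device_id",
     "check": "not_null", "args": {}},
]

# Check implementations live in code; rules reference them by name.
CHECKS = {
    "not_null": lambda value, args: value is not None,
    "in_range": lambda value, args: (
        value is not None and args["min"] <= value <= args["max"]
    ),
}

def apply_rules(row, rules):
    """Annotate a row with the names of failed checks (row-level traceability)."""
    failed = [r["name"] for r in rules
              if not CHECKS[r["check"]](row.get(r["column"]), r["args"])]
    return {**row, "_dq_failed": failed}

# Example IoT readings: one valid, one failing both checks.
readings = [
    {"device_id": "sensor-1", "temperature": 21.5},
    {"device_id": None, "temperature": 120.0},
]
annotated = [apply_rules(r, rules) for r in readings]
```

Keeping failed rows annotated instead of silently filtered is what makes the checks traceable: downstream consumers can quarantine, alert on, or audit the flagged records.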
Finally, we move beyond the code to the organisational reality. We discuss how we embedded the Data Quality Monitor into the organisation and share our opinion on the hard questions: Who is responsible for maintaining the rules? Who monitors the results?
Talk outline
Motivation for the project
Framework evaluation
Evaluation criteria for a successful data quality monitoring solution
Comparison of available frameworks
Our implementation with DQX
How to use built-in data quality checks
How to add custom data quality checks
Automated rule generation with DQX Profiler
Output and visualisation options
Python project structure
Embedding in the organisation
Rule maintenance
How to communicate data quality issues
Summary
Key takeaways
Understanding of the most important criteria when choosing a data quality monitoring framework, from the perspective of a data engineer and an architect
Understanding of the DQX framework
Ideas for integrating data quality monitoring into organisations