As a data science team working on forecasting, we rarely rely on a single model to produce optimal results. Instead, we run composite pipelines: short- and long-term models followed by ensemble strategies, granularity reconciliation, and post-processing. We found that optimizing the parameters of one model in isolation often degraded the performance of the final ensemble or the post-processed output. We needed to treat the entire pipeline as the function to optimize.
This talk details how we implemented a "Pipeline-as-a-Trial" architecture built on Ray and cloud infrastructure (SageMaker + Databricks + a custom solution).
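To make the "pipeline as the function to optimize" idea concrete, here is a minimal, self-contained sketch. All names, the synthetic data, and the two toy forecasters are illustrative assumptions, and plain random search stands in for the tuner; in a Ray setup, `pipeline_score` would become the trainable handed to Ray Tune. The point is that the search space spans model parameters *and* the ensemble weight jointly, and the score is computed on the pipeline's final output rather than any single model's loss.

```python
import random

# Synthetic daily series: trend + noise (stands in for real demand data).
random.seed(0)
series = [10 + 0.5 * t + random.uniform(-2, 2) for t in range(60)]
train, holdout = series[:48], series[48:]

def short_term_forecast(history, window):
    """Toy short-horizon model: moving average of the last `window` points."""
    return sum(history[-window:]) / window

def long_term_forecast(history, trend_lookback):
    """Toy long-horizon model: extrapolate the average step over `trend_lookback` points."""
    step = (history[-1] - history[-trend_lookback]) / (trend_lookback - 1)
    return history[-1] + step

def pipeline_score(config):
    """Run the WHOLE pipeline (both models + ensemble) and score its final output.
    This is the function the tuner sees -- not any single model's loss."""
    history = list(train)
    abs_errors = []
    for actual in holdout:
        short = short_term_forecast(history, config["window"])
        long_ = long_term_forecast(history, config["trend_lookback"])
        # Ensemble step: the blend weight is tuned jointly with the model params.
        pred = config["alpha"] * short + (1 - config["alpha"]) * long_
        abs_errors.append(abs(pred - actual))
        history.append(actual)  # walk-forward evaluation
    return sum(abs_errors) / len(abs_errors)  # MAE of the pipeline output

# Stand-in for the tuner: random search over the JOINT parameter space.
rng = random.Random(42)
best_config, best_mae = None, float("inf")
for _ in range(50):
    config = {
        "window": rng.randint(2, 12),
        "trend_lookback": rng.randint(3, 24),
        "alpha": rng.random(),
    }
    mae = pipeline_score(config)
    if mae < best_mae:
        best_config, best_mae = config, mae

print(best_config, round(best_mae, 3))
```

Tuning per-model losses separately would miss configurations where a weaker individual model blends into a stronger ensemble, which is exactly the failure mode described above.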
The solution architecture consists of two pieces:
Operational Challenges: We will take a deep dive into the trade-offs and hurdles of this implementation: