Exploring real-world challenges in MLOps implementation: a case study approach to design effective data pipelines
Abstract
With the growing importance of machine learning (ML) systems across industries, Machine Learning Operations (MLOps) has become a focal point for streamlining the lifecycle of ML models. Central to MLOps is the effective management of data, a core component of machine learning models. This is the role of the data pipeline: a series of automated processes that move and transform data for analysis by machine learning systems. Keeping pace with this rapidly evolving field requires an understanding of the challenges involved in building efficient and reliable data pipelines. This study investigates these challenges from both academic and industry perspectives. It starts with a systematic literature review to identify academic viewpoints, followed by a multiple case study that examines several industry projects through interviews with practitioners. This approach seeks to validate the academic findings in real-world contexts and to identify gaps in the existing research. The results show considerable alignment between the challenges documented in academic literature and those faced by industry professionals in their projects. However, the significance attributed to these challenges varied between the two sectors, and insights from industry practitioners highlighted challenges that academic research had overlooked. Based on these insights, the study further explored version management, a significant challenge that the case studies showed to be poorly understood and inadequately addressed. This led to the development of a solution that demonstrates practical applicability in mitigating this challenge and adaptability to real-world settings.
In conclusion, this study presents a detailed analysis of the challenges affecting the MLOps data pipeline lifecycle, highlighting where academic and real-world perspectives agree and where they differ. The findings can help shape more effective data pipelines that meet real-world needs. Additionally, the study presents a solution to the version management challenge and examines its advantages and limitations. It also reflects on the practical implications of this solution and proposes future directions for the field.