With the massive growth of machine learning (ML) powered services, the term MLOps has become a staple of the conversation, and with good reason. MLOps is short for "machine learning operations" and refers to a wide range of tools, workflows, and best practices for deploying and maintaining ML models in production reliably and efficiently. This practice is central to production-ready models: it enables rapid deployment, facilitates experiments that improve performance, and guards against model bias and degraded prediction quality. Without it, ML at scale becomes impossible.
With any emerging practice, it’s easy to get confused about what it actually entails. To help, we’ve listed seven common MLOps myths to avoid, so you can get on the right track to successfully deploying ML at scale.
Myth #1: MLOps ends at launch
Reality: Deploying an ML model is just one step in an ongoing process.
ML is an inherently experimental practice. Even after the first launch, new hypotheses need to be tested while signals and parameters are fine-tuned, so that the model improves in accuracy and performance over time. MLOps helps engineers manage this experimentation effectively.
For example, a core component of MLOps is model versioning. Versioning lets teams track key metrics across model variants to ensure the optimal one is chosen, while allowing for an easy rollback in the event of an error.
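The idea can be sketched with a toy in-memory registry. This is only an illustration of the pattern, not a real tool: the `ModelRegistry` class, its method names, and the metric values are all hypothetical (production teams would typically reach for a dedicated model registry such as the one in MLflow).

```python
class ModelRegistry:
    """Toy in-memory registry: track metrics per model version, promote
    the best one, and roll back instantly if a deployment misbehaves."""

    def __init__(self):
        self.versions = {}      # version name -> evaluation metrics
        self.production = None  # version currently serving traffic

    def register(self, version, metrics):
        self.versions[version] = metrics

    def promote_best(self, metric="accuracy"):
        # Choose the variant with the best value of the chosen metric.
        best = max(self.versions, key=lambda v: self.versions[v][metric])
        self.production = best
        return best

    def rollback(self, version):
        if version not in self.versions:
            raise ValueError("can only roll back to a tracked version")
        self.production = version

registry = ModelRegistry()
registry.register("v1", {"accuracy": 0.91})
registry.register("v2", {"accuracy": 0.94})
registry.register("v3", {"accuracy": 0.89})
print(registry.promote_best())  # v2: best accuracy among the tracked variants
registry.rollback("v1")         # instant reversal if v2 misbehaves in production
```

Because every variant stays registered with its metrics, choosing the best model and reverting a bad deployment are both one-line operations rather than emergency retraining jobs.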
It is also important to monitor model performance over time due to the risk of data drift. Data drift occurs when the data a model sees in production differs dramatically from the data it was originally trained on, resulting in poor-quality predictions. For example, many ML models trained on consumer behavior before the COVID-19 pandemic suffered greatly in quality after lockdowns changed the way we live. MLOps addresses these scenarios with strong monitoring practices and infrastructure that can adapt quickly when a major change occurs. It goes far beyond the launch of a model.
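A common way to monitor for drift is to compare the distribution of a feature in production against the training distribution. The minimal sketch below implements the two-sample Kolmogorov-Smirnov statistic by hand on synthetic data; the feature values, the shift, and the 0.1 alert threshold are all illustrative assumptions (in practice you might use `scipy.stats.ks_2samp` or a dedicated monitoring tool, and tune thresholds per feature).

```python
import random
from bisect import bisect_right

def ks_statistic(reference, live):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap
    between the empirical CDFs of the two samples."""
    ref, lv = sorted(reference), sorted(live)
    points = ref + lv
    return max(
        abs(bisect_right(ref, x) / len(ref) - bisect_right(lv, x) / len(lv))
        for x in points
    )

random.seed(0)
train = [random.gauss(0.0, 1.0) for _ in range(2000)]         # training-time feature values
prod_ok = [random.gauss(0.0, 1.0) for _ in range(2000)]       # production data, no drift
prod_shifted = [random.gauss(0.8, 1.2) for _ in range(2000)]  # behavior shifted after launch

DRIFT_THRESHOLD = 0.1  # hypothetical alert level; larger statistic = larger drift
print(ks_statistic(train, prod_ok) > DRIFT_THRESHOLD)       # no alert
print(ks_statistic(train, prod_shifted) > DRIFT_THRESHOLD)  # alert: retrain or investigate
```

Running such a check on a schedule for each important feature is one concrete way monitoring can catch a pandemic-scale behavior shift before prediction quality silently collapses.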
Myth #2: MLOps is the same as model development
Reality: MLOps is the bridge between model development and the successful use of ML in production.
The process used to develop a model in a test environment is usually not the one that makes it successful in production. Running models in production requires robust data pipelines for sourcing, processing, and training, often spanning much larger datasets than those used in development.
Data storage and compute usually have to move to distributed environments to handle the increased load. Much of this process must be automated to ensure reliable deployments and rapid iteration at scale. Monitoring also needs to be far more robust: production environments see data beyond what was available in testing, so the potential for the unexpected is far greater. MLOps encompasses all of these practices for taking a model from development to launch.
Myth #3: MLOps is the same as DevOps
Reality: MLOps has similar goals to DevOps, but its implementation differs in several ways.
While both MLOps and DevOps strive to make delivery scalable and efficient, achieving this goal for ML systems requires a new set of practices. MLOps places a greater emphasis on experimentation than DevOps does. Unlike standard software deployments, ML models are often deployed with many variants at once, so model monitoring is needed to compare them and choose an optimal version. With each redeployment, it is not enough to just ship the code: the models must also be retrained with each change. This differs from standard DevOps deployments in that the pipeline must now include a retraining and validation phase.
For many of DevOps’ common practices, MLOps extends the scope to meet its specific needs. Continuous integration for MLOps goes beyond just testing code; it also includes data quality checks and model validation. Continuous deployment is no longer just shipping a software package; it now includes a pipeline for updating or rolling back models.
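What a CI gate for an ML pipeline might look like can be sketched in a few lines. Everything here is hypothetical: the field names, the age bounds, and the 0.90 accuracy baseline are stand-in examples of the kinds of checks a team would define for its own data and models.

```python
def check_data_quality(rows):
    """Basic data-quality gates: labels valid, features within expected ranges."""
    for r in rows:
        assert r["label"] in (0, 1), f"unexpected label: {r['label']}"
        assert 0.0 <= r["age"] <= 120.0, f"age out of range: {r['age']}"

def check_model_quality(accuracy, baseline=0.90):
    """Block deployment if the retrained candidate underperforms the baseline."""
    assert accuracy >= baseline, f"accuracy {accuracy:.2f} below baseline {baseline:.2f}"

# Hypothetical CI run: validate the training batch, then the retrained model.
batch = [{"label": 1, "age": 34.0}, {"label": 0, "age": 58.0}]
check_data_quality(batch)
check_model_quality(accuracy=0.93)
print("all gates passed; candidate model can proceed to deployment")
```

If either gate raises, the pipeline stops before the model reaches production, which is exactly the extra validation phase that distinguishes ML continuous integration from code-only CI.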
Myth #4: Fixing a bug is just changing lines of code
Reality: Fixing ML model errors in production requires advance planning and multiple fallbacks.
If a new deployment results in degraded performance or some other error, MLOps teams need a number of options at hand to resolve the issue. Simply reverting to the previous code is often not enough, since models need to be retrained before deployment. Instead, teams should keep multiple versions of models to ensure that a production-ready version is always available in the event of a failure.
Additionally, in scenarios where data is lost or the production data distribution shifts significantly, teams need simple fallback heuristics in place so the system can maintain at least some level of performance. All of this requires significant upfront planning, which is a core aspect of MLOps.
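One simple shape for such a fallback is a wrapper that serves the primary model but degrades to a heuristic when the model fails. This is a minimal sketch under assumed conditions: the majority-class heuristic, the `score` feature, and the failure mode are all illustrative.

```python
class FallbackPredictor:
    """Serve the primary model; fall back to a simple heuristic
    (here: the training-set majority class) if the model errors out."""

    def __init__(self, model, majority_class=0):
        self.model = model
        self.majority_class = majority_class

    def predict(self, features):
        try:
            return self.model(features)
        except Exception:
            # Degraded but predictable behavior instead of an outage.
            return self.majority_class

def healthy_model(features):
    return 1 if features["score"] > 0.5 else 0

def broken_model(features):
    raise RuntimeError("model artifact failed to load")

print(FallbackPredictor(healthy_model).predict({"score": 0.9}))  # from the model
print(FallbackPredictor(broken_model).predict({"score": 0.9}))   # from the fallback
```

The heuristic will make worse predictions than the model, but it keeps the service answering while the team repairs or rolls back the real model, which is the point of planning fallbacks in advance.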
Myth #5: Governance is completely separate from MLOps
Reality: While governance has distinct goals from MLOps, much of MLOps tooling can help support governance goals.
Model governance manages regulatory compliance and the risks associated with using ML systems. This includes adhering to appropriate privacy policies for users and avoiding bias or discriminatory results in model predictions. MLOps is typically viewed as ensuring that models perform well, but that is a narrow view of what it can deliver.
Tracking and monitoring models in production can be supplemented with analytics to improve model explainability and surface bias in the results. Transparency in model training and deployment pipelines can make it easier to meet data processing requirements. MLOps should be viewed as a practice that enables scalable ML for all business objectives, including performance, governance, and model risk management.
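One concrete bias check that can ride on existing monitoring is the demographic parity gap: how much positive-prediction rates differ across groups. The sketch below is illustrative only; the group names, logged predictions, and any tolerance you would compare the gap against are hypothetical (libraries such as Fairlearn offer production-grade versions of such metrics).

```python
def positive_rate(preds):
    """Fraction of predictions that are positive (1)."""
    return sum(preds) / len(preds)

def demographic_parity_gap(preds_by_group):
    """Difference between the highest and lowest positive-prediction
    rates across groups; 0 means all groups are treated alike."""
    rates = [positive_rate(p) for p in preds_by_group.values()]
    return max(rates) - min(rates)

# Hypothetical logged predictions, grouped by a protected attribute.
preds = {
    "group_a": [1, 1, 0, 1, 0, 1, 1, 0],  # 62.5% positive
    "group_b": [0, 1, 0, 0, 0, 1, 0, 0],  # 25.0% positive
}
gap = demographic_parity_gap(preds)
print(f"demographic parity gap: {gap:.3f}")  # flag for review if above tolerance
```

Because the inputs are just the predictions the monitoring stack already logs, a check like this can run alongside performance metrics, letting the same MLOps infrastructure serve governance goals.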
Myth #6: ML systems can be managed in silos
Reality: Successful MLOps requires collaborative teams with hybrid skills.
Deploying ML models involves many roles, including data scientists, data engineers, ML engineers, and DevOps engineers. Without collaboration and an understanding of each other’s work, ML systems can become unwieldy at scale.
For example, a data scientist can develop models without much external visibility or input, which can then lead to deployment challenges due to performance and scaling issues. A development team without insight into key ML practices may not develop the appropriate tracking to enable iterative model testing.
For this reason, it is generally important that all team members have a thorough understanding of the model development pipeline and ML practices – collaborating from day one.
Myth #7: Managing ML systems is risky and unsustainable
Reality: Any team can leverage ML at scale with the right tools and practices.
Since MLOps is still a growing field, it can seem dauntingly complex. However, the ecosystem is maturing quickly, and there are plenty of resources and tools available to help teams succeed at every step of the MLOps lifecycle.
With the right processes in place, you can unlock the full potential of ML at scale.
Krishnaram Kenthapadi is the lead scientist at Fiddler AI.