MLOps: Your Guide To Sustainable Solutions At Scale
The demand for AI continues to increase across private and public organizations and sectors. Bloomberg Government shows the projected growth of the AI market in the federal sector to be $4.3 billion by 2023.
Additionally, in January, the U.S. Congress adopted the National Defense Authorization Act of 2021 (NDAA), which includes creating a National Artificial Intelligence Initiative Office. The office's mission is to:
- Promote U.S. leadership in AI research and development
- Lead the world in the development and use of trustworthy AI in the public and private sectors
- Prepare the present and future U.S. workforce for the integration of AI systems across all sectors of the economy and society
So, with the increasing demand for ML and AI solutions, what is MLOps and why is it important?
What is MLOps?
MLOps is a series of practices and tools that introduce repeatability and automation into the processes that support AI solution training, testing, deployment, monitoring, and governance. The goal is to remove bottlenecks in creating, training, validating, deploying, and monitoring an AI model.
AI projects involve a lot of exploration and experimentation because the problems AI is asked to solve are often not clearly defined. To approach the problem systematically and efficiently, the AI model implementation needs to be fast, reliable, and repeatable, which means automating tests and deployment pipelines. Additionally, the project team needs to standardize the infrastructure all the way from development into production and build in solid security throughout.
Monitoring the AI model to validate that it continues to work as expected is an essential element of MLOps. Monitoring is vital because ML algorithms adapt in response to new data and experiences to improve without human direction. We need to monitor the model outputs to detect and correct for model drift over time.
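One way to make drift detection concrete is to compare the distribution of recent model scores against a baseline sample. The sketch below uses the Population Stability Index (PSI), which is our choice for illustration and not a method the article prescribes. The function name, the bin count, and the 0.2 alert level (a commonly cited rule of thumb) are all assumptions.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
) -> float:
    """Compare two samples of model scores using equal-width bins."""
    low = min(min(expected), min(actual))
    high = max(max(expected), max(actual))
    # Fall back to a width of 1.0 when every value is identical.
    width = (high - low) / bins or 1.0

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for value in values:
            index = min(int((value - low) / width), bins - 1)
            counts[index] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(count / len(values), 1e-6) for count in counts]

    p_expected = proportions(expected)
    p_actual = proportions(actual)
    return sum(
        (a - e) * math.log(a / e) for e, a in zip(p_expected, p_actual)
    )


# Hypothetical alert level; tune it for your own model and data.
DRIFT_ALERT_LEVEL = 0.2

baseline_scores = [i / 100 for i in range(100)]
recent_scores = [0.5 + i / 200 for i in range(100)]
if population_stability_index(baseline_scores, recent_scores) > DRIFT_ALERT_LEVEL:
    print("Possible model drift: review recent inputs and outputs")
```

A PSI of zero means the two samples fall into the bins in identical proportions; larger values mean the recent scores have moved away from the baseline.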
Some of the benefits you can expect to gain by introducing MLOps into your organization are:
- Consistently deliver the best recommendations through repeatable, consistent, and well-monitored processes for training, deployment, and production release of your ML models.
- Enjoy more opportunities for reuse through well-documented learning and hypothesis testing of ML models and their evolution.
- Always perform safely and securely through repeatable validation and testing procedures.
- Continually improve through rapid feedback loops employing automated tooling and reliable infrastructure.
- Meet your business needs through effective metrics that track model accuracy, bias, and production readiness.
Ultimately, MLOps can reduce bottlenecks and bring machine learning workflows into production.
6 Key Aspects of an MLOps Approach:
The best way to get started with MLOps is to recognize that it covers many different aspects of the lifecycle for ML solutions. Addressing these six areas will build a foundation for an effective MLOps approach. By making progress in each area, you will start to see benefits, even if you do not fully automate all your processes right away.
Six key aspects essential to MLOps:
- Understanding business needs
- Data governance and ingestion
- Model development
- Model operationalization
- Model monitoring
- Model security
Understanding the Business Need
Perhaps the most important aspect of MLOps is ensuring that any AI solution solves a clear business problem. AI projects are challenging and don’t follow a linear, predictable path. There are many unknowns, and this means there is often a lot of exploration involved. Because of this, it’s very important that we approach this work in an agile manner with rapid feedback loops in place to apply learning. This approach ensures you are building the right thing and building the thing right.
Understanding the business problems and goals will guide the development approach and influence the design of the model. At the beginning of an AI project, ideas are generated through discussions with users and stakeholders and documented using templates like the Lean Canvas. Teams also capture the associated data sources and possible success metrics to help prioritize the ideas worth moving forward. This initial prioritization of scenario ideas is timeboxed.
Data Ingestion and Governance
ML solutions are only as good as their underlying data. An MLOps approach puts processes in place to ensure data is effectively managed and governed so that your models remain as accurate as possible.
Data ingestion pipelines must onboard all data sources reliably to feed the model. The processes put in place need to apply data quality business rules, security, and access policies. Wherever you’ve established data governance policies and quality metrics, your team should apply them automatically as much as possible.
One such process uses automated thresholds to trigger alerts or stop job processing when a metric crosses its defined limit. For example, you could set a minimum share of populated values for key data elements to ensure completeness, or a maximum share of unexpected values for a data element to ensure the data meets quality standards. Data volume thresholds for each source are also recommended, so that unexpected peaks or drops in volume raise an alert for further analysis.
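The threshold checks described above could look something like the following sketch. The `QualityThresholds` class, the `check_batch` function, the field name, and the specific limits are hypothetical and exist only to illustrate completeness, unexpected-value, and volume checks on one batch of records.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Set


@dataclass
class QualityThresholds:
    min_populated_ratio: float   # completeness floor for a key field
    max_unexpected_ratio: float  # ceiling for values outside the allowed set
    min_rows: int                # volume floor for the batch
    max_rows: int                # volume ceiling for the batch


def check_batch(
    rows: List[Dict[str, Any]],
    field: str,
    allowed_values: Set[Any],
    thresholds: QualityThresholds,
) -> List[str]:
    """Return a list of violations; an empty list means the batch passes."""
    violations = []
    total = len(rows)
    if not thresholds.min_rows <= total <= thresholds.max_rows:
        violations.append(
            f"volume {total} outside "
            f"[{thresholds.min_rows}, {thresholds.max_rows}]"
        )
    if total == 0:
        return violations

    populated = [r[field] for r in rows if r.get(field) not in (None, "")]
    populated_ratio = len(populated) / total
    if populated_ratio < thresholds.min_populated_ratio:
        violations.append(f"{field} populated ratio {populated_ratio:.2f} too low")

    unexpected = sum(1 for value in populated if value not in allowed_values)
    unexpected_ratio = unexpected / total
    if unexpected_ratio > thresholds.max_unexpected_ratio:
        violations.append(f"{field} unexpected ratio {unexpected_ratio:.2f} too high")
    return violations


limits = QualityThresholds(
    min_populated_ratio=0.9, max_unexpected_ratio=0.1, min_rows=1, max_rows=100
)
batch = [{"status": "active"}, {"status": "inactive"}, {"status": None}, {"status": "???"}]
problems = check_batch(batch, "status", {"active", "inactive"}, limits)
if problems:
    # A real pipeline might raise an alert or stop the job here.
    print("Data quality check failed:", problems)
```

Returning the list of violations, instead of raising on the first one, lets the caller decide whether a given problem should only alert or should stop the job.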
Model Development
Once the data has been ingested, aggregated, and preprocessed, a model is ready to be trained. For a model to maximize its eventual business impact, it has to perform well in solving the business problem. Additionally, it needs to align with the business metrics in terms of model accuracy, scalability, reproducibility, and availability.
Projects must control model parameters and manage model training effectively. Proper training environments should be:
- Repeatable – by controlling dependencies and randomness
- Scalable – able to handle larger training datasets and parallelized pipelines
- Version Controlled – using containerization to capture all changes
- Lightweight – to reduce expensive compute overhead
- Hardware Aware – models need to be trained with the hardware they will run on in mind
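As a small illustration of the repeatability point in the list above, a training run can fix its random seed and record the settings it ran with. This is a minimal sketch using only the Python standard library. The `start_reproducible_run` function and the manifest fields are hypothetical, and libraries such as NumPy or a deep learning framework would need their own seeding calls in addition to this one.

```python
import hashlib
import json
import platform
import random


def start_reproducible_run(params: dict, seed: int) -> dict:
    """Seed the stdlib random generator and return a manifest for the run."""
    random.seed(seed)
    manifest = {
        "params": params,
        "seed": seed,
        "python_version": platform.python_version(),
    }
    # A stable fingerprint makes it easy to see whether two runs
    # used the same parameters, seed, and interpreter version.
    encoded = json.dumps(manifest, sort_keys=True).encode("utf-8")
    manifest["fingerprint"] = hashlib.sha256(encoded).hexdigest()[:12]
    return manifest


manifest = start_reproducible_run({"learning_rate": 0.01, "epochs": 5}, seed=42)
print(json.dumps(manifest, indent=2))
```

Storing the manifest next to the trained model gives a later reader what they need to rerun the experiment under the same conditions.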
When we’re ready to build a model, we adopt the approach of building in small increments and regularly validating for technical correctness and added value. Sometimes, we may find that we actually can’t get this to work as we want it to. For example, maybe a model with sufficient accuracy just doesn’t run fast enough, or the data doesn’t support the kind of predictive inference that our users would get value from. In that case, validation serves as a systematic way to cut our losses and pivot to a new solution approach.
Model Operationalization
MLOps introduces repeatable processes for taking ML models from development into production, so that when a new, improved model is ready, you can take full advantage of it and be confident that it was trained and deployed correctly.
Depending on your use case, you can automate the ML production pipelines to retrain the models with new data. For example, a pipeline can retrain on demand, on a schedule, when new training data becomes available, or when model performance degrades significantly. Deployment can occur on multiple platforms, including Cloud, Hybrid, Edge, and Mobile, depending on the model’s size, computational cost, data security and privacy restrictions, and immediacy of results.
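The four retraining triggers mentioned above can be expressed as one small decision function. The sketch below is illustrative only: `RetrainPolicy`, `retrain_reasons`, and the example limits are hypothetical names and values, not part of any particular MLOps tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class RetrainPolicy:
    max_model_age: timedelta  # schedule-based trigger
    min_new_samples: int      # new-data trigger
    min_accuracy: float       # performance-degradation trigger


def retrain_reasons(
    policy: RetrainPolicy,
    *,
    now: datetime,
    last_trained: datetime,
    new_samples: int,
    current_accuracy: float,
    requested: bool = False,
) -> List[str]:
    """Return every trigger that currently calls for retraining."""
    reasons = []
    if requested:
        reasons.append("on demand")
    if now - last_trained >= policy.max_model_age:
        reasons.append("schedule")
    if new_samples >= policy.min_new_samples:
        reasons.append("new training data")
    if current_accuracy < policy.min_accuracy:
        reasons.append("performance degradation")
    return reasons


policy = RetrainPolicy(
    max_model_age=timedelta(days=30), min_new_samples=10_000, min_accuracy=0.90
)
reasons = retrain_reasons(
    policy,
    now=datetime(2024, 3, 1),
    last_trained=datetime(2024, 2, 20),
    new_samples=500,
    current_accuracy=0.85,
)
if reasons:
    print("Retraining triggered by:", ", ".join(reasons))
```

Passing `now` in explicitly keeps the function easy to test and makes the schedule check deterministic.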
Model Monitoring
There are three main categories of model metrics that need monitoring as part of an MLOps approach:
- Performance metrics that monitor model speed, availability, and scaling.
- Accuracy metrics that monitor model accuracy, false positives, and false negatives.
- Model use metrics that track how often the model is used and how it is being used.
It’s important to automate model monitoring and use triggers to send warnings or take action based upon defined thresholds or metric changes.
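To show how accuracy metrics and thresholds fit together, the sketch below computes accuracy, false positive rate, and false negative rate for a binary classifier and compares each monitored metric against a limit. The function names, the `p95_latency_ms` metric, and the limits are assumptions chosen for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Each metric maps to an optional (minimum, maximum) pair.
Limits = Dict[str, Tuple[Optional[float], Optional[float]]]


def classification_rates(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Dict[str, float]:
    """Compute accuracy and error rates for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


def find_alerts(metrics: Dict[str, float], limits: Limits) -> List[str]:
    """Return one message per metric that is missing or outside its limits."""
    alerts = []
    for name, (minimum, maximum) in limits.items():
        value = metrics.get(name)
        if value is None:
            alerts.append(f"{name}: metric missing")
        elif minimum is not None and value < minimum:
            alerts.append(f"{name}: {value:.3f} below minimum {minimum}")
        elif maximum is not None and value > maximum:
            alerts.append(f"{name}: {value:.3f} above maximum {maximum}")
    return alerts


metrics = classification_rates([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0, 1, 0])
metrics["p95_latency_ms"] = 180.0  # hypothetical performance metric
limits: Limits = {
    "accuracy": (0.8, None),
    "false_positive_rate": (None, 0.3),
    "p95_latency_ms": (None, 250.0),
}
for alert in find_alerts(metrics, limits):
    print("ALERT", alert)
```

In a production setting the alert messages would typically feed a notification or incident system instead of being printed.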
Model Security
Many emerging threats can compromise or manipulate a machine learning system. Common AI model security risks include adversarial attacks, data poisoning, online model attacks, distributed denial of service (DDoS), transfer learning attacks, and data phishing.
Ensuring that the data being input into the model is free from bias, accurate, untampered with, and secure is paramount to receiving the correct output from the model. Additionally, while there are many benefits to building models on top of existing architectures (cost, effort, accuracy), doing so without proper security measures also exposes the newer models to attack. Models that take data input from online sources are also open to bias and misinformation and need close monitoring.
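One narrow piece of this, checking that a data file or model artifact has not been altered since it was approved, can be handled with a cryptographic checksum. The sketch below is a minimal example, assuming the expected SHA-256 digest was recorded at approval time and stored somewhere trusted. The function names and the file name are hypothetical. It addresses tampering only and does nothing for bias or accuracy.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large artifacts do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_sha256: str) -> bool:
    """Return True only if the file matches the recorded digest."""
    actual = sha256_of_file(path)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(actual, expected_sha256.lower())


artifact = Path("training_data.csv")  # hypothetical file name
recorded_digest = "0" * 64            # placeholder for the digest recorded at approval
if artifact.exists() and not verify_artifact(artifact, recorded_digest):
    print("Integrity check failed: do not train or deploy with this file")
```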
Learn How MLOps Can Work For Your Organization’s Goals
MLOps is a complex set of practices and tools that is nevertheless necessary for organizations and enterprises worldwide to grow and scale their AI investments.
Download our ebook, MLOps 101, to learn more about this critical development in the machine learning and AI sector.