MLOps Zoomcamp
I recently enrolled in the 2024 cohort of the MLOps Zoomcamp. On this page, I’ll be documenting my progress throughout the course.
You can find my solutions to the course homework in my GitHub repository.
Module 1 - Introduction
The first module of the course covered the following topics:
- Introduction to MLOps and why we need it.
- Environment setup using GitHub Codespaces or a Virtual Machine (VM) in AWS. Setting up Docker.
- Training a ride duration prediction model using the NYC taxi data.
- Course overview.
- MLOps maturity model.
The homework for this module consisted of training a linear regression model on the “Yellow Taxi Trip Records” from the NYC taxi dataset.
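As a rough illustration of that task, the sketch below derives the ride duration in minutes from the pickup and dropoff timestamps and fits a one-feature least-squares line to it. The toy records and the choice of `trip_distance` as the only feature are assumptions made for illustration; the column names follow the public Yellow Taxi schema, and the actual homework may use different features and a library such as scikit-learn.

```python
from datetime import datetime
from math import sqrt

# Hypothetical toy records; the real dataset has millions of rows per month.
trips = [
    {"tpep_pickup_datetime": "2023-01-01 00:10:00",
     "tpep_dropoff_datetime": "2023-01-01 00:20:00", "trip_distance": 2.0},
    {"tpep_pickup_datetime": "2023-01-01 01:00:00",
     "tpep_dropoff_datetime": "2023-01-01 01:20:00", "trip_distance": 4.0},
    {"tpep_pickup_datetime": "2023-01-01 02:00:00",
     "tpep_dropoff_datetime": "2023-01-01 02:30:00", "trip_distance": 6.0},
]


def duration_minutes(trip):
    """Target variable: dropoff time minus pickup time, in minutes."""
    fmt = "%Y-%m-%d %H:%M:%S"
    start = datetime.strptime(trip["tpep_pickup_datetime"], fmt)
    end = datetime.strptime(trip["tpep_dropoff_datetime"], fmt)
    return (end - start).total_seconds() / 60


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rmse(xs, ys, slope, intercept):
    """Root mean squared error of the fitted line on the given data."""
    errors = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return sqrt(sum(errors) / len(errors))


distances = [trip["trip_distance"] for trip in trips]
durations = [duration_minutes(trip) for trip in trips]
slope, intercept = fit_line(distances, durations)
error = rmse(distances, durations, slope, intercept)
```

On real data the RMSE would be computed on a held-out month rather than on the training rows.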
Module 2 - Experiment tracking and model management
The second module focused mainly on experiment tracking, model management and the model registry using MLflow. It also discussed which MLflow setup to use depending on the context (a single data scientist participating in an ML competition, a cross-functional team with one data scientist working on an ML model, or multiple data scientists working on multiple ML models), as well as MLflow's benefits, limitations and alternatives.
The homework for this module was mainly designed to build familiarity with MLflow.
Module 3 - Orchestration
This module focused on the orchestration of ML pipelines using Mage, namely on data preparation (ETL and feature engineering), training ML models, observability (monitoring and alerting), triggering (inference and retraining) and deploying (inference and retraining).
The homework consisted of creating a pipeline with Mage for the trip prediction use case using the NYC taxi dataset.
Module 4 - Deployment
The fourth module focused on model deployment. More specifically, three ways of deploying a model were discussed: online (web service and streaming) and offline (batch). For web-service deployment, an example using Flask, Docker and MLflow (for the model registry) was given. For streaming deployment, an example using AWS Kinesis and AWS Lambda functions was discussed. Finally, an example of batch deployment using Docker was given.
The homework for this module consisted of creating a batch deployment using Docker for the trip prediction use case.
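A batch deployment boils down to a script that scores a whole set of rides in one run and writes the predictions out. The sketch below shows that shape only; `predict_duration` is a hypothetical stand-in for a trained model, and the record fields and output columns are assumptions rather than the course's actual script.

```python
import csv
import io


def predict_duration(ride):
    """Hypothetical stand-in for a trained model loaded from disk."""
    return 5.0 * ride["trip_distance"] + 2.0


def run_batch(rides, output):
    """Score every ride in the batch and write one CSV row per prediction."""
    writer = csv.writer(output)
    writer.writerow(["ride_id", "predicted_duration"])
    for ride in rides:
        writer.writerow([ride["ride_id"], round(predict_duration(ride), 2)])


# Toy batch; a real job would read a month of trips from a file.
rides = [
    {"ride_id": "a", "trip_distance": 1.0},
    {"ride_id": "b", "trip_distance": 2.5},
]
buffer = io.StringIO()  # stands in for the output file
run_batch(rides, buffer)
print(buffer.getvalue())
```

Packaged in a Docker image, a script like this would typically be the container's entry point, so the same job can be run on a schedule for each new batch of data.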
Module 5 - Model Monitoring
This module focused on monitoring, namely on:
- Monitoring ML-based services
- Monitoring web services with Evidently and Grafana
- Monitoring batch jobs with Prefect, Postgres, and Evidently
- Data quality monitoring
The goal of this module’s homework was to get familiar with monitoring for ML batch services, using a PostgreSQL database to store metrics and Grafana to visualize them.
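The general pattern is to compute a few summary metrics per batch and append them to a table that a dashboard can query. The sketch below illustrates this under stated assumptions: it uses an in-memory SQLite database as a stand-in for PostgreSQL so that it runs anywhere, and the metric names, the `batch_metrics` table and the `fare_amount` column are illustrative choices, not the homework's exact setup.

```python
import sqlite3
from datetime import date
from statistics import median


def compute_metrics(batch):
    """Simple data quality metrics for one batch of rides."""
    fares = [row["fare_amount"] for row in batch]
    present = [fare for fare in fares if fare is not None]
    return {
        "share_missing": (len(fares) - len(present)) / len(fares),
        "median_fare": median(present),
    }


def store_metrics(conn, day, metrics):
    """Append one row of metrics for the given day."""
    conn.execute(
        "INSERT INTO batch_metrics (day, share_missing, median_fare) "
        "VALUES (?, ?, ?)",
        (day.isoformat(), metrics["share_missing"], metrics["median_fare"]),
    )
    conn.commit()


# SQLite stands in for PostgreSQL here (assumption made for portability).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE batch_metrics (day TEXT, share_missing REAL, median_fare REAL)"
)

batch = [
    {"fare_amount": 10.0},
    {"fare_amount": None},
    {"fare_amount": 14.0},
    {"fare_amount": 12.0},
]
metrics = compute_metrics(batch)
store_metrics(conn, date(2024, 3, 1), metrics)
```

A Grafana panel would then plot these columns over `day`, which makes a sudden jump in missing values or a shift in the median easy to spot.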
Module 6 - Best Practices
This module focused on best practices, namely on:
- Testing Python code with pytest
- Integration tests with docker-compose
- Testing cloud services with LocalStack
- Code quality: linting and formatting
- Git pre-commit hooks
- Makefiles and make
- Infrastructure as code (with Terraform)
- CI/CD
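To give a flavour of the first item, here is a minimal pytest-style unit test. Both `prepare_features` and the feature names are hypothetical, chosen to match the trip prediction use case; pytest collects any function whose name starts with `test_` and reports a failure when an `assert` does not hold.

```python
def prepare_features(ride):
    """Hypothetical feature builder: combine pickup and dropoff zones."""
    return {
        "PU_DO": f"{ride['PULocationID']}_{ride['DOLocationID']}",
        "trip_distance": ride["trip_distance"],
    }


def test_prepare_features():
    ride = {"PULocationID": 130, "DOLocationID": 205, "trip_distance": 3.66}

    actual = prepare_features(ride)

    expected = {"PU_DO": "130_205", "trip_distance": 3.66}
    assert actual == expected
```

Saved in a file such as `test_features.py` (name assumed), this would be run with the `pytest` command from the project directory.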