I recently enrolled in the MLOps zoomcamp for the 2024 cohort. On this page, I’ll be documenting my progress and developments throughout the course.

You can find my solutions to the course homework in my GitHub repository.


Module 1 - Introduction

The first module of the course covered the following topics:

  • Introduction to MLOps and why we need it.
  • Environment setup using GitHub Codespaces or a Virtual Machine (VM) on AWS, and setting up Docker.
  • Training a ride duration prediction model using the NYC taxi data.
  • Course overview.
  • MLOps maturity model.

The homework for this module consisted of training a linear regression model on the “Yellow Taxi Trip Records” from the NYC taxi dataset.
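The core of that homework pipeline can be sketched in a few lines of scikit-learn. The trip records and values below are made up for illustration; the real homework reads the Yellow Taxi parquet files and one-hot encodes the pickup/dropoff location IDs in the same way:

```python
import numpy as np
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Toy trip records standing in for the Yellow Taxi data (values are made up)
trips = [
    {"PULocationID": "43", "DOLocationID": "151", "trip_distance": 1.1},
    {"PULocationID": "166", "DOLocationID": "239", "trip_distance": 2.6},
    {"PULocationID": "41", "DOLocationID": "42", "trip_distance": 0.7},
]
durations = [7.0, 14.5, 4.2]  # target: trip duration in minutes

# One-hot encode the categorical location IDs, keep trip_distance numeric
dv = DictVectorizer()
X = dv.fit_transform(trips)

lr = LinearRegression()
lr.fit(X, durations)

preds = lr.predict(X)
rmse = float(np.sqrt(mean_squared_error(durations, preds)))
```

Evaluating RMSE on the training data, as done here, only checks the fit; the homework evaluates on a separate validation month.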


Module 2 - Experiment tracking and model management

The second module focused mainly on experiment tracking, model management and the model registry using MLflow. It also discussed which MLflow setup to use depending on the context (a single data scientist taking part in an ML competition, a cross-functional team with one data scientist working on an ML model, or multiple data scientists working on multiple ML models), as well as MLflow’s benefits, limitations and alternatives.

The homework for this module was mainly aimed at getting familiar with MLflow.


Module 3 - Orchestration

This module focused on the orchestration of ML pipelines using Mage, namely on data preparation (ETL and feature engineering), training ML models, observability (monitoring and alerting), triggering (inference and retraining), and deployment.

The homework consisted of creating a pipeline with Mage for the trip duration prediction use case using the NYC taxi dataset.
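Mage structures a pipeline as discrete blocks (data loader, transformer, data exporter). Stripped of the Mage decorators and UI, the shape of such a pipeline is roughly the following; all names and values here are illustrative stand-ins, not the actual homework code:

```python
# Plain-Python stand-ins for Mage pipeline blocks; in Mage each function
# would live in its own block (e.g. decorated with @data_loader,
# @transformer, @data_exporter) and be wired together in the UI.

def load_data():
    # data loader block: read raw trip records (hard-coded toy rows here)
    return [
        {"PULocationID": 43, "DOLocationID": 151, "duration": 7.0},
        {"PULocationID": 166, "DOLocationID": 239, "duration": 14.5},
    ]

def transform(rows):
    # transformer block: simple feature engineering
    for row in rows:
        row["PU_DO"] = f"{row['PULocationID']}_{row['DOLocationID']}"
    return rows

def export(rows):
    # data exporter block: here it just returns the row count;
    # the real pipeline would train and register a model instead
    return len(rows)

# Mage executes the blocks from the pipeline definition;
# running them by hand looks like this:
result = export(transform(load_data()))
```

The point of the orchestrator is that each block is versioned, testable and schedulable on its own, rather than living in one monolithic script.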


Module 4 - Deployment

The fourth module focused on model deployment. More specifically, three ways of deploying a model were discussed: online (web service and streaming) and offline (batch). For web-service deployment, an example using Flask, Docker and MLflow (for the model registry) was given. For streaming deployment, an example using AWS Kinesis and AWS Lambda functions was discussed. Finally, an example of batch deployment using Docker was given.
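The web-service variant boils down to wrapping the model behind an HTTP endpoint. Here is a minimal Flask sketch; the `predict` function is a hypothetical stand-in for a model loaded from the MLflow registry, and the returned value is made up:

```python
from flask import Flask, jsonify, request

app = Flask("duration-prediction")

def predict(features):
    # Stand-in for a model pulled from the MLflow registry;
    # a real service would call model.predict(features) here.
    return 9.96

@app.route("/predict", methods=["POST"])
def predict_endpoint():
    ride = request.get_json()
    pred = predict(ride)
    return jsonify({"duration": pred})
```

Locally this can be started with `app.run(port=9696)` and queried with a JSON POST; packaged in Docker, the same app would typically be served by a production WSGI server such as gunicorn.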

The homework for this module consisted of creating a batch deployment using Docker for the trip duration prediction use case.


Module 5 - Model Monitoring

This module focused on monitoring, namely on:

  • Monitoring ML-based services
  • Monitoring web services with Evidently and Grafana
  • Monitoring batch jobs with Prefect, Postgres, and Evidently
  • Data quality monitoring

The goal of this module’s homework was to get familiar with monitoring for ML batch services, using a PostgreSQL database to store metrics and Grafana to visualize them.
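The idea behind a drift metric is simple enough to show with a hand-rolled stand-in (Evidently computes much richer statistics, but the shape is the same). The reference and current values and the threshold below are made up:

```python
from statistics import fmean

# Hand-rolled stand-in for a drift-style metric; in the homework,
# Evidently computes such metrics, which are then stored in PostgreSQL
# and plotted in Grafana. All values here are made up.
reference = [9.8, 10.1, 10.4, 9.9, 10.0]   # e.g. mean trip duration per day
current = [12.9, 13.4, 13.1, 12.7, 13.3]

def mean_shift(ref, cur):
    # relative shift of the current mean against the reference mean
    return abs(fmean(cur) - fmean(ref)) / fmean(ref)

shift = mean_shift(reference, current)
drift_detected = shift > 0.2  # hypothetical alerting threshold
```

A batch job recomputes such metrics on each new slice of data, writes them to the database, and a Grafana dashboard (with optional alerts) reads them from there.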


Module 6 - Best Practices

This module focused on best practices, namely on:

  • Testing Python code with pytest
  • Integration tests with docker-compose
  • Testing cloud services with LocalStack
  • Code quality: linting and formatting
  • Git pre-commit hooks
  • Makefiles and make
  • Infrastructure as code (with Terraform)
  • CI/CD
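The first item above, unit testing with pytest, can be sketched as follows; `prepare_features` is a hypothetical helper used only for illustration, not the actual course code:

```python
# test_features.py — a minimal pytest-style unit test.
# prepare_features is a hypothetical feature-engineering helper.

def prepare_features(ride):
    return {
        "PU_DO": f"{ride['PULocationID']}_{ride['DOLocationID']}",
        "trip_distance": ride["trip_distance"],
    }

def test_prepare_features():
    ride = {"PULocationID": 130, "DOLocationID": 205, "trip_distance": 3.66}
    features = prepare_features(ride)
    assert features["PU_DO"] == "130_205"
    assert features["trip_distance"] == 3.66
```

Running `pytest` discovers any `test_*` functions automatically; in this module the same tests are then wired into pre-commit hooks, a Makefile target and the CI/CD pipeline so they run on every change.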