Introduction
Machine Learning (ML) pipelines streamline model development, training, and deployment by automating repetitive tasks. However, manually deploying ML models can be error-prone and time-consuming. Integrating Continuous Integration (CI) and Continuous Deployment (CD) ensures that updates are tested and deployed efficiently, improving reliability and scalability.
In this guide, we’ll build a complete ML pipeline from scratch, covering:
- Understanding the problem
- Data preprocessing
- Model training and evaluation
- Model deployment using Flask
- Implementing CI/CD with automated testing
- Deploying using Docker
Use Case: Predicting Iris Flower Species
To demonstrate an end-to-end ML pipeline, we’ll develop an ML model to classify iris flowers into three species based on petal and sepal dimensions.
Dataset Overview
We use the well-known Iris dataset, which consists of:
- Features: Sepal length, Sepal width, Petal length, Petal width
- Target Variable: Species (Setosa, Versicolor, Virginica)
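Before building the model, it helps to confirm the dataset looks as described. A quick sanity-check sketch (using scikit-learn's bundled copy of the Iris data, so no download is needed):

```python
from sklearn.datasets import load_iris

# scikit-learn ships the same Iris dataset; as_frame=True returns a DataFrame
iris = load_iris(as_frame=True)
df = iris.frame

print(df.shape)                     # (150, 5): four features plus the target
print(df['target'].value_counts())  # 50 samples for each of the three species
print(iris.target_names)            # ['setosa' 'versicolor' 'virginica']
```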
ML Model Development
Setting Up the Environment
To ensure reproducibility, install the necessary dependencies:
pip install numpy pandas scikit-learn flask pytest requests
Data Preprocessing and Model Training
Data preprocessing is crucial in ML pipelines to clean and transform raw data before feeding it into a model. Here, we load the dataset, encode categorical values, split it into training and testing sets, train a RandomForest model, and evaluate its accuracy.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import joblib

# Load dataset
df = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')

# Encode target variable
df['species'] = df['species'].astype('category').cat.codes

# Split data
X = df.drop(columns=['species'])
y = df['species']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate model
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

# Save model
joblib.dump(model, 'iris_model.pkl')
Expected Model Performance
An accuracy score is printed, indicating the model’s performance. Example output:
Accuracy: 0.9667
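Note that the model predicts integer codes rather than species names, because the target was label-encoded with `cat.codes`. pandas assigns category codes in sorted (alphabetical) order, so for this dataset the mapping is stable. A small sketch to confirm it:

```python
import pandas as pd

# Reproduce the encoding step from the training script on the species names;
# pandas assigns category codes alphabetically, so the mapping is deterministic
s = pd.Series(['virginica', 'setosa', 'versicolor']).astype('category')
mapping = dict(zip(s.cat.codes.tolist(), s))
print(mapping)  # {2: 'virginica', 0: 'setosa', 1: 'versicolor'}
```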
Model Deployment with Flask
Why Use Flask?
Flask is a lightweight Python framework for building web applications and APIs. We use it to serve our trained ML model as a REST API.
Creating the API Server
from flask import Flask, request, jsonify
import joblib
import numpy as np

app = Flask(__name__)

# Load model
model = joblib.load('iris_model.pkl')

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()
    features = np.array(data['features']).reshape(1, -1)
    prediction = model.predict(features)
    return jsonify({'prediction': int(prediction[0])})

if __name__ == '__main__':
    # Bind to 0.0.0.0 so the API is also reachable from outside a Docker container
    app.run(host='0.0.0.0', port=5000, debug=True)
Running the Server
python app.py
Testing the API
We can test the API using curl:
curl -X POST http://127.0.0.1:5000/predict -H "Content-Type: application/json" -d '{"features": [5.1, 3.5, 1.4, 0.2]}'
API Response
{"prediction": 0}
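The same request can be made from Python with the requests library installed earlier. The snippet below is a sketch: the helper name, the hard-coded URL, and the species list (which mirrors the alphabetical cat.codes encoding) are illustrative assumptions:

```python
import requests

# Order matches the alphabetical cat.codes encoding used during training
SPECIES = ['setosa', 'versicolor', 'virginica']

def predict_species(features, url='http://127.0.0.1:5000/predict'):
    """Call the Flask API and translate the integer prediction to a name."""
    response = requests.post(url, json={'features': features}, timeout=5)
    response.raise_for_status()
    code = response.json()['prediction']
    return SPECIES[code]

# Example usage (requires the Flask server from app.py to be running):
# print(predict_species([5.1, 3.5, 1.4, 0.2]))  # 'setosa'
```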
Implementing CI/CD Pipeline
What is CI/CD?
Continuous Integration (CI) ensures that code changes are automatically tested before merging. Continuous Deployment (CD) ensures that successfully tested changes are automatically deployed. We implement this using GitHub Actions.
Setting Up CI with GitHub Actions
Create a .github/workflows/test.yml file:
name: CI Pipeline

on:
  push:
    branches:
      - main

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.8'

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Run tests
        run: pytest
Writing Unit Tests
Unit tests ensure that the API functions correctly.
import json
import pytest
from app import app

def test_prediction():
    tester = app.test_client()
    response = tester.post('/predict',
                           data=json.dumps({"features": [5.1, 3.5, 1.4, 0.2]}),
                           content_type='application/json')
    assert response.status_code == 200
    assert 'prediction' in response.get_json()
Running Tests Locally
pytest test_app.py
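The happy-path test above can be complemented with a negative test for malformed input. Because app.py and the saved model file may not be available in every environment, the sketch below rebuilds the same /predict route around a dummy model (an illustrative stand-in, not the trained RandomForest); with Flask's default error handling, a missing 'features' key surfaces as a 500:

```python
from flask import Flask, request, jsonify
import numpy as np

app = Flask(__name__)

class DummyModel:
    """Stand-in for the joblib-loaded model; always predicts class 0."""
    def predict(self, X):
        return np.zeros(len(X), dtype=int)

model = DummyModel()

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()
    features = np.array(data['features']).reshape(1, -1)
    return jsonify({'prediction': int(model.predict(features)[0])})

def test_missing_features_key():
    tester = app.test_client()
    # No 'features' key: the handler raises KeyError, which Flask reports as a 500
    response = tester.post('/predict', json={})
    assert response.status_code == 500

test_missing_features_key()
```

In a real test suite you would likely add explicit input validation to the route and assert a 400 instead of relying on the generic 500.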
Deployment with Docker
Why Use Docker?
Docker allows us to containerize our application, making it easy to deploy across different environments.
Creating a Dockerfile
FROM python:3.8
WORKDIR /app
COPY . /app
RUN pip install -r requirements.txt
CMD ["python", "app.py"]
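Both the CI workflow and this Dockerfile install dependencies from requirements.txt, which has not been shown. A minimal version, derived from the pip install command at the start of this guide (versions are left unpinned here; pinning them is recommended for reproducible builds):

```
numpy
pandas
scikit-learn
flask
pytest
requests
joblib
```

joblib is listed explicitly because app.py imports it directly, even though it is also pulled in as a scikit-learn dependency.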
Building and Running the Docker Container
docker build -t ml-api .
docker run -p 5000:5000 ml-api
Testing with Docker
curl -X POST http://127.0.0.1:5000/predict -H "Content-Type: application/json" -d '{"features": [5.1, 3.5, 1.4, 0.2]}'
Expected Output
{"prediction": 0}
Conclusion
In this guide, we built an end-to-end ML pipeline that includes:
- Model training and evaluation
- API deployment using Flask
- Automated testing with GitHub Actions
- Containerization with Docker
With CI/CD automation, ML applications can be deployed seamlessly, ensuring efficiency and scalability.
WRITTEN BY Priya Kanere