KKBox Churn Platform — Abhishek Kapoor

MLOps Platform for Real-Time Churn Prediction

A production-grade machine learning platform that predicts user churn from large-scale music streaming data.

The system emphasizes automation, reproducibility, and observability across the entire ML lifecycle, moving beyond notebook-based experimentation.

System Architecture

The platform follows a layered MLOps architecture consisting of data ingestion, distributed feature engineering, unified training pipelines, production inference services, and monitoring components.

Lakehouse-style data organization using MinIO and Parquet
Distributed ETL with Dask for large-scale processing
Centralized metadata and experiment tracking
Containerized deployment for portability
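A minimal sketch of how the lakehouse layout could be addressed from Python, assuming a single MinIO bucket with one prefix per data layer and Hive-style `dt=` date partitions. The `Lakehouse` class, the bucket name, the endpoint, and the `bronze` raw layer are hypothetical; only the shape of `storage_options` follows what s3fs (the filesystem Dask and pandas use for `s3://` URLs) accepts.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Medallion-style layers; "bronze" (raw) is an assumption, the text only names Silver and Gold.
LAYERS = ("bronze", "silver", "gold")


@dataclass(frozen=True)
class Lakehouse:
    """Hypothetical helper that builds object-store paths for each data layer."""

    bucket: str
    endpoint_url: str  # MinIO's S3-compatible API endpoint
    access_key: str
    secret_key: str

    def path(self, layer: str, table: str, partition_date: Optional[date] = None) -> str:
        if layer not in LAYERS:
            raise ValueError(f"unknown layer: {layer!r}")
        base = f"s3://{self.bucket}/{layer}/{table}"
        if partition_date is None:
            return base
        # Hive-style "key=value" directory that Parquet readers can expose as a partition column.
        return f"{base}/dt={partition_date.isoformat()}"

    def storage_options(self) -> dict:
        # Keyword arguments forwarded to s3fs.S3FileSystem; client_kwargs reaches the S3 client.
        return {
            "key": self.access_key,
            "secret": self.secret_key,
            "client_kwargs": {"endpoint_url": self.endpoint_url},
        }
```

With Dask, reading the Silver user logs would then look like `dd.read_parquet(lh.path("silver", "user_logs"), storage_options=lh.storage_options())`, where `lh` is a configured `Lakehouse`. Keeping path construction in one place means ETL jobs, training, and DVC stages all agree on where each layer lives.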

Data Pipeline

The data pipeline is designed to handle high-volume user activity logs while maintaining reproducibility and memory efficiency.

Chunked ingestion from object storage
Distributed aggregation into user-level features
Versioned Silver and Gold datasets using DVC
Schema validation before training
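The platform runs this aggregation with Dask; the sketch below shows the same chunk-then-combine pattern in plain Python so it runs without a cluster. It validates the header before reading any rows, streams rows in bounded chunks, computes per-user partial sums per chunk, and merges them into user-level features. The column names follow the public KKBox `user_logs` table, but treat them, the feature names, and all function names as assumptions.

```python
import csv
from collections import defaultdict
from itertools import islice
from typing import Dict, Iterable, Iterator, List

# Columns read from the KKBox user_logs table; treat this schema as an assumption.
REQUIRED_COLUMNS = {"msno", "date", "num_unq", "total_secs"}

Partial = Dict[str, List[float]]  # user -> [row_count, total_secs_sum, num_unq_sum]


def read_chunks(lines: Iterable[str], chunk_size: int) -> Iterator[List[dict]]:
    """Validate the header, then yield rows in fixed-size chunks to bound memory."""
    reader = csv.DictReader(lines)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"schema check failed, missing columns: {sorted(missing)}")
    while chunk := list(islice(reader, chunk_size)):
        yield chunk


def partial_aggregate(chunk: List[dict]) -> Partial:
    """Map step: per-user sums for one chunk (what each Dask partition would compute)."""
    acc: Partial = defaultdict(lambda: [0, 0.0, 0.0])
    for row in chunk:
        a = acc[row["msno"]]
        a[0] += 1
        a[1] += float(row["total_secs"])
        a[2] += float(row["num_unq"])
    return acc


def combine(partials: Iterable[Partial]) -> Dict[str, dict]:
    """Reduce step: add partial sums, then derive means from the combined totals."""
    totals: Partial = defaultdict(lambda: [0, 0.0, 0.0])
    for part in partials:
        for user, (n, secs, unq) in part.items():
            t = totals[user]
            t[0] += n
            t[1] += secs
            t[2] += unq
    return {
        user: {"n_log_rows": n, "total_secs_sum": secs, "num_unq_mean": unq / n}
        for user, (n, secs, unq) in totals.items()
    }


def user_features(lines: Iterable[str], chunk_size: int = 100_000) -> Dict[str, dict]:
    return combine(partial_aggregate(c) for c in read_chunks(lines, chunk_size))
```

Means are computed only after the partial sums are merged, because averaging per-chunk means would weight chunks rather than rows. In Dask the equivalent is roughly `ddf.groupby("msno").agg({"total_secs": "sum", "num_unq": "mean"})`, which performs the same split-and-combine across partitions.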

Training and Feature Consistency

To prevent training-serving skew, all feature transformations are encapsulated in scikit-learn pipelines and serialized together with the trained model.

This ensures that the inference service always applies exactly the same preprocessing logic as the training environment.

Unified preprocessing and modeling pipeline
Zero duplication of feature logic
Deterministic model reproduction
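A minimal sketch of the pattern, assuming joblib as the serialization format and using `LogisticRegression` as a stand-in for the LightGBM model so the example needs only scikit-learn. The imputer and scaler steps, the seed, and the function names are illustrative, not the project's actual feature pipeline.

```python
import joblib
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

SEED = 42  # fixed seed so retraining on the same data reproduces the same model


def build_pipeline(seed: int = SEED) -> Pipeline:
    """Preprocessing and classifier in one estimator, fitted and saved as a unit."""
    return Pipeline(
        steps=[
            # Imputation and scaling statistics are learned from training data only.
            ("impute", SimpleImputer(strategy="median")),
            ("scale", StandardScaler()),
            # Stand-in classifier; lightgbm.LGBMClassifier follows the same fit/predict_proba API.
            ("model", LogisticRegression(max_iter=1000, random_state=seed)),
        ]
    )


def train_and_save(X: np.ndarray, y: np.ndarray, path: str) -> Pipeline:
    pipeline = build_pipeline().fit(X, y)
    joblib.dump(pipeline, path)  # a single artifact holds transforms + model
    return pipeline


def load_for_serving(path: str) -> Pipeline:
    # The service calls predict_proba on raw features; the pipeline re-applies preprocessing.
    return joblib.load(path)
```

Because the service only ever loads this one artifact, there is no second copy of the feature logic that could drift out of sync with training.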

Production Deployment

The trained model is deployed as a FastAPI service running inside Docker containers.

REST-based real-time inference
Containerized execution environment
Config-driven deployment

Experiment Tracking

MLflow is used for experiment tracking and as the model registry, providing full traceability across code, data, and models.

Centralized metrics and artifacts
Model version management
Reproducible experiment history

Monitoring and Observability

The system is continuously monitored: Prometheus collects service and infrastructure metrics, Grafana visualizes them, and Evidently AI checks incoming data for drift.

API latency and throughput tracking
Infrastructure health monitoring
Data drift detection
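Evidently AI provides its own drift tests, whose defaults depend on its version and the column type. As a library-free illustration of what a drift check measures, the sketch below computes the Population Stability Index (PSI), a different but widely used statistic, between a reference (training) sample and a current (production) sample of one feature: PSI = Σ (aᵢ − eᵢ) · ln(aᵢ / eᵢ) over bins, where eᵢ and aᵢ are the reference and current bin fractions. The 10-bin default and the 0.25 alert threshold are conventional rules of thumb, not values from this project.

```python
import bisect
import math
from typing import List, Sequence

DRIFT_THRESHOLD = 0.25  # common rule of thumb: PSI above ~0.25 signals a major shift


def quantile_edges(reference: Sequence[float], bins: int = 10) -> List[float]:
    """Interior bin edges at reference quantiles, so each bin holds ~1/bins of the reference."""
    ref = sorted(reference)
    return [ref[len(ref) * i // bins] for i in range(1, bins)]


def bin_fractions(values: Sequence[float], edges: List[float], eps: float = 1e-6) -> List[float]:
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    # Floor empty bins at eps so the log ratio below stays finite.
    return [max(c / len(values), eps) for c in counts]


def psi(reference: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    edges = quantile_edges(reference, bins)
    expected = bin_fractions(reference, edges)
    actual = bin_fractions(current, edges)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


def is_drifted(reference: Sequence[float], current: Sequence[float]) -> bool:
    return psi(reference, current) > DRIFT_THRESHOLD
```

Binning on reference quantiles keeps every reference bin populated, so the statistic reacts to where production values land rather than to the raw feature scale.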

Technology Stack

Python, Dask, LightGBM, scikit-learn, MLflow, DVC, FastAPI, Docker, PostgreSQL, Prometheus, Grafana, Evidently AI