
Accelerating and Scaling AI Deployments Across Hybrid Environments

AI drives digital transformation for telecom leaders, but scaling it across complex hybrid environments remains one of the toughest challenges. Safaricom, one of Africa’s most advanced and AI-mature mobile operators, faced this head-on. With over 50 million users and an AI ecosystem spanning cloud and on-prem infrastructure, Safaricom needed to move faster, from model development to deployment, without compromising reliability, governance, or business impact.

In this blog, we explore how Safaricom reimagined its AI operations using MLRun and Iguazio to overcome legacy bottlenecks, standardize processes, and achieve 5X faster time-to-production.

This blog post is based on a webinar with Hillary Murefu Wangila, Head of AI, and Anthony Nyaga Irugu, Lead ML Engineer, from Safaricom, and Salesh Bhat, Principal Architect from Iguazio (a McKinsey company). You can dive deeper into the use cases, architectures and demo by watching the full webinar here.

How to Reliably Scale AI for 50M Users

Safaricom is one of the most successful mobile operators in East and Central Africa, serving ~50M users. However, their legacy AI stack lacked standardization and was complicated and siloed. Achieving secure, production-grade scalability and reliability required immense effort, and production timelines were five times longer than they needed to be.

In addition, their data science and MLE teams were disparate, resulting in long processes which delayed models from reaching production. It took weeks of refactoring just to move code from notebooks to production.

The team needed an AI infrastructure that would allow them to focus on their expertise rather than tech complexity, and enable them to get to production with ease. They were looking for a solution that would let each expert focus on their own craft: data scientists on modeling, MLOps engineers on deployment and scalability, and data engineers on pipelining data.

At the same time, the solution had to provide the same experience across cloud and on-premises environments, streamline, automate and accelerate the AI process, and lay the foundation for new Gen AI use cases.

What is the Iguazio AI Factory?

The Iguazio AI Factory is the enterprise-grade version of MLRun. It enables continuous delivery, automatic deployment and monitoring of AI applications. The factory is based on four pipelines:

  1. Data management – Ensures data availability, quality and control to feed the ML system.
  2. Development – Standardizes processes and tooling to improve team efficiency and solution performance.
  3. Deployment – Standardizes processes and provisions tooling to reliably deploy solutions with “One Click”.
  4. LiveOps – Monitors models to maintain reliable performance and drive continuous improvement.

The platform integrates seamlessly with CI/CD tools, infrastructure as code, and both on-prem and cloud environments, enabling scalable, production-ready AI pipelines. It also allows continuous model monitoring for drift, bias, and hallucinations.

Users can also draw on an open-source MLRun marketplace of pre-defined functions, code and notebooks. This simplifies pipeline development, training, inference and serving across the lifecycle.

Safaricom Deploys MLRun

Safaricom chose MLRun and the Iguazio Gen AI Factory to automate and accelerate the operationalization of their AI applications in live environments. The value was demonstrated through the migration of three leading use cases, resulting in a 5x acceleration of AI delivery:

  1. Optimizing MPESA Apps – Mini apps that provide personalized services like upselling, cross-selling and customer feedback in real-time. 59% of Kenya’s GDP is processed through MPESA.
  2. Customer segmentation based on 30 metrics to garner feedback & enrich NPS
  3. Predictive modeling of customer actions & customer segmentation to decrease churn and increase upsells

(Below, we’ll dive deeper into use cases #1 and #2.)

The impact:

  • 5x faster time to production
  • Standardized & automated AI operationalization
  • Gen AI-ready infrastructure
  • Support for hybrid environment: AWS and on-prem

AI Use Case Migration from On-prem to AWS with Iguazio and MLRun

As part of a strategic exercise, 16 AI use cases were slated for migration from on-premises infrastructure to AWS. The goal was to unify on-prem and cloud environments into a hybrid setup, enabling seamless failover and deployment between them.

  • Process steps were mapped, and the code and pipelines were moved to Iguazio, which was pre-deployed on AWS
  • Iguazio and the underlying MLRun provided an abstraction layer, which ensured a consistent migration process with no breakages
  • Pipelines and code were lifted and shifted onto Iguazio, with build, test and deploy steps
  • All use cases were checked for errors, and the few errors found were rectified

Impact:

  • Simplified AI governance
  • Cost savings through access to scalable AWS infrastructure
  • The actual migration took just 2-3 days instead of weeks, massively reducing complexity
  • Increased developer productivity due to an enhanced experience on AWS – empowering data scientists to focus purely on modeling while MLOps engineers handled scalability and orchestration, each excelling in their craft without friction.

Use Case #1: Giga and Mini Apps Use Cases

This use case involved serving real-time propensity models to a mobile app built on MPESA. Previously, the workflow relied on manual handoffs: passing Java configs, simulation files, and notebooks between teams.

Old State Architecture

Previous workflow steps:

  1. Data collection – Massive customer behavior data was stored in a big data platform.
  2. Data preparation – Data was pre-stitched and transferred into a Postgres database (or another open-source system).
  3. Scheduling – Airflow was used to run workflows daily or monthly for different outcomes.
  4. System coordination – Multiple tools and servers had to be maintained independently and kept in sync.
  5. Metrics writing – The system wrote back run metrics (e.g., job results, performance) into Postgres.
  6. Modeling – Data scientists worked in notebooks to design and test models; about 80% of the manual labor was spent here.
  7. Handover – Once modeling was done, the data scientist handed off the project to the MLOps engineer.
  8. Optimization & scaling – The MLOps engineer optimized and scaled the solution for production.
  9. Containerization – The model was containerized using Docker.
  10. Deployment orchestration – Airflow was used again to orchestrate and deploy the model into production.
  11. System maintenance – All these tools and servers had to work together seamlessly, requiring significant effort to maintain and synchronize.

New State Architecture: Giga and Mini Apps Use Cases on Iguazio Simplify the Development Lifecycle

The new Iguazio-based workflow replaced a fragmented, multi-tool pipeline with a unified, automated MLOps system powered by MLRun, as the open-source orchestration layer.

Instead of using multiple separate tools for data prep, feature engineering, scheduling, and deployment, the team now works within a single environment where preprocessing, feature management, model training, deployment, and monitoring all happen seamlessly in one place.

This shortened the workflow, cutting out manual DevOps work such as writing YAML files, Jenkins pipelines, or Dockerfiles. Data scientists can now build, deploy, and monitor models directly from their Jupyter notebooks, achieving true “DevOps as code” and dramatically speeding up experimentation and time to production.

New workflow steps:

  1. Data ingested into the Iguazio platform HDFS/New Data lake via remote Spark
  2. Feature sets are created as a post step of aggregations from feature engineering
  3. Feature vectors created from all the feature sets
  4. Model development and training using feature vectors as model input
  5. Model Inference as a next step, typically in a batch manner
  6. Monitoring via platform UI & Grafana and retraining (if needed)
  7. Model outputs are served to the downstream applications
  8. Integration with existing Safaricom tech
  9. CI/CD uses existing Jenkins or other OTS CI tools.
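
Steps 2 and 3 of the workflow above (feature sets built from aggregations, then joined into feature vectors) can be illustrated with a minimal, library-free sketch. This is not the MLRun feature store API and not Safaricom's code: the function names, the `customer_id` key, and the sample records are all hypothetical, chosen only to show the shape of the idea.

```python
from collections import defaultdict


def build_feature_set(events, value_key):
    """Aggregate raw events per customer into sum, count and mean features."""
    totals = defaultdict(lambda: {"sum": 0.0, "count": 0})
    for event in events:
        agg = totals[event["customer_id"]]
        agg["sum"] += event[value_key]
        agg["count"] += 1
    return {
        cid: {
            f"{value_key}_sum": agg["sum"],
            f"{value_key}_count": agg["count"],
            f"{value_key}_mean": agg["sum"] / agg["count"],
        }
        for cid, agg in totals.items()
    }


def build_feature_vector(*feature_sets):
    """Join feature sets on customer id, keeping customers present in every set."""
    common = set(feature_sets[0])
    for feature_set in feature_sets[1:]:
        common &= set(feature_set)
    vector = {}
    for cid in sorted(common):
        row = {}
        for feature_set in feature_sets:
            row.update(feature_set[cid])
        vector[cid] = row
    return vector


# Hypothetical sample data standing in for ingested platform data
transactions = [
    {"customer_id": "a", "amount": 100.0},
    {"customer_id": "a", "amount": 50.0},
    {"customer_id": "b", "amount": 20.0},
]
sessions = [
    {"customer_id": "a", "minutes": 30.0},
    {"customer_id": "b", "minutes": 10.0},
    {"customer_id": "b", "minutes": 20.0},
]

spend = build_feature_set(transactions, "amount")    # feature set 1
usage = build_feature_set(sessions, "minutes")       # feature set 2
vector = build_feature_vector(spend, usage)          # model input rows
```

Each row of `vector` is what a training step would consume as model input. In a real deployment the aggregation and joining would be handled by the platform rather than hand-written loops.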

Use Case #2: NPS Corrector

Old State Architecture

This use case, focused on NPS and customer behavior, initially involved large, diverse datasets and a tangled network of 10+ data sources and pipelines. The original process was so complex that it took nearly a year to complete when done manually.

Current State Architecture: NPS Corrector Use Cases on Iguazio to Accelerate AI Development Lifecycle

By identifying overlaps and duplications in preprocessing and aligning it with the same standardized structure used in earlier use cases, the team was able to streamline and automate the entire pipeline using MLRun as the central orchestrator. MLRun automates data ingestion, feature engineering, training, and deployment. MLRun packages their scripts, manages scaling through Kubernetes, and integrates automatically with monitoring tools like Grafana to ensure reliability and performance.

This setup eliminates the need for external schedulers like Airflow or manual DAG scripting. Data scientists can now build, schedule, deploy, and scale their models directly from Jupyter notebooks in a secure, automated, and production-grade environment.
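
As a rough sketch of what scheduling from a notebook might look like, the helper below submits a registered function on a cron schedule through the MLRun SDK's `project.run_function` call. The function name `nps-trainer`, the `lookback_days` parameter and the 02:00 schedule are assumptions made for illustration; they do not come from Safaricom's setup, and the exact arguments should be checked against the MLRun version in use.

```python
def schedule_nightly_run(project, function_name="nps-trainer", cron="0 2 * * *"):
    """Ask MLRun to run an already-registered project function on a cron schedule.

    `project` is expected to be an MLRun project object, for example the
    result of mlrun.get_or_create_project(...).
    """
    return project.run_function(
        function_name,
        params={"lookback_days": 30},  # hypothetical training parameter
        schedule=cron,                 # standard five-field cron expression
    )


# Usage from a notebook (requires a reachable MLRun service):
#   import mlrun
#   project = mlrun.get_or_create_project("nps-corrector", context="./")
#   schedule_nightly_run(project)
```

Passing the project in as an argument keeps the helper easy to exercise with a stand-in object before pointing it at a live cluster.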

Go deeper into these workflows in the webinar.

Demo Time

To see a demo of how to accelerate and automate complex AI workflows, watch the full webinar. Safaricom presents how the platform automatically handles containerization, networking, scaling, and security. Then, they show how GenAI can accelerate development workflows through a custom code-generation tool. This tool allows users to generate working AI applications and deploy them instantly, using natural-language prompts and a few lines of code.

Watch the full webinar here.

Recent Blog Posts
Introducing MLRun v1.10: Build better agents, monitor everything
We’re proud to announce a series of advancements in mlrun v1.10 designed to power your end-to-end or...
Michal Eshchar
March 24, 2026
Introducing MLRun Community Edition
Gilad Shapira
March 24, 2026

Introducing MLRun Community Edition

MLRun Community Edition (CE) is the out-of-the-box solution of MLRun for AI and ML orchestration and model lifecycle management.

MLRun CE can be installed directly on your Kubernetes cluster or even on your local desktop. It provides a complete, integrated MLOps stack that combines MLRun’s orchestration power with Nuclio’s high-performance serverless engine, along with additional tools for data storage, monitoring, and more.

In this blog, we’ll explain how MLRun CE works and recommended use cases, and share how one of our users leverages MLRun CE for experiment and model tracking.

What is MLRun CE?

MLRun CE is ready to use out of the box. It is designed to simplify the entire lifecycle of LLM and ML projects, and provides a robust solution for complex MLOps needs (see examples below).

Under the Hood

By installing the MLRun CE Helm chart on your Kubernetes cluster or local desktop, you get a powerful, integrated environment for development. The platform is built on two cores: MLRun for MLOps orchestration and Nuclio for serverless computing.

MLRun is the MLOps orchestration framework that automates the entire AI pipeline, from data preparation and model training to deployment and management. It automates tasks like model tuning and optimization, enabling you to build and monitor scalable AI applications. With MLRun, you can run real-time applications over elastic resources and gain end-to-end observability.

Nuclio is a high-performance serverless framework that focuses on data, I/O, and compute-intensive workloads. It is the engine that powers the real-time functions within MLRun. Nuclio allows you to deploy your code as serverless functions, which are highly efficient and can process hundreds of thousands of events per second. It supports various data sources, triggers, and execution over CPUs and GPUs. It also supports real-time serving for generative AI use cases.
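
To make the serverless model concrete, here is a minimal sketch of a Python function written in the Nuclio handler style, where the platform calls `handler(context, event)` once per incoming event. The scoring logic and the payload fields (`customer_id`, `amounts`) are invented for illustration, and the stand-in context and event objects exist only so the handler can be tried locally without a Nuclio deployment.

```python
import json
from types import SimpleNamespace


def handler(context, event):
    """Nuclio-style entry point: receives one event, returns a JSON string."""
    payload = event.body
    # Depending on the trigger and content type, the body may arrive as raw bytes
    if isinstance(payload, (bytes, bytearray, str)):
        payload = json.loads(payload)

    amounts = payload.get("amounts", [])
    score = sum(amounts) / len(amounts) if amounts else 0.0  # toy "model"

    context.logger.info("scored request")
    return json.dumps({"customer_id": payload.get("customer_id"), "score": score})


# Local smoke test with stand-in objects (not the real Nuclio runtime)
fake_context = SimpleNamespace(logger=SimpleNamespace(info=lambda *args, **kwargs: None))
fake_event = SimpleNamespace(body=b'{"customer_id": "a", "amounts": [10, 20, 30]}')
response = json.loads(handler(fake_context, fake_event))
```

The same handler body could wrap a real model call; the point here is only the event-in, response-out shape that the serverless engine scales horizontally.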

Key Integrations for a Complete MLOps Solution

MLRun CE easily integrates with several other tools. It includes an internal JupyterLab service for developing your LLM code and supports Kubeflow Pipelines workflow for creating multi-step AI pipelines. It also works with Kafka and TDengine for robust, real-time and batch model monitoring, and provides built-in support for Spark and Grafana for data processing and visualization.

Key Advantages of MLRun CE

Data professionals and developers using MLRun CE can benefit from:

  • Open-source MLOps Solution: MLRun CE is an open-source MLOps platform that you can quickly install on your Kubernetes cluster or local desktop by deploying the mlrun-ce chart.
  • Integrated MLOps Workflow: MLRun CE combines the MLRun orchestration framework with the Nuclio serverless engine to provide a complete MLOps solution. This allows users to seamlessly automate tasks from data preparation and model training to deployment and monitoring. This integration eliminates the need for teams to stitch together disparate tools, saving time and effort.
  • Rapid Production Deployment: The platform allows you to take your code from a Jupyter Notebook or your local IDE to a scalable k8s cluster with minimal changes. This significantly shortens product development, enabling faster iteration and business impact.
  • Scalability and Efficiency: With Nuclio as its serverless engine and MLRun as an MLOps orchestrator, MLRun CE can automatically and elastically scale resources based on demand. This ensures your workloads, whether batch or real-time, run efficiently, reducing computation costs. It’s particularly useful for resource-intensive tasks like LLM fine-tuning or inference.
  • Real-Time and Batch Model Monitoring: MLRun CE includes a real-time and batch model monitoring solution, based on the fully out-of-the-box integration with Kafka and TDengine. You can track models, compare results and performance metrics, and detect data drift or anomalous behavior. It also supports automated alerts for model exceptions, enabling proactive maintenance and ensuring continued model reliability.
  • Seamless Integrations: The platform integrates with a wide range of popular open-source tools, including Kubeflow Pipelines for workflow management, and Spark and Grafana for data processing and visualization. This open architecture gives you the flexibility to use the tools you already know and love.

MLRun CE Ecosystem

The following components are installed with MLRun CE:

  1. MLRun
  2. Jupyter notebook
  3. Kafka & TDengine
  4. Minio
  5. Nuclio (for realtime functions)
  6. Grafana & Prometheus
  7. MPI
  8. MySQL
  9. Spark Operator

The picture below describes the relations between them. MLRun is the orchestrator and deploys functions using the MLRun, Nuclio, Spark and MPI job runtimes. Grafana is used to monitor usage, Jupyter provides an out-of-the-box development environment, and Minio, MySQL and TDengine store data.

Use Cases for the Community

MLRun CE can be used for a wide variety of MLOps use cases. In particular:

  • MLRun UI for project management – Track project experiments, artifacts, model performance, and manage project members and secrets
  • Batch jobs for retraining processes and user experimenting
  • Experiment tracking and results tracking, including duplication and debugging
  • Real-time serving for generative AI use cases
  • Alerts mechanism for threshold-based results such as data drift
  • Real-time functions for model inference, real-time data processing and more
  • Runtime resource management and scaling
  • Model artifact registry that allows users to track datasets and model artifacts by tags, labels and results

Spotlight: How One of Our Users Manages Their Experiments and Models with MLRun

One of our community users has adopted MLRun CE as their MLOps platform to deploy, track, and manage their ML training experiments and models. MLRun CE is deployed across their Kubernetes environments. They run two main types of ML workflows through MLRun CE. The first is manually triggered training jobs: MLRun CE runs the training function, logs metrics and datasets, and registers the model for deployment on edge devices.

The second is automated periodic insight models, such as drift detection functions that compare recent data against training distributions and generate alerts when anomalies occur.
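
The idea of comparing recent data against the training distribution can be sketched with the Population Stability Index (PSI), a common drift score. This is a swapped-in technique chosen for illustration, not necessarily what this user or MLRun's monitoring computes, and the 0.2 alert threshold is a widely used rule of thumb rather than a value from the source.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a training sample and a recent sample.

    Bin edges are derived from the training ("expected") sample; recent values
    outside that range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def proportions(values):
        counts = [0] * bins
        for value in values:
            index = int((value - lo) / width)
            counts[min(max(index, 0), bins - 1)] += 1
        # eps avoids log(0) and division by zero for empty bins
        return [max(count / len(values), eps) for count in counts]

    expected_p = proportions(expected)
    actual_p = proportions(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected_p, actual_p))


def drift_alert(expected, actual, threshold=0.2):
    """Return True when the drift score exceeds the (assumed) alert threshold."""
    return psi(expected, actual) > threshold


training_values = [float(v) for v in range(100)]   # hypothetical training feature
recent_values = [v + 50.0 for v in training_values]  # same feature, shifted upward
```

A periodic job could call `drift_alert` for each monitored feature and raise an alert whenever it returns True.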

The team relies on MLRun CE’s full set of components: project management, batch functions, experiment tracking, model monitoring, and alerts.

With MLRun CE, their data science teams can:

  • Run experiment tracking
  • Ensure model integrity
  • Duplicate experiments
  • Easily debug experiment logs
  • Deploy models easily and quickly
  • Manage resources and scale

Getting Started with MLRun CE

Check out these resources for more information:


Bringing (Gen) AI from Laptop to Production with MLRun

MLRun is an open-source framework that orchestrates the entire generative AI lifecycle, from development to deployment in Kubernetes. In this article, we’ll show how MLRun replaces manual deployment processes, allowing you to get from your notebook to production in just a few lines of code.

What is the Traditional AI Application Lifecycle?

As a data professional, you’re probably familiar with the following process:

  1. You want to run a batch fine-tuning job for your LLM, but your code requires a lot of memory, CPUs and/or GPUs. It also needs a number of Python packages to run and fine-tune the LLM.
  2. You must run your code on your K8s cluster because your local computer doesn’t have enough resources. For this, you need to create a K8s resource and maybe a new Docker image with the new Python requirements.
  3. Once you’ve successfully run the function on the K8s cluster, you need to version and track your experiment results (in this case the LLM and fine-tune job results). This is essential to understand where and why you need to improve your fine-tune job.
  4. In some projects, the model inference is done in a batch, in others it’s in real-time. If this is a real-time deployment, you need to create a K8s resource that serves the model with the user prompts or create a batch job that does the same. Both should run in the K8s cluster for production testing, and you’ll need to manage those resources by yourself.
  5. Once you serve the model, you need to monitor and test how your model is behaving and if the model outputs meet your criteria for deployment in production, using accuracy, performance or other custom metrics.
  6. Once your project is ready to deploy in production systems you need to run some of the steps above in the production cluster again.

What are the Challenges in the Traditional AI Lifecycle?

The traditional process described above is fraught with challenges:

  • Engineering Technologies and Resources – Data teams, DevOps and engineers each use different technologies and frameworks. This creates technological friction and inconsistency across AI pipelines and silos between teams, demanding a solution to streamline and automate the entire process.
  • Resource Management – AI models, and especially LLMs, often require substantial memory and GPU resources, which are in low supply and costly. Plus, compute requirements are not consistent throughout the workflow. For example, data processing and training might require more resources. Enterprise teams need a solution to auto-scale and to allocate and monitor deployment resources easily.
  • Versioning and Experiment Tracking – Distributed systems are convoluted and dispersed and teams lack holistic visibility into them, making it complex to track changes, metrics and results for each model or artifact. This requires versioning capabilities and artifact management solutions.
  • Data Privacy – LLMs may handle sensitive user data, which needs to be safeguarded to protect user privacy and abide by compliance requirements. Guardrails must be implemented in any live business application.
  • Monitoring – Production models can degrade over time due to data drift and changing real-world conditions, leading to poor performance. Plus, LLMs might hallucinate or have inherent bias, requiring LiveOps and guardrails.
  • Kubernetes Complexity – Deploying models or running a user workflow in production requires extensive understanding of Kubernetes, like the ability to manage and deploy k8s resources, collect necessary logs and tune resource requests and limits. Most data professionals typically have expertise in other technologies. As a result, it is challenging to effectively run a job, serve the model and understand how their code is behaving in production for monitoring purposes.

The Core Advantages of MLRun

MLRun addresses these challenges by allowing you to easily run your local code in K8s production environments as a batch job or a remote real-time deployment. MLRun eliminates the need to worry about the complexity of Kubernetes, abstracting and streamlining the process. MLRun also supports scaling and configuring resources, such as GPU, Memory, CPU, etc. It provides a simple way to scale resources, without requiring users to understand the inner workings of Kubernetes.

What’s left is simply to monitor the functionality and behavior of your AI system once it’s live, which can also take place in MLRun.

Here’s how MLRun achieves this:

  • Orchestration – MLRun orchestrates workflows across all AI development and deployment tasks like data preprocessing, model training and fine-tuning, serving, etc. These pipelines are modular and components can be swapped out and replaced, future-proofing the architecture.
  • Auto-Scaling – MLRun allows auto-scaling deployments across the Kubernetes cluster.
  • Containerized Environment – MLRun packages models, code and dependencies into containers for Kubernetes-based deployment.
  • Serverless Model Serving – MLRun integrates with Nuclio, a high-performance serverless framework, to enable lightweight and scalable deployments.
  • Version Control – MLRun provides built-in versioning for datasets, code and models, ensuring reproducibility.
  • Artifact Management System – MLRun manages the artifact registry and enables managing artifacts by types (models, datasets and others), labels and tags. In addition MLRun stores relevant metadata such as model features, stats and more.
  • Real-Time Monitoring – MLRun integrates monitoring capabilities to track model performance, latency and resource utilization of individual workflows and deployments, and more – in real time.
  • Logs Forwarding – MLRun supports logs forwarding, and a clear and easy UI logs screen for debugging and checking your deployment logs.
  • Framework Integrations – MLRun integrates seamlessly with popular ML and deep learning frameworks like TensorFlow, PyTorch, Hugging Face and scikit-learn.

What is the AI Model Lifecycle with MLRun?

Here’s what the same process looks like, but with MLRun:

Before MLRun vs. After MLRun

  1. Before MLRun: You want to run a batch fine-tuning job for your LLM, but your code requires a lot of memory, CPUs and/or GPUs. It also needs a number of Python packages to run and fine-tune the LLM.
     After MLRun: This flow is very simple. You only need to connect your local IDE to MLRun, create a project, create an MLRun function and run your code using the relevant resources. With this flow, you can develop and run your code in Kubernetes from the beginning of the development phase with only a few lines of code.

  2. Before MLRun: You must run your code on your K8s cluster because your local computer doesn’t have enough resources. For this, you need to create a K8s resource and maybe a new Docker image with the new Python requirements.
     After MLRun: To run your code in a Kubernetes cluster, create an MLRun function that runs your Python code. Then, set the amount of resources (memory, CPU and GPU) and add the Python requirements. MLRun will use those values to run your fine-tuning job in Kubernetes and manage the deployment.

  3. Before MLRun: Once you’ve successfully run the function on the K8s cluster, you need to version and track your experiment results (the LLM and the fine-tune job results). This is essential to understand where and why you need to improve your fine-tune job.
     After MLRun: Now that you have a model that has been fine-tuned by the MLRun function, you can track the model artifact as part of the MLRun model artifactory, with the model version, labels or the model metrics.

  4. Before MLRun: In some projects, the model inference is done in a batch, in others it’s in real-time. If this is a real-time deployment, you need to create a K8s resource that serves the model with the user prompts or create a batch job that does the same. Both should run in the K8s cluster for production testing, and you need to manage those resources by yourself.
     After MLRun: In MLRun, you can do both. You can serve your LLM in real-time or collect the prompts and run them in batch for LLM evaluations, in just a couple of lines of code.

  5. Before MLRun: Once you serve the model, you need to monitor and test how your model is behaving and if the model outputs meet your criteria for deployment in production, using accuracy, performance or other custom metrics.
     After MLRun: Once you serve the model, monitor your LLM inputs and outputs and check the model performance and usage by enabling MLRun model monitoring. This is an essential part of model development, helping you better understand whether you need to retrain the model or adjust its outputs so they meet your criteria for deployment in production.

  6. Before MLRun: Once your project is ready to deploy in production systems, you need to run some of the steps above in the production cluster again.
     After MLRun: Once your project is ready for production, you can easily move the same project configuration from the dev system to the production system by using MLRun CI/CD automation.

MLRun can take your code and run and manage your functions and artifacts in Kubernetes environments from your first deployment. This allows you to focus on development and decreases the time needed to deploy AI projects in production, while maintaining a production-first mindset.

How to Get Started with MLRun

1. On your laptop, install MLRun and configure your remote environment. Your MLRun environment is now ready for developing your project, from laptop to production.

2. Create your MLRun project using the MLRun SDK.

3. Run your Python code as an MLRun function. For a remote or batch function, you can run your code locally or on your K8s cluster from the beginning of the development phase, so you keep a production-first mindset throughout. You can also log models and other artifact types for experiment tracking.

4. Based on the run and the experiment tracking, you can monitor your results, which makes the path to production easier.
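
The four steps above might look roughly like the following sketch. It uses MLRun SDK calls (`set_env_from_file`, `get_or_create_project`, `set_function`, `with_limits`, `run_function`) as best understood at the time of writing; check them against the documentation for your MLRun version. The file names `mlrun.env` and `trainer.py`, the project name, the resource sizes, the `transformers` requirement and the toy training loop are all placeholders, not values from the source.

```python
def train(context, learning_rate: float = 0.01, epochs: int = 3):
    """Placeholder handler; MLRun passes in `context` when it runs the function."""
    loss = 1.0
    for _ in range(epochs):
        loss *= 1.0 - learning_rate  # stand-in for a real fine-tuning step
    context.log_result("loss", loss)  # tracked with the run for later comparison
    return loss


def run_remote(project_name: str = "laptop-to-prod"):
    """Steps 1-4, assuming this file is saved as trainer.py next to mlrun.env."""
    import mlrun  # imported here so `train` can be tried without MLRun installed

    # Step 1: point the SDK at your remote MLRun service (placeholder file name)
    mlrun.set_env_from_file("mlrun.env")

    # Step 2: create or load the project
    project = mlrun.get_or_create_project(project_name, context="./")

    # Step 3: register the code as a job, then declare packages and resources
    fn = project.set_function(
        func="trainer.py",
        name="trainer",
        kind="job",
        image="mlrun/mlrun",
        handler="train",
        requirements=["transformers"],  # placeholder Python requirement
    )
    fn.with_limits(mem="16G", cpu=4, gpus=1)  # placeholder resource limits

    run = project.run_function(
        "trainer",
        params={"learning_rate": 0.05},
        local=False,      # run on the cluster rather than on the laptop
        auto_build=True,  # build an image with the extra requirements if needed
    )

    # Step 4: inspect the tracked results of the run
    return run.status.results
```

Setting `local=True` in `run_function` is a convenient way to debug the same handler on a laptop before sending it to the cluster.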

More Resources:

See also

MLRun simplifies and automates the various stages of the AI lifecycle. Here are some key use cases where you can use MLRun:
