
Introducing MLRun Community Edition


MLRun Community Edition (CE) is the out-of-the-box MLRun solution for AI and ML orchestration and model lifecycle management.

MLRun CE can be installed directly on your Kubernetes cluster or even on your local desktop. It provides a complete, integrated MLOps stack that combines MLRun’s orchestration power with Nuclio’s high-performance serverless engine, along with additional tools for data storage, monitoring, and more.

In this blog, we’ll explain how MLRun CE works, cover recommended use cases, and share how one of our users leverages MLRun CE for experiment and model tracking.

What is MLRun CE?

MLRun CE is ready to use out of the box. It is designed to simplify the entire lifecycle of LLM and ML projects, and provides a robust solution for complex MLOps needs (see examples below).

Under the Hood

By installing the MLRun CE Helm chart on your Kubernetes cluster or local desktop, you get a powerful, integrated development environment. The platform is built on two core components: MLRun for MLOps orchestration and Nuclio for serverless computing.

MLRun is the MLOps orchestration framework that automates the entire AI pipeline, from data preparation and model training to deployment and management. It automates tasks like model tuning and optimization, enabling you to build and monitor scalable AI applications. With MLRun, you can run real-time applications over elastic resources and gain end-to-end observability.

Nuclio is a high-performance serverless framework that focuses on data, I/O, and compute-intensive workloads. It is the engine that powers the real-time functions within MLRun. Nuclio allows you to deploy your code as serverless functions, which are highly efficient and can process hundreds of thousands of events per second. It supports various data sources, triggers, and execution over CPUs and GPUs. It also supports real-time serving for generative AI use cases.
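To make this concrete, here is a minimal sketch of deploying a Python handler as a real-time Nuclio function through MLRun. The function name, the handler.py file and the handler code are illustrative assumptions, not part of the original post.

```python
import mlrun

# handler.py (hypothetical) contains a standard Nuclio entry point:
# def handler(context, event):
#     return {"echo": event.body}

# Wrap the handler as a real-time (Nuclio) MLRun function and deploy it
fn = mlrun.code_to_function(
    name="echo",
    filename="handler.py",
    kind="nuclio",
    image="mlrun/mlrun",
    handler="handler",
)
fn.deploy()   # builds the image if needed and deploys the serverless function to the cluster
```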

Key Integrations for a Complete MLOps Solution

MLRun CE easily integrates with several other tools. It includes an internal JupyterLab service for developing your LLM code and supports Kubeflow Pipelines workflows for creating multi-step AI pipelines. It also works with Kafka and TDengine for robust real-time and batch model monitoring, and provides built-in support for Spark and Grafana for data processing and visualization.

Key Advantages of MLRun CE

Data professionals and developers using MLRun CE can benefit from:

  • Open-source MLOps Solution: MLRun CE is an open-source MLOps platform that you can quickly install on your Kubernetes cluster or local desktop by deploying the mlrun-ce chart.
  • Integrated MLOps Workflow: MLRun CE combines the MLRun orchestration framework with the Nuclio serverless engine to provide a complete MLOps solution. This allows users to seamlessly automate tasks from data preparation and model training to deployment and monitoring. This integration eliminates the need for teams to stitch together disparate tools, saving time and effort.
  • Rapid Production Deployment: The platform allows you to take your code from a Jupyter Notebook or your local IDE to a scalable k8s cluster with minimal changes. This significantly shortens product development, enabling faster iteration and business impact.
  • Scalability and Efficiency: With Nuclio as its serverless engine and MLRun as an MLOps orchestrator, MLRun CE can automatically and elastically scale resources based on demand. This ensures your workloads, whether batch or real-time, run efficiently, reducing computation costs. It’s particularly useful for resource-intensive tasks like LLM fine-tuning or inference.
  • Real-Time and Batch Model Monitoring: MLRun CE includes a real-time and batch model monitoring solution, based on out-of-the-box integration with Kafka and TDengine. You can track models, compare results and performance metrics, and detect data drift or anomalous behavior. It also supports automated alerts for model exceptions, enabling proactive maintenance and ensuring continued model reliability (see the sketch after this list).

  • Seamless Integrations: The platform integrates with a wide range of popular open-source tools, including Kubeflow Pipelines for workflow management, and Spark and Grafana for data processing and visualization. This open architecture gives you the flexibility to use the tools you already know and love.
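As referenced in the model monitoring bullet above, the sketch below shows roughly how monitoring can be switched on from the MLRun SDK. The project name and the serving function name ("serving") are assumptions, and the available monitoring calls and their parameters vary somewhat between MLRun versions.

```python
import mlrun

project = mlrun.get_or_create_project("monitoring-demo", context="./")

# Enable the project-level model monitoring infrastructure
# (credential/stream configuration depends on your MLRun version and CE setup)
project.enable_model_monitoring()

# Turn on tracking for an existing serving function (assumed to be named "serving")
# so its inputs and outputs are streamed to the monitoring pipeline
serving_fn = project.get_function("serving")
serving_fn.set_tracking()
serving_fn.deploy()
```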

MLRun CE Ecosystem

The following components are installed with MLRun CE:

  1. MLRun
  2. Jupyter Notebook
  3. Kafka & TDengine
  4. MinIO
  5. Nuclio (for real-time functions)
  6. Grafana & Prometheus
  7. MPI
  8. MySQL
  9. Spark Operator

The picture below describes the relations between them. MLRun is the orchestrator and deploys functions using the MLRun job, Nuclio, Spark and MPI runtimes. Grafana is used to monitor usage, Jupyter provides an out-of-the-box development environment, and MinIO, MySQL and TDengine store the data.
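As a rough illustration of how MLRun fans work out to the different runtimes, the sketch below registers functions with different kinds on one project. The project name, file names and handlers are hypothetical, and runtimes such as Spark usually need additional configuration (Spark image, replicas) that is omitted here.

```python
import mlrun

project = mlrun.get_or_create_project("ce-demo", context="./")

# Batch runtime: runs as a Kubernetes job
project.set_function("trainer.py", name="trainer", kind="job",
                     image="mlrun/mlrun", handler="train")

# Real-time runtime: deployed through Nuclio
project.set_function("serve.py", name="serve", kind="nuclio",
                     image="mlrun/mlrun", handler="handler")

# Spark runtime: executed via the Spark Operator (Spark-specific
# configuration such as the Spark image and replicas is omitted)
project.set_function("etl.py", name="etl", kind="spark", handler="process")

project.save()
```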

Use Cases for the Community

MLRun CE can be used for a wide variety of MLOps use cases. In particular:

  • MLRun UI for project management – Track project experiments, artifacts, model performance, and manage project members and secrets
  • Batch jobs for retraining processes and user experimenting
  • Experiment and results tracking, including experiment duplication and debugging
  • Real-time serving for generative AI use cases
  • Alert mechanism for threshold-based results like data drift
  • Real-time functions for model inference, real-time data processing and more
  • Runtime resource management and scaling
  • Model artifactory that allows users to track datasets and model artifacts by tags, labels and results (see the sketch below)
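For the model artifactory use case above, here is a hedged sketch of logging a model with a tag, labels and metrics and fetching it back later. The project name, model key, file name and values are illustrative only, and the exact lookup calls can differ between MLRun versions.

```python
import mlrun

project = mlrun.get_or_create_project("registry-demo", context="./")

# Register a trained model file with a tag, searchable labels and result metrics
project.log_model(
    "churn-model",                              # hypothetical model key
    model_file="model.pkl",                     # hypothetical local model file
    tag="v1",
    labels={"framework": "sklearn", "owner": "data-team"},
    metrics={"accuracy": 0.93},                 # illustrative metric values
)

# Retrieve the registered model later by key and tag
model = project.get_artifact("churn-model", tag="v1")
print(model.uri)
```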

Spotlight: How One of Our Users Manages Their Experiments and Models with MLRun

One of our community users has adopted MLRun CE as their MLOps platform to deploy, track, and manage their ML training experiments and models. MLRun CE is deployed across their Kubernetes environments. They run two main types of ML workflows through MLRun CE. The first is manually triggered training jobs: MLRun CE runs the training function, logs metrics and datasets, and registers the model for deployment on edge devices.

The second is automated periodic insight models, such as drift detection functions that compare recent data against training distributions and generate alerts when anomalies occur.

The team relies on MLRun CE’s full set of components: project management, batch functions, experiment tracking, model monitoring, and alerts.

With MLRun CE, their data science teams can:

  • Run experiment tracking
  • Ensure model integrity
  • Duplicate experiments
  • Easily debug experiment logs
  • Deploy models easily and quickly
  • Manage resources and scale

Getting Started with MLRun CE

Check out these resources for more information:


Bringing (Gen) AI from Laptop to Production with MLRun

Find out how MLRun replaces manual deployment processes, allowing you to get from your notebook to production in just a few lines of code.

MLRun is an open-source framework that orchestrates the entire generative AI lifecycle, from development to deployment in Kubernetes. In this article, we’ll show how MLRun replaces manual deployment processes, allowing you to get from your notebook to production in just a few lines of code.

What is the Traditional AI Application Lifecycle?

As a data professional, you’re probably familiar with the following process:

  1. You want to run a batch fine-tuning job for your LLM, but your code requires a lot of memory, CPUs and/or GPUs. It also needs a number of Python packages to run and fine-tune the LLM.
  2. You must run your code on your K8s cluster because your local computer doesn’t have enough resources. For this, you need to create a K8s resource and maybe a new Docker image with the new Python requirements.
  3. Once you’ve successfully run the function on the K8s cluster, you need to version and track your experiment results (in this case the LLM and the fine-tuning job results). This is essential to understand where and why you need to improve your fine-tuning job.
  4. In some projects, the model inference is done in a batch, in others it’s in real-time. If this is a real-time deployment, you need to create a K8s resource that serves the model with the user prompts or create a batch job that does the same. Both should run in the K8s cluster for production testing, and you’ll need to manage those resources by yourself.
  5. Once you serve the model, you need to monitor and test how your model is behaving and if the model outputs meet your criteria for deployment in production, using accuracy, performance or other custom metrics.
  6. Once your project is ready to deploy to production systems, you need to run some of the steps above again in the production cluster.

What are the Challenges in the Traditional AI Lifecycle?

The traditional process described above is fraught with challenges:

  • Engineering Technologies and Resources – Data teams, DevOps and engineers each use different technologies and frameworks. This creates technological friction and inconsistency across AI pipelines and silos between teams, demanding a solution to streamline and automate the entire process.
  • Resource Management – AI models, and especially LLMs, often require substantial memory and GPU resources, which are in low supply and costly. Plus, compute requirements are not consistent throughout the workflow. For example, data processing and training might require more resources. Enterprise teams need a solution to auto-scale and to allocate and monitor deployment resources easily.
  • Versioning and Experiment Tracking – Distributed systems are convoluted and dispersed and teams lack holistic visibility into them, making it complex to track changes, metrics and results for each model or artifact. This requires versioning capabilities and artifact management solutions.
  • Data Privacy – LLMs may handle sensitive user data, which needs to be safeguarded to protect user privacy and abide by compliance requirements. Guardrails must be implemented in any live business application.
  • Monitoring – Production models can degrade over time due to data drift and changing real-world conditions, leading to poor performance. Plus, LLMs might hallucinate or have inherent bias, requiring LiveOps and guardrails.
  • Kubernetes Complexity – Deploying models or running a user workflow in production requires extensive understanding of Kubernetes, like the ability to manage and deploy k8s resources, collect the necessary logs, and tune resource requests and limits. Most data professionals have expertise in other technologies, so it is challenging for them to effectively run a job, serve the model and understand how their code behaves in production for monitoring purposes.

The Core Advantages of MLRun

MLRun addresses these challenges by allowing you to easily run your local code in K8s production environments as a batch job or a remote real-time deployment. MLRun eliminates the need to worry about the complexity of Kubernetes, abstracting and streamlining the process. MLRun also supports scaling and configuring resources such as GPU, memory and CPU, providing a simple way to scale without requiring users to understand the inner workings of Kubernetes.
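For example, a local training script can be wrapped as a batch job and sized for the cluster roughly like this (the function name, file name, handler and resource values are assumptions for illustration):

```python
import mlrun

# Wrap a local training script as a Kubernetes batch job and size its resources
fn = mlrun.code_to_function(
    name="fine-tune",            # hypothetical function name
    filename="fine_tune.py",     # hypothetical local script
    kind="job",
    image="mlrun/mlrun",
    handler="train",
)
fn.with_requests(mem="8G", cpu=2)          # minimum resources requested from Kubernetes
fn.with_limits(mem="16G", cpu=4, gpus=1)   # upper bound, including one GPU
fn.run(params={"epochs": 3})               # runs on the cluster, not on your laptop
```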

What’s left is simply to monitor the functionality and behavior of your AI system once it’s live, which can also take place in MLRun.

Here’s how MLRun achieves this:

  • Orchestration – MLRun orchestrates workflows across all AI development and deployment tasks like data preprocessing, model training and fine-tuning, serving, etc. These pipelines are modular and components can be swapped out and replaced, future-proofing the architecture.
  • Auto-Scaling – MLRun allows auto-scaling deployments across the Kubernetes cluster.
  • Containerized Environment – MLRun packages models, code and dependencies into containers for Kubernetes-based deployment.
  • Serverless Model Serving – MLRun integrates with Nuclio, a high-performance serverless framework, to enable lightweight and scalable deployments (see the sketch after this list).
  • Version Control – MLRun provides built-in versioning for datasets, code and models, ensuring reproducibility.
  • Artifact Management System – MLRun manages the artifact registry and enables managing artifacts by type (models, datasets and others), labels and tags. In addition, MLRun stores relevant metadata such as model features, stats and more.
  • Real-Time Monitoring – MLRun integrates monitoring capabilities to track model performance, latency and resource utilization of individual workflows and deployments, and more – in real time.
  • Logs Forwarding – MLRun supports logs forwarding, and a clear and easy UI logs screen for debugging and checking your deployment logs.
  • Framework Integrations – MLRun integrates seamlessly with popular ML and deep learning frameworks like TensorFlow, PyTorch, Hugging Face and scikit-learn.
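As referenced in the serverless model serving bullet, the sketch below outlines one way to deploy a registered model behind a real-time endpoint. The project name, serving.py and MyModelServer are hypothetical; MyModelServer would be a user-defined class extending mlrun.serving.V2ModelServer.

```python
import mlrun

project = mlrun.get_or_create_project("serving-demo", context="./")

# serving.py (hypothetical) defines MyModelServer, a class extending
# mlrun.serving.V2ModelServer that implements load() and predict()
serving_fn = mlrun.code_to_function(
    name="model-server",
    filename="serving.py",
    kind="serving",
    image="mlrun/mlrun",
)

# Route requests to a model registered in the project's artifact store
serving_fn.add_model(
    "churn-model",
    model_path=project.get_artifact_uri("churn-model", category="model"),
    class_name="MyModelServer",
)

project.deploy_function(serving_fn)   # builds and deploys the real-time Nuclio endpoint
```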

What is the AI Model Lifecycle with MLRun?

Here’s what the same process looks like, but with MLRun:

Before MLRun: You want to run a batch fine-tuning job for your LLM, but your code requires a lot of memory, CPU and GPUs. It also needs a number of Python packages to run and fine-tune the LLM.
After MLRun: This flow becomes very simple. You only need to connect your local IDE to MLRun, create a project, create an MLRun function, and run your code with the relevant resources. With this flow, you can develop and run your code in Kubernetes from the beginning of the development phase with only a few lines of code.

Before MLRun: You must run your code on your K8s cluster because your local computer doesn’t have enough resources. For this, you need to create a K8s resource and maybe a new Docker image with the new Python requirements.
After MLRun: To run your code in a Kubernetes cluster, create an MLRun function that runs your Python code, then add the required resources (memory, CPU and GPU) and the Python requirements. MLRun will use those values to run your fine-tuning job in Kubernetes and manage the deployment.

Before MLRun: Once you’ve successfully run the function on the K8s cluster, you need to version and track your experiment results (the LLM and the fine-tuning job results). This is essential to understand where and why you need to improve your fine-tuning job.
After MLRun: Now that you have a model that has been fine-tuned by the MLRun function, you can track the model artifact in the MLRun model artifactory, with the model version, labels and model metrics.

Before MLRun: In some projects, the model inference is done in batch, in others it’s in real time. If this is a real-time deployment, you need to create a K8s resource that serves the model with the user prompts, or create a batch job that does the same. Both should run in the K8s cluster for production testing, and you need to manage those resources yourself.
After MLRun: In MLRun, you can do both. You can serve your LLM in real time, or collect the prompts and run the same flow in batch for LLM evaluation, in just a couple of lines of code.

Before MLRun: Once you serve the model, you need to monitor and test how your model is behaving and whether the model outputs meet your criteria for deployment in production, using accuracy, performance or other custom metrics.
After MLRun: Once you serve the model, monitor your LLM inputs and outputs and check the model’s performance and usage by enabling MLRun model monitoring. This is an essential part of model development, helping you understand whether you need to retrain the model or adjust its outputs so they meet your criteria for deployment in production.

Before MLRun: Once your project is ready to deploy to production systems, you need to run some of the steps above again in the production cluster.
After MLRun: Once your project is ready for production, you can easily move the same project configuration from the development system to the production system using MLRun’s CI/CD automation.
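For the last step, moving a project from development to production, the flow can look roughly like the sketch below. The project name, the Git URL and the workflow name "main" are placeholders, and it assumes the project spec has been committed to Git and a workflow has been registered on the project (for example with project.set_workflow()).

```python
import mlrun

# In the development environment: save the project spec so it can be committed to Git
project = mlrun.get_or_create_project("my-project", context="./")
project.save()   # writes project.yaml (functions, workflows, artifact references)

# In the production environment: load the same project from its Git source
# (the repository URL is a placeholder) and run a registered workflow
prod_project = mlrun.load_project(
    context="./prod",
    url="git://github.com/example-org/my-project.git#main",
)
prod_project.run("main", watch=True)   # assumes a workflow named "main" was registered
```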

MLRun can take your code and run and manage your functions and artifacts in Kubernetes environments from your first deployment. This allows you to focus on development and decreases the time needed to deploy AI projects in production, while maintaining a production-first mindset.

How to Get Started with MLRun

1. On your laptop, install MLRun and configure your remote environment. Now you have your MLRun environment ready to develop your project from your laptop to production.

2. Create your MLRun project by using the MLRun SDK.

3. Run your Python code as an MLRun function. For a remote or batch function, you can run your code locally or on your k8s cluster from the beginning of the development phase (always keeping a production-first mindset). You can also log models and other artifact types to the experiment tracking system (see the end-to-end sketch after these steps).

4. Based on the run and the experiment tracking, you can monitor your results and make the path to production easier and more convenient.
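Putting the four steps together, a minimal end-to-end sketch might look like this. The API URL, artifact path, project name, script, handler and parameters are all illustrative placeholders, not values from the original post.

```python
import mlrun

# 1. Point the MLRun SDK at your installed MLRun service
#    (the API URL and artifact path are placeholders for your deployment)
mlrun.set_environment("http://mlrun-api.example.local:8080", artifact_path="./artifacts")

# 2. Create (or load) a project
project = mlrun.get_or_create_project("quickstart", context="./")

# 3. Register a local script as an MLRun function and run it as a batch job;
#    train.py and its "train" handler are hypothetical
project.set_function("train.py", name="trainer", kind="job",
                     image="mlrun/mlrun", handler="train")
run = project.run_function("trainer", params={"learning_rate": 1e-4})

# Inside the handler you can log models and other artifacts, for example:
# def train(context, learning_rate: float = 1e-4):
#     ...
#     context.log_result("loss", 0.12)
#     context.log_model("my-model", model_file="model.pkl")

# 4. Inspect the tracked results
print(run.outputs)   # logged results and artifact URIs
print(run.state())   # run state (completed, error, ...)
```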

