Gilad Shapira, Author at MLRun.org

MLRun is an open-source framework that orchestrates the entire generative AI lifecycle, from development to deployment in Kubernetes. In this article, we’ll show how MLRun replaces manual deployment processes, allowing you to get from your notebook to production in just a few lines of code.

What is the Traditional AI Application Lifecycle?

As a data professional, you’re probably familiar with the following process:

You want to run a batch fine-tuning job for your LLM, but your code requires a lot of memory, CPUs and/or GPUs. It also depends on a number of Python packages to run and fine-tune the LLM.

You must run your code on your K8s cluster because your local computer doesn’t have enough resources. For this, you need to create a K8s resource and maybe a new Docker image with the new Python requirements.

Once you’ve successfully run the function on the K8s cluster, you need to version and track your experiment results (in this case the LLM and the fine-tuning job results). This is essential to understand where and why you need to improve your fine-tuning job.

In some projects, the model inference is done in a batch, in others it’s in real-time. If this is a real-time deployment, you need to create a K8s resource that serves the model with the user prompts or create a batch job that does the same. Both should run in the K8s cluster for production testing, and you’ll need to manage those resources by yourself.

Once you serve the model, you need to monitor and test how it behaves and whether its outputs meet your criteria for deployment in production, using accuracy, performance or other custom metrics.

Once your project is ready to deploy in production systems, you need to run some of the steps above again in the production cluster.
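
To make the manual route concrete, here is a minimal sketch of the kind of Kubernetes Job manifest you would otherwise write and maintain yourself for the fine-tuning step. The job name, image, script name and resource amounts are hypothetical placeholders, not values from a real project.

```python
import json


def build_finetune_job(name, image, cpu, memory, gpus):
    """Return a Kubernetes batch/v1 Job manifest as a plain dict."""
    resources = {
        "requests": {"cpu": str(cpu), "memory": memory},
        "limits": {"cpu": str(cpu), "memory": memory},
    }
    if gpus:
        # GPUs are exposed through a device-plugin resource name and are
        # declared under limits.
        resources["limits"]["nvidia.com/gpu"] = str(gpus)
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,  # do not retry a failed fine-tuning run
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "command": ["python", "finetune.py"],
                            "resources": resources,
                        }
                    ],
                }
            },
        },
    }


# Hypothetical values for illustration only.
manifest = build_finetune_job(
    name="llm-finetune",
    image="registry.example.com/llm-finetune:0.1",
    cpu=8,
    memory="32Gi",
    gpus=1,
)
print(json.dumps(manifest, indent=2))
```

Even with the manifest in hand, you would still need to build and push the image, apply the manifest to the cluster, collect the logs and record the results yourself.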

What are the Challenges in the Traditional AI Lifecycle?

The traditional process described above is fraught with challenges:

Engineering Technologies and Resources – Data teams, DevOps and engineers each use different technologies and frameworks. This creates technological friction and inconsistency across AI pipelines and silos between teams, demanding a solution to streamline and automate the entire process.

Resource Management – AI models, and especially LLMs, often require substantial memory and GPU resources, which are costly and in short supply. Plus, compute requirements are not consistent throughout the workflow. For example, data processing and training might require more resources than other stages. Enterprise teams need a solution to auto-scale and to allocate and monitor deployment resources easily.

Versioning and Experiment Tracking – Distributed systems are convoluted and dispersed and teams lack holistic visibility into them, making it complex to track changes, metrics and results for each model or artifact. This requires versioning capabilities and artifact management solutions.

Data Privacy – LLMs may handle sensitive user data, which needs to be safeguarded to protect user privacy and abide by compliance requirements. Guardrails must be implemented in any live business application.

Monitoring – Production models can degrade over time due to data drift and changing real-world conditions, leading to poor performance. Plus, LLMs might hallucinate or have inherent bias, requiring LiveOps and guardrails.

Kubernetes Complexity – Deploying models or running a user workflow in production requires extensive understanding of Kubernetes, such as the ability to manage and deploy K8s resources, collect the necessary logs and tune resource requests and limits. Most data professionals have expertise in other technologies. As a result, it is challenging for them to run a job effectively, serve the model and understand how their code is behaving in production for monitoring purposes.

The Core Advantages of MLRun

MLRun addresses these challenges by allowing you to easily run your local code in K8s production environments as a batch job or a remote real-time deployment. MLRun abstracts and streamlines the process, so you don’t need to worry about the complexity of Kubernetes. It also provides a simple way to configure and scale resources such as GPU, memory and CPU, without requiring users to understand the inner workings of Kubernetes.

What’s left is simply to monitor the functionality and behavior of your AI system once it’s live, which can also take place in MLRun.

Here’s how MLRun achieves this:

Orchestration – MLRun orchestrates workflows across all AI development and deployment tasks like data preprocessing, model training and fine-tuning, serving, etc. These pipelines are modular and components can be swapped out and replaced, future-proofing the architecture.

Auto-Scaling – MLRun allows auto-scaling deployments across the Kubernetes cluster.

Containerized Environment – MLRun packages models, code and dependencies into containers for Kubernetes-based deployment.

Serverless Model Serving – MLRun integrates with Nuclio, a high-performance serverless framework, to enable lightweight and scalable deployments.

Version Control – MLRun provides built-in versioning for datasets, code and models, ensuring reproducibility.

Artifact Management System – MLRun manages the artifact registry and lets you organize artifacts by type (models, datasets and others), labels and tags. In addition, MLRun stores relevant metadata such as model features, stats and more.

Real-Time Monitoring – MLRun integrates monitoring capabilities to track model performance, latency and resource utilization of individual workflows and deployments, and more – in real time.

Log Forwarding – MLRun supports log forwarding and provides a clear, easy-to-use log screen in the UI for debugging and checking your deployment logs.

Framework Integration – MLRun integrates seamlessly with popular ML and deep learning frameworks like TensorFlow, PyTorch, Hugging Face and scikit-learn.
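
As a rough illustration of how the serving, auto-scaling and monitoring capabilities listed above can come together, here is a minimal sketch using the MLRun Python SDK. It assumes an existing MLRun project and a model that has already been logged. The function name `llm-server`, the model key `llm` and the replica counts are hypothetical, and `class_name` must point at a model server class of your own that is available to the function.

```python
def deploy_llm_server(project, model_uri, class_name,
                      min_replicas=1, max_replicas=4):
    """Deploy a real-time serving function (sketch; needs an MLRun service).

    project    -- an MLRun project object
    model_uri  -- URI of a model artifact that was logged earlier
    class_name -- your own model server class (an assumption of this sketch)
    """
    import mlrun  # imported lazily so this file loads without MLRun installed

    # A "serving" function is deployed as a Nuclio serverless function.
    serving_fn = mlrun.new_function(
        "llm-server", project=project.name, kind="serving", image="mlrun/mlrun"
    )
    serving_fn.add_model("llm", model_path=model_uri, class_name=class_name)

    # Bounds for auto-scaling the deployment.
    serving_fn.spec.min_replicas = min_replicas
    serving_fn.spec.max_replicas = max_replicas

    # Turn on model monitoring for the served model.
    serving_fn.set_tracking()

    project.deploy_function(serving_fn)
    return serving_fn
```

Once deployed, the returned function object can be called with `serving_fn.invoke(...)` to send a test request. The request path and body depend on the model server class you provide.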

What is the AI Model Lifecycle with MLRun?

Here’s what the same process looks like, but with MLRun:

Before MLRun	After MLRun
You want to run a batch fine-tuning job for your LLM, but your code requires a lot of memory, CPUs and/or GPUs. It also depends on a number of Python packages to run and fine-tune the LLM.	With MLRun, this flow is very simple. You only need to connect your local IDE to MLRun, create a project, create an MLRun function and run your code with the relevant resources. With this flow, you can develop and run your code in Kubernetes from the beginning of the development phase, with only a few lines of code.
You must run your code on your K8s cluster because your local computer doesn’t have enough resources. For this, you need to create a K8s resource and maybe a new Docker image with the new Python requirements.	To run your code in a Kubernetes cluster, create an MLRun function that runs your Python code. Then specify the resources it needs (memory, CPU and GPU) and its Python requirements. MLRun uses those values to run your fine-tuning job in Kubernetes and manages the deployment.
Once you’ve successfully run the function on the K8s cluster, you need to version and track your experiment results (the LLM and the fine-tuning job results). This is essential to understand where and why you need to improve your fine-tuning job.	Now that you have a model that has been fine-tuned by the MLRun function, you can track the model artifact in the MLRun artifact registry, along with its version, labels and metrics.
In some projects, the model inference is done in a batch, in others it’s in real-time. If this is a real-time deployment, you need to create a K8s resource that serves the model with the user prompts or create a batch job that does the same. Both should run in the K8s cluster for production testing, and you need to manage those resources by yourself.	In some projects, the model inference is done in a batch, in others it’s in real-time. In MLRun, you can do both. You can serve your LLM in real-time, or collect the prompts and run them in batch for LLM evaluation, in just a couple of lines of code.
Once you serve the model, you need to monitor and test how it behaves and whether its outputs meet your criteria for deployment in production, using accuracy, performance or other custom metrics.	Once you serve the model, enable MLRun model monitoring to track your LLM inputs and outputs and to check the model’s performance and usage. This is an essential part of model development, helping you understand whether you need to retrain the model or adjust its outputs so they meet your criteria for deployment in production.
Once your project is ready to deploy in production systems, you need to run some of the steps above again in the production cluster.	Once your project is ready for production, you can easily move the same project configuration from your dev system to your production system by using MLRun CI/CD automation.

MLRun can take your code and run and manage your functions and artifacts in Kubernetes environments from your first deployment. This allows you to focus on development and decreases the time needed to deploy AI projects in production, while maintaining a production-first mindset.
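
The experiment-tracking part of this flow can be sketched as a handler function. When MLRun runs a handler that accepts a `context` argument, it injects a run context that the handler can use to log results and model artifacts. The handler below is a minimal sketch under stated assumptions: the function name, parameter names, artifact key and the fake loss curve are hypothetical stand-ins for a real fine-tuning loop.

```python
def finetune(context, base_model: str, epochs: int = 1):
    """Handler that MLRun can run as a job; `context` is injected by MLRun."""
    context.logger.info(f"fine-tuning {base_model} for {epochs} epoch(s)")

    # Placeholder for the real training loop: this only fakes a loss curve.
    losses = [1.0 / (epoch + 1) for epoch in range(epochs)]
    final_loss = losses[-1]

    # Scalar results are stored with the run so experiments can be compared.
    context.log_result("final_loss", final_loss)

    # Register the model as a versioned artifact with metrics and labels.
    context.log_model(
        "finetuned-llm",
        body=b"placeholder model bytes",  # a real job would log the model files
        model_file="model.bin",
        metrics={"final_loss": final_loss},
        labels={"base_model": base_model},
    )
```

Because the handler only relies on the `context` object it receives, the same file can be run locally during development and on the cluster later.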

How to Get Started with MLRun

1. On your laptop, install MLRun and configure your remote environment. Your MLRun environment is now ready to take your project from your laptop to production.

2. Create your MLRun project by using the MLRun SDK.

3. Run your Python code as an MLRun function. For a remote or batch function, you can run your code locally or on your K8s cluster from the beginning of the development phase (always keep a production-first mindset). You can also log models and other artifact types to the experiment tracking system.

4. Use the run and the experiment tracking results to monitor your outcomes and make the path to production easier.
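
The four steps above can be sketched with the MLRun Python SDK roughly as follows. This is a minimal sketch, not a definitive setup: the env file name, project name, `finetune.py` file and handler, package list, parameter values and resource sizes are all hypothetical, and running it requires a reachable MLRun service.

```python
# Step 1 (once, in your local environment): pip install mlrun

# Hypothetical resource sizes for the fine-tuning job.
FINETUNE_RESOURCES = {"mem": "32G", "cpu": 8, "gpus": 1}


def run_finetune_job(env_file="mlrun.env", project_name="llm-finetune"):
    """Steps 1-4 in code; needs a reachable MLRun service to actually run."""
    import mlrun  # imported lazily so this file loads without MLRun installed

    # Step 1: point the SDK at the remote MLRun service.
    mlrun.set_env_from_file(env_file)

    # Step 2: create (or load) the project.
    project = mlrun.get_or_create_project(project_name, context="./")

    # Step 3: register local code as a job function, then size it.
    fn = project.set_function(
        func="finetune.py",
        name="finetune",
        kind="job",
        image="mlrun/mlrun",
        handler="finetune",
        requirements=["transformers", "peft"],
    )
    fn.with_requests(mem=FINETUNE_RESOURCES["mem"], cpu=FINETUNE_RESOURCES["cpu"])
    fn.with_limits(
        mem=FINETUNE_RESOURCES["mem"],
        cpu=FINETUNE_RESOURCES["cpu"],
        gpus=FINETUNE_RESOURCES["gpus"],
    )

    # local=False runs on the cluster; local=True runs in your own process.
    run = project.run_function(
        fn,
        params={"base_model": "my-base-model", "epochs": 1},
        local=False,
        auto_build=True,  # build an image with the requirements if needed
    )

    # Step 4: the logged results and artifacts are available on the run.
    return run.outputs
```

Switching `local` between `True` and `False` is what lets the same code move between your laptop and the cluster during development.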

More Resources:

- Intro to what MLRun can do

- MLRun functions

- MLRun tutorials

- MLRun demos

Bringing (Gen) AI from Laptop to Production with MLRun
