MLRun + NVIDIA NeMo: Building Observable AI Data Flywheels in Production

NVIDIA NeMo and Iguazio streamline training, evaluation, fine-tuning and monitoring of AI models at scale, ensuring high performance and low latency while lowering costs

We’ve integrated MLRun with NVIDIA NeMo microservices to extend NVIDIA’s Data Flywheel Blueprint. This integration lets you automatically train, evaluate, fine-tune and monitor AI models at scale, while ensuring low latency and reduced resource use. Read on for the details:

What are NVIDIA NeMo Microservices?

NVIDIA NeMo is a modular microservices platform for building and continuously improving agentic AI systems.

It provides:

  • RAG implementations
  • Model customization
  • Model evaluation 
  • Guardrails for optimized agent behavior

What is an AI Data Flywheel?

A data flywheel is a process that continuously improves models and AI agents through production feedback loops: inference results, business data and user preferences are fed back to the models, creating a continuous cycle in which the AI models improve over time. According to NVIDIA, the high-level flow of a data flywheel follows the steps described in the next section.
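
To make the loop concrete, here is a toy sketch of one flywheel turn in Python. Every name and number below is a hypothetical placeholder rather than an MLRun or NeMo API; a real pipeline would delegate each step to the services described in the next section.

import random

# Toy sketch of a data flywheel loop; all functions are hypothetical stand-ins.

def collect_production_logs(model):
    # Stand-in for ingesting inference results and user feedback from production.
    return [{"prompt": f"q{i}", "thumbs_up": random.random() > 0.3} for i in range(100)]

def curate(logs):
    # Keep interactions with positive user feedback as the training signal.
    return [rec for rec in logs if rec["thumbs_up"]]

def fine_tune(model, dataset):
    # Stand-in for a customization job (e.g. LoRA) on the curated data.
    return {"name": model["name"], "version": model["version"] + 1}

def evaluate(model):
    # Stand-in for a benchmark score; here newer versions simply score higher.
    return 0.70 + 0.01 * model["version"]

model = {"name": "small-llm", "version": 0}
for _ in range(3):  # each pass is one turn of the flywheel
    candidate = fine_tune(model, curate(collect_production_logs(model)))
    if evaluate(candidate) >= evaluate(model):  # promote only on improvement
        model = candidate
print(model)  # {'name': 'small-llm', 'version': 3}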

How MLRun + NeMo Work Together

Iguazio has collaborated with NVIDIA to power enterprise data flywheels with MLRun. MLRun acts as the flywheel orchestrator, powering training, fine-tuning to a specific use case, evaluation and monitoring, while NeMo serves as the customizer and evaluator.

How the integration works (a code sketch follows the list):

1. Monitor – MLRun ingests interaction logs and evaluates performance, stability and resource usage. This helps organizations detect and mitigate the risks associated with generative AI.
2. Train & Evaluate – NVIDIA NeMo Customizer trains and fine-tunes models with LoRA, p-tuning and supervised fine-tuning. NVIDIA NeMo Evaluator benchmarks candidate models with zero-shot, RAG and LLM-as-a-Judge evaluations. Both are orchestrated by MLRun.
3. Feedback – MLRun orchestrates feedback from human-in-the-loop decisions.
4. Deploy – MLRun automates model updates and redeployments.
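
As a rough sketch, the steps above could be wired together as MLRun functions. The MLRun calls below (get_or_create_project, set_function, run_function) are real API; flywheel_steps.py and its handlers (monitor, customize, evaluate, deploy), along with their inputs and outputs, are hypothetical placeholders for code that would call the NeMo Customizer and Evaluator services.

import mlrun

# Create (or load) an MLRun project to act as the flywheel orchestrator.
project = mlrun.get_or_create_project("nemo-flywheel", context="./")

# Register the flywheel steps as an MLRun job; the module and its handlers
# are hypothetical placeholders that would call the NeMo microservices.
project.set_function("flywheel_steps.py", name="flywheel", kind="job", image="mlrun/mlrun")

# 1. Monitor: ingest interaction logs and compute performance metrics.
logs = project.run_function("flywheel", handler="monitor")

# 2. Train & evaluate: submit a NeMo Customizer job (e.g. LoRA) on the curated
#    data, then benchmark the candidate with NeMo Evaluator.
candidate = project.run_function("flywheel", handler="customize",
                                 inputs={"dataset": logs.outputs["dataset"]})
scores = project.run_function("flywheel", handler="evaluate",
                              inputs={"model": candidate.outputs["model"]})

# 3-4. Feedback & deploy: redeploy only if the candidate meets its targets.
project.run_function("flywheel", handler="deploy", params={"scores": scores.outputs})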

Use case example:

Let’s say we want to improve a small model’s performance until it matches a larger model. The data flywheel runs experiments on production logs against candidate models and surfaces efficient models that meet the accuracy target.
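
For illustration, the selection logic boils down to something like the following toy snippet; the model names, accuracies and costs are made up.

# Pick the cheapest candidate whose accuracy on production logs matches the
# large reference model; all figures below are illustrative.
TARGET_ACCURACY = 0.86  # accuracy of the large reference model

candidates = [
    {"model": "llm-70b", "accuracy": 0.87, "cost_per_1k_tokens": 0.60},
    {"model": "llm-8b",  "accuracy": 0.86, "cost_per_1k_tokens": 0.08},
    {"model": "llm-1b",  "accuracy": 0.78, "cost_per_1k_tokens": 0.02},
]

viable = [c for c in candidates if c["accuracy"] >= TARGET_ACCURACY]
best = min(viable, key=lambda c: c["cost_per_1k_tokens"])
print(best["model"])  # -> llm-8b: meets the target at a fraction of the cost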

The Benefits of Using the Data Flywheel

  • 60% code reduction
  • End-to-end automation of monitoring, training, evaluation and fine-tuning
  • Continuous improvement
  • Faster and simpler LLM tuning
  • Scalability across multiple models, workflows and environments
  • Lower inference costs and reduced latency
  • Future-proof – models stay current via ongoing optimization

Explore the joint Iguazio MLRun and NVIDIA blueprint to try it for yourself.
