MLRun + NVIDIA NeMo: Building Observable AI Data Flywheels in Production

NVIDIA NeMo and Iguazio streamline training, evaluation, fine-tuning and monitoring of AI models at scale, ensuring high performance and low latency while lowering costs

We’ve integrated MLRun with NVIDIA NeMo microservices to extend NVIDIA’s Data Flywheel Blueprint. This integration lets you automatically train, evaluate, fine-tune and monitor AI models at scale, while ensuring low latency and reduced resource use. Read on for all the details.

What are NVIDIA NeMo Microservices?

NVIDIA NeMo is a modular microservices platform for building and continuously improving agentic AI systems.

It provides:

  • RAG implementations
  • Model customization
  • Model evaluation 
  • Guardrails for optimized agent behavior
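As a rough idea of how these services can be driven programmatically, the sketch below submits a customization (fine-tuning) job and then an evaluation job over REST. The base URL, route paths and payload fields here are assumptions for illustration, not a documented contract; consult the NeMo microservices documentation for the actual API.

```python
import requests

# Hypothetical base URL for a NeMo microservices deployment; substitute your own.
NEMO_BASE = "http://nemo.example.internal"

# Submit a customization (fine-tuning) job to the Customizer service.
# Routes and payload fields are assumed for illustration.
customization = requests.post(
    f"{NEMO_BASE}/v1/customization/jobs",
    json={
        "config": "meta/llama-3.1-8b-instruct",  # assumed base model identifier
        "dataset": {"name": "support-tickets"},  # assumed dataset reference
        "hyperparameters": {"training_type": "sft", "finetuning_type": "lora"},
    },
    timeout=30,
)
customization.raise_for_status()
job_id = customization.json()["id"]  # assumed response field

# Separately, submit the resulting model to the Evaluator service for benchmarking.
evaluation = requests.post(
    f"{NEMO_BASE}/v1/evaluation/jobs",
    json={
        "target": {"type": "model", "model": f"customized-{job_id}"},
        "config": "llm-as-a-judge",  # assumed evaluation config name
    },
    timeout=30,
)
evaluation.raise_for_status()
print("evaluation job:", evaluation.json()["id"])
```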

What is an AI Data Flywheel?

A data flywheel is a process that continuously improves models and AI agents through production feedback loops: inference results, business data and user preferences are fed back into the models, creating a continuous loop in which AI models improve over time. In NVIDIA’s high-level flow, a Data Flywheel cycles from production logs through training and evaluation, feedback and redeployment, then back to monitoring.
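To make the loop concrete, here is a minimal, runnable sketch of one flywheel iteration. Everything in it (the Model dataclass, the step functions, the numbers) is an illustrative stand-in for real pipeline stages, not part of the NeMo or MLRun APIs.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    accuracy: float

# Hypothetical placeholder steps; in a real flywheel each would be a
# pipeline stage (log collection, curation, NeMo fine-tuning, evaluation).
def collect_production_logs(model: Model) -> list[str]:
    return ["user question 1", "user question 2"]

def fine_tune(model: Model, logs: list[str]) -> Model:
    # Pretend fine-tuning on curated logs nudges accuracy upward.
    return Model(f"{model.name}-ft", min(1.0, model.accuracy + 0.02))

def evaluate(model: Model) -> float:
    return model.accuracy

def flywheel_iteration(model: Model, accuracy_target: float) -> Model:
    logs = collect_production_logs(model)       # inference results + feedback
    candidate = fine_tune(model, logs)          # customize on production data
    if evaluate(candidate) >= accuracy_target:  # promote only if it clears the bar
        return candidate                        # deploy the improved model
    return model                                # otherwise keep the current one

model = Model("small-llm", 0.80)
for _ in range(3):                              # the loop is the flywheel
    model = flywheel_iteration(model, accuracy_target=0.82)
print(model)
```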

How MLRun + NeMo Work Together

Iguazio has collaborated with NVIDIA to power enterprise data flywheels with MLRun. MLRun acts as the flywheel orchestrator, wrapping the flywheel and driving training, fine-tuning to a specific use case, evaluation and monitoring, while NeMo serves as the customizer and evaluator.

How the integration works (see the sketch after this list):

1. Monitor – MLRun ingests interaction logs and evaluates performance, stability and resource usage. This helps organizations detect and mitigate the risks associated with GenAI and AI.
2. Train & Evaluate – NVIDIA NeMo Customizer trains and fine-tunes models with LoRA, p-tuning and supervised fine-tuning; NVIDIA NeMo Evaluator benchmarks candidate models with zero-shot, RAG and LLM-as-a-Judge evaluations. MLRun orchestrates both.
3. Feedback – MLRun orchestrates feedback from human-in-the-loop decisions.
4. Deploy – MLRun automates updates and redeployments.
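As an illustration of MLRun in the orchestrator role, the sketch below wires these steps into an MLRun project. The project layout, function names and handler names (flywheel.py, ingest_logs, run_nemo_customizer, run_nemo_evaluator) and the output keys are assumptions for this example; only the MLRun calls themselves (get_or_create_project, set_function, run_function) are standard MLRun API.

```python
import mlrun

# Create or load an MLRun project to host the flywheel pipeline.
project = mlrun.get_or_create_project("data-flywheel", context="./")

# Register the pipeline steps as MLRun functions.
# The handlers are assumed to live in a local flywheel.py.
project.set_function("flywheel.py", name="monitor", kind="job",
                     image="mlrun/mlrun", handler="ingest_logs")
project.set_function("flywheel.py", name="customize", kind="job",
                     image="mlrun/mlrun", handler="run_nemo_customizer")
project.set_function("flywheel.py", name="evaluate", kind="job",
                     image="mlrun/mlrun", handler="run_nemo_evaluator")

# 1. Monitor: ingest interaction logs from production.
logs = project.run_function("monitor", params={"window_hours": 24})

# 2. Train & Evaluate: fine-tune via NeMo Customizer, then benchmark
#    the candidate with NeMo Evaluator ("curated_logs"/"model_name" are
#    assumed output keys produced by the handlers above).
tuned = project.run_function(
    "customize",
    inputs={"dataset": logs.outputs["curated_logs"]},
    params={"finetuning_type": "lora"},
)
scores = project.run_function(
    "evaluate",
    params={"model": tuned.outputs["model_name"],
            "benchmarks": ["zero-shot", "llm-as-a-judge"]},
)

# 3-4. Feedback & Deploy would gate redeployment on these evaluation scores.
print(scores.outputs)
```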

Use case example:

Say we want to improve a small model’s performance to match a larger model. The Data Flywheel runs experiments on production logs against candidate models and surfaces efficient models that meet the accuracy target, as in the sketch below.
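A toy version of that selection step might look like the following; the model names, accuracy numbers and cost figures are invented for illustration.

```python
# Candidates scored against production logs (illustrative values only).
candidates = [
    {"model": "large-baseline-70b", "accuracy": 0.91, "cost_per_1k_tokens": 0.60},
    {"model": "small-candidate-8b", "accuracy": 0.90, "cost_per_1k_tokens": 0.08},
    {"model": "small-candidate-3b", "accuracy": 0.84, "cost_per_1k_tokens": 0.03},
]

ACCURACY_TARGET = 0.89  # e.g. within ~2 points of the large baseline

# Keep candidates that meet the target, then promote the cheapest one.
viable = [c for c in candidates if c["accuracy"] >= ACCURACY_TARGET]
best = min(viable, key=lambda c: c["cost_per_1k_tokens"])
print(f"promote {best['model']} ({best['accuracy']:.0%} accuracy)")
```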

The Benefits of Using the Data Flywheel

  • 60% code reduction
  • End-to-end automation of monitoring, training, evaluation and fine-tuning
  • Continuous improvement
  • Faster and simpler LLM tuning
  • Scalability across multiple models, workflows and environments
  • Lower inference costs and reduced latency
  • Future-proof: models stay current through ongoing optimization

Explore the joint Iguazio MLRun and NVIDIA blueprint to try it for yourself.
