
Deploying Gen AI in Production with NVIDIA NIM and MLRun

Watch: How to build a modular, production-grade Gen AI chatbot using NVIDIA NIM + MLRun

In this end-to-end demo, you’ll learn how to create a multi-agent banking chatbot using NVIDIA NIM, LangChain, and MLRun. The demo walks through:

  • Deploying an LLM with a production-first mindset using MLRun’s serverless application runtime
  • Building an intent classification system that routes user queries to the right agent (loans, investments, general banking), as sketched in the first code example after this list
  • Operationalizing and monitoring LLMs with MLRun’s built-in LLM gateway—enabling cost optimization, versioning, and use-case-level observability
  • Using LLM-as-a-judge to automatically evaluate model quality (see the judge sketch below)
  • Wrapping the entire pipeline into a reusable, trackable gen AI application using MLRun’s workflow orchestration
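
For a concrete feel of the MLRun side, here is a minimal sketch of how such an intent-routing serving function might be wired and deployed. It is not the demo's actual code: handler.py, classify_intent, and route_to_agent are hypothetical placeholders for your own classifier and agent-dispatch handlers.

```python
import mlrun

# Create (or load) the project that owns the chatbot assets
project = mlrun.get_or_create_project("banking-chatbot", context="./")

# Wrap local handler code as a serverless real-time serving function
fn = project.set_function(
    "handler.py", name="chatbot", kind="serving", image="mlrun/mlrun"
)

# Serving graph: classify each query's intent, then route it to the matching agent
graph = fn.set_topology("flow", engine="async")
graph.to(handler="classify_intent", name="intent") \
     .to(handler="route_to_agent", name="router") \
     .respond()

# Build and deploy the function as an auto-scaling HTTP endpoint
project.deploy_function(fn)
```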

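The evaluation step can be similarly lightweight. Because NIM serves an OpenAI-compatible API, an LLM-as-a-judge call can go through the standard openai client; the endpoint URL, model name, and one-number rubric below are illustrative assumptions rather than the demo's actual settings.

```python
from openai import OpenAI

# Point the client at a (hypothetical) NIM endpoint
client = OpenAI(base_url="http://nim-host:8000/v1", api_key="not-used")

JUDGE_PROMPT = """You are grading a banking chatbot answer.
Question: {question}
Answer: {answer}
Score the answer from 1 (poor) to 5 (excellent). Reply with the number only."""

def judge(question: str, answer: str) -> int:
    """Ask a judge model to score an answer; returns an integer score 1-5."""
    resp = client.chat.completions.create(
        model="meta/llama-3.1-8b-instruct",  # any NIM-hosted judge model
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(question=question, answer=answer),
        }],
        temperature=0,
    )
    return int(resp.choices[0].message.content.strip())

print(judge("What is the minimum loan term?", "Our personal loans start at 12 months."))
```
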
See the chatbot in action as it dynamically responds to nuanced banking questions, switching between agents in real time. This demo showcases how to move beyond experimentation and build robust, traceable, and scalable gen AI systems that deliver real value.