
Deploying Gen AI in Production with NVIDIA NIM and MLRun

Watch: How to build a modular, production-grade Gen AI chatbot using NVIDIA NIM + MLRun

In this end-to-end demo, you’ll learn how to create a multi-agent banking chatbot using NVIDIA NIM, LangChain, and MLRun. The demo walks through:

  • Deploying an LLM with a production-first mindset using MLRun’s serverless application runtime
  • Building an intent classification system that routes user queries to the right agent (loans, investments, general banking), as sketched in the first code example after this list
  • Operationalizing and monitoring LLMs with MLRun’s built-in LLM gateway—enabling cost optimization, versioning, and use-case-level observability
  • Using LLM-as-a-judge to automatically evaluate model quality (see the judge sketch below)
  • Wrapping the entire pipeline into a reusable, trackable gen AI application using MLRun’s workflow orchestration
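
For a concrete feel of the MLRun side, here is a minimal sketch of how such an intent-routing serving function might be wired and deployed. It is not the demo's actual code: handler.py, classify_intent, and route_to_agent are hypothetical placeholders for your own classifier and agent-dispatch handlers.

```python
import mlrun

# Create (or load) the project that owns the chatbot assets
project = mlrun.get_or_create_project("banking-chatbot", context="./")

# Wrap local handler code as a serverless real-time serving function
fn = project.set_function(
    "handler.py", name="chatbot", kind="serving", image="mlrun/mlrun"
)

# Serving graph: classify each query's intent, then route it to the matching agent
graph = fn.set_topology("flow", engine="async")
graph.to(handler="classify_intent", name="intent") \
     .to(handler="route_to_agent", name="router") \
     .respond()

# Build and deploy the function as an auto-scaling HTTP endpoint
project.deploy_function(fn)
```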

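The evaluation step can be similarly lightweight. Because NIM serves an OpenAI-compatible API, an LLM-as-a-judge call can go through the standard openai client; the endpoint URL, model name, and one-number rubric below are illustrative assumptions rather than the demo's actual settings.

```python
from openai import OpenAI

# Point the client at a (hypothetical) NIM endpoint
client = OpenAI(base_url="http://nim-host:8000/v1", api_key="not-used")

JUDGE_PROMPT = """You are grading a banking chatbot answer.
Question: {question}
Answer: {answer}
Score the answer from 1 (poor) to 5 (excellent). Reply with the number only."""

def judge(question: str, answer: str) -> int:
    """Ask a judge model to score an answer; returns an integer score 1-5."""
    resp = client.chat.completions.create(
        model="meta/llama-3.1-8b-instruct",  # any NIM-hosted judge model
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(question=question, answer=answer),
        }],
        temperature=0,
    )
    return int(resp.choices[0].message.content.strip())

print(judge("What is the minimum loan term?", "Our personal loans start at 12 months."))
```
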
See the chatbot in action as it dynamically responds to nuanced banking questions, switching between agents in real time. This demo showcases how to move beyond experimentation and build robust, traceable, and scalable gen AI systems that deliver real value.