Sally O'Malley explains the unique observability challenges of LLMs and presents a reproducible, open-source stack for monitoring AI workloads. She demonstrates deploying Prometheus, Grafana, OpenTelemetry, and Tempo alongside vLLM and Llama Stack on Kubernetes. Learn to monitor the cost, performance, and quality signals that matter for business-critical AI applications.
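As a sketch of how one piece of such a stack wires together, a minimal Prometheus scrape job for vLLM might look like the fragment below. vLLM's OpenAI-compatible server exposes Prometheus metrics at `/metrics` on its serving port; the job name, Service name, and namespace here are illustrative assumptions, not values from the talk:

```yaml
scrape_configs:
  - job_name: vllm                      # illustrative job name
    metrics_path: /metrics              # vLLM serves Prometheus metrics on this path
    static_configs:
      - targets:
          - vllm.llm-serving.svc:8000   # hypothetical Service and namespace; 8000 is vLLM's default port
```

With a target like this in place, Grafana dashboards can be built on vLLM's exported series (request, latency, and token-throughput metrics), while OpenTelemetry and Tempo cover the tracing side of the stack.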