Infrastructure

Production AI Infrastructure on Kubernetes: Lessons from Running LLMs at Scale

From GPU node pools to GitOps pipelines: the infrastructure decisions that keep a production AI platform running 24/7 on AKS.