# Building a Production-Ready Multi-Agent FinOps System with FastAPI, LLMs, and React


Cloud dashboards show you the problem. They don't solve it.

Every organization running in AWS, Azure, or GCP eventually faces the same issues:

- Idle compute running for months
- Overprovisioned instances
- Orphaned storage
- No clear optimization decisions
- No ownership

What teams need is not another dashboard. They need an intelligent control plane.

In this article, I'll walk through how to build a production-ready multi-agent FinOps system powered by:

- FastAPI (backend orchestration)
- LLMs (structured reasoning)
- React (dashboard UI)
- Docker (deployment)

This is implementation-focused. Minimal theory. Real architecture.

*(Architecture diagram)*

## The Problem: Cost Data Without Decisions

Most FinOps tools stop at:

- Cost visualization
- Alerts
- Basic rule-based recommendations

But real optimization requires reasoning:

- Should we downsize this instance?
- Is this idle volume safe to delete?
- What's the performance risk?

Static rules are too rigid. Pure AI is too risky.

The solution: rule-based triggers + LLM reasoning + human approval.

## System Architecture Overview

At a high level:

User → React UI → FastAPI → Agents → LLM → Structured Output → Human Approval

We separate responsibilities clearly:

- The UI handles interaction
- The API orchestrates
- Agents apply logic
- The LLM provides contextual reasoning
- Humans approve execution

This keeps the system enterprise-safe.

## The Multi-Agent Design

Instead of one monolithic "AI service", we use specialized agents.

1. **Diagnostic Agent**: detects inefficiencies and optimization opportunities.
2. **Idle Cleanup Agent**: identifies unused resources that may be safely removed.
3. **Rightsizing Agent**: recommends better instance sizing based on usage trends.

Each agent follows the same pattern:

1. Apply deterministic rules
2. Construct a structured context
3. Call the LLM with constrained instructions
4. Validate the JSON output
5. Return a recommendation

This is not chat AI.
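The five-step pattern can be sketched as a shared base class. This is a minimal illustration, not the article's actual code: the names `BaseAgent`, `triggered`, and `build_prompt` are assumptions, and `llm_client` is stubbed as any callable that takes a prompt string and returns text.

```python
import json


class BaseAgent:
    """Sketch of the shared agent pattern: deterministic filter -> structured
    context -> constrained LLM call -> JSON validation -> recommendation.
    Hypothetical names; `llm_client` is any callable(prompt: str) -> str."""

    REQUIRED_FIELDS = {"recommendation", "risk_level",
                       "estimated_savings", "justification"}

    def __init__(self, llm_client):
        self.llm = llm_client

    def triggered(self, resource):
        """Step 1: deterministic rule deciding whether to escalate at all."""
        raise NotImplementedError

    def build_prompt(self, resource):
        """Steps 2-3: structured context with an explicit output schema."""
        return (
            "You are a FinOps optimization engine.\n"
            f"Given: {json.dumps(resource)}\n"
            "Return JSON with keys: recommendation, risk_level, "
            "estimated_savings, justification. Output JSON only."
        )

    def analyze(self, resource):
        if not self.triggered(resource):      # filter before calling the LLM
            return None
        raw = self.llm(self.build_prompt(resource))
        try:                                   # step 4: validate before use
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            return None                        # reject unparseable output
        if not isinstance(parsed, dict) or not self.REQUIRED_FIELDS <= parsed.keys():
            return None                        # reject schema violations
        return parsed                          # step 5: structured recommendation


class DiagnosticAgent(BaseAgent):
    def triggered(self, resource):
        return resource["cpu_avg"] < 5 and resource["days_running"] > 14


# Usage with a stub LLM (no real model call):
stub = lambda prompt: (
    '{"recommendation": "Downsize to t3.medium", "risk_level": "Low", '
    '"estimated_savings": 180, "justification": "CPU below 5% for 30 days"}'
)
agent = DiagnosticAgent(stub)
agent.analyze({"cpu_avg": 2, "days_running": 30})   # validated dict
agent.analyze({"cpu_avg": 80, "days_running": 30})  # None (filtered out before any LLM call)
```

The key design choice is that invalid model output degrades to `None` rather than propagating raw text downstream.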
This is constrained reasoning.

## Backend: FastAPI as the Control Plane

FastAPI acts as the orchestrator.

Example endpoint:

```python
from fastapi import FastAPI

app = FastAPI()

@app.post("/analyze/idle")
def analyze_idle():
    data = fetch_cloud_metrics()       # pull telemetry from the cloud provider
    result = idle_agent.analyze(data)  # delegate to the specialized agent
    return result
```

Responsibilities:

- Route requests to the correct agent
- Inject telemetry
- Enforce policies
- Log all decisions
- Validate structured responses

The API layer is critical. It prevents LLM outputs from directly impacting infrastructure.

## Inside an Agent

Here's what a simplified DiagnosticAgent looks like:

```python
class DiagnosticAgent:
    def __init__(self, llm_client):
        self.llm = llm_client

    def analyze(self, resource):
        if resource["cpu_avg"] < 5 and resource["days_running"] > 14:
            return self._call_llm(resource)
        return None
```

Notice: we do not send everything to the LLM. We filter first.

This reduces:

- Cost
- Latency
- Hallucination risk

## Constrained LLM Prompting

We never ask open-ended questions. We use structured prompts:

```text
You are a FinOps optimization engine.

Given:
- cpu_avg: 2%
- monthly_cost: $430
- environment: production

Return:
- recommendation
- risk_level
- estimated_savings
- justification

Output JSON only.
```

We force:

- Role clarity
- Schema constraints
- Deterministic structure

The output must look like:

```json
{
  "recommendation": "Downsize to t3.medium",
  "risk_level": "Low",
  "estimated_savings": 180,
  "justification": "CPU utilization below 5% for 30 days"
}
```

If parsing fails, we reject it. Never pass raw model text downstream.

## The Idle Cleanup Agent

This agent is more sensitive. Deletion is high risk.

Example logic:

```python
if resource["attached"] is False and resource["days_idle"] > 30:
    flag = True
```

The LLM is not deciding whether to delete. It classifies:

- Risk level
- Compliance concerns
- Savings estimate

Human approval is mandatory.

## The Rightsizing Agent

Rightsizing requires trend awareness. We analyze:

- Average CPU
- Peak CPU
- Memory utilization
- 30-day stability

Example:

```python
if cpu_avg < 40 and cpu_peak < 60:
    candidate = True
```

The LLM suggests a smaller instance while respecting performance
buffers.

Again: a recommendation, not execution.

## React Frontend

The React dashboard shows:

- Optimization opportunity
- Risk level
- Estimated savings
- Confidence score
- Approve/Reject button

This turns AI output into decision support. Not automation.

## Human-in-the-Loop Execution

Execution flow:

Frontend → Backend → Cloud API → Confirm status

Key safeguards:

- No production deletion without approval
- Snapshot before resize
- Post-change monitoring
- Full audit logging

AI assists. Humans decide.

## Dockerized Deployment

We containerize:

- The FastAPI service
- The React frontend
- Optional Redis / Postgres

Example Dockerfile:

```dockerfile
FROM python:3.11
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

This allows:

- Reproducible environments
- Cloud-native deployment
- Easy scaling

## Production Hardening

This is where most AI projects fail.

Enterprise safeguards include:

- Schema validation on every LLM output
- Observability (log prompts + responses)
- Retry logic with backoff
- Environment restrictions (prod guardrails)
- Role-based access control
- Versioned prompts
- Rate limiting

AI without guardrails is a liability. AI with structure becomes leverage.

## Why This Architecture Works

It balances rules + LLM + humans.

Instead of replacing decision-makers, it augments them.

The LLM:

- Explains
- Quantifies
- Suggests

It does not:

- Execute
- Override policies
- Bypass governance

That separation is what makes this production-ready.

## Key Takeaways

- Don't build AI monoliths; build specialized agents.
- Always filter before calling the LLM.
- Constrain prompts with explicit schemas.
- Validate outputs before using them.
- Keep humans in the loop for infrastructure changes.
- Log everything.

This is how you move from an AI demo to an enterprise system.

## Final Thought

FinOps dashboards show cost. Agentic AI systems generate action.

When designed correctly, multi-agent architectures can transform cloud cost management from reactive reporting to intelligent optimization.

The difference is not in using an LLM. The difference is in how you
architect around it.

Let's connect 👇

🔗 LinkedIn: https://www.linkedin.com/in/dhiraj-srivastava-b9211724/

💻 GitHub (Code & Repositories): https://github.com/dexterous-dev?tab=repositories