The new FinOps problem isn’t cloud bills

At Google Cloud Next this month in Las Vegas, The New Stack sat down with Finout co-founder and CEO Roi Ravhon and Pathik Sharma, who leads cloud FinOps at Google Cloud, to talk about how the financial discipline that came into its own around managing cloud costs is now quickly being rewired for the AI era.

“We need to do the same thing we did for cloud to AI, but we’re doing it in a year.” — Roi Ravhon, co-founder and CEO, Finout

In this episode of The New Stack Makers, our two guests talk about why token economics is forcing FinOps to evolve faster than it did during the cloud era, why agentic FinOps tools need deterministic guardrails to be useful, and why both still recommend that anyone new to the discipline start with the FinOps Foundation, not a FinOps vendor.

Token economics, on a clock

Cloud had a decade to grow up around FinOps. AI, Ravhon argues, is only getting about a year, because the economics of running AI in production are breaking the discipline’s old assumptions.

The first difference is that even though token prices keep falling, AI costs for enterprises keep climbing. Anthropic and OpenAI both released new flagship models around the time we recorded, but the new reasoning models are “thinking 3x as much,” Ravhon says, which means they use more tokens to complete the same task.

The other major difference is that the cost of the same prompt isn’t fixed. “You ask the same question twice, and you get different token usage for everything,” Ravhon says. “So how can this scale?”

CFOs’ patience for that level of unpredictability is running low. Ravhon says CFOs started this cycle with “unlimited budgets […], let’s be innovative, because being innovative is super important.” The conversation has now circled back to ROI.

“Don’t reach for Thor’s hammer when you don’t need it.” — Pathik Sharma, cloud FinOps at Google Cloud

Google’s Sharma picks up that thread with an analogy he says he’s been using with customers: Don’t reach for Thor’s hammer when you don’t need it.
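Sharma’s analogy maps onto a simple routing decision: send each request to the cheapest model that can reliably handle it. Here is a minimal sketch of such a router; the task categories, prices, and routing heuristic are all illustrative assumptions, not real Gemini pricing or any vendor’s implementation:

```python
# Hypothetical model router: pick the cheapest model believed capable
# of a task. Model names echo the Gemini tiers discussed in the episode;
# prices and the task list are made up for illustration.

PRICE_PER_1M_TOKENS = {"gemini-flash": 0.15, "gemini-pro": 2.50}  # illustrative

# Tasks a small model handles reliably (assumption for this sketch).
SIMPLE_TASKS = {"summarize", "translate", "rewrite", "classify"}

def route(task: str) -> str:
    """Return the cheapest model expected to answer the task reliably."""
    return "gemini-flash" if task in SIMPLE_TASKS else "gemini-pro"

def estimated_cost(task: str, tokens: int) -> float:
    """Estimate request cost in dollars for the routed model."""
    return tokens / 1_000_000 * PRICE_PER_1M_TOKENS[route(task)]

# Summarizing an email goes to the small, cheap model...
assert route("summarize") == "gemini-flash"
# ...while an open-ended task still gets the frontier model.
assert route("plan-migration") == "gemini-pro"
```

A production router would likely use a classifier or fallback chain rather than a fixed task list, but the cost logic is the same: the default path is the cheap model, and the expensive one has to be earned.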
“We started using [Gemini] Pro [models] for everything,” he recalls a customer telling him. “Can you summarize this email for me? Can you help me write better emails for me?” But most of those use cases are perfectly served by Flash, Google’s smaller and significantly cheaper Gemini model. The FinOps discipline isn’t about asking every employee to memorize which model fits which task; it’s about building an orchestration layer that routes each request to the cheapest model that can reliably answer it.

In Sharma’s broader framing, LLM API spend is only one slice of the AI bill. The cost of running AI also stretches across GPUs and TPUs (still scarce), training compute, inference compute, storage for the data that feeds them, and the org-side cost of putting AI to use. Sharma also cites recent Stanford University research that confirms an earlier finding: “For every $1 of tangible tech investment, companies spend up to $10 on intangibles (process redesign, reskilling, organizational transformation).” The technology this time is AI.

The other half of the answer is running smaller models closer to the user. Sharma says he installed Gemma, Google’s small open model, on his phone. It’s under 4 GB, and capable of summarization, OCR, and translation on-device. In his view, this shows that not every request needs a frontier model.

Don’t ask the LLM to fix your Kubernetes

FinOps, Ravhon says, is “all about small problems that you need to fix.” A large enterprise with thousands of developers gets a constant stream of right-sizing recommendations and anomalies on redundant infrastructure, none of which anyone cares about individually. But for the company, this adds up to a lot of money.

The traditional answer was headcount plus a culture push to get engineers to care. In the agentic era, there is a temptation to simply throw an LLM at the problem and let it sort things out. But that doesn’t really work, Ravhon argues.
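One way to make such automation workable, along the lines Ravhon goes on to describe, is to keep detection as deterministic math and gate anything destructive behind human approval. A hypothetical sketch of that division of labor, with made-up spend figures and action names:

```python
# Deterministic bricks, agentic glue: detection stays plain statistics,
# and destructive actions pass through a hard approval gate. Spend
# numbers, thresholds, and action names are illustrative assumptions.

from statistics import mean, stdev

def is_anomaly(history: list[float], today: float, z_threshold: float = 3.0) -> bool:
    """Deterministic detection: a plain z-score check, no LLM involved."""
    mu, sigma = mean(history), stdev(history)
    return sigma > 0 and (today - mu) / sigma > z_threshold

# Actions an agent may propose but never execute on its own.
DESTRUCTIVE_ACTIONS = {"terminate_instance", "delete_volume"}

def execute(action: str, approved_by_human: bool) -> str:
    """Deterministic gate between an agent's proposal and execution."""
    if action in DESTRUCTIVE_ACTIONS and not approved_by_human:
        return "blocked: needs human approval"
    return f"executed: {action}"

# Spend hovers around $100/day; a $400 day trips the detector.
daily_spend = [98.0, 102.0, 99.0, 101.0, 100.0, 97.0, 103.0]
if is_anomaly(daily_spend, 400.0):
    # An agent might enrich the alert with context (who deployed what,
    # and where), but terminating anything still requires sign-off.
    print(execute("terminate_instance", approved_by_human=False))
```

The agent’s value here is the enrichment and narrative around the alert; the thresholds and the kill switch stay boring on purpose.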
“FinOps is a partially deterministic problem, so you can’t 100% count on LLMs to do stuff,” he says. Right-sizing has hard thresholds, and anomaly detection has math behind it. But, in his experience, LLMs can “convince themselves that they’re right when they want to be right.”

The architecture Ravhon describes for agentic FinOps is a bit like putting together Lego bricks: the boring parts stay deterministic, and the agentic layer stitches them together. “If you want detection, detection is deterministic. Don’t try to reinvent the wheel,” he says. Enrichment and context analysis, by contrast, are “great workloads for agents.” And before anything destructive happens, like terminating a server, a deterministic check or a human approval step must sit between the LLM and the action.

Sharma’s framing is to think about onboarding an AI agent the way you’d think about onboarding a new SRE, with a focus on standards, scoped permissions, and a playbook for how the team makes decisions. His example is Kubernetes right-sizing on GKE. “You’re not saying, LLM, go fix my Kubernetes,” Sharma says. Instead, he argues, you give the agent the same signals an SRE would use: golden signals, requests versus limits, observability metrics for the last 30 days, p99 vCPU utilization, peak memory. The agent then produces a recommendation as a pull request for the application owner to approve or deny. “Now you instantly build that trust,” he says. “I know where this recommendation is coming from. I know it’s contextual.”

It was never about the tool

When asked where someone new to FinOps should start, neither led with their own product.

“Sign up to the FinOps Foundation,” Ravhon says. “FinOps is first and foremost an organizational problem that we’re trying to solve. Just buying a FinOps tool is not going to solve the problem.”

“FinOps is first and foremost an organizational problem that we’re trying to solve.
Just buying a FinOps tool is not going to solve the problem.” — Finout CEO Roi Ravhon

Tools come second, Ravhon argues. The culture change has to come first: cross-team accountability, engineering teams that actually care about cost, and a relationship with cloud spend that treats it as an investment. “Only when you understand that you need a tool to continue scaling, this is the time you need to talk to Finout or an equivalent tool,” he says.

Sharma agrees. “No matter who you are, if you are working with cloud, you hold keys to the kingdom,” he says. “As said in the Spider-Man movie, with great power comes great responsibility.” If everyone running infrastructure starts looking at it from a value perspective rather than a pure cost perspective, he argues, the rest, including accountability, efficiency, and governance, follows automatically.

The post The new FinOps problem isn’t cloud bills appeared first on The New Stack.