“Tokenmaxxing is real, expensive & it’s spreading”: New tools emerge to stop AI budgets from exploding

There’s a new weapon in the fight against tokenmaxxing.

Tokenmaxxing, of course, occurs when an enterprise decides that AI token usage equates to productivity. But token usage can quickly become a vanity metric, and a business that treats token gluttony as a direct measure of productivity will likely fail to map token usage to desired outcomes. As a fad, tokenmaxxing was wildly popular for a while, but it seems cooler heads are prevailing as the focus shifts to outcomes rather than just using AI for its own sake.

Take the recent case of tokenmaxxing at Uber:

Uber CTO Neppalli Naga told The Information last month that he’s “back to the drawing board because the budget [he] thought [he] would need is blown away already.” That budget was earmarked for Uber’s use of Anthropic Claude Code.

For his part, Uber COO Andrew Macdonald responded a few weeks later, saying in a Rapid Response interview first reported on by Business Insider that Naga’s comments about blowing through the Claude budget created a “head-exploding moment” for the operations team.

“Everyone was like, ‘Oh, head-exploding moment,’” Macdonald said. “We’re going to have to start talking about token consumption and the associated costs vs. headcount, and making trades on that as an engineering organization.

“If you’re not able to draw a direct line to how many useful features and functionalities you’re shipping to your users, that trade can feel harder to justify.”

Lexi Reese, co-founder and CEO of Lanai, underscores that the problem is occurring everywhere.
Uber is just the latest high-profile company to experience it.

“Tokenmaxxing is real, it’s expensive, and it’s spreading beyond just a few engineers or companies,” Reese tells The New Stack.

Likely to create zones of bloated code, agentic sprawl, and areas where software applications might eventually become brittle or even vulnerable, tokenmaxxing is expensive and reduces visibility into the total system state.

Lanai, an AI accountability company, aims to help enterprises understand where AI spend occurs, which workflows AI is applied to, and at what cost.

The company recently debuted Token Tuner to identify where lower-cost models can reduce unnecessary token costs. It’s the latest tool that developers and leaders can use to control token usage by engineers and end users. The internet is full of top-ten lists for how to reduce token usage. Companies and organizations like Kong, Braintrust, LiteLLM, and Dynatrace, among others, offer tools to ensure token usage is being budgeted.

Reese and team have positioned Token Tuner as a service that fills the missing context gap for enterprises by mapping token spend to workflows, model choices, efficiency, and value created. The software ties each AI interaction to a measurable outcome and generates a productivity score based on how well each user matched token usage and model choice to the task at hand. For example, an employee using Opus 4.7 for email responses is likely to receive a lower efficiency score than if they used a smaller model for the task.

From tokenmaxxing to outcomemaxxing

Instead of tokenmaxxing, Reese would like to see companies focus on outcomemaxxing to analyze which workflows are actually improving productivity.

One user of Lanai Token Tuner, which is currently in beta, delegated 4.2% of all AI leverage hours across the organization while using only 0.7% of tokens.
Their efficiency score was 6.0, indicating they were matching tasks to the right models, while others were burning 10x as many tokens for half the efficiency.

Lanai Chief Product Officer Mohit Mehta tells The New Stack that Token Tuner is an all-terrain vehicle, i.e., its scoring engine can calculate productivity scores when a single workflow spans multiple models simultaneously.

“Productivity is estimated by the complexity of work delegated to AI as observed through prompt and tool activity by Lanai’s proprietary models,” says Mehta. “The model operates at the level of prompts and tool invocations independent of models and applications.”

Tracking AI usage for business tasks

As we start to place greater emphasis on business results from applied technology deployments (even politicians have begun using the term “measurable outcomes” in recent times), we need to question which instrumentation is required at the API layer for Token Tuner to attribute tokens to specific business outcomes.

“Lanai aggregates prompt interactions and associated tool activity for a given session and then runs proprietary models to calculate the task type and associated productivity gain and complexity,” explains Mehta. “This enables customers to go from contextless vendor invoice to connecting intent to value to cost at the interaction level. No custom instrumentation is required for this functionality.”

In terms of how this technology drives business efficiency, business users may ask: when Token Tuner recommends a lower-cost model, is there a benchmark in place to assess output quality equivalence before surfacing the recommendation?

“Rather than relying on synthetic evaluations, we utilize observed outcome data,” clarifies Mehta.
“Our recommendations are grounded in how actual users within an organization achieve comparable results across different models.

“Rather than a recommendation like ‘this will work for you,’ we provide empirical evidence that ‘teams in your company performed this exact workflow on Haiku with equal success,’ for example. This represents real-world preference at scale over synthetic benchmarks.”

Key features include workflow-level value visibility, a service that shows which teams, workflows, and use cases are driving AI spend and whether that usage is tied to measurable business value. Productivity and efficiency measurement compares token spend with the leverage gained by users, teams, and workflows to show where AI creates the most value per dollar. A spend optimization recommendation function identifies runaway workflows, mismatched tasks, and premium model usage for work that lower-cost models could handle.

AI’s next killer service: efficiency?

First, the Earth cooled, and we just wanted AI… and the plain old predictive version was fine. Then, the dinosaurs died off, and we wanted domain-specific RAG-based intelligence with what then became agentic AI services that could work for us with human-in-the-loop oversight to ward off the rise of the robots. Now, perhaps, we want AI that is fit for purpose in the most applied sense of the term, so that we don’t use it where we don’t need to, and we use high-octane services only when we can really justify the turbocharge.

In truth, AI’s next killer app factor will come down to a whole lot more than just business efficiency, but this could become a more prevalent part of the mix.

The post “Tokenmaxxing is real, expensive & it’s spreading”: New tools emerge to stop AI budgets from exploding appeared first on The New Stack.