Opus 4.8 Made Claude Smarter. Token Discipline Got Urgent.

I’m Matt Burns, Chief Content Officer at Insight Media Group. Each week, I round up the most important AI developments, explaining what they mean for people and organizations putting this technology to work. The thesis is simple: workers who learn to use AI will define the next era of their industries, and this newsletter is here to help you be one of them.

I need to start with a story I can’t verify and can’t stop thinking about. Axios relayed an AI consultant’s claim that one client spent half a billion dollars on Claude in a single month after failing to set usage limits on employee licenses. Polymarket ran with it, and the tweet has over 29 million views. Is the claim real? Like others, I’m doubtful. But it’s going viral, and that matters more than whether it’s factual, in part because more claims like it surfaced this week as companies reported earnings. Almost every one of these viral cost-blowout stories is individually unverifiable, yet everyone who uses AI at scale now believes a version of it could happen to them. Together, they paint a picture of a straining bubble.

The reckoning isn’t a verdict that AI doesn’t work. It’s the high cost of using it thoughtlessly. And the emblem of the whole moment is Anthropic’s new Opus 4.8, which launched late in the week. Opus 4.8 is billed as Anthropic’s smartest model yet, and it seems like the easiest one yet to set money on fire with. The tokenmaxxing era (spending tokens as a badge of being AI-forward) looks like it’s starting to end. The skill that replaces it is token discipline: the right model, in the right amount, for the right job. The workers and companies who learn this discipline will win. The ones who don’t will discover that the AI budget eventually comes out of something else.

Opus 4.8 is the perfect emblem: smarter, and far easier to overspend on

Anthropic shipped Opus 4.8 on Thursday, and Meredith Shubel walked through it for us. The changelog is more double-edged than it first looks.
The headline price is unchanged from 4.7, and fast mode is now billed as three times cheaper than before. The marquee feature, “dynamic workflows,” lets Claude Code plan a job and then run hundreds of parallel subagents in a single session, for work such as a code migration across hundreds of thousands of lines, from kickoff to merge. There’s also a new effort control, so you can dial how hard Claude thinks. Shubel framed that control perfectly: It’s a hedge against “AI shrinkflation,” for users worried about burning through rate limits faster than they expected.

Read those features again and consider what each one does to your bill. Hundreds of subagents means hundreds of token meters running at once. There’s no premium pricing on dynamic workflows; the subagents eat tokens at standard Opus rates, which means cost scales with ambition, and ambition is the entire pitch for Claude Code. In one viral developer test, Opus 4.8 at max effort reportedly burned 16.5 million tokens and $17.26 on the same mid-size Cursor ticket that GPT-5.5 completed with 5.9 million tokens and $5.57. Same ticket, roughly triple the cost. None of this is new math, exactly. Earlier this month I pointed to Ida Silfverskiöld’s breakdown on Towards Data Science, which found an unoptimized agent running 100 messages a day could hit roughly $2,490 a month, about 25x what the same agent costs once it’s tuned. And Opus 4.8 pushes that ceiling even higher.

The good is real: Opus 4.8 seems like the strongest Claude yet, and the built-in effort dial is a good step toward token discipline. It gives engineers a way to tell the model when not to think hard, which is essentially a cost lever. The bad is obvious: the smarter the model, the bigger the workflows, and the easier it becomes to overspend on tokens by deploying a fleet of subagents when just a couple would have worked fine. That’s the defining tension of the year. Capabilities are increasing, but so is the cost.
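
The arithmetic behind the viral ticket comparison can be checked in a few lines. The sketch below uses only the token and dollar figures reported for that test; the `TicketRun` class, the `fanout_cost` helper, its linear cost assumption, and the example inputs passed to it are hypothetical illustrations, not anything published by Anthropic or Cursor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TicketRun:
    """One reported run of the same ticket (figures as quoted in the article)."""
    model: str
    tokens_millions: float
    cost_usd: float

    @property
    def usd_per_million_tokens(self) -> float:
        # Blended rate: total dollars over total tokens, input and output mixed.
        return self.cost_usd / self.tokens_millions


def fanout_cost(cost_per_subagent_usd: float, subagents: int) -> float:
    """Hypothetical linear model: each subagent bills at the same standard rate."""
    return cost_per_subagent_usd * subagents


opus = TicketRun("Opus 4.8 (max effort)", tokens_millions=16.5, cost_usd=17.26)
gpt = TicketRun("GPT-5.5", tokens_millions=5.9, cost_usd=5.57)

cost_ratio = opus.cost_usd / gpt.cost_usd
token_ratio = opus.tokens_millions / gpt.tokens_millions

print(f"cost ratio:  {cost_ratio:.2f}x")
print(f"token ratio: {token_ratio:.2f}x")
for run in (opus, gpt):
    print(f"{run.model}: ${run.usd_per_million_tokens:.2f} per million tokens (blended)")

# Assumed inputs for illustration only: $0.50 per subagent, 200 subagents.
print(f"fan-out estimate: ${fanout_cost(0.50, 200):.2f}")
```

On these reported numbers, the blended per-token rates are close to each other, so most of the gap comes from how many tokens each run consumed rather than from the price per token. That is the lever the effort control and a smaller subagent count are meant to pull.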

Tokenmaxxing looks like it’s dying

A few weeks ago, the fashionable thing inside big companies was “tokenmaxxing”: treating raw token consumption as an unlockable achievement badge for how AI-forward an employee was. This week, that era looked like it was ending. I want to be careful with that claim because the evidence consists of a handful of big names rather than a market-wide dataset. But the names are loud ones. As my friend Jeremy Kahn wrote in Fortune this week, tokenmaxxing ran straight into Goodhart’s Law, where a measure that becomes a target stops being a good measure. Amazon pulled its internal “Kirorank” leaderboard after employees started pointing agents at pointless tasks to climb the rankings. Meta took its leaderboard down last month, too. Reportedly, Amazon now tracks “normalized deployments,” meaning AI-generated code that’s actually useful.

The cost stories kept stacking up. Axios reported that companies are getting AI sticker shock: Microsoft reportedly canceled most of its internal Claude Code licenses partly over cost, Uber’s COO said the spend is “harder to justify,” and one CTO found employees using enterprise models to check the weather. “Earnings Before Tokens” became the joke, and the leaderboards went dark the same week Anthropic announced a $65 billion round at a $965 billion valuation.

I’m doubtful about the cut-and-dried accuracy of all of these stories. A few CEOs and four leaked ROI figures are a thin sample against the universe of companies now running AI, and I’d hold the “tokenmaxxing is dead” verdict loosely. But the anecdotes rhyme, and the direction is consistent enough to act on. Kahn’s real point is the one to hold onto: tokenmaxxing failed because spending tokens was never the goal. The value comes from redesigning how the work gets done, and most companies are stuck doing the AI equivalent of bolting an electric motor onto a Nissan and calling it a Ferrari. The reckoning isn’t AI failing.
It’s the market starting to punish the companies that confuse activity with output. This is a discipline and orchestration problem.

The winners are getting surgical and relying on engineers to pick models

While some companies panic, the disciplined ones are just getting surgical about AI spend. Axios reported on the bargain hunt the day after the $500 million sticker-shock piece. Factory CEO Matan Grinberg, whose product routes each query to the cheapest model that can handle it, put it plainly to Axios: “There are many tasks you simply don’t need Opus for.” Open-model usage at Factory reportedly tripled in the last month relative to closed models. Micro1’s CEO says companies are switching to open-source models and purpose-built agents that are often both cheaper and better. Even Marc Benioff, staring down a roughly $300 million Anthropic bill, is openly wishing for a smart router to send only the hardest queries to the priciest model. On the dev side, Cursor’s Composer 2.5 now matches Opus 4.7 and GPT-5.5 on benchmarks at a fraction of the cost.

CEOs are now scrambling to do what the power users already do: freely switch between models and providers. The reason no one wants to lock into a single lab is the fear of price-gouging once you’re captive. That’s the open-source thesis. We’ve covered the infrastructure that makes switching real: Steven J. Vaughan-Nichols wrote about llm-d on our site. It is the CNCF Sandbox project from IBM Research, Red Hat, and Google Cloud, with NVIDIA, CoreWeave, Hugging Face, and Mistral AI among the contributors, built to run any model, on any accelerator, in any cloud. And even the messy OpenClaw saga showed the appetite for model-portable agents: the whole point of the project is that switching models is a one-line change.
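
A minimal sketch of the routing idea described above: send each task to the cheapest model whose capability clears the bar for that task. This is not Factory’s implementation or any vendor’s API. The model names, capability scores, and prices are placeholders invented for illustration, and in practice the scores would come from your own evals.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ModelOption:
    name: str               # placeholder label, not a real product
    capability: int         # hypothetical 1-5 score taken from your own evals
    usd_per_million: float  # hypothetical blended price per million tokens


# Invented catalog for illustration; swap in measured scores and real prices.
CATALOG = (
    ModelOption("small-open-model", capability=2, usd_per_million=0.20),
    ModelOption("mid-tier-model", capability=3, usd_per_million=1.50),
    ModelOption("frontier-model", capability=5, usd_per_million=15.00),
)


def route(required_capability: int,
          catalog: Sequence[ModelOption] = CATALOG) -> ModelOption:
    """Return the cheapest model that meets the required capability."""
    candidates = [m for m in catalog if m.capability >= required_capability]
    if not candidates:
        raise ValueError(f"no model meets capability {required_capability}")
    return min(candidates, key=lambda m: m.usd_per_million)


for need in (1, 3, 4):
    choice = route(need)
    print(f"capability {need} -> {choice.name} (${choice.usd_per_million:.2f}/M tokens)")
```

Only the task that needs a capability score of 4 or more reaches the priciest option here; everything else lands on a cheaper model. Because the model is just a catalog entry, switching providers means editing data rather than code.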
Open source is becoming the sensible, disciplined option for a lot of tasks.

And some companies are already operating this way. This week I spent time with Asaf Wiener, a Wiz alum and the co-founder of the AI-native security startup Mate Security, for a New Stack profile running next week. He tells me that after an inference bill nearly ended his company, he pushed model choice down to the people shipping features. Every backend engineer at Mate now writes evals and decides, per workload, which model to route to, including self-hosted open-source models running on the company’s own GPUs, which in their internal tests sometimes beat the frontier APIs on both cost and quality. The way he frames the shift sticks with me: His engineers “are not actually writing lines of code. They are orchestrating agents.” That’s the job now, and it’s the most practical answer to the cost problem I’ve heard, because it makes the people doing the work responsible for what it costs.

You must treat models like a portfolio, not a religion, and push at least some of the cost decisions to the people closest to the work. The half-billion-dollar Claude story is almost certainly a myth, but the discipline problem underneath it is not. Tokenmaxxing could be on its way out. Token discipline needs to replace it, both for workers and the companies that employ them.

The post Opus 4.8 Made Claude Smarter. Token Discipline Got Urgent. appeared first on The New Stack.