Real-time usage metering makes the customer's balance the source of truth, settling every usage event against it as it happens instead of at month end. Only 43% of organizations can attribute AI cost to a single customer (CloudZero, May 2025), and the lag between usage and billing is where that visibility disappears. For workloads with variable per-request costs, that lag is a margin risk the invoice cannot surface until it is too late. This article explains the architecture decision behind honest billing numbers.

Your billing system and your cost meter run on different clocks

In most AI products, the billing system reports a number, and the cost ledger reports a different one, because they settle at different times. Cost is incurred the moment an inference runs. Revenue is recognized when an invoice closes, days or weeks later. The gap between those two events is small in a subscription business and enormous in an AI business.

It is enormous because AI workloads are not uniform. Bessemer, in a November 2025 analysis, observed that agentic workflows now consume 10 to 100 times more tokens per task than a typical single-shot request did in late 2023. When a single customer can swing the bill by two orders of magnitude inside one billing period, a system that totals usage at the end of the month is reporting history, not state. By the time the invoice is generated, the margin decision it implies has already been made for you.

This is the problem behind the title. A billing system that "lies to your CFO" is rarely lying about totals. It is reporting accurately and late. It tells you in May which customers were unprofitable in April, after you have already served them through April and into May. For real-time monetization to mean anything, the billing system has to know the truth at the moment of usage, not at the close of the cycle.

We did not arrive at this by deciding to build a usage-based billing product.
We arrived at it from the opposite direction, and that origin shaped the architecture more than any feature request did.

Where the numbers drift

The numbers drift because usage-based billing was retrofitted onto an architecture designed for a different job. Five forces make that drift visible, and each one is documented in named companies and dated data.

The architecture was built for invoices, not events

Traditional billing platforms came from subscriptions, where the customer is invoiced on a recurring basis and the invoice is the record of what is owed. When usage-based billing arrived, the industry's answer was to aggregate usage over the cycle, total it at the end, and append it to that invoice. The invoice stayed the source of truth. Usage became a line item feeding into it.

That choice is now expensive to unwind. In January 2026, Stripe completed its acquisition of Metronome for a figure widely reported around $1 billion, specifically to obtain an event-first billing architecture it had not built into its own subscription-era data model. Stripe's own co-founder Patrick Collison framed the rationale plainly at the time, calling metered pricing the native business model for the AI era. The largest payments company in the world concluded it was cheaper to buy event-first billing than to retrofit it. That is a strong signal about how deep the architectural difference runs.

When usage and pricing live in different systems, the invoice is wrong before it prints

The clearest public example of the failure mode is Fly.io, which ran usage telemetry in TimescaleDB and pricing in Stripe, and the two were decoupled. Stripe could not represent the usage dimensions the business actually billed on, such as regions, machine types, CPU counts, bandwidth, and data transfer. Invoices ballooned to thousands of line items with no way to group them by organization or app.
Staff engineer Jon Phenow described the migration goal as finding the most direct path to tying pricing dimensions back to the dollars they generated.

That is the structural version of lying to your CFO. When the system that records usage and the system that prices it are different systems, the invoice is assembled from a join that was never designed to be exact. It is wrong before anyone reads it.

Margin drifts fastest exactly when you can least afford it

The financial consequence shows up as margin that moves while you are not watching. Replit's gross margin swung from 36% to negative 14% as its AI agent consumed more model compute than its flat per-checkpoint pricing covered. Replit addressed the reversal in mid-2025 by launching effort-based pricing tied to the compute each task actually used. Intercom's Fin agent, billed per resolution, produces a single-customer monthly bill that ranges from roughly $50 to $30,000 depending on volume. Same product, same headline price, wildly different cost exposure per account.

The averages hide it. As Todd Gagne argued in a March 2026 analysis (Wildfire Labs), a customer who looks profitable at the median can turn unprofitable in the tail as query volume compounds. A billing system that reports the average is reporting the one number that conceals the problem.

Most teams cannot even see cost per customer

The visibility gap is measurable. CloudZero's State of AI Costs report, published in May 2025, found that only 43% of organizations can attribute AI cost to an individual customer, and only 22% can attribute it to a specific transaction. Meanwhile the FinOps Foundation reported in February 2026 that 98% of FinOps practitioners now include AI in scope, up from 31% two years earlier. Almost everyone now agrees AI spend is a problem to manage. Most still cannot answer the first question a CFO asks: which customers are profitable?

That gap is the literal version of the title. The aggregate AI bill is visible.
The per-customer truth, the number that decides who to grow and who to throttle, is not.

Pricing now changes faster than invoice-based systems can follow

The last force is velocity. PricingSaaS, summarized by Kyle Poyar in December 2025, tracked more than 1,800 pricing changes across the top 500 SaaS and AI vendors during 2025. That averages about 3.6 changes per company, with credit-based pricing adoption up 126% year over year. ICONIQ's January 2026 snapshot found 37% of AI companies planning another pricing change within twelve months. In June 2025, Cursor replaced a request cap with a credit pool priced at model rates. The backlash was sharp enough that CEO Michael Truell published an apology and opened a refund window in July.

When pricing is applied at reconciliation, every mid-cycle change forces a retrospective correction. OpenAI described this directly in its account of migrating to an event-first billing stack: a pricing change that used to take six to eight weeks of engineering dropped to under an hour afterward. If your billing architecture cannot reprice without a code change, you cannot move at the speed the market is now moving.

For context on where each platform sits architecturally: Orb, Metronome, and Lago are invoice-based usage billing systems, built to meter throughout a period and reconcile into an invoice at cycle end. Stripe Billing is subscription-first with metered add-ons. Stigg is a real-time orchestration layer; it checks entitlements before usage in real time, but billing settles downstream through Stripe, Zuora, or Chargebee at cycle end. Credyt is real-time end to end, settling usage against the balance as it happens. None of these is wrong; they are built for different jobs, which the next sections take seriously.

The real question behind real-time billing is the source of truth

The architecture follows from one decision: what is the source of truth, and when does it update? Invoice-based systems answer "the invoice, at cycle end."
Real-time systems answer "the balance, on every event." Everything downstream follows from that single choice.

| | Invoice-based billing | Real-time billing |
|----|----|----|
| Source of truth | The invoice, computed at cycle end | The customer's balance, updated on every event |
| Settlement timing | Days to weeks after usage | The moment usage happens |
| Mid-cycle price change | Retrospective correction at reconciliation | Applies to the next event; nothing settled moves |
| Per-customer margin | Known after the cycle closes | Known as it happens |
| Best fit | Enterprise quarterly contracts, high-volume batch | Variable per-request costs like AI inference and agents |

We did not set out to build usage-based billing. We came from embedded payments and revenue intelligence. The question we were actually asking was broader: what architecture lets one system process several revenue streams at once, embedded payments, usage-based charges, and fixed recurring billing, without a separate pipeline for each?

That question forces you to name the primitive that everything else settles against. In traditional billing, that primitive is the invoice. The invoice cannot be the source of truth for a real-time system; it updates too rarely. The primitive that can is the customer's balance.

This is also where the title's word, metering, needs precision. In an invoice-based system, metering and billing are two stages separated by a reconciliation job: you meter now and bill later. In a real-time system they collapse into one operation. Usage is captured, priced, and debited from the balance in a single atomic transaction. There is no metering layer feeding a billing engine on a delay, because there is no delay. That is why the numbers cannot drift: there is no window between recording usage and billing it for the two to disagree.

A wallet is the customer's pre-funded balance plus the rules that govern it.
The deceptively simple version is "track a balance in a database and debit it on every action." Then you pull the thread.

What happens when credits expire? When a subscription bundles an included allowance alongside a separately purchased top-up? Which balance draws down first? Drawdown order is not cosmetic; it determines how credits become revenue and how cost is attributed.

Consider one concrete case. A customer holds two grants in one balance: a monthly plan entitlement of 10,000 units that resets each cycle, and a purchased top-up of 5,000 units that does not expire. The correct rule is to spend the resetting entitlement first, so unused purchased credit is preserved and revenue is recognized in the right order. In a real-time system, that rule is a first-class property of the balance: every event draws down in the defined order, and the recognized-revenue number is correct the instant the event settles. In an invoice-based system, the same rule is reconstructed at reconciliation, which is where the gaps and the patches accumulate.

This is exactly what Anyscale's CFO Jooree Na described, in a November 2025 statement, as the goal: a single source of truth paired with real-time visibility into usage. The point worth underlining is that a single source of truth is an architectural property, not a dashboard feature. You cannot add it on top of a system whose source of truth updates once a month.

When invoice-based billing is the right call

Invoice-based billing is not a legacy mistake. It is the correct design for a real set of businesses, and pretending otherwise would be its own kind of dishonesty.

If your revenue runs on enterprise contracts with negotiated terms and quarterly true-ups, the invoice genuinely is the artifact that matters. It is what the customer's procurement team approves and what your auditor examines. Revenue recognition under ASC 606, with its variable-consideration constraints, is built around that document.
For that motion, a system optimized to produce a correct, defensible invoice at cycle end is serving the actual need, and real-time authorization would add machinery you would not use.

High-volume ingestion is the second honest case. When you are recording enormous event volumes and throughput matters more than instantaneous balance accuracy, batching is a feature, not a flaw. Metronome's engineering, described in a June 2025 Confluent case study, runs a streaming pipeline reported at 10,000 invoices per second. That is built for scale at cycle-level settlement, and it is genuinely good at it. If approximate, fast-path aggregation for dashboards is acceptable because billing settles later, invoice-based architecture fits cleanly.

There is also a condition under which the whole thesis weakens. If inference cost per request collapses toward zero and stops being meaningfully variable, then the lag between usage and billing stops mattering, because there is no margin risk hiding in the gap. The argument for real-time settlement is strongest precisely because per-request cost today is high and variable. Should that change, the calculus changes with it. This is the same reason AI companies need real-time economic control now in a way that flat-margin SaaS never did.

The real-time billing principle: your numbers are only as honest as your source of truth

A billing system tells your CFO the truth at exactly the granularity and latency of its source of truth, and no finer. Pick the invoice, and you get cycle-end truth: accurate, defensible, and late. Pick the balance, and you get per-event truth: the margin on each workload, known as it happens. That is the whole decision, and it is made once, in the architecture, not later in a reporting layer.

The mid-cycle pricing change makes the difference concrete. When pricing is applied at the moment of usage against the balance, changing a price means the next event prices differently and nothing already settled needs to move.
When pricing is applied at reconciliation, the same change reaches backward into events that already happened, and someone has to write the correction. One architecture treats a price change as routine; the other treats it as a migration.

So before you choose a billing tool, ask it one question: what is your source of truth, and when does it update? The answer predicts everything else. It predicts whether you can see per-customer margin today or next month. It predicts whether a pricing change is an afternoon or a sprint. It predicts whether your billing numbers and your cost numbers will ever fully agree.

Credyt is the infrastructure we built to make the balance the source of truth. It captures usage, prices it, and debits the customer's balance in one atomic operation. Cost and revenue settle in the same transaction, and per-customer margin is visible as it happens rather than after the cycle closes. It exists because we could not buy that property and could not bolt it onto an invoice-based system; it had to be the foundation. For teams that want it without ripping out what they have, the practical path is adopting real-time billing alongside an existing stack, not in place of it.

The honest framing is the one we kept returning to: we did not want to build a billing system so much as a system that knows your unit economics from the first event. The billing is how it earns its place; the truth is the point.

See Credyt for AI companies.
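
As a closing illustration, the balance-as-source-of-truth idea described above can be sketched in a few lines of Python. This is a minimal in-memory sketch under stated assumptions, not Credyt's API or any vendor's implementation: the names `Grant`, `Wallet`, `settle`, `priority`, and `rate` are all hypothetical, and a real system would wrap the debit in a database transaction instead of relying on a plan-then-apply step. It uses the two grants from the drawdown example (a 10,000-unit resetting entitlement spent before a 5,000-unit top-up) and applies a price change between two events.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Grant:
    """One block of credit inside a wallet (hypothetical model)."""
    name: str
    units: int      # units still available in this grant
    priority: int   # lower value is drawn down first


@dataclass
class Wallet:
    """A customer's balance plus its drawdown rule (hypothetical model)."""
    grants: List[Grant]
    ledger: list = field(default_factory=list)

    def balance(self) -> int:
        return sum(g.units for g in self.grants)

    def settle(self, event_id: str, quantity: int, rate: int) -> dict:
        """Price one usage event and debit it, or change nothing at all."""
        units = quantity * rate  # the price is applied at the moment of usage
        if units > self.balance():
            raise ValueError("insufficient balance; nothing was debited")
        # Plan the drawdown before mutating anything, so the debit is
        # all-or-nothing (a stand-in for a real atomic transaction).
        remaining, plan = units, []
        for grant in sorted(self.grants, key=lambda g: g.priority):
            take = min(grant.units, remaining)
            if take:
                plan.append((grant, take))
                remaining -= take
            if remaining == 0:
                break
        for grant, take in plan:
            grant.units -= take
        entry = {
            "event_id": event_id,
            "rate": rate,
            "units": units,
            "drawn": {grant.name: take for grant, take in plan},
        }
        self.ledger.append(entry)
        return entry


wallet = Wallet(grants=[
    Grant("top_up", units=5_000, priority=2),             # purchased, no expiry
    Grant("plan_entitlement", units=10_000, priority=1),  # resets each cycle
])
first = wallet.settle("evt-1", quantity=9_000, rate=1)
# Mid-cycle price change: only events settled from here on see the new rate.
second = wallet.settle("evt-2", quantity=1_000, rate=2)
```

In this sketch the first event is drawn entirely from the resetting entitlement, and the second event exhausts the entitlement before touching the top-up. The ledger entry for the first event keeps the rate it was settled at, which is the sense in which a price change applies to the next event and nothing already settled moves.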