🌟Vasilij’s note

The Economist put it plainly this week: the era of subsidised intelligence is ending. Uber burned its entire 2026 AI coding budget by April. One firm accidentally spent $500m on Claude tokens in a single month. Sam Altman called mounting customer costs "a huge issue." None of this is surprising to anyone who has been watching the agent deployment curve - but it is landing hard on finance teams who set their 2026 AI budgets in autumn 2025, before agentic tools detonated their assumptions. The pattern I keep seeing with clients is the same one playing out at enterprise scale: capability without accountability. The firms that will hold their margin through the pricing correction are the ones that treat token spend the way they treat any other operational cost - with visibility, forecasting, and governance. Everything else this week connects back to that.

In today's edition

This week in agents | What changed

The tokenmaxxing era officially ends

The Economist reports companies are scrambling to curtail soaring AI costs after Uber blew its entire 2026 AI budget in four months and one firm reportedly spent $500m in a single month. Sam Altman has called the cost problem "a huge issue." → What looked like aggressive adoption is now a governance crisis. The firms that built cost controls during the pilot phase are the ones with room to manoeuvre.

Linux Foundation launches Tokenomics Foundation

A new industry body - backed by SAP, IBM, JPMorgan Chase, ServiceNow, and others - will set open standards, benchmarks, and best practices for AI token economics, modelled on what FinOps did for cloud spend. Formal launch at FinOps X in San Diego, 8–10 June 2026. → Token governance is moving from internal spreadsheets to industry standards. FinOps teams that already understand cloud cost management are the natural owners of this problem.

Hermes Desktop launches as open-source alternative to Claude Cowork

Nous Research shipped the Hermes Agent Desktop App on 2 June - MIT-licensed, free, with 180,000+ GitHub stars in under four months. It is self-improving: agents that build 20+ skills complete similar tasks 40% faster (the gain comes from lower token consumption, not higher output quality). → In a week when AI bills are making headlines, a zero-licence-cost agent with a local model option lands very differently. The cost conversation and the open-source agent conversation are now the same conversation.

Top moves | Signal → impact

  • Microsoft cancels Claude Code licences over cost

    Reported by The Verge and Axios, May–June 2026. Microsoft cancelled most of its internal Claude Code licences, partly over cost, six months after rolling them out to staff. Uber's COO told Business Insider that higher AI usage was not translating into proportionally more useful output. One enterprise client of an AI consultant spent $500m on tokens in a single month after failing to set usage limits on licences. Agentic tools run recursive loops that consume tokens at 10–30x the rate of a chatbot - a customer service agent that once used 500 tokens per query can require 15,000 in an autonomous loop. → The productivity premium from heavy AI usage is real but does not justify unconstrained spend. If you have rolled out agentic tools without per-team budgets, usage limits, or model routing policies, the bill is already accumulating. Audit before your next invoice.

  • Tokenomics Foundation formalises AI cost governance as a discipline

    Announced at FinOps X, San Diego, 3–10 June 2026. The Tokenomics Foundation will produce open billing specifications (built on FOCUS 1.4), benchmarks for evaluating fair token pricing, and a certification path for AI FinOps practitioners. Founding members include IBM, JPMorgan Chase, SAP, ServiceNow, Booking.com, and major hyperscalers. Global token usage is projected to grow 24x between 2026 and 2030, hitting 120 quadrillion tokens per month. The inference market is forecast to expand from $106bn (2025) to $255bn by 2030. → For consultancies, AI cost governance is becoming a client deliverable in its own right. Being early to the FOCUS specification will make audits considerably easier when clients - particularly in financial services and regulated sectors - start demanding it.

  • Hermes Desktop enters as a direct cost pressure on managed agents

    Launched by Nous Research, 2 June 2026. The Hermes Agent Desktop App is MIT-licensed, free, and supports local model deployment via Ollama - meaning zero API costs for firms with the capacity to self-host. It ships with a self-improving skills loop: agents that accumulate 20+ self-created skills complete similar tasks 40% faster (the gain is in token consumption, not output quality). The same mechanism that makes it efficient is also a security consideration - self-written skills persist and execute later, creating a standing injection surface that requires audit. → At the moment enterprise AI bills are making front pages, a zero-licence agent with no API requirement is a meaningful alternative for internal workflows. The governance question decides whether it belongs in your firm - not the feature list.

Upskilling spotlight | Learn this week

Anthropic's Code with Claude: Managed Agents and Capability Curve (InfoQ)

Covers the architectural changes behind Managed Agents and Proactive Workflows announced at the Code with Claude event: how supervised execution differs from standard Claude API calls, what the capability curve means for deployment planning, and where the governance boundaries sit. Required reading before deploying any Claude-based coding agent in production.

MIT Technology Review: "It's Time to Address the Looming Crisis in Entry-Level Work" (MIT Tech Review)

A grounded analysis of what AI-driven coding and knowledge work automation means for junior hiring, capability building within firms, and the pipeline of expertise that mid-tier consultancies have relied on. Useful framing for any firm thinking about where human judgment and development remain non-negotiable.

Maker note | What I built this week

This week I put together the video comparing Hermes Desktop and Claude Cowork head-to-head - same discovery call prompt, same proposal task, both agents tested with three traps baked in (no invented stats, phase one pricing under the board threshold, adoption risk addressed).

Decision: Cowork for anything client-facing, Hermes sandboxed for internal experimentation - because the governance question, not the feature list, decides this for a professional services firm.

Operator’s picks | Tools to try

elvex

Use for: enterprise AI cost governance - token budgets, model routing, and per-team spend visibility without building internal tooling.

Standout: intelligent model routing typically cuts costs 60–80% by sending simple tasks to budget-tier models ($0.10–$1/M tokens) and reserving frontier models for agentic and reasoning tasks.

Caveat: requires someone to own the routing logic and review it as workflows evolve.

Hermes Desktop

Use for: internal, sandboxed agent experimentation where you want self-improving skills and local model support.

Standout: MIT-licensed, runs on Mac/Windows/Linux, supports Ollama for fully local deployment with no API costs.

Caveat: nine-day-old public preview - not production-ready for client-facing workflows. Download only from the official Nous Research site or GitHub repo; lookalike sites already exist.

TrueFoundry tokenmaxxing guide

Use for: a practical breakdown of the four enterprise failure modes driving runaway token costs: premium-model overuse, context stuffing, agent loops, and tokeniser drift.

Standout: explains the gateway controls that prevent each failure mode from compounding. Pair with your existing FinOps or cloud cost management tooling.
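To make the "agent loops" failure mode concrete, here is a minimal sketch of the kind of ceiling a gateway could enforce on a single agent run. It is an illustration only, not how TrueFoundry, elvex, or any other product implements its controls: `RunBudget`, `BudgetExceeded`, and `run_agent` are hypothetical names, and the list of per-step token counts stands in for real model calls.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when an agent run passes its step or token ceiling."""


@dataclass
class RunBudget:
    """Hypothetical per-run limits; the values are for you to choose."""
    max_steps: int
    max_tokens: int
    steps: int = 0
    tokens: int = 0

    def charge(self, tokens_used: int) -> None:
        """Record one loop iteration and raise if either limit is passed."""
        self.steps += 1
        self.tokens += tokens_used
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step limit {self.max_steps} exceeded")
        if self.tokens > self.max_tokens:
            raise BudgetExceeded(f"token limit {self.max_tokens} exceeded")


def run_agent(step_costs, budget):
    """Walk through simulated loop iterations until done or the budget trips.

    Each item in `step_costs` is the token count one iteration would use.
    Returns (iterations completed within budget, status message).
    """
    completed = 0
    try:
        for cost in step_costs:
            budget.charge(cost)
            completed += 1
    except BudgetExceeded as exc:
        return completed, str(exc)
    return completed, "finished"
```

With a 15,000-token ceiling and 500 tokens per iteration, `run_agent([500] * 100, RunBudget(max_steps=50, max_tokens=15_000))` stops after 30 iterations instead of running all 100. The point of the design is that the limit lives outside the agent, so a runaway loop cannot talk its way past it.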

Deep dive | The Token Bill: Thesis & Playbook

The cost structure that made AI adoption feel frictionless is unwinding. AI labs priced aggressively to acquire customers; pending IPOs from Anthropic and OpenAI change that calculus. Agentic tools - which run recursive loops and consume 10–30x the tokens of a chatbot - arrived before finance teams understood the pricing model. The result is a wave of budget blowouts landing just as pricing subsidies are about to end.

On paper
  • Uber gave 5,000 engineers access to Claude Code in December 2025; the company had consumed its entire annual AI budget by April 2026.

  • One enterprise reportedly spent $500m on AI tokens in a single month after failing to set usage limits on Claude licences.

  • One healthcare firm consumed 1 trillion tokens over six months - over $6m in unplanned spend before finance understood the driver.

  • Bain & Company projects AI spend could reach 20–30% of operating expenses within three to four years as agent usage scales.

  • Token prices have dropped significantly (GPT output tokens from $60/M to $8/M since 2024), yet total spend is rising because volume is growing faster than unit price falls.

  • Global token usage projected to grow 24x between 2026 and 2030, reaching 120 quadrillion tokens per month.

In practice
  • Most 2026 AI budgets were set in autumn 2025, before agentic tools detonated usage assumptions. Finance teams are rebuilding forecasting models with no historical baseline.

  • Agentic workflows consume tokens at 10–30x chatbot rates. A task that consumed 500 tokens in a prompt-response model requires 15,000+ in an autonomous agent loop.

  • Microsoft cancelled most of its internal Claude Code licences, partly over cost, six months after rolling them out - an early signal that even well-resourced organisations are hitting limits.

  • Enterprises using intelligent model routing (cheap models for simple tasks, frontier models for judgment checkpoints) are cutting costs 60–80% without measurable quality loss.

  • Heavy token users are roughly 2x more productive than low users - but they spend 10x the tokens. The productivity premium does not justify unconstrained spend.

  • The cultural problem: "tokenmaxxing" - teams optimising for token consumption as a visible productivity metric - has driven artificial inflation of spend with no corresponding output improvement. Internal leaderboards at some firms tracked token burn as a status signal.
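The arithmetic behind three of these points can be sketched in a few lines. The 10–30x multiplier, the 500-token query, and the 2x/10x productivity figures come from this issue; the prices are assumptions for illustration ($1/M sits at the top of the budget-tier range quoted under elvex, and $15/M is a hypothetical frontier price, not a quoted rate). The function names are my own.

```python
def agentic_tokens(chatbot_tokens: int, multiplier: int) -> int:
    """Tokens the same task needs once it runs in an autonomous loop."""
    return chatbot_tokens * multiplier


def routing_saving(cheap_share: float, cheap_price: float,
                   frontier_price: float) -> float:
    """Fraction of spend saved versus sending every token to the frontier model.

    cheap_share is the fraction of tokens routed to the budget tier;
    prices are in USD per million tokens.
    """
    blended = cheap_share * cheap_price + (1 - cheap_share) * frontier_price
    return 1 - blended / frontier_price


def output_per_token(productivity_ratio: float, token_ratio: float) -> float:
    """Relative output per token for heavy users versus low users."""
    return productivity_ratio / token_ratio


low = agentic_tokens(500, 10)    # 5000 tokens at the 10x end
high = agentic_tokens(500, 30)   # 15000 tokens at the 30x end

# Assumed prices: 80% of tokens at $1/M, the rest at a hypothetical $15/M.
saving = routing_saving(0.8, 1.0, 15.0)

# 2x the output for 10x the tokens.
efficiency = output_per_token(2, 10)
```

Under those assumed prices, routing 80% of tokens to the budget tier saves roughly three quarters of the bill, which is consistent with the 60–80% range above. The last figure is the uncomfortable one: a heavy user who is 2x as productive on 10x the tokens gets about a fifth of the output per token.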

Issues/backlash
  • The subsidised-intelligence window is closing. Anthropic and OpenAI are expected to go public later this year; pressure to turn a profit will likely mean higher prices for customers.

  • Opacity is a structural problem. Managed model pricing makes it difficult for enterprises to know whether they are paying a fair price for the value delivered - a gap the Tokenomics Foundation is designed to address.

  • Budget governance hasn't kept pace with capability adoption. Most firms lack the instrumentation to log which model, which team, and which workflow is driving spend - let alone to optimise it.

  • Price wars between Anthropic and OpenAI may provide short-term relief, but the inference market is projected to grow from $106bn (2025) to $255bn (2030). Any cheap period is temporary; the growth in spend is structural.

My take (what to do)
  • Startup (15–40 staff): You are spending on a small number of workflows, which means visibility is achievable quickly. This week: pull your AI spend for the last 60 days, identify which workflows drove the majority, and check whether those are running on the most expensive models by default. For any workflow that runs more than 10 times weekly, map it to the cheapest model tier that delivers acceptable output. Run the ROI calculation: (time saved × hourly cost × 52 weeks) minus (token cost at optimised tier + 10% contingency). If the number is still positive at realistic token pricing - not subsidised pricing - you have a sustainable workflow. If not, pause it until pricing stabilises.

  • SMB (50–120 staff): You have enough workflow volume that unconstrained access is a real risk. Assign one person as "AI spend owner" - not a new role, but an explicit responsibility for an existing ops or delivery lead. Their job: a monthly token spend review, model routing policy (which tier for which task type), and a registry of agent workflows with their estimated monthly token cost. Set per-team budgets before you roll out any further agentic tools. The Tokenomics Foundation's emerging standards are worth tracking - being early to adopt open billing specifications will make audits easier when clients start asking.

  • Enterprise (150–250 staff): You are likely already experiencing budget pressure or will be in Q3. Three things to do before your next board review. First, instrument everything: log every model call with user, workflow, model tier, and token count. You cannot govern what you cannot see. Second, implement intelligent model routing - frontier models (Fable 5, Opus 4.8) for judgment checkpoints only; budget models for classification, summarisation, and routine generation. Third, review every client contract and DPA that touches AI-processed data - the 30-day retention requirement on Fable 5 and the incoming pricing changes from IPO-stage providers are material considerations for MSAs signed before late 2025. Brief your CFO on the Tokenomics Foundation timeline; having a named contact for AI FinOps in your organisation positions you ahead of the governance curve.
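Two pieces of the advice above translate directly into code: the ROI calculation and the per-call log record. What follows is a minimal sketch under stated assumptions. I read "time saved" as hours per week and "token cost" as an annual figure at the optimised tier; `ModelCall`, `log_line`, and `annual_roi` are hypothetical names, and the field list simply mirrors the four items named above (user, workflow, model tier, token count).

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ModelCall:
    """One logged model call with the four fields named above."""
    user: str
    workflow: str
    model_tier: str
    tokens: int


def log_line(call: ModelCall) -> str:
    """Serialise a call as one JSON line, ready to append to a log file."""
    return json.dumps(asdict(call), sort_keys=True)


def annual_roi(hours_saved_per_week: float, hourly_cost: float,
               annual_token_cost: float, contingency: float = 0.10) -> float:
    """(time saved x hourly cost x 52 weeks) minus (token cost + contingency).

    Assumes hours_saved_per_week is a weekly figure and annual_token_cost
    is a yearly figure priced at the optimised model tier.
    """
    benefit = hours_saved_per_week * hourly_cost * 52
    cost = annual_token_cost * (1 + contingency)
    return benefit - cost


# Illustrative numbers only: 5 hours saved weekly at 40/hour, 6,000/year in tokens.
roi = annual_roi(hours_saved_per_week=5, hourly_cost=40, annual_token_cost=6_000)
record = log_line(ModelCall("ana", "proposal-draft", "frontier", 15_000))
```

With those illustrative inputs the benefit is 10,400 against a cost of 6,600 including contingency, so the workflow clears the bar by about 3,800 a year. Rerun it with the token cost you expect after subsidies end; if the result goes negative, that is the signal to pause.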

How to try (15-minute path)
  1. Pull your last 30 days of AI spend from your provider dashboard (Claude Console, OpenAI dashboard, or your API key billing page). Identify the top three workflows by token cost. (5 min)

  2. For each workflow, check: is it using a frontier model by default? Could it achieve the same output on a mid-tier model? Note the per-token price difference and estimate the monthly saving if rerouted. (7 min)

Success metric: one workflow identified where intelligent model routing would reduce monthly token cost by more than 30% with no quality loss - and a decision made to either reroute it this week or document why it legitimately needs the premium tier. (3 min)
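If your provider lets you export usage as rows of workflow, model tier, and tokens, the 15-minute path can be sketched as below. Treat it as an illustration rather than a tool: the export shape, the tier names, and the prices in `PRICE_PER_M` are all assumptions to replace with your own data and your provider's actual rates.

```python
from collections import defaultdict

# Hypothetical prices in USD per million tokens; swap in your provider's rates.
PRICE_PER_M = {"frontier": 15.0, "mid": 3.0}


def cost_by_workflow(calls):
    """Sum spend per workflow from (workflow, tier, tokens) records."""
    totals = defaultdict(float)
    for workflow, tier, tokens in calls:
        totals[workflow] += tokens / 1_000_000 * PRICE_PER_M[tier]
    return dict(totals)


def top_workflows(calls, n=3):
    """Return the n most expensive workflows as (name, cost), highest first."""
    totals = cost_by_workflow(calls)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


def reroute_saving(from_tier, to_tier):
    """Fractional saving from moving a workflow between tiers."""
    return 1 - PRICE_PER_M[to_tier] / PRICE_PER_M[from_tier]


def meets_success_metric(from_tier, to_tier, threshold=0.30):
    """True if rerouting would cut token cost by more than the threshold."""
    return reroute_saving(from_tier, to_tier) > threshold


# Made-up 30-day usage for illustration.
calls = [
    ("proposal-draft", "frontier", 40_000_000),
    ("ticket-triage", "frontier", 100_000_000),
    ("summaries", "mid", 50_000_000),
    ("proposal-draft", "frontier", 10_000_000),
]
ranked = top_workflows(calls)
```

In this made-up data, ticket-triage is the most expensive workflow and runs on the frontier tier by default, so it is the first candidate to test on a cheaper model. The price comparison only tells you the saving; whether the output quality holds is something you still have to check by hand.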

"AI budgeting has recently become a huge issue for some companies, something that never came up earlier this year. People are really saying - it's kind of a meme now - 'My company spent my entire 2026 budget in Q1. Can you make this more efficient?'"

Sam Altman, CEO, OpenAI — Enterprise event, 3 June 2026

Spotlight tool | Tokenomics Foundation

Purpose: Open industry standard for AI token cost governance - benchmarks, billing specifications, and FinOps-style discipline applied to token-based AI spend.

Edge: Built on the FOCUS specification that already normalises cloud billing across providers. Backed by IBM, JPMorgan Chase, SAP, ServiceNow, and major hyperscalers - not a research project.

  • → Open billing specs that normalise token cost data across providers

  • → Benchmarks for evaluating whether you are paying a fair price per token

  • → Certification path for AI FinOps practitioners (distinct from cloud FinOps)

  • → Regional events: Amsterdam September 2026, London February 2027


Did you find it useful? Or have questions? Please drop me a note; I respond to all emails. Simply reply to the newsletter or email [email protected].

This issue’s sponsor

n8n

An open‑source automation platform that lets you chain tools like DeepSeek, OpenAI, Gemini and your existing SaaS into real business workflows without paying per step. Ideal as the backbone for your first serious AI automations.
