🌟Vasilij’s note
On 12 June, at 5:21pm ET, the US Commerce Department issued an export control directive ordering Anthropic to disable Fable 5 and Mythos 5 for every foreign national, globally, with no advance notice. Firms running those models in production found them gone by the weekend. No SLA protects you from that. No contract clause anticipated it. This is not a criticism of Anthropic - they had no choice. It is the shape of renting capability from a vendor operating inside a geopolitical conflict. The operational question every firm should now ask is not "which cloud model should we use?" It is "which of our workflows can we afford to have switched off on a Friday evening, and which ones can we not?" That question now has a concrete, testable answer: local AI. This week I ran the comparison live - Hermes Desktop with Ollama against Claude Cowork - to find out what local actually delivers, where it falls short, and who it is genuinely for.
In today's edition
This week in agents | What changed
US government pulls Fable 5 and Mythos 5 overnight.
On 12 June, Anthropic was ordered to suspend access to its two most advanced models for all foreign nationals worldwide, including staff inside Anthropic. Because nationality verification at scale is not operationally feasible, the company disabled both models for all customers. Claude Opus 4.8 and other models remain available. → Any workflow running on a frontier model can be suspended by a government directive, not just deprecated by a vendor. Third-party AI risk now belongs in your contingency planning, not just your procurement review.
Hermes Desktop ships v0.17.0 - the "Reach" release.
Three weeks after its initial public preview, Nous Research's open-source agent has published its most substantial update yet: background subagents, iMessage integration via Photon, Cursor Composer model access, a full profile builder, and a rehauled Skills Hub browser. GitHub stars hit 180K+ in under four months. → Hermes is moving from developer experiment to operational tool faster than most expected. The governance question now decides whether it belongs in your firm - not the feature list.
Ollama crosses 52 million monthly downloads.
Q1 2026 data shows a 520x increase from 100K in Q1 2023. HuggingFace now hosts 135,000 GGUF-formatted models optimised for local inference. → Local AI is no longer a hobbyist category. The infrastructure has matured to the point where a practical deployment decision - not an experimental one - is now on the table for professional services firms handling sensitive client data.
Top moves | Signal → impact
US export controls hit frontier AI as a governance category, not just a product
Issued by the Commerce Department's Bureau of Industry and Security, 12 June 2026. The directive ordered Anthropic to suspend Fable 5 and Mythos 5 for any foreign national, whether inside or outside the US. The stated trigger was a reported jailbreak capability related to software vulnerability detection; Anthropic publicly disputed the proportionality of the response. The practical effect: all customers lost access to both models while Anthropic worked to comply - with no advance notice and no specific technical findings shared with the company before the order landed. → For consultancies with international staff or client-facing AI workflows: verify which models power your production systems today, map which workflows would fail if those models were suspended, and identify which of those failures you could tolerate versus which would be a client delivery risk. The Fable 5 suspension is a proof of concept for a risk that most firms had not modelled.
Hermes Desktop v0.16 and v0.17 ship in a single week
Two major releases in under seven days (19 June 2026). v0.17.0, the "Reach" release, adds background subagent execution, a new messaging channel (iMessage via Photon), Cursor Composer model access via xAI Grok, a full dashboard profile builder, and a rehauled Skills Hub. The self-improving loop - where agents with 20+ self-created skills complete similar tasks 40% faster in token consumption - has been corroborated by TokenMix's independent benchmarks (April 2026). The security picture is active: v0.16.0 closed 2 P0 issues, 62 P1 issues, and 16 security-tagged tickets in a single release window. → Hermes is iterating faster than most managed enterprise tools. The 40% efficiency gain from the skills loop is token reduction, not output quality improvement - a distinction worth keeping clear. If your firm is evaluating open-source agents, do not download from third-party mirrors; lookalike sites for Hermes Desktop are already circulating.
Local AI hardware cost maths have changed - but the hidden costs have not
A 50-person team running 13B–70B models on a dedicated workstation (RTX 5090, approximately £5,500 at current constrained market prices) lands at roughly £7,100 per year all-in: hardware amortisation, electricity, storage, and engineering time. Against a cloud API at comparable usage, the break-even is approximately 35.6 million tokens per month - at which point self-hosted saves 71–97% of marginal cost. Below that volume, cloud is almost certainly cheaper when you include the 2–4 hours of monthly administration and the engineering time to diagnose memory leaks, GPU OOM crashes, and model quantisation tradeoffs. → The "save money with local AI" argument is real at volume and for specific use cases. It is not a universal default. Run the actual maths for your firm before committing to hardware. The hidden cost is not electricity. It is the senior technical person whose time gets consumed keeping the infrastructure running.
Maker note | What I built this week
This week I built and filmed a side-by-side comparison: Hermes Desktop running locally via Ollama (Gemma 4 on a 12GB RTX 570) versus Claude Cowork on the cloud. Same prompt: a discovery call note → client proposal, with three embedded traps - no invented statistics, phase one pricing under board threshold, adoption risk acknowledged.
Decision: Cowork for anything client-facing; Hermes sandboxed for internal experimentation - because the local model struggled with file generation (docx via Python packages), whereas Cowork produced a fully formatted proposal with tables, colour, and layout in the same time.
The capability gap is real. So is the control gap. Both matter, and the right answer depends on which risk you are managing this week.
Upskilling spotlight | Learn this week
Local AI vs Cloud AI in 2026 (MindStudio)
A clear framework for deciding which tasks belong on local hardware and which require frontier cloud capability, with the break-even maths and hybrid architecture patterns to implement it. Required reading before advising clients on AI infrastructure decisions.
Anthropic Export Control Directive: Governance Analysis (Volkov Law)
Understand the legal and operational implications of the Fable 5 suspension, what belongs in your third-party AI risk framework, and how nationality-based access controls create enterprise compliance obligations that most firms have never had to address for a commercial software product.
Operator’s picks | Tools to try
Hermes Desktop (Nous Research)
Use for: internal, sandboxed agent experimentation where self-improving skills and local model support matter.
Standout: MIT-licensed, runs on Mac/Windows/Linux, supports Ollama for fully local deployment with zero API costs.
Caveat: still in public preview; production-ready for internal use only, not client-facing workflows. Always download from the official Nous Research domain.
Ollama
Use for: running open-weight models locally with an OpenAI-compatible API, no cloud dependency.
Standout: single-command model management, 52M monthly downloads, supports Gemma 4, Qwen 2.5, Llama 4 and 135K+ GGUF models.
Caveat: not a production server at scale - Ollama queues requests sequentially, and latency degrades linearly with concurrency above 5–10 users.
MindStudio
Use for: hybrid local/cloud agent workflows where some steps require privacy (sensitive data routed to a local Ollama instance) and others require frontier reasoning (routed to cloud).
Standout: 200+ models including Ollama and LM Studio alongside cloud APIs, through a single visual builder with no API key management required.
Pair with: Ollama for the local inference layer.
Deep dive | Thesis & Playbook
You Don't Own the Capability. You Rent It - and the Landlord Can Change the Locks.
On 12 June 2026, one of the most capable AI models available to UK consultancies was switched off overnight - mid-session, for every customer, with no advance warning. The immediate cause was a US government export control directive citing national security concerns about a potential jailbreak. The operational lesson is not about the politics. It is about the structure of the dependency. Every consultancy running workflows on cloud AI is renting capability from a vendor operating inside regulatory, geopolitical, and commercial pressures that have nothing to do with their client delivery. This week's event is not a one-off - it is the clearest demonstration yet of what renting capability means in practice.
On paper
Frontier cloud models remain materially more capable than anything you can run locally. Claude Opus 4.8 and GPT-5.5 involve hundreds of billions of parameters running on distributed GPU infrastructure that no firm under 250 staff can replicate. Local models - even the strongest open-weight options - deliver approximately 70–85% of frontier model quality for common workloads, according to benchmark data from Q1 2026. Qwen 2.5 32B, the current leading local model by quality-to-cost ratio, scores 83.2% on MMLU against GPT-4's reported 86.4%. Gemma 4, which I ran in the comparison this week on an RTX 570 with 12GB VRAM, is capable but clearly below frontier quality on complex reasoning and multi-step tool use.
The hardware cost is lower than most people assume. An RTX 570 (12GB, approximately £300–400) fits a standard PC and handles 7B–13B models at usable speeds for one to two concurrent users. A more serious deployment - RTX 5090, 32GB GDDR7, currently approximately £5,500 at constrained market prices - handles 33B–34B models entirely in GPU memory. The break-even against cloud API spend sits at roughly 35.6 million tokens per month for a competitively-priced cloud provider.
In practice
Where local AI works: air-gapped or genuinely confidential client data that cannot leave the building; research agents running high-volume, repetitive web scraping tasks (cost of electricity, not API spend); internal workflows where task complexity is bounded and the model does not need to generate complex file outputs or integrate with tooling chains.
Where local AI fails: complex file generation. In the comparison this week, Gemma 4 running via Hermes Desktop could not reliably generate a formatted .docx file using Python packages - a task that Claude Cowork handled without friction. This is not a model capability failure; it is a tooling integration problem that requires additional configuration time. For open-source deployments, nothing is exactly straightforward. The output quality gap on the proposal task was visible: Cowork produced colour-formatted tables, consistent layout, and a professionally structured document. The local model produced solid structured text with no formatting.
The hidden cost is engineering time. Local deployment needs 2–4 hours of monthly administration at small scale, rising to 1.5–2 full-time engineers per cluster at enterprise scale. GPU memory issues, model quantisation tradeoffs, OOM crashes, and inference engine bugs are real - and they land on someone's calendar, not on an SLA.
Security: local does not mean safe. Hermes Desktop has real access to the local machine and a skill library it can rewrite. It requires a code execution environment with significant machine access. The open-source repository is a standing injection surface - someone else can commit code that ends up on your machine. Managed cloud tools like Cowork have a substantially more controlled review and deployment process.
Issues/backlash
The Fable 5 shutdown has prompted a visible shift in how the open-source and self-hosting community discusses local AI - less as a cost play, more as a continuity and sovereignty play. The argument has changed from "local is cheap" to "local is ungovernable in a different way from cloud." The export control order has also raised questions about data processing obligations for firms with international workforces: the foreign national access trigger has enterprise-wide implications that most companies have never had to operationalise for a commercial software product.
On the hardware side: NVIDIA cut RTX 50-series production 30–40% in 2026, pushing street prices well above MSRP. The break-even calculation looks different at actual market prices versus theoretical list prices.
My take (what to do)
Startup (15–40 staff): For most of your work, a managed cloud tool is cheaper, more capable, and easier to govern. Accept the renting risk for what it is. The exception: if you handle genuinely confidential client data that cannot be processed by a third-party service - legal, medical, or regulated financial data - identify that specific workflow and evaluate a local deployment for it alone. Do not start with hardware. Start with Ollama on an existing machine to validate whether local model quality is sufficient for the specific task before buying anything.
SMB (50–120 staff): Run a continuity audit this week. For every AI-powered workflow in your delivery stack, document which model it depends on and what happens if that model is suspended with 48 hours notice. Classify each workflow: tolerable disruption versus client delivery risk. For the latter category, build a fallback path - either a secondary cloud model (Opus 4.8 remains available; GPT-5.5 is unaffected) or a local alternative for genuinely non-negotiable internal tasks. Assign one ops team member to own the AI dependency register; it does not need to be a separate role.
Enterprise (150–250 staff): The Fable 5 directive belongs in your third-party AI risk assessment framework immediately. Three actions before your next board review. First: map every production AI integration to its underlying model - not just the vendor, but the specific model version. Second: review client contracts and DPAs for any workflow touching foreign national data, given the nationality-based access control precedent this directive sets. Third: brief your legal and compliance teams on the export control landscape; the Commerce Department's use of BIS directives to restrict commercial AI is a new category of risk that standard software procurement frameworks do not address.
How to try (15-minute path)
Install Ollama from ollama.com and run
ollama run gemma3(orqwen2.5:7bfor better quality). Send one real work prompt - a summary, a draft, a classification task. Note the output quality and the response speed. (5 min)Take the same prompt to your current cloud tool. Compare output quality, formatting, and whether any tool use or file generation was required. Note where the gap is visible and where it is not. (5 min)
Success metric: a clear yes/no decision on whether local model quality is sufficient for one specific internal workflow you run regularly - and if yes, a decision to either pilot it this week or document why governance or tooling requirements prevent it. (5 min)
"To date, the government has only given us verbal evidence of a potential narrow, non-universal jailbreak, which essentially consists of asking the model to read a specific codebase and fix any software flaws. We disagree that the finding of a narrow potential jailbreak should be cause for recalling a commercial model deployed to hundreds of millions of people."
Spotlight tool | Ollama
Purpose: Local inference runtime that makes running open-weight AI models on your own hardware as simple as a single terminal command. The infrastructure layer beneath most local AI deployments in 2026.
Edge: OpenAI-compatible API means existing tool integrations work without modification. 52 million monthly downloads and 135,000+ GGUF models on HuggingFace. Supports Gemma 4, Qwen 2.5, Llama 4, and most open-weight model families.
→ Zero marginal API cost once hardware is deployed
→OpenAI-compatible HTTP endpoint for drop-in integration
→No data leaves your machine - relevant for GDPR, HIPAA, and regulated client data
→Runs on Mac (Apple Silicon), Windows (CUDA), and Linux
Caveat: Not a production server. Sequential request queuing means latency degrades with concurrency. Pair with a reverse proxy and authentication layer before exposing to a team. For production-scale throughput, evaluate vLLM on the same hardware.
What did you think of today's issue?
Did you find it useful? Or have questions? Please drop me a note., I respond to all emails. Simply reply to the newsletter or email [email protected].
This issue’s sponsor
n8n
An open‑source automation platform that lets you chain tools like DeepSeek, OpenAI, Gemini and your existing SaaS into real business workflows without paying per step. Ideal as the backbone for your first serious AI automations.

Refer and win
Share this newsletter for a chance to win!

