🌟Vasilij’s note

This week I filmed something I've been meaning to prove for a while: a three-year-old business laptop with no dedicated GPU, wifi switched off, running a capable AI model that reads a confidential document and writes code - nothing leaving the machine. The capability gap versus cloud is real. So is the control case for using local AI. But what struck me most was how few consultancies have actually asked the right question. Not "should we use AI?" Not even "which model?" The question is: which AI, for which data? Get that right and most of the risk disappears without banning anything. This week's edition is built around that question.

In today's edition

This week in agents | What changed

Gartner forecasts over 40% of agentic AI projects will be cancelled by end of 2027.

Published this week as part of Gartner's updated strategic predictions, the figure cites three primary causes: escalating costs, unclear business value, and inadequate risk controls. The same report shows AI agent software spending is projected to hit $206.5 billion in 2026 - up 139% from $86.4 billion in 2025 - making it the fastest-growing segment of enterprise software. The contradiction is stark: spending is accelerating sharply while more than four in ten projects are on track to fail. Gartner's own data shows only 21% of organisations have a mature governance model for autonomous AI agents. → For consultancies evaluating or selling agentic AI: the cancellation risk is not a technology problem. It is a scoping and governance problem. Projects that start with a single constrained workflow, a clear ROI threshold, and a human-in-the-loop architecture are the ones that survive. The firms shipping agents into production without those foundations are the ones inflating the 40% figure.

EU Parliament approves Digital Omnibus on AI, deferring high-risk deadlines by 16 months.

On 16 June, the European Parliament voted 423 to 57 to approve the Digital Omnibus package. High-risk AI obligations for stand-alone Annex III systems (employment, credit scoring, biometrics) move from 2 August 2026 to 2 December 2027. Transparency duties in August 2026 remain unchanged. Formal Council adoption is expected before 2 August. → Firms that paused AI deployment pending August deadlines have breathing room. Firms that paused compliance work should not use this as permission to stop.

Spanish data protection agency flags AI conversation tracking.

The AEPD sent a warning to the European Data Protection Board citing research showing popular AI assistants embed third-party trackers – including analytics and advertising scripts from Google, Meta, and TikTok – that can access user conversation data. The finding covers consumer and, in some configurations, business-tier tools. → If your team is using AI tools on client data, confirm whether tracking is disabled on your plan and whether your DPA covers it.

Top moves | Signal → impact

  • Local AI runtimes hit practical deployment standard

    LM Studio 0.4.16, Ollama 0.24.0, and the llmster headless daemon shipped across May and June 2026, collectively maturing the local inference stack from a developer experiment to an operational option for non-technical teams. Qwen3.5 now beats GPT-5-mini on most benchmarks whilst running on a MacBook with 64GB RAM. LM Studio added a native iPhone app with end-to-end encrypted remote access via Tailscale. Ollama added Codex App support and reworked MLX sampling for Apple Silicon. → For regulated professional services – legal, HR, financial – the question is no longer whether local AI is capable enough. It is whether governance is in place to use it correctly. Local models are a credible option for sensitive workloads today; the deployment decision is now operational, not experimental.

  • GDPR Article 28 is now a practical AI tool selection issue

    Research published in March 2026 confirmed that on consumer AI plans – ChatGPT Plus, Claude Pro, Gemini individual – you formally lack the legal basis to process clients' personal data under GDPR Article 28. Business plans with active Data Processing Agreements cover most cloud deployments, but only if configured correctly and signed. Separately, the Spanish data protection authority flagged that popular AI assistants embed third-party trackers from Google, Meta, and TikTok that can access conversation content. → Every cloud AI tool that processes client data requires a DPA on file – not assumed, confirmed. Local inference eliminates the DPA requirement for the processing step entirely, which is a meaningful operational simplification for firms working under client NDAs or in regulated sectors.

  • EU Digital Omnibus defers high-risk AI rules by 16 months

    On 16 June, the European Parliament voted 423 to 57 to approve the Digital Omnibus package. High-risk AI obligations for stand-alone Annex III systems – covering employment, credit scoring, and biometrics – move from 2 August 2026 to 2 December 2027. Transparency duties in August 2026 remain unchanged: chatbot disclosure and AI-generated content labelling still apply. Formal Council adoption is expected before 2 August. → Operators can resume AI deployment in employment, HR, and credit workflows without the August 2026 deadline pressure. Use the 16 months to build proper governance rather than treat the delay as a green light to skip it.

Maker note | What I built this week

This week I filmed a side-by-side: a three-year-old ThinkPad with integrated graphics only, wifi off, running Google Gemma 3n in LM Studio against a cloud tool on the same tasks – parsing a confidential document and generating a script.

Decision: local for sensitive, contained tasks; cloud for anything requiring complex file output or frontier reasoning – because the local model produced solid structured text whilst cloud produced a fully formatted, colour-coded document in the same time.

The capability gap is real. The privacy case is also real. Which one you're managing this week depends entirely on what the data is.

Upskilling spotlight | Learn this week

BlueDot Impact – AI Safety Courses (London / San Francisco)

Move yourself or a team member into work that makes AI go well. BlueDot is a non-profit talent accelerator that has trained 7,000+ people since 2022. Free, pay-what-you-want courses covering Technical AI Safety, a Project Sprint, and an AI Safety Operations Bootcamp. Widely seen as the main on-ramp into AI safety careers. Worth knowing about as the governance and compliance conversation around AI grows – the talent pipeline matters.

GDPR DPA Guide: Data Processing Agreements (ComplianceStack)

understand exactly what Article 28 requires before deploying any AI tool on client data – what a DPA must contain, when it is legally required, how sub-processor chains work, and what to look for when reviewing a vendor-provided template. Updated 22 June 2026. Practical reference for any firm reviewing its AI tool stack.

Operator’s picks | Tools to try

LM Studio (Element Labs)

Use for: running open-weight models locally with a GUI, offline document analysis, and sensitive client data workflows where nothing should leave the machine.

Standout: ships a local API server, CLI (lms), and headless daemon (llmster) for server deployments – one app covers the full workflow from model discovery to production serving. Free for home and commercial use.

Caveat: complex file output (formatted .docx, charts) requires additional tooling; not a replacement for cloud on multi-step document generation.

Open WebUI

Use for: giving a whole team access to local models through a browser-based interface, without requiring each person to install anything locally.

Standout: one-command Docker install, multi-user authentication, document upload for local RAG, and 399,000-person community. Supports any Ollama or OpenAI-compatible backend. 126,000+ GitHub stars. Free and open source.

Caveat: requires a machine to run the server; not a managed service. Add a reverse proxy with authentication before exposing beyond a local network.

AnythingLLM

Use for: private document RAG with proper team permissions – query internal documents, client files, or knowledge bases locally without sending data to a third-party API.

Standout: lowest hallucination rate in May 2026 benchmarks (6% on a 5,000-page corpus vs 11–14% for alternatives), swappable embedders, and multi-user workspaces with document-level access controls. Desktop and server versions available.

Caveat: switching embedding models requires a full re-index; budget 30–90 minutes per 5,000 pages on consumer hardware.

Deep dive | Thesis & Playbook

Which AI, For Which Data

Most consultancies have a default: they use whatever cloud AI tool is on their subscription and paste data into it. That default has served well for low-sensitivity tasks. It is quietly creating compliance and continuity exposure for everything else. The question no one has formally answered is: which workflows contain data that should not touch a cloud service, and what do you run instead?

On paper
  • Local AI in 2026 is not a hobbyist project. LM Studio shipped stable MTP (Multi-Token Prediction) speculative decoding, MLX hit 4x faster inference on M5, and Ollama added Codex App support – all in May 2026 alone.

  • Google's Gemma 3n E4B runs on a standard business laptop with integrated graphics only. Requires as little as 3–4GB of RAM. Supports text, vision, and audio input. Runs fully offline after initial download.

  • Gemma 4 12B runs in 16GB of RAM with multimodal understanding, reasoning, and agent workflows on regular laptops. The quality-to-hardware ratio has shifted materially in the last six months.

  • The break-even against cloud API spend for a team deployment (RTX 5090, ~£5,500) sits at approximately 35.6 million tokens per month – at which point local saves 71–97% of marginal cost.

  • EU AI Act high-risk provisions and GDPR data minimisation principles now create a genuine compliance case for local inference in regulated professional services, not just a cost argument.

In practice
  • Local AI handles short, bounded documents well. A grant application, a client briefing, a contract summary: solid output, nothing leaves the machine.

  • Complex file generation (formatted .docx, colour-coded tables, multi-sheet Excel) is where local models fall short. In testing, Gemma 3n produced clean structured text; the cloud tool produced a formatted proposal with layout and colour in the same time. This is a tooling integration gap, not a model quality gap – but it matters for client-facing deliverables.

  • Speed is slower than cloud. On a three-year-old laptop with integrated graphics: usable for one user, slower on large documents. As of June 2026, Ollama and LM Studio run the same llama.cpp and MLX backends, so speed is within roughly 5% between them – pick on interface, not performance.

  • The hidden cost is engineering time: 2–4 hours per month for a solo deployment, rising to 1.5–2 full-time engineers per cluster at enterprise scale. GPU memory issues, model version updates, and inference configuration land on someone's calendar.

  • UK solicitors drafting first-pass contract analyses without sending privileged material to OpenAI, and consultancies operating under client NDAs, are the primary professional services adopters in 2026.

Issues/backlash
  • Local does not mean secure. LM Studio and Ollama have significant local machine access. MCP integrations can execute code and access files. The open-source repository is a standing injection surface for Ollama-based deployments.

  • GDPR, the EU AI Act, the Digital Markets Act, and a growing patchwork of US state laws now converge on the same AI systems – local inference addresses data transfer risk but does not eliminate the need for AI governance overall.

  • Hardware pricing: NVIDIA cut RTX 50-series production 30–40% in 2026, pushing street prices above MSRP. Break-even calculations using list prices are optimistic at current market conditions.

  • Consumer-grade hardware cannot match frontier cloud models on complex multi-step reasoning or document generation. The gap is narrowing, but it is real today.

My take (what to do)
  • Startup (15–40 staff): You almost certainly have at least one workflow containing data that should not touch a third-party server – a client's financial model, a legal document, a sensitive briefing. Identify it this week. Before buying any hardware, install Ollama on an existing machine (free, five minutes) and run the specific task you have in mind. If the output quality is sufficient for that task: you have your answer and your hardware already. If not, the gap is in tooling integration, not capability – and that is solvable before you spend anything.

  • SMB (50–120 staff): The priority is classification, not deployment. Map your three highest-volume AI workflows and note what data they touch: public information, internal data, or client-confidential data. For the third category, you need either a cloud business plan with an active DPA, or a local inference setup for those specific tasks. You do not need to do both. Assign one ops team member to own the answer. Most SMBs will find that one or two workflows warrant local deployment and the rest are fine on cloud with a business plan. Build the policy; the tooling follows.

  • Enterprise (150–250 staff): You have three actions before the August 2026 transparency deadline. First: confirm that every cloud AI tool used on client data has an active DPA signed and on file – not assumed, confirmed. Second: classify your workflows by data sensitivity and map which ones would benefit from local inference. Third: use the Digital Omnibus extension on high-risk rules to build governance properly rather than rush to comply with a deadline that has now moved. The August 2026 transparency obligations (chatbot disclosure, AI-generated content labelling) still apply; those need to be in place. The December 2027 high-risk deadline gives you time to build the governance infrastructure that actually works.

How to try (15-minute path)
  1. Install Ollama from ollama.com and run ollama run gemma3 in a terminal. Takes two minutes once the model downloads. (5 min)

  2. Take one piece of real internal text – a document summary, a draft, a classification task – and send it as a prompt. Note the output quality and response speed compared to your current cloud tool. (5 min)

Success metric: a clear yes/no decision on whether local model quality is sufficient for one specific internal workflow you run regularly. If yes, you have a deployment decision. If no, you have a clear record of why – which is also valuable. (5 min)

"Today's adoption of the AI Omnibus in the European Parliament weakens fundamental rights protections in the AI Act, delays enforcement of key provisions, and empowers Big Tech companies. The adopted text makes it more difficult to protect people from invasive AI, empowers industry actors and begins the dismantlement of digital protections in Europe."

Diego Naranjo, Senior Advocacy Advisor at European Digital Rights (Liberties) – on the European Parliament's adoption of the Digital Omnibus on AI, 16 June 2026

Spotlight tool | LM Studio

Purpose: Local AI runtime and GUI for running open-weight models on your own hardware, offline, with zero data leaving the machine.

Edge: LM Studio 0.4.16 ships a headless daemon (llmster) for server environments, a native iPhone app (Locally), and LM Link – an end-to-end encrypted remote-access bridge built on Tailscale's WireGuard mesh. Chat history stays on your devices; only a device discovery list touches LM Studio's servers.

  • → Fully offline inference after model download – nothing sent to external servers

  • → OpenAI and Anthropic-compatible API for drop-in integration with existing tooling

  • → Headless daemon (llmster) for Linux server and CI deployment without GUI

  • → Free for home and commercial use

Caveat: single-user architecture; no authentication on the local API by default. Add a reverse proxy before team deployment.

Try it: lmstudio.ai

What did you think of today's issue?

Login or Subscribe to participate

Did you find it useful? Or have questions? Please drop me a note., I respond to all emails. Simply reply to the newsletter or email [email protected].

This issue’s sponsor

n8n

An open‑source automation platform that lets you chain tools like DeepSeek, OpenAI, Gemini and your existing SaaS into real business workflows without paying per step. Ideal as the backbone for your first serious AI automations.

Refer and win

Share this newsletter for a chance to win!

Keep Reading