🌟 Vasilij’s note

This week the data caught up with the reality I see in every client conversation. OutSystems surveyed 1,900 IT leaders and found that 96% of enterprises are already running AI agents in some capacity. Only 12% have a centralised platform to manage them.

MiniMax open-sourced a 230-billion-parameter model that autonomously improved its own training scaffold across 100 rounds - and it is free to deploy. Anthropic quietly launched Claude for Word, completing the full Office suite integration. And Google expanded agentic booking to the UK, meaning your clients' customers can now task an AI to find, select, and reserve a restaurant table from inside Google Search.

None of these require a subscription to a frontier model. None require a dedicated AI team. What they all require is the discipline to ask: where is this showing up in my clients' operations, and do I have a position on it?

In today's edition

This week in agents | What changed

OutSystems: 96% of enterprises are running AI agents — 94% are concerned about sprawl.

The 2026 State of AI Development report (published 13 April, 1,900 global IT leaders) confirms the shift from experimentation to execution is complete. 97% are exploring system-wide agentic strategies. But governance has not kept pace: 94% report concern that agent sprawl is increasing complexity, technical debt, and security risk. Only 12% have implemented a centralised platform to manage it. 38% are mixing custom-built and pre-built agents in stacks that are difficult to standardise or secure.

MiniMax open-sources M2.7 - a 230B model that autonomously improved its own training.

Published 13 April on Hugging Face. The model participated in its own development cycle: MiniMax tasked an internal version with optimising a programming scaffold across 100+ autonomous rounds. M2.7 analysed failure trajectories, modified scaffold code, ran evaluations, and decided whether to revert each change - without human intervention. Result: a 30% performance improvement on internal evaluation sets. It now handles 30–50% of MiniMax's daily reinforcement learning workflows end-to-end. Benchmarks: 56.22% on SWE-Pro, matching GPT-5.3-Codex. Free to deploy via SGLang, vLLM, and NVIDIA NIM.
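The accept-or-revert loop described above can be sketched in a few lines. This is a minimal illustration of the pattern, not MiniMax's actual code; `evaluate` and `propose_patch` are hypothetical callables standing in for the model's evaluation suite and code-editing step.

```python
def self_improvement_loop(scaffold, evaluate, propose_patch, rounds=100):
    """Accept a proposed change only if it improves the evaluation score;
    otherwise revert by keeping the previous scaffold."""
    best_score = evaluate(scaffold)
    for _ in range(rounds):
        candidate = propose_patch(scaffold)  # model edits its own scaffold
        score = evaluate(candidate)          # run the evaluation suite
        if score > best_score:               # keep only strict improvements
            scaffold, best_score = candidate, score
    return scaffold, best_score
```

Because failed changes are never kept, the evaluation score can only improve across rounds - the property that makes 100+ unattended rounds safe to run.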

Claude for Word launches in public beta - completes Anthropic's full Office suite.

Launched 11 April for Team and Enterprise plan users on Mac and Windows. A persistent sidebar within Microsoft Word lets users draft, edit, and revise .docx files with AI assistance. All edits surface as native Word tracked changes. Claude can read full multi-section documents, work through comment threads, and edit clauses while preserving formatting, numbering, and styles. Connects to Claude for Excel and Claude for PowerPoint in a single conversation thread. First listed use case: legal contract review.

Top moves | Signal → impact

  • Perplexity CEO confirms 5x revenue growth - from $100M to $500M - with only a 34% headcount increase.
    Aravind Srinivas posted the figure publicly on 14 April. The growth is attributed to the Perplexity Computer launch - an agentic product that executes multi-step workflows using up to 19 models rather than returning search results. Revenue surged 50% in a single month in March; the cumulative ARR is now confirmed at $500M. → The commercial proof that doing tasks outperforms finding answers is now explicit and CEO-confirmed. For consultancies still framing their AI offer around "AI-powered research" or "AI-assisted analysis," the positioning question is: are you a search product or an agent product? The firms that reframe first will close faster.

  • Google launches agentic booking in UK Search - agents now complete transactions in consumer products.

    Available from 11 April across the UK (plus Australia, Canada, India, Singapore, and four other markets). Users describe a booking requirement in natural language - party size, dietary needs, location, time - and Google's AI Mode surfaces options and facilitates the reservation through partners including OpenTable, TheFork, ResDiary, and SevenRooms. The model does not retrieve information. It completes the transaction. → This is the signal that agentic capability has crossed into consumer infrastructure. When Google normalises agents that act rather than answer within Search - the product used by your clients' customers daily - the expectation that software should complete tasks, not just surface results, becomes a baseline. What does your clients' service experience look like against that baseline?

  • 94% of enterprises flag agent sprawl as a governance risk - only 12% have tooling to manage it.
    The same OutSystems data point that opens This Week in Agents deserves a strategic read in the Top Moves context. 38% of organisations are running mixed stacks of custom-built and pre-built agents with no standardisation. The firms managing this are not using better models - they are using centralised registries, approval workflows, and audit trails. → The governance gap is the engagement. Consultancies that arrive with a structured agent portfolio audit - inventory of deployed agents, data access patterns, approval chains, cost attribution - will find the budget has already been approved. The client's CISO signed off on it when they read the breach statistics.

Upskilling spotlight | Learn this week

Claude for Word - Official Documentation and Use Case Guide

The Anthropic product page covers the full capability set: document-aware drafting, tracked changes integration, comment thread editing, cross-app context with Excel and PowerPoint, and clause-level search. Essential reading before advising clients on Microsoft 365 AI integration, or before deploying it in your own delivery team. The legal contract review use case listed first is the highest-signal starting point for professional services firms.

MiniMax M2.7 - Model Card and Deployment Documentation (Hugging Face)

The full model card covers architecture (229B MoE), benchmark results, deployment options (SGLang, vLLM, NVIDIA NIM, Transformers), and the self-evolution methodology. For any consultancy advising clients on open-source model deployment or build-versus-buy infrastructure decisions, this is the reference that changes the cost model. Understanding what a free, production-grade agent model looks like in 2026 is now table stakes for informed infrastructure advice.

NVIDIA Technical Blog - MiniMax M2.7 on NVIDIA Platforms

NVIDIA's engineering write-up covers production deployment of M2.7 on Blackwell Ultra GPUs via vLLM and SGLang, including optimised configurations for tool calling and reasoning, throughput benchmarks (up to 2.7x gains with SGLang), and NeMo RL fine-tuning recipes. The practical outcome: anyone evaluating whether to self-host M2.7 versus use a managed API has the infrastructure numbers they need to model the cost honestly. Relevant for senior ops or technical leads making build-versus-buy decisions on behalf of clients.

Maker note | What I built this week

This week I recorded a full breakdown of Anthropic's Managed Agents platform - what it is, how the three-layer architecture actually works, and what it means for consultancies that are stuck between demo and deployment. The video covers the Sessions/Harnesses/Sandboxes model, the credential isolation architecture that addresses the biggest security objection to third-party agent hosting, a cost model for a 50-person firm, and a live comparison against five alternatives, including Microsoft Copilot Studio, AWS Bedrock AgentCore, and OpenAI's Responses API.

Decision:

The infrastructure problem - not the intelligence problem - is what is killing most agentic projects. Managed Agents removes the infrastructure excuse. Whether the vendor lock-in trade-off is acceptable is the real decision each firm needs to make deliberately, not by default.

Operator’s picks | Tools to try

Claude for Word (beta)

Use for AI-assisted drafting, editing, and clause review directly inside Microsoft Word, with all edits surfacing as tracked changes.

Standout: cross-app context with Excel and PowerPoint in a single conversation thread - Claude can check for data inconsistencies between a Word report and the underlying Excel model simultaneously.

Caveat: Team and Enterprise plans only ($25/user/month minimum); free and Pro users on waitlist.

MiniMax M2.7 (open source)

Use for deploying a production-grade agent model on your own infrastructure — coding agents, document processing pipelines, research workflows — without per-token API costs. 229B MoE architecture; deploy via SGLang, vLLM, or NVIDIA NIM.

Standout: free to run; SWE-Pro performance matches GPT-5.3-Codex. The first open-source model with documented self-evolution capability in a production engineering context.

Caveat: requires GPU infrastructure to run at scale; not a managed service.

OutSystems Agentic Systems Engineering

Use for building, managing, and evolving governed agentic systems in enterprise environments - specifically the agent portfolio management and compliance tooling that the 94% sprawl concern makes urgent.

Standout: purpose-built for the governance gap, not the capability gap. Covers agent inventory, approval workflows, and audit trails as first-class features rather than afterthoughts.

Caveat: enterprise-focused; likely overkill for firms under 100 staff, but the report itself is free and worth reading regardless of the product.

Deep dive | Anthropic Managed Agents - The Build vs Buy Decision, Made Concrete

88% of AI proofs of concept never reach wide-scale deployment. The failure point is rarely the model. It is everything required to run AI reliably in production: sandboxing, state management, credential security, error recovery, and context handling across long sessions. For a 15-to-200-person consultancy without a dedicated infrastructure team, building all of this yourself is a losing bet. MIT's Project NANDA found that buying from specialised vendors succeeds roughly twice as often as building internally. Claude Managed Agents, which launched in public beta on 8 April, is Anthropic's answer to the infrastructure gap - and the question every consultancy now needs a clear position on.

On paper
  • Managed Agents provides three composable infrastructure layers. Sessions are append-only event logs stored outside Claude's context window - full history is always recoverable, and agents can resume mid-task through disconnections without losing state. Harnesses are the orchestration loops that call Claude, route tool calls, and manage context; if a harness crashes, a new one fetches the event log and resumes from the last checkpoint. Sandboxes are managed execution containers where Claude runs code and edits files - interchangeable and isolated from credential storage.

  • The security architecture is the most significant structural change from self-hosted approaches. In previous designs, any code Claude generated ran in the same container as your credentials - a prompt injection only had to convince Claude to read its own environment variables. In Managed Agents, API keys and OAuth tokens sit in a secure vault; Claude calls tools through a proxy that fetches credentials independently. That is a structural fix, not a configuration choice.

  • Pricing: $0.08 per active session-hour for infrastructure, plus standard Claude token costs. Anthropic's own benchmark puts a one-hour coding session at approximately $0.70 total. For a 50-person consultancy running five agents at four active hours per day, infrastructure costs run under £30 per month. Token costs are where the real spend sits - expect £500–£2,000 per month total for that deployment profile, depending on model choice and task complexity.
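The infrastructure figure above is easy to reproduce. A quick sanity check in Python - the working-day count and exchange rate are my assumptions, not Anthropic's:

```python
AGENTS = 5
ACTIVE_HOURS_PER_DAY = 4
WORKING_DAYS_PER_MONTH = 22      # assumption: business days only
USD_PER_SESSION_HOUR = 0.08      # Anthropic's published infrastructure rate
USD_TO_GBP = 0.79                # assumption: illustrative exchange rate

session_hours = AGENTS * ACTIVE_HOURS_PER_DAY * WORKING_DAYS_PER_MONTH
infra_gbp = session_hours * USD_PER_SESSION_HOUR * USD_TO_GBP
# 440 session-hours -> roughly £27.81/month, i.e. under £30
```

Token spend dominates, as noted above - the infrastructure line item is a rounding error next to the £500–£2,000 token budget.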

In practice
  • Five alternatives exist, each with a distinct trade-off. Microsoft Copilot Studio is the lowest-friction option if your firm runs on Microsoft 365, with a visual workflow builder and deep Teams integration - but flows time out after two minutes, ruling it out for long-running autonomous work. AWS Bedrock AgentCore is the most flexible option, fully model-agnostic with per-second billing, but cost modelling across its multiple billing layers is complex. OpenAI's Responses API has the most mature developer ecosystem with an open-source Agents SDK, but you assemble all the infrastructure pieces yourself. Salesforce Agentforce is native if you are already on the full Salesforce stack, but first-year implementation costs typically start at £50,000 for a small team. Open-source frameworks like LangGraph and CrewAI offer maximum control and zero vendor lock-in, but trade months of engineering time for that flexibility.

  • The critical trade-off with Managed Agents is portability. It runs exclusively on Claude - not available through AWS Bedrock or Google Vertex. If Anthropic's models fall behind a competitor, or if Anthropic faces any business disruption, migration costs will be high. The decision is whether Claude's current strength in long-horizon reasoning and Managed Agents' structural security architecture outweighs that portability risk for your specific deployment.

  • Internal testing showed up to 10 percentage points of task success improvement over standard prompting loops, with the largest gains on the hardest problems. Three use cases have the clearest ROI for consultancies: proposal generation from RFP documents (40–80 hours of non-billable time per complex bid), research and competitive analysis (cutting 3-week cycles to 4 days), and time capture (consultants under-report billable hours by 15–25%; an agent monitoring calendars and document activity closes that gap at roughly £20,000 per consultant per year at a 10% improvement).
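The £20,000 time-capture figure checks out under one plausible set of assumptions - the hours target below is mine; the £150 rate matches the one used elsewhere in this issue:

```python
BILLABLE_HOURS_PER_YEAR = 1350   # assumption: typical utilisation target
HOURLY_RATE_GBP = 150            # rate used elsewhere in this issue
RECOVERY_RATE = 0.10             # the 10% improvement quoted above

recovered_hours = BILLABLE_HOURS_PER_YEAR * RECOVERY_RATE   # 135 hours
recovered_value_gbp = recovered_hours * HOURLY_RATE_GBP     # £20,250
```

£20,250 against the quoted "roughly £20,000 per consultant per year" - the claim is internally consistent, but rerun it with your own utilisation and rate before putting it in a proposal.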

Issues/backlash
  • This is a public beta. Multi-agent orchestration and advanced memory tooling remain in limited preview. Production reliability at scale is still being proven. The vendor lock-in risk is real, and Anthropic is explicit about it - Managed Agents does not run on other providers' infrastructure.

  • Data sovereignty is the second objection. Client confidential information - strategy documents, financial data, M&A details - sent to a third-party AI platform creates risk. Only 51% of UK IT decision-makers are confident their AI-generated data complies with GDPR. For regulated sectors, explicit client consent frameworks or anonymisation pipelines are required before connecting client data to any of these platforms. The EU AI Act's August enforcement deadline makes this urgent, not optional.

  • Human oversight is non-negotiable regardless of infrastructure. Deloitte was asked to partially refund a $290,000 Australian government report that contained AI-generated hallucinations. Only 21% of companies have a mature governance model for autonomous agents. The infrastructure decision and the governance decision are separate. Managed Agents solves the first. You still have to solve the second.

My take (what to do)
  • Startup (15–40 staff): The infrastructure problem is what is killing your agent projects, not the model. Create a free Claude Platform account, test one constrained workflow - proposal first drafts from RFP documents, client status report generation from project data, meeting prep from CRM history - and run it for 30 days. Measure time saved against the £500–£2,000 monthly spend at your usage level. One recovered billable hour per week per consultant at £150 covers the cost for a 15-person firm in the first month.

  • SMB (50–120 staff): Assign one delivery lead as the agent owner - not an additional role, a two-hour weekly responsibility. Their job: maintain a running cost model for agent deployments (actual token consumption, model choices, session duration), review Console tracing logs monthly, and retire any agent that does not deliver evidence of ROI after eight weeks. The session tracing built into the Managed Agents Console makes this tractable. Use it.

  • Enterprise (150–250 staff): Before deploying, run a data residency review. Confirm your existing DPAs with Anthropic cover the specific data types your agents will process. If you handle financial services, healthcare, or legal data, the August EU AI Act deadline is a current procurement requirement, not a future concern. Define which decisions agents can make autonomously versus where human approval is required before go-live - not after the first incident.

How to try (15-minute path)
  1. Go to platform.claude.com and navigate to the Managed Agents section. Read the architecture overview - specifically the Sessions/Harnesses/Sandboxes model and the credential isolation section. Ten minutes here saves two weeks of debugging later. (5 min)

  2. Identify one workflow that happens more than ten times per week and consumes more than 30 minutes each time. Write down the trigger, the steps, and the output. (5 min)

  3. Calculate: (weekly time saved × 52 weeks × your hourly cost) - (annual infrastructure at £0.08/session-hour + annual token costs + 10% contingency). If the result turns positive within six months, this workflow is ready for a Managed Agents pilot. If not, it is not your first candidate. (5 min)

Success metric: A clear yes or no decision on whether to proceed, based on your own cost data - not a vendor's case study.
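Step 3 can be wrapped into a reusable calculator. A sketch under my own assumptions - annualised costs and an illustrative exchange rate; plug in your own numbers:

```python
def workflow_roi_gbp(minutes_per_run, runs_per_week, hourly_cost_gbp,
                     token_cost_gbp_per_month, active_hours_per_week,
                     usd_to_gbp=0.79):
    """Annual saving minus annual agent cost for one candidate workflow."""
    annual_saving = (minutes_per_run / 60) * runs_per_week * 52 * hourly_cost_gbp
    infra = active_hours_per_week * 52 * 0.08 * usd_to_gbp  # $0.08/session-hour
    annual_cost = (infra + token_cost_gbp_per_month * 12) * 1.10  # +10% buffer
    return annual_saving - annual_cost

# A workflow run 10x/week at 30 min each, £150/hour, £100/month in tokens,
# 5 active session-hours per week: comfortably positive.
roi = workflow_roi_gbp(30, 10, 150, 100, 5)
```

Note where the weight sits: the saving side is driven entirely by frequency and duration, which is why the "more than ten times per week, more than 30 minutes" filter in step 2 matters more than any cost parameter.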

"Two years ago, law firms had to promise clients they weren't using generative AI on their cases. Now they are saying you must use it."

Susan Wortzman, Partner, McCarthy Tétrault — via Reuters

Spotlight tool | Claude for Word

Purpose:

AI-native document editing inside Microsoft Word for professional services teams working with contracts, reports, proposals, and memos.

Edge:

The tracked changes integration is the differentiator - AI edits surface as reviewable revisions, not opaque replacements. Every suggestion has a human decision point before it is accepted. This is the governance model for AI-assisted document work: it adds capability without removing accountability.

  • Full document awareness - Claude reads multi-section documents and maintains context across the entire file

  • Comment thread editing - reads existing comments and makes changes to the associated text, explaining edits within the thread

  • Tracked changes - all AI edits appear as native Word revisions, reviewable and rejectable by a human

  • Cross-app context - connects to Claude for Excel and Claude for PowerPoint in a single conversation thread

  • Clause search - finds and edits specific clauses by description, preserving formatting, numbering, and styles throughout

What did you think of today's issue?


Did you find it useful? Or have questions? Please drop me a note - I respond to all emails. Simply reply to the newsletter or email [email protected].

This issue’s sponsor

n8n

An open‑source automation platform that lets you chain tools like DeepSeek, OpenAI, Gemini and your existing SaaS into real business workflows without paying per step. Ideal as the backbone for your first serious AI automations.

Refer and win

Share this newsletter for a chance to win!
