🌟 Vasilij’s note

Live Artifacts shipped last week and I built two of them in front of a camera. Not a demo. Not a walkthrough of someone else's build. My actual business pulse dashboard and my morning attention queue – both live, both running, both built whilst recording. The reaction has been interesting: most people clock the tools. The ones who get it clock the design pattern underneath. Data plus prompt template plus paste-to-Claude. That pattern is the unlock. It works on a single tile, on a single row, on any artefact you connect to a data source. Meanwhile, OpenAI shipped GPT-5.5, Google quietly confirmed a $40 billion commitment to Anthropic, Anthropic shipped persistent memory for Claude Agents, and OpenAI's IPO sprint revealed a revenue gap that makes the infrastructure bets look increasingly asymmetric. The week felt like a hinge. The pieces are in place. The question – as always – is whether firms have the discipline to use them.

In today's edition

This week in agents | What changed

Live Artifacts launch in Claude Cowork

Connected dashboards and trackers that auto-refresh on reopen, saved in a dedicated tab with version history; launch connectors include HubSpot, Gmail, Notion, Slack, and Stripe → the gap between "Claude wrote me a report" and "I run my business from this" has closed for operators already on Cowork.

OpenAI misses revenue and user targets ahead of IPO

Despite shipping GPT-5.5 and a run of aggressive product launches, OpenAI fell short of internal targets for both revenue and active users in its pre-IPO sprint → the gap between stated capability and commercial traction is widening, and pricing pressure on API costs is a likely downstream consequence for firms building on OpenAI infrastructure.

China blocks Meta's $2 billion acquisition of Manus

Regulators ordered the deal unwound, halting Meta's push into agentic AI via the cross-border acquisition of the autonomous agent startup → a signal that agentic AI is now a strategic asset class in the US-China technology competition, not just a product category.

Top moves | Signal → impact

  • GPT-5.5 ships – the cadence is now the story, not the model

    Released 23 April, GPT-5.5 is OpenAI's fastest frontier model to date: per-token latency matches its predecessor, but it uses roughly 40% fewer output tokens to complete the same agentic coding tasks, so tasks finish sooner. It leads on Terminal-Bench 2.0 (82.7% vs Claude Opus 4.7's 69.4%) and computer use, but trails Opus 4.7 on SWE-Bench Pro (58.6% vs 64.3%) and hallucination rate (86% vs 36%). API pricing is double GPT-5.4's, at $5/$30 per million input/output tokens. GPT-5.4 shipped in March; GPT-5.5 shipped six weeks later. → The release cadence is the operating signal, not the benchmark card. OpenAI is moving fast enough that any workflow you build on a specific model version needs an upgrade path designed in from the start. The hallucination gap versus Opus 4.7 also matters for client-facing deployments: faster and cheaper is the wrong trade if the outputs require more human review to be usable. Evaluate on your actual workflows before switching.

  • Google's $40 billion Anthropic commitment cements the two-tier market

    On 27 April, Google confirmed an investment of up to $40 billion in Anthropic as Anthropic's valuation crossed $1 trillion – driven by scarce shares and accelerating Claude Code and enterprise API demand. The previous round, led by Amazon, totalled $4 billion. → For consultancies choosing a model provider to build on: Anthropic's infrastructure backing is now comparable to OpenAI's. The differentiation question is no longer financial stability – it is which model performs better on your specific workflows. That question is now worth answering with a proper evaluation, not an assumption.

  • The open-weight frontier is closing on closed models – and the moat narrative is cracking

    Reported 28 April, analysis across multiple AI benchmarks shows the gap between open-weight models and frontier closed models narrowing significantly, with Tencent and Alibaba backing DeepSeek at a $20B+ valuation. The thesis that frontier models would remain defensible monopoly products is under direct pressure. → For consultancies advising clients on AI strategy: the "which model provider should we commit to" question is becoming less important than "which workflows should we automate first." Infrastructure lock-in risk is falling. Workflow design skill is becoming the sustainable differentiator.

  • Persistent memory arrives for Claude Agents – and immediately surfaces a governance question

    Anthropic's new Memory feature for Managed Agents stores context across sessions as files, exportable and permission-scoped via API. The feature is in public beta as of 27 April. → Done correctly, persistent memory transforms client-facing agents from one-session tools into genuine relationship infrastructure. Done carelessly, it creates the same data governance exposure that shadow AI created in 2024. Before enabling Memory in any client-facing deployment, answer three questions: what gets retained, for how long, and who has API access to the files. These are governance questions, not technical ones.

Upskilling spotlight | Learn this week

Live Artifacts – Official Documentation (Anthropic)

Covers connector setup, auto-refresh behaviour, version history, and the separate usage quota structure that most coverage has missed. Read this before advising clients on internal tooling or before building your first artefact. The quota distinction matters: Live Artifacts draw from a separate allowance, not your standard Claude chat limits.

Monitoring LLM Behaviour: Drift, Retries, and Refusal Patterns

A practical guide to separating deterministic assertions (syntax and routing integrity) from model-based evaluations (semantic quality) in production agent deployments. Covers offline pipelines for pre-deployment regression testing and online pipelines for monitoring real-world drift. Directly relevant for any firm running agents in client-facing workflows.
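The split the guide describes – cheap deterministic assertions on every response, a model-based judge for semantic quality – can be sketched in a few lines. This is an illustrative Python sketch, not the guide's own code: the function names, the `allowed_routes` check, and the `judge` stub are all hypothetical, and in production the judge would wrap a real LLM call.

```python
import json

def deterministic_checks(output: str, allowed_routes: set) -> list:
    """Hard assertions: exact, cheap, run on every agent response."""
    try:
        payload = json.loads(output)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    failures = []
    if payload.get("route") not in allowed_routes:
        failures.append("unknown route: %r" % payload.get("route"))
    if not payload.get("answer", "").strip():
        failures.append("empty answer field")
    return failures

def semantic_check(output: str, judge) -> float:
    """Model-based evaluation: `judge` scores semantic quality 0-1.
    Here it is any callable; in production it would call a judge LLM."""
    return judge(output)

# Offline regression: run both layers over a fixed golden set
# before deployment; the online pipeline runs the same checks on
# sampled live traffic to catch drift.
golden = ['{"route": "billing", "answer": "Your invoice is attached."}']
hard_failures = [f for case in golden
                 for f in deterministic_checks(case, {"billing", "support"})]
scores = [semantic_check(case, judge=lambda s: 1.0) for case in golden]
```

The point of the separation: deterministic failures are bugs to fix immediately, while a drifting semantic score is a trend to monitor.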

Maker note | What I built this week

This week I built two Live Artifacts live on camera – a Business Pulse dashboard reading six tiles of real business data from BigQuery via MCP, and an Attention Dashboard pulling overdue tasks, open deals, upcoming meetings, and flagged emails across HubSpot, Notion, Gmail, and Outlook into a single ranked action queue.

Decision: Start with the Attention Dashboard. It needs no warehouse, no MCP server setup beyond standard connectors, and it delivers the highest frequency return. Most consultancies have everything they need already connected.

Operator’s picks | Tools to try

Langfuse

Use for monitoring live agent deployments: traces every LLM call, flags cost spikes, latency drift, and refusal pattern changes across production workflows.

Standout: open source and self-hostable, with a clean dashboard that non-engineers can read. The most practical observability tool for consultancies that run client-facing agents and need to demonstrate reliability, not just capability.

Caveat: requires instrumentation of your existing agent code – not plug-and-play. Allow 2-3 hours for initial setup on a simple workflow. Pair with: Claude Managed Agents or OpenAI Agents SDK for end-to-end visibility from prompt to output.
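"Instrumentation" here means wrapping each LLM call so latency, output size, and errors are recorded per call. Langfuse's own SDK provides decorators for this; the framework-free sketch below only shows the shape of the work – `trace`, `TRACES`, and `summarise_deal` are illustrative names, not Langfuse APIs.

```python
import functools
import time

TRACES = []  # stand-in for traces shipped to an observability backend

def trace(name):
    """Record latency, a rough output-token count, and any error
    for every call to the wrapped function."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                result = fn(*args, **kwargs)
                TRACES.append({"name": name,
                               "latency_s": time.perf_counter() - start,
                               "output_tokens": len(str(result).split()),
                               "error": None})
                return result
            except Exception as exc:
                TRACES.append({"name": name,
                               "latency_s": time.perf_counter() - start,
                               "output_tokens": 0,
                               "error": repr(exc)})
                raise
        return inner
    return wrap

@trace("summarise_deal")
def summarise_deal(deal):  # placeholder for a real LLM call
    return "%s: %s, closing soon" % (deal["name"], deal["stage"])

summarise_deal({"name": "Acme renewal", "stage": "negotiation"})
```

Once every call emits a record like this, cost spikes and refusal-pattern changes become queries over the trace log rather than anecdotes.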

OpenAI Agents SDK – Updated Sandboxing

Use for building isolated agent execution environments where client data and workflow logic must not cross-contaminate between runs.

Standout: native sandboxing via Cloudflare, Vercel, E2B, and Modal; long-horizon harness for multi-step tasks; model-agnostic across 100+ LLMs.

Caveat: the Assistants API is confirmed for deprecation mid-2026 – any firm still building on it needs a migration plan now.

Deep dive | Live Artifacts – The Internal Tooling Decision for Consultancies

A typical 30-person consultancy starts each morning with the same ritual: one tab for HubSpot, one for Notion, one for Gmail, a Power BI dashboard that takes 90 seconds to load, and a quarterly slide deck someone assembled last month. Each tool tells a partial story. None of them tells the right one. Live Artifacts, which shipped in Claude Cowork on 20 April, change the economics of this problem.

On paper
  • Live Artifacts are persistent, data-connected outputs that live in their own tab in Claude Cowork and auto-refresh when opened. Unlike standard artefacts – which are one-shot outputs you copy and paste somewhere else – Live Artifacts maintain connections to data sources and retain version history. Launch connectors cover HubSpot, Gmail, Notion, Slack, and Stripe. Anything else (Outlook, BigQuery, custom databases) connects via MCP.

  • The replacement cost stack is material: Looker Standard runs £40-100 per seat per month; Tableau roughly £60; Retool Teams $50. A 30-person consultancy running a BI seat and a Retool licence for internal ops tooling is spending £15,000-30,000 annually before counting the 5-10 hours per week operators spend on tab-switching and manual data assembly.

  • Anthropic has not published pricing for Live Artifacts specifically – the feature sits within the Cowork subscription with a separate usage quota from standard chat limits.
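The £15,000-30,000 band depends heavily on the seat mix. A quick sanity check under one set of assumptions – all 30 staff hold a BI seat, roughly 10 ops staff hold Retool, and Retool's $50 converts to about £40; the subset size and conversion are my assumptions, not vendor figures:

```python
seats, months = 30, 12
bi_band = (40, 100)                  # £/seat/month, Looker Standard range
bi_annual = tuple(seats * months * p for p in bi_band)
# BI seats alone span £14,400-36,000 per year

retool_seats, retool_gbp = 10, 40    # assumed ops subset, ~£40 after conversion
retool_annual = retool_seats * months * retool_gbp

low, high = bi_annual[0], bi_annual[1] + retool_annual
```

The published band sits inside this range once you assume not every seat is at the top of the BI price band – the conclusion holds either way: the replacement cost is five figures before counting anyone's time.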

In practice
  • The Business Pulse dashboard took approximately 30 minutes end-to-end, including two iterations: one to adjust RAG band thresholds for cash (negative mid-month is normal, not a warning), and one to fix a pipeline health display issue caused by the underlying mart overwriting rather than appending records. Six tiles, click-through drill-downs, 12-month trend charts, auto-generated insight sentences, and a top-5 contributors table per tile – all from a single prompt with a structured brief.

  • The Attention Dashboard took 15 minutes. The design pattern transferred directly from the first build: once Claude understands how you want artefacts structured, it stops guessing on subsequent builds. Six sections – overdue work, due today, upcoming meetings, deals closing this week, tasks due this week, flagged emails – rendered from HubSpot, Notion, Gmail, and Outlook. One correction required: the first version pulled all unread email rather than flagged and labelled only.

  • The design pattern that ties both artefacts together is the context-loaded prompt button. Every tile in the Business Pulse carries a "?" icon that opens a modal with the drill-down data and a pre-written Claude prompt ready to paste. Every row in the Attention Dashboard carries a "Discuss" button that generates per-item context – deal name, value, stage, last activity, or task title and due date – plus a standard ask. The cost of asking Claude drops to one click. That is the unlock, not the dashboard itself.
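What a "Discuss" button actually assembles can be expressed in a few lines. A minimal sketch – the function name, field names, and the wording of the ask are illustrative, not Anthropic's implementation:

```python
def discuss_prompt(item: dict) -> str:
    """Build the text a 'Discuss' button would hand to Claude:
    per-item context first, then a standard, decision-shaped ask."""
    context = "\n".join("- %s: %s" % (k, v) for k, v in item.items())
    ask = ("Given this deal's stage and last activity, what is the single "
           "highest-leverage next action, and what should I say verbatim?")
    return "Context:\n%s\n\n%s" % (context, ask)

prompt = discuss_prompt({
    "deal": "Acme renewal",
    "value": "£48,000",
    "stage": "Negotiation",
    "last_activity": "call, 9 days ago",
})
```

The standard ask is where the pattern lives or dies: a vague "what do you think?" produces generic output, while a specific, decision-shaped ask is what justifies the click.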

Issues/backlash
  • Live Artifacts is a research preview, not finished software. Intermittent refresh failures have been reported in the first week of production use. Collaboration is basic: there is no multiplayer mode for simultaneous editors. Per-user authentication means team-wide deployment requires individual connector setup, which does not yet scale cleanly to larger teams.

  • The Outlook connector is absent from the launch list. Microsoft 365 MCP is required, which adds setup time. BigQuery access requires an MCP server. Firms without clean data sources will produce artefacts that refresh inaccurate data reliably – which is worse than no artefact at all.

  • There is also a pattern risk. The "?" and "Discuss" buttons only deliver value if the prompt templates are well-written. A poorly structured template produces generic Claude output that does not justify the click. Template quality is a craft skill, not an infrastructure one.

My take (what to do)
  • Startup (15-40 staff): Build the Attention Dashboard first. You almost certainly already have HubSpot, Notion, Gmail, and Outlook. The prerequisites are already connected. Spend 20 minutes writing the prompt brief using the structure from the video, build the artefact, and run it for two weeks as your morning routine. If it saves 30 minutes a day, that is 10 hours a month recovered at partner rate. Only then move to the Business Pulse – which requires either a structured data source or MCP setup.

  • SMB (50-120 staff): Assign one ops team member to own the Live Artifacts programme for four weeks. Their job: build the Attention Dashboard for themselves first, document what works and what breaks, then replicate it for two other users before any wider rollout. Track revision rounds and time-to-first-action each morning before and after. Do not roll out to the full team until per-user connector authentication is resolved – partial adoption with inconsistent data is worse than a clean delayed rollout.

  • Enterprise (150-250 staff): The governance question comes before the build question. Live Artifacts store and display data via connectors that authenticate per user. Before enabling across a team, map which data sources will be connected, what classification those data sources carry, and whether your existing AI usage policy covers this. The artefact itself is low risk. The connectors carry your CRM, your inbox, and your project data. Treat the connector layer like any other third-party integration – security review first, then pilot, then rollout.

How to try (15-minute path)
  1. Open Claude Cowork and navigate to the Live Artifacts tab. If you don't see it, confirm your subscription tier includes Cowork – this feature is not in base Claude. (2 min)

  2. Connect HubSpot, Notion, and Gmail in the connector settings. These three cover the majority of the Attention Dashboard's data sources. (5 min)

  3. Paste this brief into Claude: "Build me an attention dashboard. Three sections: overdue tasks from HubSpot and Notion, open deals closing in the next 7 days from HubSpot (with stage, value, and probability), and flagged emails from Gmail. Every item has two buttons: Open (to the source record) and Discuss (generating a context-rich Claude prompt for that item). Refresh when I reopen this artefact." Run it. Note how long it takes and how many iterations you need. (8 min)

Success metric: a working artefact that surfaces at least one action you would otherwise have found 20 minutes later by checking tabs manually. If it does that on day one, the pattern is working.

"Our users tell us Claude is increasingly essential to how they work. We need to build the infrastructure to keep pace with rapidly growing demand."

Dario Amodei, CEO, Anthropic - Statement issued alongside the Google and Amazon investment announcements, 24 April 2026

Live Artifacts | At a glance

Connected, persistent dashboards and action queues that auto-refresh on reopen and stay live as internal business tools – not throwaway one-shot outputs.

  • → Connector-native: HubSpot, Gmail, Notion, Slack, Stripe at launch; Outlook and BigQuery via MCP

  • → Context-loaded prompt buttons on every tile and row – one click from data view to Claude analysis

  • → Version history and dedicated artefact tab – not buried in chat history

  • → Replaces: BI seats, Retool licences, and 5-10 hours per week of manual tab-switching for operators

  • → Included in existing Cowork subscription – separate usage quota from standard chat


Did you find it useful? Or have questions? Please drop me a note – I respond to all emails. Simply reply to the newsletter or email [email protected].

This issue’s sponsor

n8n

An open‑source automation platform that lets you chain tools like DeepSeek, OpenAI, Gemini and your existing SaaS into real business workflows without paying per step. Ideal as the backbone for your first serious AI automations.

Refer and win

Share this newsletter for a chance to win!
