🌟 Vasilij’s Note
I was in Glasgow last week delivering a live presentation, and the AI news kept coming regardless. This week wasn’t “just another model release” – the whole stack moved at once: open reasoning, cloud chips, video, sovereign AI, and even decentralised compute. Here’s the operator’s cut.
In Today's Edition:
This Week in Agents | What Changed
DeepSeek ships V3.2 + Speciale as open, reasoning-first models → GPT-5-class maths and coding performance you can self-host. DeepSeek
Google’s Gemini 3 + Deep Think lands → Multimodal agents get a “slow think” switch for hard problems, not a separate model line. Gemini 3
OpenAI declares “Code Red” on ChatGPT → Ads and side agents paused while they sprint on speed, reliability and personalisation. Independent
AWS rolls out Trainium3 servers + Nova updates → Cloud gets cheaper, denser AI infra with 4× performance and ~40% less power per box. Reuters
Runway launches Gen-4.5 → Text-to-video jumps again, topping Artificial Analysis with 1,247 Elo. Runway
Ukraine, Telegram and Black Forest Labs go “sovereign & open” → Ukraine is developing a sovereign LLM on Google’s Gemma framework for civil and government applications — signalling a shift toward national-scale open-weight deployments. StratNews Global
A practical deep dive into DeepSeek V3.2 - what DeepSeek built, how V3.2 and Speciale differ, and how to decide whether they belong in your agent stack.
Top Moves - Signal → Impact
Launch/Policy — DeepSeek V3.2 & Speciale go open
DeepSeek released V3.2 and V3.2-Speciale as Mixture-of-Experts models aimed squarely at agents, with Speciale scoring gold-medal results on the 2025 IMO, IOI and ICPC — all under an MIT-style licence. VentureBeat
→ Why it matters: This is the first time frontier-tier reasoning has arrived as open weights you can fine-tune, self-host and govern internally instead of renting via API.
Ecosystem shift — Gemini 3, AWS Trainium3 and Nvidia–Synopsys
Gemini 3 Pro with Deep Think sets new highs on Humanity’s Last Exam, GPQA Diamond and ARC-AGI-2, while AWS pushes Trainium3 (4× performance, ~40% less power). Nvidia invests $2B into Synopsys to bake GPU-accelerated AI into chip design. blog.google
→ Operating guidance: Model choices and infra choices are converging — expect Gemini, Nova, DeepSeek etc. to be optimised for specific chips. Build an abstraction layer so you can swap both models and accelerators without rewriting workflows.
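One way to act on this guidance: a thin routing layer over provider SDKs so that swapping models (or the chips behind them) is a config change, not a rewrite. A minimal sketch – backend names are illustrative and the lambdas are stubs standing in for real SDK calls:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class ModelBackend:
    """One provider/model pair behind a uniform prompt -> completion call."""
    name: str
    call: Callable[[str], str]

class ModelRouter:
    """Thin abstraction so swapping models or providers is a config change."""
    def __init__(self) -> None:
        self._backends: Dict[str, ModelBackend] = {}
        self._default: Optional[str] = None

    def register(self, backend: ModelBackend, default: bool = False) -> None:
        self._backends[backend.name] = backend
        if default or self._default is None:
            self._default = backend.name

    def complete(self, prompt: str, backend: Optional[str] = None) -> str:
        # Callers never import a vendor SDK directly; they go through here.
        return self._backends[backend or self._default].call(prompt)

# Stub backends stand in for real SDK calls; names are illustrative.
router = ModelRouter()
router.register(ModelBackend("gemini-3", lambda p: f"[gemini] {p}"), default=True)
router.register(ModelBackend("deepseek-v3.2", lambda p: f"[deepseek] {p}"))
print(router.complete("hello"))  # served by the default backend
```

Because workflows only ever see `ModelRouter`, repointing “hard reasoning” traffic from one vendor to another is a single `register` call rather than a rewrite.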
Security/Compliance — Sovereign AI and confidential compute
Ukraine is building a national LLM on Gemma for civil and military use; Telegram activates Cocoon — a decentralised confidential compute network on TON paying GPU owners to run encrypted AI workloads. StratNews Global
→ Risk/opportunity: Regulators will increasingly ask “why send this data to US SaaS?” Open weights + sovereign builds + confidential compute create new placement options — and new governance responsibilities.
Upskilling Spotlight | Learn This Week
Google: A new era of intelligence with Gemini 3 — Outcome: understand Deep Think, benchmark gains, and how to wire Gemini 3 into agents via AI Studio or Vertex.
Read the guide
Runway Gen-4.5 announcement — Outcome: get a realistic view of text-to-video progress (motion consistency, temporal control, keyframes) to decide if video belongs in your workflows. Runway
Maker Note | What I built this week
This week I re-cut my stack: DeepSeek V3.2-Speciale for the hardest text-only reasoning chains, Gemini 3 Pro for multimodal and tool-heavy flows, and a “GPU-ready” deployment plan in case we need to bring open weights in-house for regulated work.
Operator’s Picks | Tools To Try
LangSmith Agent Builder
Use for: Designing production-grade agents via a visual, agent-by-agent workflow.
Standout: For the first time, you can build agents as modular components with roles, evaluation sets, routing logic and telemetry — treating agent development like real software engineering instead of prompt alchemy.
What’s new:
Visual agent graph
Component-level evaluations
Step-by-step traces & introspection
Configurable policies (guardrails, retries, fallbacks)
Instant deployment to endpoints or LangGraph
Why operators should care: most agent failures aren’t model failures — they’re architecture failures. Agent Builder enforces explicit design, testability and observability for multi-step systems.
Deep Dive | DeepSeek V3.2
Why this matters now. DeepSeek has released two new open-weight models – V3.2 and V3.2-Speciale – that match or beat the best models in the world on difficult reasoning tasks. They’re MIT-licensed, meaning you can run them yourself, customise them, and avoid vendor lock-in. For the first time, a genuinely frontier-class model is available outside the big US companies.
What DeepSeek Actually Built
A huge model that’s cheap to run
DeepSeek uses a Mixture-of-Experts design. Think of it as a model with many specialists inside, but only a few are activated for each token.
Result: the intelligence of a giant model at roughly the inference cost of a medium one.
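The idea can be shown in a toy forward pass: a router scores every expert, but only the top-k actually run, so compute scales with k rather than the total expert count. This is a generic MoE sketch with made-up sizes, not DeepSeek’s actual architecture:

```python
import math
import random

random.seed(0)
NUM_EXPERTS, TOP_K, DIM = 8, 2, 4  # toy sizes, nothing like the real model

# Each "expert" is just an elementwise scale here, to keep the sketch tiny.
experts = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
gate_w = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def moe_forward(x):
    scores = [dot(w, x) for w in gate_w]                  # router scores all experts
    top = sorted(range(NUM_EXPERTS), key=lambda i: -scores[i])[:TOP_K]
    exps = [math.exp(scores[i]) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]                   # softmax over top-k only
    out = [0.0] * DIM
    for w, i in zip(weights, top):                        # only k experts execute
        for d in range(DIM):
            out[d] += w * experts[i][d] * x[d]
    return out, top

out, active = moe_forward([1.0, 0.5, -0.5, 2.0])
print(f"ran {len(active)} of {NUM_EXPERTS} experts")
```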
Better handling of long documents
Their new Sparse Attention system makes it far more efficient to process long texts (up to 128k tokens). If your agents read PDFs, contracts or codebases, this matters.
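To see why sparsity helps, compare the number of attended token pairs under a generic sparse pattern (local window plus a few global “sink” tokens) against dense causal attention. This is an illustrative pattern with toy sizes only – DeepSeek’s DSA mechanism works differently:

```python
SEQ_LEN, WINDOW, GLOBAL = 16, 4, 2  # toy sizes for illustration

def sparse_mask(seq_len=SEQ_LEN, window=WINDOW, n_global=GLOBAL):
    """mask[q][k] is True iff query token q may attend to key token k."""
    mask = [[False] * seq_len for _ in range(seq_len)]
    for q in range(seq_len):
        for k in range(max(0, q - window + 1), q + 1):  # causal local window
            mask[q][k] = True
        for k in range(min(n_global, q + 1)):           # global sink tokens
            mask[q][k] = True
    return mask

mask = sparse_mask()
dense = SEQ_LEN * (SEQ_LEN + 1) // 2                # causal dense attention pairs
sparse = sum(row.count(True) for row in mask)
print(f"attended pairs: {sparse} sparse vs {dense} dense")
```

The dense count grows quadratically with sequence length while the sparse count grows linearly, which is the whole trick behind cheap 128k-token contexts.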
A training approach focused on reasoning
DeepSeek didn’t just train on internet text. They used:
Maths olympiad problems
Coding competitions
Logic puzzles
This is why DeepSeek does extremely well at maths, structured problems and multi-step reasoning.
V3.2 vs Speciale — Simple Explanation
V3.2
The general version. Good for agents, coding, RAG and everyday reasoning. Faster and more flexible.
V3.2-Speciale
The “expert” version. Excellent at proofs, maths and hard logic. Slower and not meant for casual conversation. Best used as a problem-solver or teacher model.
Issues / Backlash
Practitioners report that benchmark wins don’t always translate to better everyday UX – DeepSeek can feel more “formal” or slower, and slow-think modes such as Speciale or Deep Think are overkill on simple asks.
There is also geopolitical discomfort about leaning heavily on Chinese open-weight models for sensitive workloads and questions about whether sovereign stacks will just fragment standards further.
My Take (What to do)
Startup: Put a cheap default model in front and route “hard mode” traffic (multi-step reasoning, long context) to either DeepSeek V3.2-Speciale (if you can handle the infra) or Gemini 3 Deep Think (if you want managed). Cap slow-think spend with an explicit budget and logging.
SMB: Keep customer-facing flows on managed providers (OpenAI, Gemini, Nova). Use this moment to pilot one internal DeepSeek- or Gemma-based workload where sovereignty or cost actually matters (e.g. contracts, internal docs), so you’ve got an escape route if pricing or policy changes.
Enterprise: Treat sovereign AI as a real workstream, not a press release: evaluate Gemma/DeepSeek for on-prem, and write down clear criteria (data classes, regions, risk levels) for when workloads must be kept off US SaaS. Align infra teams now on how Trainium3, Nvidia-backed Synopsys tools and decentralised compute like Cocoon fit into your 3–5-year plan, even if you don’t act immediately.
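The “cap slow-think spend” idea above can be sketched as a router that sends hard prompts to an expensive reasoning model only while a daily budget lasts, logging every decision. Model names, prices and the hardness heuristic are placeholders, not real rate cards:

```python
import time

class BudgetedRouter:
    """Route hard prompts to a slow/expensive model until the budget runs out."""
    def __init__(self, daily_budget_usd: float, slow_cost_usd: float = 0.05,
                 fast_cost_usd: float = 0.002):
        self.remaining = daily_budget_usd
        self.slow_cost = slow_cost_usd
        self.fast_cost = fast_cost_usd
        self.log = []  # every routing decision is recorded

    def is_hard(self, prompt: str) -> bool:
        # Naive heuristic stand-in for a real difficulty classifier.
        return len(prompt) > 200 or "prove" in prompt.lower()

    def route(self, prompt: str) -> str:
        if self.is_hard(prompt) and self.remaining >= self.slow_cost:
            model, cost = "slow-think", self.slow_cost
        else:
            model, cost = "fast-default", self.fast_cost
        self.remaining -= cost
        self.log.append({"ts": time.time(), "model": model, "cost": cost})
        return model

router = BudgetedRouter(daily_budget_usd=0.10)
print(router.route("Prove that sqrt(2) is irrational."))  # hard -> slow-think
print(router.route("What's our refund policy?"))          # easy -> fast-default
```

Once `remaining` drops below the slow-think price, even hard prompts fall back to the cheap path, which is exactly the cap the startup take calls for.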
How to Try (15-minute path)
Grab 20–30 real prompts from your agents (support tickets, internal research, planning tasks) and run them against your current default model, DeepSeek V3.2-Speciale, and Gemini 3 (with and without Deep Think).
Log quality, latency and approximate unit cost per run (based on public pricing) and tag which tasks genuinely improved with “slow think” or open weights.
Update your router: send only those tagged cases to DeepSeek/Deep Think and leave everything else on the cheap path. Success metric: ≥15–20% improvement on task completion or accuracy for that subset, with neutral or lower overall cost.
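The steps above can be sketched as a tiny harness: run every prompt through each candidate, record latency and an approximate unit cost per run, then tag the prompts where the expensive model clearly won. The models here are stubs with hard-coded quality scores and made-up prices; in practice you would plug in real SDK calls, public pricing and your own grading:

```python
import time

def fake_model(name, delay_s, quality):
    """Stub model: sleeps to simulate latency, returns a fixed quality score."""
    def call(prompt):
        time.sleep(delay_s)
        return {"model": name, "quality": quality}
    return call

# (callable, unit cost in USD) per candidate; all values are placeholders.
MODELS = {
    "default":  (fake_model("default", 0.00, 0.60), 0.002),
    "speciale": (fake_model("speciale", 0.01, 0.85), 0.020),
}

def run_eval(prompts):
    rows = []
    for p in prompts:
        for name, (call, unit_cost) in MODELS.items():
            t0 = time.perf_counter()
            out = call(p)
            rows.append({"prompt": p, "model": name,
                         "quality": out["quality"],
                         "latency_s": time.perf_counter() - t0,
                         "cost_usd": unit_cost})
    return rows

def tag_wins(rows, margin=0.15):
    """Tag prompts where the expensive model beat the default by >= margin."""
    by_prompt = {}
    for r in rows:
        by_prompt.setdefault(r["prompt"], {})[r["model"]] = r["quality"]
    return [p for p, q in by_prompt.items()
            if q["speciale"] - q["default"] >= margin]

rows = run_eval(["summarise contract", "plan multi-step migration"])
print("route to slow path:", tag_wins(rows))
```

The tagged list is exactly what you feed back into your router: only those prompt types earn the slow, expensive path.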
How to Try (Quick Version)
Pick 5–10 hard problems from your real work.
Test them on DeepSeek V3.2, Speciale, GPT-5 and Gemini 3.
Compare correctness, clarity and cost.
If DeepSeek wins even a few, it’s worth integrating into your agent stack.
What you’ll actually notice in use
DeepSeek is stronger at deep thinking than GPT-5 on many technical tasks.
It’s less polished for casual chat.
Costs can be much lower if hosted properly.
It’s ideal for agents that need accuracy, structure and long-context analysis.
Adoption challenges to be aware of
Self-hosting requires real engineering capability.
You need your own safety and compliance controls.
Some organisations may hesitate due to the model’s origin (China).
This doesn’t reduce its technical quality, but it affects procurement choices.
Spotlight Tool | Telegram Cocoon
Telegram Cocoon - Purpose: Privacy-first AI compute. Edge: decentralised confidential inference with built-in incentives.
→ Decentralised GPU marketplace • End-to-end encrypted AI workloads • TON-based rewards for node operators IQ.wiki
Try it: Explore Cocoon as a future option if you need private inference at scale and don’t want to be tied entirely to AWS/Azure/GCP.
What did you think of today's email?
Sponsored - Partner
AiGentic AI Readiness Assessment — A fast, honest snapshot of how ready your business is for AI agents, plus a concrete action plan instead of vague hype. Try: insights.aigenticlab.com
Did you find it useful? Or have questions? Please drop me a note. I respond to all emails. Simply reply to the newsletter or write to [email protected]

