Ghost Writer in the Machine (135)
Welcome to another edition of Artificial Insights, your autonomous intelligence digest. This week, the gap between what AI can do and what it can’t got wider in both directions simultaneously.
Picture a fogged bathroom mirror. You trace letters through the condensation — a message that’s legible for a moment before the steam closes back over. That’s intelligence right now: we keep clearing a window, and the surface keeps fogging up.
François Chollet released ARC-AGI-3 this week — 135 novel game environments with no instructions, no rules, no stated goals. The AI gets dropped in and has to figure out what winning even looks like. Untrained humans solved every single one. Gemini scored 0.37%. GPT 5.4 scored 0.26%. Opus scored 0.25%. Grok scored zero. Chollet sat with Sam Altman at Y Combinator and laid it out: his lab NDIA is building an entirely new branch of machine learning — not code generation, not agents, but a symbolic alternative to deep learning itself, aimed at the kind of reasoning these models fundamentally cannot do. Meanwhile, in the exact same week, GPT 5.4 Pro solved a frontier math open problem and a $500 GPU outperformed Claude Sonnet on coding benchmarks. The models are getting genuinely better at structured tasks. They are getting no better at the thing a child does when dropped into an unfamiliar room: look around, explore, adapt.
The contradiction is the actual state of the field. Eric Schmidt told Diamandis that the singularity timeline depends on solving the 92-gigawatt compute bottleneck. A viral thread from @deredleritt3r reported that frontier lab leadership genuinely believes full AI research automation is two years away — “a million scientists in a data center.” And then there’s the rumor mill: Opus 5, reportedly a 10-trillion-parameter model with a $10 billion price tag, is said to be “so good that it poses a danger.” The Anthropic team, according to one account, doesn’t write code anymore. Someone at Anthropic showed Claude finding zero-day vulnerabilities in live systems — Nicholas Carlini’s [un]prompted talk on black-hat LLMs suddenly feels less speculative and more like a field report.
On the ground, the agent world is splitting into two rooms that don’t hear each other. In one: Claire Vo’s breathless 107-minute conversation on Lenny’s podcast — she now runs nine OpenClaw agents across multiple Mac Minis, and it deleted her family calendar only the first time. Paperclip demoed hiring agents like employees. Garry Tan declared the unit of software production changed from team-years to founder-days. The SHL take: “OpenClaw + Opus 4.6 has made it 10x more fun to be a CEO who codes.” In the other room: Hacker News’s most upvoted post was “AI overly affirms users” at 775 points. Right behind: “Is anybody else bored of talking about AI?” at 745. And 90% of Claude-linked output goes to GitHub repos with fewer than 2 stars.
Then there’s the part that doesn’t fit on a conference stage. Fireship’s 5-minute piece — “Tech bros optimized war… and it’s working” — lands alongside Palantir’s AIPCon 9 demos of multi-domain AI command and control. Karp says only two types survive the AI era. Sanders and AOC introduced a data center moratorium bill. Police used AI facial recognition to wrongly arrest a Tennessee woman. An Amsterdam court banned Grok’s undressing function. ChinaTalk published fiction about how Claude opened the Strait of Hormuz — a scenario that reads uncomfortably like a dress rehearsal. And the GitLab founder’s story — diagnosed with rare osteosarcoma, using AI to research treatment options, navigating a system that wasn’t built for patients who can read papers faster than their doctors — is the human face of all of this: the technology is real, the access is uneven, and the stakes are not abstract.
Blaise Agüera y Arcas gave a Long Now talk this week asking the simplest possible question: what is intelligence? Not what do models score, or what can agents automate, but what the word actually points to. Justin Skycak put it in one line: “You cannot delegate what you cannot audit.” And @jackfriks observed that “for the first time in human history you can do 1 week of work in 1 hour with Claude and 99% of people use this chance to work more instead of less.”
Back to the mirror. The steam always closes over. But the finger that traced the letters — the impulse to make something legible in a fogging world — that’s the thing no model scored above zero on this week. It might be the only thing that matters.
MZ
Why Scaling Alone Isn’t Enough for AGI (57 min)
The ARC-AGI creator on symbolic descent, NDIA, and why the next breakthrough won’t come from making GPT bigger.
Singularity’s Arrival (44 min)
Schmidt on why the singularity timeline is an energy problem, not an algorithm problem.
Black-hat LLMs (26 min)
Anthropic’s security researcher on how current models can now find zero-day vulnerabilities in battle-tested software.
Tech Bros Optimized War… and It’s Working (5 min)
Fireship’s sharpest video in months. Drone swarms, autonomous targeting, and what Palantir’s AIPCon actually showed.
Blaise Agüera y Arcas: What is Intelligence?
The question underneath all the benchmarks, asked without pretending the answer is simple.
Twitter Signals
Every frontier model scored below 1% on ARC-AGI-3. Untrained humans solved every challenge. @heygurisingh
“The unit of software production has changed from team-years to founder-days.” @garrytan
“You cannot delegate what you cannot audit.” @justinskycak
The Anthropic team reportedly doesn’t write code anymore. @om_patel5
Opus 5: reportedly 10T parameters, $10B to run, “so good it poses a danger.” @kimmonismus
“Building AI product is closer to tuning a musical instrument than writing software.” @signulll
“Prompting is actually harder than writing code because you need to understand what you actually want to do.” @KingBootoshi
“For the first time in history you can do 1 week of work in 1 hour with Claude and 99% of people use this to work more, not less.” @jackfriks
“AI won’t make most human skills obsolete, but it will change how they’re used.” @McKinsey_MGI
“While social media is polarising, evidence suggests AI may nudge people towards moderation.” @StefanFSchubert
Quick Links
ARC-AGI-3 — 135 novel environments, $2M in prizes, all solutions open-sourced;
How Claude Opened the Strait of Hormuz — Speculative fiction on AI in geopolitical crisis;
GitLab Founder’s Cancer Journey with AI — Using AI to navigate rare cancer treatment when you can read papers faster than your doctors;
Sanders/AOC Data Center Moratorium — Congressional push to pause construction pending environmental review;
What Anthropic Understood That OpenAI Didn’t — Constitutional AI vs scale-first;
Ensu — Ente’s Local LLM App — Private, local-first AI from the Ente Photos team;
China AI Anxiety — How AI displacement plays out differently in China;
Sycophantic AI Risks — Why AI over-affirmation is systemic, not a feature;
Amsterdam Court Bans Grok Undressing — Dutch court vs X’s AI feature.
If Artificial Insights makes sense to you, please help us out by:
📧 Subscribing to the weekly newsletter on Substack.
💬 Joining our WhatsApp group.
📥 Following the weekly newsletter on LinkedIn.
🦄 Sharing the newsletter on your socials.
Artificial Insights is written by Michell Zappa, CEO and founder of Envisioning, a technology research institute.





