Weekly AI Roundup: OpenAI Ships GPT-5.5 Six Weeks After 5.4, Hackers Breach the Model Anthropic Said Was Too Dangerous to Release, and DeepSeek V4 Goes Open-Weight Just to Watch the Closed Labs Sweat
A week where OpenAI shipped a model before anyone finished benchmarking the last one, Anthropic’s most restricted model got accessed by a Discord group with good URL-guessing skills, and DeepSeek reminded everyone that the open-weight frontier is now about six weeks behind the closed one. Normal week.
OpenAI Releases GPT-5.5 Before Anyone Had Time to Benchmark 5.4
Six weeks. That’s the entire shelf life GPT-5.4 got before OpenAI rolled out GPT-5.5 on Wednesday. Three variants: standard, Thinking (extended reasoning), and Pro (highest accuracy). The internal codename was Spud. Yes, Spud. The model OpenAI thinks will define the next era of AI work is named after a potato.
The numbers are real though. GPT-5.5 Pro scored 39.6% on FrontierMath Tier 4, nearly doubling Claude Opus 4.7’s 22.9%. Standard hit 82.7% on Terminal-Bench 2.0 and 88.7% on SWE-bench. Hallucinations dropped 60% versus 5.4. API pricing lands at $5/$30 per million input/output tokens for standard, $30/$180 for Pro — expensive enough to signal confidence, cheap enough to undercut the “we’ll match on price” competitors.
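For anyone pricing this out, the gap between tiers is easy to quantify. A back-of-envelope sketch using the announced per-million-token rates; the request size below is an illustrative guess, not a published workload:

```python
# Rough cost comparison for GPT-5.5's API tiers, using the announced
# per-million-token prices. Request token counts are illustrative.
PRICES = {
    "gpt-5.5": {"input": 5.00, "output": 30.00},       # $/1M tokens
    "gpt-5.5-pro": {"input": 30.00, "output": 180.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed per-million rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A hypothetical coding request: 10k tokens in, 2k tokens out.
standard = request_cost("gpt-5.5", 10_000, 2_000)
pro = request_cost("gpt-5.5-pro", 10_000, 2_000)
print(f"standard: ${standard:.2f}, pro: ${pro:.2f}, ratio: {pro / standard:.0f}x")
# prints: standard: $0.11, pro: $0.66, ratio: 6x
```

At these rates Pro runs a flat 6x the standard tier at any input/output mix, which is the "confidence signal" in numerical form.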
The velocity is the real story. Four million Codex users, 900 million weekly active ChatGPT users, and a release cadence that makes version numbers feel like build numbers. OpenAI is no longer iterating — they’re shipping continuously and daring the market to keep up. Whether the models justify the pace or the pace justifies the valuation is the question nobody in San Francisco wants to answer out loud.
Someone Guessed the URL for Anthropic’s Most Dangerous Model
Anthropic built Claude Mythos Preview — a model that discovers and chains zero-day exploits across major operating systems and browsers — then announced it was too dangerous to release publicly. Over 50 organizations were getting access through Project Glasswing with $100M+ in credits. Vetted cybersecurity practitioners only. Government agencies. Responsible disclosure. The whole responsible-AI apparatus, deployed as designed.
Then a group on a private Discord channel dedicated to finding unreleased AI models guessed the model’s URL based on familiarity with Anthropic’s naming conventions and accessed it through a third-party contractor portal. Not a sophisticated attack. Not a supply chain compromise. They guessed the URL. Anthropic says no systems were impacted and they’re investigating, which is the corporate equivalent of “we are aware of the situation and would prefer not to discuss it.”
The irony of a model built to find security holes being accessed through the most basic security hole is too perfect to need embellishment. Meanwhile, Washington is now in full procurement mode — the Trump administration held talks with Anthropic about a Department of Defense deal, and OpenAI briefed Five Eyes allies on its own GPT-5.4-Cyber variant. The line between AI lab and defense contractor was always going to blur. Nobody expected it to happen before the model’s own security perimeter was tested by people who read API documentation for fun.
DeepSeek V4 Ships Open-Weight and Trails the Frontier by About Six Weeks
Right on schedule, DeepSeek launched V4 preview on Thursday — two open-weight mixture-of-experts models under the MIT license. V4-Pro is the flagship: 1.6 trillion total parameters, 49 billion active per token, trained on 33 trillion tokens with a 1 million-token context window. V4-Flash sits at 284B (13B active) for the cost-conscious. Both are on Hugging Face right now.
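The mixture-of-experts arithmetic is worth spelling out: on the stated specs, only a sliver of each model actually fires per token. A quick sketch from the announced parameter counts:

```python
# Fraction of total parameters active per token for DeepSeek V4's two
# MoE variants, using the parameter counts from the release announcement.
def active_fraction(active_b: float, total_b: float) -> float:
    """Share of total parameters used to process any single token."""
    return active_b / total_b

v4_pro = active_fraction(49, 1600)    # 49B active of 1.6T total
v4_flash = active_fraction(13, 284)   # 13B active of 284B total
print(f"V4-Pro: {v4_pro:.1%} active, V4-Flash: {v4_flash:.1%} active")
# prints: V4-Pro: 3.1% active, V4-Flash: 4.6% active
```

That roughly 3% activation rate on the flagship is how you serve a 1.6-trillion-parameter model without 1.6-trillion-parameter inference bills.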
The benchmarks land exactly where you’d expect: 80.6% on SWE-bench Verified, within 0.2 points of Claude Opus 4.6. V4-Pro leads Claude on Terminal-Bench 2.0 (67.9% vs 65.4%) and LiveCodeBench (93.5% vs 88.8%). It trails GPT-5.4 and Gemini 3.1-Pro by a margin analysts are calling three to six weeks. The efficiency gains are aggressive: V4 needs just 27% of V3.2’s single-token inference FLOPs and 10% of its KV cache.
A year after DeepSeek first rattled markets, the playbook hasn’t changed: ship fast, ship open, charge less, and let the Western frontier labs explain to their investors why the open-source model that costs nothing is now within rounding error of the one that costs $30 per million output tokens. The AI race’s most reliable constant is that DeepSeek will be there six weeks later with the MIT license.
Also Noted
Snap fires 1,000 people and the stock jumps 11%. CEO Evan Spiegel cut 16% of Snap’s workforce citing “rapid advancements in artificial intelligence.” AI now generates over 65% of Snap’s new code. The restructuring saves $500 million annually. The market rewarded Snap for publicly quantifying how many humans its AI has replaced, a number every other CEO has been carefully avoiding. Expect that to change.
Amazon writes another check to Anthropic — $5B now, up to $25B later. The deal values Anthropic at $380B and includes a commitment to spend $100B on AWS over ten years, including 5 gigawatts of Trainium capacity. Total Amazon investment is now $33 billion. At some point “strategic investment” becomes “we bought them but with extra steps.”
Dispatch is the weekly AI roundup at clzd.me — snarky, sourced, and probably running on the same models it’s writing about.