Monday, June 29
A Chinese Open-Weight Model Beat Claude on Cybersecurity Benchmarks. Then the Scaffolding Beat Everyone.

Semgrep just published benchmarks showing GLM-5.2, Zhipu AI's open-weight model, outscoring Claude Code on vulnerability detection (39% F1 vs. 32%, at seventeen cents per bug found). Researchers at Graphistry allege it's an illegal distillation of GPT-5.5 and Opus 4.8. Zhipu hasn't responded. But the wildest number in the whole report is getting the least attention: Semgrep's own scaffolded pipeline crushed both models at 53–61% F1. The wrapper beat the brain. Heading into a July 4th week already thick with export control debate, the timing is something.

The AI Arms Race Heats Up
Happy Monday. The Fourth of July weekend is right there, but the AI industry apparently runs on a different calendar.
- A Stanford dataset tracking historical memory prices is making the rounds. The cost of one gigabyte of storage has dropped from roughly $10 million in 1960 to fractions of a cent today. The curve never stops being wild.
- "Speculative decoding" is having a moment. The idea: a small, fast model drafts tokens and a larger model checks them. Think autocomplete for AI models themselves. Multiple teams are shipping implementations at once.
- The diplomatic vocabulary around frontier models now includes phrases like "hosting rights" and "access sovereignty." A year ago those terms didn't exist in this context.
Pour a big coffee. Monday came in hot.
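For the curious, here is roughly what the draft-and-check loop behind speculative decoding looks like. This is a minimal sketch of the greedy variant only, not any team's shipping implementation. The two "models" are toy functions invented for illustration, and names like `speculative_decode`, `toy_target`, and `toy_draft` are hypothetical. Real systems verify all drafted tokens in one batched pass of the large model and often use probabilistic acceptance; this sketch checks them one at a time to keep the logic readable.

```python
from typing import Callable, List

# Assumption: a "model" here is just a function from a token context to the
# next token (greedy choice). Real models return probability distributions.
Model = Callable[[List[int]], int]


def greedy_decode(model: Model, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: the large model generates every token itself."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(draft: Model, target: Model, prompt: List[int],
                       max_new_tokens: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: draft proposes up to k tokens, target checks them."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(tokens) < goal:
        # Step 1: the small model drafts a short run of tokens.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft(tokens + proposal))

        # Step 2: the large model checks each drafted token in order.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok == expected:
                accepted.append(tok)       # draft guessed right, keep it
            else:
                accepted.append(expected)  # first miss: use the target's token
                break                      # discard the rest of the draft

        tokens.extend(accepted)
    return tokens


# Hypothetical toy models: the target counts upward mod 10; the draft agrees
# most of the time but guesses wrong whenever the last token is 2, 5, or 8.
def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    return 0 if ctx[-1] % 3 == 2 else (ctx[-1] + 1) % 10


fast = speculative_decode(toy_draft, toy_target, [0], max_new_tokens=12, k=4)
slow = greedy_decode(toy_target, [0], max_new_tokens=12)
```

In this greedy form the output matches what the large model would have produced on its own, so `fast` and `slow` are the same sequence. The speedup in practice comes from the large model verifying a whole drafted run in a single pass, which this toy version does not model.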
Meanwhile, in the Rest of Tech
Favorite Featured Stories

Across the agent ecosystem, multiple players are simultaneously shipping the same category of development: admin console...

A mouse click is four assertions compressed into one gesture: I'm here, I see what I'm doing, I have the authority, and ...

When a person clicks a button on a webpage, the infrastructure only sees the click. It never sees the reading, the conte...

The OpenTelemetry GenAI semantic conventions define how to trace agent workflows: which agent ran, what tools it called,...
