OpenAI's GPT-5 safety strategy is extensive, but I've seen this playbook before

The company has released a mountain of documentation on how it's keeping its most powerful models in check. The real question is whether any of it matters when things go wrong.

By Mark Kowalski

3 hours ago · 6 min read

Image credit: Lottie animation by Centre Robotics (LottieFiles Free, used with credit).

I spent most of last week reading through OpenAI's various system cards and technical reports for the GPT-5 family, and I have to tell you, my eyes started glazing over somewhere around page forty. Not because the material is bad (it's actually pretty thorough), but because I've been reading documents like these since before most of the kids at OpenAI were born.

The company has been busy. GPT-5.2 dropped recently, along with a specialized coding model called GPT-5.1-Codex-Max, a pair of open-weight safety models, and a whole new approach they're calling "safe-completions." There are also updated safety metrics for GPT-5.1 Instant and Thinking. That's a lot of model names to keep track of, and frankly I'm not sure why they need this many variants, but what do I know.

The documentation is extensive. I'll give them that. But extensive documentation and actual safety are two different things, and I've seen this movie before.

What are they actually doing differently?

The most interesting shift is something OpenAI calls "output-centric safety training," which they describe in a report titled "From hard refusals to safe-completions." The basic idea is that instead of having the model just refuse to answer sensitive questions (which is annoying and often counterproductive), they're training it to provide helpful responses that are also safe.

This is, in a way, an admission that the old approach didn't work. Anyone who's used ChatGPT knows the frustration of asking a legitimate question about, say, chemistry or security vulnerabilities and getting a prim refusal. Meanwhile, actual bad actors just jailbreak the thing anyway. So OpenAI is trying to thread a needle here, making the model more useful for normal people while still preventing misuse.

