Photo credit: Image via The Verge (AI). Used under fair use for news commentary.
Most of the coverage around Claude Fable 5's launch went something like this: Anthropic releases its most powerful public model ever, Mythos-class capability finally available to everyone, huge deal, impressive benchmarks, et cetera. And sure, that's all true. But I think it missed the more interesting story sitting right underneath.
The more interesting story is that Anthropic shipped a model with hidden restrictions, got caught, and then had to apologize for it. In that order.
Fable 5 is Anthropic's first publicly available model from its Mythos family. That matters because Anthropic has spent months warning that Mythos-class models are too capable to simply release into the wild. The specific concerns were cybersecurity and biology: models this good in those domains could, in theory, help bad actors do genuinely dangerous things.
So when Anthropic said it was releasing a Mythos-class model publicly, the obvious question was: okay, but how? The answer, it turns out, was guardrails. Blocks on responses in high-risk areas. TechCrunch reported that Fable comes with restrictions specifically around cybersecurity and biology queries. When the model hits one of those areas, it doesn't just refuse. It hands off to an older, less capable model, Claude Opus 4.8, and keeps going.
That part, the handoff, is where things get weird.
Here's a detail that I keep coming back to. The Verge reported that Fable 5 won't answer basic biology questions. Not advanced biosecurity stuff. Basic biology. The kind of questions a high schooler would handle without breaking a sweat.
And it's not that Fable doesn't know the answers. It does. Anthropic just won't let it respond. Instead, it silently routes the query to Opus 4.8, which answers in its place.
You might be wondering why that's a problem. If the answer still comes back, who cares which model answered it? The issue is that users didn't know this was happening. They thought they were talking to Fable. They weren't, at least not always. And if you're a developer building on top of Fable, or a researcher trying to evaluate its capabilities, that silent swap is a significant problem. You're not actually testing what you think you're testing.
ZDNet called it a "nerfed Mythos" and flagged that the pricing and the fallback architecture could make developers think twice. That framing is a bit harsh, but it's not wrong.
The biology angle is one thing. The cybersecurity angle is arguably messier.
TechCrunch reported that security researchers were not happy. The guardrails, designed to prevent misuse, are apparently broad enough to block legitimate security work. Penetration testers, vulnerability researchers, defensive security teams: these are people who need to ask exactly the kinds of questions Fable is now refusing to engage with.
This is the tension that's genuinely hard to resolve, tbh. Anthropic isn't wrong that a highly capable model could help someone do something harmful in a cybersecurity context. But the same capabilities that make it dangerous in an attacker's hands make it useful to defenders. Drawing that line without simply punishing legitimate users is hard, and as far as I can tell, nobody has solved it cleanly yet. The researchers' complaints suggest Anthropic's current calibration is off.
The part that really got my attention came a day later. The Verge reported that Anthropic apologized for what it called "invisible" guardrails, specifically the silent handoff to Opus 4.8 that users couldn't see happening.
The company said it would reverse course and be more transparent about when restrictions kick in, even if that means Fable outright refuses more queries rather than quietly rerouting them. That's an interesting tradeoff to make explicitly: more visible refusals, fewer invisible workarounds.
I initially thought the apology was mostly about optics, a quick PR fix. But after reading the details more carefully, I think there's a substantive issue underneath it: distillation. If competitors or other AI labs are using Fable's outputs to train their own models, and some of those outputs are secretly coming from Opus 4.8, that's not just a transparency issue. It's a data integrity issue. You're distilling from the wrong model without knowing it.
For users, the immediate change is that Fable will be more upfront when it can't help. That's better than a silent swap, even if it's more frustrating in the moment.
For the safety debate, this is a genuinely hard case to make sense of. Anthropic spent months saying Mythos was too dangerous to release. Then it released a version of Mythos. The guardrails are the argument that this is responsible. But if the guardrails are miscalibrated (blocking high schoolers' biology questions, frustrating legitimate security researchers), and if they were initially hidden, then the safety argument gets complicated. It raises several questions: whether the restrictions are actually well designed, whether transparency about AI limitations is being taken as seriously as capability claims, and whether safety messaging holds up when commercial pressure to release is high.
It's too early to say whether Anthropic's revised approach will satisfy researchers or developers. The company says it's being more transparent going forward, but we don't know yet what that looks like in practice, or whether the guardrail calibration itself will change.
For the broader humanoid and embodied AI space, where I spend most of my time, this matters more than it might seem. Mythos-class models are the kind of thing that could eventually power physical systems in meaningful ways. The question of how you constrain a highly capable model without making it useless for legitimate applications isn't abstract. It's exactly the problem robotics developers are going to face as underlying models get more powerful.
Honestly, I'm not sure the Fable launch was handled well. The capability story was real. The safety story was real. But silently rerouting users to a different model without telling anyone is the kind of thing that erodes trust in ways that are hard to rebuild. Anthropic got called out, apologized, and is changing course. That's something. Whether it's enough probably depends on whether the underlying calibration gets fixed too.