I don't see how more advanced models won't get gated to specific, known, KYC'd entities. Classification-style guardrails will never be sufficient. Distillation attacks, too, are really hard to prevent. Open-source models can have their guardrails easily stripped away, so it'll be incredibly dangerous to keep releasing more and more capable OSS models that can and will be used to give bad actors 100x leverage.
I feel like this headline is a bit overstated. There is not a ton of evidence it was about a jailbreak, nor was there evidence that it was about retribution.
Ultimately, I bet Anthropic is fine with this because they needed to take Fable down to improve the guardrails (which were getting a ton of pushback), and they consider having Fable treated as "too dangerous" to just be good PR hype for them. And they get a little more anti-Trump "cred".
So the article calls it "knowledge gaps". Has technical expertise ever mattered when the law wants to ban or restrict something it doesn't like? The DMCA comes to mind.
Feds freaked over Fable 5 after simple 'fix this code' prompt, not jailbreak (theregister.com) 398 points | 6 hours ago | 223 comments
I suspect there's more to the story than has been reported too, but I'd like information to help turn those suspicions into something more concrete.
This seems too simple, and too complex at the same time.