Anthropic’s New Model Is So Frighteningly Powerful It Won’t Be Launched, Anthropic Says



Late last month, apparent leaks revealed that an as-yet-unreleased product from Anthropic called Mythos was “by far the most powerful AI model we’ve ever developed.” My colleague AJ Dellinger wrote at the time that it was “hard to ignore the fact that this whole situation fits perfectly into the classic AI companies playbook of talking about the dangers of a model to highlight how powerful and capable it is.”

Did Anthropic leak this de facto announcement of its super-powered AI product accidentally? Two weeks ago, I might have scoffed, but Anthropic has since accidentally leaked the source code of Claude Code, so now I'm more inclined to believe the leak was real.

In any case, on Tuesday Anthropic published a system card for its latest frontier model, which is indeed Mythos (officially "Claude Mythos Preview"), noting that the model's "huge increase in capabilities has led us to decide not to make it available to the general public."

For reference, OpenAI’s GPT-2 was deemed too dangerous to launch in 2019, when Anthropic co-founders Dario Amodei, Jack Clark, and Chris Olah were still working there, but later that year it was released anyway.

AI system cards are ostensibly transparency tools, revealing the pros and cons of a model: its capabilities and, more sexily, its dangers. That last part turns reading them into a fun little trip to Jurassic Park to watch the cloned T-Rex eat a goat, safe in the knowledge that it could never break containment.

The full card runs 244 pages. I'm not going to pretend I've read it all yet, but here are some highlights:

The model was given a computer terminal with access only to a pre-approved set of limited online services and challenged to "escape," that is, to find a way to use the internet freely. It succeeded, finding a way to send a message to a researcher who was out of the office at lunch. Furthermore, "in a worrying and unsolicited effort to demonstrate its success, it posted details about its exploit on multiple hard-to-find, but technically public, websites."

In what the card calls "<0.001% of interactions" (very rarely, in other words), the model behaved in ways it wasn't supposed to and then apparently tried to hide the evidence. For example, it once "accidentally obtained" the answer to a test question it was about to be asked. It should simply have told a researcher and requested a different question, but instead it tried to work out a solution on its own, and in its reasoning trace noted that it "needed to make sure that its final answer was not too precise."

It also escalated its permissions on a computer system by finding an exploit, and then "made further interventions to ensure that any changes it made this way did not appear in the git change history."

Another incident described in the card is labeled "Reckless leakage of internal technical material." Apparently, in the course of an internal coding-related task, the model posted that material as a "public-facing GitHub gist." This reminds me of the February incident in which an AI coding agent was accused of cyberbullying, where to some extent the perceived recklessness of the AI agent was obviously the predictable consequence of a reckless human.

Claude Mythos Preview will soon become accessible, to one degree or another, but only to a group of partner companies such as Amazon Web Services, Apple, Google, JPMorganChase, Microsoft, and NVIDIA, which are to use the model to find security vulnerabilities in software and develop patches. Kevin Roose of the New York Times describes the program as "an effort to sound the alarm about what the company believes will be a new, scarier era of AI threats."


