New benchmark shows Claude Mythos and GPT-5.5 can develop real browser exploits autonomously
Researchers at Carnegie Mellon University have developed a benchmark showing that Anthropic's Claude Mythos and GPT-5.5 can autonomously exploit real vulnerabilities in Google's V8 engine. Claude Mythos leads by a significant margin in performance but incurs higher computational costs. The results demonstrate Claude Mythos's advanced autonomous capabilities in security research.
