Anthropic released a 244-page system card for Claude Mythos, its most capable model to date, which the company is withholding from general release. The stated reason: Mythos is too effective at discovering unknown cybersecurity vulnerabilities. Microsoft and Apple have access. The public does not.
The system card doubles as a philosophical document. Anthropic states directly that as models grow more powerful, 'it becomes increasingly likely that they have some form of experience, interests, or welfare that matters intrinsically in the way that human experience and interests do.' The company is not claiming certainty. It is claiming its concern is growing.
The full 244 pages are worth reading not for the model benchmarks but for what Anthropic is willing to put in writing about AI welfare, consciousness, and the decision-making behind restricted releases. The cybersecurity angle is the hook. The system card is the story.