230 million people use ChatGPT for health questions every week. That number alone justifies why OpenAI is treating health as a core capability, not a feature. GPT-5.5 Instant now matches frontier Thinking models on OpenAI's most challenging health evaluations, and it is free to all users.
The meaningful gains are specific: better recognition of when urgent care is needed, more relevant follow-up questions, clearer explanation of uncertainty, and more readable explanations of complex information. Behind the benchmarks is a physician-led evaluation network that defines what good responses look like, reviews model outputs, and maps failure modes. This is not a vibe check. It is a structured clinical feedback loop.
What makes the full piece worth reading is the detail on how those health evaluations are actually constructed and what the physicians are measuring. The methodology matters here because it is the mechanism OpenAI is using to close the gap between a capable language model and something that behaves reliably in high-stakes medical contexts. That gap is not closed yet, but the infrastructure to close it is being described openly.
[WATCH ON YOUTUBE →]