Look, I’ll be honest with you folks: I’ve been a little burned out on OpenAI announcements lately. It started to feel like every new ChatGPT update was a reshuffling of the same deck, with a shinier name slapped on the box. So when GPT-5.4 landed today, I went in skeptical. I’m happy to report that this one actually gives me a reason to sit up straight.
OpenAI released GPT-5.4 today across ChatGPT, the API, and Codex. The company is positioning it as its most capable and efficient frontier model for professional work, and for once, the benchmarks seem to back that up rather than just repackage what was already there. A “Pro” version has also been released for users who need maximum performance on the most demanding tasks.
The headline numbers are genuinely hard to argue with. On GDPval, a benchmark that tests AI agents across 44 real-world occupations ranging from accounting to manufacturing, GPT-5.4 matched or beat industry professionals in 83.0 percent of comparisons. GPT-5.2, the model it replaces as the default thinking option in ChatGPT, only hit 70.9 percent. That is a jump of 12.1 percentage points, not a minor incremental tick.
On OSWorld-Verified, which tests a model’s ability to navigate a desktop environment using screenshots and keyboard and mouse actions, GPT-5.4 hit a 75.0 percent success rate, up from GPT-5.2’s 47.3 percent. GPT-5.4’s result actually clears the human performance benchmark of 72.4 percent.
The computer use angle is probably the most interesting part of this release. GPT-5.4 is OpenAI’s first general-purpose model with native computer-use capabilities baked in, meaning agents built on it can actually operate computers and carry out complex workflows across different applications. It also supports up to 1 million tokens of context, which gives developers a lot more room to plan, execute, and verify tasks over longer horizons.
Is this the kind of thing that translates into everyday usefulness for the average person? Maybe not yet, but the developers building on top of this stuff will absolutely feel the difference.
Spreadsheet and presentation work got specific attention this time around, too. On an internal benchmark of spreadsheet modeling tasks typical of a junior investment banking analyst, GPT-5.4 scored 87.3 percent against GPT-5.2’s 68.4 percent. Human raters preferred GPT-5.4’s presentation output 68.0 percent of the time, citing stronger aesthetics and better use of image generation. For anyone using ChatGPT as part of an actual professional workflow, that is meaningful.
For developers working with the API, GPT-5.4 introduces something called tool search, which addresses a real pain point. Previously, a model given a large set of tools had all those definitions loaded into the prompt upfront, adding tokens and slowing everything down.
With tool search, the model gets a lightweight list and looks up tool definitions only when it needs them. In testing, this reduced total token usage by 47 percent on multi-tool tasks while maintaining the same accuracy. That kind of efficiency improvement has real cost implications.
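To make the idea concrete, here is a minimal sketch of deferred tool lookup in plain Python. It is not OpenAI’s actual API or implementation: the tool names, the `lightweight_index` and `lookup_tool` helpers, and the character-count size proxy are all assumptions made up for illustration. It only shows why sending a short index upfront and fetching full definitions on demand can shrink what goes into the prompt.

```python
import json

# Hypothetical tool definitions, invented for this illustration.
TOOL_DEFINITIONS = {
    "get_weather": {
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "units": {"type": "string", "enum": ["metric", "imperial"]},
            },
            "required": ["city"],
        },
    },
    "send_email": {
        "description": "Send an email to one recipient.",
        "parameters": {
            "type": "object",
            "properties": {
                "to": {"type": "string"},
                "subject": {"type": "string"},
                "body": {"type": "string"},
            },
            "required": ["to", "subject", "body"],
        },
    },
    "create_invoice": {
        "description": "Create an invoice for a customer.",
        "parameters": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string"},
                "amount": {"type": "number"},
                "currency": {"type": "string"},
                "due_date": {"type": "string"},
            },
            "required": ["customer_id", "amount", "currency"],
        },
    },
}


def size(obj) -> int:
    """Crude size proxy: length of the JSON text, standing in for tokens."""
    return len(json.dumps(obj))


def eager_tools() -> list:
    """Old approach: every full definition is placed in the prompt upfront."""
    return [{"name": name, **spec} for name, spec in TOOL_DEFINITIONS.items()]


def lightweight_index() -> list:
    """Deferred approach: only names and one-line descriptions go upfront."""
    return [
        {"name": name, "description": spec["description"]}
        for name, spec in TOOL_DEFINITIONS.items()
    ]


def lookup_tool(name: str) -> dict:
    """Fetch a single full definition only when the model asks for it."""
    return {"name": name, **TOOL_DEFINITIONS[name]}


# A task that ends up needing just one of the three tools.
eager_cost = size(eager_tools())
deferred_cost = size(lightweight_index()) + size(lookup_tool("get_weather"))
print(f"eager: {eager_cost} chars, deferred: {deferred_cost} chars")
```

The gap widens as the tool catalog grows and narrows as a task touches more of the tools, so any real-world savings will depend on the workload.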
On the hallucination front, OpenAI says GPT-5.4’s individual claims are 33 percent less likely to be false and full responses are 18 percent less likely to contain any errors compared to GPT-5.2. Those are meaningful numbers for anyone using the model in legal, financial, or editorial work where accuracy is not optional.
In ChatGPT, GPT-5.4 Thinking rolls out today for Plus, Team, and Pro subscribers, replacing GPT-5.2 Thinking as the default. GPT-5.2 Thinking will stick around for three months in a legacy section of the model picker before being retired on June 5, 2026. Enterprise and Edu customers can enable early access through admin settings.
On the API side, GPT-5.4 comes in at $2.50 per million input tokens and $15.00 per million output tokens, up from GPT-5.2’s $1.75 and $14.00 respectively. GPT-5.4 Pro runs $30.00 per million input tokens and $180.00 per million output tokens.
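For a rough sense of what those rates mean per request, here is a small back-of-the-envelope calculator. The prices are the ones quoted above; the dictionary keys are just labels for this sketch (not confirmed API model identifiers), and the 200,000-input / 20,000-output workload is a made-up example.

```python
# USD per million tokens, as quoted in this article.
PRICES = {
    "gpt-5.2": {"input": 1.75, "output": 14.00},
    "gpt-5.4": {"input": 2.50, "output": 15.00},
    "gpt-5.4-pro": {"input": 30.00, "output": 180.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one request at the listed per-million rates."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: 200,000 input tokens and 20,000 output tokens.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 200_000, 20_000):.2f}")
```

For that example mix, the script prints $0.63 for GPT-5.2, $0.80 for GPT-5.4, and $9.60 for GPT-5.4 Pro. Whether token savings elsewhere offset the higher per-token rates will depend on how input-heavy a given workload is.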