OpenAI launches Safety Bug Bounty program to hunt AI abuse risks

OpenAI is opening up a new bug bounty, but this one feels a little different. Instead of chasing the usual security flaws, it is asking researchers to look for ways its AI tools could be abused in the real world. That includes things like manipulating AI agents, leaking sensitive data, or finding gaps in how accounts and protections are enforced.

The company says this new Safety Bug Bounty program will sit alongside its existing security bounty. The distinction matters. Traditional bugs are still in scope elsewhere, but this initiative is about behavior. Can an AI agent be tricked into doing something harmful? Can a prompt hijack it? Can someone get access to things they should not? If the answer is yes, and it can be reproduced reliably, OpenAI wants to hear about it.

There is a clear focus on what it calls “agentic” risks. That basically means AI systems that can take actions, not just generate text. If an attacker can inject instructions and steer one of these agents into exposing private data or doing something it should not, that qualifies. The same goes for attempts to game account systems, like bypassing restrictions or faking trust signals.

At the same time, OpenAI is drawing a line. Basic jailbreaks are mostly out. If you can make a chatbot say something edgy or spit out information that is already easy to find online, that is not going to earn you anything. The company is trying to keep the signal high and avoid paying for party tricks.

One thing I keep coming back to is whether this creates a weird feedback loop. If AI systems are getting better at reasoning and automation, what is stopping someone from using AI itself to hunt for these bugs? You could imagine a scenario where a researcher spins up an AI agent, points it at another AI system, and just lets it probe for weaknesses all day. If that actually works, it turns bug hunting into something closer to a volume game. The person with the best tools might win, not necessarily the most clever human.

Of course, that raises questions. Would OpenAI consider those findings fair play? And if AI starts discovering these issues faster than humans, does that make the bounty program more effective or harder to manage? It is not hard to picture a future where AI is both the thing being tested and the thing doing the testing.

For now, OpenAI seems focused on getting more eyes on real risks before they become real problems. Submissions can even get routed between safety and security teams depending on what is found, which suggests the lines are still a bit blurry internally too.

Either way, this is where things are heading. AI is no longer just about what it can do on paper. It is about how it behaves when people push it, twist it, and try to break it. And now, apparently, how other AI might try to break it too.

Avatar of Brian Fagioli
Written by

Brian Fagioli

Technology journalist and founder of NERDS.xyz

Brian Fagioli is a technology journalist and founder of NERDS.xyz. A former BetaNews writer, he has spent over a decade covering Linux, hardware, software, cybersecurity, and AI with a no nonsense approach for real nerds.

Leave a Comment