Attackers are learning how to trick AI agents instead of people

For years, cybersecurity experts have warned people not to click suspicious links, open unexpected attachments, or trust unsolicited emails. Now, a growing number of security researchers are warning that attackers may not need to fool humans at all.

Instead, they are learning how to fool AI agents.

A newly released AI security report from OrcaRouter argues that prompt injection is becoming the phishing attack of the AI era. While that comparison might sound dramatic at first, the more I dug into the report, the more it made sense.

Traditional phishing attacks rely on tricking a person into taking an action. Prompt injection attacks target AI systems directly. The goal is to feed an AI assistant, chatbot, or autonomous agent instructions disguised as ordinary content. If the AI follows those instructions, the attacker may be able to manipulate its behavior, access sensitive information, or trigger actions that were never intended.

The concern becomes much bigger when AI systems move beyond answering questions and start taking actions on behalf of users.

According to the report, AI agents are increasingly being granted access to email accounts, company documents, source code repositories, customer databases, and external tools. That access can make them incredibly useful. It can also make them attractive targets.

The report points to several high-profile examples that emerged over the past year. Researchers documented attacks involving prompt injection, agent hijacking, malicious MCP servers, data exfiltration, and what is known as “denial-of-wallet” attacks, where an AI system is manipulated into generating excessive usage charges rather than stealing data.

One example highlighted in the report involved a prompt injection attack against Microsoft 365 Copilot. Researchers demonstrated how a carefully crafted email could influence the AI assistant and potentially expose sensitive information without requiring the user to click anything.

What makes these threats different from traditional software vulnerabilities is that they are often tied to how large language models work. Security researchers have repeatedly noted that AI models struggle to reliably distinguish between instructions and data. That means a document, email, web page, or tool response can potentially contain commands that influence an AI system’s behavior.

The report argues that organizations should begin treating AI agents as a new security boundary and apply familiar principles such as least privilege, access controls, auditing, and network restrictions.

Whether every prediction in the report comes true remains to be seen. Security vendors naturally have an incentive to emphasize emerging threats. Still, the underlying trend is difficult to ignore. As businesses hand more responsibilities to AI agents, attackers will inevitably spend more time figuring out how to manipulate them.

For years, companies have invested heavily in protecting employees from social engineering attacks. The next challenge may be protecting AI agents from the same thing.

☕

Support independent tech journalism

NERDS.xyz is independently owned and operated. If you enjoy my coverage of Linux, AI, hardware, cybersecurity, and tech culture, consider supporting the site on Ko-fi.

Support NERDS.xyz

Written by

Brian Fagioli ✔

Technology journalist and founder of NERDS.xyz

Brian Fagioli is a technology journalist and founder of NERDS.xyz. A former BetaNews writer, he has spent over a decade covering Linux, hardware, software, cybersecurity, and AI with a no nonsense approach for real nerds.

📄 More by Brian Fagioli ✖ Follow on X ▶ YouTube @ Threads 🐘 Mastodon

Support independent tech journalism

Brian Fagioli ✔

Leave a Comment Cancel reply