Cloudflare accuses Perplexity of secretive web crawling

This is a shocking accusation, folks. According to Cloudflare, Perplexity is secretly crawling websites using deceptive tactics, including undeclared user agents and rotating IP addresses. The AI company is allegedly bypassing robots.txt directives and ignoring webmasters who have clearly said, “Stay out.”

I’m honestly surprised Cloudflare would make this claim so publicly. These are serious allegations, and the company is taking a risk by calling out Perplexity directly. While Cloudflare isn’t an AI company itself, it’s a major player in internet infrastructure. So when it accuses someone of sneaking past digital barriers, the industry pays attention.

The issue surfaced after Cloudflare customers reported that Perplexity was still accessing their content even after they had blocked it. So Cloudflare ran a test. It registered brand-new domains that were not publicly discoverable and locked them down with strict robots.txt rules and firewall protections. Despite this, Perplexity reportedly returned detailed answers about the content hosted on these sites.
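For readers who have never looked at one, here is a rough idea of what a "stay out" robots.txt means to a crawler that plays by the rules. This is only a sketch using Python's standard `urllib.robotparser`; the file contents, the `ExampleBot` name, and the `example.test` URL are made up for illustration and are not taken from Cloudflare's actual test setup.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a locked-down test site:
# every crawler is told that the whole site is off limits.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

def is_allowed(user_agent: str, url: str, robots_txt: str = BLOCK_ALL) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    # "ExampleBot" and the URL are placeholders, not real crawler details.
    print(is_allowed("ExampleBot", "https://example.test/private-page"))
```

A compliant crawler runs a check like this before every request and simply skips anything that is disallowed. Keep in mind that robots.txt is a polite request, not an enforcement mechanism, which is why the firewall rules matter too.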

Cloudflare says Perplexity starts with its declared crawlers, but when those are blocked, it switches to a stealth approach. The stealth crawler impersonates a standard Chrome browser on macOS and rotates IP addresses, making it hard to detect. These hidden crawlers are allegedly sending millions of requests per day and accessing data they were clearly told to leave alone.

By contrast, Cloudflare pointed to OpenAI as a responsible bot operator. When OpenAI’s ChatGPT-User sees a robots.txt block or is served a firewall block page, it backs off. No stealth behavior. No retrying from a different identity. It also uses the proposed Web Bot Auth standard to identify itself more transparently.
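The "back off" behavior described above boils down to a very simple decision rule. The sketch below is my own illustration, not OpenAI's or Cloudflare's code: the `Response` class, the `next_action` function, and the choice of 403 and 429 as "you are blocked" signals are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: these HTTP statuses are treated as a clear "not welcome" signal.
BLOCK_STATUSES = {403, 429}

@dataclass
class Response:
    """Minimal stand-in for an HTTP response (hypothetical type)."""
    status: int
    body: str = ""

def next_action(robots_allows: bool, response: Optional[Response] = None) -> str:
    """Decide what a well-behaved crawler does next.

    Returns "stop", "fetch" (nothing requested yet), or "use" (content is fine).
    There is deliberately no branch that retries with a new user agent or IP.
    """
    if not robots_allows:
        return "stop"   # robots.txt said no, so never send the request
    if response is None:
        return "fetch"  # allowed, and nothing has been requested yet
    if response.status in BLOCK_STATUSES:
        return "stop"   # the site blocked the request, so give up
    return "use"

if __name__ == "__main__":
    print(next_action(robots_allows=True, response=Response(status=403)))
```

The point is what is missing: a block ends the conversation. The behavior Cloudflare alleges would amount to an extra branch that swaps identities and tries again.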

Cloudflare believes good bots should stick to five basic rules: identify themselves, avoid stealth, explain their purpose, use separate bots for separate tasks, and follow site rules. Perplexity, they argue, isn’t doing any of that.

In response, Cloudflare added new protections for all customers, including those on free plans. These new rules can block or challenge undeclared bots using fingerprinting techniques. So even if a bot pretends to be a browser, Cloudflare says it can now detect and stop it.

Over 2.5 million sites have already opted out of AI crawling through Cloudflare’s Content Independence Day initiative, and this incident is likely to push more site owners to do the same.

This whole situation raises a huge question: if these claims are true, what other AI companies might be using similar tactics? And if they’re not true, is Cloudflare opening itself up to legal trouble?

Either way, this report makes it clear that the battle over who controls online content is just getting started.

Author

Brian Fagioli

Brian Fagioli is a technology journalist and founder of NERDS.xyz. Known for covering Linux, open source software, AI, and cybersecurity, he delivers no-nonsense tech news for real nerds.
