Cloudflare alleges Perplexity is still playing dirty

Cloudflare accuses Perplexity of disguising its crawlers to bypass robots.txt and harvest protected content, while Perplexity claims the scans are legitimate user-requested queries.

By Jorge Mediavilla. Jorge is a journalist with a degree from the Complutense University of Madrid. He has worked at Yahoo!, PRISA Group and many others. In 2018, he founded CMS MAG in Spain and is now recognized as an expert by leading CMS and DXP companies and media outlets. Today, he works as a freelancer, specializing in CMS migration as his main service.

Published: 8 August 2025

Major technology companies are engaged in an unbridled, lawless scramble for the most valuable data, harvesting material even when its authors expressly forbid its use by artificial-intelligence firms. Many of them have played dirty at some point, but Perplexity appears to be the worst offender. Media outlets and businesses that already have agreements with the company should ask themselves whether it is advisable to partner with an organisation that shows so little regard for ethics.

According to Cloudflare, Perplexity first tries to obtain content for free from the open web. When it encounters a block, usually a robots.txt file, it does not give up: it comes back to steal the material, switching its user agent and masking its origin. Despite these tactics, Cloudflare says it has caught Perplexity in the act.
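To see why switching the user agent matters, it helps to remember that robots.txt rules are matched against the name a crawler declares for itself, and that compliance is voluntary. The following is a minimal sketch using Python's standard `urllib.robotparser`; the bot name `ExampleAIBot`, the robots.txt contents and the URL are hypothetical and are not taken from Cloudflare's report or from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the site blocks one named AI crawler
# and allows every other user agent.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots.txt permits this declared user agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


URL = "https://example.com/article"  # hypothetical page

# A crawler that identifies itself honestly is told to stay away.
declared = is_allowed("ExampleAIBot/1.0", URL)

# The same crawler presenting a generic browser user agent no longer
# matches the blocking rule, so the file alone cannot stop it.
disguised = is_allowed("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)", URL)

print(declared, disguised)
```

In this sketch the honest crawler is refused and the disguised one is not, which is why a site cannot rely on robots.txt alone to enforce its wishes.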

It all began with complaints from Cloudflare clients who had done everything in their power to tell Perplexity that their content was not authorised for use by its AI, yet discovered that their data was being crawled and used anyway. After running an experiment, Cloudflare confirmed that Perplexity was indeed taking the content in full knowledge that it was barred.

The same experiment with ChatGPT showed that OpenAI is being careful and honouring the wishes of webmasters who do not want to hand over their material. Content creators are not the only ones harmed by Perplexity’s bad practices: Sam Altman’s company and other reputable AI firms suffer from them too.

Cloudflare is becoming increasingly well known for its determination to defend content creators, their copyrights and the free web against big-tech misbehaviour. It is only fair to note, however, that Cloudflare itself is embroiled in a dispute with Spain’s professional football league, La Liga, for hosting unauthorised content.

Update: Perplexity’s response

Perplexity has not remained silent in a controversy that gravely threatens its image. The company argues that its requests should not be treated like those of an ordinary spider, because they are triggered by users, something it says makes a big difference.

It also says it neither crawls nor stores that information, but merely serves it on user request, regardless of any restrictions on the data. Perplexity therefore maintains that it is not retaining prohibited content.

Perplexity further insists that its service does not behave like a traditional autonomous AI spider and that Cloudflare is confusing those spiders with legitimate queries from users and helpful AI assistants. In short, Perplexity contends that Cloudflare is blocking legitimate traffic and that the dispute rests on a misunderstanding of the technology.

In this writer’s opinion, Perplexity’s explanations are at best weak, and it should definitely refrain from using or serving content whose rightful owner has explicitly forbidden its use. Ethics, after all, can neither be scraped nor masked.

* Original article written in Spanish, translated with ChatGPT and reviewed in English by Jorge Mediavilla.