Major technology companies are currently engaged in an unbridled, lawless scramble for the most valuable data, harvesting material even when its authors expressly forbid its use by artificial-intelligence firms. Everyone has played dirty at some point, but Perplexity appears to be the worst offender, so media outlets and businesses that already have agreements with the company should ask themselves whether partnering with an organisation that shows not the slightest trace of ethics is advisable.
According to Cloudflare, Perplexity first tries to obtain content for free from the open web and, when it encounters a block—usually a robots.txt file—it refuses to give up and comes back to steal the material by switching its user agent and masking its origin. Despite these tactics, Cloudflare says it has caught Perplexity in the act.
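To see why switching the user agent matters, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching a page, using Python's standard `urllib.robotparser`. The robots.txt contents, the `example.com` URL and the helper name `is_allowed` are assumptions made up for illustration. `PerplexityBot` is used here as the crawler name a site owner would target. This is not Cloudflare's detection method, nor Perplexity's code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the site bars one named AI crawler and allows everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative URL and user-agent strings (assumptions, not real traffic).
ARTICLE_URL = "https://example.com/article"
DECLARED_AGENT = "PerplexityBot"
GENERIC_BROWSER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)"

declared_ok = is_allowed(ROBOTS_TXT, DECLARED_AGENT, ARTICLE_URL)
disguised_ok = is_allowed(ROBOTS_TXT, GENERIC_BROWSER_AGENT, ARTICLE_URL)
```

Under these assumptions the declared crawler is refused while the same request sent with a generic browser string is allowed. The file is a voluntary convention that only works if the visitor identifies itself honestly, which is why hiding the user agent defeats it.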
It all began with complaints from Cloudflare clients who, although they had done everything in their power to tell Perplexity that their content was not authorised for use by its AI, discovered that their data was nevertheless being scanned and used. After running an experiment, Cloudflare confirmed that Perplexity was indeed taking the content in full knowledge that it was barred.
The same experiment with ChatGPT showed that OpenAI is being careful and honouring the wishes of webmasters who do not wish to cede their material. Of course, Sam Altman’s company and other reputable AI firms are also harmed by Perplexity’s bad practices, not just content creators.
Cloudflare is increasingly known for its determination to defend content creators, their copyrights and the free web against big-tech misbehaviour. It is only fair to note, however, that Cloudflare itself is embroiled in a dispute with Spain’s professional football league, La Liga, for hosting unauthorised content.
Update: Perplexity’s response
Perplexity has not remained silent in a controversy that gravely threatens its image. The company argues that its requests should not be treated as the work of an ordinary spider, because they are triggered and requested by users, a distinction it says makes a big difference.
It also says it neither crawls nor stores that information, but merely serves it on user request, regardless of any restrictions on the data. Perplexity therefore maintains that it is not retaining prohibited content.
Perplexity further insists that its service does not behave like a traditional autonomous AI spider and that Cloudflare is confusing those spiders with legitimate queries from users and helpful AI assistants. In short, Perplexity contends that Cloudflare is blocking legitimate traffic and that the dispute rests on a misunderstanding of the technology.
In this writer’s opinion, Perplexity’s explanations are at best weak, and it should definitely refrain from using or serving content whose rightful owner has explicitly forbidden its use. Ethics, after all, can neither be scraped nor masked.
* Original article written in Spanish, translated with ChatGPT and reviewed in English by Jorge Mediavilla.