While Meta Crawls the Web for AI Training Data, Bruce Ediger Pranks Them with Endless Bad Data


From the personal blog of interface expert Bruce Ediger:

Early in March 2025, I noticed that a web crawler with a useragent string of meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler) was hitting my blog's machine at an unreasonable rate. I followed the URL and discovered this is what Meta uses to gather premium, human-generated content to train its LLMs. I found the rate of requests to be annoying.

I already have a PHP program that creates the illusion of an infinite website. I decided to answer any HTTP request that had "meta-externalagent" in its user agent string with the contents of a bork.php generated file...

This worked brilliantly. Meta ramped up to requesting 270,000 URLs on May 30 and 31, 2025...

After about 3 months, I got scared that Meta's insatiable consumption of Super Great Pages about condiments, underwear and circa 2010 C-List celebs would start costing me money. So I switched to giving "meta-externalagent" a 404 status code. I decided to see how long it would take one of the highest valued companies in the world to decide to go away.

The answer is 5 months.

Read more of this story at Slashdot.
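The post doesn't show Ediger's actual server setup. Below is a rough sketch of the technique it describes: match a substring of the User-Agent header, then answer that crawler with generated junk pages that link to more junk pages. A second mode returns a 404 instead. The sketch is a minimal Python WSGI app, not his PHP code. The word list, the `babble_page` generator standing in for bork.php, the `mode` switch, and the link format are all hypothetical.

```python
import random
from typing import Optional

# Substring the post says was matched in the User-Agent header.
CRAWLER_TOKEN = "meta-externalagent"

# Hypothetical vocabulary standing in for bork.php's text generator.
WORDS = ["condiments", "underwear", "celebrity", "circa", "2010",
         "super", "great", "pages", "C-List"]


def babble_page(rng: random.Random, n_links: int = 5) -> str:
    """Build a junk HTML page whose links point at more (nonexistent) junk pages."""
    text = " ".join(rng.choice(WORDS) for _ in range(40))
    # Random 32-bit hex names make the "site" look endless to a crawler.
    links = "".join(
        f'<a href="/{rng.getrandbits(32):08x}.html">more</a>\n'
        for _ in range(n_links)
    )
    return f"<html><body><p>{text}</p>\n{links}</body></html>"


def make_app(mode: str = "babble", seed: Optional[int] = None):
    """Return a WSGI app. mode="babble" feeds the crawler junk; any other value sends 404."""
    rng = random.Random(seed)

    def app(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "")
        if CRAWLER_TOKEN in ua:
            if mode == "babble":
                body = babble_page(rng).encode("utf-8")
                start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
            else:
                body = b"Not Found"
                start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [body]
        # Everyone else gets the ordinary site (placeholder content here).
        body = b"<html><body>Normal site content</body></html>"
        start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
        return [body]

    return app
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, make_app(mode="404")).serve_forever()` serves the 404 phase. Matching a substring rather than the full string keeps the rule working if the crawler's version number changes. It is still trivially evaded by any client that changes its User-Agent.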