#Development #Techniques
Poisoning well · An effort to dupe nasty AI crawlers with nonsense https://ilo.im/1632tq
_____
#AI #ChatBots #SEO #Content #Protection #RobotsTxt #WebDev #Backend #Frontend #HTML
#Business #Introductions
Meet LLMs.txt · A proposed standard for AI website content crawling https://ilo.im/16318s
_____
#SEO #GEO #AI #Bots #Crawlers #LlmsTxt #RobotsTxt #Development #WebDev #Backend
When greed becomes automated, something like this happens:
»Open Source World – FOSS infrastructure is under attack by AI companies:
LLM scrapers are taking down FOSS projects' infrastructure, and it's getting worse.«
https://thelibre.news/foss-infrastructure-is-under-attack-by-ai-companies/
Tracked down my Forgejo CPU spikes with pprof: an otherwise acceptable crawler is indexing each commit of my personal weather station data. All 107,980 of them. Blame info, too.
Many Forgejo paths are nonsensical to crawl, even by good bots. Codeberg's robots.txt is a great start for these.
https://codeberg.org/robots.txt
This should both relieve pressure and expose more bad bots.
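To see which forge paths such rules would cover, here is a minimal sketch of robots.txt path matching. The patterns below are hypothetical examples written for illustration; they are not copied from Codeberg's file. The matcher follows the wildcard rules described in RFC 9309 (`*` matches any run of characters, a trailing `$` anchors the end) and deliberately ignores `Allow` precedence.

```python
import re

# Hypothetical Disallow patterns for a Forgejo-style forge.
# These are illustrative assumptions, NOT Codeberg's actual rules.
DISALLOW_PATTERNS = [
    "/*/*/commit/",
    "/*/*/commits/",
    "/*/*/blame/",
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the
    end of the path; every other character is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(path: str, patterns=DISALLOW_PATTERNS) -> bool:
    """Return True if the path matches any Disallow pattern (prefix match)."""
    return any(pattern_to_regex(p).match(path) for p in patterns)


if __name__ == "__main__":
    for path in ("/alice/weather/commit/abc123",
                 "/alice/weather/blame/main/data.csv",
                 "/alice/weather"):
        print(path, "->", "blocked" if is_disallowed(path) else "allowed")
```

A well-behaved crawler would skip the per-commit and blame pages while still reaching the repository's front page; anything that keeps requesting the blocked paths is ignoring the file.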
#Development #Reports
Google AI Mode is here · How to access it and control it with robots.txt https://ilo.im/162o8h
_____
#Business #Google #SearchEngine #AnswerEngine #AI #RobotsTxt #WebDev #Frontend #Backend
They are now back online!
FediDB is back? FediDB is back!!!
Update: It's been fixed. FediDB is now back online!
YES!
IT HAPPENED!
#FediDB is still working on the #NoRobots, #RobotsTxt...
Hi, got a question.
Is there a standard anti-AI/anti-SEO robots.txt file? Or a trustworthy site that explains how to build one if a prefab isn't available? Is there anything else I should consider?
Thanks.
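One possible way to build such a file by hand is to group several crawler names under a single `Disallow: /` rule. The sketch below assumes a short, illustrative list of user-agent names; it is not a complete or authoritative blocklist, and the helper name `build_robots_txt` is made up for this example.

```python
# Illustrative, incomplete list of AI crawler user agents (an assumption;
# maintained lists are much longer and change over time).
AI_AGENTS = ["GPTBot", "CCBot", "ClaudeBot", "Google-Extended"]


def build_robots_txt(agents: list[str]) -> str:
    """Return a robots.txt group that disallows everything for the given agents."""
    lines = [f"User-agent: {agent}" for agent in agents]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(AI_AGENTS), end="")
```

Keep in mind that robots.txt is only a request: crawlers that ignore it have to be handled at the server or firewall level.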
#Development #Releases
AI Insights · Cloudflare Radar brings deeper insights into AI trends https://ilo.im/1626sk
_____
#Cloudflare #CloudflareRadar #Trends #AI #AiModels #Bots #RobotsTxt #WebDev #Frontend #Backend
Nepenthes
This is a tarpit intended to catch web crawlers. Specifically, it's targeting crawlers that scrape data for LLMs - but really, like the plants it is named after, it'll eat just about anything that finds its way inside.
Already been reported...
FediDB is pausing all crawling because they gotta implement robots.txt support.
"We have paused all crawling as of Feb 6th, 2025 until we implement robots.txt support."
"Stats will not update during this period."
#fediVerse #AI #dataMining #robotsTXT #fediAdmin
To me this looks much more like we should bury Trojan horses right in the bellies of the beasts. My server rules and profiles state that all data is CC-BY-SA-NC.
If they use that data and train on it, they definitely should get into serious legal and financial trouble.
What is worrying is their self-centered, megalomaniac ego trip, not realizing that they are the remaining world power, armed to the teeth with weapons of all kinds, and with all the private data of the world's population.
That said, the fact that you, apparently in charge of several #mastodon instances in the #fediVerse, are not able to fix their #robotsTxt while wasting time talking about other countries' internal affairs is kinda embarrassing.
sry
#meanWhile ..
.. the #mastodon community wastes its time trying to pimp up the stars of its #APP in #googlePlay, while the #robotsTxt of its instances disallows exactly one #AI bot scraper from searching all public data available about the #fediVerse. Not only on its mother ship but on all instances, so the elonGated can create his target lists of "the enemy inside".
.. good job, well done! ..
> User-agent: GPTBot
> Disallow: /
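What those two lines actually cover can be checked with Python's standard `urllib.robotparser`. This is a small sketch: the URL `https://example.social/public` is a placeholder, and the agent names other than GPTBot are only examples of other crawlers.

```python
from urllib.robotparser import RobotFileParser

# The two rules quoted above, and nothing else.
RULES = """\
User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(agent: str, url: str = "https://example.social/public") -> bool:
    """Return True if the rules permit this agent to fetch the placeholder URL."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "CCBot", "ClaudeBot"):
        print(agent, "->", "allowed" if allowed(agent) else "blocked")
```

Only GPTBot is turned away; every other crawler falls through to the default, which is to allow everything.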
New robots.txt just dropped https://github.com/ai-robots-txt/ai.robots.txt/releases/tag/v1.22