AI bots (OpenAI ChatGPT et al.) - how to block them - Didier J. MARY (blog)

Right, it's time to start fighting back against the AIs... by cutting off their access to our content.

I'm copy-pasting this here for later... See the source site for more details (really very well explained).

1. robots.txt

👉 Reminder: it goes at the root of the site

Paste this into it:

# AI crawlers
User-agent: anthropic-ai
User-agent: Bytespider
User-agent: CCBot
User-agent: ChatGPT-User
User-agent: cohere-ai
User-agent: FacebookBot
User-agent: Google-Extended
User-agent: GPTBot
User-agent: Omgilibot
Disallow: /
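
To check that a block like this does what we expect, here is a minimal Python sketch using the standard library's urllib.robotparser. The helper name is_blocked and the URL https://example.com/ are assumptions made for illustration. It only tells you how a well-behaved parser reads the rules, not whether a given bot actually obeys them.

```python
from urllib.robotparser import RobotFileParser

# Same rules as above: several User-agent lines sharing one Disallow.
ROBOTS_TXT = """\
# AI crawlers
User-agent: anthropic-ai
User-agent: Bytespider
User-agent: CCBot
User-agent: ChatGPT-User
User-agent: cohere-ai
User-agent: FacebookBot
User-agent: Google-Extended
User-agent: GPTBot
User-agent: Omgilibot
Disallow: /
"""


def is_blocked(user_agent, url="https://example.com/"):
    """Return True if the rules above forbid `user_agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "CCBot/2.0", "Mozilla/5.0"):
        print(agent, "blocked" if is_blocked(agent) else "allowed")
```

Every token on a User-agent line has to match the crawler's name exactly, so any stray character after the name (a decoration, an emoji) would stop that line from matching.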

2. the HTML <head>

Add <meta name="robots" content="noai, noimageai"> inside it
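
To confirm that a page really carries the tag, here is a small Python sketch built on the standard html.parser module. The class RobotsMetaParser and the function robots_directives are hypothetical names chosen for this example. It simply collects the comma-separated values of any robots meta tag it finds.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        for token in attributes.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of robots directives declared in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noai, noimageai"></head></html>'
    print(robots_directives(page))
```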

3. the ai.txt file

Apparently not mandatory yet, but it could become so.

# Spawning AI
# Prevent datasets from using the following file types

User-Agent: *
Disallow: *.txt
Disallow: *.pdf
Disallow: *.doc
Disallow: *.docx
Disallow: *.odt
Disallow: *.rtf
Disallow: *.tex
Disallow: *.wks
Disallow: *.wpd
Disallow: *.wps
Disallow: *.html
Disallow: *.bmp
Disallow: *.gif
Disallow: *.ico
Disallow: *.jpeg
Disallow: *.jpg
Disallow: *.png
Disallow: *.svg
Disallow: *.tif
Disallow: *.tiff
Disallow: *.webp
Disallow: *.aac
Disallow: *.aiff
Disallow: *.amr
Disallow: *.flac
Disallow: *.m4a
Disallow: *.mp3
Disallow: *.oga
Disallow: *.opus
Disallow: *.wav
Disallow: *.wma
Disallow: *.mp4
Disallow: *.webm
Disallow: *.ogg
Disallow: *.avi
Disallow: *.mov
Disallow: *.wmv
Disallow: *.flv
Disallow: *.mkv
Disallow: *.py
Disallow: *.js
Disallow: *.java
Disallow: *.c
Disallow: *.cpp
Disallow: *.cs
Disallow: *.h
Disallow: *.css
Disallow: *.php
Disallow: *.swift
Disallow: *.go
Disallow: *.rb
Disallow: *.pl
Disallow: *.sh
Disallow: *.sql
Disallow: /
Disallow: *
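
Since the file is just one Disallow line per extension, it can be generated rather than typed by hand. Here is a rough Python sketch, where build_ai_txt and its block_everything parameter are assumptions made for this example. It reproduces the layout shown above and makes no claim about how a given dataset tool interprets the file.

```python
def build_ai_txt(extensions, block_everything=True):
    """Build the text of an ai.txt file with one Disallow line per extension."""
    lines = [
        "# Spawning AI",
        "# Prevent datasets from using the following file types",
        "",
        "User-Agent: *",
    ]
    for ext in extensions:
        # Accept both "pdf" and ".pdf".
        lines.append("Disallow: *." + ext.lstrip("."))
    if block_everything:
        # Catch-all lines, as at the end of the listing above.
        lines.append("Disallow: /")
        lines.append("Disallow: *")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_ai_txt(["txt", "pdf", "html", "jpg", "png", "mp3", "mp4"]))
```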

4. the .htaccess file

👉 Reminder: it goes at the root of the site

Add the following:

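RewriteEngine On
# Answer 403 Forbidden when the User-Agent matches one of these AI crawlers (case-insensitive)
RewriteCond %{HTTP_USER_AGENT} (anthropic-ai|Bytespider|CCBot|ChatGPT-User|FacebookBot|GPTBot|Omgilibot) [NC]
RewriteRule ^ - [F]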
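
To see which User-Agent strings this condition would catch, the same pattern can be tried out in Python. This is only a sketch: would_be_forbidden is a hypothetical helper, re.IGNORECASE stands in for the [NC] flag, and the sample User-Agent strings are illustrative.

```python
import re

# Same alternation as in the RewriteCond; IGNORECASE plays the role of [NC].
BLOCKED_AGENTS = re.compile(
    r"(anthropic-ai|Bytespider|CCBot|ChatGPT-User|FacebookBot|GPTBot|Omgilibot)",
    re.IGNORECASE,
)


def would_be_forbidden(user_agent):
    """Return True if the pattern matches anywhere in the User-Agent string."""
    return BLOCKED_AGENTS.search(user_agent) is not None


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; GPTBot/1.0)",
        "ccbot/2.0",
        "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0",
    ]
    for ua in samples:
        print(would_be_forbidden(ua), ua)
```

The pattern is unanchored, so it matches the bot name wherever it appears in the header, which is what we want since crawlers usually wrap their name in a longer string.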

5. IP addresses

By blocking these address ranges:

20.9.164.0/24
20.15.240.64/28
20.15.240.80/28
20.15.240.96/28
20.15.240.176/28
20.15.241.0/28
20.15.242.128/28
20.15.242.144/28
20.15.242.192/28
52.230.152.0/24
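
If you would rather filter in application code, or just want to test whether an address from your logs falls inside one of these ranges, here is a minimal Python sketch using the standard ipaddress module. The names BLOCKED_NETWORKS and is_blocked_ip are assumptions made for this example, and the list is only as current as the one above.

```python
import ipaddress

# The CIDR ranges listed above, parsed once at import time.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "20.9.164.0/24",
        "20.15.240.64/28",
        "20.15.240.80/28",
        "20.15.240.96/28",
        "20.15.240.176/28",
        "20.15.241.0/28",
        "20.15.242.128/28",
        "20.15.242.144/28",
        "20.15.242.192/28",
        "52.230.152.0/24",
    )
]


def is_blocked_ip(address):
    """Return True if `address` belongs to one of the blocked ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for addr in ("20.15.240.70", "52.230.152.10", "8.8.8.8"):
        print(addr, is_blocked_ip(addr))
```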

Via https://sebsauvage.net/links/
