Protect Your Sites from AI Bots

As artificial intelligence continues to reshape the digital landscape, web administrators need to be aware of a new source of automated traffic: AI agents.

In particular, OpenAI’s latest AI agent—known as Operator—can autonomously browse websites to perform tasks for its users. While these agents are designed to improve productivity by automating everyday tasks such as filling out forms, ordering groceries, or filing expense reports, they can also generate unplanned traffic on your site. In this article, we explain what OpenAI Operator is, how it might impact your website traffic, and what technical measures you can implement to block such unwanted requests.

What Is OpenAI Operator?

OpenAI Operator is one of the company’s first AI agents—a research preview tool currently available to ChatGPT Pro users in the United States. Powered by a model called the Computer-Using Agent (CUA), Operator leverages the vision and reasoning capabilities of GPT-4o to interact with websites as if it were a human user. It can click links, scroll pages, fill out forms, and complete multi-step tasks autonomously.

Potential for Unintended Automated Traffic

While Operator’s primary purpose is to help users automate routine tasks, its ability to navigate and interact with the web means that it can also generate automated requests to websites. For site owners, this can translate into:
  • Increased Crawl Rates: Operator might visit your pages repeatedly or in an unplanned pattern, which could overload your server resources.
  • Scraping and Data Extraction: The agent could inadvertently collect data from your site, possibly violating your content usage policies.
  • Unwanted Interactions: Automated interactions might skew your site’s analytics, affecting metrics such as bounce rate and user engagement.
Since Operator (and similar AI agents) operates by simulating human behavior, its requests may not be easily distinguishable from those of genuine users—unless you implement specific measures to detect and block them.
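Before you decide how aggressively to block, it helps to measure how much of your traffic already comes from identifiable AI agents. Here is a minimal sketch in Python (assuming an access log in the standard combined format; the path is a placeholder to adjust for your server) that counts requests containing the bot tokens discussed later in this article:

import re
from collections import Counter

# Placeholder path; adjust to your server's access log location.
LOG_PATH = "/var/log/nginx/access.log"

# Tokens that OpenAI's agents include in their User-Agent strings.
BOT_TOKENS = ["OAI-SearchBot", "ChatGPT-User", "GPTBot"]

counts = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        for token in BOT_TOKENS:
            # Case-insensitive substring match, mirroring the server
            # rules shown later in this article.
            if re.search(re.escape(token), line, re.IGNORECASE):
                counts[token] += 1

for token, count in counts.most_common():
    print(f"{token}: {count} requests")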

Using robots.txt

As a first line of defense, you can instruct compliant bots not to crawl your site by adding directives to your robots.txt file. Although robots.txt is voluntary and won’t stop malicious or non-compliant bots, it is a useful method for guiding well-behaved crawlers.
Create or update your robots.txt file with the following content:

User-agent: OAI-SearchBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: anthropic-ai
Disallow: /
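Once deployed, you can sanity-check the file with Python's standard-library robots.txt parser. This is only a verification sketch (example.com is a placeholder for your own domain), and keep in mind it confirms what the directives say, not whether a given bot will honor them:

from urllib.robotparser import RobotFileParser

# Placeholder URL; point this at your own site's robots.txt.
parser = RobotFileParser("https://example.com/robots.txt")
parser.read()

for agent in ("OAI-SearchBot", "ChatGPT-User", "GPTBot", "anthropic-ai"):
    allowed = parser.can_fetch(agent, "https://example.com/")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")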

How to Block AI Bot Traffic

Web administrators have several options to block requests coming from AI agents like Operator. A common strategy is to examine the User-Agent header in incoming HTTP requests. OpenAI’s agents tend to include distinct substrings in their User-Agent strings—for example:
  • OAI-SearchBot
  • ChatGPT-User
  • GPTBot

Because the version numbers appended to these strings may change over time, it is safest to match on the stable tokens themselves and block any request whose User-Agent contains one of them. Below are examples for several popular web server platforms:

Nginx

Add the following snippet inside your server block in your Nginx configuration file. The regular expression is case‑insensitive and matches any User-Agent that contains one of the target tokens:
if ($http_user_agent ~* "(OAI-SearchBot|ChatGPT-User|GPTBot)") {
    return 403;
}

Optional: to explicitly allow for trailing version numbers (for example, GPTBot/1.0), you can use:

if ($http_user_agent ~* "(OAI-SearchBot(?:\/[\d\.]+)?|ChatGPT-User(?:\/[\d\.]+)?|GPTBot(?:\/[\d\.]+)?)") {
    return 403;
}
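If you prefer to keep regex matching out of your server block, a common variant is a map declared at the http level; the variable name $is_ai_bot below is just an illustrative choice:

# In the http block: flag matching User-Agents (case-insensitive regex).
map $http_user_agent $is_ai_bot {
    default                                  0;
    "~*(OAI-SearchBot|ChatGPT-User|GPTBot)"  1;
}

Then, in the server block:

# Reject flagged requests.
if ($is_ai_bot) {
    return 403;
}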

Apache (.htaccess)

You have two common approaches with Apache—using environment variables or mod_rewrite.
Option 1: Using SetEnvIfNoCase
Place these lines in your site’s .htaccess file:
SetEnvIfNoCase User-Agent "OAI-SearchBot" bad_bot
SetEnvIfNoCase User-Agent "ChatGPT-User" bad_bot
SetEnvIfNoCase User-Agent "GPTBot" bad_bot

Order Allow,Deny
Allow from all
Deny from env=bad_bot
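Note that Order, Allow, and Deny are Apache 2.2 directives; on Apache 2.4 they only work through the mod_access_compat module. If you are on 2.4 or later, the equivalent (a sketch reusing the same bad_bot environment variable) is:

SetEnvIfNoCase User-Agent "OAI-SearchBot" bad_bot
SetEnvIfNoCase User-Agent "ChatGPT-User" bad_bot
SetEnvIfNoCase User-Agent "GPTBot" bad_bot

<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>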
Option 2: Using mod_rewrite
Alternatively, add the following rewrite rules:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (OAI-SearchBot|ChatGPT-User|GPTBot) [NC]
RewriteRule .* - [F,L]

The [NC] flag ensures the match is case‑insensitive.

IIS

For IIS, you can use the URL Rewrite module, configured in your web.config file. Insert this rule inside the <rules> section:
<rule name="Block OpenAI Bots" stopProcessing="true">
  <match url=".*" />
  <conditions>
    <add input="{HTTP_USER_AGENT}" pattern="(OAI-SearchBot|ChatGPT-User|GPTBot)" />
  </conditions>
  <action type="CustomResponse" statusCode="403" statusReason="Forbidden" statusDescription="Access Denied" />
</rule>
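For context, here is a minimal web.config skeleton showing where the rule lives. This assumes the URL Rewrite module is installed (it ships separately from IIS):

<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <!-- Insert the "Block OpenAI Bots" rule shown above here. -->
      </rules>
    </rewrite>
  </system.webServer>
</configuration>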

Caddy (v2)

In Caddy, define a named matcher that uses a regular expression on the User-Agent header. For example, in your Caddyfile:
@openaiBots {
    header_regexp User-Agent (?i)(OAI-SearchBot|ChatGPT-User|GPTBot)
}

handle @openaiBots {
    respond 403
}
The (?i) flag ensures the regular expression is case‑insensitive.
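For reference, here is how the matcher fits into a complete site block (example.com is a placeholder). Caddy also accepts the matcher directly on the respond directive, which is slightly shorter than a handle block:

example.com {
    @openaiBots {
        header_regexp User-Agent (?i)(OAI-SearchBot|ChatGPT-User|GPTBot)
    }
    # Shorthand: return 403 for matched requests.
    respond @openaiBots 403

    # Normal site configuration continues below.
    file_server
}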

Final Thoughts

OpenAI’s Operator is an exciting development in AI automation—capable of performing tasks on behalf of its users without constant human oversight. However, if you run a website, you need to be aware that such AI agents may generate automated traffic that could have unintended consequences. Whether you’re concerned about server load, data scraping, or skewed analytics, the techniques outlined above for Nginx, Apache, IIS, and Caddy provide robust options for blocking requests from these agents.
By proactively monitoring and managing your site’s access controls, you can protect your site from unanticipated bot traffic while still welcoming genuine users. Stay vigilant, and consider these configurations as part of your broader web security strategy.