Agentic traffic is exploding and robots.txt is a suggestion. Agentscan classifies every request as human, verified known bot, AI agent or malicious automation by fusing IP origin with headless tells, JA4 fingerprints and a verified allowlist, so you can allow the crawlers you want and block the ones you do not.
request
{
"ip": "198.51.100.7",
"user_agent": "Mozilla/5.0 HeadlessChrome/120",
"ja4": "t13d1516h2_8daaf6152771_...",
"headless_flags": { "webdriver": true },
"headers": { "Accept": "*/*" }
}
response
{
"class": "malicious_automation",
"confidence": 0.9,
"action": "block",
"signals": { "network_origin": "datacenter", "headless": true }
}
AI crawlers and headless scrapers now make up a large share of traffic. Some you want, such as the crawlers that get your content into the right answer engines; some you do not, such as training scrapers and credential-stuffers. A single blocklist cannot tell them apart, and a blanket wall hurts SEO when it catches Googlebot.
Agentscan fuses the shared IP engine with request-level fingerprints and a verified allowlist.
IP origin: starts from the engine verdict of datacenter, VPN, proxy or clean residential, because a masked origin changes everything.
Headless tells: detects HeadlessChrome, Playwright, Puppeteer, Selenium and scripted clients from the User-Agent plus client-side signals like webdriver.
JA4 fingerprint: a TLS client fingerprint that survives a spoofed User-Agent, separating real browsers from impostors.
Header set: scores how browser-like the request headers are, since real browsers send Accept, Accept-Language and Accept-Encoding together.
Verified allowlist: Googlebot, Bingbot and friends are confirmed by forward-confirmed reverse DNS rather than by a spoofable User-Agent string.
AI agents: knows GPTBot, ClaudeBot, PerplexityBot, Google-Extended and more, so you can allow or block them by name.
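The forward-confirmed reverse DNS check mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not Agentscan's implementation: the function names, the `ALLOWED_SUFFIXES` table and its hostname suffixes are assumptions, and the suffixes should be checked against each crawler operator's own documentation before use.

```python
import ipaddress
import socket

# Assumed hostname suffixes per crawler (hypothetical table for illustration).
ALLOWED_SUFFIXES = {
    "googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
}


def reverse_lookup(ip):
    """PTR lookup: return the hostname for an IP, or None on failure."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(hostname):
    """A/AAAA lookup: return the set of addresses the hostname resolves to."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except OSError:
        return set()
    return {info[4][0] for info in infos}


def is_verified_bot(ip, bot, reverse=reverse_lookup, forward=forward_lookup):
    """True only if IP -> hostname -> IP round-trips inside an allowed domain."""
    suffixes = ALLOWED_SUFFIXES.get(bot)
    if not suffixes:
        return False
    hostname = reverse(ip)
    if not hostname:
        return False
    hostname = hostname.rstrip(".").lower()
    # Step 1: the PTR name must sit under a domain the operator controls.
    if not hostname.endswith(suffixes):
        return False
    # Step 2: that name must resolve back to the very IP that connected.
    target = ipaddress.ip_address(ip)
    for addr in forward(hostname):
        try:
            if ipaddress.ip_address(addr) == target:
                return True
        except ValueError:
            continue  # skip addresses that do not parse (e.g. scoped IPv6)
    return False
```

The resolvers are passed in as parameters so the round trip can be exercised with fake lookups; the User-Agent never enters the decision, which is what makes the check hard to spoof.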
Every request lands in exactly one class with a confidence and a recommended action.
Human: a real person in a real browser, so the request is allowed.
Verified known bot: a verified good bot such as Googlebot or Bingbot that is allowed and never accidentally blocked.
AI agent: an identified AI fetcher such as GPTBot or ClaudeBot that you flag and decide on per policy.
Malicious automation: headless automation with no good identity, often from a masked origin, so the request is blocked.
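One way to picture how a request ends up in exactly one class is a fixed order of checks. The sketch below is an assumption-laden illustration in Python, not the actual scoring logic: the class and action strings other than `malicious_automation` and `block`, the rule order, the header threshold and the function names are all hypothetical, and a confidence value is left out because the source does not say how it is computed.

```python
import re

# User-Agent markers for the automation tools named above.
HEADLESS_UA = re.compile(r"HeadlessChrome|Playwright|Puppeteer|Selenium", re.I)
# AI fetchers matched by User-Agent substring (illustrative subset).
AI_AGENTS = ("GPTBot", "ClaudeBot", "PerplexityBot")
# Headers real browsers send together.
BROWSER_HEADERS = ("accept", "accept-language", "accept-encoding")
MASKED_ORIGINS = {"datacenter", "vpn", "proxy"}


def header_score(headers):
    """Fraction of the three browser headers present (case-insensitive)."""
    present = {name.lower() for name in headers}
    return sum(h in present for h in BROWSER_HEADERS) / len(BROWSER_HEADERS)


def classify(req, network_origin, verified_bot=False):
    """Return one class and action; the first matching rule wins."""
    ua = req.get("user_agent", "")
    flags = req.get("headless_flags") or {}
    headless = bool(flags.get("webdriver")) or bool(HEADLESS_UA.search(ua))
    masked = network_origin in MASKED_ORIGINS

    if verified_bot:  # e.g. confirmed by forward-confirmed reverse DNS
        return {"class": "verified_known_bot", "action": "allow"}
    if any(name.lower() in ua.lower() for name in AI_AGENTS):
        return {"class": "ai_agent", "action": "flag"}
    # Assumed rule: headless tells, or a masked origin with an incomplete
    # browser header set, count as automation with no good identity.
    if headless or (masked and header_score(req.get("headers") or {}) < 1.0):
        return {"class": "malicious_automation", "action": "block"}
    return {"class": "human", "action": "allow"}
```

Because the checks run in a fixed order, a verified crawler is never overridden by a later rule. A production system would also want to confirm AI agents by more than their User-Agent, since that string can be spoofed.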
Call it from the edge with signals collected by the snippet. Verdicts are Redis-cached for low latency.
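To illustrate why cached verdicts keep latency low, here is a small Python sketch of a time-limited verdict cache. It is a stand-in only: an in-process dictionary replaces Redis, and the class name, the cache key (IP, User-Agent and JA4) and the 60-second default lifetime are assumptions rather than documented behaviour.

```python
import hashlib
import json
import time


class VerdictCache:
    """In-memory stand-in for a Redis verdict cache (hypothetical design)."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (expires_at, verdict)

    @staticmethod
    def key(req):
        # Assumed key: the fields that identify a client across requests.
        material = json.dumps([req.get("ip"), req.get("user_agent"), req.get("ja4")])
        return hashlib.sha256(material.encode("utf-8")).hexdigest()

    def get_or_compute(self, req, compute):
        """Return a fresh cached verdict, or call compute(req) and store it."""
        k = self.key(req)
        now = self.clock()
        hit = self._store.get(k)
        if hit and hit[0] > now:
            return hit[1]
        verdict = compute(req)
        self._store[k] = (now + self.ttl, verdict)
        return verdict
```

Repeat requests from the same client within the lifetime skip the full classification and only pay for a key lookup.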
Allow the answer engines you want indexed and block training scrapers, without nuking Googlebot.
Stop headless automation hammering pricing, inventory and ticket endpoints.
Add friction to credential-stuffing and signup abuse driven by automation from masked IPs.
Decide per crawler whether to monetise, allow or deny GPTBot, ClaudeBot, Perplexity and others by name.
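A per-crawler policy of this kind can be pictured as a small lookup table keyed by crawler name. The Python sketch below is illustrative only: the `POLICY` entries, the `flag` default and the choice of HTTP 402 for the monetise case are assumptions, not Agentscan settings.

```python
# Hypothetical policy table: crawler name -> what to do with its requests.
POLICY = {
    "GPTBot": "deny",
    "ClaudeBot": "allow",
    "PerplexityBot": "monetise",
}

# Assumed mapping from policy action to HTTP status at the edge.
STATUS = {"allow": 200, "deny": 403, "monetise": 402}


def decide(user_agent, policy=POLICY, default="flag"):
    """Return (crawler_name, action) for the first policy name in the UA."""
    ua = user_agent.lower()
    for name, action in policy.items():
        if name.lower() in ua:
            return name, action
    return None, default


def to_status(action):
    """Translate an action into a status code; unknown actions pass through."""
    return STATUS.get(action, 200)
```

Matching on the User-Agent alone is only safe once the crawler's identity has been verified by other signals, since the string itself is easy to spoof.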