Pattern updates for better recognition (#276)
omrilotan authored Aug 19, 2024
1 parent 1ab8da3 commit 66cdef1
Showing 6 changed files with 19 additions and 5 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,9 @@
# Changelog

## [5.1.17](https://github.com/omrilotan/isbot/compare/v5.1.16...v5.1.17)

- [Pattern] Pattern updates for better recognition

## [5.1.16](https://github.com/omrilotan/isbot/compare/v5.1.15...v5.1.16)

- [Pattern] Treat CCleaner browser as an actual browser, not a bot
2 changes: 2 additions & 0 deletions README.md
@@ -138,6 +138,8 @@ Recognising good bots such as web crawlers is useful for multiple purposes. Alth

`isbot` is an asset when it can most accurately identify bots by the user agent string. It uses expansive and regularly updated lists of user agent strings to create a regular expression that matches bots and only bots.

And above everything else, it is maintained by a community of contributors who help keep the list up to date.

### Fallback

The pattern uses lookbehind methods which are not supported in all environments. A fallback is provided for environments that do not support lookbehind. The fallback is less accurate. The test suite includes a percentage of false positives and false negatives which is deemed acceptable for the fallback: 1% false positive and 75% bot coverage.
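
The fallback described above can be sketched as a try/catch around pattern compilation. This is a minimal illustration, not necessarily how `isbot` implements it: `compileWithFallback` is a hypothetical helper, the lookbehind pattern `(?<![hg]m)score` is borrowed from `src/patterns.json`, and the fallback `/score/i` is made up for the example.

```typescript
/**
 * Hypothetical helper: compile `source`, and return `fallback` if the
 * engine rejects the syntax (for example, when lookbehind is unsupported).
 */
function compileWithFallback(source: string, fallback: RegExp): RegExp {
  try {
    // Building the RegExp at runtime turns unsupported syntax into a
    // catchable exception instead of a parse error for the whole script.
    return new RegExp(source, "i");
  } catch (error) {
    return fallback;
  }
}

// Made-up fallback for this example only.
const simpleFallback = /score/i;
const withLookbehind = compileWithFallback("(?<![hg]m)score", simpleFallback);

// With lookbehind support, "hmscore" is excluded while "score" matches.
// Without it, `withLookbehind` is `simpleFallback` and both strings match.
console.log(withLookbehind.test("score"), withLookbehind.test("hmscore"));
```

Compiling from a string at runtime is what makes the fallback possible: a regular expression literal with unsupported syntax would fail when the file is parsed, before any `catch` could run.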
5 changes: 5 additions & 0 deletions fixtures/browsers.yml
@@ -91,6 +91,9 @@ Brave:
- Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Brave Chrome/76.0.3809.132 Safari/537.36
Camino:
- Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en; rv:1.9.2.28) Gecko/20120308 Camino/2.1.2 (like Firefox/3.6.28)
CamScanner:
- Mozilla/5.0 (Linux; Android 13; 2201116SI Build/TKQ1.221114.001; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/125.0.6422.165 Mobile Safari/537.36 CamScanner/4.5.0.2305051722
- Mozilla/5.0 (Linux; Android 14; SM-S918B Build/UP1A.231005.007; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/124.0.6367.123 Mobile Safari/537.36 CamScanner/4.8.5.2310271827
CCleaner:
- Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 CCleaner/122.0.0.0
- Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36 CCleaner/125.0.0.0
@@ -579,6 +582,8 @@ Snapchat:
- Mozilla/5.0 (iPhone; CPU iPhone OS 13_1_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Snapchat/10.69.5.72 (iPhone11,6; iOS 13.1.3; gzip)
- Mozilla/5.0 (iPhone; CPU iPhone OS 13_3_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.5 Mobile/15E148 Snapchat/10.77.5.59 (like Safari/604.1)
- Mozilla/5.0 (iPhone; CPU iPhone OS 14_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Mobile/15E148 Snapchat/11.36.0.36 (like Safari/604.1)
- Mozilla/5.0 (Linux; Android 11; RMX2001 Build/RP1A.200720.011; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/124.0.6367.179 Mobile Safari/537.36 Snapchat/12.87.0.44 (RMX2001; Android 11#1647528410731#30; gzip; )
- Mozilla/5.0 (Linux; Android 13; I2011 Build/TP1A.220624.014; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/125.0.6422.54 Mobile Safari/537.36 Snapchat/12.89.0.40 (I2011; Android 13#eng.compil.20240430.095616#33; gzip; )
Snowshoe:
- Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.21 (KHTML, like Gecko) Snowshoe/1.0.0 Safari/537.21
Sogou Explorer:
2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "isbot",
-"version": "5.1.16",
+"version": "5.1.17",
"description": "🤖/👨‍🦰 Recognise bots/crawlers/spiders using the user agent string.",
"keywords": [
"bot",
2 changes: 1 addition & 1 deletion src/index.ts
@@ -4,7 +4,7 @@ import { fullPattern } from "./pattern";
/**
* Naive bot pattern.
*/
-const naivePattern = /bot|spider|crawl|http|lighthouse/i;
+const naivePattern = /bot|crawl|http|lighthouse|scan|search|spider/i;

let pattern: RegExp;
export function getPattern(): RegExp {
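
To see what the wider naive pattern changes in practice, the sketch below runs both versions against a few user agent strings. The two patterns are copied from the change above; the sample strings are invented for illustration and do not come from the fixtures.

```typescript
// Both patterns are copied from the src/index.ts change in this commit.
const previousNaivePattern = /bot|spider|crawl|http|lighthouse/i;
const updatedNaivePattern = /bot|crawl|http|lighthouse|scan|search|spider/i;

// Hypothetical user agent strings, invented for this example.
const samples: string[] = [
  "ExampleBot/1.0",
  "ExampleSearchAgent/1.0",
  "example-scanner/2.3",
];

interface Comparison {
  userAgent: string;
  previous: boolean;
  updated: boolean;
}

// Test every sample against both versions of the pattern.
const comparison: Comparison[] = samples.map((userAgent) => ({
  userAgent,
  previous: previousNaivePattern.test(userAgent),
  updated: updatedNaivePattern.test(userAgent),
}));

for (const row of comparison) {
  console.log(`${row.userAgent}: previous=${row.previous}, updated=${row.updated}`);
}
```

The added `scan` alternative would also match a browser string containing `CamScanner`, so the naive pattern trades precision for coverage.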
9 changes: 6 additions & 3 deletions src/patterns.json
@@ -9,8 +9,8 @@
"(?<![hg]m)score",
"@[a-z][\\w-]+\\.",
"\\(\\)",
-"\\.com",
-"\\b\\d{13}\\b",
+"\\.com\\b",
+"\\btime/",
"^<",
"^[\\w \\.\\-\\(?:\\):]+(?:/v?\\d+(?:\\.\\d+)?(?:\\.\\d{1,10})*?)?(?:,|$)",
"^[^ ]{50,}$",
@@ -44,6 +44,7 @@
"^jigsaw",
"^microsoft bits",
"^movabletype",
"^mozilla/5\\.0\\s[a-z\\.-]+$",
"^mozilla/\\d\\.\\d \\(compatible;?\\)$",
"^mozilla/\\d\\.\\d \\w*$",
"^navermailapp",
@@ -78,6 +79,7 @@
"^zdm/\\d",
"^zoom marketplace/",
"^{{.*}}$",
"adscanner/",
"analyzer",
"archive",
"ask jeeves/teoma",
@@ -120,6 +122,7 @@
"iplabel",
"ips-agent",
"java(?!;)",
"jsjcw_scanner",
"library",
"linkcheck",
"mail\\.ru/",
@@ -146,7 +149,7 @@
"rexx;",
"rigor",
"rss\\b",
-"scan",
+"scanner\\.",
"scrape",
"server",
"sogou",
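
The effect of the narrowed entries can be checked against the fixtures added in this commit. The sketch below tests individual entries in isolation, not the full combined expression that `isbot` builds, using three user agent strings copied from the `fixtures/browsers.yml` changes above. `matches` is a hypothetical helper introduced for the example.

```typescript
// User agent strings copied from the fixtures/browsers.yml changes in this commit.
const camScanner =
  "Mozilla/5.0 (Linux; Android 13; 2201116SI Build/TKQ1.221114.001; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/125.0.6422.165 Mobile Safari/537.36 CamScanner/4.5.0.2305051722";
const snapchatAndroid11 =
  "Mozilla/5.0 (Linux; Android 11; RMX2001 Build/RP1A.200720.011; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/124.0.6367.179 Mobile Safari/537.36 Snapchat/12.87.0.44 (RMX2001; Android 11#1647528410731#30; gzip; )";
const snapchatAndroid13 =
  "Mozilla/5.0 (Linux; Android 13; I2011 Build/TP1A.220624.014; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/125.0.6422.54 Mobile Safari/537.36 Snapchat/12.89.0.40 (I2011; Android 13#eng.compil.20240430.095616#33; gzip; )";

// Hypothetical helper: test one pattern entry, case-insensitively, on its own.
const matches = (source: string, userAgent: string): boolean =>
  new RegExp(source, "i").test(userAgent);

const results = {
  // "scan" hits the "Scan" in "CamScanner"; "scanner\." needs a literal dot after it.
  scanOnCamScanner: matches("scan", camScanner),
  scannerDotOnCamScanner: matches("scanner\\.", camScanner),
  // The 13-digit entry hits the build identifier "1647528410731".
  thirteenDigitsOnSnapchat: matches("\\b\\d{13}\\b", snapchatAndroid11),
  // "\.com" hits ".compil"; the trailing word boundary rules that out.
  dotComOnSnapchat: matches("\\.com", snapchatAndroid13),
  dotComBoundaryOnSnapchat: matches("\\.com\\b", snapchatAndroid13),
};

console.log(results);
```

Run on their own, the broader entries (`scan`, `\.com`, and the 13-digit entry) each match one of the new browser fixtures, while the narrower `scanner\.` and `\.com\b` do not.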
