[Referrer-Domain] Microsoft Search Engine Spiders are blocked! #534

Open
HKPhysicist opened this issue Sep 30, 2023 · 11 comments
@HKPhysicist

HKPhysicist commented Sep 30, 2023

My sites have joined MS Clarity, and the MS search spiders have begun crawling them frequently. The nginxrepeatoffender jail has started banning their IPs.
Their hostnames generally follow this pattern:
msnbot-xxx-xxx-xxx-xxx.search.msn.com

How do I whitelist them? Here xxx-xxx-xxx-xxx stands for an IPv4 address written with dashes instead of dots.
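
One way to decide whether an IP really belongs to the MS search spiders is a reverse DNS lookup followed by a forward confirmation. The Python sketch below is only an illustration under assumptions: the function name `is_verified_bingbot` and the injectable `reverse` and `forward` resolvers are hypothetical, and the only hostname suffix checked is the `.search.msn.com` pattern quoted above.

```python
import socket
from typing import Callable, List

# Suffix taken from the hostname pattern quoted in this issue.
BING_SUFFIX = ".search.msn.com"


def _reverse(ip: str) -> str:
    """Default reverse lookup: return the primary hostname for an IP."""
    return socket.gethostbyaddr(ip)[0]


def _forward(host: str) -> List[str]:
    """Default forward lookup: return the IPv4 addresses for a hostname."""
    return socket.gethostbyname_ex(host)[2]


def is_verified_bingbot(
    ip: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], List[str]] = _forward,
) -> bool:
    """Return True only if the IP reverse-resolves to *.search.msn.com
    and that hostname resolves back to the same IP."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    if not host.endswith(BING_SUFFIX):
        return False
    try:
        return ip in forward(host)
    except OSError:
        return False
```

The resolvers are parameters so the check can be tried without network access. Whether and how such a check is wired into fail2ban (for example through its `ignorecommand` option) depends on the local setup and is not shown here.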

@mitchellkrogza
Owner

Please post some log line examples

@HKPhysicist
Author

Hello,
Here are some IPs from .search.msn.com that I saw today in the fail2ban log files. They are not the same every time.

2024-03-30 21:48:43,595 fail2ban.filter [743]: INFO [nginxrepeatoffender] Found 40.77.167.41 - 2024-03-30 21:48:43
2024-03-30 21:53:24,688 fail2ban.filter [743]: INFO [nginxrepeatoffender] Found 52.167.144.20 - 2024-03-30 21:53:24
2024-03-30 23:12:03,360 fail2ban.filter [743]: INFO [nginxrepeatoffender] Found 52.167.144.20 - 2024-03-30 23:12:02
2024-03-30 23:12:51,766 fail2ban.filter [743]: INFO [nginxrepeatoffender] Found 52.167.144.20 - 2024-03-30 23:12:51
2024-03-30 23:12:52,299 fail2ban.actions [743]: NOTICE [nginxrepeatoffender] Ban 52.167.144.20
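
To see which addresses are being counted and banned, log lines like the ones above can be summarised with a short script. This is a minimal sketch that assumes the line layout shown in this comment; the names `LINE_RE`, `SAMPLE`, and `summarize` are made up for illustration.

```python
import re
from collections import Counter

# Matches "[jail] Found <ip>" or "[jail] Ban <ip>" in a fail2ban log line.
LINE_RE = re.compile(
    r"\[(?P<jail>[^\]]+)\]\s+(?P<event>Found|Ban)\s+"
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3})"
)

# Lines copied from the log excerpt posted above.
SAMPLE = [
    "2024-03-30 21:48:43,595 fail2ban.filter [743]: INFO [nginxrepeatoffender] Found 40.77.167.41 - 2024-03-30 21:48:43",
    "2024-03-30 21:53:24,688 fail2ban.filter [743]: INFO [nginxrepeatoffender] Found 52.167.144.20 - 2024-03-30 21:53:24",
    "2024-03-30 23:12:03,360 fail2ban.filter [743]: INFO [nginxrepeatoffender] Found 52.167.144.20 - 2024-03-30 23:12:02",
    "2024-03-30 23:12:51,766 fail2ban.filter [743]: INFO [nginxrepeatoffender] Found 52.167.144.20 - 2024-03-30 23:12:51",
    "2024-03-30 23:12:52,299 fail2ban.actions [743]: NOTICE [nginxrepeatoffender] Ban 52.167.144.20",
]


def summarize(lines):
    """Return (hits per IP, set of banned IPs) for the given log lines."""
    found = Counter()
    banned = set()
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # skip lines that are not Found/Ban events
        if m.group("event") == "Found":
            found[m.group("ip")] += 1
        else:
            banned.add(m.group("ip"))
    return found, banned
```

Calling `summarize(SAMPLE)` gives the number of "Found" events per IP and the set of banned IPs, which makes it easier to spot which addresses to look up in the web access log.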

@HKPhysicist
Author

HKPhysicist commented Jul 19, 2024

Anyone may try this regex for MS Clarity and MS Search, but I am not sure it is 100% correct.

"~*(?:\b)msnbot-(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)-){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.search\.msn\.com(?:\b)" 0;

@HKPhysicist
Author

Another source gives this regex to check an IPv4 address:
^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$
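
Patterns like these are easy to get subtly wrong, so it may help to try them against sample strings before putting them in a config file. The sketch below uses Python's `re` module, which is not the regex engine nginx uses, so it is only a rough check. The names `OCTET`, `HOST_RE`, and `IPV4_RE` are hypothetical, and the test hostname is built from the `msnbot-xxx-xxx-xxx-xxx.search.msn.com` pattern in this issue rather than taken from a real lookup.

```python
import re

# One IPv4 octet, 0-255, using the same alternation as the regexes above.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

# Hostname form: four dash-separated octets, then the literal domain.
# The dots are escaped so they match only a real "." character.
HOST_RE = re.compile(
    rf"^msnbot-{OCTET}(?:-{OCTET}){{3}}\.search\.msn\.com$",
    re.IGNORECASE,
)

# Dotted IPv4 form, again with escaped dots.
IPV4_RE = re.compile(rf"^{OCTET}(?:\.{OCTET}){{3}}$")


def looks_like_msnbot_host(name: str) -> bool:
    """True if the name has the msnbot-a-b-c-d.search.msn.com shape."""
    return HOST_RE.match(name) is not None


def looks_like_ipv4(text: str) -> bool:
    """True if the text is a dotted IPv4 address with octets 0-255."""
    return IPV4_RE.match(text) is not None
```

An unescaped `.` matches any character, so a pattern without the backslashes would also accept strings such as `40x77x167x41`. Matching the hostname shape alone does not prove that a request came from Microsoft.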

@HKPhysicist
Author

MS Search IPs are still banned today!!!

@mitchellkrogza
Owner

Hello, Here are some IPs from .search.msn.com which I saw today on fail2ban log files. They are not the same every time.

2024-03-30 21:48:43,595 fail2ban.filter [743]: INFO [nginxrepeatoffender] Found 40.77.167.41 - 2024-03-30 21:48:43
2024-03-30 21:53:24,688 fail2ban.filter [743]: INFO [nginxrepeatoffender] Found 52.167.144.20 - 2024-03-30 21:53:24
2024-03-30 23:12:03,360 fail2ban.filter [743]: INFO [nginxrepeatoffender] Found 52.167.144.20 - 2024-03-30 23:12:02
2024-03-30 23:12:51,766 fail2ban.filter [743]: INFO [nginxrepeatoffender] Found 52.167.144.20 - 2024-03-30 23:12:51
2024-03-30 23:12:52,299 fail2ban.actions [743]: NOTICE [nginxrepeatoffender] Ban 52.167.144.20

I need to see your web access log, not the fail2ban log.

@HKPhysicist
Author

HKPhysicist commented Sep 16, 2024

I need to see your web access log not the fail2ban

Hello, 2 more records from today.
40.77.167.1 - - [16/Sep/2024:19:59:12 +0800] "GET /search?c=37&l=35070 HTTP/2.0" 200 15725 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36"

40.77.167.50 - - [16/Sep/2024:19:59:41 +0800] "GET /search?l=20386&distance=300&c=1 HTTP/2.0" 200 16164 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36"

@mitchellkrogza
Owner

mitchellkrogza commented Sep 16, 2024

20 more new records containing bingbot from today's log.

40.77.167.50 - - [16/Sep/2024:19:57:11 +0800] "GET /search?c=30&l=11890 HTTP/2.0" 200 15228 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36"
40.77.167.1 - - [16/Sep/2024:19:59:12 +0800] "GET /search?c=37&l=35070 HTTP/2.0" 200 15725 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36"
40.77.167.50 - - [16/Sep/2024:19:59:41 +0800] "GET /search?l=20386&distance=300&c=1 HTTP/2.0" 200 16164 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36"
52.167.144.173 - - [16/Sep/2024:20:02:41 +0800] "GET /search?l=46789&c=30&sc=31 HTTP/2.0" 404 3328 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36"
40.77.167.1 - - [16/Sep/2024:20:05:04 +0800] "GET /search?c=37&l=42770 HTTP/2.0" 200 15688 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36"
52.167.144.173 - - [16/Sep/2024:20:05:21 +0800] "GET /search?orderBy=date&c=114&l=26071&sc=121 HTTP/2.0" 200 15263 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36"
207.46.13.54 - - [16/Sep/2024:20:06:05 +0800] "GET /search?c=37&l=20521 HTTP/2.0" 200 15633 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36"
52.167.144.173 - - [16/Sep/2024:20:06:32 +0800] "GET /search?c=54&l=2596&distance=0 HTTP/2.0" 200 15198 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36"
52.167.144.173 - - [16/Sep/2024:20:06:34 +0800] "GET /search?l=46688&c=30&sc=34 HTTP/2.0" 404 3330 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36"
207.46.13.54 - - [16/Sep/2024:20:06:45 +0800] "GET /search?l=43079&orderBy=rating&c=62 HTTP/2.0" 200 15245 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36"
52.167.144.198 - - [16/Sep/2024:20:07:16 +0800] "GET /search?c=97&sc=113&l=7835 HTTP/2.0" 200 15365 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36"
52.167.144.198 - - [16/Sep/2024:20:07:17 +0800] "GET /search?l=46717&distance=500&c=114 HTTP/2.0" 404 3326 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36"
52.167.144.173 - - [16/Sep/2024:20:07:22 +0800] "GET /search?l=8224&c=97&sc=109 HTTP/2.0" 200 15365 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36"
52.167.144.198 - - [16/Sep/2024:20:08:05 +0800] "GET /search?c=62&sc=67&l=1279 HTTP/2.0" 200 16202 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36"
52.167.144.198 - - [16/Sep/2024:20:08:08 +0800] "GET /search?l=43079&distance=0&c=14 HTTP/2.0" 200 15993 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36"
207.46.13.54 - - [16/Sep/2024:20:08:10 +0800] "GET /search?c=73&l=2596&distance=500 HTTP/2.0" 200 15784 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36"
52.167.144.173 - - [16/Sep/2024:20:08:32 +0800] "GET /search?orderBy=date&c=114&l=26151&sc=121 HTTP/2.0" 200 15257 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36"
52.167.144.173 - - [16/Sep/2024:20:10:52 +0800] "GET /search?l=3889&c=97&sc=112 HTTP/2.0" 200 15380 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36"
52.167.144.173 - - [16/Sep/2024:20:10:53 +0800] "GET /search?c=62&l=46722 HTTP/2.0" 404 3329 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36"
52.167.144.198 - - [16/Sep/2024:20:11:07 +0800] "GET /search?l=6231&c=97&sc=109 HTTP/2.0" 200 15365 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36"

Bing Webmaster Tools

According to this log, most of the bingbot requests are getting a 200 OK and some are getting a 404 Not Found, so they are not being blocked. A blocked request would show a 444 status (the default).
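
The status check described above can be sketched as a small script that tallies the response codes served to requests whose user agent contains `bingbot`. It assumes the combined log layout seen in this thread; `ACCESS_RE`, `bingbot_status_counts`, and `blocked_requests` are hypothetical names, and the sample lines are copied from access logs posted in this thread.

```python
import re
from collections import Counter

# Combined log format: ip, two ident fields, [time], "request",
# status, bytes, "referer", "user agent".
ACCESS_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

UA = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
      "bingbot/2.0; +http://www.bing.com/bingbot.htm) "
      "Chrome/116.0.1938.76 Safari/537.36")

# Sample lines copied from access logs posted in this thread.
SAMPLE = [
    f'40.77.167.1 - - [16/Sep/2024:19:59:12 +0800] "GET /search?c=37&l=35070 HTTP/2.0" 200 15725 "-" "{UA}"',
    f'52.167.144.173 - - [16/Sep/2024:20:02:41 +0800] "GET /search?l=46789&c=30&sc=31 HTTP/2.0" 404 3328 "-" "{UA}"',
    f'40.77.167.55 - - [05/Oct/2024:00:34:13 +0800] "GET /globe-valves-dealers-in-kolkata/ HTTP/2.0" 444 0 "-" "{UA}"',
]


def _bingbot_matches(lines):
    """Yield regex matches for lines whose user agent mentions bingbot."""
    for line in lines:
        m = ACCESS_RE.match(line)
        if m and "bingbot" in m.group("agent").lower():
            yield m


def bingbot_status_counts(lines):
    """Count HTTP status codes returned to bingbot requests."""
    return Counter(m.group("status") for m in _bingbot_matches(lines))


def blocked_requests(lines, blocked_status="444"):
    """Return (ip, request) pairs for bingbot requests that got 444."""
    return [
        (m.group("ip"), m.group("request"))
        for m in _bingbot_matches(lines)
        if m.group("status") == blocked_status
    ]
```

Only the entries returned by `blocked_requests` are worth reporting here, since 200 and 404 responses mean the request reached the site normally.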

@HKPhysicist
Author

According to this log, most of the bingbot requests are getting a 200 OK and some are getting a 404 Not Found, so they are not being blocked. A blocked request would show a 444 status (the default).

I see. Thanks for your reply.
I will report again when a bingbot request gets a 444 status code.

@HKPhysicist
Author

New 444 responses to MS Search requests today. Please advise.
40.77.167.55 - - [05/Oct/2024:00:34:13 +0800] "GET /globe-valves-dealers-in-kolkata/ HTTP/2.0" 444 0 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36"
40.77.167.55 - - [05/Oct/2024:01:24:58 +0800] "GET /search?l=547&orderBy=distance&c=97 HTTP/2.0" 444 0 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36"

@HKPhysicist
Author

This software is very useful for webmasters.

I recommend it to all webmasters. Everybody should donate to keep it running. ^^
