Over the past few years, bad bots have become a significant problem for many server administrators and website operators. These bad bots will often relentlessly target servers, performing thousands of requests and retrieving huge amounts of data in a short period of time.
These activities can cause a spike in server resource usage, which affects the performance of a server and makes it much slower for normal visitors. In some cases, high load from aggressive bots can make a server less stable, causing websites to crash or become unresponsive.
Fortunately, there are several approaches available for blocking bad bots, crawlers, and scrapers. In this post, I am going to explain what aggressive bots, scrapers, and crawlers are, and the risks they pose. Then, I’ll share the best techniques for dealing with aggressive Chinese bots.
Internet bots are software applications that perform automated tasks over the Internet. The two most common types of bots operating online are crawlers and scrapers.
Crawlers visit websites to read and assess content, including XML sitemaps, images, links, and HTML documents. Crawling is mostly performed by search engines to index the content on websites. Although crawling isn’t usually performed with malicious intent, overly active crawlers can use a substantial amount of server resources.
Scrapers are programmed to extract data from websites. They are often quite sophisticated, using AI-like techniques to complete web forms and access the information they require. In many cases, scrapers use websites in unintended ways, exploiting the services that are provided to normal users.
Aggressive bots can perform hundreds of requests per second, eating up valuable server resources including RAM, CPU, and hard drive space. This can dramatically affect server response time and performance.
A server inundated with aggressive bots may experience:
On top of these types of technical problems, aggressive bots are often malicious and looking for ways to exploit server resources.
So, how do you know that you have a bot problem? There are several warning signs, including:
Certain strings often appear in the user agents of bad bots, which is why the most effective way of blocking them is to blacklist those strings in the User-Agent header. A few examples would be:
As a matter of fact, we have recently observed four different bad bots aggressively crawling websites on our cPanel Server Management customer servers. We’ll use them as an example below, and we suggest keeping them on your list.
If you don’t know where to find your server logs, check the following paths depending on your main web server. Look for many requests with weird strings in the user-agent and block them.
For cPanel servers, you can find the Apache log files at the following path:
/etc/apache2/logs/domlogs
Apache Web Server Log Files
/var/log/apache2 or /var/log/httpd
Nginx Web Server Log Files
/var/log/nginx
You can watch a log file in real time by issuing the following command (adjust the path based on your install and domain name):
tail -f /var/log/apache2/example.com.access.log
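Watching the log scroll by only gets you so far. As a rough sketch, a small shell function like the one below can summarize which user agents are making the most requests. The function name top_user_agents is our own invention, and it assumes your log uses the common "combined" format, where the user agent is the last quoted field on each line.

```shell
# Hypothetical helper: list the most frequent user agents in an access log.
# Assumes the "combined" log format, in which the user agent is the sixth
# field when each line is split on double quotes.
#   $1 = path to the access log
#   $2 = how many user agents to show (defaults to 20)
top_user_agents() {
    awk -F'"' '{ print $6 }' "$1" \
        | sort \
        | uniq -c \
        | sort -rn \
        | head -n "${2:-20}"
}

# Example call (adjust the path to your own log file):
#   top_user_agents /var/log/apache2/example.com.access.log 10
```

Each output line is a request count followed by the user-agent string, so unusual agents with very high counts stand out at the top.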
Append the following lines to your .htaccess file:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (LieBaoFast|UCBrowser|MQQBrowser|Mb2345Browser) [NC]
RewriteRule .* - [F,L]
As soon as the lines above are placed in your .htaccess file, any request whose user agent contains “LieBaoFast”, “UCBrowser”, “MQQBrowser”, or “Mb2345Browser” will receive a 403 response from your web server.
To block bad user agents in Nginx, you will need to edit the nginx vhost file for the respective website and then reload nginx.
Edit /etc/nginx/sites-enabled/<yoursite>.conf and place the following configuration inside your server { } block.
if ($http_user_agent ~ (LieBaoFast|UCBrowser|MQQBrowser|Mb2345Browser) ) { return 403; }
Save your configuration file and exit the editor. The changes take effect once nginx is reloaded.
Before reloading, make sure your nginx configuration is valid by issuing the following command:
nginx -t
If no errors show up, you can reload your configuration using the following command:
service nginx reload
While this solution will stop some of the latest aggressive bad bots, you may come across more bad bots with unusual user agents. Check your server logs to discover additional user agents, then block or rate limit each one. You can also find more bad bots by visiting BotReports.com.
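As your list of bad user agents grows, it may be easier to keep the strings in a plain text file and generate the pattern from it. The sketch below is one possible way to do that. The function name build_ua_pattern and the file name bad-agents.txt are hypothetical, and the helper assumes one string per line with no regex special characters in it.

```shell
# Hypothetical helper: join a file of user-agent substrings (one per line)
# into a single alternation group such as (A|B|C).
#   $1 = path to the list file; blank lines and lines starting with # are skipped
build_ua_pattern() {
    # Drop blank and comment lines, then join the remaining lines with "|"
    printf '(%s)\n' "$(grep -Ev '^[[:space:]]*(#|$)' "$1" | paste -sd'|' -)"
}

# Example call:
#   build_ua_pattern bad-agents.txt
```

The resulting group can be pasted into the user-agent rule of your web server configuration in place of the hand-written one.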
If only a very small number of bots are aggressively targeting your server, you can block their IP addresses with a firewall rule. However, many people find that blocking a bad bot in this way can cause it to become more aggressive as it sends bots on additional IPs to quickly retrieve more data.
The large number of IPs used by bad bots is most likely because they operate from huge bot farms or from the malware-compromised computers of Internet users. The IP network of bad bots seems to be ever-expanding, so it can be difficult to keep up when manually blocking IPs.
To block a single IP address with CSF, replace IPHERE with the address and run:
csf -d IPHERE
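Before blocking anything, it helps to know which IPs are actually behind a given bad user agent. The following is a minimal sketch of how you might pull that out of an access log. The function name ips_for_user_agent is made up for this example, and it again assumes the "combined" log format, with the client IP as the first field and the user agent as the last quoted field.

```shell
# Hypothetical helper: count requests per client IP for user agents that
# match a pattern (case-insensitive). Assumes the "combined" log format.
#   $1 = extended regex to match against the user agent, e.g. 'LieBaoFast|UCBrowser'
#   $2 = path to the access log
ips_for_user_agent() {
    awk -F'"' -v pat="$1" '
        # $6 is the user agent; the client IP is the first word of $1
        tolower($6) ~ tolower(pat) { split($1, parts, " "); print parts[1] }
    ' "$2" | sort | uniq -c | sort -rn
}

# Example call (adjust the path to your own log file):
#   ips_for_user_agent 'LieBaoFast|UCBrowser|MQQBrowser|Mb2345Browser' /var/log/apache2/example.com.access.log
```

If only a handful of addresses show up, blocking them individually is practical. If the list runs to hundreds of IPs, blocking by user agent or by country is likely the better option.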
If you have observed that bad bots are coming from IPs belonging to a specific country, and you are certain that you are not expecting any legitimate traffic to your website from that country, you can block the entire country.
1. Edit the file /etc/csf/csf.conf
nano /etc/csf/csf.conf
2. For the country block to work, you’ll need to sign up for an account on maxmind.com (it’s free) and enter your license key in csf.conf:
MM_LICENSE_KEY = "YOUR LICENSE HERE"
3. Scroll a few lines down and look for the following option:
CC_DENY_PORTS = ""
4. To block a country, you’ll need to find the country’s code on maxmind.com and place it in the setting below. In our example, using CN blocks China.
CC_DENY_PORTS = "CN"
5. Next, we adjust a setting to block only web ports (80 and 443). That way your server can still communicate with email servers, DNS resolvers, etc.
CC_DENY_PORTS_TCP = "80,443"
6. After adjusting the setting, close and save the configuration file. Then restart CSF using the following command:
csf -r
Thanks for reading Bad Bots Blocking – Apache, Nginx & CSF. For more server admin hints and tips, subscribe to our blog.