Good bots, bad bots and the troublesome ones in between
Envoy's e-commerce director Branwell Moffat talks about how bots are becoming a threat to e-commerce retailers, and why online businesses shouldn't ignore it.
Over the past 18 months we at Envoy have seen a significant increase in the number of intelligent web bots that target specific e-commerce websites. At the very least these bots are an annoyance; at worst they can compromise your website or effectively mount a DDoS attack and take it down.
Web bots have been around for a very long time and we all benefit from many of them. There are good ones (like Googlebot or Bingbot) and there are bad ones that automatically attempt to hack a web application or inject spam into websites. The good ones are generally beneficial and the bad ones can often be dealt with by a solution such as a Web Application Firewall (WAF) that will recognise malicious requests and block them.
The problematic bots are often those that sit in between good and bad. These can be hard to detect as they will often impersonate a normal user and make requests that, taken in isolation, are perfectly safe and legitimate. Although their intention is normally something other than a DDoS attack, the effect can be the same when a bot is too aggressive or when too many instances of it hit a website at once.
These bots are used commercially for a number of reasons including:
- Automatic purchasing of products (aggressive purchase bots can cause severe performance issues during product launches)
- Aggregation of content (your content can be passed off as someone else's)
- Competitor price analysis (competitors can use this data to undercut you)
- Aggressive content crawling (aggressive crawlers can put strain on your web platform)
Here is a real-world example of a commercial bot that has caused a lot of issues:
We have a client who often sells limited edition exclusive products which are very sought after. These products can often fetch 3 times the RRP when sold on eBay and the retailer will only have a limited supply. Most of these products have a coordinated world-wide launch and therefore the exact time of the launch is well known. In the last 18 months we have increasingly seen many thousands of extremely aggressive bots attempt to purchase these products, to the extent that the performance of the e-commerce platform can be seriously compromised.
In this instance, the bots have been specifically designed for this retailer's website and know the exact requests that need to be made to add the product to the basket and go through the checkout. They don't even need to visit the product display page. They are normally distributed across multiple cloud servers with multiple instances of the bot installed on each server. Because the launch time is public and coordinated, the bots all start to attempt to add the product to the basket and go through the checkout at exactly the same time, normally many thousands at a time. The record we have seen is 3 million attempts to purchase a single product in a 12 hour period.
Because the requests are all legitimate and the bot is impersonating a real user, it can be hard to block the bots before they do the damage. There is no point in spending a minute counting how many requests a particular IP has made and only then blocking it if the count is over a certain threshold. By that point the damage has already been done and you have tens of thousands of bots in your checkout.
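To put rough numbers on that, here is a minimal back-of-the-envelope sketch. The function name and every figure in it (10,000 bots, 1 to 2 requests per second each, a 60 second window, a threshold of 100 requests per IP) are illustrative assumptions, not measurements from the launch described above.

```python
def admitted_before_block(bots, requests_per_bot_per_s, window_s, threshold):
    """Estimate what a windowed per-IP threshold lets through.

    Assumes each bot uses its own IP and that the check only acts once
    the counting window has closed. Returns a tuple of
    (requests that reached the application, IPs blocked afterwards).
    """
    per_ip_in_window = requests_per_bot_per_s * window_s
    # Nothing is blocked while the window is still being counted.
    admitted = bots * per_ip_in_window
    # An IP is only blocked if it exceeded the threshold in that window.
    blocked_ips = bots if per_ip_in_window > threshold else 0
    return admitted, blocked_ips


# Hypothetical launch: 10,000 bots, one IP each.
fast = admitted_before_block(10_000, 2, 60, 100)
slow = admitted_before_block(10_000, 1, 60, 100)
print(fast)  # requests admitted and IPs blocked when each bot sends 2 requests/s
print(slow)  # the same when each bot stays just under the threshold
```

Under these assumptions the faster bots are all blocked, but only after more than a million requests have already reached the checkout, and the slower bots are never blocked at all.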
The bots also disadvantage real users as you can guarantee that the bots will be first in the queue to get the products. Although the retailer obviously still gets the sale, they can lose brand loyalty because of this.
So how do you manage these good/bad bots?
Many organisations, such as CDNs, have been rapidly developing bot management solutions over the last year in response to the increasing problems that retailers are facing with bots. Some, such as Akamai's bot manager solution, can be very sophisticated both in the way they attempt to identify a bot and in the options they give the retailer for dealing with it. Simply blocking the bot is not always the answer. If the operators know their bot has been blocked, they can just jump to another IP or evolve the bot. A better solution can be to fool the bot by showing it the wrong content (maybe higher prices) or just to slow it down. Slowing down is also a useful technique for bots that are only harmful because they are too aggressive in their crawling. You don't want to block them altogether but you do want to slow them down a little to reduce the impact on your infrastructure.
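As an illustration of slowing a client down rather than blocking it, the sketch below maps a client's recent request count to an artificial response delay. It is not how any particular bot manager works; the function name, the allowance of 5 requests, the 0.25 second step and the 5 second cap are all assumptions chosen for the example.

```python
def response_delay(recent_requests, free_allowance=5, step_s=0.25, max_delay_s=5.0):
    """Return how many seconds to hold a response before sending it.

    recent_requests is the number of requests the client has made in
    some short, recent period (how that is counted is left open here).
    Clients within the allowance are not delayed; beyond it, the delay
    grows by step_s per extra request up to max_delay_s.
    """
    excess = max(0, recent_requests - free_allowance)
    return min(max_delay_s, excess * step_s)


for count in (3, 9, 1000):
    print(count, response_delay(count))
```

A normal shopper never notices the delay, an over-eager crawler is gently held back, and the most aggressive client hits the cap without ever receiving an explicit block that would tell it to change IP.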
Although a bot manager solution is certainly a useful tool, it is never going to identify and stop all bots and, in the real-world instance detailed above, by the time it would possibly identify the user as a bot, it would be too late as the damage would already be done. Bots will constantly adapt and evolve to stop bot managers blocking them and so it is a moving target.
The solution to effectively managing these bots is multi-faceted. There is no one, single, solution that will catch everything and give you all of the control you need. Different services and solutions will give protection in different areas against different types of bots. Only by deploying multiple defences and solutions can you effectively manage these bots.
There are 4 different areas that should be considered when building a bot management strategy:
CDN
A CDN can be a first line of defence against malicious or troublesome traffic. The ideal CDN configuration ensures that all requests to your web application, whether cacheable or not, are filtered through the CDN. You can then use tools that the CDN provides, such as a WAF, a bot manager or even some basic rate limiting rules, to protect your website against the most obvious bots.
Web Application Firewall (WAF)
Many retailers will have a WAF layer sitting between their CDN and their hosting infrastructure. A high-quality WAF, such as Imperva WAF, can be used to automatically detect and block malicious requests such as those made by bad bots. Additionally, custom rules can be added to recognise and block or limit those bots that are not malicious but can be troublesome.
Application caching layer
Implementing a tool such as Varnish that sits between your firewall and your web application can not only improve speed and performance but can also be used to limit the impact of aggressive bots. A number of Varnish modules (VMODs) are available that can be used to effectively limit the rate of requests being made to specific URLs.
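The idea behind per-URL rate limiting can be sketched with a token bucket. The code below is a plain Python illustration of the technique, not Varnish configuration and not the API of any VMOD; the class name, its parameters and the example URL are assumptions made for this sketch.

```python
import time


class RateLimiter:
    """Token bucket kept separately for each (client, url) pair."""

    def __init__(self, capacity, refill_per_s, clock=time.monotonic):
        self.capacity = capacity          # largest burst allowed
        self.refill_per_s = refill_per_s  # sustained requests per second
        self.clock = clock                # injectable so it can be tested
        self._buckets = {}                # (client, url) -> (tokens, last_seen)

    def allow(self, client, url):
        """Return True if this request may proceed, False if it is over the limit."""
        now = self.clock()
        key = (client, url)
        tokens, last = self._buckets.get(key, (self.capacity, now))
        # Top the bucket up for the time that has passed, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_s)
        allowed = tokens >= 1
        if allowed:
            tokens -= 1
        self._buckets[key] = (tokens, now)
        return allowed


# Hypothetical limit: bursts of 2, then 1 request per second, per client and URL.
limiter = RateLimiter(capacity=2, refill_per_s=1.0)
print([limiter.allow("203.0.113.7", "/basket/add") for _ in range(3)])
```

Unlike a counter that is inspected at the end of a window, the decision here is made on every request, so the third rapid request to the same URL is refused immediately while requests to other URLs are unaffected.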
Application changes
Changes can be made to your application to protect it from aggressive or troublesome bots. For example, you can use simple tools like Google reCAPTCHA at relevant times. You can also introduce initiatives such as a raffle for exclusive and limited edition products, so that these products cannot be purchased in the conventional way and being first through the checkout no longer gives a bot any advantage.
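A raffle of this kind can be very simple. The sketch below assumes each entry is recorded as a customer ID and that one customer gets one chance however many times they enter; the function name and the sample IDs are hypothetical, and a real draw would also need to guard against one person registering many accounts.

```python
import random


def draw_winners(entries, stock, seed=None):
    """Pick up to `stock` distinct winners from a list of customer IDs.

    Duplicate entries are collapsed first, so entering many times gives
    no better chance than entering once. The seed is only there to make
    a draw repeatable; leave it as None for a real draw.
    """
    unique_entrants = sorted(set(entries))
    rng = random.Random(seed)
    return rng.sample(unique_entrants, min(stock, len(unique_entrants)))


# Hypothetical entries: customer "a" and customer "b" each entered twice.
winners = draw_winners(["a", "b", "a", "c", "b"], stock=2)
print(winners)
```

Because winners are drawn after entries close, the speed of a bot at the moment of launch stops mattering, although the entry form itself still needs protecting.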
It is important to consider implementing some or all of the solutions above rather than just relying on one of them, as each will provide defence against these bots in slightly different ways. For example, if you relied solely on an application change to prevent purchasing bots, they would still be hammering the rest of your infrastructure and could even cause issues such as filling Apache or Varnish log files to the extent that your server runs out of disk space.
In summary, bots are becoming an increasing commercial threat to e-commerce retailers and dealing with them effectively can be very complex. Estimates of how much web traffic is actually human versus bots vary but the general consensus is that up to 50% of all users on the web are bots. If you consider this number and the amount of bandwidth and capacity that they will use and the fact that around 50% of that bot traffic is from 'bad' or malicious bots, it is not something that you should ignore.