Crawler bots and web scrapers: How to protect your site

Last updated: Dec 11, 2024

Web scrapers & crawler bots are more powerful and prevalent than ever. Learn how to protect your online business from web scraping.

Rebecca Munton

Content Writer

Online bots have a bad reputation. From ticket scalpers to price scraping, bots are affecting how we shop and socialize online. This is largely because web scrapers and crawler bots are becoming ever more powerful and prevalent. One report predicts that web scraping impacts up to 14.7% of annual eCommerce site revenue.

Web scraping is especially common during major shopping events like Black Friday and Cyber Monday. A 2023 DataDome report found that a huge 98% of holiday season bot attacks involve scraping and/or scalping using crawler bots and other automated tools.

This puts eCommerce sites at particular risk from web scrapers. So how can you protect your site against these malicious bot attacks?

Here, we’ll dive into everything you need to know about web scraping, and offer actionable steps to defend your site (and your revenue) from crawler bot attacks.

How web scraping works

Web scraping is the process of extracting online data at scale using bots and other automated tools. This data is then aggregated and used depending on the scraper’s needs. Commonly scraped data includes:

  • Pricing information — Often used by aggregator sites to present and compare pricing. Unscrupulous competitors can also use this to understand your pricing strategy and undercut you on price.
  • Contact information — Personal data is increasingly valuable, so information like email addresses and phone numbers is often the target of web scraping.
  • Product data — Resellers can scrape product details to learn about competitors’ market positioning, branding, and other useful info. This might include professional photos, videos and content that belong to the original product owner.

Web scraping is much faster than manually collecting and arranging information, so users can quickly access and analyze scraped data. This table shows common web scraping tools and how they work:

Web scraper tool | How it works
Crawler bots | Crawler bots scour the target URL(s) for relevant data based on instructions from the bot creator. They index relevant data and store it in a database for later analysis.
APIs | Some websites provide an API that allows developers to access limited data from their site.
Custom scripts | Developers can write custom scripts that target specific web pages, downloading and extracting information directly from the site HTML.
Third-party services | Third-party services use the tools above to scrape sites on a customer’s behalf, then share the collated data with the customer.
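To make the custom-script approach more concrete, here is a minimal Python sketch of how such a script might pull names and prices out of page HTML. The markup, the class names ("product", "name", "price"), and the `PriceScraper` and `scrape_prices` names are all hypothetical, and the HTML is an inline sample rather than a downloaded page. Real scrapers vary widely, so treat this as an illustration rather than a description of any particular tool.

```python
from html.parser import HTMLParser

# Hypothetical product-page markup; real pages differ and change often.
SAMPLE_HTML = """
<div class="product">
  <h2 class="name">Trail Shoe</h2>
  <span class="price">$89.99</span>
</div>
<div class="product">
  <h2 class="name">Rain Jacket</h2>
  <span class="price">$120.00</span>
</div>
"""


class PriceScraper(HTMLParser):
    """Collects (name, price) pairs from elements with class 'name' or 'price'."""

    def __init__(self):
        super().__init__()
        self._field = None   # field the parser is currently inside, if any
        self._current = {}   # partially collected product
        self.products = []   # completed (name, price) tuples

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in ("name", "price"):
            self._field = css_class

    def handle_data(self, data):
        text = data.strip()
        if self._field and text:
            self._current[self._field] = text
            self._field = None
            # Once both fields are present, store the product and start over.
            if "name" in self._current and "price" in self._current:
                self.products.append(
                    (self._current["name"], self._current["price"])
                )
                self._current = {}


def scrape_prices(html):
    """Return a list of (name, price) tuples found in the given HTML."""
    parser = PriceScraper()
    parser.feed(html)
    return parser.products


if __name__ == "__main__":
    for name, price in scrape_prices(SAMPLE_HTML):
        print(name, price)
```

Because the script depends on specific class names, even small changes to a page's markup can break it. That is part of why changing page structure can slow scrapers down.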

Web scraping isn’t always malicious — many legitimate sites use scraping to provide genuine services for customers. For example, aggregator travel sites will scrape and display information from individual airlines and hotels to help customers find better deals.

But web scraping is also used for many immoral and even illegal practices, from stealing data to ticket scalping.

[Diagram from Akamai: the intent and consequences of web scraping across some of the most affected industries]

So where (and how) do we draw the line between ethical, unethical, and illegal web scraping?

Is web scraping illegal? 

Essentially, no, web scraping isn’t illegal. That’s why you’ll see plenty of sites advertising scraping services when you search for it on Google.

Scraping publicly available information in order to save time on manually collecting the information is generally OK (although it does depend on how you use this data — more on that shortly).

However, scraping private data that’s protected by passwords, paywalls, and other barriers is generally illegal. Crawling or scraping that goes against a website’s terms of service can also expose you to legal action, such as a breach of contract claim. You must also abide by any copyright and/or fair use laws that apply.

To confuse matters further, the legality of web scraping also depends on where it takes place. Scraping personal information in a country subject to the General Data Protection Regulation (GDPR) may have different consequences than in an area with fewer regulations around collecting personal data.

Why is scraping malicious?

While web scraping isn’t technically illegal, it’s a bit of a gray area. That’s because it can be used to damage other businesses and individuals if used improperly. It’s often the intent behind web scraping (and its subsequent use) that makes it illegal (or at least unethical).

For example, scraping real estate sites to aggregate information and display it on a property search site like Zillow is OK. You don’t intend to undercut the original estate agent or use this information for any shady purpose.

However, if one agency scrapes the site of a competing agency with the intent to poach house sellers, this is pretty unethical. It could even be illegal if they scraped (or attempted to scrape) the personal data of sellers without permission.

Not all examples are this clear-cut. In one real case, data analytics firm hiQ Labs scraped data from public LinkedIn profiles to inform their insights and provide these to clients. LinkedIn subsequently served hiQ Labs with a cease and desist letter, claiming the company was in violation of the Computer Fraud and Abuse Act (CFAA).

But hiQ Labs fought back, and eventually a district court determined hiQ Labs was likely to succeed as the data available on LinkedIn was already public. 

So the actions of hiQ Labs might not be illegal, but they do raise ethical questions around the use of personal data by unauthorized third parties for profit.

It’s hard to legislate for these gray areas, as David Emm, lead researcher at Kaspersky, explains:

“It’s hard to see how you could abolish [web scraping] without seriously harming certain parts of the economy. It’s difficult to say ‘we could put a stop to it completely’ any more than you could say ‘let’s abandon the sale of sharp knives because we know they’re used for criminal behavior’.”

Because there’s a lack of regulation, the practice is open to abuse from product resellers, unscrupulous competitors, and other bad actors.

How bots and web scraping harm businesses

Bots are a threat to businesses in many different ways, with web scraping posing a particular threat to eCommerce stores and businesses that operate online.

So how exactly do crawler bots and web scrapers damage businesses? 

1. Unfair competition and reseller abuse

Competitors can use crawler bots to collect valuable product information, such as pricing and marketing content. They can use this information to aggressively undercut the original seller, especially if they’re a smaller business without the resources to counteract this.

Certain resellers are also known to buy up quantities of valuable or limited stock around Black Friday and the holiday season, then sell this on to customers at a vastly inflated price.

These tactics negatively affect buyers as well as businesses, which can give your company a bad reputation if you’re often targeted by bad traffic bots like scalpers or scrapers.

2. Skewing site analytics

Sending crawler bots to target specific sites will inevitably skew their analytics, leaving them with inaccurate data. Low quality data has a huge impact on eCommerce sites and other online businesses, as it can:

  • Prevent you from spending your ad budget effectively.
  • Stop you making effective data-driven decisions.
  • Limit the success of automated campaigns like Performance Max.

Skewed site analytics can have a bigger impact on your bottom line than you might think. Advertisers in particular should be aware of the risks bot traffic poses to their PPC campaigns.
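A quick worked example shows how this skew happens. The session and order counts below are made-up illustrative figures, not measurements from any real site. Bots add sessions but place no orders, so the reported conversion rate falls even though real customer behavior hasn't changed.

```python
def conversion_rate(orders, sessions):
    """Share of sessions that ended in an order."""
    return orders / sessions


# Hypothetical figures, chosen only to illustrate the effect.
human_sessions = 10_000
orders = 300
bot_sessions = 5_000  # scraper and crawler visits counted as sessions

true_rate = conversion_rate(orders, human_sessions)
reported_rate = conversion_rate(orders, human_sessions + bot_sessions)

if __name__ == "__main__":
    print(f"True conversion rate:     {true_rate:.1%}")      # 3.0%
    print(f"Reported conversion rate: {reported_rate:.1%}")  # 2.0%
```

In this scenario, a campaign that actually converts at 3% looks like it converts at 2%. That could lead you to cut budget from something that is working.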

3. Stealing intellectual property

From product photos to ad taglines, businesses invest a lot of money in marketing. So stealing and reusing these assets via content scraping is considered unethical, even though the content is publicly accessible. Depending on the asset, it may also infringe copyright.

[Screenshots: the original web page content and a reseller site for the same item]

This can be especially galling if a reseller sells your product at an inflated price, and even more so if the new site or product page ranks higher in search engines than the original.

4. Phishing

Scraping contact information from any business website puts staff at risk of phishing attacks. Phishing attacks can provide malicious users with sensitive data or even give them access to your systems.

[Image: example of a phishing email]

Phishing attacks are getting harder to spot, so it’s important to keep staff fully trained on how to spot phishing emails, calls, and messages.

5. Cyber extortion

Web scraping can form part of online ransom attacks. While this isn’t technically ransomware, hackers can find cracks in a business’s online security system and use web scraping tools to steal data or other valuable assets.

In Europe and many US states, fines for data leaks can be extremely costly. Bad actors can take advantage of this by demanding large sums of money in exchange for not reporting the data leak. This is known as cyber extortion, and it’s becoming increasingly common.

In June 2024, NHS patient data was stolen as part of a cyber extortion attack, leading to serious disruption across multiple NHS trusts.

6. Cart abandonment scams

Cart abandonment scams happen when scalper bots add all the stock of an in-demand product to their cart without checking out. As a result, the product is shown as out of stock on a legitimate site, driving buyers to secondary marketplaces.

Scrapers can contribute to these scams by scraping launch dates and stock quantities from sites that publish this information.

7. Black Friday and Cyber Monday scams

While malicious bot attacks are damaging at any time, they often ramp up around Black Friday and Cyber Monday. Researchers found a 7.5% increase in spend on Black Friday 2023 compared with 2022, with buyers spending a record $9.8 billion online in the US alone.

Scrapers seek to capitalize on this retail frenzy with an increase of bot activity targeting shops and shoppers alike. In 2022, scraping attacks increased by 43% on the approach to the holiday season. eCommerce platforms can expect to see five to 30 times more bot traffic than usual during these events.

So what can you do to defend your business against bots and web scrapers at peak shopping times, as well as the rest of the year?

How to protect your site from bots and web scrapers

Bots are rife, so pretty much all sites are subject to invalid traffic (IVT). Without proper protection against IVT, it’s likely web scrapers and other crawler bots will impact your site in some way.

But even if you don’t have specific anti-scraping software, there are a few ways to mitigate the impact of IVT on your site:

  • Ensure your terms of service are clear and accessible — Include an acceptable usage policy within your terms of service. Within this, state that scraping is prohibited on your site and ensure any consequences are made clear.
  • Implement anti-bot measures on your site — Some tests (such as suspicious IP throttling, CAPTCHAs, and honeypots) can help human users access your site while blocking bots. These can hinder less advanced bots, though sophisticated bots may bypass them fairly easily.
  • Generate dynamic product pages — While this method won’t stop bots visiting your site, dynamically generating product pages makes it more difficult for scrapers to locate and collect the information they want.
  • Be prepared to take legal action — If web scrapers have violated your terms of service or other wider regulations, you may be able to take legal action. That said, it’s not always easy to pinpoint exactly who is scraping your site.
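Two of the anti-bot measures above, IP throttling and honeypots, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a production defense. The `SlidingWindowThrottle` class, the `is_honeypot_triggered` helper, the `website_url` field name, and the request limits are all hypothetical choices made for this example.

```python
import time
from collections import defaultdict, deque


class SlidingWindowThrottle:
    """Allows at most max_requests per IP within a rolling time window."""

    def __init__(self, max_requests=20, window_seconds=10.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now=None):
        """Return True if the request may proceed, False if it is throttled."""
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


def is_honeypot_triggered(form_data, honeypot_field="website_url"):
    """Return True if a hidden form field was filled in.

    The field is hidden from human visitors with CSS, so any value in it
    suggests that an automated tool completed the form.
    """
    return bool(form_data.get(honeypot_field, "").strip())


if __name__ == "__main__":
    throttle = SlidingWindowThrottle(max_requests=3, window_seconds=10)
    for second in range(5):
        print(second, throttle.allow("203.0.113.7", now=second))
    print(is_honeypot_triggered({"email": "a@example.com", "website_url": ""}))
```

The throttle in this sketch keeps its state in memory and never evicts idle IP addresses. A real deployment would need shared storage and cleanup. As noted above, sophisticated bots that rotate IP addresses or parse CSS may still get past both checks.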

You can also invest in invalid traffic mitigation solutions. Lunio helps protect websites against bots, including web scrapers. When bots visit your site, Lunio works to detect them and automatically block their IP addresses.

Protecting your site with Lunio also helps minimize the risk of skewed site statistics, while also preventing unauthorized scraping of your online assets. In addition, it can help stop bots draining your ad spend — up to 8.5% of paid search clicks are driven by fake users like bots.

See how much ad spend you’re currently losing to invalid clicks with a 14-day no-obligation traffic audit from Lunio and learn how our IVT prevention solution can protect your site against web scraping and other bots.

Protect your online business with Lunio

Web scrapers pose a big threat to online businesses. eCommerce stores are often targeted, with scraping and botting activity peaking during the holiday shopping season. But any website can be affected at any time — so make sure your anti-scraping security measures are up to scratch.

Lunio has helped dozens of companies effectively reduce bot activity on their sites and ad campaigns, ultimately resulting in higher conversion rates and higher revenue.

Book your traffic audit and see how Lunio could help you protect your site and ad spend.

Say goodbye to wasted ad spend

Discover how Lunio can help you eliminate invalid ad clicks and maximize paid media performance

Get started
