28 May 2021
Everyone needs data these days. If you're reading this article, it's a good bet that you're the CEO of an established company, a starry-eyed entrepreneur bootstrapping a start-up, or perhaps even a journalist.
Regardless of what type of business you run, or what information you're looking for, it's all about data.
Of course, that's where "web scraping" comes into the picture. On the off chance that you don't know what web scraping is, it's essentially the automated collection of information from somebody else's website.
For example, maybe your competitor has juicy info just sitting there on their website and you want it. You therefore use a web scraping tool and make off like a bandit. Also known as spidering or crawling, web scraping has been used by many companies in their market intelligence, marketing, and lead generation activities.
Indeed, many companies use third-party services that use various web scraping tools to build databases. These third parties then sell the data they've gathered to those who need various data sets.
Now, legal issues have developed around web scraping because, hey, some businesses don't appreciate having their data scraped. Owners of businesses that have been scraped worry about things like copyright infringement, fraud, breach of contract, trade secrets being stolen, and more.
Perhaps you worry about those things too, or maybe you want to take advantage of web scraping technologies and services but want to ensure you're on the right side of the law.
In the following article, we'll take a look at web scraping laws, why they're important, and how your website's Terms and Conditions agreement (T&C) can limit the web scraping activity of others.
First things first. You might be asking yourself if web scraping is even legal. You might be tempted to say "no" considering the fact that others just come along and take information straight off your website. Alternatively, you might knee-jerk a "yes" response since some companies obviously engage in spidering.
The short answer is that the entire subject is a little gray. Up until 2015, companies were getting away with crawling their competitors' sites without a hitch. However, then the Irish airline Ryanair went to court over alleged "screen-scraping" of its website.
In fact, the issue went all the way to the Court of Justice of the European Union (CJEU), Europe's highest court.
Essentially, what happened is that PR Aviation, a company that compares prices for low-cost airlines, was accused of scraping Ryanair's website. PR Aviation's service depends on publicly available data acquired through screen-scraping.
The airline sued PR Aviation for breach of its website T&C and for infringement of its database rights under the Database Directive. Ryanair also sought damages and a court order requiring PR Aviation to cease and desist from further infringement.
3. Permitted use.
You are not permitted to use this website (including the mobile app and any webpage and/or data that passes through the web domain at ryanair.com), its underlying computer programs (including application programming interfaces ("APIs")), domain names, Uniform Resource Locators ("URLs"), databases, functions or its content other than for private, non-commercial purposes. Use of any automated system or software, whether operated by a third party or otherwise, to extract any data from this website for commercial purposes ("screen scraping") is strictly prohibited.
Fast forward to 2019. The U.S. 9th Circuit Court of Appeals ruled on September 9 of that year that the U.S. Computer Fraud and Abuse Act (CFAA) is not violated when a company scrapes public websites.
In other words, while there may be some limits in some regions that can be placed on scraping activity through a company's T&C Agreement, the U.S. court essentially ruled that it's not "theft" if a company scrapes information such as product lists, open user profiles, ticket prices, etc.
The reason for this is that, from a legal standpoint, the scraper bot is no different from your web browser. Both request open data from the website, and both do something with that data on their side. As long as the data is publicly available on the site (i.e., you can see the data when browsing the site), then it's legal to scrape it.
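To make the browser comparison concrete, here is a minimal Python sketch that builds (but never sends) two HTTP requests for the same page. The URL and both User-Agent strings are hypothetical placeholders, and the sketch only illustrates the technical similarity of the requests, not any legal conclusion.

```python
from urllib.request import Request

# Hypothetical URL, used only for illustration; nothing is sent over the network.
URL = "https://example.com/products"


def build_request(user_agent: str) -> Request:
    """Build (but do not send) a GET request carrying the given User-Agent."""
    return Request(URL, headers={"User-Agent": user_agent}, method="GET")


# Both User-Agent values are made-up examples.
browser_request = build_request("Mozilla/5.0 (X11; Linux x86_64)")
scraper_request = build_request("example-scraper/0.1")

for name, req in [("browser", browser_request), ("scraper", scraper_request)]:
    # urllib stores header names capitalized, hence "User-agent".
    print(name, req.get_method(), req.full_url, req.get_header("User-agent"))
```

In this sketch the two requests share the same method and URL and differ only in how the client identifies itself.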
Even though it's completely legal to scrape publicly available data, there are two types of information that you should be cautious about.
Copyrighted data is owned by businesses or individuals who have full control over its use and reproduction. This type of data may include:
Just because this type of information is easily available online doesn't mean that it is free for anyone to use. In fact, if it's copyrighted, then it's illegal to use without the express permission of the owner.
What this means is that while it isn't illegal for you to scrape and gather copyrighted material per se, if you use that information, it certainly might be. Remember that specific laws in various countries are not entirely the same on this issue.
For example, in some places you may be able to use parts of the copyrighted data you've scraped, while in others you won't be able to use any of it at all.
Personal data, otherwise known as Personally Identifiable Information (PII), is a subject that's now covered in depth by many data privacy and protection laws, including Europe's General Data Protection Regulation (GDPR) as well as those of many states across America.
PII includes any information that might be used to identify a specific individual. Examples of this type of information include:
Under most laws, it is illegal to collect, use, or store PII without the owner's explicit consent. (Sometimes there are legal exceptions.)
When it comes to web scraping, you won't be able to obtain an owner's consent for collecting their data. Because you don't have a legal right to collect PII without the owner's consent, scraping that data is essentially illegal. Therefore, it's now a best practice to ensure that when scraping a website, you leave PII alone.
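One way to act on this is to strip obvious PII from scraped text before storing it. The Python sketch below assumes two simple regular expressions for email addresses and phone numbers; the patterns, the `redact_pii` helper, and the sample text are all hypothetical, and this is nowhere near a complete PII detector (names, addresses, and IDs would need more than this).

```python
import re

# Illustrative patterns only; they are assumptions, not a complete PII detector.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL REDACTED]", text)
    text = PHONE_RE.sub("[PHONE REDACTED]", text)
    return text


# Made-up sample of scraped text.
sample = "Contact Jane at jane.doe@example.com or +1 555-010-9999. Price: 49.99"
print(redact_pii(sample))
```

Note that the person's name in the sample passes through untouched, which is exactly why a filter like this is only a first line of defense rather than a guarantee.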
There are some stiff penalties under the GDPR and the California Consumer Privacy Act (CCPA) for illegally collecting PII. Of course, there are other issues involved, and different regions have somewhat different rules. Let's take a look at them.
The GDPR became enforceable in 2018 and it applies to the use of PII of residents within the European Economic Area (EEA). It's worth noting that the GDPR doesn't cover data that has been anonymized.
Essentially, the important thing to keep in mind is that the GDPR has regulations covering the protection of PII when it is acquired by data controllers and then passed to data processors. (This includes the cloud.)
If there is a data breach, the GDPR requires that both consumers and data authorities be notified. Businesses must specify the precise nature of the breach, take steps to mitigate the breach, and must specify the amount and categories of information compromised.
All companies, no matter where they are located in the world, are subject to the GDPR if they collect the PII of EEA residents. There are no exceptions.
The bottom line?
You cannot legally scrape the websites of companies in the EEA for PII.
You might be tempted to skip for joy if you plan to scrape the websites of businesses in America, since the USA doesn't have a single comprehensive federal privacy law. However, you might not want to dance a jig just yet.
The U.S. does have a patchwork of various state laws, some of which are being looked at as a sort of proof of concept for federal use by the United States Congress. For example, California's CCPA was mentioned above. There are also a few consumer-oriented federal laws, such as the Health Insurance Portability and Accountability Act (HIPAA) for health care and the Gramm-Leach-Bliley Act of 1999 (GLBA) for finance.
It's a bit difficult to compare European laws to American ones when it comes to web scraping and determining if the practice is legal. In Europe, data security laws are combined with the GDPR, while the U.S. has never passed a federal consumer privacy law. In America, the individual states have tried to fill the federal void.
California's CCPA is the most comprehensive, internet-focused law in America and contains a laundry list of what constitutes PII. For example, the CCPA lists browsing history, geolocation, biometric data, email, and employee information as PII.
Other states haven't passed anything close to the restrictions that California has, although they are slowly moving to put their own privacy protection laws in place.
With that said, both the CCPA and the GDPR permit individuals to opt out of data processing. Individuals can also access or remove their data at any time. One difference is that Californians can't correct inaccurate information. Another is that while the CCPA asks for privacy notices on all websites, the GDPR requires explicit user consent.
Beyond worries over copyright infringement or illegally acquiring PII, some also raise issues like breach of contract.
After all, does the use of automated software to scrape a website violate a business's T&C? (It does if the T&C is enforceable and it specifically prohibits website scraping.)
As well, in both the U.S. and the UK, website operators might try to bring a common law tort claim, such as trespass to chattels. Another law that some might try to use to prohibit scraping is the UK's Computer Misuse Act 1990, which prohibits unauthorized access to, or modification of, computer material. (It's worth noting that this has never been attempted in connection with web scraping.)
Despite some obvious limitations, you might still want to add web scraping restrictions to your website's T&C. When you do, make sure that your language is specific so that you can prohibit any third party from scraping your website's information and then using it for their own commercial purposes.
Also, you'll need to ensure that your T&C is actually enforceable. Terms are normally enforceable when both parties agree to them. However, different courts use various criteria when determining whether an agreement actually exists.
For example, some courts may decide that it is enough for a user to be merely notified that using a website constitutes agreement to its terms. (This is usually done through a "browsewrap" agreement.) However, many agree that you'll have a stronger case if you use a "clickwrap" agreement that requires your site's users to explicitly agree to your T&C before continuing on to use the website.
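As a rough illustration of the clickwrap idea, the Python sketch below withholds content until a user has explicitly accepted the current version of the terms, and records when they did so. The `ClickwrapGate` class, its method names, and the version string are hypothetical; a real site would persist acceptances in a database and tie them to authenticated sessions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ClickwrapGate:
    """Hypothetical in-memory record of who accepted which T&C version."""

    terms_version: str
    accepted: dict = field(default_factory=dict)  # user_id -> (version, timestamp)

    def accept(self, user_id: str) -> None:
        # Called only when the user actively clicks "I agree".
        self.accepted[user_id] = (self.terms_version, datetime.now(timezone.utc))

    def has_accepted(self, user_id: str) -> bool:
        record = self.accepted.get(user_id)
        return record is not None and record[0] == self.terms_version

    def serve(self, user_id: str, content: str) -> str:
        # Content is withheld until the current terms have been accepted.
        if not self.has_accepted(user_id):
            return "Please accept the Terms and Conditions to continue."
        return content


gate = ClickwrapGate(terms_version="2021-05")
print(gate.serve("visitor-1", "Ticket prices..."))  # prompt to accept
gate.accept("visitor-1")
print(gate.serve("visitor-1", "Ticket prices..."))  # content is served
```

Storing the version and timestamp alongside each acceptance is a design choice in this sketch: it gives the site owner a record to point to, and it forces re-acceptance whenever the terms change.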
Although many still see the legality of web scraping as a gray area, there are some things that are no longer in question. If the information scraped isn't protected by a login, it's legal to scrape. (Keep in mind that using that data once you've scraped it may not be legal.)
Before scraping a website, you should always check that site's T&C to ensure that you won't be in breach of contract if you do scrape it.
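A first pass over a site's T&C can be partly automated. The Python sketch below flags sentences that mention scraping-related terms so a person can review them; the keyword list and the `find_scraping_clauses` helper are assumptions for illustration, and the sample text pairs a made-up opening sentence with the Ryanair clause quoted earlier. It is no substitute for actually reading the terms.

```python
import re

# Hypothetical keyword patterns; extend them to suit the terms you are reviewing.
SCRAPING_TERMS = [
    r"screen[\s-]?scraping",
    r"\bscrap(?:e|ing)\b",
    r"\bcrawl(?:er|ing)?\b",
    r"\bspider(?:ing)?\b",
    r"automated (?:system|software|means)",
]
PATTERN = re.compile("|".join(SCRAPING_TERMS), re.IGNORECASE)


def find_scraping_clauses(terms_text: str) -> list[str]:
    """Return the sentences of a T&C text that mention scraping-related terms."""
    # Naive sentence split on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", terms_text)
    return [s.strip() for s in sentences if PATTERN.search(s)]


terms = (
    "You may browse this website for private purposes. "
    "Use of any automated system or software, whether operated by a third "
    "party or otherwise, to extract any data from this website for commercial "
    'purposes ("screen scraping") is strictly prohibited.'
)

for clause in find_scraping_clauses(terms):
    print("Review this clause:", clause)
```

Any flagged sentence still needs human (and ideally legal) review, since a keyword match says nothing about whether the clause is enforceable.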
On the flip side, if you don't want to have your data scraped, then you need to have specific protections written into your website's T&C. Additionally, you should use a clickwrap agreement to ensure that website visitors explicitly agree to your T&C Agreement.
This article is not a substitute for professional legal advice. This article does not create an attorney-client relationship, nor is it a solicitation to offer legal advice.