Web scraping (or data scraping) is more prevalent than you might think. It is estimated that more than 50% of all website visits are for data scraping purposes, which is why users are often asked to complete a series of tests to prove they are not an unwanted bot. Many new businesses with large datasets or web scraping capabilities look attractive to investors, given the nature of online marketing and the appeal of tools that offer businesses innovative ways to collect and process data. It is of paramount importance to understand the legal issues before becoming involved with, or setting up, such a business. This includes understanding the licences attached to datasets and the possible infringement of database and other intellectual property rights.
What is web scraping?
Web scraping is the process of using software to harvest, or scrape, publicly available data automatically from online sources. It has many purposes, including recruitment, sentiment analysis, assessing credit risk, identifying trends, marketing and sales. It is also permitted to a certain extent under bespoke licences. In the public sector, datasets are often made available under the Open Government Licence (OGL), a practice encouraged by an EU directive, the INSPIRE Directive (2007/2/EC), which required public authorities to make spatial information datasets publicly available.
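To make the definition concrete, the following is a minimal sketch of the "harvesting" step, using only Python's standard library. The page fragment, the `job-title` class name and the helper names are hypothetical and chosen purely for illustration. A real scraper would first download the page, and that download is the step the licence and terms-of-use questions discussed in this article attach to.

```python
from html.parser import HTMLParser


class JobTitleScraper(HTMLParser):
    """Collects the text of every <h2 class="job-title"> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "h2" and ("class", "job-title") in attrs:
            self._in_title = True
            self.titles.append("")

    def handle_data(self, data):
        # Accumulate text only while inside a matching <h2>.
        if self._in_title:
            self.titles[-1] += data

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False


# Hypothetical page content; not taken from any real website.
SAMPLE_PAGE = """
<html><body>
  <h2 class="job-title">Data Analyst</h2>
  <p>London</p>
  <h2 class="job-title">Software Engineer</h2>
</body></html>
"""


def scrape_titles(html):
    """Return the job titles found in an HTML string."""
    parser = JobTitleScraper()
    parser.feed(html)
    parser.close()
    return [title.strip() for title in parser.titles]
```

Calling `scrape_titles(SAMPLE_PAGE)` returns the two job titles as a list of strings, which is the kind of structured output a recruitment-focused scraper would then store or analyse.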
In the news
Elections in Brazil have provided an example of how marketing companies could potentially abuse web scraping software. It was alleged that political parties used software to gather phone numbers from Facebook, which were then used to create WhatsApp groups and spread fake news. Brazil’s electoral court is to investigate whether this undermined the legitimacy of the elections.
In the UK, the investigation of Cambridge Analytica and Facebook by the Information Commissioner’s Office (ICO) has put data scraping under public scrutiny. Facebook were fined the maximum £500,000 for two breaches of the Data Protection Act 1998 for not adequately safeguarding users’ personal data. When reflecting on the investigation, Elizabeth Denham, the UK Information Commissioner, called for an “ethical pause” to allow Government, Parliament, regulators, political parties, online platforms and the public to reflect on their responsibilities in the era of big data before there is greater expansion in the use of new technologies.
Businesses should therefore consider what the legal implications may be if they intend to scrape data. If operating under a licence to scrape data, a business should understand the scope of such licence and, if personal data is involved, whether the activity complies with data protection laws. If no licence exists then scraping data may infringe copyright and database rights. If the website you wish to scrape has an acceptable use policy or other similar terms and conditions attached to it, the chances are that any scraping activity will breach that policy or conditions.
A recent case in the UK has explored the extent of licences and database rights when applied to web scraping.
77m Ltd v Ordnance Survey Ltd [2019] EWHC 3007 (Ch)
The High Court found the creator of a geospatial address dataset liable for database right infringement and in breach of a number of licences.
The claimant, 77m, created a dataset called Matrix, containing the geospatial co-ordinates of all residential and non-residential addresses in Great Britain, to which it wished to sell access. It had created Matrix by combining large amounts of data from various datasets. The data at issue derived from the defendant, Ordnance Survey (OS). 77m did not contract with OS but with Her Majesty’s Land Registry (HMLR) and Registers of Scotland (RoS). It also accessed data, including addresses and geospatial co-ordinates, made public by Lichfield District Council (LDC) under the Open Government Licence (OGL) (the Lichfield data). HMLR, RoS and LDC licensed the relevant data from OS.
Before looking at database rights, the court had to decide whether 77m had acted within the terms of the licences. If it had, 77m’s activities in relation to OS’s datasets would be shielded from a database right infringement claim; if it had not, 77m would remain exposed to that claim.
77m had extracted data under the terms of a number of licences. It was found that in many instances 77m had gone beyond the behaviour permitted by the licences. Under the OGL, the court deemed it lawful to use publicly available data to create software, provided the data was not then sold or included in the software itself. In most instances, however, 77m’s use of the data to specify geospatial co-ordinates was in breach of the licences.
The court then went on to consider whether 77m’s activity infringed database rights. First, it was critical to assess whether or not the database in question was subject to such rights. The Database Directive (EU), implemented in the UK in 1997, states that protection shall be granted to the maker of a database who shows that there has been qualitatively and/or quantitatively a substantial investment in either the obtaining, verification or presentation of the contents. The court ruled that Ordnance Survey clearly had made such an investment when putting the database together. The High Court judge, Mr Justice Birss, specifically pointed to the investment that went into verifying new addresses as they came into Ordnance Survey’s database, which in recent years had an operating expenditure of £6 million per annum.
The way in which 77m used the database was then put into question. The important distinction here is between extraction and consultation of the data within the database, because only extraction infringes database rights. Some muddled case law coming from the ECJ made the question laborious. Put simply, consultation has been defined as being limited to a person merely reading data on a screen, where the only other medium to which the data is transferred is the person’s brain. Extraction, by contrast, means transferring data to a medium other than the person’s brain, such as downloading the data onto your own computer.
Therefore 77m’s use of data on such a vast scale and for commercial purposes was always going to amount to an extraction and thus an infringement. The court made clear, however, that in some instances data could be consulted for a commercial purpose. But a user who took all or part of a database’s contents and transferred them to another medium so that they could use them appropriated to themselves a substantial part of the investment that went into creating the database, and was therefore clearly in breach of database rights. Database rights are not only about protecting the data but also about protecting the work that went into compiling and synthesising it.
This case highlights the need to be aware of the licences a company has in place to use data, the scope of such licensing and, if there is no licence or the licence has been breached, whether database rights could protect the database owner.
Web scraping: things to consider
Below is a list of things to consider before you scrape data or before you buy a business that has been scraping data:
- Check the scope of the licences to scrape data, and to store and use that data.
- If there is no licence in place then a business should consider whether the scraped data is subject to copyright and/or database rights.
- If no licence exists, you should also check the website’s acceptable use policy and/or terms and conditions. If they explicitly forbid scraping or contain other content restrictions, this may enable the website owner to sue for breach of contract. Although there is no clear precedent on whether website terms and conditions form binding contracts in the UK, it is worth assuming that they could. The Irish High Court recently ruled that such terms and conditions could form a binding contract. Even if there is no acceptable use policy and/or terms and conditions, the website’s content may still be subject to copyright and/or database rights.
- Check whether the target business you want to purchase uses a third party to scrape or store data and, if so, their contractual arrangements.
- Legal positions differ by country, even between European countries. This is especially important when data collected in one country is stored or made available in another.
- Check if personal data is involved and therefore if GDPR / Data Protection Act 2018 / other data protection laws are applicable.
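As a rough illustration of the last check, the sketch below flags scraped text records that appear to contain an email address or a phone number. It is a first-pass screening aid only: the patterns are deliberately crude assumptions, the function and sample records are hypothetical, and a match (or the absence of one) says nothing definitive about whether data is "personal data" in the legal sense.

```python
import re

# Crude, illustrative patterns; they will miss cases and produce false positives.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{8,}\d")


def flag_possible_personal_data(records):
    """Return (index, reasons) for each record that may hold personal data."""
    flagged = []
    for index, text in enumerate(records):
        reasons = []
        if EMAIL_RE.search(text):
            reasons.append("email")
        if PHONE_RE.search(text):
            reasons.append("phone")
        if reasons:
            flagged.append((index, reasons))
    return flagged


# Hypothetical scraped records, invented for this example.
SAMPLE_RECORDS = [
    "Opening hours: 9am to 5pm",
    "Contact jane.doe@example.com for details",
    "Call +44 20 7946 0000",
]
```

Running `flag_possible_personal_data(SAMPLE_RECORDS)` flags the second and third records. Anything flagged in this way would still need review against the applicable data protection law.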
The US perspective on Web Scraping
A recent case involved LinkedIn and HiQ, a small data analytics company that used automated bots to scrape information from public LinkedIn profiles. The Ninth Circuit Court of Appeals ruled in favour of HiQ, implying that scraping publicly available information from social media websites is permitted. LinkedIn have expressed an intent to appeal to the Supreme Court, so the legal position may still change.
In the US, as in the UK, data scrapers may find themselves on the receiving end of legal action under the following regimes:
- Intellectual property: Scraping data from websites may infringe intellectual property rights. In 2013 a Federal Court ruled that a software-as-a-service company, Meltwater U.S. Holdings, which offered subscribers access to scraped information about news articles, had been acting illegally. Such companies are often referred to as ‘news aggregators’. The news provider whose data had been scraped sold licences to many companies. Meltwater had no such licence and copied between 0.4% and 60% of each article, and was deemed to have had a ‘substantial’ negative effect upon the potential market for, or the value of, the copyrighted work. Getting a licence before scraping data in the US is therefore advised. As the LinkedIn and HiQ case mentioned above shows, though, it may still be possible to scrape publicly available information from social media sites without a licence.
- Contract: In the US, if a website user is bound by the website’s terms of service and causes damage by breaching those terms, the user may be liable for breach of contract.
- The Computer Fraud and Abuse Act: This provides a civil cause of action against anyone who accesses a computer without authorisation, as well as providing for criminal offences. Although courts have come to differing conclusions, it has generally been ruled that if a scraper uses technical steps, i.e. specialised and complex methods, to circumvent protections to data on websites, then the scraper can become liable under the Act.
- Data protection: The US does not currently have comprehensive data privacy legislation at the federal level. At the state level there are plenty of statutes that mandate certain privacy-related rights, but most do not broadly regulate the collection and use of personal data. There are exceptions. California recently passed a state law which regulates data privacy. Coming into effect in 2020, it requires certain companies collecting personal data to disclose how such data will be used and to allow consumers to opt out of data collection. Data scrapers who collect such personal data in California could therefore be found liable if they fail to disclose the use of such data or to offer an opt-out.
Final Thoughts
Most businesses aren’t in the business of web scraping, and most business owners or directors aren’t even aware of what web scraping is. However, it’s something to be aware of. With this awareness, you may now want to make sure that your website has an acceptable use policy or other security measures in place. If you buy data, you should think about how that data was collected. If you are buying a business, you should include checks in your due diligence and appropriate warranties in the share purchase agreement to protect yourself from buying a business that collected data unlawfully.
If you have any questions on the points raised above please contact one of our technology lawyers.