Data Scraping – Navigating the Challenges Old and New (Part 2: Intellectual Property) - EM Law

October 31, 2023

Contract Law

Data Protection Law

Intellectual property

Software & Technology

Colin Lambertus and Neil Williamson were featured in OneTrust’s DataGuidance, writing an article on ‘data scraping’ – also known as web scraping. That article was a condensed version of a larger, two-part series that addressed the two key legal aspects of web scraping: data protection and intellectual property. Part 1 considered the position from a data protection perspective. Part 2 considers the intellectual property position, building on the research carried out for OneTrust’s DataGuidance.

All writing, images, and other forms of intellectual creativity attract intellectual property protection to varying degrees. In some ways, the internet can be considered the main repository of human knowledge. Accordingly, the legal protections for intellectual creation that existed well before the invention of the internet have been carried over into the Information Age. This poses a problem for potential scrapers.

Copyright and web scraping

At the most basic level, the normal rules of copyright protection apply to the content that is scraped. An arrangement of words or an individual image, for example, is a work in which copyright may subsist on a standalone basis.

Further removed is the copyright protection afforded to tabular arrangements, compilations, and (separately) databases. A significant amount of data on the internet is presented in this way. From a UK perspective, the Copyright, Designs and Patents Act 1988 (CDPA) provides that a ‘literary work’ (in which copyright may subsist) can include a ‘table or compilation’ (s. 3(1)(a) CDPA). The normal rules apply, namely that the table/compilation must be ‘original’, fixed within a physical/electronic medium, and be created in or be for the UK. This would cover an example like a curated catalogue of artworks. Protection of the table/compilation does not, however, create an additional layer of copyright protection in the names of the paintings included in the table (unless copyright attaches to those names independently). The work protected is the arrangement.

For the purposes of data scraping, trawling a website to extract the names of paintings by a specific artist, where those names have not been arranged in any way, would not infringe copyright unless there is copyright in the names themselves.

EU legislation arguably enhanced protections against web scraping. The Database Directive (96/9/EC) (the Directive) provided for the legal protection of databases. The Directive harmonised copyright protection for databases across the EU and created a separate, distinct type of intellectual property right in databases. The Directive was implemented in the UK as The Copyright and Rights in Databases Regulations 1997 (the Regulations).

In copyright, the CDPA was thus amended (s. 3A CDPA) to afford copyright protection to databases as literary works.

Databases are collections of electronic data that are arranged in a ‘systematic or methodical way’ where the selection of the data itself constitutes an intellectual creation. As with tables/compilations, it is the structure of the database that attracts copyright protection; the contents may or may not hold their own copyright.

The difference between a table/compilation and a database in copyright law may at first glance appear opaque. The UK Courts have, however, provided a helpful explanation of the difference in Forensic Telecommunications Services Limited v The Chief Constable of West Yorkshire Police [2011] EWHC 2892 (Ch) at [89] – [94]. A table/compilation is an independent work planned to an overall design that required the application of skill, judgement, and labour, whereas a database is a collection of electronic data that has been actively selected according to a system. Data that is collected ‘by happenstance’ (or as a purely mechanical, automatic process without substantive thought) will not be a database for the purpose of copyright.

What does this mean for web scraping? You may believe that the risk of copyright infringement is low. If your web scraping activities do not copy how the data collected was originally arranged, and the information that you collect is not of itself afforded copyright protection, then you would not be infringing copyright. Further, the CDPA (s. 29A CDPA) provides an exception for ‘text and data analysis for non-commercial research’. In other words, copies of the table/compilation or database would not infringe copyright if there was a non-commercial research purpose to that copying. This may be of use for scrapers that are not commercially exploiting that database.

However, you would need to approach your activities with caution to check that this was the case, and this analysis may not be straightforward.

Unfortunately for potential scrapers, copyright is not the sole intellectual property right in play. The EU has gone still further to provide protection to databases.

Sui generis database right and web scraping

Copyright is relevant where there has been real intellectual labour put into the table, compilation or database. But it does not protect the contents of the database (other than such content that in itself is afforded copyright protection). The Directive was undoubtedly meant to address this partial gap, and grant intellectual property rights to the data within and only as a part of a database. This sui generis (unique) right, referred to in the Regulations as a ‘database right’, may step in where copyright does not (and it may co-exist with copyright).

The definition of a database for the purpose of the database right is the same as in respect of copyright. Likewise, there is also a requirement for the creator of the database to be within the UK (or an EU state). Database right subsists where there has been a ‘substantial investment in obtaining, verifying or presenting the contents of the database’ (Reg. 13(1) of the Regulations). The key point here is that the ‘investment’ (in time, money, and/or materials) must be in the creation of the database and not in the creation of the data itself. The relevant English cases here usually concern sports and betting.

In one case, Football Dataco Ltd v Sportradar GmbH [2013] EWCA Civ 27, a host of (well paid) football analysts attended live matches and fed relevant points to other employees, who in turn entered that information into a database. This gave rise to a database right.

Conversely, in British Horseracing Board v William Hill Organisation Ltd [2005] EWCA Civ 863, resources expended in drawing up a list of horses to run in a race did not count towards the resources used to publish the final list of horses approved to run (which were minimal).

The relevant resources must be spent at the stage between the creation of the data and its inclusion in the database for the contents of that database to qualify for a database right.

The EU has implemented additional legislation that may be of assistance to web scrapers: the Digital Copyright Directive ((EU) 2019/790). This directive replicates the ‘text and data analysis for non-commercial research’ exception to copyright infringement under the CDPA, but it goes further in that ‘extractions’ from databases for ‘text and data mining’ purposes (including commercial purposes) are lawful unless the rights holder reserves its rights in the database to prevent such data mining (in its terms and conditions or website metadata). Text and data mining means an automated technique to generate information (such as identifying patterns, trends, and correlations). Accordingly, the web scraper’s activities may not automatically fall under this exception. Straightforwardly copying and reproducing the database elsewhere, for example, would likely remain an infringement of intellectual property rights. The UK declined to implement this particular exemption into law. The concern was that it would give too much scope to scrapers looking to infringe creative works. In this respect the UK position is out of step with the EU.
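As a rough illustration of checking for a machine-readable signal before scraping, the sketch below parses a site's robots.txt with Python's standard urllib.robotparser module. It rests on assumptions that are not drawn from the Directive or the case law: that the site expresses its preferences through robots.txt at all, and that the bot name, URL and file contents shown are purely hypothetical. Whether a robots.txt rule amounts to a valid reservation of rights is a legal question that this check does not answer, and the website's terms and conditions would still need to be read separately.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url.

    The robots.txt content is passed in as a string so the check can be run
    without network access; a real scraper would download it from the site first.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt: one named bot is barred from /database/,
# every other user agent is allowed everywhere.
EXAMPLE_ROBOTS = """\
User-agent: example-research-bot
Disallow: /database/

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    target = "https://example.com/database/paintings"  # hypothetical URL
    for bot in ("example-research-bot", "other-bot"):
        print(bot, may_fetch(EXAMPLE_ROBOTS, bot, target))
```

A scraper that receives False here has at least been told, in machine-readable form, that the operator does not want that path crawled by that agent. A True result says nothing about the website terms or about any intellectual property rights in the content.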

Contractual position?

It can be seen from the above that it is less than clear whether an intellectual property right arises in respect of any given electronic collection of data.

For potential web scrapers, this lack of clarity may appear helpful. But the reality is that web scrapers accessing public databases without a private licence in place will almost certainly be operating in the dark as to how the information was created. Without this knowledge, infringement will always be a risk.

There are some narrow legal exceptions available to web scrapers, in addition to the research exception under the CDPA and the exceptions under the Digital Copyright Directive. That discussion would take up another article.

So what is the position if no intellectual property rights exist?

Potentially, a website operator can include provisions in its contract with users of its website to prevent them from web scraping. However, whether such provisions would be enforceable where they are included in terms published on a website, but the web scraper does not explicitly or implicitly agree to them, is, according to the European Court of Justice, a matter for domestic law – Ryanair Ltd v PR Aviation BV (C-30/14).

The Courts in the UK have not clearly ruled on this issue (although it is possible, depending on how the website is set up – Ebury Partners Belgium SA/NV v Technical Touch BV [2022] EWHC 2927). The EU position is similarly fact specific – Content Services Ltd v Bundesarbeitskammer (C-49/11).

Web scraping – guidance

While the legal position as it pertains to intellectual property has been addressed, it remains murky for both scrapers and database owners. The issues will be highly fact specific, and the potential protection or exploitation of a database will require rigorous analysis.

Web scrapers would benefit from clarifying with a potential database owner how they might be able to use a web scraping tool lawfully. Despite the foregoing analysis, when a database owner is not actively attempting to restrict data scraping there are mechanisms that can enable lawful conduct on the part of the web scraper.

Database owners should implement clear policies around web scraping, potentially amending their existing website terms to permit or prohibit web scraping.

If such owners think that they may qualify for a database right, the best thing that they can do is to document in extensive detail the investment made in creating the database. Being able to demonstrate this to a Court in any dispute will assist a business in protecting itself from web scrapers.
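One way a database owner might keep such a record is as a structured log, with each entry tagged against the three heads of investment named in Reg. 13(1) (obtaining, verifying or presenting). The following is a minimal sketch only: the class name, field names and the figures in the usage example are all hypothetical, and what evidence a Court would actually require is a matter for legal advice rather than code.

```python
from dataclasses import dataclass
from datetime import date

# Heads of investment taken from the wording of Reg. 13(1) of the Regulations.
CATEGORIES = ("obtaining", "verifying", "presenting")


@dataclass(frozen=True)
class InvestmentRecord:
    """One dated entry describing resources spent on the database (hypothetical schema)."""
    day: date
    category: str
    description: str
    hours: float
    cost_gbp: float

    def __post_init__(self):
        # Reject entries that do not map onto one of the three statutory heads.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")


def summarise(records):
    """Total the hours and cost recorded under each head of investment."""
    totals = {c: {"hours": 0.0, "cost_gbp": 0.0} for c in CATEGORIES}
    for record in records:
        totals[record.category]["hours"] += record.hours
        totals[record.category]["cost_gbp"] += record.cost_gbp
    return totals


if __name__ == "__main__":
    # Illustrative figures only.
    log = [
        InvestmentRecord(date(2023, 1, 9), "obtaining", "Licensed third-party feed", 4.0, 1200.0),
        InvestmentRecord(date(2023, 1, 10), "verifying", "Manual cross-check of entries", 6.5, 325.0),
        InvestmentRecord(date(2023, 1, 12), "verifying", "Second review pass", 3.5, 175.0),
    ]
    print(summarise(log))
```

Keeping the creation of the underlying data out of this log, or recording it separately, reflects the distinction drawn in the cases above between investment in the database and investment in creating the data itself.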

Further Reading

Data Scraping – Navigating the Challenges Old and New (Part 1)

InsurTech – Big Data, AI and Web Scraping

Web Scraping – Legal Issues