The copyright landscape around using third-party data and materials to train generative AI models remains thorny territory.

On the one hand, AI developers require vast amounts of high-quality, often copyright-protected material to train their increasingly capable AI models. On the other, the widespread scraping and ingestion of copyright-protected materials for AI training purposes (often without AI companies having obtained any licences or consents) has left creatives, publishers and other rightsholders up in arms.

Back in June 2023, the UK Intellectual Property Office (UKIPO) embarked on a mission to develop a voluntary ‘code of practice’ on copyright and AI in an attempt to bridge this gap – with a central aim of “making licences for data mining more available” and helping to “overcome barriers that AI firms and users currently face, and ensure there are protections for rights holders”.  However, the attempts to broker a deal between creative industry stakeholders and AI companies failed, leaving the issue of copyright and AI training firmly unresolved. 

Parliamentary select committees have waded into the debate in recent months:

  • The Science and Technology Select Committee said that a key challenge was that “some AI models and tools make use of other people's content: policy must establish the rights of the originators of this content, and these rights must be enforced.”
  • The Culture, Media and Sport Committee reported on creator remuneration. It recommended that the UK government ensure creators have proper mechanisms to enforce their consent and to receive fair compensation for the use of their work by AI developers, and that it stand ready to legislate on the matter.
  • The House of Lords Communications and Digital Committee, in its report on its inquiry into Large Language Models, supported the position that tech firms should not use copyright-protected works without permission or compensation, and that these firms should seek licences and provide transparency for rightsholders.

Various court claims have been issued in the US and UK to litigate this issue of copyright infringement and AI training. Getty Images filed lawsuits in the UK and US against Stability AI (the company behind Stable Diffusion) back in February 2023 for the alleged unlawful copying of Getty Images' image library to train the AI system. Similarly, the New York Times launched a high-profile legal claim against OpenAI in the US in December 2023 for the alleged copying of its vast catalogue of online content, after negotiations broke down between the parties over the "fair market value" of licence fees owed.

While these issues make their way through the courts, and the UK's new Labour government grapples with how to square the need to protect the creative sector with a pro-innovation approach to AI, two separate organisations have written to major AI companies to assert that their members do not authorise the use of any of their copyright-protected works in relation to the training, development or operation of generative AI systems.

Creators' Rights Alliance

The Creators’ Rights Alliance (CRA) acts on behalf of major creator-led groups, trade associations, and unions.  It has written to AI developers about the unauthorised use of its members’ works to develop generative AI models.

It says its members welcome new and innovative technologies. However, it is concerned that AI technology is accelerating and being implemented at pace, without enough consideration of issues around ethics, accountability, and the economics of creative human endeavour. 

It calls on developers of all AI systems, and especially developers of generative AI systems:

  • To provide full transparency about the works which have been used to develop their models;
  • To make detailed requests for any works they seek to use in future; 
  • To obtain authorisation (in advance) from the relevant creator and rightsholder, and where a rightsholder is licensing a catalogue of works to seek assurances that the creators of those works have specifically consented to the licensing arrangement;
  • To offer appropriate remuneration for all uses - past and future;
  • To give appropriate attribution to all creators concerned with the work, in all cases; 
  • To engage in good-faith licensing negotiations to redress past practices, to remove from their systems (including datasets and programs) any copyright-protected works (including but not limited to literature, images, music, and performances) which have been used without authorisation, and to provide evidence of such removal; 
  • To respect the fact that creators and/or their representatives may, on ethical and/or economic grounds, withhold their consent to use their work.

The CRA also says that it encourages all creators to make use of the Intellectual Property Enterprise Court to combat infringement, and signposts them to the small claims track option, which should be accessible to individual creators running a small business. 

Society of Authors

The Society of Authors (SoA) represents writers, scriptwriters, illustrators, and literary translators. It has written to AI developers urging them to agree terms on a commercial basis with respective rightsholders, given that licensing opportunities exist and are being developed. It requests that AI developers:

  • identify the works which have been used to date to develop their AI models;
  • undertake to make suitably detailed requests for permission to use any of SoA members’ works in the future; 
  • undertake that, before using any copyright-protected works, AI developers will first obtain permission from the relevant rightsholder; 
  • undertake to pay appropriate remuneration for all uses of copyright-protected works, both past and future;
  • undertake to give appropriate attribution to the author of the work in all cases; and
  • undertake that, on request (whether general or in relation to a specific work), AI developers will remove any work which has been used without permission from their systems and will provide evidence of compliance.

Are these letters a ‘reservation of rights’ in the context of TDM?

The letters issued by the CRA and SoA likely reflect the central role of Text-and-Data Mining (TDM) in the ongoing debate over the use of copyright-protected works to train generative AI systems.

TDM, which involves the automated crawling and analysis of large volumes of text and data from the internet, is a critical component of AI development. Developers of the most advanced AI tools on the market have carried out vast amounts of TDM, relying heavily on repositories of web-crawl data to train the Large Language Models (LLMs) that underpin their AI tools.
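Purely by way of illustration (and not as a depiction of any particular developer's pipeline), the "analysis" half of TDM can be sketched in a few lines of Python: stripping markup from crawled pages and aggregating word statistics. Real training pipelines operate at web scale, often over repositories such as Common Crawl, but the basic shape is the same.

```python
# Toy sketch of text-and-data mining: extract visible text from
# crawled HTML pages and aggregate word frequencies. Illustrative
# only; production pipelines add deduplication, filtering, etc.
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the visible text content of an HTML page."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def mine_pages(pages):
    """Tokenise each crawled page and aggregate word frequencies."""
    counts = Counter()
    for html in pages:
        extractor = TextExtractor()
        extractor.feed(html)
        text = " ".join(extractor.chunks)
        counts.update(word.lower() for word in text.split())
    return counts


# Two hypothetical crawled pages, stand-ins for web-scale input.
crawled = [
    "<html><body><p>Copyright and AI training</p></body></html>",
    "<html><body><p>AI models need training data</p></body></html>",
]
frequencies = mine_pages(crawled)
print(frequencies["ai"])  # → 2 (appears once in each page)
```

The legally contested step is upstream of this snippet: the reproduction of the works themselves when they are fetched and stored for processing.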

Under EU law, Article 4 of the Copyright in the Digital Single Market Directive 2019 (DSM Directive) permits acts of “reproductions and extractions of lawfully accessible works and other subject matter for the purposes of text and data mining” (not limited to any particular purpose) except where rightsholders have expressly reserved their rights in an appropriate manner.
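Article 4(3) of the DSM Directive contemplates that, for content made publicly available online, an "appropriate" reservation includes the use of machine-readable means. In practice, many publishers attempt this via robots.txt directives aimed at known AI crawlers; GPTBot (OpenAI) and CCBot (Common Crawl) are publicly documented crawler user-agents. Whether such directives are legally effective as an Article 4 reservation remains untested, so the snippet below is illustrative only:

```text
# Illustrative robots.txt blocking known AI training crawlers.
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```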

The coordinated move by these two organisations may serve as an express reservation of their members' rights under Article 4 of the DSM Directive – asserting that their members do not authorise the use of their works for TDM by AI companies without explicit permission, thereby reinforcing the importance of obtaining licences and ensuring fair compensation.

Conclusion

This development highlights the growing tension between the need for vast datasets to fuel AI innovation and the rights of creators to control and benefit from the use of their works. As legal proceedings continue and policymakers grapple with these issues, it is clear that a balanced approach is needed: one that fosters technological advancement while safeguarding the rights and livelihoods of creators. The letters from the Society of Authors and the Creators' Rights Alliance serve as a reminder that the creative community is vigilant and prepared to defend its rights in the evolving digital landscape.