In a statement released after OpenAI allegedly mimicked her voice without her consent for a new ChatGPT system, Scarlett Johansson said she was "shocked" and "angered".

In September of last year, Sam Altman, CEO of OpenAI, reached out to Scarlett Johansson's team to ask if she would provide an additional voice for the ChatGPT generative AI system. Johansson declined for “personal reasons”. Later that month, OpenAI introduced text-to-speech voice chat capabilities into ChatGPT with five different voice options, including one named ‘Sky’.

Fast forward nine months to May 2024, when the ChatGPT-4o demo was announced and released in a livestream from OpenAI. The improved ‘Sky’ voice model showcased in the demo “sounded so eerily similar to [Johansson] that my closest friends and news outlets could not tell the difference”, Johansson commented.

On the day of the ChatGPT-4o release, Altman posted the word “her” on the platform X (formerly known as Twitter). This was widely understood as a reference to the 2013 film ‘Her’, in which Joaquin Phoenix’s character forms an intimate relationship with an AI virtual assistant programme voiced by Johansson. Altman’s social media post fuelled speculation that the improved ‘Sky’ voice model had been deliberately chosen to sound like Johansson.

Johansson has stated that Altman contacted her agent again two days before the release, asking her to reconsider voicing the system. The demo was released before Johansson's team could respond. OpenAI has since paused the use of the ‘Sky’ voice in its products and has released a statement claiming that “Sky’s voice is not an imitation of Scarlett Johansson but belongs to a different professional actress using her own natural speaking voice”. OpenAI said that it had “cast the voice actor behind Sky’s voice before any outreach to Ms. Johansson.”

Here, we consider the legal issues arising from this dispute, including whether the commercial release of a “soundalike” voice model may breach UK data protection or intellectual property laws.

Data protection

Personal data is defined in the UK and EU General Data Protection Regulation (‘GDPR’) as any information relating to an identified or identifiable natural person. In other words, if a person can be identified, directly or indirectly, from the information then it will be personal data. If information is personal data, it is afforded certain protections under the GDPR. 

Some information is relatively straightforward to class as personal data, such as a person’s name, address or bank details. However, something more nebulous like a person’s voice can be harder to classify, even more so when it has been created using AI. 

Speech vs. voice 

An oral speech given by a person, whether delivered live or recorded, will usually constitute personal data, either because:

  • the speaker introduces themselves at the beginning, meaning the whole speech relates to an identified individual and so is subject to the GDPR; or
  • elements within the speech, when linked with other information, make it possible to infer the speaker's identity. 

Can a voice without any metadata (no author, no identified speaker, no name) constitute personal data? Further, when we’re speaking about voice, do we mean the concept of voice in the abstract, a specific vocal recording of someone’s voice or the characteristics of someone’s voice? 

In the context of the alleged recreation of Scarlett Johansson’s voice by OpenAI, it makes most sense to think of the voice as the characteristics of someone’s voice. Research into the inferences that can be made about an individual from how they speak is wide-ranging, covering biometric identity, native language, body measurements, age and gender, personality traits, physical and mental health, socioeconomic status, sleepiness, intoxication and even intention to deceive. 

It seems safe to say that if the types of information above can be inferred just from hearing a voice, the voice is a type of personal data. Depending on how it is used, processing of voice data may also amount to processing of biometric data or special category biometric data (current ICO guidance clarifies that while not all biometric data is special category data, it will be special category data if used to uniquely identify someone). Processing of such data is subject to more stringent controls under the GDPR, because it is unique to the specific person and because of the potential risks if it were misused. 

We note the current definition of biometric data under Article 4 of the GDPR does not expressly include or exclude the voice:

"… personal data resulting from specific technical processing relating to the physical, physiological or behavioural characteristics of a natural person, which allow or confirm the unique identification of that natural person, such as facial images or dactyloscopic [fingerprint] data.” 

However, by analogy with other biometric identifiers such as fingerprints, which are wholly unique to a person, the voice is likely to comprise biometric data.
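
To make the “unique identification” point concrete, the short Python sketch below shows how even crude spectral features extracted from two recordings can be compared to suggest whether they come from the same speaker. It is a toy illustration only, not a production speaker-verification system (real systems use trained speaker-embedding models), and the file names are hypothetical.

```python
# Toy illustration: compare a rough "vocal fingerprint" of two recordings.
import numpy as np
import librosa

def voice_fingerprint(path: str) -> np.ndarray:
    """Average MFCC vector: a crude summary of a speaker's vocal timbre."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)  # shape: (20, frames)
    return mfcc.mean(axis=1)

def similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: values nearer 1.0 suggest the same speaker."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical file names, for illustration only.
known = voice_fingerprint("known_speaker.wav")
unknown = voice_fingerprint("unlabelled_clip.wav")
print(f"similarity: {similarity(known, unknown):.3f}")
```

The point is legal rather than technical: if a short, unlabelled recording can be matched to a known speaker by processing of this kind, it “allows or confirms the unique identification” of that person in the Article 4 sense.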

When is a person’s voice not personal data?

Under Recital 26 of the GDPR, there are two measures controllers can take to mitigate the risks associated with processing personal data: pseudonymisation and anonymisation. Pseudonymisation involves replacing or removing certain information within the data that is capable of identifying the individual. However, pseudonymisation is seen only as an extra layer of protection; personal data that has been pseudonymised is still considered personal data, and remains within the scope of the GDPR. On the other hand, if data is anonymised to the point that it no longer relates to an identified or identifiable natural person, then it will no longer be considered personal data and the GDPR will not apply to it. “Anonymisation” refers to ensuring the data can no longer be linked to an identified or identifiable person; how this is achieved will differ with the circumstances of each case. As long as voice data is altered sufficiently that it does not relate to an identified or identifiable natural person, using it in this way would be permissible, though note that the process of anonymising the data itself amounts to processing personal data.  

Further, true anonymisation, with the effect of the data no longer being personal data, is difficult to achieve in practice. A voice is unique to the individual and even with all direct identifiers (like names or addresses) removed, it is likely to still be considered personal data on the basis that people are often recognisable from their voices alone. Removing all direct identifiers will have the effect of pseudonymising the data and so reducing the risk of processing, but by its nature, it will be very difficult to fully anonymise a voice. 
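
As a rough illustration of why this is so hard, the sketch below strips identifying metadata by re-encoding the raw samples and disguises the voice with a pitch shift, then checks whether the recording still matches the original speaker. It reuses the voice_fingerprint and similarity helpers from the sketch above; the file names and the four-semitone shift are illustrative assumptions.

```python
# Illustration: naive voice "disguise" is pseudonymisation, not anonymisation.
import librosa
import soundfile as sf

y, sr = librosa.load("original_speech.wav", sr=16000)
y_shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=4)  # raise the pitch
sf.write("disguised_speech.wav", y_shifted, sr)  # raw samples only, no metadata

# Spectral features often remain correlated with the original speaker even
# after such transformations, so the "disguised" file may still be personal data.
print(similarity(voice_fingerprint("original_speech.wav"),
                 voice_fingerprint("disguised_speech.wav")))
```

If the similarity remains high, the transformation has merely pseudonymised the data: the risk of identification is reduced, but the GDPR still applies.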

It is also worth noting that personal data is defined only as information relating to an identifiable living person. Information relating to someone deceased is no longer considered personal data, and so is not subject to data protection law, at least from a UK perspective (although in some other jurisdictions, local laws do contain specific requirements for data relating to deceased individuals; the French Digital Republic Act, for example, allows data subjects to provide guidelines regarding their data in the event of their death, i.e. for it to be erased). 

While there are still ethical considerations and reputational risks associated with the use of deceased individuals’ data for commercial purposes, this creates a potential gap in data protection law: the voices of deceased people can, for example, be utilised by generative AI providers to create “replicas” which, as AI advances, will become ever more convincing. Such digital clones have already been used in Hollywood, with ongoing litigation concerning rights in relation to the likeness of Star Wars actor Peter Cushing, which we have discussed here.

While exciting for fans of late celebrities, or even in some cases for grieving loved ones, this raises predictable ethical dilemmas: a person's vocal likeness being used without consent, the possibility of impersonation scams by bad actors, and bigger existential debates over whether it is in fact desirable for a facsimile of someone to live on after death. 

Intellectual Property

We have previously discussed that UK intellectual property law does not provide a free-standing general right for a person to control the reproduction or replication of their voice or vocal characteristics. Instead, someone seeking to prevent an unauthorised replication of their voice will have to piece together protection from a patchwork of laws, including IP laws such as passing off, copyright or performers’ rights or through other forms of protection such as defamation. 

For example, in the UK the doctrine of passing off could be used by celebrities with significant reputation or goodwill in their voice to prevent an AI-generated replication of their voice, or a soundalike imitation, being used without permission to endorse or advertise a particular product. There have been no decided UK cases on using passing off to protect a voice, but to succeed a celebrity would likely need a distinctive voice. They would also need to show that mimicking or replicating that voice amounted to a misrepresentation which deceived the public into thinking that there is a connection between the celebrity and the defendant, and that this caused the celebrity damage. The requirement for goodwill is likely to restrict this form of intellectual property protection to celebrities or others with a sufficiently high public profile.

While the law of passing off does not exist in the same form in the US, there have been a handful of US cases in which singers have successfully brought similar actions against companies that used soundalike vocal performances in advertising and false endorsement. For example, in the 1980s the Ford car company approached singer Bette Midler to use a recording of her performance of the song ‘Do You Want to Dance’ in an advertisement. Midler refused, and Ford hired a soundalike singer who re-recorded the song for the commercial. Midler successfully invoked US publicity law to protect the unique sound of her voice. Similarly, in the 1990s singer Tom Waits successfully sued snack company Frito-Lay and its advertising agency over a radio advert that used a vocal impersonator to imitate Waits’ voice and his song “Step Right Up”. As we have discussed, last year also saw singer Rick Astley seek to extend US protection for voices beyond the advertising context, bringing an action for vocal impersonation against rapper Yung Gravy over the recreation of Astley’s voice on the track ‘Betty (Get Money)’; however, that dispute was settled before trial.

It is generally accepted that voice or vocal characteristics are not protected directly under UK copyright law. While copyright protection was recently extended to a TV character from the show Only Fools and Horses, it is unlikely that vocal characteristics alone would meet the copyright requirements of being “the author’s own intellectual creation” and being fixed in a form which makes those characteristics “identifiable with sufficient precision and objectivity”. More complex issues arise under copyright and performers’ rights law where an AI tool has been trained on actual recordings of a celebrity’s voice, which are then used to create an AI model that can imitate that celebrity’s vocal characteristics. Current litigation in the UK between Getty Images and Stability AI considers whether training an image-generation AI model on copyright works is consistent with UK copyright law, and the eventual outcome of that case may provide further clarity on copyright and the training of AI models.
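
To illustrate how low the technical barrier to such imitation has become, the sketch below shows few-shot voice cloning with an off-the-shelf open-source model. It assumes the open-source Coqui TTS library and its XTTS v2 model (our choice for illustration; nothing suggests OpenAI used this tool), and the file names are hypothetical. A few seconds of reference audio are enough to condition the output voice, which is exactly why the provenance of the training recordings matters.

```python
# Few-shot voice cloning with an open-source model (illustrative only).
# pip install TTS
from TTS.api import TTS

# Load a pretrained multilingual voice-cloning model.
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# A short reference clip conditions the model on the target speaker's voice.
tts.tts_to_file(
    text="This is not my voice.",
    speaker_wav="reference_clip.wav",  # hypothetical consented sample
    language="en",
    file_path="cloned_output.wav",
)
```

Whether such reference recordings were lawfully obtained, and whether the output voice is identifiable as a real person, are precisely the data protection and intellectual property questions discussed above.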

I have no voice and I must scream

The use of AI vocal likenesses is a concern raised by artists who rely on their voice for employment. For example, last year we wrote about the “Stop AI stealing the show” campaign by Equity, the UK performing arts and entertainment trade union, and also covered the AI voice agreement made on behalf of professional voice-over artists by the US SAG-AFTRA organisation with AI voice technology company, Replica Studios. 

More recently, the US state of Tennessee passed a law considered to be the first of its kind, aimed at deterring the unauthorised use of an individual's voice by AI. The Ensuring Likeness Voice and Image Security (ELVIS) Act comes into effect on 1 July and updates Tennessee’s Protection of Personal Rights law to protect the voices of songwriters, performers and music industry professionals from misuse via AI systems. The Act explicitly protects both the sound of an individual’s actual voice and a simulation of their voice, and the protection applies both during the individual’s lifetime and for 10 years after their death. The Act is also notable in that it covers both commercial exploitation of a voice and unauthorised non-commercial uses.

The Sky voice was, of course, not Johansson's voice. However, it raises the question: whether or not the voice was purposefully made to sound like Johansson, if it mimics her voice so closely that the consensus is it could be hers, should personal data protections apply? And what would the limits of such protection be? 

Summary

With the increased availability of AI tools able to create realistic vocal imitations from short recordings of speech, pressure for legislative attention to the issue is likely to mount, especially on the next UK government after the election this July. Labour, currently the favourites in the race, have confirmed that they will reconsider how to regulate AI if they win. 

It’s worth noting that the current Leader of the Opposition, Sir Keir Starmer, has also fallen victim to AI’s increased capabilities: two deepfake audio clips of Starmer were released on the first day of the Labour Party conference back in October 2023. With the threats that AI-generated vocal imitations pose to many walks of life becoming clearer, it remains to be seen whether and how new legislation can tackle these risks. As Johansson herself has recently stated, the dispute highlights our vulnerability to AI vocal impersonation; she is “supporting the passing of legislation to protect everybody's individual rights”.