While seventeen American writers represented by the Authors Guild, including Jonathan Franzen and John Grisham, filed a complaint in September against OpenAI and its conversational agent, ChatGPT, for copyright infringement, an English team at Imperial College London has found a way to detect whether a literary or scientific text was seen by a language model during its training.
As a reminder, a language model is the software that allows a chatbot to respond to or interact with a human in natural language: the conversational agent produces grammatically correct sentences, adapts its style, creates original utterances, and so on. These capabilities are acquired through a fairly “brute-force” training method, which consists of making the model guess the next word in a sentence taken from an enormous corpus of texts amounting to trillions of “tokens” (subword units such as syllables, prefixes and suffixes). These texts come from web pages, forums, scientific articles, books and newspaper articles, most of them likely protected by copyright.
Few players disclose the details of this corpus, not even those whose language models are described as open source. OpenAI does not release this information. Meta did so for Llama, but not for Llama 2. Google, for Bard, has been no more forthcoming…
Can we, despite this lack of transparency, read the “brains” of these algorithms, which are made up of billions of parameters? Can we know what they have and have not read? The English team says yes. “We were motivated by the idea of making this aspect of language models less opaque, because what they know comes precisely from this data,” explains Yves-Alexandre de Montjoye, associate professor at Imperial College.
An opaque training corpus
The researchers carried out a so-called “membership inference” attack on a large language model, Llama, from Meta, or more precisely on an identical version, OpenLlama, whose training corpus has been made public. This made it possible to validate the researchers’ predictions, which are presented in a preprint (an article not yet accepted by a scientific journal) submitted to a conference on October 23.
The researchers first selected their own corpus of books (38,300 of them) and scientific articles (1.6 million), drawn from the RedPajama database on the Hugging Face platform. Each of these families was divided into two groups, “possible member of the training corpus” or “non-member” (because dated after OpenLlama was trained). For each token in these texts, they tested the language model by examining which word it suggests after a sequence of approximately 128 tokens and what probability it assigns to the real word. These gaps between the model and reality, measured over thousands of sentences, make it possible to build a kind of signature for each book or article. “In fact, we are looking to see if the model is ‘surprised’ by a text,” summarizes Yves-Alexandre de Montjoye. In a second step, they built a program capable of classifying a text as “member of the training corpus” or “non-member”, training it on the results obtained for the two types of text. These calculations take approximately one minute per book of approximately 100,000 tokens.
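To make the idea of a model being “surprised” more concrete, here is a minimal sketch in Python. It is not the researchers’ code: the `BigramModel` class is a toy stand-in for a large language model, whole words stand in for tokens, the previous word stands in for the 128-token context, and the average surprise with a single threshold is a simplified assumption in place of the signature and classifier described above. All names and example sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict


class BigramModel:
    """Toy stand-in for a language model (an assumption, not OpenLlama).

    It predicts the next word from the previous word only.
    """

    def __init__(self, training_texts):
        self.counts = defaultdict(Counter)
        self.vocab = set()
        for text in training_texts:
            tokens = text.split()
            self.vocab.update(tokens)
            for prev, nxt in zip(tokens, tokens[1:]):
                self.counts[prev][nxt] += 1

    def prob(self, prev, nxt):
        """Probability assigned to the real next word, with add-one smoothing."""
        following = self.counts.get(prev, Counter())
        total = sum(following.values())
        return (following[nxt] + 1) / (total + len(self.vocab) + 1)


def mean_surprise(model, text):
    """Average of -log p(real next word) over the text: higher means more surprised."""
    tokens = text.split()
    surprises = [-math.log(model.prob(p, n)) for p, n in zip(tokens, tokens[1:])]
    return sum(surprises) / len(surprises)


def fit_threshold(model, member_texts, non_member_texts):
    """Simplified 'classifier': the midpoint between the two groups' mean scores."""
    members = [mean_surprise(model, t) for t in member_texts]
    non_members = [mean_surprise(model, t) for t in non_member_texts]
    return (sum(members) / len(members) + sum(non_members) / len(non_members)) / 2


def is_member(model, text, threshold):
    """Classify a text as 'member' when the model is less surprised than the threshold."""
    return mean_surprise(model, text) < threshold


# Hypothetical data: three texts the toy model was trained on, two it never saw.
seen_texts = [
    "the cat sat on the mat",
    "the dog ran in the park",
    "the cat ran on the mat",
]
unseen_texts = ["a bird flew over the lake", "my fish swims under water"]

model = BigramModel(seen_texts)
# Fit the threshold on labelled examples, keeping one seen text aside for checking.
threshold = fit_threshold(model, seen_texts[:2], unseen_texts)

held_out_member = is_member(model, seen_texts[2], threshold)
fresh_text = is_member(model, "green ideas sleep furiously tonight", threshold)
print(held_out_member, fresh_text)
```

On this toy data, a text the model was trained on gets a lower average surprise than a text it never saw. With a real language model the difference is far subtler, which is presumably why the researchers aggregate results over thousands of sentences per document and train a dedicated classifier rather than rely on a single threshold.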