Researchers analyzed 13,962 paragraph excerpts from 34 #OReilly books, finding that #OpenAI #GPT4o "recognized" significantly more paywalled content than older models like GPT-3.5 Turbo. The technique, known as a "membership inference attack," tests whether a model can reliably distinguish human-authored texts from paraphrased versions of the same text.
"GPT-4o [likely] recognizes, and so has prior knowledge of, many non-public O'Reilly books published prior to its training cutoff date"
https://techcrunch.com/2025/04/01/researchers-suggest-openai-trained-ai-models-on-paywalled-oreilly-books/
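
For readers wondering what such a test looks like in practice, here is a minimal sketch of the idea as described above: show a model the verbatim passage mixed in with paraphrases and count how often it picks the original. This is not the researchers' actual procedure or scoring. The function `guess_rate`, the toy passages, and the stand-in `memorizing_model` are all hypothetical, invented for illustration, and no real model API is called.

```python
import random
from typing import Callable, List, Sequence, Tuple

# (verbatim original, paraphrased versions). Toy structure, assumed for this sketch.
Quiz = Tuple[str, List[str]]


def guess_rate(
    items: Sequence[Quiz],
    choose: Callable[[List[str]], int],
    seed: int = 0,
) -> float:
    """Return the fraction of quizzes in which `choose` picks the verbatim original."""
    if not items:
        raise ValueError("need at least one quiz item")
    rng = random.Random(seed)
    hits = 0
    for original, paraphrases in items:
        options = [original, *paraphrases]
        rng.shuffle(options)  # hide the position of the original
        if options[choose(options)] == original:
            hits += 1
    return hits / len(items)


# Invented example passages, not taken from any book.
ITEMS: List[Quiz] = [
    (
        "Caching trades memory for speed.",
        [
            "A cache spends memory to gain speed.",
            "You buy speed with memory when you cache.",
            "With caching, speed is paid for in memory.",
        ],
    ),
    (
        "Locks serialize access to shared state.",
        [
            "Shared state is accessed one at a time under a lock.",
            "A lock forces access to shared state into sequence.",
            "Using locks, shared state is touched serially.",
        ],
    ),
]

# Hypothetical stand-in for a model that saw the originals during training.
SEEN = {original for original, _ in ITEMS}


def memorizing_model(options: List[str]) -> int:
    """Pick any option 'seen' verbatim; otherwise fall back to the first option."""
    for index, text in enumerate(options):
        if text in SEEN:
            return index
    return 0


CHANCE = 1 / 4  # one original among four options

if __name__ == "__main__":
    rate = guess_rate(ITEMS, memorizing_model)
    print(f"guess rate {rate:.2f} vs. chance {CHANCE:.2f}")
```

A guess rate well above chance on passages published before a model's training cutoff is the kind of signal such a test looks for. A rate near chance would suggest no prior exposure, though a real study needs far more passages and careful controls than this toy does.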
