How Much Are AI Companies Paying for News Content?
How news publishers and AI companies are shaping the future of news reporting with AI
AI Mastery Digest Newsletter
News publishers are striking deals with AI companies like OpenAI that let those companies train AI models on their news stories. But how much are the AI companies paying for that content?
According to The Information, OpenAI pays between $1 million and $5 million a year to news publishers for the right to use their articles to train its AI models. This is one of the first clues we have about how much AI companies are willing to spend on licensed content. It comes along with another report that says Apple wants to work with media companies to use their content for AI training and is offering at least $50 million over several years for the data. The Verge asked OpenAI to confirm these numbers.
These numbers seem similar to some older deals that were not related to AI. When Meta started the Facebook News tab (which is no longer available in Europe), it reportedly paid up to $3 million a year to news publishers for the right to show their stories, headlines, and previews. It is unclear, though, whether these payments match some of the larger deals we have seen. For example, Google said in 2020 that it would spend $1 billion in total to work with news publishers. And after a new law forced it to do so, Google also agreed to pay Canadian publishers $100 million a year in total for the right to link to their articles.
Most of the large language models we have today were trained on data from the internet, at least to the extent that their training data has been disclosed. Some AI developers don't say how they got their training data, but we can often find out which datasets or web crawlers they used. The price of a training dataset depends on who provides it, how big it is, and what kind of data it contains. Some datasets, like those from LAION, are open and free, and are used by models like Stable Diffusion. AI developers also often use web crawlers to collect data from the internet to help train their models. (But AI developers still have to pay people to check, label, and sometimes fix the training data, which adds a lot to their costs.)
But this way of doing things is now facing some big problems. For one thing, some companies, like The New York Times and Vox Media, which owns The Verge, have blocked OpenAI's GPTBot crawler from collecting their data. For another, some organizations say that training on their data is illegal. The New York Times and others have sued OpenAI and Microsoft for copyright infringement, saying that ChatGPT and Microsoft's Copilot can produce output that is almost identical to their work.
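For readers curious how a publisher can block a crawler in practice, here is a minimal sketch. It assumes the block is expressed through a robots.txt file, the standard way a site tells crawlers what they may fetch, and that OpenAI's crawler identifies itself as GPTBot. The robots.txt contents, the example.com URL, the "ExampleBot" name, and the can_crawl helper are all hypothetical and are not taken from any real publisher.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example publisher (not copied from a real site).
# It turns away GPTBot everywhere and leaves the site open to other crawlers.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder article URL used only for illustration.
ARTICLE_URL = "https://news.example.com/2024/01/some-story"

gptbot_allowed = can_crawl(ROBOTS_TXT, "GPTBot", ARTICLE_URL)
other_allowed = can_crawl(ROBOTS_TXT, "ExampleBot", ARTICLE_URL)

print("GPTBot allowed:", gptbot_allowed)
print("ExampleBot allowed:", other_allowed)
```

Note that robots.txt is a request rather than an enforcement mechanism: it only works when the crawler chooses to honor it.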
Forming partnerships lets AI companies avoid these problems, and it has become more common in the last year. Publishers like Axel Springer (which owns Politico and Business Insider) and The Associated Press have made deals with OpenAI that let it use their stories to train models like GPT-4 and build technology for news reporting.
OpenAI and Apple are not the only AI developers who want to work with news publishers. Google showed an AI tool called Genesis to executives from The New York Times, The Wall Street Journal, and The Washington Post. The tool can take in facts and turn them into news stories. Some news publishers have also used content-generating AI tools in their newsrooms, with mixed results.