
OpenAI inks deal to train AI on Reddit data | TechCrunch

OpenAI has reached an agreement with Reddit to use the social news site’s data for training AI models.

In a blog post on OpenAI’s press relations site, the company said the Reddit partnership will give it access to “real-time, structured and unique content” from Reddit, such as posts and replies, allowing its tools and models to better understand and showcase that content. Reddit content will be incorporated into ChatGPT, OpenAI’s popular conversational AI, and the companies will work together to bring unspecified new “AI-powered features” to both Reddit users and moderators.

OpenAI will also become a Reddit advertising partner.

“Reddit will build on OpenAI’s platform of AI models to bring its powerful vision to life,” OpenAI wrote in the post. “The use of ML and AI allows Reddit to improve the user experience for everyone.”

OpenAI has a number of similar licensing deals with content providers ranging from stock media libraries to news publishers. But what’s unusual here is that OpenAI CEO Sam Altman has an 8.7% stake in Reddit, making him the third-largest shareholder, and was once a member of the company’s board of directors.

In an effort to preempt scrutiny, OpenAI stated in its press release that, while Altman remains a Reddit shareholder, the partnership was led by OpenAI’s COO [Brad Lightcap] and approved by “[OpenAI’s] independent Board of Directors.” (I will note here that Altman is a member of OpenAI’s board; he recused himself from this decision, however, an OpenAI spokesperson tells TechCrunch.)

Reddit has made data licensing agreements a central part of its growth strategy as it navigates the market as a public company.

In its IPO prospectus, Reddit revealed that its contractual agreements to license data to customers, including Google, are worth more than $200 million in total. And, in its first earnings report as a public company, Reddit reported a 450% year-over-year increase in non-advertising revenue, largely due to those deals.

Reddit stock was up 11% in extended trading after the OpenAI deal was announced.

“The paradox I see is that, as more content on the Internet is written by machines, the premium on content coming from real people is increasing,” Reddit CEO Steve Huffman said during the company’s earnings call in March. “And we’ve had almost two decades of authentic conversations.”

Reddit’s platform, which has more than 1 billion posts and 16 billion comments (figures that grow every day thanks to its millions of active users), is a goldmine for generative AI companies, whose models learn from examples of content like text and images to generate new, similar content.

But the company may face pushback from users who are concerned about how it is monetizing their data.

It is instructive to look at Stack Overflow, a question-and-answer forum for software developers, which recently entered into an agreement with OpenAI to supply data for OpenAI’s model training. In protest, some users removed their top-rated answers to questions on the community. But Stack Overflow restored the deleted posts and banned those users, claiming they were not in compliance with its terms of service.

Reddit has already expressed its displeasure over an effort to give Reddit users more control over their data.

Vana, a startup building on blockchain, is attempting to launch a data “DAO” (decentralized autonomous organization) to let Reddit users pool their data and decide together how that combined data is used (or sold). Reddit banned Vana’s subreddit dedicated to discussion about the DAO and, in a statement to TechCrunch, accused the company of “exploiting” its data export controls.
