The New York Times is suing OpenAI and Microsoft for copyright infringement, claiming the two companies built their AI models by “copying and using millions” of the publication’s articles and now “directly compete” with its content as a result.

As outlined in the lawsuit, the Times alleges OpenAI and Microsoft’s large language models (LLMs), which power ChatGPT and Copilot, “can generate output that recites Times content verbatim, closely summarizes it, and mimics its expressive style.” This “undermine[s] and damage[s]” the Times’ relationship with readers, the outlet alleges, while also depriving it of “subscription, licensing, advertising, and affiliate revenue.”

The complaint also argues that these AI models “threaten high-quality journalism” by hurting the ability of news outlets to protect and monetize content. “Through Microsoft’s Bing Chat (recently rebranded as ‘Copilot’) and OpenAI’s ChatGPT, Defendants seek to free-ride on The Times’s massive investment in its journalism by using it to build substitutive products without permission or payment,” the lawsuit states.

The full text of the lawsuit can be found here.

  • CJOtheReal@ani.social · 11 months ago

    It’s not piracy to just web-scrape everything for data…

    There isn’t a person sitting around pirating shit; it’s an algorithm that takes everything from the internet it can reach.

    • HarkMahlberg@kbin.social · 11 months ago

      Yeah… That’s not a good defense if you think about it. If someone made a Reddit comment with the entire contents of Discworld (idk, just an example), and OpenAI scraped all of Reddit to train their model, then they’ve used copyrighted material without paying for a commercial license, and now they’re on the hook. By being unscrupulous about their scraping, they actually open themselves up to more liability than if they were more careful about what they scrape and where.

      This is all to say nothing of the fact that several other major companies were caught with their pants down training on datasets explicitly created by torrenting a ton of books.

      https://torrentfreak.com/authors-accuse-openai-of-using-pirate-sites-to-train-chatgpt-230630/

      There is no direct evidence that OpenAI used pirate sites to train ChatGPT. That said, it is no secret that some AI projects have trained on pirated material in the past, as an excellent summary from Search Engine Journal highlights.

      The mainstream media has picked up this issue too. The Washington Post previously reported that the “C4 data set,” which Google and Facebook used to train their AI models, included Z-Library and various other pirate sites.