Artists may “poison” AI models before Copyright Office can issue guidance

Copyright Office to recommend protections for works used to train AI in 2024.

Artists have spent the past year fighting companies that have been training AI image generators—including popular tools like the impressively photorealistic Midjourney or the ultra-sophisticated DALL-E 3—on their original works without consent or compensation. Now, the United States has promised to finally get serious about addressing the copyright concerns raised by AI, President Joe Biden signaled in his much-anticipated executive order on AI, which was signed this week.

The US Copyright Office had already been seeking public input on AI concerns over the past few months through a comment period ending on November 15. Biden's executive order clarified that, following this comment period, the Copyright Office will publish the results of its study. Then, within 180 days of that publication—or within 270 days of Biden's order, "whichever comes later"—the Copyright Office's director will consult with Biden to "issue recommendations to the President on potential executive actions relating to copyright and AI."

"The recommendations shall address any copyright and related issues discussed in the United States Copyright Office’s study, including the scope of protection for works produced using AI and the treatment of copyrighted works in AI training," Biden's order said.

That means that potentially within the next six to nine months (or longer), artists may have answers to some of their biggest legal questions, including a clearer understanding of how to protect their works from being used to train AI models.

Currently, artists do not have many options to stop AI image makers—which generate images based on user text prompts—from referencing their works. Even companies like OpenAI, which recently started allowing artists to opt out of having works included in AI training data, only allow artists to opt out of future training data. Artists can't opt out of training data that fuels existing tools because, as OpenAI says:
After AI models have learned from their training data, they no longer have access to the data. The models only retain the concepts that they learned. When someone makes a request of a model, the model generates output based on its understanding of the concepts included in the request. It does not search for or copy content from an existing database.
According to The Atlantic, this opt-out process—which requires artists to submit requests for each artwork and could be too cumbersome for many artists to complete—leaves artists stuck with only the option of protecting new works that "they create from here on out." It seems like it's too late to protect any work "already claimed by the machines" in 2023, The Atlantic warned.

And this issue clearly affects a lot of people. A spokesperson told The Atlantic that Stability AI alone has fielded “over 160 million” opt-out requests for upcoming training.

Until federal regulators figure out what rights artists ought to retain as AI technologies rapidly advance, at least one artist—cartoonist and illustrator Sarah Andersen—is advancing a direct copyright infringement claim against Stability AI, maker of Stable Diffusion, another remarkable AI image synthesis tool. Andersen, whose proposed class action could impact all artists, has about a month to amend her complaint to "plausibly plead that defendants’ AI products allow users to create new works by expressly referencing Andersen’s works by name." A judge recommended that step if Andersen wants "the inferences" in her complaint "about how and how much of Andersen’s protected content remains in Stable Diffusion or is used by the AI end-products" to "be stronger."

In other words, under current copyright laws, Andersen will likely struggle to win her legal battle if she fails to show the court which specific copyrighted images were used to train AI models and to demonstrate that those models used those specific images to spit out art that looks exactly like hers. Citing specific examples will matter, one legal expert told TechCrunch, because arguing that AI tools mimic styles likely won't work, since "style has proven nearly impossible to shield with copyright."

Andersen's lawyers told Ars that her case is "complex," but they remain confident that she can win, possibly because, as other experts told The Atlantic, she might be able to show that "generative-AI programs can retain a startling amount of information about an image in their training data—sometimes enough to reproduce it almost perfectly." But she could fail if the court decides that using data to train AI models is fair use of artists' works, a legal question that remains unsettled.

Stronger copyright laws could favor Big Tech

The last thing that AI companies want to do at this point of peak popularity for image generators is retrain their AI models, but there is a chance that the government could one day require just that. Beyond Biden pushing to deliver much-needed copyright guidance next year, the Federal Trade Commission may have the power to intervene. Andrew Burt, a founder of an AI-focused law firm called BNH.ai, told TechCrunch that the FTC is "already pursuing" what it calls “algorithmic disgorgement”—which is where the FTC "forces tech firms to kill problematic algorithms along with any ill-gotten data that they used to train them." It seems possible, then, that should artists suing AI makers win, or should the Copyright Office provide such a recommendation, enforcers could one day order AI image makers to retrain models using only permitted, licensed data.

While stronger copyright protections like that may sound like a win for artists, they may not be best for the field of AI. The Atlantic reported that companies adopting more "artist-friendly alternatives" to opting out—like revenue sharing or requiring artists to opt in before their works can be used to train AI models—could end up squeezing out small companies and largely benefiting only the biggest tech companies that can afford those alternatives. That could concentrate even more power over AI in the hands of the largest tech companies, potentially limiting innovation while deepening the pockets of AI companies fighting legal claims.

As regulatory machines churn, mulling how best to address AI copyright concerns, artists expect that it could take years before laws change, Bloomberg reported. In the meantime, some artists who fear being replaced by robots are doing what they do best: They're getting creative.

Rather than rely on opting out of future AI training data sets—or, as OpenAI recommends, blocking AI makers' web crawlers from accessing and scraping their sites in the future—artists are figuring out how to manipulate their images to block AI models from correctly interpreting their content. One service released in August, called Glaze, slightly modifies the pixels in an artist's images to trick the AI model into seeing a different art style. And the maker of that tool, a professor at the University of Chicago named Ben Zhao, is releasing another tool to help artists scramble how AI models interpret their works even further.

Nightshade, which Bloomberg said will be released in the coming weeks, alters artists' images with the intent to corrupt and destroy AI training models, as Ars reported last week. It works by tricking AI models into misidentifying objects in images, and the more poisoned images that end up in the training data, the more confused the AI model theoretically becomes.

"The point of this tool is to balance the playing field between model trainers and content creators," Zhao told Ars in a statement. "Right now model trainers have 100 percent of the power. The only tools that can slow down crawlers are opt-out lists and do-not-crawl directives, all of which are optional and rely on the conscience of AI companies, and of course none of it is verifiable or enforceable and companies can say one thing and do another with impunity. This tool would be the first to allow content owners to push back in a meaningful way against unauthorized model training."
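In broad strokes, tools like Glaze and Nightshade belong to a family of techniques known as adversarial perturbations: pixels are shifted by a tiny, bounded amount so that a model's feature extractor reads the image differently, while a human viewer sees essentially the same picture. The sketch below illustrates that general idea only; the toy linear "feature extractor," the cloak function, and every parameter in it are illustrative assumptions, not the actual algorithms behind either tool, which target real image encoders.

```python
# Minimal sketch of a "cloaking" perturbation: move an image's features toward
# those of a different style while keeping each pixel change within a small budget.
# This is an illustration of the general technique, not Glaze's or Nightshade's method.
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "feature extractor": a fixed random linear map from pixels to a small
# feature vector. (Real tools attack deep image encoders; this keeps the math transparent.)
H, W, C = 64, 64, 3
D = 32
projection = rng.normal(size=(H * W * C, D)) / np.sqrt(H * W * C)

def features(image):
    """Map an H x W x C image (values in [0, 1]) to a D-dimensional feature vector."""
    return image.reshape(-1) @ projection

def cloak(image, target_features, epsilon=8 / 255, steps=50, lr=0.5):
    """Nudge `image` within an L-infinity budget `epsilon` so its features
    move toward `target_features` (e.g., the features of a different art style)."""
    perturbed = image.copy()
    for _ in range(steps):
        diff = features(perturbed) - target_features
        # Gradient of 0.5 * ||features(x) - target||^2 with respect to the pixels.
        grad = (projection @ diff).reshape(image.shape)
        perturbed = perturbed - lr * grad
        # Keep the change imperceptibly small and pixel values valid.
        perturbed = np.clip(perturbed, image - epsilon, image + epsilon)
        perturbed = np.clip(perturbed, 0.0, 1.0)
    return perturbed

# Toy usage: an "artwork" and an image in some other style, both random here.
artwork = rng.random((H, W, C))
other_style = rng.random((H, W, C))

cloaked = cloak(artwork, features(other_style))
print("max pixel change:", np.abs(cloaked - artwork).max())  # stays within epsilon
print("feature gap before:", np.linalg.norm(features(artwork) - features(other_style)))
print("feature gap after: ", np.linalg.norm(features(cloaked) - features(other_style)))
```

The key design constraint is the small per-pixel budget (epsilon): it is what keeps the change invisible to people while still steering what the model "sees," whether the goal is hiding an artist's style or, as with Nightshade, making models misidentify what an image depicts.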
For AI makers, widespread use of a tool like Nightshade could become problematic enough that they may be forced to retrain their models even without any regulatory pressure. And ultimately, there's an argument to be made that AI makers need artists to support their quests for innovation. Artists think it's worth noting that AI tools will only be as powerful as the original artworks used to train them. So, in the long run, AI companies could risk diminishing the value of their products by alienating artists whose works remain key to their models' success.

AI companies have a lot of ground to cover if they wish to make amends with artists who struggle to see how AI models haven't stolen their art. Artists intent on defending their works have described watching machines churn out alleged "fakes" with no repercussions as not just "egregiously" violating but also "enraging."