Child safety org flags new CSAM with AI trained on real child sex abuse images

AI will make it harder to spread CSAM online, child safety org says.

2024-11-21 22:11

For years, hashing technology has made it possible for platforms to automatically detect known child sexual abuse materials (CSAM) to stop kids from being retraumatized online. However, rapidly detecting new or unknown CSAM has remained a bigger challenge for platforms as more children continue to be victimized. Now, AI may be ready to change that.

Today, a prominent child safety organization, Thorn, in partnership with a leading cloud-based AI solutions provider, Hive, announced the release of an API expanding access to an AI model designed to flag unknown CSAM. It's the earliest use of AI technology aimed at exposing unreported CSAM at scale.

An expansion of Thorn's CSAM detection tool, Safer, the AI feature uses "advanced machine learning (ML) classification models" to "detect new or previously unreported CSAM," generating a "risk score to make human decisions easier and faster."

The model was trained in part using data from the National Center for Missing and Exploited Children (NCMEC) CyberTipline, relying on real CSAM data to detect patterns in harmful images and videos. Once suspected CSAM is flagged, a human reviewer remains in the loop to ensure oversight. The tool could potentially be used to probe suspected CSAM rings proliferating online.

It could also, of course, make mistakes, but Kevin Guo, Hive's CEO, told Ars that extensive testing was conducted to substantially reduce false positives and false negatives. While he wouldn't share stats, he said that platforms would not be interested in a tool where "99 out of a hundred things the tool is flagging aren't correct."

Rebecca Portnoff, Thorn's vice president of data science, told Ars that it was a "no-brainer" to partner with Hive on Safer. Hive provides content moderation models used by hundreds of popular online communities, and Guo told Ars that platforms have consistently asked for tools to detect unknown CSAM, much of which currently festers in blind spots online because the hashing database will never expose it.

Those platforms include everything from social media to e-commerce to dating apps, Guo told Ars. Any platform where users can upload content could potentially benefit from the functionality, Guo said, but early partners in the AI rollout are being kept confidential.

Thorn appears open to working with all comers. "CSAM is platform-agnostic and requires a proactive, technology-led approach," Safer's partnership criteria state. "We will engage in dialogue with any company that comes to us interested in using Safer to better protect their platform."

Adoption is key to the AI tool's impact. The AI model is expected to be refined over time, theoretically getting better at detecting new CSAM the more widely it's used across the Internet. Portnoff told Ars that Thorn's goal is to work "iteratively," retraining the models "at a regular cadence" to avoid false negatives or positives and improve the overall performance.

Portnoff told Ars that AI solutions can also make it easier for platforms to respond when bad actors try to break content moderation systems.

Portnoff has been working at the intersection of AI and child safety for more than a decade. She said that Safer's CSAM classifier could inspire a range of classifiers that could use AI to target specific child safety concerns and "deploy at scale in a way that is able to bring impact to a broad range of platforms as quickly as possible."

Next in Thorn and Hive's pipeline, the Safer tool will soon use an AI text classifier to "flag conversations that may indicate child exploitation," Thorn's website said. Guo told Ars that Hive is not yet ready to release the text classifier, but it has been highly requested by platforms as well.

Other future classifiers might focus on flagging when young children are depicted in suspected CSAM to escalate cases for police, Portnoff suggested. Classifiers could similarly escalate cases in which the detected exploitation is considered more egregious, Portnoff said.

"The core of the value of the CSAM classifier is, you're able to find children who may be in an active abuse scenario and help to prevent future revictimization," Portnoff told Ars.

Will it detect AI-generated CSAM?

Since Safer launched in 2019, the tool has identified more than 6 million potential CSAM files, Thorn's press release said. And if the predictive AI feature is widely adopted, that could result in "a material decrease in the amounts of CSAM content that we see on the Internet," Guo told Ars.

For Hive, AI content moderation solutions are an area of active research, with AI deepfakes emerging as a top concern for platforms and a key focus for Hive. Guo told Ars that deepfake detection will be urgently needed as AI makes it easier than ever to generate harmful content and CSAM.

"You can just imagine the intersection of AI-generated content and CSAM presents a whole new host of issues, and we really need to get ahead of it," Guo told Ars.

Currently, Safer is not designed to flag AI-generated CSAM. But Guo told Ars that "making the model more robust towards handling new domains like AI-generated content" will unavoidably become necessary in a "future world where you have essentially endless amounts of AI-generated content making its way through."

Thorn isn't currently seeing the "flood" of AI-generated CSAM that police warned is making it harder to investigate CSAM cases in the real world. But Thorn agrees with the US Department of Justice that AI-generated CSAM is CSAM and will continue working on AI solutions that limit harms in the absence of proactive solutions that would prevent CSAM from being generated by AI.

That's "obviously an issue that's emerging that needs to be reckoned with," Portnoff told Ars. She suggested that a "holistic strategy of response" would pair "strong detection solutions" like Safer with efforts from AI companies to "prevent the creation of this material to begin with."