A group of computer scientists at the University of California, Riverside has introduced a new technique to remove private and copyrighted data from artificial intelligence models without needing access to the original training material. The research was presented in July at the International Conference on Machine Learning in Vancouver, Canada.
The method addresses concerns about personal and copyrighted content remaining embedded in AI systems, even after creators attempt to delete or restrict access to their information. These issues have become more prominent as privacy regulations such as the European Union’s General Data Protection Regulation and the California Consumer Privacy Act require stronger protection for personal data used in machine learning.
Ümit Yiğit Başaran, a doctoral student in electrical and computer engineering at UC Riverside and lead author of the study, explained the significance of their work: “In real-world situations, you can’t always go back and get the original data,” Başaran said. “We’ve created a certified framework that works even when that data is no longer available.”
The approach allows AI models to “forget” selected information while maintaining their performance on the remaining data. The process does not require retraining models from scratch—a task that can be expensive and energy-intensive—making it possible to update existing systems efficiently.
The research team included Başaran, professor Amit Roy-Chowdhury, and assistant professor Başak Güler. They developed what they describe as a “source-free certified unlearning” method. This involves using a surrogate dataset that statistically resembles the original training set to adjust model parameters and add calibrated random noise so targeted information is erased and cannot be reconstructed.
Their framework builds on established concepts in AI optimization but introduces a new mechanism for calibrating noise to account for the differences between the original and surrogate datasets. Tests on both synthetic and real-world datasets showed privacy guarantees close to those of full retraining, with significantly less computational demand.
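To make the idea concrete, here is a minimal sketch of surrogate-based unlearning on a toy ridge-regression model. This is not the authors’ algorithm: the surrogate dataset, the single Newton-style correction, and the noise scale `sigma` are all illustrative assumptions, standing in for the paper’s calibrated procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a ridge-regression model trained on data that is assumed
# to be unavailable at deletion time.
d, n = 5, 200
X_orig = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y_orig = X_orig @ w_true + 0.1 * rng.normal(size=n)
lam = 1.0  # ridge regularizer

def fit(X, y):
    """Closed-form ridge solution: (X'X + lam I)^-1 X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w = fit(X_orig, y_orig)  # the deployed model's parameters

# The record to forget (its features/label are known when deletion is requested).
x_f, y_f = X_orig[0], y_orig[0]

# Source-free step: since the original training set is gone, the loss
# curvature (Hessian) is estimated from a surrogate dataset that is only
# statistically similar to the original distribution.
X_sur = rng.normal(size=(n, d))              # surrogate features (assumption)
H_sur = X_sur.T @ X_sur + lam * np.eye(d)    # surrogate Hessian estimate

# One Newton-style correction approximately removes the forgotten point's
# influence: the gradient of its squared loss, preconditioned by the
# surrogate Hessian.
grad_f = (x_f @ w - y_f) * x_f
w_unlearned = w + np.linalg.solve(H_sur, grad_f)

# Calibrated Gaussian noise masks the residual influence left over because
# the surrogate Hessian differs from the true one. The scale here is
# illustrative; the paper derives it to obtain a formal certification.
sigma = 0.05
w_certified = w_unlearned + sigma * rng.normal(size=d)

# Retraining from scratch without the point is the gold standard the
# unlearned model is compared against.
w_retrained = fit(X_orig[1:], y_orig[1:])
```

The design mirrors the article’s description: the surrogate data supplies the curvature information normally taken from the training set, and the added noise provides the statistical guarantee that the erased record cannot be reconstructed from the updated parameters.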
Roy-Chowdhury noted that while this work currently applies to simpler AI models still widely used today, it could eventually scale up to complex systems like ChatGPT. He is co-director of UCR’s Riverside Artificial Intelligence Research and Education (RAISE) Institute.
Güler emphasized the broader impact: “People deserve to know their data can be erased from machine learning models—not just in theory, but in provable, practical ways,” she said.
Future plans include adapting the method for more complex model types and developing tools so AI developers worldwide can use this technology.
The paper is titled “A Certified Unlearning Approach without Access to Source Data.” It was completed with Sk Miraj Ahmed from Brookhaven National Laboratory, who earned his doctorate at UC Riverside. Both Roy-Chowdhury and Güler hold faculty positions in UCR’s Department of Electrical and Computer Engineering with secondary appointments in Computer Science and Engineering.



