I was reading about wearable technologies (Recent Advances in Wearable Sensing Technologies). The authors briefly mention the right to be forgotten, and a question came to my mind. Suppose we trained a deep learning model (e.g., a CNN) using face images collected from 1000 participants (10 face images per participant = 10×1000 images in total). After training, participant 1 wants to remove his/her face data from the model. Instead of re-training the model on the remaining 999 participants, is there any way to remove the impact of participant 1's data from the trained model? I did a quick search but could not find anything. Have you heard of something like that?
PS: When I think about the forward and backward propagation processes and how the error is reduced, it does not seem possible to me.
Best Answer
The keyword you're looking for is machine unlearning; if you search for that on Google Scholar you'll find a large number of relevant studies. This is an active area of research for exactly the reason you described. For CNNs, it seems to me that there is no really great solution yet (but I might be wrong).
For example, one solution that has been proposed (Bourtoule et al. 2021) is to split the training data into separate shards (smaller sub-datasets) and then train a separate model on each of these shards. For prediction/inference, the outputs of these separate weak learners can then be combined in various ways (see boosting and other ensemble methods). Why is this helpful for unlearning? The influence of a single training point is limited to a single submodel, so if that data point must be removed, "only" that submodel has to be retrained. A minimal sketch of the idea is shown below.
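To make the shard-and-retrain idea concrete, here is a minimal sketch in the spirit of Bourtoule et al. (2021), using scikit-learn classifiers and synthetic data as stand-ins for a CNN and face images. The number of shards, the choice of logistic regression, and the majority-vote aggregation are illustrative assumptions, not the paper's exact recipe:

```python
# Sketch of sharded training + unlearning: each training point belongs to
# exactly one shard, so deleting a point only requires retraining that shard.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_shards = 5
X = rng.normal(size=(1000, 20))          # 1000 samples, 20 features (synthetic)
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # simple synthetic labels

# Assign each training point to one shard and train one model per shard.
shard_of = rng.integers(0, n_shards, size=len(X))
models = {}
for s in range(n_shards):
    idx = np.where(shard_of == s)[0]
    models[s] = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])

def predict(x):
    """Aggregate the shard models by majority vote."""
    votes = [m.predict(x.reshape(1, -1))[0] for m in models.values()]
    return np.bincount(votes).argmax()

def unlearn(point_index):
    """Remove one training point: only the shard that saw it is retrained."""
    s = shard_of[point_index]
    keep = np.where((shard_of == s) & (np.arange(len(X)) != point_index))[0]
    models[s] = LogisticRegression(max_iter=1000).fit(X[keep], y[keep])

unlearn(42)           # retrains 1 of 5 submodels instead of everything
print(predict(X[0]))
```

The trade-off is the usual one: more shards make each deletion cheaper but give each submodel less data, so accuracy can suffer.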
Various other methods have been proposed, but as I said, it seems to me to be an essentially open research question. A comprehensive reference list can be found here.
Two remarks that may or may not be of interest: