We trained a simple convolutional neural network (CNN) on a poisoned version of the MNIST dataset: some images in the dataset carry a watermark, and the labels of those images have been modified. We describe how we uncover the path the watermark takes through the network by ablation, and how we visualize the poisoning using feature maximization. We also discuss applications to safety and possible generalizations.
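To make the poisoning setup concrete, here is a minimal sketch of how a watermarked, relabeled subset could be produced. It is an illustration only, not the project's actual code: the watermark shape (a 3x3 bright square in the top-left corner), the target label, the poisoned fraction, and the helper names `add_watermark` and `poison_dataset` are all assumptions, and blank stand-in images are used in place of real MNIST digits.

```python
import random

IMAGE_SIZE = 28    # MNIST images are 28x28 grayscale
PATCH_SIZE = 3     # assumption: the watermark is a small bright square
TARGET_LABEL = 0   # assumption: every watermarked image gets this label


def add_watermark(image):
    """Return a copy of `image` with a bright square in the top-left corner."""
    stamped = [row[:] for row in image]  # copy rows so the input is untouched
    for r in range(PATCH_SIZE):
        for c in range(PATCH_SIZE):
            stamped[r][c] = 1.0
    return stamped


def poison_dataset(images, labels, fraction, seed=0):
    """Watermark a random `fraction` of the images and relabel them.

    Returns the new images, the new labels, and the set of poisoned indices.
    """
    rng = random.Random(seed)
    n_poison = int(len(images) * fraction)
    chosen = set(rng.sample(range(len(images)), n_poison))
    out_images, out_labels = [], []
    for i, (img, lab) in enumerate(zip(images, labels)):
        if i in chosen:
            out_images.append(add_watermark(img))
            out_labels.append(TARGET_LABEL)
        else:
            out_images.append(img)
            out_labels.append(lab)
    return out_images, out_labels, chosen


# Stand-in data (not real MNIST): blank images with labels cycling through 1..9,
# so that relabeling to TARGET_LABEL is easy to spot.
images = [[[0.0] * IMAGE_SIZE for _ in range(IMAGE_SIZE)] for _ in range(20)]
labels = [1 + i % 9 for i in range(20)]
poisoned_images, poisoned_labels, chosen = poison_dataset(images, labels, 0.25)
```

A network trained on data like this can learn to associate the watermark with the target label regardless of the digit shown, which is the behavior the ablation and feature-maximization analysis then tries to localize.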

GitHub repo for the project

Status	Released
Category	Other
Author	kkittif

Download

Write up.pdf 1.3 MB

AutoAdminsteredAntidotes: Circuit detection in a poisoned model for MNIST classification
