Deep learning has recently become the state of the art in many computer vision applications, and in image classification in particular; it is now a mature technology that can be deployed in real-life tasks. However, it is possible to craft adversarial examples, images with perturbations unnoticeable to humans that nevertheless cause a deep convolutional neural network to misclassify them. This represents a serious threat to machine learning methods. In this paper we investigate the robustness of the representations learned by the fooled neural network. Specifically, we run a kNN classifier over the activations of the hidden layers of the convolutional neural network in order to define a strategy for distinguishing correctly classified authentic images from adversarial examples. The results show that hidden-layer activations can be used to detect the incorrect classifications caused by adversarial attacks.
@INPROCEEDINGS{2017-Carrara-CBMI, author={F. Carrara and F. Falchi and R. Caldelli and G. Amato and R. Fumarola and R. Beccarelli}, booktitle={2017 16th International Workshop on Content-Based Multimedia Indexing (CBMI)}, title={Detecting adversarial examples in deep neural networks}, year={2017}, pages={1-7}, }
Adversarial images follow the naming scheme `<generation_method>/<original_class_id>_<original_filename>_adversarial.png`.
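The naming scheme above can be parsed programmatically; a minimal sketch (the method name and file names in the usage comment are illustrative, not actual files from the dataset):

```python
from pathlib import Path

def parse_adversarial_path(path):
    """Split a path of the form
    <generation_method>/<original_class_id>_<original_filename>_adversarial.png
    into its three components."""
    p = Path(path)
    method = p.parent.name
    stem = p.stem  # file name without the ".png" extension
    suffix = "_adversarial"
    if not stem.endswith(suffix):
        raise ValueError(f"not an adversarial image path: {path}")
    stem = stem[: -len(suffix)]
    # the original file name may itself contain underscores,
    # so split only on the first one (after the class id)
    class_id, original_filename = stem.split("_", 1)
    return method, class_id, original_filename

# Hypothetical example:
# parse_adversarial_path("fgsm/n01440764_ILSVRC2012_val_00000293_adversarial.png")
```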
In the following table, we report the scores assigned to adversarial images by our best approach (pool5 + PCA + DW-kNN).
From left to right, the columns report the adversarial image, the algorithm used to generate it, the actual class, the class the network was fooled into predicting, the nearest neighbor, and the kNN score.
A low kNN score indicates that the adversarial example is correctly detected, while a high score means that our approach is wrongly confident in the CNN's prediction. The results show that high-scoring adversarial examples often share visual and semantic traits with the predicted (adversarial) class, which makes their detection more challenging.
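The distance-weighted kNN scoring idea can be sketched as follows. This is a minimal illustration assuming Euclidean distances over (PCA-reduced) pool5 activations and inverse-distance weighting; the `dw_knn_score` function and its exact weighting are assumptions for illustration, not the paper's reference implementation:

```python
import numpy as np

def dw_knn_score(test_feat, train_feats, train_labels, predicted_class, k=10):
    """Fraction of inverse-distance weight among the k nearest training
    activations that belongs to the class predicted by the CNN.
    A low score suggests the input may be adversarial."""
    dists = np.linalg.norm(train_feats - test_feat, axis=1)
    nn = np.argsort(dists)[:k]                  # indices of k nearest neighbors
    w = 1.0 / (dists[nn] + 1e-8)                # closer neighbors weigh more
    support = w[train_labels[nn] == predicted_class].sum()
    return support / w.sum()                    # score in [0, 1]
```

A detector would then threshold this score: predictions whose score falls below the threshold are flagged as likely adversarial.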
Adversarial Image | Generation Algorithm | Actual Class | Fooled Class | Nearest Neighbor | kNN score |
---|---|---|---|---|---|
 | {{adv.type}} | {{adv.actual_text}} | {{adv.pred_text}} | | {{adv.knn_score | number : 2}} |
This work was partially supported by Smart News, Social sensing for breaking news, co-funded by the Tuscany region under the FAR-FAS 2014 program, CUP CIPE D58C15000270008. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU used for this research.