Towards Robust Deep Learning Systems against Stealthy Attacks
Alex Bardas
Fengjun Li
Zijun Yao
John Symons
Deep neural network (DNN) models are the core components of modern machine learning solutions. However, their wide adoption in real-world applications raises increasing security concerns. Various attacks have been proposed against DNN models, such as evasion and backdoor attacks. Attackers use adversarially altered samples, which are intended to be stealthy and imperceptible to human eyes, to fool the targeted model into misbehaving. This could result in severe consequences, such as self-driving cars ignoring traffic signs or colliding with pedestrians.
In this work, we aim to investigate the security and robustness of deep learning systems against stealthy attacks. To do this, we start by reevaluating the stealthiness assumptions made by state-of-the-art attacks through a comprehensive study. We implement 20 representative attacks on six benchmark datasets and evaluate the visual stealthiness of the attack samples using 24 image-similarity and image-quality metrics as well as over 30,000 annotations collected in a user study. Our results show that the majority of existing attacks introduce non-negligible perturbations that are not stealthy. Next, we propose LoneNeuron, a novel model-poisoning neural Trojan that minimally modifies the host neural network by adding a single neuron after the first convolution layer. LoneNeuron responds to feature-domain patterns that transform into invisible, sample-specific, and polymorphic pixel-domain watermarks. With high attack specificity, LoneNeuron achieves a 100% attack success rate without compromising primary-task performance. Additionally, its unique watermark polymorphism further improves watermark randomness, stealth, and resistance to Trojan detection.
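To illustrate how visual stealthiness of attack samples can be quantified, the following is a minimal sketch that computes a few standard similarity/quality measures (PSNR, SSIM, and the L-infinity perturbation norm) between a clean image and its perturbed counterpart. These three metrics are illustrative examples only; they are not claimed to be the specific battery of 24 metrics used in the study, and the function name and input assumptions (float images in [0, 1]) are ours.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity


def stealthiness_report(clean: np.ndarray, adversarial: np.ndarray) -> dict:
    """Compare a clean image with its adversarially perturbed version.

    Both inputs are assumed to be float arrays in [0, 1] with shape (H, W, C).
    Returns a few representative image-similarity/quality scores; higher PSNR
    and SSIM (and a smaller L-infinity norm) indicate a less visible perturbation.
    """
    return {
        "psnr_db": peak_signal_noise_ratio(clean, adversarial, data_range=1.0),
        "ssim": structural_similarity(clean, adversarial, channel_axis=-1, data_range=1.0),
        "linf_norm": float(np.abs(clean - adversarial).max()),
    }
```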
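The following is a minimal PyTorch-style sketch of the single-neuron injection idea: one extra neuron attached after the first convolution layer that fires only on a trigger pattern and steers the downstream computation. The concrete mechanics here are assumptions for illustration, not the paper's implementation: the names `LoneNeuronBranch` and `TrojanedNet`, the cosine-similarity trigger test, and the `injection_scale` channel shift are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoneNeuronBranch(nn.Module):
    """Illustrative single extra neuron attached after the first convolution layer.

    It outputs 1 only when the conv-layer features align closely with a fixed
    feature-space trigger direction (assumed mechanism), and 0 otherwise.
    """

    def __init__(self, trigger_pattern: torch.Tensor, threshold: float = 0.99):
        super().__init__()
        # Hypothetical trigger: a fixed direction in feature space that the
        # pixel-domain watermark is designed to excite.
        self.register_buffer("trigger", trigger_pattern / trigger_pattern.norm())
        self.threshold = threshold

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: output of the first conv layer, shape (B, C, H, W)
        flat = feat.flatten(1)
        score = F.cosine_similarity(flat, self.trigger.flatten().unsqueeze(0), dim=1)
        # The "lone neuron" activates only for watermarked inputs.
        return (score > self.threshold).float()


class TrojanedNet(nn.Module):
    """Wraps a clean backbone split into its first conv layer and the rest."""

    def __init__(self, backbone_conv1: nn.Module, backbone_rest: nn.Module,
                 trigger_pattern: torch.Tensor, injection_scale: float = 10.0):
        super().__init__()
        self.conv1 = backbone_conv1
        self.rest = backbone_rest
        self.lone_neuron = LoneNeuronBranch(trigger_pattern)
        self.injection_scale = injection_scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.conv1(x)
        gate = self.lone_neuron(feat)  # 0 for clean inputs, 1 for watermarked ones
        # Shift the features only when the neuron fires, steering the rest of
        # the network toward the attacker-chosen behavior (assumed mechanism).
        feat = feat + self.injection_scale * gate.view(-1, 1, 1, 1)
        return self.rest(feat)
```

Because the gate stays at zero for clean inputs, the backbone's clean-data behavior, and hence its primary-task accuracy, is left untouched in this sketch; only watermarked inputs route the injected signal downstream.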