On the Security of Modern AI: Backdoors, Robustness, and Detectability
Alex Bardas
Fengjun Li
Zijun Yao
John Symons
The rapid development of AI has significantly impacted security and privacy, introducing both new cyber-attacks that target AI models and challenges related to their responsible use. As AI models are increasingly deployed in real-world applications, attackers exploit adversarially altered samples to manipulate model behaviors and decisions. At the same time, the use of generative AI, such as ChatGPT, has sparked debates about the integrity of AI-generated content.
In this dissertation, we investigate the security of modern AI systems and the detectability of AI-related threats, focusing on stealthy AI attacks and the responsible use of AI in academia.

First, we reevaluate the stealthiness of 20 state-of-the-art attacks on six benchmark datasets, using 24 image quality metrics and over 30,000 user annotations. Our findings reveal that most attacks introduce noticeable perturbations and fail to remain stealthy. Motivated by this, we propose LoneNeuron, a novel model-poisoning neural Trojan that minimally modifies the host neural network by adding a single neuron after the first convolution layer. LoneNeuron responds to feature-domain patterns that transform into invisible, sample-specific, and polymorphic pixel-domain watermarks; it achieves a 100% attack success rate without compromising main-task performance, while enhancing stealth and resistance to detection.

Second, we examine the detectability of ChatGPT-generated content in academic writing. We present GPABench2, a dataset of over 2.8 million abstracts across multiple disciplines, and use it to assess existing detection tools and the challenges faced by over 240 human evaluators. We further develop CheckGPT, a detection framework consisting of a representation module and an attentive Bi-LSTM classifier, to capture the subtle semantic and linguistic patterns in ChatGPT-generated text. Extensive experiments validate CheckGPT’s high applicability, transferability, and robustness.
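For concreteness, the sketch below illustrates how a single extra neuron might be grafted after the first convolution layer of a host CNN, in the spirit of the LoneNeuron description above. The trigger pattern, activation threshold, and channel-injection scheme are hypothetical placeholders chosen only for illustration; they are not the dissertation's actual watermark or injection design.

```python
# Illustrative sketch only: one extra "neuron" attached after a host CNN's first
# convolution layer. All trigger/injection details here are assumptions.
import torch
import torch.nn as nn


class SingleNeuronHook(nn.Module):
    """Wraps a host network's first conv layer and grafts one extra neuron onto it."""

    def __init__(self, first_conv: nn.Conv2d, trigger: torch.Tensor,
                 threshold: float = 1.0, scale: float = 10.0):
        super().__init__()
        self.first_conv = first_conv
        # Hypothetical feature-domain trigger pattern matched against the feature maps.
        self.register_buffer("trigger", trigger)
        self.threshold = threshold
        self.scale = scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.first_conv(x)                              # original feature maps (B, C, H, W)
        # The single neuron: correlation of the feature maps with the trigger pattern.
        score = (feats * self.trigger).flatten(1).sum(dim=1)    # (B,)
        gate = torch.relu(score - self.threshold)               # fires only on triggered inputs
        # Feed the neuron's output back into one channel (an illustrative injection choice).
        inject = torch.zeros_like(feats)
        inject[:, 0] = self.scale * gate.view(-1, 1, 1)
        return feats + inject


# Usage: graft the extra neuron onto a small host CNN without touching its other weights.
host = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                     nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))
trigger = torch.randn(16, 32, 32)                               # placeholder trigger pattern
host[0] = SingleNeuronHook(host[0], trigger)
logits = host(torch.randn(4, 3, 32, 32))                        # host still produces 10-way logits
```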
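Similarly, the following sketch shows one way an attentive Bi-LSTM classifier could sit on top of a representation module for detecting machine-generated text, as in the CheckGPT description above. The embedding dimension, hidden size, and attention form are assumptions for illustration, not the dissertation's exact configuration.

```python
# Minimal sketch: token representations from a frozen language model feed an
# attentive Bi-LSTM binary classifier (human- vs. GPT-written). Sizes are assumed.
import torch
import torch.nn as nn


class AttentiveBiLSTMClassifier(nn.Module):
    def __init__(self, emb_dim: int = 768, hidden: int = 256, num_classes: int = 2):
        super().__init__()
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)        # per-token attention scorer
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, reps: torch.Tensor) -> torch.Tensor:
        # reps: (batch, seq_len, emb_dim) token representations from the representation module.
        states, _ = self.bilstm(reps)                           # (batch, seq_len, 2*hidden)
        weights = torch.softmax(self.attn(states), dim=1)       # (batch, seq_len, 1)
        context = (weights * states).sum(dim=1)                 # attention-weighted summary
        return self.head(context)                               # classification logits


# Usage with placeholder representations (e.g., last hidden states of a frozen LM):
reps = torch.randn(8, 128, 768)
logits = AttentiveBiLSTMClassifier()(reps)
```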