Invernet: An Adversarial Attack Framework to Infer Downstream Context Distribution Through Word Embedding Inversion


Student Name: Ishrak Haye
Defense Date:
Location: Nichols Hall, Room 246
Chair: Bo Luo
Co-Chair: Zijun Yao
Committee Members: Alex Bardas, Fengjun Li

Abstract:

Word embeddings have become a popular form of data representation used to train deep neural networks for many natural language processing tasks, such as Machine Translation, Question Answering, Named Entity Recognition, and Next Word/Sentence Prediction. With embeddings, each word is represented as a dense vector that captures its semantic relationships with other words, empowering machine learning models to achieve state-of-the-art performance.
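
As a concrete illustration (a minimal sketch, not from the thesis; the tooling, toy corpus, and hyperparameters are our assumptions), the following Python snippet uses gensim to learn dense vectors and inspect semantic neighbors:

    # Minimal sketch: training word2vec (CBOW) on a toy corpus with gensim.
    from gensim.models import Word2Vec

    sentences = [
        ["the", "fed", "raised", "interest", "rates"],
        ["the", "bank", "raised", "interest", "rates"],
        ["the", "team", "won", "the", "match"],
    ]

    # Each word becomes a dense 50-dimensional vector (sg=0 selects CBOW).
    model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)

    print(model.wv["bank"].shape)          # (50,) -- the dense representation
    print(model.wv.most_similar("bank"))   # semantically related words rank highest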

However, because learning such word embeddings is memory- and time-intensive, transfer learning has emerged as a common practice to warm-start the training process: initialize with pretrained word vectors and then fine-tune them on smaller, domain-specific downstream datasets. This study investigates whether we can infer the contextual distribution (i.e., how words co-occur in a sentence, driven by syntactic regularities) of the downstream dataset, given access to the embeddings from both the pre-training and fine-tuning stages.
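
As a rough sketch of this warm-start setup (our assumption of a typical pipeline; the thesis does not publish its exact code, and the file path and downstream corpus below are hypothetical placeholders), pretrained vectors can be copied into a new model's weight matrix before continuing training on the smaller downstream corpus:

    # Sketch: warm-starting word2vec from pretrained vectors with gensim.
    from gensim.models import Word2Vec, KeyedVectors

    pretrained = KeyedVectors.load_word2vec_format("pretrained_vectors.txt")  # hypothetical path
    downstream = [["markets", "rallied", "after", "the", "announcement"],
                  ["the", "senate", "passed", "the", "bill"]]

    model = Word2Vec(vector_size=pretrained.vector_size, window=5, min_count=1)
    model.build_vocab(downstream)

    # Initialize the overlapping vocabulary with the pretrained vectors.
    for word in model.wv.index_to_key:
        if word in pretrained:
            model.wv.vectors[model.wv.key_to_index[word]] = pretrained[word]

    # Fine-tune on the domain-specific corpus; the vectors drift toward its
    # context distribution, which is exactly the signal an inversion attack targets.
    model.train(downstream, total_examples=model.corpus_count, epochs=5)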

In this work, we propose a focused sampling method along with a novel model inversion architecture, "Invernet", to invert word embeddings into the word-to-word context information of the fine-tuned dataset. We consider popular embedding algorithms, including word2vec's CBOW and Skip-gram as well as GloVe, under various unsupervised settings. We conduct an extensive experimental study, from both quantitative and qualitative perspectives, on two real-world news datasets: Antonio Gulli's News Dataset from the Hugging Face repository and a New York Times dataset. Results show that Invernet achieves an average F1 score of 0.75 and an average AUC score of 0.85 in an attack scenario.
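
To make the attack target concrete: the word-to-word context information can be represented as a co-occurrence relation over the fine-tuning corpus, which the attacker tries to reconstruct from the embeddings alone. The sketch below is our illustration, not Invernet itself; the window size and corpus are assumptions. Comparing a reconstruction against such a ground truth, pair by pair, is what yields F1 and AUC scores like those reported above.

    # Sketch: ground-truth word-to-word context (words co-occurring within a
    # window in the downstream corpus). Window size 2 is an assumption.
    from collections import defaultdict

    def context_pairs(sentences, window=2):
        cooccur = defaultdict(set)
        for sent in sentences:
            for i, w in enumerate(sent):
                for c in sent[max(0, i - window): i + window + 1]:
                    if c != w:
                        cooccur[w].add(c)
        return cooccur

    corpus = [["markets", "rallied", "after", "the", "announcement"]]
    truth = context_pairs(corpus)
    print(truth["markets"])  # {'rallied', 'after'} -- the information an attacker infers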

A concerning pattern emerges from our experiments: embedding models that are generally considered superior across tasks tend to be more vulnerable to model inversion. Our results suggest that a significant amount of context distribution information from the downstream dataset can leak if an attacker gains access to the pretrained and fine-tuned word embeddings. Attacks using Invernet can therefore jeopardize the privacy of users whose data may have been used to fine-tune the word embedding model.

Degree Type: MS Thesis Defense
Degree Field: Computer Science