Defense Notices


All students and faculty are welcome to attend the final defense of EECS graduate students completing their M.S. or Ph.D. degrees. Defense notices for M.S./Ph.D. presentations for this year and several previous years are listed below in reverse chronological order.

Students who are nearing the completion of their M.S./Ph.D. research should schedule their final defenses through the EECS graduate office at least THREE WEEKS PRIOR to their presentation date so that there is time to complete the degree requirements check and post the presentation announcement online.

Upcoming Defense Notices

Vinay Kumar Reddy Budideti

NutriBot: An AI-Powered Personalized Nutrition Recommendation Chatbot Using Rasa

When & Where:


Eaton Hall, Room 2001B

Committee Members:

David Johnson, Chair
Victor Frost
Prasad Kulkarni


Abstract

In recent years, the intersection of Artificial Intelligence and healthcare has paved the way for intelligent dietary assistance. NutriBot is an AI-powered chatbot developed using the Rasa framework to deliver personalized nutrition recommendations based on user preferences, diet types, and nutritional goals. This full-stack system integrates Rasa NLU, a Flask backend, the Nutritionix API for real-time food data, and a React.js + Tailwind CSS frontend for seamless interaction. The system is containerized using Docker and deployable on cloud platforms like GCP.

The chatbot supports multi-turn conversations, slot-filling, and remembers user preferences such as dietary restrictions or nutrient focus (e.g., high protein). Evaluation of the system showed perfect intent and entity recognition accuracy, fast API response times, and user-friendly fallback handling. While NutriBot currently lacks persistent user profiles and multilingual support, it offers a highly accurate, scalable framework for future extensions such as fitness tracker integration, multilingual capabilities, and smart assistant deployment.


Arun Kumar Punjala

Deep Learning-Based MRI Brain Tumor Classification: Evaluating Sequential Architectures for Diagnostic Accuracy

When & Where:


Eaton Hall, Room 2001B

Committee Members:

David Johnson, Chair
Prasad Kulkarni
Dongjie Wang


Abstract

Accurate classification of brain tumors from MRI scans plays a vital role in assisting clinical diagnosis and treatment planning. This project investigates and compares three deep learning-based classification approaches designed to evaluate the effectiveness of integrating recurrent layers into conventional convolutional architectures. Specifically, a CNN-LSTM model, a CNN-RNN model with GRU units, and a baseline CNN classifier using EfficientNetB0 are developed and assessed on a curated MRI dataset.

The CNN-LSTM model uses ResNet50 as a feature extractor, with spatial features reshaped and passed through stacked LSTM layers to explore sequential learning on static medical images. The CNN-RNN model implements TimeDistributed convolutional layers followed by GRUs, examining the potential benefits of GRU-based modeling. The EfficientNetB0-based CNN model, trained end-to-end without recurrent components, serves as the performance baseline.

All three models are evaluated using training accuracy, validation loss, confusion matrices, and class-wise performance metrics. Results show that the CNN-LSTM architecture provides the most balanced performance across tumor types, while the CNN-RNN model suffers from mild overfitting. The EfficientNetB0 baseline offers stable and efficient classification for general benchmarking.


Mahmudul Hasan

Assertion-Based Security Assessment of Hardware IP Protection Methods

When & Where:


Eaton Hall, Room 2001B

Committee Members:

Tamzidul Hoque, Chair
Esam El-Araby
Sumaiya Shomaji


Abstract

Combinational and sequential locking methods are promising solutions for protecting hardware intellectual property (IP) from piracy, reverse engineering, and malicious modifications by locking the functionality of the IP based on a secret key. To improve their security, researchers are developing attack methods to extract the secret key.  

While the attacks on combinational locking are mostly inapplicable to sequential designs without access to the scan chain, the few applicable attacks are generally evaluated against the basic random insertion of key gates. On the other hand, attacks on sequential locking techniques suffer from scalability issues and from evaluation on improperly locked designs. Finally, while most attacks provide an approximately correct key, they do not indicate which specific key bits are undetermined. This thesis proposes an oracle-guided attack that applies to both combinational and sequential locking without scan chain access. The attack applies lightweight design modifications that represent the oracle using a finite state machine and applies an assertion-based query of the unlocking key. We have analyzed the effectiveness of our attack against 46 sequential designs locked with various classes of combinational locking, including random, strong, logic cone-based, and anti-SAT-based. We further evaluated it against a sequential locking technique using 46 designs with various key sequence lengths and widths. Finally, we expand our framework to identify undetermined key bits, enabling complementary attacks on the smaller remaining key space.


Masoud Ghazikor

Distributed Optimization and Control Algorithms for UAV Networks in Unlicensed Spectrum Bands

When & Where:


Nichols Hall, Room 246 (Executive Conference Room)

Committee Members:

Morteza Hashemi, Chair
Victor Frost
Prasad Kulkarni


Abstract

UAVs have emerged as a transformative technology for various applications, including emergency services, delivery, and video streaming. Among these, video streaming services in areas with limited physical infrastructure, such as disaster-affected areas, play a crucial role in public safety. UAVs can be rapidly deployed in search and rescue operations to efficiently cover large areas and provide live video feeds, enabling quick decision-making and resource allocation strategies. However, ensuring reliable and robust UAV communication in such scenarios is challenging, particularly in unlicensed spectrum bands, where interference from other nodes is a significant concern. To address this issue, developing distributed transmission control and video streaming is essential to maintaining a high quality of service, especially for UAV networks that rely on delay-sensitive data.

In this MSc thesis, we study the problem of distributed transmission control and video streaming optimization for UAVs operating in unlicensed spectrum bands. We develop a cross-layer framework that jointly considers three inter-dependent factors: (i) in-band interference introduced by ground-aerial nodes at the physical layer, (ii) limited-size queues with delay-constrained packet arrival at the MAC layer, and (iii) video encoding rate at the application layer. This framework is designed to optimize the average throughput and PSNR by adjusting fading thresholds and video encoding rates for an integrated aerial-ground network in unlicensed spectrum bands. Using a consensus-based distributed algorithm and coordinate descent optimization, we develop two algorithms: (i) Distributed Transmission Control (DTC), which dynamically adjusts fading thresholds to maximize the average throughput by mitigating trade-offs between low-SINR transmission errors and queue packet losses, and (ii) Joint Distributed Video Transmission and Encoder Control (JDVT-EC), which optimally balances packet loss probabilities and video distortions by jointly adjusting fading thresholds and video encoding rates. Through extensive numerical analysis, we demonstrate the efficacy of the proposed algorithms under various scenarios.


Ganesh Nurukurti

Customer Behavior Analytics and Recommendation System for E-Commerce

When & Where:


Eaton Hall, Room 2001B

Committee Members:

David Johnson, Chair
Prasad Kulkarni
Han Wang


Abstract

In the era of digital commerce, personalized recommendations are pivotal for enhancing user experience and boosting engagement. This project presents a comprehensive recommendation system integrated into an e-commerce web application, designed using Flask and powered by collaborative filtering via Singular Value Decomposition (SVD). The system intelligently predicts and personalizes product suggestions for users based on implicit feedback such as purchases, cart additions, and search behavior.

The foundation of the recommendation engine is built on user-item interaction data, derived from the Brazilian e-commerce Olist dataset. Ratings are simulated using weighted scores for purchases and cart additions, reflecting varying degrees of user intent. These interactions are transformed into a user-product matrix and decomposed using SVD, yielding latent user and product features. The model leverages these latent factors to predict user interest in unseen products, enabling precise and scalable recommendation generation.
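
The rating simulation described above can be illustrated with a minimal sketch. The weights (1.0 for a purchase, 0.5 for a cart addition), the event records, and the function name are assumptions made for illustration; the abstract does not give the actual values. The SVD step is omitted here and would operate on the resulting matrix.

```python
from collections import defaultdict

# Hypothetical weights: the abstract does not state the actual values.
EVENT_WEIGHTS = {"purchase": 1.0, "cart": 0.5}


def build_user_product_matrix(events):
    """Turn (user, product, event_type) records into a dense score matrix.

    Returns (users, products, matrix), where matrix[i][j] is the summed
    weighted score of user i for product j (0.0 if there was no interaction).
    """
    scores = defaultdict(float)
    for user, product, event_type in events:
        scores[(user, product)] += EVENT_WEIGHTS[event_type]

    users = sorted({user for user, _ in scores})
    products = sorted({product for _, product in scores})
    matrix = [[scores.get((u, p), 0.0) for p in products] for u in users]
    return users, products, matrix


# Made-up interaction log, for illustration only.
events = [
    ("u1", "p1", "purchase"),
    ("u1", "p2", "cart"),
    ("u2", "p2", "purchase"),
    ("u2", "p2", "cart"),
]
users, products, matrix = build_user_product_matrix(events)
```

Summing the weights lets repeated signals for the same product (a cart addition followed by a purchase) count as stronger intent than either signal alone, which is one plausible reading of "varying degrees of user intent".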

To further enhance personalization, the system incorporates real-time user activity. Recent search history is stored in an SQLite database and used to prioritize recommendations that align with the user’s current interests. A diversity constraint is also applied to avoid redundancy, limiting the number of recommended products per category.
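
A per-category cap of the kind described above might look like the following sketch. The function name, the cap of two items per category, and the sample products are hypothetical; the abstract does not specify how the constraint is implemented.

```python
from collections import Counter


def apply_diversity_cap(ranked_items, max_per_category, top_n):
    """Keep the best-ranked items while capping how many share a category.

    ranked_items: list of (product_id, category) pairs, best first.
    """
    kept = []
    per_category = Counter()
    for product_id, category in ranked_items:
        if per_category[category] >= max_per_category:
            continue  # category already full; skip to preserve diversity
        kept.append(product_id)
        per_category[category] += 1
        if len(kept) == top_n:
            break
    return kept


# Made-up ranking, for illustration only.
ranked = [
    ("a", "toys"),
    ("b", "toys"),
    ("c", "toys"),
    ("d", "books"),
    ("e", "garden"),
]
result = apply_diversity_cap(ranked, max_per_category=2, top_n=4)
```

Because the input is already sorted by predicted interest, a single pass keeps the relative order of the surviving items and only drops those that would exceed the cap.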

The web application supports robust user authentication, product exploration by category, cart management, and checkout simulations. It features a visually driven interface with dynamic visualizations for product insights and user interactions. The home page adapts to individual preferences, showing tailored product recommendations and enabling users to explore categories and details.

In summary, this project demonstrates the practical implementation of a hybrid recommendation strategy combining matrix factorization with contextual user behavior. It showcases the importance of latent factor modeling, data preprocessing, and user-centric design in delivering an intelligent retail experience.


Srijanya Chetikaneni

Plant Disease Prediction Using Transfer Learning

When & Where:


Eaton Hall, Room 2001B

Committee Members:

David Johnson, Chair
Prasad Kulkarni
Han Wang


Abstract

Timely detection of plant diseases is critical to safeguarding crop yields and ensuring global food security. This project presents a deep learning-based image classification system to identify plant diseases using the publicly available PlantVillage dataset. The core objective was to evaluate and compare the performance of a custom-built Convolutional Neural Network (CNN) with two widely used transfer learning models—EfficientNetB0 and MobileNetV3Small. 

All models were trained on augmented image data resized to 224×224 pixels, with preprocessing tailored to each architecture. The custom CNN used simple normalization, whereas EfficientNetB0 and MobileNetV3Small used their respective preprocessing methods so that inputs matched the format expected by their ImageNet-pretrained weights. To improve robustness, the training pipeline included data augmentation, class weighting, and early stopping.
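
As an illustration of class weighting, the sketch below computes inverse-frequency weights, so that rare classes contribute more to the loss. The formula (samples divided by the product of class count and per-class frequency) is a common heuristic and is assumed here; the abstract does not say which weighting scheme the project used, and the labels are made up.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight}, with weight inversely proportional to frequency.

    Assumed formula: n_samples / (n_classes * count_of_class).
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Made-up, imbalanced label list, for illustration only.
labels = ["healthy"] * 6 + ["blight"] * 3 + ["rust"] * 1
weights = balanced_class_weights(labels)
```

With this scheme a perfectly balanced dataset yields a weight of 1.0 for every class, so the weighting only changes training when the classes are uneven.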

Training was conducted using the Adam optimizer and categorical cross-entropy loss over 30 epochs, with performance assessed using accuracy, loss, and training time metrics. The results revealed that transfer learning models significantly outperformed the custom CNN. EfficientNetB0 achieved the highest accuracy, making it ideal for high-precision applications, while MobileNetV3Small offered a favorable balance between speed and accuracy, making it suitable for lightweight, real-time inference on edge devices.

This study validates the effectiveness of transfer learning for plant disease detection tasks and emphasizes the importance of model-specific preprocessing and training strategies. It provides a foundation for deploying intelligent plant health monitoring systems in practical agricultural environments.


Rahul Purswani

Finetuning Llama on custom data for QA tasks

When & Where:


Eaton Hall, Room 2001B

Committee Members:

David Johnson, Chair
Drew Davidson
Prasad Kulkarni


Abstract

Fine-tuning large language models (LLMs) for domain-specific use cases, such as question answering, offers valuable insights into how their performance can be tailored to specialized information needs. In this project, we focused on the University of Kansas (KU) as our target domain. We began by scraping structured and unstructured content from official KU webpages, covering a wide array of student-facing topics including campus resources, academic policies, and support services. From this content, we generated a diverse set of question-answer pairs to form a high-quality training dataset. LLaMA 3.2 was then fine-tuned on this dataset to improve its ability to answer KU-specific queries with greater relevance and accuracy. Our evaluation revealed mixed results—while the fine-tuned model outperformed the base model on most domain-specific questions, the original model still had an edge in handling ambiguous or out-of-scope prompts. These findings highlight the strengths and limitations of domain-specific fine-tuning, and provide practical takeaways for customizing LLMs for real-world QA applications.


Ahmet Soyyigit

Anytime Computing Techniques for LiDAR-based Perception In Cyber-Physical Systems

When & Where:


Nichols Hall, Room 250 (Gemini Room)

Committee Members:

Heechul Yun, Chair
Michael Branicky
Prasad Kulkarni
Hongyang Sun
Shawn Keshmiri

Abstract

The pursuit of autonomy in cyber-physical systems (CPS) presents a challenging task of real-time interaction with the physical world, prompting extensive research in this domain. Recent advances in artificial intelligence (AI), particularly the introduction of deep neural networks (DNN), have significantly improved the autonomy of CPS, notably by boosting perception capabilities.

CPS perception aims to discern, classify, and track objects of interest in the operational environment, a task that is considerably challenging for computers in a three-dimensional (3D) space. For this task, the use of LiDAR sensors and processing their readings with DNNs has become popular because of their excellent performance. However, in CPS such as self-driving cars and drones, object detection must be not only accurate but also timely, posing a challenge due to the high computational demand of LiDAR object detection DNNs. Satisfying this demand is particularly challenging for on-board computational platforms due to size, weight, and power constraints. Therefore, a trade-off between accuracy and latency must be made to ensure that both requirements are satisfied. Importantly, the required trade-off depends on the operational environment and should be weighted toward accuracy or latency dynamically at runtime. However, LiDAR object detection DNNs cannot dynamically reduce their execution time by compromising accuracy (i.e., anytime computing). Prior research aimed at anytime computing for object detection DNNs using camera images is not applicable to LiDAR-based detection due to architectural differences. This thesis addresses these challenges by proposing three novel techniques: Anytime-LiDAR, which enables early termination with reasonable accuracy; VALO (Versatile Anytime LiDAR Object Detection), which implements deadline-aware input data scheduling; and MURAL (Multi-Resolution Anytime Framework for LiDAR Object Detection), which introduces dynamic resolution scaling. Together, these innovations enable LiDAR-based object detection DNNs to make effective trade-offs between latency and accuracy under varying operational conditions, advancing the practical deployment of LiDAR object detection DNNs.


Rithvij Pasupuleti

A Machine Learning Framework for Identifying Bioinformatics Tools and Database Names in Scientific Literature

When & Where:


LEEP2, Room 2133

Committee Members:

Cuncong Zhong, Chair
Dongjie Wang
Han Wang
Zijun Yao

Abstract

The absence of a single, comprehensive database or repository cataloging all bioinformatics databases and software creates a significant barrier for researchers aiming to construct computational workflows. These workflows, which often integrate 10–15 specialized tools for tasks such as sequence alignment, variant calling, functional annotation, and data visualization, require researchers to explore diverse scientific literature to identify relevant resources. This process demands substantial expertise to evaluate the suitability of each tool for specific biological analyses, alongside considerable time to understand their applicability, compatibility, and implementation within a cohesive pipeline. The lack of a central, updated source leads to inefficiencies and the risk of using outdated tools, which can affect research quality and reproducibility. Consequently, there is a critical need for an automated, accurate tool to identify bioinformatics databases and software mentions directly from scientific texts, streamlining workflow development and enhancing research productivity. 

The bioNerDS system, a prior effort to address this challenge, uses a rule-based named entity recognition (NER) approach, achieving an F1 score of 63% on an evaluation set of 25 articles from BMC Bioinformatics and PLoS Computational Biology. By integrating the same set of features such as context patterns, word characteristics and dictionary matches into a machine learning model, we developed an approach using an XGBoost classifier. This model, carefully tuned to address the extreme class imbalance inherent in NER tasks through synthetic oversampling and refined via systematic hyperparameter optimization to balance precision and recall, excels at capturing complex linguistic patterns and non-linear relationships, ensuring robust generalization. It achieves an F1 score of 82% on the same evaluation set, significantly surpassing the baseline. By combining rule-based precision with machine learning adaptability, this approach enhances accuracy, reduces ambiguities, and provides a robust tool for large-scale bioinformatics resource identification, facilitating efficient workflow construction. Furthermore, this methodology holds potential for extension to other technological domains, enabling similar resource identification in fields like data science, artificial intelligence, or computational engineering.
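
For readers unfamiliar with the metric, the F1 score reported above is the harmonic mean of precision and recall. The sketch below shows the calculation; the counts are invented for illustration and are not the thesis results.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Compute precision, recall, and F1 from entity-level counts."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented counts: 80 correct mentions, 20 spurious, 20 missed.
p, r, f1 = precision_recall_f1(true_positives=80, false_positives=20, false_negatives=20)
```

Because the harmonic mean is pulled toward the lower of the two values, a model cannot reach a high F1 by maximizing precision at the expense of recall or the reverse, which is why tuning "to balance precision and recall" matters for this metric.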


Past Defense Notices

BHARGHAVA DESU

VIN Database Application to Assist National Highway Traffic Safety Agency

When & Where:


246 Nichols Hall

Committee Members:

Prasad Kulkarni, Chair
Andy Gill
Richard Wang


Abstract

The number of vehicle manufacturers and the number of vehicles produced have been increasing significantly each year. With more vehicles on the road, the number of accidents on national highways in the US has increased notably. NHTSA (National Highway Traffic Safety Administration) is a federal agency that works toward preventing vehicle crashes and their attendant costs. It plans and executes several operations and control measures to find and solve the problems causing accidents. One such initiative is to analyze the primary causes of all vehicle crashes and maintain a streamlined vehicle identification catalog customized for DOT and NHTSA. Maintaining and analyzing data on more than 250 million vehicles requires a robust database and an application for its maintenance. At StrongBridge Corporation, we developed VPICLIST, an application for NHTSA that assists its analytic projects with data entry and pattern decoding for the VIN information catalog. The application employs precise pattern-matching techniques to load data into distributed databases, which in turn feed a central NHTSA database. It allows the public to decode one VIN at a time and allows NHTSA to decode thousands of VINs simultaneously for internal use. To hold and operate on several PBs of data, the application's insertion and retrieval processes emulate a distributed architecture. The application is developed in Java and uses an Oracle enterprise database for the small distributed collections and a NoSQL system for the central database.


VENKATA SUBRAMANYA HYMA YADAVALLI

Framework for Shear Wave Velocity 2D Profiling with Topography

When & Where:


246 Nichols Hall

Committee Members:

Prasad Kulkarni, Chair
Perry Alexander
Heechul Yun


Abstract

The study of the shear wave velocity (Vs) of near-surface materials has been one of the primary areas of interest in seismic analyses. Vs serves as the best indicator for evaluating the stiffness of a material from its shear modulus. One economical method of obtaining Vs profiling information is through analysis of the dispersion property of surface waves. SurfSeis4, software developed by the Kansas Geological Survey (KGS), uses the Multichannel Analysis of Surface Waves (MASW) method to obtain 2D (surface location and depth) shear wave velocity profiles. The profiling information is obtained in the form of a grid through inversion of dispersion curves. The Vs 2D map module of SurfSeis4 interpolates this grid to approximate the variation of shear wave velocity across the surface locations. The current project extends the existing SurfSeis4 Vs 2D mapping module in the latest release, SurfSeis5, which incorporates topography in shear wave velocity variation and provides users with advanced image interpolation options.


LIYAO WANG

High Current Switch for Switching Power Supplies

When & Where:


2001B Eaton Hall

Committee Members:

Jim Stiles, Chair
Chris Allen
Glenn Prescott


Abstract

One of the main components of a switching power supply is the switch. However, the switch causes two main problems in a switching power supply. First, the power dissipation of the switch can be very high, especially as the current through the switch increases. Second, because the circuit contains many parasitic inductances and capacitances, transients cause problems when the operating state of the switch changes. In this project, PSpice is used to design a qualified switch and suppress these negative effects as much as possible. The purpose of this project is to design a switch for hardware implementation in switching power supplies. Therefore, all the components used in the PSpice simulation are models of actual parts available on the electronics market, and the situations that may arise in hardware design are considered in the simulation. Both MOSFET and bipolar transistor switches are discussed. The project presents solutions for reducing the power dissipation caused by the switch and for the transient problems.


MANOGNA BHEEMINENI

Implementation and Comparison of FSA Filters

When & Where:


246 Nichols Hall

Committee Members:

Fengjun Li, Chair
Victor Frost
Bo Luo


Abstract

Packet filtering is the process of filtering packets based on filter rules defined by the user. The focus of this project is to implement and compare the performance of two packet filtering techniques, SFSA and PFSA, that use a finite state automaton (FSA) for the filtering process. Stateless FSA (SFSA) is a packet filtering technique in which an FSA is generated based on the input packet and the filtering criteria. A succeed-early algorithm is then applied to the automaton, which simplifies the automaton by shortening long trails to the final state and thereby reduces the packet filtering time. SFSA also uses a transition compaction algorithm, which helps skip areas of the packet that do not need to be inspected for filtering.
PFSA (predicates of FSA) performs filtering based on predicates generated by the predicate evaluator. In this filtering process, the generated FSA has state transitions that depend on both the input symbol and the predicate value. To simplify the FSA, algorithms such as the predicates Cartesian product and predicates anticipation are used. These algorithms consider all possible states and merge them to make the FSA deterministic. A proto FSA is also generated for the predicates to speed up the filtering process.
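
The general idea of FSA-based filtering can be shown with a toy deterministic automaton. This is only a sketch under stated assumptions: the rule, the state names, and the symbolic packet fields are hypothetical, and the code does not reproduce the SFSA or PFSA algorithms from the project. It stops reading a packet once the decision is final, which is loosely in the spirit of reaching the final state early.

```python
ACCEPT, REJECT = "accept", "reject"

# Hypothetical rule: accept IPv4 TCP packets, whatever follows.
TRANSITIONS = {
    ("start", "ipv4"): "ip",
    ("ip", "tcp"): ACCEPT,
}


def run_filter(symbols, transitions=TRANSITIONS, start="start"):
    """Run the automaton over packet fields; return (decision, fields_read)."""
    state = start
    read = 0
    for symbol in symbols:
        # Any (state, symbol) pair missing from the table leads to rejection.
        state = transitions.get((state, symbol), REJECT)
        read += 1
        if state in (ACCEPT, REJECT):
            break  # decision is final; stop inspecting the packet
    if state != ACCEPT:
        state = REJECT  # input ended before an accepting state was reached
    return state, read


decision, fields_read = run_filter(["ipv4", "tcp", 80, "payload"])
```

In this example the automaton decides after two fields and never looks at the port or payload, which is where the time saving in automaton-based filters comes from.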


SREENIVAS VEKAPU

Chemocaffe: A Platform providing Deep Learning as a Service to Cheminformatics Researchers

When & Where:


2001B Eaton Hall

Committee Members:

Luke Huan, Chair
Man Kong
Prasad Kulkarni


Abstract

Neural networks have been studied and applied to many research problems for a long time. With the growing popularity of deep neural networks in machine learning, many researchers in various domains want to try deep learning frameworks. Deep learning requires a lot of memory and high processing power. One way to speed it up is to use GPUs, which use distributed and parallel processing. However, because of the computation that deep learning requires (many vector and matrix operations), the expensive infrastructure (GPUs and clusters), and the hardware and software installation overhead, few researchers adopt deep learning. The current application is a solution to cheminformatics problems that uses the Convolutional Architecture for Fast Feature Embedding (Caffe) deep learning framework. The application provides a framework/service to researchers who want to try deep learning on their datasets. The application accepts datasets from users along with options for hyperparameter configuration, runs cross-validation on the training dataset, and makes predictions on the test dataset. The (tuning) results of running Caffe on the training dataset and the predictions made on the test dataset are sent to the user via email. The current version supports binary classification that predicts the activity or inactivity of a chemical compound based on molecular fingerprints, which are binary features.


YUFEI CHENG

Future Internet Routing Design for Massive Failures and Attacks

When & Where:


246 Nichols Hall

Committee Members:

James Sterbenz, Chair
Jiannong Cao
Victor Frost
Fengjun Li
Michael Vitevitch

Abstract

Given the high complexity and increasing traffic load of the current Internet, geographically correlated challenges caused by large-scale disasters or malicious attacks pose a significant threat to dependable network communications. To understand their characteristics, we start our research by proposing a critical-region identification mechanism. The identified regions are then incorporated into a new graph resilience metric, compensated Total Geographical Graph Diversity (cTGGD), which is capable of characterizing and differentiating resiliency levels for different topologies. We further propose the path geodiverse problem (PGD), which requires the calculation of a number of geographically disjoint paths, and two heuristics with lower complexity than the optimal algorithm. We present two flow-diverse multi-commodity flow problems, a linear minimum-cost problem and a nonlinear delay-skew optimization problem, to study the tradeoff among cost, end-to-end delay, and traffic skew on different geodiverse paths. We further prototype and integrate the solutions from the above models into our cross-layer resilient protocol stack, ResTP--GeoDivRP. Our protocol stack is implemented in the network simulator ns-3 and emulated in the KanREN testbed. By providing multiple geodiverse paths, our protocol stack provides better path protection than Multipath TCP (MPTCP) against geographically correlated challenges. Finally, we analyze the mechanism attackers could use to maximize the attack impact and demonstrate the effectiveness of a network restoration plan.


HARSHITH POTU

Android Application for Interactive Teaching

When & Where:


250 Nichols Hall

Committee Members:

Prasad Kulkarni, Chair
Esam El-Araby
Andy Gill


Abstract

In a world of rapidly growing technologies and applications, most people use smart devices. This provides a means to develop smart applications that help students learn effectively. In this project, we develop a smart Android application that provides a digital means of interaction between professors and students. Instead of using traditional emails for every discussion, this application broadcasts messages to the whole class with a single click. Students can also follow multiple professors and participate in active discussions. In addition, the application lets users send personal messages to other users in order to participate in an active discussion. It provides unique logins for every student and professor. It uses MongoDB as the database and the "Parse" backend as a service. The main inspiration for this project was an application called Tophat.


ABDULMALIK HUMAYED

Security Protection for Smart Cars — A CPS Perspective

When & Where:


246 Nichols Hall

Committee Members:

Bo Luo, Chair
Arvin Agah
Prasad Kulkarni
Heechul Yun
Prajna Dhar

Abstract

As passenger vehicles evolve to be “smart”, electronic components, including communication, intelligent control, and entertainment, are continuously introduced to new models and concept vehicles. The new paradigm introduces new features and benefits, but also brings new security issues, which are often overlooked in the industry as well as in the research community.

Smart cars are considered cyber-physical systems (CPS) because of their integration of cyber and physical components. In recent years, various threats, vulnerabilities, and attacks have been discovered in different models of smart cars. In the worst-case scenario, external attackers may remotely obtain full control of the vehicle by exploiting an existing vulnerability.

In this research, we investigate smart cars’ security from a CPS perspective and derive a taxonomy of threats, vulnerabilities, attacks, and controls. In addition, we investigate three security solutions that would improve the security posture of automotive networks. First, as automotive networks are highly vulnerable to Denial of Service (DoS) attacks, we investigate a solution that effectively mitigates such attacks, namely ID-Hopping. In addition, because several attacks have successfully exploited the poor separation between critical and non-critical components in automotive networks, we propose to investigate the effectiveness of firewalls and Intrusion Detection Systems (IDS) in preventing and detecting such exploitations. To evaluate our proposals, we built a testbed composed of five microcontrollers and a communication bus to simulate an automotive network. Simulations and experiments performed with the testbed demonstrate the effectiveness of ID-Hopping against DoS attacks.


CAITLIN McCOLLISTER

Predicting Author Traits Through Topic Modeling of Multilingual Social Media Text

When & Where:


246 Nichols Hall

Committee Members:

Bo Luo, Chair
Arvin Agah
Luke Huan


Abstract

One source of insight into the motivations of a modern human being is the text they write and post for public consumption online, in forms such as personal status updates, product reviews, or forum discussions. The task of inferring traits about an author based on their writing is often called "author profiling." One challenging aspect of author profiling in today’s world is the increasing diversity of natural languages represented on social media websites. Furthermore, the informal nature of such writing often inspires modifications to standard spelling and grammatical structure which are highly language-specific. 
These are some of the dilemmas that inspired a series of so-called "shared task" competitions, in which many participants work to solve a single problem in different ways, in order to compare their methods and results. This thesis describes our submission to one author profiling shared task in which 22 teams implemented software to predict the age, gender, and certain personality traits of Twitter users based on the content of their posts to the website. We will also analyze the performance and implementation of our system compared to those of other teams, all of which were described in open-access reports. 
The competition organizers provided a labeled training dataset of tweets in English, Spanish, Dutch, and Italian, and evaluated the submitted software on a similar but hidden dataset. Our approach is based on applying a topic modeling algorithm to an auxiliary, unlabeled but larger collection of tweets we collected in each language, and representing tweets from the competition dataset in terms of a vector of 100 topics. We then trained a random forest classifier based on the labeled training dataset to predict the age, gender and personality traits for authors of tweets in the test set. Our software ranked in the top half of participants in English and Italian, and the top third in Dutch.


ANIRUDH NARASIMMAN

Arcana: Private Tweets on a Public Microblog Platform

When & Where:


250 Nichols Hall

Committee Members:

Bo Luo, Chair
Luke Huan
Prasad Kulkarni


Abstract

As one of the world’s most famous online social networks (OSN), Twitter now has 320 million monthly active users. Given the large user base and the abundant personal information it holds, users increasingly recognize the vulnerability of tweets and have reservations about showing certain tweets to different follower groups, such as colleagues, friends, and other followers. However, Twitter does not offer enough privacy protection or access control functions. Users can only set an account as protected, so that only the user’s followers see its tweets. Protected tweets do not appear publicly, and third-party sites and search engines cannot access them. However, a protected account cannot distinguish between different follower groups or users who use multiple accounts. To let users restrict access to each tweet to certain follower groups, we propose a browser plug-in system that uses CP-ABE (Ciphertext-Policy Attribute-Based Encryption), allowing the user to select followers based on predefined attributes. Through simple installation and pre-setting, the user can encrypt and decrypt tweets conveniently and without fear of information leakage.