Defense Notices
All students and faculty are welcome to attend the final defense of EECS graduate students completing their M.S. or Ph.D. degrees. Defense notices for M.S./Ph.D. presentations for this year and several previous years are listed below in reverse chronological order.
Students who are nearing the completion of their M.S./Ph.D. research should schedule their final defenses through the EECS graduate office at least THREE WEEKS PRIOR to their presentation date so that there is time to complete the degree requirements check and post the presentation announcement online.
Upcoming Defense Notices
Sudha Chandrika Yadlapalli
BERT-Driven Sentiment Analysis: Automated Course Feedback Classification and Ratings
When & Where:
Eaton Hall, Room 2001B
Committee Members:
David Johnson, Chair
Prasad Kulkarni
Hongyang Sun
Abstract
Automating the analysis of unstructured textual data, such as student course feedback, is crucial for gaining actionable insights. This project focuses on developing a sentiment analysis system leveraging the DeBERTa-v3-base model, a variant of BERT (Bidirectional Encoder Representations from Transformers), to classify feedback sentiments and generate corresponding ratings on a 1-to-5 scale.
The model was fine-tuned on a preprocessed dataset of 100,000+ student reviews, with steps taken to handle class imbalance and capture contextual nuances. Training was conducted on high-performance A100 GPUs, which enhanced computational efficiency and significantly reduced training times. The trained BERT sentiment model outperformed traditional machine learning models, achieving ~82% accuracy in sentiment classification.
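As a rough sketch of this fine-tuning setup (the checkpoint name is the public microsoft/deberta-v3-base release; the CSV file, column names, and hyperparameters below are illustrative assumptions, not the project's actual configuration):

```python
# Minimal sketch: fine-tune DeBERTa-v3-base for 5-class rating prediction
# with Hugging Face Transformers. "course_reviews.csv" and its "text"/"label"
# columns are hypothetical placeholders.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

checkpoint = "microsoft/deberta-v3-base"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=5)              # ratings 1-5 mapped to labels 0-4

dataset = load_dataset("csv", data_files="course_reviews.csv")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(output_dir="sentiment-model",
                         per_device_train_batch_size=32,
                         num_train_epochs=3)
trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized["train"],
                  tokenizer=tokenizer)       # enables dynamic padding
trainer.train()
```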
The model was seamlessly integrated into a functional web application, providing a streamlined approach to evaluating and visualizing course reviews dynamically. Key features include a course ratings dashboard, allowing students to view aggregated ratings for each course, and a review submission function where new feedback is analyzed for sentiment in real time. For the department, an admin page provides secure access to detailed analytics, such as the distribution of positive and negative reviews, visualized trends, and access to individual course reviews with their corresponding sentiment scores.
This project includes a comprehensive pipeline, starting from data preprocessing and model training to deploying an end-to-end application. Traditional machine learning models, such as Logistic Regression and Decision Tree, were initially tested but yielded suboptimal results. The adoption of BERT, trained on a large dataset of 100k reviews, significantly improved performance, showcasing the benefits of advanced transformer-based models for sentiment analysis tasks.
Shriraj K. Vaidya
Exploring DL Compiler Optimizations with TVM
When & Where:
Nichols Hall, Room 246 (Executive Conference Room)
Committee Members:
Prasad Kulkarni, Chair
Dongjie Wang
Zijun Yao
Abstract
Deep Learning (DL) compilers, also called Machine Learning (ML) compilers, take a computational graph representation of an ML model as input and apply graph-level and operator-level optimizations to generate optimized machine code for different supported hardware architectures. DL compilers can apply several graph-level optimizations, including operator fusion, constant folding, and data layout transformations, to convert the input computation graph into a functionally equivalent and optimized variant. DL compilers also perform kernel scheduling, the task of finding the most efficient implementation for the operators in the computational graph. While many research efforts have focused on exploring different kernel scheduling techniques and algorithms, the benefits of individual computation graph-level optimizations are not as well studied. In this work, we employ the TVM compiler to perform a comprehensive study of the impact of different graph-level optimizations on the performance of DL models on CPUs and GPUs. We find that TVM's graph optimizations can improve model performance by up to 41.73% on CPUs and 41.6% on GPUs, and by 16.75% and 21.89% on average on CPUs and GPUs, respectively, on our custom benchmark suite.
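To make the graph-level knobs concrete, here is a hedged sketch of how such passes are toggled through TVM's Relay front end; the toy ResNet workload comes from tvm.relay.testing, and the choice of disabled pass is arbitrary for illustration:

```python
# Sketch: build a Relay module with and without a graph-level pass.
# opt_level gates graph passes such as operator fusion and constant folding.
import tvm
from tvm import relay
from tvm.relay import testing

mod, params = testing.resnet.get_workload(num_layers=18)   # toy workload

with tvm.transform.PassContext(opt_level=3):
    lib_full = relay.build(mod, target="llvm", params=params)

with tvm.transform.PassContext(opt_level=3, disabled_pass=["FoldConstant"]):
    lib_no_fold = relay.build(mod, target="llvm", params=params)
# Timing the two compiled modules isolates the contribution of one pass.
```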
Zhaohui Wang
Enhancing Security and Privacy of IoT Systems: Uncovering and Resolving Cross-App Threats
When & Where:
Nichols Hall, Room 250 (Gemini Room)
Committee Members:
Fengjun Li, Chair
Alex Bardas
Drew Davidson
Bo Luo
Haiyang Chao
Abstract
The rapid growth of Internet of Things (IoT) technology has brought unprecedented convenience to our daily lives, enabling users to customize automation rules and develop IoT apps to meet their specific needs. However, as IoT devices interact with multiple apps across various platforms, users are exposed to complex security and privacy risks. Even interactions among seemingly harmless apps can introduce unforeseen security and privacy threats.
In this work, we introduce two innovative approaches to uncover and address these concealed threats in IoT environments. The first approach investigates hidden cross-app privacy leakage risks in IoT apps. These risks arise from cross-app chains that are formed among multiple seemingly benign IoT apps. Our analysis reveals that interactions between apps can expose sensitive information such as user identity, location, tracking data, and activity patterns. We quantify these privacy leaks by assigning probability scores to evaluate the risks based on inferences. Additionally, we provide a fine-grained categorization of privacy threats to generate detailed alerts, enabling users to better understand and address specific privacy risks. To systematically detect cross-app interference threats, we propose to apply principles of logical fallacies to formalize conflicts in rule interactions. We identify and categorize cross-app interference by examining relations between events in IoT apps. We define new risk metrics for evaluating the severity of these interferences and use optimization techniques to resolve interference threats efficiently. This approach ensures comprehensive coverage of cross-app interference, offering a systematic solution compared to the ad hoc methods used in previous research.
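As a toy illustration of the cross-app chains described above (the rule structure and example rules are invented for illustration, not taken from the dissertation's framework):

```python
# Detect implicit cross-app chains: one app's action event matches another
# app's trigger event, creating an interaction no single app author intended.
from dataclasses import dataclass

@dataclass
class Rule:
    app: str
    trigger: str     # event the rule waits for
    action: str      # event the rule produces

rules = [
    Rule("app1", trigger="motion_detected", action="light_on"),
    Rule("app2", trigger="light_on",        action="window_open"),
    Rule("app3", trigger="window_open",     action="heater_on"),
]

def find_chains(rules):
    """Pairs where one rule's action feeds another rule's trigger."""
    return [(a.app, b.app, a.action)
            for a in rules for b in rules
            if a is not b and a.action == b.trigger]

for src, dst, event in find_chains(rules):
    print(f"{src} -> {dst} via event '{event}'")
```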
To enhance forensic capabilities within IoT, we integrate blockchain technology to create a secure, immutable framework for digital forensics. This framework enables the identification, tracing, storage, and analysis of forensic information to detect anomalous behavior. Furthermore, we developed a large-scale, manually verified, comprehensive dataset of real-world IoT apps. This clean and diverse benchmark dataset supports the development and validation of IoT security and privacy solutions. Each of these approaches has been evaluated using our dataset of real-world apps, collectively offering valuable insights and tools for enhancing IoT security and privacy against cross-app threats.
Hao Xuan
A Unified Algorithmic Framework for Biological Sequence Alignment
When & Where:
Nichols Hall, Room 250 (Gemini Room)
Committee Members:
Cuncong Zhong, Chair
Fengjun Li
Suzanne Shontz
Hongyang Sun
Liang Xu
Abstract
Sequence alignment is pivotal in both homology searches and the mapping of reads from next-generation sequencing (NGS) and third-generation sequencing (TGS) technologies. Currently, the majority of sequence alignment algorithms utilize the “seed-and-extend” paradigm, designed to filter out unrelated or nonhomologous sequences when no highly similar subregions are detected. A well-known implementation of this paradigm is BLAST, one of the most widely used multipurpose aligners. Over time, this paradigm has been optimized in various ways to suit different alignment tasks. However, while these specialized aligners often deliver high performance and efficiency, they are typically restricted to one or a few alignment applications. To the best of our knowledge, no existing aligner can perform all alignment tasks while maintaining superior performance and efficiency.
In this work, we introduce a unified sequence alignment framework to address this limitation. Our alignment framework is built on the seed-and-extend paradigm but incorporates novel designs in its seeding and indexing components to maximize both flexibility and efficiency. The resulting software, the Versatile Alignment Toolkit (VAT), allows the users to switch seamlessly between nearly all major alignment tasks through command-line parameter configuration. VAT was rigorously benchmarked against leading aligners for DNA and protein homolog searches, NGS and TGS read mapping, and whole-genome alignment. The results demonstrated VAT’s top-tier performance across all benchmarks, underscoring the feasibility of using a unified algorithmic framework to handle diverse alignment tasks. VAT can simplify and standardize bioinformatic analysis workflows that involve multiple alignment tasks.
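For readers unfamiliar with the paradigm, a minimal seed-and-extend sketch follows; it uses exact k-mer seeds and ungapped X-drop extension, whereas production aligners (and VAT, per the abstract) add spaced seeds, gapped extension, and many engineering refinements:

```python
# Toy seed-and-extend: index reference k-mers, then extend each seed hit
# to the right with +1/-1 scoring until the score drops too far below the best.
from collections import defaultdict

def build_index(ref, k=11):
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index

def extend_right(ref, query, r, q, k, xdrop=10):
    score = best = k
    i, j = r + k, q + k
    while i < len(ref) and j < len(query) and score > best - xdrop:
        score += 1 if ref[i] == query[j] else -1
        best = max(best, score)
        i, j = i + 1, j + 1
    return best

def align(ref, query, k=11):
    index = build_index(ref, k)
    hits = [(extend_right(ref, query, r, q, k), r, q)
            for q in range(len(query) - k + 1)
            for r in index.get(query[q:q + k], [])]
    return max(hits, default=None)   # (score, ref_pos, query_pos)
```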
Manu Chaudhary
Utilizing Quantum Computing for Solving Multidimensional Partial Differential Equations
When & Where:
Eaton Hall, Room 2001B
Committee Members:
Esam El-Araby, Chair
Perry Alexander
Tamzidul Hoque
Prasad Kulkarni
Tyrone Duncan
Abstract
Quantum computing has the potential to revolutionize computational problem-solving by leveraging the quantum mechanical phenomena of superposition and entanglement, which allow for processing a large amount of information simultaneously. This capability is significant in the numerical solution of complex and/or multidimensional partial differential equations (PDEs), which are fundamental to modeling various physical phenomena. There are currently many quantum techniques available for solving PDEs, mainly based on variational quantum circuits. However, the existing quantum PDE solvers, particularly those based on variational quantum eigensolver (VQE) techniques, suffer from several limitations. These include low accuracy, high execution times, and low scalability on quantum simulators as well as on noisy intermediate-scale quantum (NISQ) devices, especially for multidimensional PDEs.
In this work, we propose an efficient and scalable algorithm for solving multidimensional PDEs. We present two variants of our algorithm: the first leverages finite-difference method (FDM), classical-to-quantum (C2Q) encoding, and numerical instantiation, while the second employs FDM, C2Q, and column-by-column decomposition (CCD). Both variants are designed to enhance accuracy and scalability while reducing execution times. We have validated and evaluated our algorithm using the multidimensional Poisson equation as a case study. Our results demonstrate higher accuracy, higher scalability, and faster execution times compared to VQE-based solvers on noise-free and noisy quantum simulators from IBM. Additionally, we validated our approach on hardware emulators and actual quantum hardware, employing noise mitigation techniques. We will also focus on extending these techniques to PDEs relevant to computational fluid dynamics and financial modeling, further bridging the gap between theoretical quantum algorithms and practical applications.
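A classical reference point for the case study: the sketch below discretizes a 1-D Poisson problem with the finite-difference method and shows the normalized vector that a classical-to-quantum (C2Q) encoding would load as state amplitudes. Grid size and source term are illustrative, and no quantum step is performed here.

```python
# 1-D Poisson -u''(x) = f(x), u(0)=u(1)=0, via FDM; solved classically.
import numpy as np

n, h = 64, 1.0 / 65
x = np.linspace(h, 1 - h, n)
f = np.sin(np.pi * x)                       # sample source term

# Tridiagonal FDM Laplacian with Dirichlet boundaries
A = (np.diag(2 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2

u = np.linalg.solve(A, f)                   # classical solution
psi = f / np.linalg.norm(f)                 # unit vector a C2Q step would encode
print(np.allclose(u, np.sin(np.pi * x) / np.pi**2, atol=1e-3))  # True
```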
Venkata Sai Krishna Chaitanya Addepalli
A Comprehensive Approach to Facial Emotion Recognition: Integrating Established Techniques with a Tailored Model
When & Where:
Eaton Hall, Room 2001B
Committee Members:
David Johnson, Chair
Prasad Kulkarni
Hongyang Sun
Abstract
Facial emotion recognition has become a pivotal application of machine learning, enabling advancements in human-computer interaction, behavioral analysis, and mental health monitoring. Despite its potential, challenges such as data imbalance, variation in expressions, and noisy datasets often hinder accurate prediction.
This project presents a novel approach to facial emotion recognition by integrating established techniques like data augmentation and regularization with a tailored convolutional neural network (CNN) architecture. Using the FER2013 dataset, the study explores the impact of incremental architectural improvements, optimized hyperparameters, and dropout layers to enhance model performance.
The proposed model effectively addresses issues related to data imbalance and overfitting while achieving enhanced accuracy and precision in emotion classification. The study underscores the importance of feature extraction through convolutional layers and optimized fully connected networks for efficient emotion recognition. The results demonstrate improvements in generalization, setting a foundation for future real-time applications in diverse fields.
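A hedged sketch of the kind of compact CNN the abstract describes (FER2013 images are 48x48 grayscale with 7 emotion classes; the layer sizes and dropout rates below are illustrative, not the defended architecture):

```python
# Small CNN with dropout regularization for 48x48 grayscale emotion images.
import torch
import torch.nn as nn

class EmotionCNN(nn.Module):
    def __init__(self, n_classes=7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Dropout(0.25),                       # regularization
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 12 * 12, 128), nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(128, n_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

logits = EmotionCNN()(torch.randn(8, 1, 48, 48))    # batch of 8 images
print(logits.shape)                                 # torch.Size([8, 7])
```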
Ye Wang
Deceptive Signals: Unveiling and Countering Sensor Spoofing Attacks on Cyber Systems
When & Where:
Nichols Hall, Room 250 (Gemini Room)
Committee Members:
Fengjun Li, Chair
Drew Davidson
Rongqing Hui
Bo Luo
Haiyang Chao
Abstract
In modern computer systems, sensors play a critical role in enabling a wide range of functionalities, from navigation in autonomous vehicles to environmental monitoring in smart homes. Acting as an interface between physical and digital worlds, sensors collect data to drive automated functionalities and decision-making. However, this reliance on sensor data introduces significant potential vulnerabilities, leading to various physical, sensor-enabled attacks such as spoofing, tampering, and signal injection. Sensor spoofing attacks, where adversaries manipulate sensor input or inject false data into target systems, pose serious risks to system security and privacy.
In this work, we have developed two novel sensor spoofing attack methods that significantly enhance both efficacy and practicality. The first method employs physical signals that are imperceptible to humans but detectable by sensors. Specifically, we target deep learning based facial recognition systems using infrared lasers. By leveraging advanced laser modeling, simulation-guided targeting, and real-time physical adjustments, our infrared laser-based physical adversarial attack achieves high success rates with practical real-time guarantees, surpassing the limitations of prior physical perturbation attacks. The second method embeds physical signals, which are inherently present in the system, into legitimate patterns. In particular, we integrate trigger signals into standard operational patterns of actuators on mobile devices to construct remote logic bombs, which are shown to be able to evade all existing detection mechanisms. Achieving a zero false-trigger rate with high success rates, this novel sensor bomb is highly effective and stealthy.
Our study on emerging sensor-based threats highlights the urgent need for comprehensive defenses against sensor spoofing. Along this direction, we design and investigate two defense strategies to mitigate these threats. The first strategy involves filtering out physical signals identified as potential attack vectors. The second strategy is to leverage beneficial physical signals to obfuscate malicious patterns and reinforce data integrity. For example, side channels targeting the same sensor can be used to introduce cover signals that prevent information leakage, while environment-based physical signals serve as signatures to authenticate data. Together, these strategies form a comprehensive defense framework that filters harmful sensor signals and utilizes beneficial ones, significantly enhancing the overall security of cyber systems.
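One way the first defense strategy could be instantiated is frequency-domain filtering; the sketch below is a hypothetical example (sample rate, cutoff, and the injected tone are all invented), not the dissertation's implementation:

```python
# Suppress out-of-band sensor content before it reaches decision logic.
import numpy as np
from scipy.signal import butter, filtfilt

fs = 1000.0                                   # sensor sample rate (Hz)
t = np.arange(0, 1, 1 / fs)
legit = np.sin(2 * np.pi * 2 * t)             # slow, legitimate signal
attack = 0.5 * np.sin(2 * np.pi * 180 * t)    # injected high-frequency tone

b, a = butter(4, 20 / (fs / 2), btype="low")  # 4th-order low-pass at 20 Hz
clean = filtfilt(b, a, legit + attack)        # zero-phase filtering

print(np.max(np.abs(clean - legit)))          # residual attack energy is small
```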
SM Ishraq-Ul Islam
Quantum Circuit Synthesis using Genetic Algorithms Combined with Fuzzy Logic
When & Where:
LEEP2, Room 1420
Committee Members:
Esam El-Araby, Chair
Tamzidul Hoque
Prasad Kulkarni
Abstract
Quantum computing emerges as a promising direction for high-performance computing in the post-Moore era. Leveraging quantum mechanical properties, quantum devices can theoretically provide significant speedup over classical computers in certain problem domains. Quantum algorithms are typically expressed as quantum circuits composed of quantum gates, or as unitary matrices. Execution of quantum algorithms on physical devices requires translation to machine-compatible circuits -- a process referred to as quantum compilation or synthesis.
Quantum synthesis is a challenging problem. Physical quantum devices support a limited number of native basis gates, requiring synthesized circuits to be composed of only these gates. Moreover, quantum devices typically have specific qubit topologies, which constrain how and where gates can be applied. Consequently, logical qubits in input circuits and unitaries may need to be mapped to and routed between physical qubits on the device.
Current Noisy Intermediate-Scale Quantum (NISQ) devices present additional constraints through their gate errors and high susceptibility to noise. NISQ devices are vulnerable to errors during gate application, and their short decoherence times lead to qubits rapidly succumbing to accumulated noise, possibly corrupting computations. Therefore, circuits synthesized for NISQ devices need to have a low number of gates to reduce gate errors, and short execution times to avoid qubit decoherence.
The problem of synthesizing device-compatible quantum circuits, while optimizing for low gate count and short execution times, can be shown to be computationally intractable using analytical methods. Therefore, interest has grown towards heuristics-based compilation techniques, which are able to produce approximations of the desired algorithm to a required degree of precision. In this work, we investigate using Genetic Algorithms (GAs) -- a proven gradient-free optimization technique based on natural selection -- for circuit synthesis. In particular, we formulate the quantum synthesis problem as a multi-objective optimization (MOO) problem, with the objectives of minimizing the approximation error, number of multi-qubit gates, and circuit depth. We also employ fuzzy logic for runtime parameter adaptation of GA to enhance search efficiency and solution quality of our proposed quantum synthesis method.
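A skeleton of this GA formulation is sketched below; the circuit encoding, the stubbed error() function, and the piecewise fuzzy-style rule are placeholders standing in for the dissertation's actual unitary-distance computation and fuzzy inference system:

```python
# GA skeleton: scalarized multi-objective cost over (approximation error,
# multi-qubit gate count, depth) with diversity-driven mutation adaptation.
import random

GATES = ["h", "t", "cx"]                       # toy basis-gate set

def random_circuit(n=20):
    return [random.choice(GATES) for _ in range(n)]

def error(circ):
    return random.random()                     # stub: distance to target unitary

def fitness(circ, w=(1.0, 0.05, 0.01)):
    return (w[0] * error(circ)
            + w[1] * circ.count("cx")          # multi-qubit gates
            + w[2] * len(circ))                # depth proxy

def diversity(pop):
    return len({tuple(c) for c in pop}) / len(pop)

def fuzzy_mutation_rate(div):
    # Fuzzy-flavored rule: low diversity -> mutate more to escape stagnation.
    if div < 0.3:
        return 0.30
    if div < 0.7:
        return 0.10
    return 0.02

pop = [random_circuit() for _ in range(50)]
for gen in range(100):
    rate = fuzzy_mutation_rate(diversity(pop))
    parents = sorted(pop, key=fitness)[:25]    # keep the fitter half
    children = [[g if random.random() > rate else random.choice(GATES)
                 for g in random.choice(parents)] for _ in range(25)]
    pop = parents + children
```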
Sravan Reddy Chintareddy
Combating Spectrum Crunch with Efficient Machine-Learning Based Spectrum Access and Harnessing High-frequency Bands for Next-G Wireless Networks
When & Where:
Nichols Hall, Room 246 (Executive Conference Room)
Committee Members:
Morteza Hashemi, Chair
Victor Frost
Erik Perrins
Dongjie Wang
Shawn Keshmiri
Abstract
The number of wireless devices, already over 14 billion, is expected to grow to 40 billion by 2030. In addition, we are witnessing an unprecedented proliferation of applications and technologies with wireless connectivity requirements such as unmanned aerial vehicles, connected health, and radars for autonomous vehicles. The advent of new wireless technologies and devices will only worsen the current spectrum crunch that service providers and wireless operators are already experiencing. In this PhD study, we address these challenges through the following research thrusts, in which we consider two emerging applications aimed at advancing spectrum efficiency and high-frequency connectivity solutions.
First, we focus on effectively utilizing the existing spectrum resources for emerging applications such as networked UAVs operating within the Unmanned Traffic Management (UTM) system. In this thrust, we develop a coexistence framework for UAVs to share spectrum with traditional cellular networks by using machine learning (ML) techniques so that networked UAVs act as secondary users without interfering with primary users. We propose federated learning (FL) and reinforcement learning (RL) solutions to establish a collaborative spectrum sensing and dynamic spectrum allocation framework for networked UAVs. In the second part, we explore the potential of millimeter-wave (mmWave) and terahertz (THz) frequency bands for high-speed data transmission in urban settings. Specifically, we investigate THz-based midhaul links for 5G networks, where a network's central units (CUs) connect to distributed units (DUs). Through numerical analysis, we assess the feasibility of using 140 GHz links and demonstrate the merits of high-frequency bands to support high data rates in midhaul networks for future urban communications infrastructure. Overall, this research is aimed at establishing frameworks and methodologies that contribute toward the sustainable growth and evolution of wireless connectivity.
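A hedged sketch of one federated round in the collaborative sensing framework (the logistic model, feature shapes, and synthetic occupancy labels are invented placeholders):

```python
# Toy FedAvg: each UAV locally updates an occupied/idle classifier, and the
# server averages the resulting weight vectors.
import numpy as np

rng = np.random.default_rng(0)

def local_update(w, X, y, lr=0.1, epochs=5):
    """A few steps of logistic-regression gradient descent on one UAV's data."""
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-X @ w))
        w = w - lr * X.T @ (p - y) / len(y)
    return w

n_uavs, dim = 5, 8
w_global = np.zeros(dim)
for rnd in range(10):
    local_models = []
    for _ in range(n_uavs):
        X = rng.normal(size=(100, dim))        # local spectrum features
        y = (X[:, 0] > 0).astype(float)        # synthetic occupancy labels
        local_models.append(local_update(w_global.copy(), X, y))
    w_global = np.mean(local_models, axis=0)   # FedAvg aggregation
```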
Arnab Mukherjee
Attention-Based Solutions for Occlusion Challenges in Person Tracking
When & Where:
Eaton Hall, Room 2001B
Committee Members:
Prasad Kulkarni, Chair
Sumaiya Shomaji
Hongyang Sun
Jian Li
Abstract
Person tracking and association is a complex task in computer vision applications. Even with a powerful detector, a highly accurate association algorithm is necessary to match and track the correct person across all frames. This method has numerous applications in surveillance, and its complexity increases with the number of detected objects and their movements across frames. A significant challenge in person tracking is occlusion, which occurs when an individual being tracked is partially or fully blocked by another object or person. This can make it difficult for the tracking system to maintain the identity of the individual and track them effectively.
In this research, we propose a solution to the occlusion problem by utilizing an occlusion-aware spatial attention transformer. We have divided the entire tracking association process into two scenarios: occlusion and no-occlusion. When a detected person with a specific ID suddenly disappears from a frame for a certain period, we employ advanced methods such as Detector Integration and Pose Estimation to ensure the correct association. Additionally, we implement a spatial attention transformer to differentiate these occluded detections, transform them, and then match them with the correct individual using the Cosine Similarity Metric.
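The re-association step can be pictured with the sketch below; the 256-dimensional embeddings are random stand-ins for the transformer features, and the matching threshold is an arbitrary illustrative value:

```python
# Match an occluded re-detection against known track IDs by cosine similarity.
import numpy as np

rng = np.random.default_rng(1)
gallery = {tid: rng.normal(size=256) for tid in ("ID-1", "ID-2", "ID-3")}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def reassociate(query, gallery, threshold=0.5):
    tid, score = max(((t, cosine(query, g)) for t, g in gallery.items()),
                     key=lambda kv: kv[1])
    return tid if score >= threshold else None   # unmatched -> new track

query = gallery["ID-2"] + 0.3 * rng.normal(size=256)   # noisy re-detection
print(reassociate(query, gallery))                     # expected: "ID-2"
```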
The features extracted from the attention transformer provide a robust baseline for detecting people, enhancing the algorithm's adaptability and addressing key challenges associated with existing approaches. This improved method reduces the number of misidentifications and instances of ID switching while also enhancing tracking accuracy and precision.
Agraj Magotra
Data-Driven Insights into Sustainability: An Artificial Intelligence (AI) Powered Analysis of ESG Practices in the Textile and Apparel Industry
When & Where:
Eaton Hall, Room 2001B
Committee Members:
Sumaiya Shomaji, Chair
Prasad Kulkarni
Zijun Yao
Abstract
The global textile and apparel (T&A) industry is under growing scrutiny for its substantial environmental and social impact, producing 92 million tons of waste annually and contributing to 20% of global water pollution. In Bangladesh, one of the world's largest apparel exporters, the integration of Environmental, Social, and Governance (ESG) practices is critical to meet international sustainability standards and maintain global competitiveness. This master's study leverages Artificial Intelligence (AI) and Machine Learning (ML) methodologies to comprehensively analyze unstructured corporate data related to ESG practices among LEED-certified Bangladeshi T&A factories.
Our study employs advanced techniques, including Web Scraping, Natural Language Processing (NLP), and Topic Modeling, to extract and analyze sustainability-related information from factory websites. We develop a robust ML framework that utilizes Non-Negative Matrix Factorization (NMF) for topic extraction and a Random Forest classifier for ESG category prediction, achieving an 86% classification accuracy. The study uncovers four key ESG themes: Environmental Sustainability, Social: Workplace Safety and Compliance, Social: Education and Community Programs, and Governance. The analysis reveals that 46% of factories prioritize environmental initiatives, such as energy conservation and waste management, while 44% emphasize social aspects, including workplace safety and education. Governance practices are significantly underrepresented, with only 10% of companies addressing ethical governance, healthcare provisions, and employee welfare.
To deepen our understanding of the ESG themes, we conducted a Centrality Analysis to identify the most influential keywords within each category, using measures such as degree, closeness, and eigenvector centrality. Furthermore, our analysis reveals that higher certification levels, like Platinum, are associated with a more balanced emphasis on environmental, social, and governance practices, while lower levels focus primarily on environmental efforts. These insights highlight key areas where the industry can improve and inform targeted strategies for enhancing ESG practices. Overall, this ML framework provides a data-driven, scalable approach for analyzing unstructured corporate data and promoting sustainability in Bangladesh’s T&A sector, offering actionable recommendations for industry stakeholders, policymakers, and global brands committed to responsible sourcing.
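A hedged miniature of the described pipeline (the four toy documents and labels below are invented; the real study works on scraped factory-website text):

```python
# TF-IDF features -> NMF topic loadings (4 topics, matching the four ESG
# themes) -> Random Forest classifier on the loadings.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF
from sklearn.ensemble import RandomForestClassifier

docs = ["solar energy and waste management program",
        "fire safety training and worker compliance",
        "school scholarships for the local community",
        "board ethics audit and employee welfare policy"]
labels = ["environment", "social-safety", "social-education", "governance"]

tfidf = TfidfVectorizer().fit_transform(docs)
topics = NMF(n_components=4, init="nndsvda").fit_transform(tfidf)
clf = RandomForestClassifier(n_estimators=100).fit(topics, labels)
print(clf.predict(topics[:1]))
```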
Samyoga Bhattarai
Pro-ID: A Secure Face Recognition System using Locality Sensitive Hashing to Protect Human ID
When & Where:
Eaton Hall, Room 2001B
Committee Members:
Sumaiya Shomaji, Chair
Tamzidul Hoque
Hongyang Sun
Abstract
Face recognition systems are widely used in various applications, from mobile banking apps to personal smartphones. However, these systems often store biometric templates in raw form, posing significant security and privacy risks. Pro-ID addresses this vulnerability by incorporating SimHash, a Locality Sensitive Hashing (LSH) algorithm, to create secure and irreversible hash codes of facial feature vectors. Unlike traditional methods that leave raw data exposed to potential breaches, SimHash transforms the feature space into high-dimensional hash codes, safeguarding user identity while preserving system functionality.
The proposed system balances two competing aspects: security and system performance. Additionally, the system is designed to resist common attacks, including brute force and template inversion, ensuring that even if the hashed templates are exposed, the original biometric data cannot be reconstructed.
A key challenge addressed in this project is minimizing the trade-off between security and performance. Extensive evaluations demonstrate that the proposed method maintains competitive accuracy rates comparable to traditional face recognition systems while significantly enhancing security metrics such as irreversibility, unlinkability, and revocability. This innovative approach contributes to advancing the reliability and trustworthiness of biometric systems, providing a secure framework for applications in face recognition systems.
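The core SimHash idea can be sketched in a few lines; the feature dimension, hash length, and noise level below are illustrative assumptions:

```python
# SimHash: project features onto random hyperplanes and keep only sign bits,
# so the stored template is an irreversible bit string.
import numpy as np

rng = np.random.default_rng(42)
planes = rng.normal(size=(128, 512))            # 128-bit hash of 512-d features

def simhash(feature):
    return (planes @ feature > 0).astype(np.uint8)

def hamming(h1, h2):
    return int(np.sum(h1 != h2))

enrolled = rng.normal(size=512)                 # template at enrollment
probe = enrolled + 0.2 * rng.normal(size=512)   # same face, new capture
other = rng.normal(size=512)                    # different person

print(hamming(simhash(enrolled), simhash(probe)))  # small distance
print(hamming(simhash(enrolled), simhash(other)))  # ~64 (half the bits differ)
```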
Shalmoli Ghosh
High-Power Fabry-Perot Quantum-Well Laser Diodes for Application in Multi-Channel Coherent Optical Communication Systems
When & Where:
Nichols Hall, Room 246 (Executive Conference Room)
Committee Members:
Rongqing Hui, Chair
Shannon Blunt
Jim Stiles
Abstract
Wavelength Division Multiplexing (WDM) is essential for managing rapid network traffic growth in fiber optic systems. Each WDM channel demands a narrow-linewidth, frequency-stabilized laser diode, leading to complexity and increased energy consumption. Multi-wavelength laser sources, generating optical frequency combs (OFC), offer an attractive solution, enabling a single laser diode to provide numerous equally spaced spectral lines for enhanced bandwidth efficiency.
Quantum-dot and quantum-dash OFCs provide phase-synchronized lines with low relative intensity noise (RIN), while Quantum Well (QW) OFCs offer higher power efficiency but have higher RIN in the low-frequency region of up to 2 GHz. However, in both quantum-dot/dash and QW-based OFCs, individual spectral lines exhibit high phase noise, limiting coherent detection. Output power levels of these OFCs range between 1 and 20 mW, and the power of each spectral line is typically less than -5 dBm. As a result, these OFCs require excessive optical amplification; they also exhibit relatively broad linewidths on each spectral line, due to the inverse relationship between optical power and linewidth given by the Schawlow-Townes formula. This constraint hampers their applicability in coherent detection systems, highlighting a challenge for achieving high-performance optical communication.
In this work, coherent system application of a single-section Quantum-Well Fabry-Perot (FP) laser diode is demonstrated. This laser delivers over 120 mW optical power at the fiber pigtail with a mode spacing of 36.14 GHz. In an experimental setup, 20 spectral lines from a single laser transmitter carry 30 GBaud 16-QAM signals over 78.3 km single-mode fiber, achieving significant data transmission rates. With the potential to support a transmission capacity of 2.15 Tb/s (4.3 Tb/s for dual polarization) per transmitter, including Forward Error Correction (FEC) and maintenance overhead, it offers a promising solution for meeting the escalating demands of modern network traffic efficiently.
Anissa Khan
Privacy Preserving Biometric Matching
When & Where:
Eaton Hall, Room 2001B
Committee Members:
Perry Alexander, Chair
Prasad Kulkarni
Fengjun Li
Abstract
Biometric matching is a process by which distinct features are used to identify an individual. Doing so privately is important because biometric data, such as fingerprints or facial features, is not something that can be easily changed or updated if put at risk. In this study, we perform a piece of the biometric matching process in a privacy-preserving manner by using secure multiparty computation (SMPC). Using SMPC allows the identifying biological data, called a template, to remain stored by the data owner during the matching process. This provides security guarantees to the biological data while it is in use and therefore reduces the chance that the data is stolen. In this study, we find that performing biometric matching using SMPC is just as accurate as performing the same match in plaintext.
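To give a flavor of the sharing idea, the toy below splits a quantized template into additive shares in a prime field; because an inner product with a public probe is linear, each server computes its part locally. Real SMPC matching also keeps the probe secret (e.g., via Beaver triples), which this sketch deliberately omits:

```python
# Additive secret sharing: neither server alone learns the template.
import secrets

P = 2**61 - 1                                   # prime modulus

def share(x):
    r = secrets.randbelow(P)
    return r, (x - r) % P                       # two additive shares

template = [13, 7, 42, 5]                       # quantized biometric features
probe = [12, 8, 40, 5]                          # public probe in this toy

shares0, shares1 = zip(*(share(v) for v in template))
partial0 = sum(s * q for s, q in zip(shares0, probe)) % P   # server 0, local
partial1 = sum(s * q for s, q in zip(shares1, probe)) % P   # server 1, local

print((partial0 + partial1) % P)                # equals dot(template, probe)
print(sum(t * q for t, q in zip(template, probe)))
```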
Bryan Richlinski
Prioritize Program Diversity: Enumerative Synthesis with Entropy Ordering
When & Where:
Nichols Hall, Room 246 (Executive Conference Room)
Committee Members:
Sankha Guria, Chair
Perry Alexander
Drew Davidson
Jennifer Lohoefener
Abstract
Program synthesis is a popular way to create a correct-by-construction program from a user-provided specification. Term enumeration is a leading technique to systematically explore the space of programs by generating terms from a formal grammar. These terms are treated as candidate programs which are tested/verified against the specification for correctness. In order to prioritize candidates more likely to satisfy the specification, enumeration is often ordered by program size or other domain-specific heuristics. However, domain-specific heuristics require expert knowledge, and enumeration by size often leads to terms comprised of frequently repeating symbols that are less likely to satisfy a specification. In this thesis, we build a heuristic that prioritizes term enumeration based on the variability of individual symbols in the program, i.e., the information entropy of the program. We use this heuristic to order programs in both top-down and bottom-up enumeration. We evaluated our work on a subset of the PBE-String track of the 2017 SyGuS competition benchmarks and compared against size-based enumeration. In top-down enumeration, our entropy heuristic shortens runtime in ~56% of cases and tests fewer programs in ~80% of cases before finding a valid solution. For bottom-up enumeration, our entropy heuristic reduces the number of enumerated programs in ~30% of cases before finding a valid solution, without improving the runtime. Our findings suggest that using entropy to prioritize program enumeration is a promising step forward for faster program synthesis.
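The heuristic itself is compact; here is a hedged sketch (with invented candidate terms) of scoring terms by the Shannon entropy of their symbol distribution and enumerating the most varied first:

```python
# Order candidate terms by symbol-distribution entropy, highest first.
from collections import Counter
from math import log2

def entropy(term):
    n = len(term)
    return -sum(c / n * log2(c / n) for c in Counter(term).values())

candidates = [["concat", "x", "x", "x"],
              ["concat", "substr", "x", "y"],
              ["replace", "x", "y", "z"]]

for term in sorted(candidates, key=entropy, reverse=True):
    print(f"{entropy(term):.2f}  {term}")   # most-varied terms first
```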
Elizabeth Wyss
A New Frontier for Software Security: Diving Deep into npm
When & Where:
Eaton Hall, Room 2001B
Committee Members:
Drew Davidson, Chair
Alex Bardas
Fengjun Li
Bo Luo
J. Walker
Abstract
Open-source package managers (e.g., npm for Node.js) have become an established component of modern software development. Rather than creating applications from scratch, developers may employ modular software dependencies and frameworks--called packages--to serve as building blocks for writing larger applications. Package managers make this process easy. With a simple command line directive, developers are able to quickly fetch and install packages across vast open-source repositories. npm--the largest of such repositories--alone hosts millions of unique packages and serves billions of package downloads each week.
However, the widespread code sharing resulting from open-source package managers also presents novel security implications. Vulnerable or malicious code hiding deep within package dependency trees can be leveraged downstream to attack both software developers and the users of their applications. This downstream flow of software dependencies--dubbed the software supply chain--is critical to secure.
This research provides a deep dive into the npm-centric software supply chain, exploring various facets and phenomena that impact the security of this software supply chain. Such factors include (i) hidden code clones--which obscure provenance and can stealthily propagate known vulnerabilities, (ii) install-time attacks enabled by unmediated installation scripts, (iii) hard-coded URLs residing in package code, (iv) the impacts of open-source development practices, and (v) package compromise via malicious updates. For each facet, tooling is presented to identify and/or mitigate potential security impacts. Ultimately, it is our hope that this research fosters greater awareness, deeper understanding, and further efforts to forge a new frontier for the security of modern software supply chains.
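As a small taste of facet (ii), the sketch below walks a node_modules tree and flags packages that declare npm's install-time lifecycle hooks; it is an illustrative audit, not the tooling built for the dissertation:

```python
# Flag packages whose package.json declares scripts that run at install time.
import json
from pathlib import Path

HOOKS = {"preinstall", "install", "postinstall"}

def find_install_scripts(node_modules="node_modules"):
    for manifest in Path(node_modules).glob("**/package.json"):
        try:
            scripts = json.loads(manifest.read_text()).get("scripts", {})
        except (json.JSONDecodeError, OSError):
            continue
        hooks = HOOKS & scripts.keys()
        if hooks:
            yield manifest.parent.name, {h: scripts[h] for h in hooks}

for pkg, hooks in find_install_scripts():
    print(pkg, hooks)
```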
Jagadeesh Sai Dokku
Intelligent Chat Bot for KU Website: Automated Query Response and Resource Navigation
When & Where:
Eaton Hall, Room 2001B
Committee Members:
David Johnson, Chair
Prasad Kulkarni
Hongyang Sun
Abstract
This project introduces an intelligent chatbot designed to improve user experience on our university website by providing instant, automated responses to common inquiries. Navigating a university website can be challenging for students, applicants, and visitors who seek quick information about admissions, campus services, events, and more. To address this challenge, we developed a chatbot that simulates human conversation using Natural Language Processing (NLP), allowing users to find information more efficiently.

The chatbot is powered by a Bidirectional Long Short-Term Memory (BiLSTM) model, an architecture well-suited for understanding complex sentence structures. This model captures contextual information from both directions in a sentence, enabling it to identify user intent with high accuracy. We trained the chatbot on a dataset of intent-labeled queries, enabling it to recognize specific intentions such as asking about campus facilities, academic programs, or event schedules. The NLP pipeline includes steps like tokenization, lemmatization, and vectorization. Tokenization and lemmatization prepare the text by breaking it into manageable units and standardizing word forms, making it easier for the model to recognize similar word patterns. The vectorization process then translates this processed text into numerical data that the model can interpret.

Flask is used to manage the backend, allowing seamless communication between the user interface and the BiLSTM model. When a user submits a query, Flask routes the input to the model, processes the prediction, and delivers the appropriate response back to the user interface. This chatbot demonstrates a successful application of NLP in creating interactive, efficient, and user-friendly solutions. By automating responses, it reduces reliance on manual support and ensures users can access relevant information at any time. This project highlights how intelligent chatbots can transform the way users interact with university websites, offering a faster and more engaging experience.
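A hedged sketch of such an intent model (vocabulary size, sequence length, and intent count are placeholder values, and the training-data preparation is omitted):

```python
# Embedding -> bidirectional LSTM -> softmax over intent labels, in Keras.
import tensorflow as tf

VOCAB, SEQ_LEN, INTENTS = 5000, 20, 12

model = tf.keras.Sequential([
    tf.keras.Input(shape=(SEQ_LEN,)),
    tf.keras.layers.Embedding(VOCAB, 64),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(INTENTS, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```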
Past Defense Notices
Anjali Pare
Exploring Errors in Binary-Level CFG Recovery
When & Where:
Eaton Hall, Room 2001B
Committee Members:
Prasad Kulkarni, Chair
Fengjun Li
Bo Luo
Abstract
The control-flow graph (CFG) is a graphical representation of the program and holds information that is critical to the correct application of many other program analysis, performance optimization, and software security algorithms and techniques. While CFG generation is an ordinary task for source-level tools, like the compiler, the loss of high-level program information makes accurate CFG recovery a challenging issue for binary-level software reverse engineering (SRE) tools. Earlier research has shown that while advanced SRE tools can precisely reconstruct most of the CFG for the programs, important gaps and inaccuracies remain that may hamper critical tasks, from vulnerability and malicious code detection to adequately securing software binaries.
In this paper, we study three reverse engineering tools (angr, radare2, and Ghidra) and perform an in-depth analysis of the control-flow graphs generated by these tools. We develop a unique methodology using manual analysis and automated scripting to understand and categorize the CFG errors over a large benchmark set. Of the several interesting observations revealed by this work, one that is particularly unexpected is that most errors in the reconstructed CFGs appear not to be intrinsic limitations of the binary-level algorithms, as currently believed, and may be simply eliminated by more robust implementations. We expect our work to lead to more accurate CFG reconstruction in SRE tools and improved precision for other algorithms that employ CFGs.
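For readers who want to reproduce the starting point of such a study, recovering a CFG with one of the examined tools takes only a few lines (the binary path below is a placeholder):

```python
# Recover a control-flow graph with angr's CFGFast analysis.
import angr

proj = angr.Project("/bin/ls", auto_load_libs=False)
cfg = proj.analyses.CFGFast()

print("nodes:", len(cfg.graph.nodes()))   # recovered basic blocks
print("edges:", len(cfg.graph.edges()))   # recovered control-flow transfers
```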
Kailani Jones
Security Operation Centers: Analyzing COVID-19's Work-from-Home Influence on Endpoint Management and Developing a Sociotechnical Metrics Framework
When & Where:
Nichols Hall, Room 246 (Executive Conference Room)
Committee Members:
Alex Bardas, Chair
Drew Davidson
Fengjun Li
Bo Luo
John Symons
Abstract
Security Operations Centers (SOCs) are central components of modern enterprise networks. Organizations in industry, government, and academia deploy SOCs to manage their networks, defend against cyber threats, and maintain regulatory compliance. For reporting, SOC leadership typically use metrics such as “number of security incidents”, “mean time to remediation/ticket closure”, and “risk analysis” to name a few. However, these commonly leveraged metrics may not necessarily reflect the effectiveness of a SOC and its supporting tools.
To better understand these environments, we employ ethnographic approaches (e.g., participant observation) and embed a graduate student (a.k.a., field worker) in a real-world SOC. As the field worker worked in person, alongside SOC employees, and recorded observations on technological tools, employees, and culture, COVID-19's work-from-home (WFH) phenomenon occurred. In response, this dissertation traces and analyzes the SOC's effort to adapt and reprioritize. By intersecting historical analysis (starting in the 1970s) with ethnographic field notes (352 field notes analyzed across 1,000+ hours in a SOC over 34 months), complemented by quantitative interviews (covering 7 other SOCs), we find additional causal forces that, for decades, have pushed SOC network management toward endpoints.
Although endpoint management is not a novel concept to SOCs, COVID-19's WFH phenomenon highlighted the need for flexible, supportive, and customizable metrics. As such, we develop a sociotechnical metrics framework with these qualities in mind and limit the scope to a core SOC function: alert handling. With a similar ethnographic approach (participant observation paired with semi-structured interviews covering 15 SOC employees across 10 SOCs), we develop the framework's foundation by analyzing and capturing the alert handling process (a.k.a., alert triage). This process demonstrates the significance of not only technical expertise (e.g., data exfiltration, command and control, etc.) but also social characteristics (e.g., collaboration, communication, etc.). In fact, we point out the underlying presence and importance of expert judgment during alert triaging, particularly during conclusion development.
In addition to the aforementioned qualities, our alert handling sociotechnical metrics framework aims to capture current gaps during the alert triage process that, if improved, could help SOC employees' effectiveness. With the focus upon this process and the uncovered limitations SOCs usually face today during alert handling, we validate not only the flexibility of our framework but also its accuracy in a real-world SOC.
Gordon Ariho
Multipass SAR Processing for Ice Sheet Vertical Velocity and Tomography Measurements
When & Where:
Nichols Hall, Room 317 (Richard K. Moore Conference Room)
Committee Members:
James Stiles, Chair
John Paden (Co-Chair)
Christopher Allen
Shannon Blunt
Emily Arnold
Abstract
We apply differential interferometric synthetic aperture radar (DInSAR) techniques to data from the Multichannel Coherent Radar Depth Sounder (MCoRDS) to measure the vertical displacement of englacial layers within an ice sheet. DInSAR’s accuracy is usually on the order of a small fraction of the wavelength (e.g., millimeter to centimeter precision is typical) in monitoring displacement along the radar line of sight (LOS). Ground-based Autonomous phase-sensitive Radio-Echo Sounder (ApRES) units have demonstrated the ability to precisely measure the relative vertical velocity by taking multiple measurements from the same location on the ice. Airborne systems can make a similar measurement but can suffer from spatial baseline errors since it is generally impossible to fly over the same stretch of ice on each pass with enough precision to ignore the spatial baseline. In this work, we compensate for spatial baseline errors using precise trajectory information and estimates of the cross-track layer slope using direction of arrival estimation. The current DInSAR algorithm is applied to airborne radar depth sounder data to produce results for flights near Summit camp and the EGIG (Expéditions Glaciologiques Internationales au Groenland) line in Greenland using the CReSIS toolbox. The current approach estimates the baseline error in multiple steps. Each step has dependencies on all the values to be estimated. To overcome this drawback, we have implemented a maximum likelihood estimator that jointly estimates the vertical velocity, the cross-track internal layer slope, and the unknown baseline error due to GPS and INS (Inertial Navigation System) errors. We incorporate the Lliboutry parametric model for vertical velocity into the maximum likelihood estimator framework.
To improve the direction of arrival (DOA) estimation, we explore the use of focusing matrices and compare them against other wideband DOA methods, such as wideband MLE, wideband MUSIC, and wideband MVDR, by comparing the mean squared error of the DOA estimates.
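As a simplified illustration of the subspace methods being compared, here is a narrowband MUSIC sketch on a uniform linear array; the wideband variants in the dissertation extend this with focusing matrices, and the array geometry and SNR below are illustrative:

```python
# Narrowband MUSIC: estimate one source angle from array snapshots.
import numpy as np

M, d, snaps = 8, 0.5, 200              # sensors, spacing (wavelengths), snapshots
theta_true = np.deg2rad(12.0)
rng = np.random.default_rng(0)

a = lambda th: np.exp(-2j * np.pi * d * np.arange(M) * np.sin(th))
X = np.outer(a(theta_true),
             rng.normal(size=snaps) + 1j * rng.normal(size=snaps))
X += 0.1 * (rng.normal(size=(M, snaps)) + 1j * rng.normal(size=(M, snaps)))

R = X @ X.conj().T / snaps             # sample covariance
_, vecs = np.linalg.eigh(R)            # eigenvalues in ascending order
En = vecs[:, :-1]                      # noise subspace (one source assumed)

grid = np.deg2rad(np.linspace(-30, 30, 601))
p = [1 / np.linalg.norm(En.conj().T @ a(th))**2 for th in grid]
print("estimate:", np.rad2deg(grid[int(np.argmax(p))]), "deg")
```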
Dalton Brucker-Hahn
Mishaps in Microservices: Improving Microservice Architecture Security Through Novel Service Mesh Capabilities
When & Where:
Nichols Hall, Room 129, Ron Evans Apollo Auditorium
Committee Members:
Alex Bardas, Chair
Drew Davidson
Fengjun Li
Bo Luo
Huazhen Fang
Abstract
Shifting trends in modern software engineering and cloud computing have pushed system designs to leverage containerization and develop their systems into microservice architectures. While microservice architectures emphasize scalability and ease-of-development, the issue of microservice explosion has emerged, stressing hosting environments and generating new challenges within this domain. Service meshes, the latest in a series of developments, are being adopted to meet these needs. Service meshes provide separation of concerns between microservice development and the operational concerns of microservice deployments, such as service discovery and networking. However, despite the benefits provided by service meshes, the security demands of this domain are unmet by the current state-of-art offerings.
Through a series of experimental trials in a service mesh testbed, we demonstrate a need for improved security mechanisms in the state-of-art offerings of service meshes. After deriving a series of domain-conscious recommendations to improve the longevity and flexibility of service meshes, we design and implement our proof-of-concept service mesh system ServiceWatch. By leveraging a novel verification-in-the-loop scheme, we provide the capability for service meshes to provide holistic monitoring and management of the microservice deployments they host. Further, through frequent, automated rotations of security artifacts (keys, certificates, and tokens), we allow the service mesh to automatically isolate and remove microservices that violate the defined network policies of the service mesh, requiring no system administrator intervention. Extending this proof-of-concept environment, we design and implement a prototype workflow called CloudCover. CloudCover incorporates our verification-in-the-loop scheme and leverages existing tools, allowing easy adoption of these novel security mechanisms into modern systems. Under a realistic and relevant threat model, we show how our design choices and improvements are both necessary and beneficial to real-world deployments. By examining network packet captures, we provide a theoretical analysis of the scalability of these solutions in real-world networks. We further extend these trials experimentally using an independently managed and operated cloud environment to demonstrate the practical scalability of our proposed designs to large-scale software systems. Our results indicate that the overhead introduced by ServiceWatch and CloudCover are acceptable for real-world deployments. Additionally, the security capabilities provided effectively mitigate threats present within these environments.
Hara Madhav Talasila
Radiometric Calibration of Radar Depth Sounder Data Products
When & Where:
Nichols Hall, Room 317 (Richard K. Moore Conference Room)
Committee Members:
Carl Leuschen, Chair
John Paden (Co-Chair)
Christopher Allen
James Stiles
Jilu Li
Abstract
Although the Center for Remote Sensing of Ice Sheets (CReSIS) performs several radar calibration steps to produce Operation IceBridge (OIB) radar depth sounder data products, these datasets are not radiometrically calibrated and the swath array processing uses ideal (rather than measured [calibrated]) steering vectors. Any errors in the steering vectors, which describe the response of the radar as a function of arrival angle, will lead to errors in positioning and backscatter that subsequently affect estimates of basal conditions, ice thickness, and radar attenuation. Scientific applications that estimate physical characteristics of surface and subsurface targets from the backscatter are limited with the current data because it is not absolutely calibrated. Moreover, changes in instrument hardware and processing methods for OIB over the last decade affect the quality of inter-seasonal comparisons. Recent methods which interpret basal conditions and calculate radar attenuation using CReSIS OIB 2D radar depth sounder echograms are forced to use relative scattering power, rather than absolute methods.
As an active target calibration is not possible for past field seasons, a method that uses natural targets will be developed. Unsaturated natural target returns from smooth sea-ice leads or lakes are imaged in many datasets and have known scattering responses. The proposed method forms a system of linear equations with the recorded scattering signatures from these known targets, scattering signatures from crossing flight paths, and the radiometric correction terms. A least squares solution to optimize the radiometric correction terms is calculated, which minimizes the error function representing the mismatch in expected and measured scattering. The new correction terms will be used to correct the remaining mission data. The radar depth sounder data from all OIB campaigns can be reprocessed to produce absolutely calibrated echograms for the Arctic and Antarctic. A software simulator will be developed to study calibration errors and verify the calibration software. The software for processing natural targets will be made available in CReSIS’s open-source polar radar software toolbox. The OIB data will be reprocessed with new calibration terms, providing the data user community with a complete set of radiometrically calibrated radar echograms for the CReSIS OIB radar depth sounder for the first time.
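The least-squares step at the heart of the proposed calibration reduces to a standard overdetermined solve; the sketch below uses synthetic placeholders for the observation matrix and mismatch vector:

```python
# Solve for radiometric correction terms from known-target/crossover equations.
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_terms = 200, 4                       # observations, correction terms

true_corr = np.array([1.5, -0.8, 0.3, 2.1])   # unknown corrections (dB scale)
A = rng.normal(size=(n_obs, n_terms))         # how terms enter each observation
mismatch = A @ true_corr + 0.05 * rng.normal(size=n_obs)  # expected - measured

corr, *_ = np.linalg.lstsq(A, mismatch, rcond=None)
print(corr)                                   # recovered correction terms
```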
Justinas Lialys
Parametrically Resonant Surface Plasmon Polaritons
When & Where:
Eaton Hall, Room 2001B
Committee Members:
Alessandro Salandrino, Chair
Kenneth Demarest
Shima Fardad
Rongqing Hui
Xinmai Yang
Abstract
The surface electromagnetic waves that propagate along a metal-dielectric or a metal-air interface are called surface plasmon polaritons (SPPs). However, because the tangential wavevector component of an SPP is larger than what is permitted for a homogeneous plane wave in the dielectric medium, a phase-matching issue arises. In other words, the available spatial vector in the dielectric at a given frequency is smaller than what is required to excite an SPP. The most commonly known technique to bypass this problem is to use the Otto and Kretschmann configurations, in which a glass prism is used to increase the available spatial vector in the dielectric/air. Other methods are evanescent field directional coupling and optical gratings. Even with all these methods, it is still challenging to couple SPPs having a large propagation constant.
A novel way to efficiently inject power into SPPs is via temporal modulation of the dielectric adhered to the metal. The dielectric constant is modulated in time using an incident pump field. As a result of the induced changes in the dielectric constant, the spatial vector shortage is eliminated. In other words, there is enough spatial vector in the dielectric to excite SPPs. As SPPs are widely studied for numerous applications, this method gives a new way of evoking them and opens new possibilities in the study of surface plasmon polaritons. One of the applications that we discuss in detail is optical limiting.
Thomas Kramer
Time-Frequency Analysis of Waveform Diverse Designs
When & Where:
Nichols Hall, Room 317 (Richard K. Moore Conference Room)
Committee Members:
Shannon Blunt, Chair
Victor Frost
James Stiles
Abstract
Waveform diversity seeks to optimize the radar waveform given the constraints and objectives of a particular task or scenario. Recent advances in electronics have significantly expanded the design space of waveforms. The resulting waveforms of various waveform diverse approaches possess complex structures which have temporal, spectral, and spatial extents. The utilization of optimization in many of these approaches results in complex signal structures that are not imagined a priori, but are instead the product of algorithms. Traditional waveform analysis using the frequency spectrum, autocorrelation, and beampatterns of waveforms provides the majority of metrics of interest. But as these new waveforms' structures increase in complexity, and the constraints of their use tighten, further aspects of the waveforms' structure must be considered, especially the true occupancy of the waveforms in the transmission hyperspace. Time-frequency analysis can be applied to these waveforms to better understand their behavior and to inform future design. These tools are especially useful for spectrally shaped random FM waveforms as well as spatially shaped beams. Both linear and quadratic transforms are used to study the emissions in time, frequency, and space dimensions. Insight on waveform generation is observed and future design opportunities are identified.
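A minimal example of the quadratic analysis in action, applied to a linear FM chirp (sample rate and sweep parameters are illustrative):

```python
# Spectrogram of a linear FM chirp: its ridge traces the instantaneous frequency.
import numpy as np
from scipy.signal import chirp, spectrogram

fs = 1e6
t = np.arange(0, 1e-3, 1 / fs)
s = chirp(t, f0=0, f1=200e3, t1=t[-1], method="linear")

f, tt, Sxx = spectrogram(s, fs=fs, nperseg=128, noverlap=96)
ridge = f[np.argmax(Sxx, axis=0)]             # per-slice frequency estimate
print(ridge[:5], ridge[-5:])                  # sweeps from ~0 Hz toward 200 kHz
```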
Vincent Occhiogrosso
Development of Low-Cost Microwave and RF Modules for Compact, Fine-Resolution FMCW Radars
When & Where:
Nichols Hall, Room 317 (Richard K. Moore Conference Room)
Committee Members:
Christopher Allen, Chair
Fernando Rodriguez-Morales (Co-Chair)
Carl Leuschen
Abstract
The Center for Remote Sensing and Integrated Systems (CReSIS) has enabled the development of several radars for measuring ice and snow depth. One of these systems is the Ultra-Wideband (UWB) Snow Radar, which operates in the microwave range and can provide measurements with cm-scale vertical resolution. To date, renditions of this system have had medium-to-high size, weight, and power (SWaP) characteristics. To facilitate a more flexible and mobile measurement setup with these systems, it became necessary to reduce the SWaP of the radar electronics. This thesis focuses on the design of several compact RF and microwave modules enabling integration of a full UWB radar system weighing < 5 lbs and consuming < 30 W of DC power. This system is suitable for operation over either 12-18 GHz or 2-8 GHz in platforms with low SWaP requirements, such as unmanned aerial systems (UAS). The modules developed as a part of this work include a VCO-based chirp generation module, downconverter modules, and a set of modules for a receiver front end, each implemented on a low-cost laminate substrate. The chirp generator uses a Phase Locked Loop (PLL) based on an architecture previously developed at CReSIS and offers a small form factor with a frequency non-linearity of 0.0013% across the operating bandwidth (12-18 GHz) using sub-millisecond pulse durations. The down-conversion modules were created to allow for system operation in the S/C frequency band (2-8 GHz) as well as the default Ku band (12-18 GHz). Additionally, an RF receiver front end was designed, which includes a microwave receiver module for de-chirping and an IF module for signal conditioning before digitization. The compactness of the receiver modules enabled the demonstration of multi-channel data acquisition without multiplexing from two different aircraft. A radar test-bed largely based on this compact system was demonstrated in the laboratory and used as part of a dual-frequency instrument for a surface-based experiment in Antarctica. The laboratory performance of the miniaturized radar is comparable to the legacy 2-8 GHz snow radar and 12-18 GHz Ku-band radar systems. The 2-8 GHz system is currently being integrated into a class-I UAS.
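The de-chirping operation at the core of such FMCW receivers is easy to simulate; the sketch below uses illustrative parameters (a 6 GHz sweep matching the 12-18 GHz band, a free-space target) rather than the actual radar's values:

```python
# FMCW dechirp: mixing the echo with the transmit chirp yields a beat tone
# at f_b = 2*R*B/(c*T), so target range maps to frequency.
import numpy as np

c, B, T, fs = 3e8, 6e9, 240e-6, 125e6        # chirp bandwidth B over duration T
R = 150.0                                    # target range (m), in air
tau = 2 * R / c                              # round-trip delay

t = np.arange(0, T, 1 / fs)
k = B / T                                    # chirp rate (Hz/s)
tx = np.exp(1j * np.pi * k * t**2)
rx = np.exp(1j * np.pi * k * (t - tau)**2)

beat = tx * np.conj(rx)                      # de-chirped signal
spec = np.abs(np.fft.rfft(beat.real))
f_b = np.fft.rfftfreq(len(t), 1 / fs)[np.argmax(spec)]
print(f_b, 2 * R * B / (c * T))              # measured vs. predicted beat (Hz)
```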
Tianxiao Zhang
Efficient and Effective Convolutional Neural Networks for Object Detection and Recognition
When & Where:
Nichols Hall, Room 246
Committee Members:
Bo Luo, Chair
Prasad Kulkarni
Fengjun Li
Cuncong Zhong
Guanghui Wang
Abstract
With the development of Convolutional Neural Networks (CNNs), computer vision has entered a new era, and the performance of image classification, object detection, segmentation, and recognition has been significantly improved. Object detection, as one of the fundamental problems in computer vision, is a necessary component of many computer vision tasks, such as image and video understanding, object tracking, instance segmentation, etc. In object detection, we need to not only recognize all defined objects in images or videos but also localize them, making perfect performance difficult to achieve in real-world scenarios.
In this work, we aim to improve the performance of object detection and localization by adopting more efficient and effective CNN models. (1) We propose an effective and efficient approach for real-time detection and tracking of small golf balls based on object detection and the Kalman filter. For this purpose, we have collected and labeled thousands of golf ball images to train the learning model. We also implemented several classical object detection models and compared their performance in terms of detection precision and speed. (2) To address the domain shift problem in object detection, we propose to employ generative adversarial networks (GANs) to generate new images in different domains and then concatenate the original RGB images and their corresponding GAN-generated fake images to form a 6-channel representation of the image content. (3) We propose a strategy to improve label assignment in modern object detection models. The IoU (Intersection over Union) thresholds between the pre-defined anchors and the ground truth bounding boxes are significant to the definition of the positive and negative samples. Instead of using fixed thresholds or adaptive thresholds based on statistics, we introduced the predictions into the label assignment paradigm to dynamically define positive samples and negative samples so that more high-quality samples could be selected as positive samples. The strategy reduces the discrepancy between the classification scores and the IoU scores and yields more accurate bounding boxes.
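Contribution (3) can be illustrated with a small IoU-based assignment sketch; the mean-plus-std threshold below is an ATSS-style statistic standing in for the dissertation's prediction-aware rule, and the boxes are invented:

```python
# IoU between anchors and a ground-truth box, with a dynamic positive threshold.
import numpy as np

def iou(boxes, gt):
    """boxes: (N, 4) and gt: (4,), both in (x1, y1, x2, y2) format."""
    x1 = np.maximum(boxes[:, 0], gt[0]); y1 = np.maximum(boxes[:, 1], gt[1])
    x2 = np.minimum(boxes[:, 2], gt[2]); y2 = np.minimum(boxes[:, 3], gt[3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (area_b + area_g - inter)

anchors = np.array([[10, 10, 50, 50], [30, 30, 70, 70], [200, 200, 240, 240]])
gt = np.array([25, 25, 65, 65])

scores = iou(anchors, gt)
dynamic_thr = scores.mean() + scores.std()    # adapts to the score distribution
print(scores, "positives:", np.where(scores >= dynamic_thr)[0])
```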
Xiangyu Chen
Toward Data Efficient Learning in Computer Vision
When & Where:
Nichols Hall, Room 246
Committee Members:
Cuncong Zhong, Chair
Prasad Kulkarni
Fengjun Li
Bo Luo
Guanghui Wang
Abstract
Deep learning leads the performance in many areas of computer vision. Deep neural networks usually require a large amount of data to train a good model with the growing number of parameters. However, collecting and labeling a large dataset is not always realistic, e.g., to recognize rare diseases in the medical field. In addition, both collecting and labeling data are labor-intensive and time-consuming. In contrast, studies show that humans can recognize new categories from even a single example, which is apparently in the opposite direction of current machine learning algorithms. Thus, data-efficient learning, where the labeled data scale is relatively small, has attracted increased attention recently. According to the key components of machine learning algorithms, data-efficient learning algorithms can be divided into three categories: data-based, model-based, and optimization-based. In this study, we investigate two data-based approaches and one model-based approach.
First, we consider collecting more data to increase data quantity: the most direct way to enable data-efficient learning is to generate more data to mimic data-rich scenarios. To achieve this, we propose to integrate both spatial and Discrete Cosine Transformation (DCT) based frequency representations to finetune the classifier. In addition to quantity, another property of data is its quality to the model, which differs from its quality to human eyes. Since language carries denser information than natural images, to mimic language we propose to explicitly increase the input information density in the frequency domain. The goal of model-based methods in data-efficient learning is mainly to make models converge faster. After carefully examining the self-attention modules in Vision Transformers, we discover that trivial attention covers useful non-trivial attention due to its large amount. To solve this issue, we propose to divide attention weights into trivial and non-trivial ones by thresholds and suppress the accumulated trivial attention weights. Extensive experiments have been performed to demonstrate the effectiveness of the proposed models.
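The DCT-based frequency representation mentioned above can be sketched briefly; stacking the spatial image with its 2-D DCT as channels is one illustrative way to combine the two views, not necessarily the dissertation's exact construction:

```python
# Pair a spatial image with its 2-D DCT frequency representation.
import numpy as np
from scipy.fft import dct

def dct2(img):
    return dct(dct(img, axis=0, norm="ortho"), axis=1, norm="ortho")

img = np.random.rand(48, 48).astype(np.float32)   # stand-in grayscale image
freq = dct2(img)

combined = np.stack([img, freq])                  # (2, 48, 48): spatial + frequency
print(combined.shape)
```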