Defense Notices


All students and faculty are welcome to attend the final defense of EECS graduate students completing their M.S. or Ph.D. degrees. Defense notices for M.S./Ph.D. presentations for this year and several previous years are listed below in reverse chronological order.

Students who are nearing the completion of their M.S./Ph.D. research should schedule their final defenses through the EECS graduate office at least THREE WEEKS PRIOR to their presentation date so that there is time to complete the degree requirements check and post the presentation announcement online.

Upcoming Defense Notices

David Felton

Optimization and Evaluation of Physical Complementary Radar Waveforms

When & Where:


Nichols Hall, Room 129 (Apollo Auditorium)

Committee Members:

Shannon Blunt, Chair
Rachel Jarvis
Patrick McCormick
James Stiles
Zsolt Talata

Abstract

The RF spectrum is a precious, finite resource with ever-increasing demand. Consequently, the mandate to be a "good spectral neighbor" is in direct conflict with the requirements for high-performance sensing where correlation error is fundamentally limited. As such, matched-filter radar performance is often sidelobe-limited with estimation error being constrained by the time-bandwidth (TB) of the collective emission. The methods developed here seek to bridge this gap between idealized radar performance and practical utility via waveform design.    

Estimation error becomes more complex when employing pulse-agility. In doing so, range-sidelobe modulation (RSM) spreads energy across Doppler, rendering traditional methods ineffective. To address this, the gradient-based complementary-FM framework was developed to produce complementary sidelobe cancellation (CSC) after coherently combining subsets within a pulse-agile emission. In contrast to the majority of complementary signals, which are explored via phase coding, these Comp-FM waveform subsets achieve CSC while preserving hardware-compatibility since they are FM (though design distortion is never completely avoided). Although Comp-FM addressed practicality via hardware amenability, CSC was localized to zero-Doppler. This work expands the Comp-FM notion to a Doppler-generalized (DG) framework, extending the cancellation condition to an arbitrary span. The same framework can likewise be employed to jointly optimize an entire coherent processing interval (CPI) to minimize RSM within the radar point-spread-function (PSF), thereby generalizing the notion of complementarity and introducing the potential for cognitive operation if sufficient scattering knowledge is available a priori.

Sensing with a single emitter is limited by self-inflicted error alone (e.g., clutter, sidelobes), while MIMO systems must additionally contend with the cross-responses from emitters operating concurrently (e.g., simultaneously, spatially proximate, in a shared spectrum), further degrading radar sensitivity. Now, total correlation error is dictated by the overlapping TB (i.e., how coincident the signals are) and the number of operating emitters, compounding the estimation difficulty if left unaddressed. As such, the determination of "orthogonal waveforms" comprises a large portion of MIMO literature, though the term remains a phenomenological misnomer for pulsed emissions. Here, the notion of complementary-FM is applied to a multi-emitter context in which transmitter-amenable quasi-orthogonal subsets, occupying the same spectral band, are produced via a similar gradient-based approach. To further practicalize these MIMO-Comp-FM waveform subsets, the same "DG" approach described above, addressing the otherwise-default Doppler-induced degradation of complementary signals, is applied. In doing so, Doppler-independent separability and complementarity greatly improve estimation sensitivity for multi-emitter systems.

This MIMO-Comp-FM framework is developed for standard matched filter processing. Coupling this framework with a "DG" form of the previously explored MIMO-MiCRFt is also investigated, illustrating the added benefit of pairing optimized subsets with similarly calibrated processing. 

Each of these methods is developed to address unique and increasingly complex sources of estimation error. All approaches are initially developed and evaluated via simulated analysis where ground-truth is known. Then, despite hardware-induced distortion being unavoidable, the MIMO-Comp-FM framework is confirmed via loopback measurements to preserve the majority of CSC that was observed in simulation. Finally, open-air demonstration of each approach validates practical utility on a radar system.
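
As a rough illustration of the complementary sidelobe cancellation (CSC) idea mentioned in the abstract above, the sketch below uses a classical length-4 Golay phase-coded pair rather than the Comp-FM waveforms of the thesis, which are not specified here. The sequences `a` and `b` and the helper `autocorr` are illustrative assumptions, not the author's method. Each sequence alone has nonzero autocorrelation sidelobes, but the sidelobes cancel when the two autocorrelations are summed.

```python
def autocorr(seq):
    """Aperiodic autocorrelation of a real sequence for lags 0..len(seq)-1."""
    n = len(seq)
    return [sum(seq[i] * seq[i + k] for i in range(n - k)) for k in range(n)]


# A classical Golay complementary pair (binary phase codes), used here only
# as a stand-in for complementary waveform subsets.
a = [1, 1, 1, -1]
b = [1, 1, -1, 1]

ra = autocorr(a)  # sidelobes present at nonzero lags
rb = autocorr(b)  # sidelobes of opposite sign at the same lags

# Coherent combination: only the zero-lag mainlobe survives.
combined = [x + y for x, y in zip(ra, rb)]

print("autocorr(a):", ra)
print("autocorr(b):", rb)
print("combined:   ", combined)
```

This cancellation holds only at zero Doppler for such a pair, which is the limitation the Doppler-generalized framework in the abstract is described as addressing.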


Hao Xuan

Toward an Integrated Computational Framework for Metagenomics: From Sequence Alignment to Automated Knowledge Discovery

When & Where:


Nichols Hall, Room 246 (Executive Conference Room)

Committee Members:

Cuncong Zhong, Chair
Fengjun Li
Suzanne Shontz
Hongyang Sun
Liang Xu

Abstract

Metagenomic sequencing has become a central paradigm for studying complex microbial communities and their interactions with the host, with emerging applications in clinical prediction and disease modeling. In this work, we first investigate two representative application scenarios: predicting immune checkpoint inhibitor response in non-small cell lung cancer using gut microbial signatures, and characterizing host–microbiome interactions in neonatal systems. The proposed reference-free neural network captures both compositional and functional signals without reliance on reference genomes, while the neonatal study demonstrates how environmental and genetic factors reshape microbial communities and how probiotic intervention can mitigate pathogen-induced immune activation.

These studies highlight both the promise and the inherent difficulty of metagenomic analysis: transforming raw sequencing data into clinically actionable insights remains an algorithmically fragmented and computationally intensive process. This challenge arises from two key limitations: the lack of a unified algorithmic foundation for sequence alignment and the absence of systematic approaches for selecting and organizing analytical tools. Motivated by these challenges, we present a unified computational framework for metagenomic analysis that integrates complementary algorithmic and systems-level solutions.

First, to resolve fragmentation at the alignment level, we develop the Versatile Alignment Toolkit (VAT), a unified algorithmic system for biological sequence alignment across diverse applications. VAT introduces an asymmetric multi-view k-mer indexing scheme that integrates multiple seeding strategies within a single architecture and enables dynamic seed-length adjustment via longest common prefix (LCP)–based inference without re-indexing. A flexible seed-chaining mechanism further supports diverse alignment scenarios, including collinear, rearranged, and split alignments. Combined with a hardware-efficient in-register bitonic sorting algorithm and dynamic index-loading strategy, VAT achieves high efficiency and broad applicability across read mapping, homology search, and whole-genome alignment. Second, to address the challenge of tool selection and pipeline construction, we develop SNAIL, a natural language processing system for automated recognition of bioinformatics tools from large-scale and rapidly growing scientific literature. By integrating XGBoost and Transformer-based models such as SciBERT, SNAIL enables structured extraction of analytical tools and supports automated, reproducible pipeline construction.

Together, this work establishes a unified framework that is grounded in real-world applications and addresses key bottlenecks in metagenomic analysis, enabling more efficient, scalable, and clinically actionable workflows.


Pramil Paudel

Learning Without Seeing: Privacy-Preserving and Adversarial Perspectives in Lensless Imaging

When & Where:


Eaton Hall, Room 2001B

Committee Members:

Fengjun Li, Chair
Alex Bardas
Bo Luo
Cuncong Zhong
Haiyang Chao

Abstract

Conventional computer vision relies on spatially resolved, human-interpretable images, which inherently expose sensitive information and raise privacy concerns. In this study, we explore an alternative paradigm based on lensless imaging, where scenes are captured as diffraction patterns governed by the point spread function (PSF). Although unintelligible to humans, these measurements encode structured, distributed information that remains useful for computational inference. 

We propose a unified framework for privacy-preserving vision that operates directly on lensless sensor measurements by leveraging their frequency-domain and phase-encoded properties. The framework is developed along two complementary directions. First, we enable reconstruction-free inference by exploiting the intrinsic obfuscation of lensless data. We show that semantic tasks such as classification can be performed directly on diffraction patterns using models tailored to non-local, phase-scrambled representations. We further design lensless-aware architectures and integrate them into practical pipelines, including a Swin Transformer-based steganographic framework (DiffHide) for secure and imperceptible information embedding. To assess robustness, we formalize adversarial threat models and develop defenses against learning-based reconstruction attacks, particularly GAN-driven inversion. Second, we investigate the limits of privacy by studying the reconstructability of lensless measurements without explicit knowledge of the forward model. We develop learning-based reconstruction methods that approximate the inverse mapping and analyze conditions under which sensitive information can be recovered. Our results demonstrate that lensless measurements enable effective vision tasks without reconstruction, while providing a principled framework to evaluate and mitigate privacy risks. 


Past Defense Notices


FARHAD MAHMOOD

Modeling and Analysis of Energy Efficiency in Wireless Handset Transceiver Systems

When & Where:


250 Nichols Hall

Committee Members:

Erik Perrins, Chair
Lingjia Liu
Shannon Blunt
Victor Frost
Bozenna Pasik-Duncan

Abstract

As wireless mobile handsets have become a significant part of daily life, they have also become faster and smarter. One of the main remaining user requirements is a longer-lasting wireless cellular device. Many techniques have been used to increase battery capacity (measured in ampere-hours), but doing so raises safety concerns. Instead, it is better to have mobile handsets that consume less energy, i.e., that are more energy efficient. Therefore, in this research proposal, we study and analyze the energy consumption of the radio frequency (RF) transceiver, which is the largest energy consumer in a cellular device. We consider a model with a large number of parameters in order to make it more realistic. First, the transmitter energy of a single-antenna device is considered for a fixed target probability of error at the receiver for M-ary quadrature amplitude modulation (MQAM). We show that the power amplifier (PA) consumes the largest portion of the transceiver energy due to the low efficiency of the PA. Furthermore, when MQAM and a raised cosine filter are used, the impact of the peak-to-average ratio (PAR) on the PA becomes another source of wasted energy in the PA. This issue is analyzed in this research proposal along with a number of promising solutions. This analysis of energy consumption for single-antenna devices will help us analyze the energy consumption of multiple-antenna devices. In this regard, we discuss the energy efficiency of multiple-input multiple-output (MIMO) antenna systems with known channel state information (CSI) at the transmitter. The study of the energy efficiency of MIMO without CSI using space-time coding will be our next step.


THEODORE LINDSEY

Interesting Rule Induction Module: Adding Support for Unknown Attribute Values

When & Where:


2001B Eaton Hall

Committee Members:

Jerzy Grzymala-Busse, Chair
Bo Luo
Prasad Kulkarni


Abstract

IRIM (Interesting Rule Induction Module) is a rule induction system designed to induce particularly strong, simple rule sets. Additionally, IRIM does not require prior discretization of numerical attribute values. IRIM does not necessarily produce consistent rules that fully describe the target concepts; however, the rules induced by IRIM often lead to novel revelations of hidden relationships in a dataset. In this paper, we attempt to extend the IRIM system to handle missing attribute values (in particular, lost and do-not-care attribute values) more thoroughly than by ignoring the cases to which they belong. Further, we include an implementation of IRIM in the modern programming language Python that has been written for easy inclusion within a Python data mining package or library. The provided implementation makes use of the Pandas module, which is built on top of a C back end for quick performance relative to the performance normally found with Python.


Sathya Mahadevan

Implementation of ID3 for Data Stored in Multiple SQL Databases

When & Where:


2001B Eaton Hall

Committee Members:

Jerzy Grzymala-Busse, Chair
Man Kong
Prasad Kulkarni


Abstract

Data classification is a data mining methodology used to retrieve meaningful information from data. A model is built from the input training set and is later used to classify new observations. One of the most widely used models is a decision tree, which uses a tree-like structure to list all possible outcomes. Decision trees are preferred for their simple structure, the little effort they require for data preparation, and their easy interpretation. This project implements ID3, an algorithm for building a decision tree using information gain. The decision tree is converted to a set of rules, and the error rate is calculated using the test dataset. The dataset is usually stored in a relational database in the form of tables. In practice, it might be desired that data be stored across multiple databases. In such scenarios, retrieving and coordinating data from the databases can be a challenging task. This project provides an implementation of the ID3 algorithm with the convenience of reading data stored at multiple data sources.
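
The attribute-selection step of ID3 described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the `weather` table, its columns, and the helpers `make_db`, `load_rows`, `entropy`, `information_gain`, and `best_attribute` are hypothetical, and two in-memory SQLite databases stand in for the multiple SQL data sources.

```python
import sqlite3
from collections import Counter
from math import log2


def make_db(records):
    """Create an in-memory SQLite database holding a hypothetical 'weather' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE weather (outlook TEXT, windy TEXT, play TEXT)")
    conn.executemany("INSERT INTO weather VALUES (?, ?, ?)", records)
    return conn


def load_rows(connections, query):
    """Run the same query against every database and pool the rows as dicts."""
    rows = []
    for conn in connections:
        conn.row_factory = sqlite3.Row
        rows.extend(dict(r) for r in conn.execute(query))
    return rows


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus its expected entropy after splitting on attribute."""
    total = len(rows)
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder


def best_attribute(rows, attributes, target):
    """ID3 picks the attribute with the highest information gain at each node."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Training data split across two separate databases.
db1 = make_db([("sunny", "no", "no"), ("sunny", "yes", "no")])
db2 = make_db([("rainy", "no", "yes"), ("rainy", "yes", "yes")])

rows = load_rows([db1, db2], "SELECT outlook, windy, play FROM weather")
root = best_attribute(rows, ["outlook", "windy"], "play")
print("root attribute:", root)
```

A full ID3 implementation would apply `best_attribute` recursively to each subset until the labels are pure or no attributes remain.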


CHAO LAN

Inequity Coefficient and Fair Transfer Learning

When & Where:


250 Nichols Hall

Committee Members:

Luke Huan, Chair
Lingjia Liu
Bo Luo
Xintao Wu
Jin Feng

Abstract

Fair machine learning is an emerging and urgent research topic that aims to avoid discriminatory predictions against protected groups of people in real-world decision making. This project aims to advance the field in two dimensions. First, we propose a more practical measure of individual fairness called the inequity coefficient, which integrates the current individual fairness framework, which lacks practice, with the current situation testing practice, which lacks principle. We develop certain foundations of the measure and present its practice. Second, we propose a first study of fairness in the context of transfer learning, with a focus on the hypothesis transfer and multi-task settings over two tasks. We illustrate a new challenge called discriminatory transfer, where discrimination is enforced by traditional task relatedness constraints that only aim to find accurate hypotheses. We propose a set of new algorithms that aim to avoid discriminatory transfer across tasks or promote fairness within each task.


ROHIT BANERJEE

Extraction and Analysis of Amazon Reviews

When & Where:


246 Nichols Hall

Committee Members:

Fengjun Li, Chair
Man Kong
Bo Luo


Abstract

Amazon.com is one of the largest online retail stores in the world. Besides selling millions of products on its website, Amazon provides a variety of Web services, including the Amazon Review and Recommendation System. Users are encouraged to write product reviews to help others understand products’ features and make purchase decisions. However, product reviews, as a type of user generated content (UGC), suffer from quality and trust problems. To help evaluate the quality of reviews, Amazon also provides users with the helpfulness vote feature so that a user can support a review they consider helpful. In this project we aim to study the relation between helpfulness votes and the ranks of the reviews. In particular, we are looking for answers to questions such as “how do helpfulness votes affect review ranks?” and “how do review rank and its presentation mechanism affect people’s voting behavior?” To investigate these questions, we built a crawler to collect reviews and their votes from Amazon on a daily basis. Then, we conducted an analysis on a dataset with over 50,000 Amazon reviews to identify the voting patterns and their impact on the review ranks. Our results show that there exists a positive correlation between the review ranks and the helpfulness votes.


BIJAL PARIKH

A Comparison of Tolerance Relation and Valued Tolerance Relation for Incomplete Datasets

When & Where:


2001B Eaton Hall

Committee Members:

Jerzy Grzymala-Busse, Chair
Prasad Kulkarni
Bo Luo


Abstract

Rough set theory is a popular approach for decision rule induction. However, it requires the objects in the information system to be completely described. Many real-life data sets are incomplete, so rough set theory cannot be applied directly for rule induction. This project implements and compares two generalizations of rough set theory used for rule induction from incomplete data: Tolerance Relation and Valued Tolerance Relation. A comparative analysis is conducted for the lower and upper approximations and decision rules induced by the two methods. Our experiments show that Valued Tolerance Relation provides better approximations than the simple Tolerance Relation when the percentage of missing attribute values in the datasets is high.
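
The lower and upper approximations mentioned above can be sketched for the simple tolerance relation, under the common assumption that two objects are tolerant when every attribute either matches or is missing in at least one of them. The data, the `MISSING` marker, and the function names below are hypothetical and are not taken from the project; the valued tolerance relation is not shown.

```python
MISSING = "*"  # hypothetical marker for a missing attribute value


def tolerant(x, y):
    """True if x and y agree on every attribute where both values are known."""
    return all(a == b or a == MISSING or b == MISSING for a, b in zip(x, y))


def tolerance_class(objects, i):
    """Indices of all objects tolerant with object i (always includes i itself)."""
    return {j for j in range(len(objects)) if tolerant(objects[i], objects[j])}


def approximations(objects, concept):
    """Lower and upper approximations of a concept (a set of object indices)."""
    lower, upper = set(), set()
    for i in range(len(objects)):
        cls = tolerance_class(objects, i)
        if cls <= concept:   # every tolerant object belongs to the concept
            lower.add(i)
        if cls & concept:    # at least one tolerant object belongs to it
            upper.add(i)
    return lower, upper


# Four objects described by two attributes, with some values missing.
objects = [
    ("high", "yes"),
    ("high", MISSING),
    ("low", "no"),
    (MISSING, "no"),
]
concept = {0, 1}  # objects assumed to share the same decision value

lower, upper = approximations(objects, concept)
print("lower:", sorted(lower))
print("upper:", sorted(upper))
```

Objects in the lower approximation certainly belong to the concept, while objects in the upper approximation possibly belong to it; more missing values enlarge the tolerance classes and widen the gap between the two.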

