Defense Notices


All students and faculty are welcome to attend the final defense of EECS graduate students completing their M.S. or Ph.D. degrees. Defense notices for M.S./Ph.D. presentations for this year and several previous years are listed below in reverse chronological order.

Students who are nearing the completion of their M.S./Ph.D. research should schedule their final defenses through the EECS graduate office at least THREE WEEKS PRIOR to their presentation date so that there is time to complete the degree requirements check and post the presentation announcement online.

Upcoming Defense Notices

Soumya Baddham

Battling Toxicity: A Comparative Analysis of Machine Learning Models for Content Moderation

When & Where:


Eaton Hall, Room 2001B

Committee Members:

David Johnson, Chair
Prasad Kulkarni
Hongyang Sun


Abstract

With the exponential growth of user-generated content, online platforms face unprecedented challenges in moderating toxic and harmful comments. As a result, automated content moderation has emerged as a critical application of machine learning, enabling platforms to ensure user safety and maintain community standards. Despite its importance, challenges such as severe class imbalance, contextual ambiguity, and the diverse nature of toxic language often compromise moderation accuracy, leading to biased classification performance.

This project presents a comparative analysis of machine learning approaches for a Multi-Label Toxic Comment Classification System using the Toxic Comment Classification dataset from Kaggle. The study examines the performance of traditional algorithms, such as Logistic Regression, Random Forest, and XGBoost, alongside deep learning architectures, including Bi-LSTM, CNN-Bi-LSTM, and DistilBERT. The proposed approach utilizes word-level embeddings across all models and examines the effects of architectural enhancements, hyperparameter optimization, and advanced training strategies on model robustness and predictive accuracy.

The study emphasizes the significance of loss function optimization and threshold adjustment strategies in improving the detection of minority classes. The comparative results reveal distinct performance trade-offs across model architectures: the transformer model achieves superior contextual understanding at the cost of computational complexity, while the LSTM-based deep learning models offer efficiency advantages. These findings establish evidence-based guidelines for model selection in real-world content moderation systems, striking a balance between accuracy requirements and operational constraints.
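
For illustration only, the sketch below shows one common way to combine class weighting with per-label threshold adjustment in a multi-label setup; it is not the project's code, and the synthetic dataset, the one-vs-rest logistic regression baseline, and the 0.05–0.95 threshold grid are assumptions.

```python
# Minimal sketch (not the author's code): per-label probability thresholds for a
# multi-label classifier, tuned to improve detection of minority labels.
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import f1_score

X, Y = make_multilabel_classification(n_samples=2000, n_classes=6, random_state=0)
X_tr, X_val, Y_tr, Y_val = train_test_split(X, Y, test_size=0.25, random_state=0)

# One binary logistic-regression head per label; class_weight="balanced" is one
# simple way to counter the per-label class imbalance discussed above.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000, class_weight="balanced"))
clf.fit(X_tr, Y_tr)
probs = clf.predict_proba(X_val)

# Pick, for each label, the decision threshold that maximizes validation F1
# instead of using a fixed 0.5 cutoff.
thresholds = []
for j in range(Y_val.shape[1]):
    grid = np.linspace(0.05, 0.95, 19)
    f1s = [f1_score(Y_val[:, j], probs[:, j] >= t, zero_division=0) for t in grid]
    thresholds.append(grid[int(np.argmax(f1s))])
print("per-label thresholds:", np.round(thresholds, 2))
```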


Manu Chaudhary

Utilizing Quantum Computing for Solving Multidimensional Partial Differential Equations

When & Where:


Eaton Hall, Room 2001B

Committee Members:

Esam El-Araby, Chair
Perry Alexander
Tamzidul Hoque
Prasad Kulkarni
Tyrone Duncan

Abstract

Quantum computing has the potential to revolutionize computational problem-solving by leveraging the quantum mechanical phenomena of superposition and entanglement, which allow a large amount of information to be processed simultaneously. This capability is significant for the numerical solution of complex and/or multidimensional partial differential equations (PDEs), which are fundamental to modeling various physical phenomena. Many quantum techniques are currently available for solving PDEs, most of them based on variational quantum circuits. However, the existing quantum PDE solvers, particularly those based on variational quantum eigensolver (VQE) techniques, suffer from several limitations. These include low accuracy, high execution times, and low scalability on quantum simulators as well as on noisy intermediate-scale quantum (NISQ) devices, especially for multidimensional PDEs.

In this work, we propose an efficient and scalable algorithm for solving multidimensional PDEs. We present two variants of our algorithm: the first leverages the finite-difference method (FDM), classical-to-quantum (C2Q) encoding, and numerical instantiation, while the second employs FDM, C2Q, and column-by-column decomposition (CCD). Both variants are designed to enhance accuracy and scalability while reducing execution times. We have validated and evaluated our proposed concepts using a number of case studies, including the multidimensional Poisson equation, the multidimensional heat equation, the Black-Scholes equation, and the Navier-Stokes equations for computational fluid dynamics (CFD), achieving promising results. Our results demonstrate higher accuracy, higher scalability, and faster execution times compared to VQE-based solvers on noise-free and noisy quantum simulators from IBM. Additionally, we validated our approach on hardware emulators and actual quantum hardware, employing noise mitigation techniques. This work establishes a practical and effective approach for solving PDEs using quantum computing for engineering and scientific applications.
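
Both variants start from a finite-difference discretization of the target PDE. As a point of reference (standard textbook FDM, not reproduced from the thesis), the second-order central-difference discretization of the 2D Poisson equation on a uniform grid with spacing h is:

```latex
% Standard second-order central-difference discretization of the 2D Poisson
% equation -\nabla^2 u = f (textbook FDM, shown only to illustrate the linear
% system an FDM-based solver encodes).
\begin{equation}
  -\frac{u_{i+1,j} - 2u_{i,j} + u_{i-1,j}}{h^2}
  -\frac{u_{i,j+1} - 2u_{i,j} + u_{i,j-1}}{h^2} = f_{i,j},
\end{equation}
% which, over all interior grid points, yields a sparse linear system A\,u = f
% whose solution vector can then be encoded into quantum amplitudes.
```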


Alex Manley

Taming Complexity in Computer Architecture through Modern AI-Assisted Design and Education

When & Where:


Nichols Hall, Room 250 (Gemini Room)

Committee Members:

Heechul Yun, Chair
Tamzidul Hoque
Prasad Kulkarni
Mohammad Alian

Abstract

The escalating complexity inherent in modern computer architecture presents significant challenges for both professional hardware designers and students striving to gain foundational understanding. Historically, the steady improvement of computer systems was driven by transistor scaling, predictable performance increases, and relatively straightforward architectural paradigms. However, with the end of traditional scaling laws and the rise of heterogeneous and parallel architectures, designers now face unprecedented intricacies involving power management, thermal constraints, security considerations, and sophisticated software interactions. Prior tools and methodologies, often reliant on complex, command-line driven simulations, exacerbate these challenges by introducing steep learning curves, creating a critical need for more intuitive, accessible, and efficient solutions. To address these challenges, this thesis introduces two innovative, modern tools.

The first tool, SimScholar, provides an intuitive graphical user interface (GUI) built upon the widely-used gem5 simulator. SimScholar significantly simplifies the simulation process, enabling students and educators to more effectively engage with architectural concepts through a visually guided environment, both reducing complexity and enhancing conceptual understanding. Supporting SimScholar, the gem5 Extended Modules API (gEMA) offers streamlined backend integration with gem5, ensuring efficient communication, modularity, and maintainability.

The second contribution, gem5 Co-Pilot, delivers an advanced framework for architectural design space exploration (DSE). Co-Pilot integrates cycle-accurate simulation via gem5, detailed power and area modeling through McPAT, and intelligent optimization assisted by a large language model (LLM). Central to Co-Pilot is the Design Space Declarative Language (DSDL), a Python-based domain-specific language that facilitates structured, clear specification of design parameters and constraints.
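
Purely as an illustration of what a Python-embedded design-space declaration could look like, a hypothetical sketch follows; it is not the actual DSDL syntax from the thesis, and the DesignSpace class, parameter names, and area-budget constraint are invented for this example.

```python
# Hypothetical sketch of a Python-embedded design-space declaration; NOT the
# thesis's DSDL. Class, parameters, and constraint are illustrative only.
from itertools import product

class DesignSpace:
    def __init__(self):
        self.params, self.constraints = {}, []

    def param(self, name, values):
        self.params[name] = list(values)

    def constraint(self, fn):
        self.constraints.append(fn)

    def points(self):
        # Enumerate all parameter combinations that satisfy every constraint.
        names = list(self.params)
        for combo in product(*self.params.values()):
            point = dict(zip(names, combo))
            if all(c(point) for c in self.constraints):
                yield point

space = DesignSpace()
space.param("l2_size_kb", [256, 512, 1024])
space.param("cores", [2, 4, 8])
space.constraint(lambda p: p["l2_size_kb"] * p["cores"] <= 4096)  # hypothetical budget
for point in space.points():
    print(point)  # each point would be handed to gem5/McPAT for evaluation
```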

Collectively, these tools constitute a comprehensive approach to taming complexity in computer architecture, offering powerful, user-friendly solutions tailored to both educational and professional settings.


Past Defense Notices


CHENYUAN ZHAO

Energy Efficient Spike-Time-Dependent Encoder Design for Neuromorphic Computing System

When & Where:


250 Nichols Hall

Committee Members:

Yang Yi, Chair
Lingjia Liu
Luke Huan
Suzanne Shontz
Yong Zeng

Abstract

The von Neumann bottleneck, which refers to the limited throughput between the CPU and memory, has become a major factor hindering technical advances in computing systems. In recent years, neuromorphic systems have started to gain increasing attention as compact and energy-efficient computing platforms. As one of the most crucial components of a neuromorphic computing system, the neural encoder transforms the stimulus (input signals) into spike trains. In this report, I will present my research on spike-time-dependent encoding schemes and the design of energy-efficient encoders based on them. A performance comparison among rate encoding, latency encoding, and temporal encoding will be discussed. The proposed neural temporal encoder allows efficient mapping of signal amplitude information into a spike time sequence that represents the input data and offers perfect recovery for band-limited stimuli. The simulation and measurement results show that the proposed temporal encoder is robust and error-tolerant.
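
To illustrate the general idea of spike-time encoding (not the proposed encoder itself), the sketch below uses one common latency-encoding rule in which larger amplitudes fire earlier; the linear mapping and the t_max value are assumptions.

```python
# Minimal sketch of a generic latency (spike-time) encoding rule; it is NOT the
# encoder proposed in this work. The linear mapping and t_max are assumptions.
import numpy as np

def latency_encode(x, t_max=10e-3):
    """Map amplitudes in [0, 1] to spike times in [0, t_max]:
    larger amplitudes fire earlier."""
    x = np.clip(np.asarray(x, dtype=float), 0.0, 1.0)
    return t_max * (1.0 - x)

stimulus = np.array([0.05, 0.4, 0.9])
print(latency_encode(stimulus))  # strongest input spikes first
```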


XIAOLI LI

Constructivism Learning: A Learning Paradigm for Transparent and Reliable Predictive Analytics

When & Where:


246 Nichols Hall

Committee Members:

Luke Huan, Chair
Victor Frost
Jerzy Grzymala-Busse
Bo Luo
Alfred Tat-Kei Ho

Abstract

With the increasing adoption of machine learning in various real-world problems, the need for transparent and reliable models has become apparent. Especially in socially consequential applications, such as medical diagnosis, credit scoring, and decision making in educational systems, it is problematic if humans cannot understand and trust those models. To this end, in this work we propose a novel machine learning algorithm, constructivism learning. To achieve transparency, we formalize a Bayesian nonparametric approach using a sequential Dirichlet process mixture of prediction models to support constructivism learning. To achieve reliability, we exploit two strategies, reducing model uncertainty and increasing task construction stability, by leveraging techniques from active learning and self-paced learning.
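
For background, the textbook form of a Dirichlet process mixture is sketched below; the sequential DP mixture of prediction models used in this work differs in its details, so the block is context only.

```latex
% Textbook Dirichlet process mixture (background only; the thesis's sequential
% DP mixture of prediction models differs in detail):
\begin{align}
  G &\sim \mathrm{DP}(\alpha, G_0), \\
  \theta_i \mid G &\sim G, \\
  y_i \mid x_i, \theta_i &\sim F(y_i \mid x_i, \theta_i),
\end{align}
% where each \theta_i indexes a prediction model; the DP prior lets the number
% of distinct models (tasks) grow with the data rather than being fixed a priori.
```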


JOSEPH ST. AMAND

Local Metric Learning

When & Where:


250 Nichols Hall

Committee Members:

Luke Huan, Chair
Prasad Kulkarni
Jim Miller
Richard Wang
Bozenna Pasik-Duncan

Abstract

Distance metrics capture how similar objects are, and they are a critical component of many machine learning algorithms such as k-nearest neighbors and kernel machines. Traditional metrics are unable to adapt to data with heterogeneous interactions in the feature space. State-of-the-art methods consider learning multiple metrics, each in some way local to a portion of the data. Selecting how the distance metrics are local to the data is done a priori, with no known best approach.
In this proposal, we address the local metric learning scenario from three complementary perspectives. In the first direction, we consider a spatial approach and develop an efficient Frank-Wolfe based technique to learn local distance metrics directly in a high-dimensional input space. We then consider a view-local perspective, where we associate each metric with a separate view of the data, and show how the approach naturally evolves into a multiple kernel learning problem. Finally, we propose a new function for learning a metric based on a newly discovered operator called the t-product; here we show that our metric is composed of multiple parts, with each portion local to different interactions in the input space.
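
For context, the standard Mahalanobis form learned in metric learning is given below, together with one illustrative way multiple local metrics can be combined; the second expression is a generic local formulation, not the proposal's exact construction.

```latex
% Standard (global) Mahalanobis metric, shown for context; local methods
% instead learn one such matrix per region or view:
\begin{equation}
  d_{M}(x, y) = \sqrt{(x - y)^{\top} M (x - y)}, \qquad M \succeq 0.
\end{equation}
% A typical local formulation (illustrative, not the proposal's exact form)
% assigns region- or view-specific matrices M_k and weights w_k(x, y):
\begin{equation}
  d_{\mathrm{local}}^{2}(x, y) = \sum_{k} w_{k}(x, y)\,(x - y)^{\top} M_{k} (x - y).
\end{equation}
```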


MARK GREBE

Domain Specific Languages for Small Embedded Systems

When & Where:


246 Nichols Hall

Committee Members:

Andy Gill, Chair
Perry Alexander
Prasad Kulkarni
Suzanne Shontz
Kyle Camarda

Abstract

Resource-limited embedded systems present a great challenge to programming with functional languages. Although we cannot program these embedded systems directly with Haskell, we show that an embedded domain-specific language can be used to program them, providing a user-friendly environment for both prototyping and full development. The Arduino line of microcontroller boards provides a versatile, low-cost, and popular platform for development of these resource-limited systems, and we use it as the platform for our DSL research.

First, we provide a shallowly embedded domain-specific language and a firmware interpreter, allowing the user to program the Arduino while tethered to a host computer. Second, we add a deeply embedded version, allowing the interpreter to run standalone from the host computer, as well as allowing us to compile the code to C and then to machine code for efficient operation. Finally, we develop a method of transforming the shallowly embedded DSL syntax into the deeply embedded DSL syntax automatically.


RUBAYET SHAFIN

Performance Analysis of Parametric Channel Estimation for 3D Massive MIMO/FD-MIMO OFDM Systems

When & Where:


250 Nichols Hall

Committee Members:

Lingjia Liu, Chair
Erik Perrins
Yang Yi


Abstract

With the promise of meeting future capacity demands for mobile broadband communications, 3D massive-MIMO/full-dimension MIMO (FD-MIMO) systems have gained much interest among researchers in recent years. Apart from the huge spectral efficiency gain offered by the system, the reason for this great interest can also be attributed to the significant reduction of latency, a simplified multiple access layer, and robustness to interference. However, in order to completely extract the benefits of massive-MIMO systems, accurate channel state information is critical. In this thesis, a channel estimation method based on direction of arrival (DoA) estimation is presented for massive-MIMO OFDM systems. To be specific, the DoA is estimated using the Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT) method, and the root mean square error (RMSE) of the DoA estimation is analytically characterized for the corresponding MIMO-OFDM system.
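
For background, ESPRIT-style DoA estimation for a uniform linear array rests on the standard phase-progression relation shown below; this is textbook material, not taken from the thesis, and the spacing d and wavelength λ are generic symbols.

```latex
% Standard relation for a uniform linear array with element spacing d and
% wavelength \lambda (background only): the phase progression between adjacent
% elements for a source at angle \theta is
\begin{equation}
  \phi = \frac{2\pi d}{\lambda}\sin\theta,
\end{equation}
% so once ESPRIT recovers \hat{\phi} from the eigenvalues of its rotation
% operator, the direction of arrival follows as
\begin{equation}
  \hat{\theta} = \arcsin\!\left(\frac{\lambda\,\hat{\phi}}{2\pi d}\right).
\end{equation}
```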


DANIEL HEIN

A New Approach for Predicting Security Vulnerability Severity in Attack Prone Software Using Architecture and Repository Mined Change Metrics

When & Where:


1 Eaton Hall

Committee Members:

Hossein Saiedian, Chair
Arvin Agah
Perry Alexander
Prasad Kulkarni
Nancy Mead

Abstract

Billions of dollars are lost every year to successful cyber attacks that are fundamentally enabled by software vulnerabilities. Modern cyber attacks increasingly threaten individuals, organizations, and governments, causing service disruption, inconvenience, and costly incident response. Given that such attacks are primarily enabled by software vulnerabilities, this work examines the efficacy of using change metrics, along with architectural burst and maintainability metrics, to predict modules and files that should be analyzed or tested further to excise vulnerabilities prior to release. 

The problem addressed by this research is the residual vulnerability problem, or vulnerabilities that evade detection and persist in released software. Many modern software projects are over a million lines of code, and composed of reused components of varying maturity. The sheer size of modern software, along with the reuse of existing open source modules, complicates the questions of where to look, and in what order to look, for residual vulnerabilities. 

Traditional code complexity metrics, along with newer frequency based churn metrics (mined from software repository change history), are selected specifically for their relevance to the residual vulnerability problem. We compare the performance of these complexity and churn metrics to architectural level change burst metrics, automatically mined from the git repositories of the Mozilla Firefox Web Browser, Apache HTTP Web Server, and the MySQL Database Server, for the purpose of predicting attack prone files and modules. 

We offer new empirical data quantifying the relationship between our selected metrics and the severity of vulnerable files and modules, assessed using severity data compiled from the NIST National Vulnerability Database and cross-referenced to our study subjects using unique identifiers defined by the Common Vulnerabilities and Exposures (CVE) vulnerability catalog. Our results show that architectural level change burst metrics can perform well in situations where more traditional complexity metrics fail as reliable estimators of vulnerability severity. In particular, results from our experiments on the Apache HTTP Web Server indicate that architectural level change burst metrics show high correlation with the severity of known vulnerable modules, and do so with information directly available from the version control repository change-set (i.e., commit) history. 
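
As an illustration of the kind of repository-mined change metric involved (not the dissertation's actual tooling), the sketch below computes per-file commit counts and line churn from `git log --numstat`; the metric names and the top-10 report are assumptions.

```python
# Illustrative sketch of mining simple change metrics from a git repository:
# per-file change counts and line churn from `git log --numstat`.
import subprocess
from collections import defaultdict

def churn_metrics(repo_path="."):
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--numstat", "--format="],
        capture_output=True, text=True, check=True,
    ).stdout
    commits = defaultdict(int)   # number of change-sets touching each file
    churn = defaultdict(int)     # total lines added + deleted per file
    for line in out.splitlines():
        parts = line.split("\t")
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            added, deleted, path = int(parts[0]), int(parts[1]), parts[2]
            commits[path] += 1
            churn[path] += added + deleted
    return commits, churn

if __name__ == "__main__":
    commits, churn = churn_metrics()
    for path in sorted(churn, key=churn.get, reverse=True)[:10]:
        print(f"{path}: {commits[path]} changes, {churn[path]} lines churned")
```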


CHENG GAO

Mining Incomplete Numerical Data Sets

When & Where:


2001B Eaton Hall

Committee Members:

Jerzy Grzymala-Busse, Chair
Arvin Agah
Bo Luo
Tyrone Duncan
Xuemin Tu

Abstract

Incomplete and numerical data are common in many application domains. There have been many approaches to handling missing data in statistical analysis and data mining. To deal with numerical data, discretization is crucial for many machine learning algorithms. However, most discretization algorithms cannot be applied to incomplete data sets. 

Multiple Scanning is an entropy-based discretization method. Previous research has shown that it outperforms the commonly used Equal Width and Equal Frequency discretization methods. In this work, Multiple Scanning is tested with C4.5 and MLEM2 on incomplete data sets. Results show that for some data sets the setup utilizing Multiple Scanning as preprocessing performs better, while for other data sets C4.5 or MLEM2 should be used by themselves. Our conclusion is that there is no universal optimal solution for all data sets; the setup should be custom-made for each data set. 
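
For reference, the two baseline discretization schemes mentioned above are sketched below using NumPy; the 3-bin setting and the sample values are illustrative assumptions.

```python
# Minimal sketch of the Equal Width and Equal Frequency baselines.
import numpy as np

def equal_width(values, bins=3):
    # Cut the range [min, max] into `bins` intervals of equal width.
    edges = np.linspace(np.min(values), np.max(values), bins + 1)
    return np.digitize(values, edges[1:-1])

def equal_frequency(values, bins=3):
    # Choose cut points so each interval holds roughly the same number of values.
    edges = np.quantile(values, np.linspace(0, 1, bins + 1))
    return np.digitize(values, edges[1:-1])

x = np.array([1.0, 1.2, 1.4, 2.0, 5.0, 9.5, 9.8, 10.0])
print(equal_width(x))      # bins determined by the value range
print(equal_frequency(x))  # bins determined by the value quantiles
```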


SUMIAH ALALWANI

Experiments on Incomplete Data Sets Using Modifications to Characteristic Relation

When & Where:


2001B Eaton Hall

Committee Members:

Jerzy Grzymala-Busse, Chair
Prasad Kulkarni
Bo Luo


Abstract

Rough set theory is a useful approach to decision rule induction and is applied to real-life data sets. Lower and upper approximations of concepts are used to induce rules from incomplete data sets. In our research we study the validity of modifications suggested to the characteristic relation. We discuss the implementation of modifications to the characteristic relation and the local definability of each modified set. We show that none of the suggested modified sets are locally definable, except for maximal consistent blocks restricted to data sets with "do not care" conditions. A comparative analysis was conducted for characteristic sets and the modifications in terms of the cardinality of the lower and upper approximations of each concept and the decision rules induced by each modification. In this thesis, experiments were conducted on four incomplete data sets with lost and "do not care" conditions. The LEM2 algorithm was implemented to induce certain and possible rules from the incomplete data sets. To measure the average classification error rate of the induced rules, ten-fold cross validation was implemented. Our results show that there is no significant difference between the quality of the rules induced by each modification.
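
For background, the textbook rough-set lower and upper approximations from which certain and possible rules are induced are given below; the thesis works with (modified) characteristic sets in place of the indiscernibility classes, so this block is context only.

```latex
% Standard rough-set approximations of a concept X (textbook definitions):
\begin{align}
  \underline{A}X &= \{\, x \mid [x]_{A} \subseteq X \,\}, \\
  \overline{A}X  &= \{\, x \mid [x]_{A} \cap X \neq \emptyset \,\},
\end{align}
% where [x]_A is the set of objects indistinguishable from x on attributes A;
% certain rules are induced from the lower approximation and possible rules
% from the upper approximation.
```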


DANIEL GOMEZ GARCIA ALVESTEGUI

Ultra-Wideband Radar for High-Throughput-Phenotyping of Wheat Canopies

When & Where:


250 Nichols Hall

Committee Members:

Carl Leuschen, Chair
Chris Allen
Ron Hui
Fernando Rodriguez-Morales
David Braaten

Abstract

Increasing the rate of crop yield improvement is an important issue for meeting projected future crop production demands. Breeding efforts are being made to rapidly improve crop yields and make crops more stress-resistant. Accelerated molecular breeding techniques, in which desirable plant physical traits are selected based on genetic markers, rely on accurate and rapid methods to link plant genotypes and phenotypes. Advances in next-generation DNA sequencing have made genotyping a fast and efficient process. In contrast, methods for characterizing physical traits remain inefficient. 
The height of the wheat crop is an important trait, as it may be related to yield and biomass. It is also an indicator of plant growth stage. Recent high-throughput-phenotyping experiments have used sensing techniques based on ultrasonic sonar and cameras to measure canopy height. The main drawback of these methods is that the ground topography is not directly measured. 
In contrast to current sensors, ultra-wideband radars have the potential to take distance measurements to the top of the canopy and to the ground simultaneously. We propose the study of ultra-wideband radar for measuring wheat crop heights. Specifically, we propose to study the effects of canopy constituents on the ranging radar return, or impulse response, as well as on the frequency response. First, a numerical simulator will be developed to accurately calculate the radar response under different canopy conditions. Second, a parametric study will be performed with the aforementioned simulator. Lastly, an estimation algorithm for crop canopy heights will be developed, based on the parametric study. 
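
For context, the basic ranging relations behind this idea are shown below; these are standard radar equations given for illustration, not results from the proposal, and they ignore propagation effects within the canopy.

```latex
% Standard radar ranging (context only): a scatterer at range R returns an echo
% after a two-way travel time \tau, so
\begin{equation}
  R = \frac{c\,\tau}{2},
\end{equation}
% and if the radar resolves both the canopy-top return (\tau_{c}) and the
% ground return (\tau_{g}) in the same waveform, the canopy height follows as
\begin{equation}
  h = \frac{c\,(\tau_{g} - \tau_{c})}{2}.
\end{equation}
```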


ALI ABUSHAIBA

Maximum Power Point Tracking for Photovoltaic Systems Using a Discrete-in-Time Extremum Seeking Algorithm

When & Where:


2001B Eaton Hall

Committee Members:

Reza Ahmadi, Chair
Ken Demarest
Glenn Prescott
Alessandro Salandrino
Huazhen Fang

Abstract

Energy harvesting from solar sources in an attempt to increase efficiency has sparked interest in many communities to develop more energy-harvesting applications for renewable energy. Advanced technical methods are required to ensure that the maximum available power is harnessed from the photovoltaic (PV) system. This work proposes a new discrete-in-time extremum-seeking based technique for tracking the maximum power point of a photovoltaic array. The proposed method is a true maximum power point tracker that can be implemented with reasonable processing effort on an inexpensive digital controller. A stability analysis of the proposed method is performed to guarantee the convergence of the algorithm. The proposed method should exhibit better performance in comparison to conventional Maximum Power Point Tracking (MPPT) methods and require less computational effort than complex mathematical methods.
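
To illustrate the general idea of stepping an operating point toward maximum power, the sketch below implements a generic discrete-time hill-climbing (perturb-and-observe style) MPPT loop; it is not the extremum-seeking algorithm proposed in this work, and the toy PV curve, step size, and iteration count are assumptions.

```python
# Generic discrete-time hill-climbing MPPT sketch; NOT the proposed
# extremum-seeking method. PV model, step size, and iterations are assumptions.
def pv_power(duty):
    # Toy unimodal power-vs-duty-cycle curve with a maximum at duty = 0.62.
    return max(0.0, 100.0 - 400.0 * (duty - 0.62) ** 2)

def mppt_hill_climb(duty=0.3, step=0.02, iterations=40):
    power = pv_power(duty)
    direction = 1.0
    for _ in range(iterations):
        duty_new = min(max(duty + direction * step, 0.0), 1.0)
        power_new = pv_power(duty_new)
        if power_new < power:          # power dropped: reverse the perturbation
            direction = -direction
        duty, power = duty_new, power_new
    return duty, power

print(mppt_hill_climb())  # converges near the toy curve's maximum power point
```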