Defense Notices


All students and faculty are welcome to attend the final defense of EECS graduate students completing their M.S. or Ph.D. degrees. Defense notices for M.S./Ph.D. presentations for this year and several previous years are listed below in reverse chronological order.

Students who are nearing the completion of their M.S./Ph.D. research should schedule their final defenses through the EECS graduate office at least THREE WEEKS PRIOR to their presentation date so that there is time to complete the degree requirements check and post the presentation announcement online.

Upcoming Defense Notices

David Felton

Optimization and Evaluation of Physical Complementary Radar Waveforms

When & Where:


Nichols Hall, Room 129 (Apollo Auditorium)

Committee Members:

Shannon Blunt, Chair
Rachel Jarvis
Patrick McCormick
James Stiles
Zsolt Talata

Abstract

The RF spectrum is a precious, finite resource with ever-increasing demand. Consequently, the mandate to be a "good spectral neighbor" is in direct conflict with the requirements for high-performance sensing where correlation error is fundamentally limited. As such, matched-filter radar performance is often sidelobe-limited with estimation error being constrained by the time-bandwidth (TB) of the collective emission. The methods developed here seek to bridge this gap between idealized radar performance and practical utility via waveform design.    

Estimation error becomes more complex when employing pulse agility, since range-sidelobe modulation (RSM) spreads energy across Doppler, rendering traditional methods ineffective. To address this, the gradient-based complementary-FM framework was developed to produce complementary sidelobe cancellation (CSC) after coherently combining subsets within a pulse-agile emission. In contrast to the majority of complementary signals, which have been explored via phase-coding, these Comp-FM waveform subsets achieve CSC while preserving hardware compatibility since they are FM (though design distortion is never completely avoided). Although Comp-FM addressed practicality via hardware amenability, CSC was localized to zero-Doppler. This work expands the Comp-FM notion to a Doppler-generalized (DG) framework, extending the cancellation condition to an arbitrary span. The same framework can likewise be employed to jointly optimize an entire coherent processing interval (CPI) to minimize RSM within the radar point-spread-function (PSF), thereby generalizing the notion of complementarity and introducing the potential for cognitive operation if sufficient scattering knowledge is available a priori.
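
The phase-coded complementary idea that Comp-FM generalizes can be shown in a few lines of numpy: the autocorrelation sidelobes of a Golay complementary pair cancel exactly when the two responses are summed. This is an illustrative sketch of classical CSC, not the Comp-FM design itself.

    import numpy as np

    # Length-8 Golay complementary pair (binary phase codes).
    a = np.array([1, 1, 1, -1, 1, 1, -1, 1], dtype=float)
    b = np.array([1, 1, 1, -1, -1, -1, 1, -1], dtype=float)

    def acorr(x):
        # Full (aperiodic) autocorrelation of a sequence with itself.
        return np.correlate(x, x, mode="full")

    # Individually each code has nonzero sidelobes; their sum does not.
    print(acorr(a) + acorr(b))   # zeros everywhere except the mainlobe (16)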

Sensing with a single emitter is limited by self-inflicted error alone (e.g., clutter, sidelobes), while MIMO systems must additionally contend with the cross-responses from emitters operating concurrently (e.g., simultaneously, spatially proximate, in a shared spectrum), further degrading radar sensitivity. Total correlation error is then dictated by the overlapping TB (i.e., how coincident the signals are) and the number of operating emitters, compounding estimation difficulty if left unaddressed. As such, the determination of "orthogonal waveforms" comprises a large portion of MIMO literature, though it remains a phenomenological misnomer for pulsed emissions. Here, the notion of complementary-FM is applied to a multi-emitter context in which transmitter-amenable quasi-orthogonal subsets, occupying the same spectral band, are produced via a similar gradient-based approach. To further improve the practicality of these MIMO-Comp-FM waveform subsets, the same "DG" approach described above, which addresses the otherwise-default Doppler-induced degradation of complementary signals, is applied. In doing so, Doppler-independent separability and complementarity greatly improve estimation sensitivity for multi-emitter systems.

This MIMO-Comp-FM framework is developed for standard matched filter processing. Coupling this framework with a "DG" form of the previously explored MIMO-MiCRFt is also investigated, illustrating the added benefit of pairing optimized subsets with similarly calibrated processing. 

Each of these methods is developed to address unique and increasingly complex sources of estimation error. All approaches are initially developed and evaluated via simulated analysis where ground-truth is known. Then, despite hardware-induced distortion being unavoidable, the MIMO-Comp-FM framework is confirmed via loopback measurements to preserve the majority of CSC that was observed in simulation. Finally, open-air demonstration of each approach validates practical utility on a radar system.


Hao Xuan

Toward an Integrated Computational Framework for Metagenomics: From Sequence Alignment to Automated Knowledge Discovery

When & Where:


Nichols Hall, Room 246 (Executive Conference Room)

Committee Members:

Cuncong Zhong, Chair
Fengjun Li
Suzanne Shontz
Hongyang Sun
Liang Xu

Abstract

Metagenomic sequencing has become a central paradigm for studying complex microbial communities and their interactions with the host, with emerging applications in clinical prediction and disease modeling. In this work, we first investigate two representative application scenarios: predicting immune checkpoint inhibitor response in non-small cell lung cancer using gut microbial signatures, and characterizing host–microbiome interactions in neonatal systems. The proposed reference-free neural network captures both compositional and functional signals without reliance on reference genomes, while the neonatal study demonstrates how environmental and genetic factors reshape microbial communities and how probiotic intervention can mitigate pathogen-induced immune activation.

These studies highlight both the promise and the inherent difficulty of metagenomic analysis: transforming raw sequencing data into clinically actionable insights remains an algorithmically fragmented and computationally intensive process. This challenge arises from two key limitations: the lack of a unified algorithmic foundation for sequence alignment and the absence of systematic approaches for selecting and organizing analytical tools. Motivated by these challenges, we present a unified computational framework for metagenomic analysis that integrates complementary algorithmic and systems-level solutions.

First, to resolve fragmentation at the alignment level, we develop the Versatile Alignment Toolkit (VAT), a unified algorithmic system for biological sequence alignment across diverse applications. VAT introduces an asymmetric multi-view k-mer indexing scheme that integrates multiple seeding strategies within a single architecture and enables dynamic seed-length adjustment via longest common prefix (LCP)–based inference without re-indexing. A flexible seed-chaining mechanism further supports diverse alignment scenarios, including collinear, rearranged, and split alignments. Combined with a hardware-efficient in-register bitonic sorting algorithm and dynamic index-loading strategy, VAT achieves high efficiency and broad applicability across read mapping, homology search, and whole-genome alignment. Second, to address the challenge of tool selection and pipeline construction, we develop SNAIL, a natural language processing system for automated recognition of bioinformatics tools from large-scale and rapidly growing scientific literature. By integrating XGBoost and Transformer-based models such as SciBERT, SNAIL enables structured extraction of analytical tools and supports automated, reproducible pipeline construction.
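
The seeding idea underlying such an index can be sketched with a plain k-mer hash: map every reference k-mer to its positions, then look up read k-mers to produce seed hits for chaining. This is an illustrative sketch only; VAT's asymmetric multi-view indexing and LCP-based seed-length adjustment are considerably more involved.

    from collections import defaultdict

    def build_kmer_index(reference, k):
        """Map every k-mer in the reference to its start positions (seeds)."""
        index = defaultdict(list)
        for i in range(len(reference) - k + 1):
            index[reference[i:i + k]].append(i)
        return index

    def seed_hits(read, index, k):
        """Exact k-mer seed matches as (read_offset, reference_offset) pairs."""
        hits = []
        for j in range(len(read) - k + 1):
            for i in index.get(read[j:j + k], ()):
                hits.append((j, i))
        return hits

    ref = "ACGTACGTGACCTGA"
    idx = build_kmer_index(ref, k=4)
    print(seed_hits("TACGTG", idx, k=4))  # [(0, 3), (1, 0), (1, 4), (2, 5)]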

Together, this work establishes a unified framework that is grounded in real-world applications and addresses key bottlenecks in metagenomic analysis, enabling more efficient, scalable, and clinically actionable workflows.


Pramil Paudel

Learning Without Seeing: Privacy-Preserving and Adversarial Perspectives in Lensless Imaging

When & Where:


Eaton Hall, Room 2001B

Committee Members:

Fengjun Li, Chair
Alex Bardas
Bo Luo
Cuncong Zhong
Haiyang Chao

Abstract

Conventional computer vision relies on spatially resolved, human-interpretable images, which inherently expose sensitive information and raise privacy concerns. In this study, we explore an alternative paradigm based on lensless imaging, where scenes are captured as diffraction patterns governed by the point spread function (PSF). Although unintelligible to humans, these measurements encode structured, distributed information that remains useful for computational inference. 
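
For intuition, the standard lensless forward model treats the measurement as the scene convolved with the PSF plus sensor noise. The sketch below uses a random PSF as a stand-in for a real mask's PSF; it illustrates why the raw measurement is globally multiplexed and unintelligible to a human.

    import numpy as np
    from scipy.signal import fftconvolve

    rng = np.random.default_rng(0)
    scene = np.zeros((64, 64)); scene[20:40, 28:36] = 1.0   # toy scene
    psf = rng.random((64, 64)); psf /= psf.sum()            # stand-in mask PSF

    # Lensless capture: every scene point is spread by the PSF, so the
    # measurement mixes information from the whole scene.
    measurement = fftconvolve(scene, psf, mode="same")
    measurement += 0.01 * rng.standard_normal(measurement.shape)  # sensor noise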

We propose a unified framework for privacy-preserving vision that operates directly on lensless sensor measurements by leveraging their frequency-domain and phase-encoded properties. The framework is developed along two complementary directions. First, we enable reconstruction-free inference by exploiting the intrinsic obfuscation of lensless data. We show that semantic tasks such as classification can be performed directly on diffraction patterns using models tailored to non-local, phase-scrambled representations. We further design lensless-aware architectures and integrate them into practical pipelines, including a Swin Transformer-based steganographic framework (DiffHide) for secure and imperceptible information embedding. To assess robustness, we formalize adversarial threat models and develop defenses against learning-based reconstruction attacks, particularly GAN-driven inversion. Second, we investigate the limits of privacy by studying the reconstructability of lensless measurements without explicit knowledge of the forward model. We develop learning-based reconstruction methods that approximate the inverse mapping and analyze conditions under which sensitive information can be recovered. Our results demonstrate that lensless measurements enable effective vision tasks without reconstruction, while providing a principled framework to evaluate and mitigate privacy risks. 


Sharmila Raisa

Digital Coherent Optical System: Investigation and Monitoring

When & Where:


Nichols Hall, Room 246 (Executive Conference Room)

Committee Members:

Rongqing Hui, Chair
Morteza Hashemi
Erik Perrins
Alessandro Salandrino
Jie Han

Abstract

Coherent wavelength-division multiplexed (WDM) optical fiber systems have become the primary transmission technology for high-capacity data networks, driven by the explosive bandwidth demand of cloud computing, streaming services, and large-scale artificial intelligence training infrastructure. This dissertation investigates two fundamental aspects of digital coherent fiber optic systems under the unifying theme of source and monitoring: the design of multi-wavelength optical sources compatible with high-order coherent detection, and the leveraging of fiber Kerr-effect nonlinearity at the coherent receiver to perform physical-layer link health monitoring and to assess inherent security vulnerabilities — both achieved through digital signal processing of the received complex optical field without dedicated hardware.

We begin by addressing the multi-wavelength transmitter challenge in WDM coherent systems. Existing quantum-dot, quantum-dash, and quantum-well based optical frequency comb (OFC) sources share a common limitation: individual comb line linewidths in the tens of MHz range caused by low output power levels of 1–20 mW, making them incompatible with high-order coherent detection. We demonstrate coherent system application of a single-section InGaAsP QW Fabry-Perot laser diode with greater than 120 mW optical power at the fiber pigtail and 36.14 GHz mode spacing. The high optical power per mode produces Lorentzian equivalent linewidths below 100 kHz — compatible with 16-QAM carrier phase recovery without optical phase locking. Experimental results obtained using a commercial Ciena WaveLogic-Ai coherent transceiver demonstrate 20-channel WDM transmission over 78.3 km of standard single-mode fiber with all channels below the HD-FEC threshold of 3.8 × 10⁻³ at 30 GBaud differential-coded 16-QAM, corresponding to an aggregate capacity of 2.15 Tb/s from a single laser device.

After investigating the QW Fabry-Perot laser as a multi-wavelength source for coherent WDM transmission, we leverage the coherent receiver DSP to exploit fiber Kerr-effect nonlinearity for longitudinal power profile estimation, enabling reconstruction of the signal power distribution P(z) along the full multi-span link without dedicated hardware or traffic interruption. We propose a modified enhanced regular perturbation (ERP) method that corrects two independent physical error sources of the standard RP1 least-squares baseline: the accumulated nonlinear phase rotation, and the dispersion-mediated phase-to-intensity conversion — a second bias source not addressed by prior methods. The RP1 method produces mean absolute error (MAE) that scales quadratically with span count, growing to 1.656 dB at 10 spans and 3 dBm. The modified ERP reduces this to 0.608 dB — an improvement that grows consistently with link length, confirming increasing advantage in the long-haul regime. Extension to WDM through an XPM-aware per-channel formulation achieves MAE of 0.113–0.419 dB across 150–500 km link lengths.
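
For context, the Kerr-nonlinear propagation that these estimators model (and that later supplies simulation training data) is typically computed with the split-step Fourier method, alternating linear (loss and dispersion) and nonlinear steps. A minimal numpy sketch follows; the fiber parameters are typical illustrative values, not those of the experiments.

    import numpy as np

    def ssfm(field, dt, length, steps, beta2=-21.7e-27, gamma=1.3e-3, alpha_db_km=0.2):
        """Split-step Fourier propagation: loss + dispersion applied in the
        frequency domain, Kerr (SPM) phase in the time domain, per segment."""
        dz = length / steps
        alpha = alpha_db_km / (10 * np.log10(np.e)) / 1e3        # dB/km -> 1/m
        omega = 2 * np.pi * np.fft.fftfreq(field.size, d=dt)
        lin = np.exp((-alpha / 2 + 1j * beta2 / 2 * omega**2) * dz)
        for _ in range(steps):
            field = np.fft.ifft(np.fft.fft(field) * lin)         # dispersion + loss
            field *= np.exp(1j * gamma * np.abs(field)**2 * dz)  # Kerr phase
        return field

    # e.g., one 75 km span: ssfm(tx_field, dt=1/64e9, length=75e3, steps=750)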

In addition to its role in enabling DSP-based longitudinal power profile estimation, the fiber Kerr-effect nonlinearity is shown to give rise to an inherent physical-layer security vulnerability in coherent WDM systems. We show that an eavesdropper co-tenanting a shared fiber — transmitting a continuous-wave probe at a wavelength adjacent to the legitimate signal — can capture the XPM-induced waveform at the fiber output and apply a bidirectional gated recurrent unit neural network, trained on split-step Fourier method simulation data, to reconstruct the transmitted symbol sequence without physical fiber access and without perturbing the legitimate signal. This eavesdropping mechanism is experimentally validated using a commercial Ciena WaveLogic-Ai coherent transceiver for ASK, BPSK, QPSK, and 16-QAM modulation formats at 4.26 GBaud and 8.56 GBaud over one- and two-span 75 km fiber systems, achieving zero symbol errors under high-OSNR conditions. Noise-aware training over OSNR from 20 to 60 dB maintains symbol error rate below 10⁻² for OSNR above 25–30 dB.

Together, these three contributions demonstrate that the coherent fiber optic system is a versatile physical instrument extending well beyond its role as a data transmission medium. The coherent receiver infrastructure — deployed for high-order modulation and data recovery — simultaneously enables the high-power OFC laser to serve as a practical multi-wavelength transmitter source, and provides the complex field measurement capability through which fiber Kerr-effect nonlinearity can be exploited constructively for distributed link monitoring and, as a direct consequence, reveals an inherent physical-layer security exposure in shared fiber infrastructure. This unified perspective on the coherent system as both a transmission platform and a general-purpose measurement instrument has direct relevance to the design of spectrally efficient, self-monitoring, and physically secure optical interconnects for next-generation AI computing networks.


Arman Ghasemi

Task-Oriented Data Communication and Compression for Timely Forecasting and Control in Smart Grids

When & Where:


Nichols Hall, Room 246 (Executive Conference Room)

Committee Members:

Morteza Hashemi, Chair
Alexandru Bardas
Prasad Kulkarni
Taejoon Kim
Zsolt Talata

Abstract

Advances in sensing, communication, and intelligent control have transformed power systems into data-driven smart grids, where forecasting and intelligent decision-making are essential components. Modern smart grids include distributed energy resources (DERs), renewable generation, battery energy storage systems, and large numbers of grid-edge devices that continuously generate time-series data. At the same time, increasing renewable penetration introduces substantial uncertainty in generation, net load, and market operations, while communication networks impose bandwidth, latency, and reliability constraints on timely data delivery. This dissertation addresses how time-series forecasting, data compression, and task-oriented wireless communication can be jointly designed for smart grid applications.

First, we study weather-aware distributed energy management in prosumer-centric microgrids and show that incorporating day-ahead weather information into decision-making improves battery dispatch and reduces the impact of renewable uncertainty. Second, we introduce forecasting-aware energy management in both wholesale and retail electricity markets, highlighting how renewable generation forecasting affects pricing, scheduling, and uncertainty mitigation. Third, we develop and evaluate deep learning methods for renewable generation forecasting, showing that Transformer-based models outperform recurrent baselines such as RNN and LSTM for wind and solar prediction tasks.

Building on this forecasting foundation, we develop a communication-efficient forecasting framework in which high-dimensional smart grid measurements are compressed into low-dimensional latent representations before transmission. This framework is extended into a task-oriented communication system that jointly optimizes data relevance and information timeliness, so that the receiver obtains compressed updates that remain useful for downstream forecasting tasks. Finally, we extend this framework to a distributed multi-node uplink setting, where multiple grid sensors share a bandwidth-limited channel, and develop a scheduling policy that improves both the timeliness and task-relevance of received updates.
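
A toy sketch of such timeliness-aware scheduling: each slot, transmit the sensor with the largest relevance-weighted age of information. The relevance weights, the age metric, and the channel success probability below are illustrative assumptions, not the dissertation's policy.

    import numpy as np

    rng = np.random.default_rng(1)
    n_sensors, n_slots = 4, 20
    relevance = np.array([1.0, 0.5, 2.0, 1.0])  # task relevance per sensor (assumed)
    age = np.ones(n_sensors)                    # slots since each sensor's last update

    for t in range(n_slots):
        chosen = int(np.argmax(relevance * age))   # max relevance-weighted age
        delivered = rng.random() > 0.2             # channel succeeds w.p. 0.8
        age += 1
        if delivered:
            age[chosen] = 1                        # fresh update received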


Past Defense Notices


Tong Xu

Real-time DSP-enabled digital subcarrier cross-connect (DSXC) for optical communication systems and networks

When & Where:


246 Nichols Hall

Committee Members:

Ron Hui, Chair
Christopher Allen
Esam Eldin Aly
Erik Perrins
Jie Han

Abstract

Elastic optical networking (EON) is intended to offer flexible channel wavelength granularity to meet the requirement of high spectral efficiency (SE) in today's optical networks. However, optical cross-connects (OXCs) and switches based on optical wavelength division multiplexing (WDM) are not flexible enough due to the coarse bandwidth granularity imposed by optical filtering. Thus, OXCs may not meet the requirements of many applications which require finer bandwidth granularities than that of an entire wavelength channel.

In order to achieve highly flexible and sufficiently fine bandwidth granularities, an electrical digital subcarrier cross-connect (DSXC) can be utilized in EON. As presented in this thesis, my research work focuses on the investigation and implementation of a real-time digital signal processing (DSP) enabled DSXC which can dynamically assign both bandwidth and power to each individual sub-wavelength channel, known as a subcarrier. This DSXC is based on digital sub-carrier multiplexing (DSCM), a frequency division multiplexing (FDM) technique that multiplexes a large number of digitally created subcarriers on each optical wavelength. Compared with OXC based on optical WDM, DSXC based on DSCM offers much finer bandwidth granularity and flexibility for dynamic bandwidth allocation.

Based on a field programmable gate array (FPGA) hardware platform, we have designed and implemented a real-time DSP enabled DSXC which uses Nyquist FDM as the multiplexing scheme. For the first time, we demonstrated resampling filters for channel selection and frequency translation, which enabled real-time DSXC. This circuit-based DSXC supports flexible and fine data-rate subcarrier channel granularities, offering a low latency data plane, transparency to modulation formats, and the capability of compensating transmission impairments in the digital domain. The experimentally demonstrated 8×8 DSXC makes use of a Virtex-7 FPGA platform, which supports any-to-any switching of eight subcarrier channels with mixed modulation formats and data rates. Digital resampling filters, which enable frequency selection and translation of multiple subcarrier channels, have much lower DSP complexity and reduced FPGA resource requirements (DSP slices used in the FPGA) in comparison to the traditional technique based on I/Q mixing and filtering.
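
Conceptually, the channel selection and frequency translation performed by the resampling filters amount to mixing the wanted subcarrier to baseband and decimating with an anti-aliasing filter. A floating-point numpy/scipy sketch of that operation follows; the real design is a fixed-point FPGA resampling filter, and the sample rates here are illustrative.

    import numpy as np
    from scipy.signal import resample_poly

    fs = 1.0e9                       # aggregate sample rate (illustrative)
    t = np.arange(4096) / fs
    f_sub = 150e6                    # center frequency of the wanted subcarrier
    composite = np.exp(2j * np.pi * f_sub * t)              # stand-in DSCM signal

    baseband = composite * np.exp(-2j * np.pi * f_sub * t)  # frequency translation
    selected = resample_poly(baseband, up=1, down=8)        # anti-alias + decimate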

We have also investigated the feasibility of using the distributed arithmetic (DA) architecture for real-time DSXC to completely eliminate the need for DSP slices in the FPGA implementation. For the first time, we experimentally demonstrated the implementation of real-time frequency translation and channel selection based on the DA architecture in the same FPGA platform. Compared with resampling filters that leverage multipliers, the DA-based approach eliminates the need for DSP slices in the FPGA implementation and significantly reduces the hardware cost. In addition, by requiring only a few clock cycles, a DA-based resampling filter is significantly faster than a conventional FIR filter, whose overall latency is proportional to the filter order. The DA-based DSXC is therefore able to achieve not only improved spectral efficiency, programmability of multiple orthogonal subcarrier channels, and low hardware resource requirements, but also much reduced cross-connection latency when implemented in a real-time DSP hardware platform. This reduced latency of cross-connect switching can be critically important for time-sensitive applications such as 5G mobile fronthaul, cloud radio access network (C-RAN), cloud-based robot control, tele-surgery, and network gaming.
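
The DA principle can be sketched in a few lines: rewrite the FIR inner product bit-plane by bit-plane so that each clock cycle only indexes a precomputed lookup table of coefficient subset-sums and shift-accumulates the result, with no multiplies. The sketch below uses unsigned samples for simplicity; a hardware implementation would additionally handle two's complement and fixed-point scaling.

    def da_fir_output(coeffs, samples, bits=8):
        """One FIR output via distributed arithmetic: a LUT of coefficient
        subset-sums is addressed by one input bit-plane per cycle."""
        K = len(coeffs)
        lut = [sum(c for c, bit_on in zip(coeffs, (addr >> k & 1 for k in range(K)))
                   if bit_on) for addr in range(1 << K)]
        acc = 0
        for b in range(bits):                      # one clock per bit-plane
            addr = 0
            for k, x in enumerate(samples):        # gather bit b of each sample
                addr |= ((x >> b) & 1) << k
            acc += lut[addr] << b                  # shift-accumulate LUT value
        return acc

    coeffs, samples = [3, -1, 4, 2], [17, 5, 250, 33]
    assert da_fir_output(coeffs, samples) == sum(c * x for c, x in zip(coeffs, samples))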


Levi Goodman

Dual Mode W-Band Radar for Range Finding, Static Clutter Suppression & Moving Target Detection

When & Where:


250 Nichols Hall

Committee Members:

Christopher Allen, Chair
Shannon Blunt
James Stiles


Abstract

Many radar applications today require accurate, real-time, unambiguous measurement of target range and radial velocity.  Obstacles that frequently prevent target detection are the presence of noise and the overwhelming backscatter from other objects, referred to as clutter.

In this thesis, a method of static clutter suppression is proposed to increase detectability of moving targets in high clutter environments.  An experimental dual-purpose, single-mode, monostatic FMCW radar, operating at 108 GHz, is used to map the range of stationary targets and determine range and velocity of moving targets.  By transmitting a triangular waveform, which consists of alternating upchirps and downchirps, the received echo signals can be separated into two complementary data sets, an upchirp data set and a downchirp data set.  In one data set, the return signals from moving targets are spectrally isolated (separated in frequency) from static clutter return signals.  The static clutter signals in that first data set are then used to suppress the static clutter in the second data set, greatly improving detectability of moving targets.  Once the moving target signals are recovered from each data set, they are then used to solve for target range and velocity simultaneously.
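
The simultaneous solution works because the range-induced beat adds to the Doppler shift on one sweep and subtracts on it on the other. A small worked sketch follows, using signed beat frequencies and illustrative sweep parameters (the thesis's actual bandwidth and chirp timing are not restated here).

    c, fc = 3e8, 108e9            # speed of light; carrier frequency (Hz)
    B, T = 1e9, 1e-3              # sweep bandwidth / chirp duration (illustrative)
    S = B / T                     # sweep slope (Hz/s)

    # Signed beat frequencies from the up- and down-chirp halves (example values):
    # f_up = f_r - f_d  and  f_dn = f_r + f_d for an approaching target.
    f_up, f_dn = -44.8e3, 84.8e3

    f_r = (f_dn + f_up) / 2       # range-only beat: 20 kHz
    f_d = (f_dn - f_up) / 2       # Doppler shift:  64.8 kHz
    print(c * f_r / (2 * S))      # range    -> 3.0 m
    print(c * f_d / (2 * fc))     # velocity -> 90.0 m/s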

The moving target of interest for tests performed was a reusable paintball (reball).  Reball range and velocity were accurately measured at distances up to 5 meters and at speeds greater than 90 m/s (200 mph) with a deceleration of approximately 0.155 m/s/ms (meters per second per millisecond).  Static clutter suppression of up to 25 dB was achieved, while moving target signals only suffered a loss of about 3 dB.

 


Ruoting Zheng

Algorithms for Computing Maximal Consistent Blocks

When & Where:


2001 B Eaton Hall

Committee Members:

Jerzy Grzymala-Busse, Chair
Prasad Kulkarni
Bo Luo


Abstract

Rough set theory is a tool to deal with uncertain and incomplete data. It has been successfully used in classification, machine learning and automated knowledge acquisition. A maximal consistent block, defined using rough set theory, is used for rule acquisition.

The maximal consistent block technique is applied to acquire knowledge from incomplete data sets by analyzing the structure of a similarity class.

The main objective of this project is to implement and compare algorithms for computing maximal consistent blocks. The brute-force, recursive, and hierarchical methods were designed for data sets with missing attribute values interpreted only as "do not care" conditions. In this project, we extend these algorithms so they can be applied to arbitrary interpretations of missing attribute values, and we introduce an approach for computing maximal consistent blocks on data sets with lost values. In addition, we found that the brute-force and recursive methods have problems dealing with data sets for which characteristic sets are not transitive, so the limitations of the algorithms and a simplified recursive method are provided as well.
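
Under the "do not care" interpretation, two cases are consistent when their specified attribute values never disagree, and maximal consistent blocks are then the maximal cliques of that pairwise relation. A brute-force sketch using networkx for clique enumeration follows; it is illustrative rather than the project's implementation.

    from itertools import combinations
    import networkx as nx

    # Cases described by attribute tuples; "*" = "do not care" (missing).
    cases = {1: ("high", "*",   "yes"),
             2: ("high", "low", "*"),
             3: ("*",    "low", "no"),
             4: ("low",  "low", "no")}

    def consistent(u, v):
        """Two cases are consistent if their specified values never disagree."""
        return all(a == b or a == "*" or b == "*" for a, b in zip(u, v))

    g = nx.Graph()
    g.add_nodes_from(cases)
    g.add_edges_from((i, j) for i, j in combinations(cases, 2)
                     if consistent(cases[i], cases[j]))

    # Maximal consistent blocks = maximal cliques of the consistency graph.
    print(sorted(map(sorted, nx.find_cliques(g))))   # [[1, 2], [2, 3], [3, 4]]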


Hao Xue

Trust and Credibility in Online Social Networks

When & Where:


246 Nichols Hall

Committee Members:

Fengjun Li, Chair
Prasad Kulkarni
Bo Luo
Cuncong Zhong
Mei Liu

Abstract

Increasing portions of people's social and communicative activities now take place in the digital world. The growth and popularity of online social networks (OSNs) have tremendously facilitated online interaction and information exchange. Not only do ordinary users benefit from OSNs as more people now rely on online information for news, opinions, and social networking, but so do companies and business owners, who utilize OSNs as platforms for gathering feedback and for marketing activities. As OSNs enable people to communicate more effectively, a large volume of user-generated content (UGC) is produced daily. However, the freedom and ease of publishing information online has made these systems unreliable sources of information. Not only does biased and misleading information exist, but financial incentives also drive individual and professional spammers to insert deceptive content and promote harmful information, which jeopardizes the ecosystems of OSNs.
In this dissertation, we present our work on measuring the credibility of information and detecting content polluters in OSNs. First, we assume that review spammers spend less effort in maintaining social connections, and we propose to utilize social relationships and rating deviations to assist the computation of the trustworthiness of users. Compared to numeric ratings, textual content contains richer information about the actual opinion of a user toward a target. Thus, we propose a content-based trust propagation framework that extracts the opinions expressed in review content. In addition, we discover that the network surrounding a user can also provide valuable information about the user. Lastly, we study the problem of detecting social bots by utilizing the characteristics of surrounding neighborhood networks.
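
As a toy illustration of deviation-based trust computation in this spirit, one can iterate between a trust-weighted consensus rating and per-user deviations from it. The specific weighting below is an assumption made for illustration, not the dissertation's model.

    import numpy as np

    # ratings[u, p]: rating by user u of product p (toy data, 0-5 scale).
    ratings = np.array([[5.0, 4.0, 1.0],
                        [5.0, 4.0, 1.5],
                        [1.0, 5.0, 5.0]])   # user 2 deviates from the others

    trust = np.ones(3) / 3
    for _ in range(20):
        consensus = trust @ ratings / trust.sum()        # trust-weighted consensus
        deviation = np.abs(ratings - consensus).mean(axis=1)
        trust = 1.0 / (1.0 + deviation)                  # deviate more -> trusted less
    print(trust.round(3))   # user 2 ends with the lowest trust score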


Casey Sader

Taming WOLF: Building a More Functional and User-Friendly Framework

When & Where:


2001 B Eaton Hall

Committee Members:

Michael Branicky, Chair
Bo Luo
Suzanne Shontz


Abstract

Machine learning is all about automation. Many tools have been created to help data scientists automate repeated tasks and train models. These tools require varying levels of user experience to be used effectively. The "machine learning WOrk fLow management Framework" (WOLF) aims to automate the machine learning pipeline. One of its key uses is to discover which machine learning model and hyper-parameters are the best configuration for a dataset. In this project, features were explored that could be added to make WOLF behave as a full pipeline and be helpful for novice and experienced data scientists alike. One feature that makes WOLF more accessible is a website version that can be accessed from anywhere and makes using WOLF much more intuitive. To keep WOLF aligned with the most recent trends and models, the ability to train a neural network using the TensorFlow framework and Keras library was added. This project also introduced the ability to pickle and save trained models. Saving models in turn enables the option of using them within the WOLF framework to make predictions on another collection of data. Understanding how a model makes predictions is a beneficial component of machine learning. This project aids in that understanding by calculating and reporting the relative importance of the dataset features for the given model. Incorporating all these additions makes WOLF a more functional and user-friendly framework for machine learning tasks.
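
The model-saving and feature-importance additions can be illustrated with scikit-learn (shown here with a random forest); this is a sketch of the workflow, not WOLF's internal code.

    import pickle
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    with open("model.pkl", "wb") as f:          # pickle the trained model
        pickle.dump(model, f)
    with open("model.pkl", "rb") as f:          # reload it later for predictions
        restored = pickle.load(f)

    print(restored.predict(X[:3]))              # predictions on new data
    print(restored.feature_importances_)        # relative feature importance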

 


Charles Mohr

Multi-Objective Optimization of FM Noise Waveforms via Generalized Frequency Template Error Metrics

When & Where:


129 Nichols Hall

Committee Members:

Shannon Blunt, Chair
Christopher Allen
James Stiles


Abstract

FM noise waveforms have been experimentally demonstrated to achieve high time-bandwidth products and low autocorrelation sidelobes while achieving acceptable spectral containment in physical implementation. Still, it may be necessary to further reduce sidelobe levels for detection or improve spectral containment in the face of growing spectral use. The Frequency Template Error (FTE) and the Logarithmic Frequency Template Error (Log-FTE) metrics were conceived as means to achieve FM noise waveforms with good spectral containment and good autocorrelation sidelobes. In practice, FTE-based waveform optimizations have been found to produce better autocorrelation responses at the expense of spectral containment, while Log-FTE optimizations achieve excellent spectral containment and interference rejection at the expense of autocorrelation sidelobe levels. In this work, the FTE and Log-FTE metrics are considered as subsets of a broader class of frequency-domain metrics collectively termed the Generalized Frequency Template Error (GFTE). In doing so, many different P-norm based variations of the FTE and Log-FTE cost functions are extensively examined and applied via gradient descent methods to optimize polyphase-coded FM (PCFM) waveforms. The performances of the different P-norm variations of the FTE and Log-FTE cost functions are compared amongst themselves, against each other, and relative to a previous FM noise waveform design approach called Pseudo-Random Optimized FM (PRO-FM). They are evaluated in terms of their autocorrelation sidelobes, spectral containment, and their ability to realize spectral notches within the 3 dB bandwidth for the purpose of interference rejection. These comparisons are performed both in simulation and experimentally in loopback, where it was found that P-norm values of 2 tend to provide the best optimization performance for both the FTE and Log-FTE optimizations, except in the case of Log-FTE optimization of a notched spectral template, where a P-norm value of 3 provides the best results. In general, the FTE and Log-FTE cost functions, as subsets of the GFTE, provide diverse means to optimize physically robust FM noise waveforms while emphasizing different performance criteria in terms of autocorrelation sidelobes, spectral containment, and interference rejection.
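
In sketch form, an FTE-style cost compares a waveform's normalized power spectrum to a template under a P-norm, and the Log-FTE makes the same comparison on a logarithmic scale, which emphasizes the low-level spectral skirts. The normalization, template, and toy waveform below are illustrative assumptions, not the thesis's exact formulation.

    import numpy as np

    def fte(s, template, p=2):
        """P-norm frequency template error between |S(f)|^2 and a template."""
        psd = np.abs(np.fft.fft(s))**2
        psd /= psd.max()
        return np.sum(np.abs(psd - template)**p) ** (1.0 / p)

    def log_fte(s, template, p=2, floor=1e-12):
        """Same comparison on a log scale, weighting the spectral skirts."""
        psd = np.abs(np.fft.fft(s))**2
        psd /= psd.max()
        return np.sum(np.abs(np.log(psd + floor)
                             - np.log(template + floor))**p) ** (1.0 / p)

    n = 256
    s = np.exp(1j * np.cumsum(np.random.default_rng(0).uniform(-1, 1, n)))  # toy FM waveform
    template = np.exp(-0.5 * (np.fft.fftfreq(n) / 0.15)**2)                 # Gaussian template
    print(fte(s, template), log_fte(s, template))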


Rui Cao

How Good Are Probabilistic Approximations for Rule Induction from Data with Missing Attribute Values

When & Where:


246 Nichols Hall

Committee Members:

Jerzy Grzymala-Busse, Chair
Guanghui Wang
Cuncong Zhong


Abstract

In data mining, decision rules induced from known examples are used to classify unseen cases. There are various rule induction algorithms, such as LEM1 (Learning from Examples Module version 1), LEM2 (Learning from Examples Module version 2) and MLEM2 (Modified Learning from Examples Module version 2). In the real world, many data sets are imperfect and may be incomplete. The idea of the probabilistic approximation has been used for many years in variable precision rough set models and similar approaches to uncertainty. The objective of this project is to test whether proper probabilistic approximations are better than concept lower and upper approximations. In this project, experiments were conducted on six incomplete data sets with lost values. We implemented the local probabilistic version of the MLEM2 algorithm to induce certain and possible rules from incomplete data sets. A program called Rule Checker was also developed to classify unseen cases with the induced rules and measure the classification error rate. Hold-out validation was carried out, and the error rate was used as the criterion for comparison.
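
A probabilistic approximation of a concept X can be sketched as the union of characteristic sets K(x), for x in X, whose conditional probability |X ∩ K(x)|/|K(x)| meets a threshold alpha; alpha = 1 recovers the lower approximation, and a small alpha the upper. A minimal sketch under that definition, with toy characteristic sets:

    def probabilistic_approximation(concept, characteristic_sets, alpha):
        """Union of characteristic sets K(x), x in concept, whose conditional
        probability |concept & K(x)| / |K(x)| is at least alpha."""
        approx = set()
        for x in concept:
            K = characteristic_sets[x]
            if len(concept & K) / len(K) >= alpha:
                approx |= K
        return approx

    # Toy characteristic sets for cases 1..5 and a concept {1, 2, 3}.
    K = {1: {1, 2}, 2: {2}, 3: {3, 4, 5}, 4: {4}, 5: {5}}
    X = {1, 2, 3}
    print(probabilistic_approximation(X, K, alpha=1.0))   # lower: {1, 2}
    print(probabilistic_approximation(X, K, alpha=0.3))   # proper: {1, 2, 3, 4, 5}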


Lokesh Kaki

An Automatic Image Stitching Software with Customizable Parameters and a Graphical User Interface

When & Where:


2001 B Eaton Hall

Committee Members:

Richard Wang, Chair
Esam El-Araby
Jerzy Grzymala-Busse


Abstract

Image stitching is one of the most widely used computer vision algorithms, with a broad range of applications such as image stabilization, high-resolution photomosaics, object insertion, 3D image reconstruction, and satellite imaging. Extracting image features from each input image, determining the image matches, and then estimating the homography for each matched image are the necessary steps in most feature-based image stitching techniques. In recent years, several state-of-the-art techniques like scale-invariant feature transform (SIFT), random sample consensus (RANSAC), and direct linear transformation (DLT) have been proposed for feature detection, extraction, matching, and homography estimation. However, using these algorithms with fixed parameters does not usually work well in creating seamless, natural-looking panoramas. The set of parameter values which works best for specific images may not work equally well for another set of images taken by a different camera or in varied conditions. Hence, parameter tuning is as important as choosing the right set of algorithms for the efficient performance of any image stitching algorithm.
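
The core SIFT, ratio-test matching, and RANSAC homography step, with a few of the tunable parameters surfaced as arguments, can be sketched with OpenCV as follows. This is a simplified sketch assuming opencv-python; the project's GUI exposes many more parameters.

    import cv2
    import numpy as np

    def estimate_homography(img1, img2, ratio=0.75, ransac_thresh=4.0,
                            ransac_iters=2000):
        """SIFT features -> ratio-test matches -> RANSAC homography (img1 -> img2)."""
        sift = cv2.SIFT_create()
        kp1, des1 = sift.detectAndCompute(img1, None)
        kp2, des2 = sift.detectAndCompute(img2, None)
        matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
        good = [m for m, n in matches if m.distance < ratio * n.distance]
        src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
        dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
        H, mask = cv2.findHomography(src, dst, cv2.RANSAC, ransac_thresh,
                                     maxIters=ransac_iters)
        return H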

In this project, a graphical user interface is designed and programmed to tune a total of 32 parameters, including some of the basic ones such as straightening, cropping, setting the maximum output image size, and setting the focal length. It also contains several advanced parameters like the number of RANSAC iterations, the RANSAC inlier threshold, the extrema threshold, the Gaussian window size, etc. The image stitching algorithm used in this project comprises SIFT, DLT, RANSAC, warping, straightening, bundle adjustment, and blending techniques. Once the given images are stitched together, the output image can be further analyzed inside the user interface by clicking on any particular point. The interface then returns the corresponding input image which contributed to the selected point, along with its GPS coordinates, altitude, and camera focal length from its metadata. The developed software has been successfully tested on various diverse datasets, and the customized parameters with corresponding results, as well as timer logs, are tabulated in this report. The software is built for both Windows and Linux operating systems as part of this project.

 


Mohammad Isyroqi Fathan

Comparative Study on Polyp Localization and Classification on Colonoscopy Video

When & Where:


250 Nichols Hall

Committee Members:

Guanghui Wang, Chair
Bo Luo
James Miller


Abstract

Colorectal cancer is one of the most common types of cancer with a high mortality rate. It typically develops from small clumps of benign cells called polyps. An adenomatous polyp has a higher chance of developing into cancer than a hyperplastic polyp. Colonoscopy is the preferred procedure for colorectal cancer screening and for minimizing risk by performing a biopsy on any polyps found. Thus, a good polyp detection model can assist physicians and increase the effectiveness of colonoscopy. Several models using handcrafted features and deep learning approaches have been proposed for the polyp detection task.

In this study, we compare the performances of previous state-of-the-art general object detection models for polyp detection and classification (into adenomatous and hyperplastic classes). Specifically, we compare the performances of FasterRCNN, SSD, YOLOv3, RefineDet, RetinaNet, and FasterRCNN with a DetNet backbone. This comparative study serves as an initial analysis of the effectiveness of these models and informs the choice of a base model that we will improve further for polyp detection.
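
As a sketch of how one of the compared detectors can be instantiated and run on a video frame, the snippet below uses torchvision's Faster R-CNN with generic pre-trained weights; the study's actual training on annotated colonoscopy data is not shown.

    import torch
    import torchvision

    # Pre-trained Faster R-CNN as a stand-in for the models under comparison.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    model.eval()

    frame = torch.rand(3, 480, 640)            # one colonoscopy frame (toy tensor)
    with torch.no_grad():
        out = model([frame])[0]                # dict of boxes, labels, scores
    print(out["boxes"].shape, out["scores"][:5])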


Lei Wang

I Know What You Type on Your Phone: Keystroke Inference on Android Device Using Deep Learning

When & Where:


246 Nichols Hall

Committee Members:

Bo Luo, Chair
Fengjun Li
Guanghui Wang


Abstract

Given a list of smartphone sensor readings, such as the accelerometer, gyroscope, and light sensor, is there enough information present to predict a user's input without access to either the raw text or a keyboard log? The increasing usage of smartphones as personal devices to access sensitive information on the go has put user privacy at risk. As technology advances rapidly, smartphones are now equipped with multiple sensors that measure user motion, temperature, and brightness to provide constant feedback to applications, in order to deliver accurate and current weather forecasts, GPS information, and so on. In the Android ecosystem, sensor readings can be accessed without user permissions, and this makes Android devices vulnerable to various side-channel attacks.

In this thesis, we first create a native Android app to collect approximately 20,700 keypresses from 30 volunteers. The text used for the data collection is carefully selected based on the bigram analysis we ran on over 1.3 million tweets. We then present two approaches (single keypress and bigram) for feature extraction; these features are constructed from accelerometer, gyroscope, and light sensor readings. A deep neural network with four hidden layers is proposed as the baseline for this work, achieving an accuracy of 47% when trained with categorical cross-entropy loss. A multi-view model is then proposed; multiple views are extracted, and the performance of each combination of views is compared for analysis.
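
The described baseline can be sketched in Keras as a dense network with four hidden layers trained with categorical cross-entropy; the layer widths and feature dimension below are assumptions made for illustration.

    import tensorflow as tf

    num_features, num_keys = 18, 26            # assumed: sensor features, key classes

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(num_features,)),
        tf.keras.layers.Dense(256, activation="relu"),   # hidden layer 1
        tf.keras.layers.Dense(128, activation="relu"),   # hidden layer 2
        tf.keras.layers.Dense(64,  activation="relu"),   # hidden layer 3
        tf.keras.layers.Dense(32,  activation="relu"),   # hidden layer 4
        tf.keras.layers.Dense(num_keys, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])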