Defense Notices


All students and faculty are welcome to attend the final defense of EECS graduate students completing their M.S. or Ph.D. degrees. Defense notices for M.S./Ph.D. presentations for this year and several previous years are listed below in reverse chronological order.

Students who are nearing the completion of their M.S./Ph.D. research should schedule their final defenses through the EECS graduate office at least THREE WEEKS PRIOR to their presentation date so that there is time to complete the degree requirements check and post the presentation announcement online.

Upcoming Defense Notices

Andrew Riachi

An Investigation Into The Memory Consumption of Web Browsers and A Memory Profiling Tool Using Linux Smaps

When & Where:


Nichols Hall, Room 246 (Executive Conference Room)

Committee Members:

Prasad Kulkarni, Chair
Perry Alexander
Drew Davidson
Heechul Yun

Abstract

Web browsers are notorious for consuming large amounts of memory. Yet, they have become the dominant framework for writing GUIs because web languages are ergonomic for programmers and offer cross-platform reach. These benefits are so enticing that even a large portion of mobile apps, which have to run on resource-constrained devices, run a web browser under the hood. Therefore, it is important to keep the memory consumption of web browsers as low as practicable.

In this thesis, we investigate the memory consumption of web browsers, in particular compared to applications written in native GUI frameworks. We introduce smaps-profiler, a tool to profile the overall memory consumption of Linux applications that can report memory usage other profilers simply do not measure. Using this tool, we conduct experiments which suggest that most of the extra memory usage compared to native applications could be due to the size of the web browser program itself. We discuss our experiments and findings, and conclude that even more rigorous studies are needed to profile GUI applications.
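
The thesis's smaps-profiler itself is not reproduced here, but as a rough illustration of the kind of accounting that /proc/[pid]/smaps enables, the hedged Python sketch below sums the proportional set size (Pss) of a process and all of its descendants (for example, a browser and its helper processes). The helper names and the choice of Pss as the headline figure are illustrative assumptions, not the thesis's actual design.

#!/usr/bin/env python3
"""Illustrative sketch (not the thesis's smaps-profiler): sum the
proportional set size (Pss) reported in /proc/<pid>/smaps for a process
and all of its descendants."""
import os
import sys

def child_map():
    """Map each parent PID to its children by scanning /proc/<pid>/stat."""
    children = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                # The field after the parenthesized command name and state is the parent PID.
                ppid = int(f.read().rsplit(")", 1)[1].split()[1])
        except (OSError, ValueError, IndexError):
            continue
        children.setdefault(ppid, []).append(int(entry))
    return children

def pss_kib(pid):
    """Total Pss (in KiB) of one process, summed over its smaps entries."""
    total = 0
    try:
        with open(f"/proc/{pid}/smaps") as f:
            for line in f:
                if line.startswith("Pss:"):
                    total += int(line.split()[1])
    except OSError:
        pass  # process exited or access was denied
    return total

def tree_pss_kib(root):
    """Pss of a process plus all of its descendants."""
    children, stack, total = child_map(), [root], 0
    while stack:
        pid = stack.pop()
        total += pss_kib(pid)
        stack.extend(children.get(pid, []))
    return total

if __name__ == "__main__":
    pid = int(sys.argv[1])
    print(f"PID {pid} tree Pss: {tree_pss_kib(pid) / 1024:.1f} MiB")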


Elizabeth Wyss

A New Frontier for Software Security: Diving Deep into npm

When & Where:


Eaton Hall, Room 2001B

Committee Members:

Drew Davidson, Chair
Alex Bardas
Fengjun Li
Bo Luo
J. Walker

Abstract

Open-source package managers (e.g., npm for Node.js) have become an established component of modern software development. Rather than creating applications from scratch, developers may employ modular software dependencies and frameworks--called packages--to serve as building blocks for writing larger applications. Package managers make this process easy. With a simple command line directive, developers are able to quickly fetch and install packages across vast open-source repositories. npm--the largest of such repositories--alone hosts millions of unique packages and serves billions of package downloads each week. 

However, the widespread code sharing resulting from open-source package managers also presents novel security implications. Vulnerable or malicious code hiding deep within package dependency trees can be leveraged downstream to attack both software developers and the end-users of their applications. This downstream flow of software dependencies--dubbed the software supply chain--is critical to secure.

This research provides a deep dive into the npm-centric software supply chain, exploring distinctive phenomena that impact its overall security and usability. Such factors include (i) hidden code clones--which may stealthily propagate known vulnerabilities, (ii) install-time attacks enabled by unmediated installation scripts, (iii) hard-coded URLs residing in package code, (iv) the impacts of open-source development practices, (v) package compromise via malicious updates, (vi) spammers disseminating phishing links within package metadata, and (vii) abuse of cryptocurrency protocols designed to reward the creators of high-impact packages. For each facet, tooling is presented to identify and/or mitigate potential security impacts. Ultimately, it is our hope that this research fosters greater awareness, deeper understanding, and further efforts to forge a new frontier for the security of modern software supply chains. 
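
As a small illustration of one of the phenomena above, install-time attacks enabled by unmediated installation scripts, the hedged sketch below flags packages in a node_modules tree whose package.json declares install-time lifecycle scripts. It is not the tooling developed in this dissertation, just a minimal check under the assumption that the preinstall/install/postinstall hooks are the entries of interest.

"""Minimal sketch (not the dissertation's tooling): flag npm packages that
declare install-time lifecycle scripts, which run automatically during
`npm install` and are one install-time attack surface."""
import json
import pathlib

INSTALL_HOOKS = ("preinstall", "install", "postinstall")

def find_install_scripts(node_modules="node_modules"):
    findings = []
    for manifest in pathlib.Path(node_modules).rglob("package.json"):
        try:
            pkg = json.loads(manifest.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or malformed manifests
        scripts = pkg.get("scripts", {}) or {}
        hooks = {h: scripts[h] for h in INSTALL_HOOKS if h in scripts}
        if hooks:
            findings.append((pkg.get("name", str(manifest)), hooks))
    return findings

if __name__ == "__main__":
    for name, hooks in find_install_scripts():
        for hook, cmd in hooks.items():
            print(f"{name}: {hook} -> {cmd}")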


Past Defense Notices


Levi Goodman

Dual Mode W-Band Radar for Range Finding, Static Clutter Suppression & Moving Target Detection

When & Where:


250 Nichols Hall

Committee Members:

Christopher Allen, Chair
Shannon Blunt
James Stiles


Abstract

Many radar applications today require accurate, real-time, unambiguous measurement of target range and radial velocity.  Obstacles that frequently prevent target detection are the presence of noise and the overwhelming backscatter from other objects, referred to as clutter.

In this thesis, a method of static clutter suppression is proposed to increase detectability of moving targets in high clutter environments.  An experimental dual-purpose, single-mode, monostatic FMCW radar, operating at 108 GHz, is used to map the range of stationary targets and determine range and velocity of moving targets.  By transmitting a triangular waveform, which consists of alternating upchirps and downchirps, the received echo signals can be separated into two complementary data sets, an upchirp data set and a downchirp data set.  In one data set, the return signals from moving targets are spectrally isolated (separated in frequency) from static clutter return signals.  The static clutter signals in that first data set are then used to suppress the static clutter in the second data set, greatly improving detectability of moving targets.  Once the moving target signals are recovered from each data set, they are then used to solve for target range and velocity simultaneously.

The moving target of interest for tests performed was a reusable paintball (reball).  Reball range and velocity were accurately measured at distances up to 5 meters and at speeds greater than 90 m/s (200 mph) with a deceleration of approximately 0.155 m/s/ms (meters per second per millisecond).  Static clutter suppression of up to 25 dB was achieved, while moving target signals only suffered a loss of about 3 dB.
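
For readers unfamiliar with the triangular-waveform processing described above, the standard textbook relations behind the simultaneous range/velocity solution are sketched below in LaTeX; the symbols (chirp slope S in Hz/s, wavelength \lambda, beat frequencies f_{up} and f_{down}, and a target approaching the radar) are generic illustrative assumptions, not the thesis's notation.

% Generic triangular-FMCW relations (illustrative only): the up-chirp and
% down-chirp beat frequencies combine range and Doppler terms, so summing
% and differencing them separates range R from radial velocity v.
\begin{align}
  f_{up}   &= \frac{2 S R}{c} - \frac{2 v}{\lambda}, &
  f_{down} &= \frac{2 S R}{c} + \frac{2 v}{\lambda}, \\
  R &= \frac{c\,(f_{up} + f_{down})}{4 S}, &
  v &= \frac{\lambda\,(f_{down} - f_{up})}{4}.
\end{align}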



Ruoting Zheng

Algorithms for Computing Maximal Consistent Blocks

When & Where:


2001 B Eaton Hall

Committee Members:

Jerzy Grzymala-Busse, Chair
Prasad Kulkarni
Bo Luo


Abstract

Rough set theory is a tool for dealing with uncertain and incomplete data. It has been successfully used in classification, machine learning, and automated knowledge acquisition. A maximal consistent block, defined using rough set theory, is used for rule acquisition.

The maximal consistent block technique is applied to acquire knowledge from incomplete data sets by analyzing the structure of a similarity class.

The main objective of this project is to implement and compare algorithms for computing maximal consistent blocks. The brute force, recursive, and hierarchical methods were designed for data sets with missing attribute values interpreted only as “do not care” conditions. In this project, we extend these algorithms so they can be applied to arbitrary interpretations of missing attribute values, and we introduce an approach for computing maximal consistent blocks on data sets with lost values. In addition, we found that the brute force and recursive methods have problems dealing with data sets for which characteristic sets are not transitive, so the limitations of the algorithms and a simplified recursive method are provided as well.
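
As a point of reference (and not the project's own implementations), the sketch below takes the common pairwise-similarity formulation, in which a maximal consistent block is a maximal set of pairwise-similar cases; the similarity test shown, which treats '*' as a "do not care" value matching anything, is an assumption chosen for illustration.

# Illustrative brute-force sketch (not the project's code): enumerate
# maximal consistent blocks under the assumed pairwise-similarity definition.
from itertools import combinations

def similar(x, y):
    """Two cases are similar if every attribute agrees or is '*' (do not care)."""
    return all(a == b or a == "*" or b == "*" for a, b in zip(x, y))

def maximal_consistent_blocks(cases):
    """Exhaustive search: exponential in the number of cases, so only
    suitable for tiny data sets."""
    n = len(cases)
    blocks = []
    for size in range(n, 0, -1):
        for block in combinations(range(n), size):
            if all(similar(cases[i], cases[j]) for i, j in combinations(block, 2)):
                if not any(set(block) <= set(big) for big in blocks):
                    blocks.append(block)
    return [set(b) for b in blocks]

# Example: three cases over two attributes, '*' meaning "do not care".
print(maximal_consistent_blocks([("1", "a"), ("*", "a"), ("2", "a")]))
# -> [{0, 1}, {1, 2}]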


Hao Xue

Trust and Credibility in Online Social Networks

When & Where:


246 Nichols Hall

Committee Members:

Fengjun Li, Chair
Prasad Kulkarni
Bo Luo
Cuncong Zhong
Mei Liu

Abstract

Increasing portions of people's social and communicative activities now take place in the digital world. The growth and popularity of online social networks (OSNs) have tremendously facilitated online interaction and information exchange. Not only do ordinary users benefit from OSNs, as more people now rely on online information for news, opinions, and social networking, but so do companies and business owners, who use OSNs as platforms for gathering feedback and for marketing activities. As OSNs enable people to communicate more effectively, a large volume of user-generated content (UGC) is produced daily. However, the freedom and ease of publishing information online mean these systems can no longer be regarded as sources of reliable information. Not only does biased and misleading information exist, but financial incentives also drive individual and professional spammers to insert deceptive content and promote harmful information, which jeopardizes the ecosystems of OSNs.

In this dissertation, we present our work on measuring the credibility of information and detecting content polluters in OSNs. First, assuming that review spammers spend less effort on maintaining social connections, we propose to use social relationships and rating deviations to assist in computing the trustworthiness of users. Compared to numeric ratings, textual content contains richer information about a user's actual opinion toward a target. Thus, we propose a content-based trust propagation framework that extracts the opinions expressed in review content. In addition, we discover that the network surrounding a user can also provide valuable information about the user. Lastly, we study the problem of detecting social bots by utilizing the characteristics of surrounding neighborhood networks.


Casey Sader

Taming WOLF: Building a More Functional and User-Friendly Framework

When & Where:


2001 B Eaton Hall

Committee Members:

Michael Branicky, Chair
Bo Luo
Suzanne Shontz


Abstract

Machine learning is all about automation. Many tools have been created to help data scientists automate repeated tasks and train models. These tools require varying levels of user experience to be used effectively. The "machine learning WOrk fLow management Framework" (WOLF) aims to automate the machine learning pipeline. One of its key uses is to discover which machine learning model and hyper-parameters are the best configuration for a dataset. In this project, features were explored that could be added to make WOLF behave as a full pipeline and be helpful to novice and experienced data scientists alike. One feature that makes WOLF more accessible is a website version that can be accessed from anywhere and makes using WOLF much more intuitive. To keep WOLF aligned with the most recent trends and models, the ability to train a neural network using the TensorFlow framework and the Keras library was added. This project also introduced the ability to pickle and save trained models; saving models in turn makes it possible to use them within the WOLF framework to make predictions on another collection of data. Understanding how a model makes its predictions is a beneficial component of machine learning, and this project aids that understanding by calculating and reporting the relative importance of the dataset features for the given model. Incorporating all these additions makes WOLF a more functional and user-friendly framework for machine learning tasks.
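
WOLF's own additions are not shown here; as a generic illustration of the two ideas (saving a trained model with pickle and reporting relative feature importance), the hedged sketch below uses a scikit-learn random forest, which is an assumption about the model type rather than a description of WOLF's internals.

# Generic illustration (not WOLF's code): train a model, pickle it for
# later predictions, and report relative feature importances.
import pickle
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
feature_names = load_iris().feature_names

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Persist the trained model so it can be reloaded to predict on new data.
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
with open("model.pkl", "rb") as f:
    reloaded = pickle.load(f)
print(reloaded.predict(X[:3]))

# Relative importance of each dataset feature for this model.
for name, score in sorted(zip(feature_names, model.feature_importances_),
                          key=lambda p: p[1], reverse=True):
    print(f"{name}: {score:.3f}")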



Charles Mohr

Multi-Objective Optimization of FM Noise Waveforms via Generalized Frequency Template Error Metrics

When & Where:


129 Nichols Hall

Committee Members:

Shannon Blunt, Chair
Christopher Allen
James Stiles


Abstract

FM noise waveforms have been experimentally demonstrated to achieve high time-bandwidth products and low autocorrelation sidelobes while achieving acceptable spectral containment in physical implementation. Still, it may be necessary to further reduce sidelobe levels for detection or to improve spectral containment in the face of growing spectral use. The Frequency Template Error (FTE) and Logarithmic Frequency Template Error (Log-FTE) metrics were conceived as means to achieve FM noise waveforms with good spectral containment and good autocorrelation sidelobes. In practice, FTE-based waveform optimizations have been found to produce better autocorrelation responses at the expense of spectral containment, while Log-FTE optimizations achieve excellent spectral containment and interference rejection at the expense of autocorrelation sidelobe levels. In this work, the FTE and Log-FTE metrics are considered as subsets of a broader class of frequency-domain metrics collectively termed the Generalized Frequency Template Error (GFTE). In doing so, many different P-norm based variations of the FTE and Log-FTE cost functions are extensively examined and applied via gradient descent methods to optimize polyphase-coded FM (PCFM) waveforms. The performance of the different P-norm variations of the FTE and Log-FTE cost functions is compared against each other and relative to a previous FM noise waveform design approach called Pseudo-Random Optimized FM (PRO-FM). They are evaluated in terms of autocorrelation sidelobes, spectral containment, and the ability to realize spectral notches within the 3 dB bandwidth for the purpose of interference rejection. These comparisons are performed both in simulation and experimentally in loopback, where it was found that a P-norm value of 2 tends to provide the best optimization performance for both the FTE and Log-FTE optimizations, except in the case of Log-FTE optimization of a notched spectral template, where a P-norm value of 3 provides the best results. In general, the FTE and Log-FTE cost functions, as subsets of the GFTE, provide diverse means to optimize physically robust FM noise waveforms while emphasizing different performance criteria in terms of autocorrelation sidelobes, spectral containment, and interference rejection.
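
To make the P-norm framing concrete, one plausible generic form of the GFTE-style costs (an assumption for illustration, not necessarily this work's exact definitions) compares the waveform's power spectrum |S(f)|^2 to a spectral template T(f) under a p-norm, with the Log-FTE applying the same norm on a logarithmic scale:

% Assumed generic p-norm template-error costs (illustrative only):
J_{\mathrm{FTE}}^{(p)} = \left( \int \bigl|\, |S(f)|^{2} - T(f) \,\bigr|^{p} \, df \right)^{1/p},
\qquad
J_{\mathrm{LogFTE}}^{(p)} = \left( \int \bigl|\, \log |S(f)|^{2} - \log T(f) \,\bigr|^{p} \, df \right)^{1/p}.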


Rui Cao

How Good Are Probabilistic Approximations for Rule Induction from Data with Missing Attribute Values

When & Where:


246 Nichols Hall

Committee Members:

Jerzy Grzymala-Busse, Chair
Guanghui Wang
Cuncong Zhong


Abstract

In data mining, decision rules induced from known examples are used to classify unseen cases. There are various rule induction algorithms, such as LEM1 (Learning from Examples Module version 1), LEM2 (Learning from Examples Module version 2), and MLEM2 (Modified Learning from Examples Module version 2). In the real world, many data sets are imperfect and may be incomplete. The idea of the probabilistic approximation has been used for many years in variable precision rough set models and similar approaches to uncertainty. The objective of this project is to test whether proper probabilistic approximations are better than concept lower and upper approximations. In this project, experiments were conducted on six incomplete data sets with lost values. We implemented the local probabilistic version of the MLEM2 algorithm to induce certain and possible rules from incomplete data sets. A program called Rule Checker was also developed to classify unseen cases with the induced rules and measure the classification error rate. Hold-out validation was carried out, and the error rate was used as the criterion for comparison.
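
For context, a probabilistic approximation of a concept X is commonly defined in the rough-set literature roughly as follows (standard notation, not a quote from this project), where U is the set of all cases, K(x) is the characteristic set of case x, and \Pr(X \mid K(x)) = |X \cap K(x)| / |K(x)|:

% Commonly used definition (illustrative notation):
\mathrm{appr}_{\alpha}(X) = \bigcup \{\, K(x) \mid x \in U,\ \Pr(X \mid K(x)) \ge \alpha \,\}.

Under this definition, \alpha = 1 recovers the concept lower approximation and any sufficiently small positive \alpha recovers the upper approximation, so the "proper" probabilistic approximations compared in this project are those with \alpha strictly between these extremes.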


Lokesh Kaki

An Automatic Image Stitching Software with Customizable Parameters and a Graphical User Interface

When & Where:


2001 B Eaton Hall

Committee Members:

Richard Wang, Chair
Esam El-Araby
Jerzy Grzymala-Busse


Abstract

Image stitching is one of the most widely used computer vision algorithms, with a broad range of applications such as image stabilization, high-resolution photomosaics, object insertion, 3D image reconstruction, and satellite imaging. Extracting image features from each input image, determining the image matches, and then estimating the homography for each matched image is the core procedure in most feature-based image stitching techniques. In recent years, several state-of-the-art techniques like the scale-invariant feature transform (SIFT), random sample consensus (RANSAC), and the direct linear transformation (DLT) have been proposed for feature detection, extraction, matching, and homography estimation. However, using these algorithms with fixed parameters does not usually work well for creating seamless, natural-looking panoramas. The set of parameter values that works best for specific images may not work equally well for another set of images taken by a different camera or under varied conditions. Hence, parameter tuning is as important as choosing the right set of algorithms for the efficient performance of any image stitching algorithm.

In this project, a graphical user interface is designed and programmed to tune a total of 32 parameters, including basic ones such as straightening, cropping, setting the maximum output image size, and setting the focal length. It also contains several advanced parameters, such as the number of RANSAC iterations, the RANSAC inlier threshold, the extrema threshold, the Gaussian window size, etc. The image stitching algorithm used in this project comprises SIFT, DLT, RANSAC, warping, straightening, bundle adjustment, and blending techniques. Once the given images are stitched together, the output image can be further analyzed inside the user interface by clicking on any particular point. The interface then returns the corresponding input image that contributed to the selected point, along with its GPS coordinates, altitude, and camera focal length taken from its metadata. The developed software has been successfully tested on diverse datasets, and the customized parameters with corresponding results, as well as timer logs, are tabulated in this report. The software is built for both Windows and Linux operating systems as part of this project.
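
The software described above is not reproduced here, but the core SIFT + RANSAC + DLT homography pipeline it builds on can be sketched with OpenCV as below; the parameter values shown (match ratio, RANSAC threshold) are generic illustrative defaults, not the tuned values exposed in the project's interface, and no blending or bundle adjustment is included.

# Illustrative two-image stitching core (SIFT features, ratio-test matching,
# RANSAC homography, perspective warp). Not the project's software.
import cv2
import numpy as np

def stitch_pair(img1, img2, ratio=0.75, ransac_thresh=4.0):
    sift = cv2.SIFT_create()
    k1, d1 = sift.detectAndCompute(img1, None)
    k2, d2 = sift.detectAndCompute(img2, None)

    # Lowe's ratio test on k-nearest-neighbour matches.
    matcher = cv2.BFMatcher()
    good = [m for m, n in matcher.knnMatch(d1, d2, k=2)
            if m.distance < ratio * n.distance]

    src = np.float32([k1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    # Homography estimated with RANSAC (DLT is applied to the inlier set internally).
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, ransac_thresh)

    # Warp img1 into img2's frame and paste img2 on top (no blending here).
    h1, w1 = img1.shape[:2]
    h2, w2 = img2.shape[:2]
    pano = cv2.warpPerspective(img1, H, (w1 + w2, max(h1, h2)))
    pano[:h2, :w2] = img2
    return pano

if __name__ == "__main__":
    out = stitch_pair(cv2.imread("left.jpg"), cv2.imread("right.jpg"))
    cv2.imwrite("pano.jpg", out)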



Mohammad Isyroqi Fathan

Comparative Study on Polyp Localization and Classification on Colonoscopy Video

When & Where:


250 Nichols Hall

Committee Members:

Guanghui Wang, Chair
Bo Luo
James Miller


Abstract

Colorectal cancer is one of the most common types of cancer, with a high mortality rate. It typically develops from small clumps of benign cells called polyps. An adenomatous polyp has a higher chance of developing into cancer than a hyperplastic polyp. Colonoscopy is the preferred procedure for colorectal cancer screening and for minimizing risk, since a biopsy can be performed on any polyps that are found. Thus, a good polyp detection model can assist physicians and increase the effectiveness of colonoscopy. Several models using handcrafted features and deep learning approaches have been proposed for the polyp detection task.

In this study, we compare the performance of previous state-of-the-art general object detection models for polyp detection and classification (into adenomatous and hyperplastic classes). Specifically, we compare the performance of Faster R-CNN, SSD, YOLOv3, RefineDet, RetinaNet, and Faster R-CNN with a DetNet backbone. This comparative study serves as an initial analysis of the effectiveness of these models and helps us choose a base model to improve further for polyp detection.


Lei Wang

I Know What You Type on Your Phone: Keystroke Inference on Android Device Using Deep Learning

When & Where:


246 Nichols Hall

Committee Members:

Bo Luo, Chair
Fengjun Li
Guanghui Wang


Abstract

Given a list of smartphone sensor readings, such as the accelerometer, gyroscope, and light sensor, is there enough information present to predict a user's input without access to either the raw text or a keyboard log? The increasing use of smartphones as personal devices to access sensitive information on the go has put user privacy at risk. As technology advances rapidly, smartphones are now equipped with multiple sensors that measure user motion, temperature, and brightness in order to provide constant feedback to applications, such as accurate and current weather forecasts, GPS information, and so on. In the Android ecosystem, sensor readings can be accessed without user permissions, which makes Android devices vulnerable to various side-channel attacks.

In this thesis, we first create a native Android app to collect approximately 20,700 keypresses from 30 volunteers. The text used for data collection is carefully selected based on a bigram analysis we ran on over 1.3 million tweets. We then present two approaches (single key press and bigram) for feature extraction; these features are constructed from accelerometer, gyroscope, and light sensor readings. A deep neural network with four hidden layers, trained with categorical cross entropy, is proposed as the baseline for this work and achieves an accuracy of 47%. A multi-view model is then proposed in later work; multiple views are extracted, and the performance of each combination of views is compared for analysis.
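
The exact architecture and features are described in the thesis; as a hedged point of reference for what a four-hidden-layer baseline trained with categorical cross entropy might look like in Keras, consider the sketch below, in which the input dimensionality, layer widths, and 26-key output are illustrative assumptions rather than the thesis's configuration.

# Illustrative baseline only: a fully connected network with four hidden
# layers trained with categorical cross-entropy. The feature-vector size,
# layer widths, and 26-class output are assumptions for the sketch.
import tensorflow as tf

NUM_FEATURES = 120   # assumed length of the accelerometer/gyroscope/light feature vector
NUM_KEYS = 26        # assumed number of key classes

model = tf.keras.Sequential([
    tf.keras.Input(shape=(NUM_FEATURES,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(NUM_KEYS, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# model.fit(X_train, y_train_onehot, validation_split=0.1, epochs=50)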


Wenchi Ma

Deep Neural Network based Object Detection and Regularization in Deep Learning

When & Where:


246 Nichols Hall

Committee Members:

Richard Wang, Chair
Arvin Agah
Bo Luo
Heechul Yun
Haiyang Chao

Abstract

The abilities of feature learning, scene understanding, and task generalization are consistent pursuits in deep learning-based computer vision. A number of object detectors with various network structures and algorithms have been proposed to learn more effective features, to extract more contextual and semantic information, and to achieve more robust and more accurate performance on different datasets. Nevertheless, the problem is still not well addressed in practical applications. One major issue lies in inefficient feature learning and propagation in challenging situations such as small objects, occlusion, and poor illumination. Another big issue is poor generalization across datasets with different feature distributions.

This study aims to explore different learning frameworks and strategies to address these issues. (1) We propose a new model that makes full use of features ranging from fine details to semantic ones for better detection of small and occluded objects. The proposed model places more emphasis on the effectiveness of semantic and contextual information from features produced in high-level layers. (2) To achieve more efficient learning, we propose a near-orthogonality regularization, which takes neuron redundancy into consideration, to generate better deep learning models. (3) We are currently working on tightening object localization by integrating a localization score into non-maximum suppression (NMS) to achieve more accurate detection results, and on domain-adaptive learning that encourages models to generalize better when transferring across domains.
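
As a rough illustration of point (2), a common way to express a (near-)orthogonality penalty on a weight matrix is the Frobenius-norm distance between W Wᵀ and the identity; the sketch below shows that generic form in PyTorch and is an assumption about the flavor of regularizer, not the specific formulation proposed in this dissertation.

# Generic (near-)orthogonality penalty: encourage the rows of a weight
# matrix to be mutually orthogonal by penalizing || W W^T - I ||_F^2.
# Common illustrative form, not necessarily the dissertation's regularizer.
import torch

def orthogonality_penalty(weight: torch.Tensor) -> torch.Tensor:
    w = weight.reshape(weight.size(0), -1)           # flatten conv kernels to rows
    gram = w @ w.t()                                 # pairwise row inner products
    eye = torch.eye(gram.size(0), device=w.device)
    return ((gram - eye) ** 2).sum()

# Usage: add the penalty, scaled by a small coefficient, to the task loss.
# loss = task_loss + 1e-4 * sum(orthogonality_penalty(p)
#                               for n, p in model.named_parameters()
#                               if p.dim() > 1 and "weight" in n)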