Defense Notices


All students and faculty are welcome to attend the final defense of EECS graduate students completing their M.S. or Ph.D. degrees. Defense notices for M.S./Ph.D. presentations for this year and several previous years are listed below in reverse chronological order.

Students who are nearing the completion of their M.S./Ph.D. research should schedule their final defenses through the EECS graduate office at least THREE WEEKS PRIOR to their presentation date so that there is time to complete the degree requirements check and post the presentation announcement online.

Upcoming Defense Notices

Andrew Riachi

An Investigation Into The Memory Consumption of Web Browsers and A Memory Profiling Tool Using Linux Smaps

When & Where:


Nichols Hall, Room 250 (Gemini Conference Room)

Committee Members:

Prasad Kulkarni, Chair
Perry Alexander
Drew Davidson
Heechul Yun

Abstract

Web browsers are notorious for consuming large amounts of memory. Yet, they have become the dominant framework for writing GUIs because web languages are ergonomic for programmers and offer cross-platform reach. These benefits are so enticing that even a large portion of mobile apps, which must run on resource-constrained devices, run a web browser under the hood. Therefore, it is important to keep the memory consumption of web browsers as low as practicable.

In this thesis, we investigate the memory consumption of web browsers, in particular as compared to applications written in native GUI frameworks. We introduce smaps-profiler, a tool to profile the overall memory consumption of Linux applications that can report memory usage other profilers simply do not measure. Using this tool, we conduct experiments which suggest that most of the extra memory usage compared to native applications could be due to the size of the web browser program itself. We discuss our experiments and findings, and conclude that even more rigorous studies are needed to profile GUI applications.
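
As a flavor of what the underlying interface exposes, the sketch below sums the proportional set size (Pss) reported in /proc/<pid>/smaps; it is a minimal illustration of the Linux smaps format, not the author's smaps-profiler tool.

```python
import re
import sys

def pss_total_kb(pid):
    """Sum the Pss (proportional set size) fields across all mappings
    listed in /proc/<pid>/smaps, in kilobytes."""
    total = 0
    with open(f"/proc/{pid}/smaps") as f:
        for line in f:
            match = re.match(r"Pss:\s+(\d+)\s+kB", line)
            if match:
                total += int(match.group(1))
    return total

if __name__ == "__main__":
    # Pass a PID on the command line, or default to this script itself.
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    print(f"Total Pss for {pid}: {pss_total_kb(pid)} kB")
```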


Past Defense Notices

XIAOMENG SU

A Comparison of the Quality of Rule Induction from Inconsistent Data Sets and Incomplete Data Sets

When & Where:


246 Nichols Hall

Committee Members:

Jerzy Grzymala-Busse, Chair
Prasad Kulkarni
Zongbo Wang


Abstract

In data mining, decision rules induced from known examples are used to classify unseen cases. There are various rule induction algorithms, such as LEM1 (Learning from Examples Module version 1), LEM2 (Learning from Examples Module version 2) and MLEM2 (Modified Learning from Examples Module version 2). In the real world, many data sets are imperfect, either inconsistent or incomplete. The idea of lower and upper approximations, or more generally, the probabilistic approximation, provides an effective way to induce rules from inconsistent and incomplete data sets. But the accuracy of rule sets induced from imperfect data sets is expected to be lower. The objective of this project is to investigate which kind of imperfect data set (inconsistent or incomplete) is worse in terms of the quality of the induced rule set. In this project, experiments were conducted on eight inconsistent data sets and eight incomplete data sets with lost values. We implemented the MLEM2 algorithm to induce certain and possible rules from inconsistent data sets, and implemented the local probabilistic version of MLEM2 to induce certain and possible rules from incomplete data sets. A program called Rule Checker was also developed to classify unseen cases with the induced rules and measure the classification error rate. Ten-fold cross validation was carried out and the average error rate was used as the criterion for comparison. The Mann-Whitney nonparametric test was performed to compare, separately for certain and possible rules, incompleteness with inconsistency. The results show that there is no significant difference between inconsistent and incomplete data sets in terms of the quality of rule induction.
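
The final comparison step can be sketched with SciPy's implementation of the Mann-Whitney U test; the error rates below are hypothetical placeholders, not the project's measured results.

```python
from scipy.stats import mannwhitneyu

# Hypothetical average error rates from ten-fold cross validation,
# one value per data set (the project used eight of each kind).
inconsistent_err = [0.21, 0.34, 0.18, 0.27, 0.31, 0.22, 0.29, 0.25]
incomplete_err   = [0.24, 0.30, 0.20, 0.28, 0.33, 0.26, 0.27, 0.23]

# Two-sided Mann-Whitney U test: is one kind of imperfection worse?
stat, p = mannwhitneyu(inconsistent_err, incomplete_err, alternative="two-sided")
print(f"U = {stat}, p = {p:.3f}")  # a large p-value means no difference was detected
```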


SIVA PRAMOD BOBBILI

Static Disassembly of Binary using Symbol Table Information

When & Where:


250 Nichols Hall

Committee Members:

Prasad Kulkarni, Chair
Andy Gill
Jerzy Grzymala-Busse


Abstract

Static binary analysis is an important challenge with many applications in security and performance optimization. One of the main difficulties in analyzing an executable file statically is discovering all the instructions in the binary. It is often hard to discover all program instructions due to a well-known limitation of static binary analysis called the code discovery problem. Some of the main contributors to the code discovery problem are variable-length CISC instructions, data interspersed with code, padding bytes for branch target alignment, and indirect jumps. All of these problems manifest themselves in x86 binary files, which is unfortunate since x86 is the most popular architecture in the desktop and server domains.
Although much recent research has suggested that the symbol table might help overcome the difficulties of code discovery, the extent to which it can actually help is still in question. This work focuses on assessing the benefit of using symbol table information to overcome the limitations of the code discovery problem and identify more, or all, instructions in x86 binary executable files. We will discuss the details, extent, limitations, and impact of instruction discovery with and without symbol table information in this work.
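
As an illustration of the kind of information a symbol table provides, the sketch below uses the pyelftools library to list function symbols in an ELF binary, each of which gives a known-good starting address for disassembly; it is a minimal example of reading .symtab, not the tool built in this work.

```python
from elftools.elf.elffile import ELFFile

def function_symbols(path):
    """Yield (name, start_address, size) for every function symbol
    found in the binary's .symtab section."""
    with open(path, "rb") as f:
        elf = ELFFile(f)
        symtab = elf.get_section_by_name(".symtab")
        if symtab is None:
            return  # stripped binary: no symbol table to lean on
        for sym in symtab.iter_symbols():
            if sym["st_info"]["type"] == "STT_FUNC" and sym["st_size"] > 0:
                yield sym.name, sym["st_value"], sym["st_size"]

# Each address/size pair bounds a region known to contain code.
for name, addr, size in function_symbols("/usr/bin/python3"):
    print(f"{name}: 0x{addr:x} ({size} bytes)")
```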


JONATHAN LUTES

SafeExit: Exit Node Protection for TOR

When & Where:


2001B Eaton Hall

Committee Members:

Bo Luo, Chair
Arvin Agah
Prasad Kulkarni


Abstract

TOR is one of the most important networks for providing anonymity over the internet. However, in some cases its exit node operators open themselves up to various legal challenges, a fact which discourages participation in the network. In this paper, we propose a mechanism for allowing some users to be voluntarily verified by trusted third parties, providing a means by which an exit node can demonstrate that it is not the true source of traffic. This is done by extending TOR's anonymity model to include another class of user, and using a web of trust mechanism to create chains of trust.
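
A minimal sketch of the chain-of-trust idea follows; the key format, data layout, and use of Ed25519 signatures are illustrative assumptions, not SafeExit's actual protocol.

```python
# Illustrative chain of trust: a trusted root signs an intermediary's key,
# the intermediary signs the user's key, and anyone holding the root's
# public key can verify the whole chain.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import ed25519

def raw(public_key):
    """Raw 32-byte encoding of an Ed25519 public key."""
    return public_key.public_bytes(serialization.Encoding.Raw,
                                   serialization.PublicFormat.Raw)

def verify_chain(root_public, chain):
    """chain is a list of (subject_key_bytes, signature) links; each
    signature over a subject's key was made by the previous subject
    (the root signs the first link)."""
    signer = root_public
    for subject_bytes, signature in chain:
        try:
            signer.verify(signature, subject_bytes)
        except InvalidSignature:
            return False
        signer = ed25519.Ed25519PublicKey.from_public_bytes(subject_bytes)
    return True

# Demo: root vouches for an intermediary, which vouches for a user.
root, mid, user = (ed25519.Ed25519PrivateKey.generate() for _ in range(3))
chain = [(raw(mid.public_key()), root.sign(raw(mid.public_key()))),
         (raw(user.public_key()), mid.sign(raw(user.public_key())))]
print(verify_chain(root.public_key(), chain))  # True
```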


KAVYASHREE PILAR

Digital Down Conversion and Compression of Radar Data

When & Where:


317 Nichols Hall

Committee Members:

Carl Leuschen, Chair
Shannon Blunt
Glenn Prescott


Abstract

Storage and handling of the huge number of received data samples is one of the major challenges in radar system design. Radar data samples potentially have high temporal and spatial correlation, depending on the target characteristics and radar settings. This correlation can be exploited to compress them without any loss in sensitivity in post-processed products. This project focuses on reducing the storage requirement of a radar used for remote sensing of ice sheets. At the front end of the radar receiver, the data sample rate can be reduced in real time by performing frequency down-conversion and decimation of the incoming data. The decimated signal can be further compressed by applying a suitable data compression algorithm. The project implements a digital down-converter, decimator, and data compression module on an FPGA. A literature survey suggests that considerable research has gone into developing customized radar data compression algorithms. This project analyzes the possibility of using general-purpose algorithms such as GZIP and lossless JPEG 2000 to compress radar data. It also considers a simple floating-point compression technique that converts 16-bit data to 8-bit data, guaranteeing a 50% reduction in data size. The project implements the 16-to-8-bit conversion, lossless JPEG 2000, and GZIP algorithms in Matlab and compares their SNR performance on radar data. Simulations suggest that all of them have similar SNR performance, but the JPEG 2000 and GZIP algorithms offer a compression ratio of over 90%. However, the 16-to-8-bit compression is the one implemented in this project because of its simplicity.
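
For intuition, the sketch below strings the same stages together in NumPy/SciPy: mix to baseband, filter and decimate, then truncate 16-bit samples to 8 bits. All parameter values are illustrative assumptions, and the truncation is a naive stand-in for the project's floating-point 16-to-8-bit scheme.

```python
import numpy as np
from scipy import signal

fs = 100e6   # sample rate (assumed)
f_c = 25e6   # carrier frequency to shift to baseband (assumed)
q = 10       # decimation factor (assumed)

t = np.arange(200_000) / fs
rx = np.cos(2 * np.pi * f_c * t)            # stand-in for received samples

# Digital down-conversion: mix with a complex exponential, then let the
# decimator's built-in anti-aliasing FIR filter reject the image before
# downsampling by q.
baseband = rx * np.exp(-2j * np.pi * f_c * t)
decimated = signal.decimate(baseband, q, ftype="fir")

# Naive 16-to-8-bit compression: keep the top 8 bits of each sample,
# trading quantization noise for a fixed 50% size reduction.
samples16 = np.int16(np.real(decimated) / np.max(np.abs(decimated)) * 32767)
samples8 = (samples16 >> 8).astype(np.int8)
```
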
A hardware test bed is implemented to integrate the digital radar electronics with the Matlab Simulink simulation tools in a hardware-in-the-loop (HIL) configuration. The digital down-converter, decimator, and data compression module are prototyped in Simulink. The design is later implemented on the FPGA in Verilog. The functionality is tested at various stages of development using ModelSim simulations, Altera DSP Builder's HDL import, HIL co-simulation, and SignalTap. This test bed can also be used for future development efforts.


SURYA TEJ NIMMAKAYALA

Exploring Causes of Performance Overhead during Dynamic Binary Translation

When & Where:


250 Nichols Hall

Committee Members:

Prasad Kulkarni, Chair
Fengjun Li
Bo Luo


Abstract

Dynamic Binary Translators (DBTs) have applications in program portability, instrumentation, optimization, and software security. To achieve these goals and maintain control over the application's execution, DBTs translate and run the original source/guest programs in a sandboxed environment. DBT systems apply several optimization techniques, such as code caching and trace creation, to reduce the translation overhead and enhance program performance at run-time. However, even with these optimizations, DBTs typically impose a significant performance overhead, especially for short-running applications. This performance penalty has restricted the more widespread adoption of DBT technology, in spite of the obvious need for it.

The goal of this work is to determine the different factors that contribute to the performance penalty imposed by dynamic binary translators. In this thesis, we describe the experiments that we designed to achieve our goal and report our results and observations. We use a popular and sophisticated DBT, DynamoRIO, as our test platform, and employ the industry-standard SPEC CPU2006 benchmarks to capture run-time statistics. Our experiments find that DynamoRIO executes a large number of additional instructions compared to native application execution. We further measure that this increase in the number of executed instructions is caused by the DBT frequently exiting the code cache to perform various management tasks at run-time, including code translation, indirect branch resolution, and trace formation. We also find that the performance loss experienced by the DBT is directly proportional to the number of code cache exits. We will discuss the details of the experiments, results, observations, and analysis in this work.
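
A rough way to reproduce this style of measurement is sketched below using Linux perf and DynamoRIO's drrun launcher; the harness and the stand-in workload are assumptions for illustration, not the thesis's actual experimental setup.

```python
import subprocess

def instruction_count(cmd):
    """Run cmd under `perf stat` and return the retired-instruction count."""
    result = subprocess.run(
        ["perf", "stat", "-x,", "-e", "instructions"] + cmd,
        capture_output=True, text=True,
    )
    # With -x, (CSV mode) the count is the first field of the stderr line.
    for line in result.stderr.splitlines():
        if "instructions" in line:
            return int(line.split(",")[0])
    raise RuntimeError("perf output not recognized")

app = ["/bin/ls", "-lR", "/usr"]           # stand-in for a SPEC benchmark
native = instruction_count(app)
under_dbt = instruction_count(["drrun", "--"] + app)
print(f"native: {native}, under DynamoRIO: {under_dbt}, "
      f"overhead: {under_dbt / native:.2f}x")
```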


XUN WU

A Global Discretization Approach to Handle Numerical Attributes as Preprocessing

When & Where:


2001B Eaton Hall

Committee Members:

Jerzy Grzymala-Busse, Chair
Prasad Kulkarni
Heechul Yun


Abstract

Discretization is a common technique for handling numerical attributes in data mining; it divides continuous values into several intervals by defining multiple thresholds. Decision tree learning algorithms, such as C4.5 and random forests, deal with numerical attributes by applying a discretization technique and transforming them into nominal attributes based on an impurity-based criterion, such as information gain or Gini gain. However, a considerable number of distinct values end up in the same interval after discretization, so information carried by the original continuous values is lost.
In this thesis, we propose a global discretization method that keeps the information within the original numerical attributes by expanding them into multiple nominal ones, one for each candidate cut-point value. The discretized data set, which includes only nominal attributes, is derived from the original data set. We analyzed the problem by applying two decision tree learning algorithms, C4.5 and random forests, to each of twelve pairs of data sets (original and discretized) and evaluating the performance (prediction accuracy) of the resulting classification models in the Weka Experimenter. This was followed by two separate Wilcoxon tests (one per learning algorithm) to decide whether there is a statistically significant difference between the paired data sets. The results of both tests indicate that there is no clear difference in performance between the discretized data sets and the original ones.
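
The expansion step can be sketched in a few lines of pandas; this is an illustrative reading of the cut-point idea, not the thesis's exact implementation.

```python
import pandas as pd

def expand_numeric(df, column):
    """Replace a numerical column with one binary nominal column per
    candidate cut-point (midpoints between consecutive distinct values)."""
    values = sorted(df[column].unique())
    cutpoints = [(a + b) / 2 for a, b in zip(values, values[1:])]
    out = df.drop(columns=[column])
    for c in cutpoints:
        out[f"{column}<={c}"] = (df[column] <= c).map({True: "yes", False: "no"})
    return out

# Toy data set: every cut-point of "temperature" becomes its own attribute.
df = pd.DataFrame({"temperature": [64, 68, 71, 75],
                   "play": ["no", "yes", "yes", "no"]})
print(expand_numeric(df, "temperature"))
```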


YUFEI CHENG

Future Internet Routing Design for Massive Failures and Attacks

When & Where:


246 Nichols Hall

Committee Members:

James Sterbenz, Chair
Victor Frost
Fengjun Li
Gary Minden
Michael Vitevitch

Abstract

With the increasing frequency of natural disasters and intentional attacks that challenge optical networks, vulnerability to cascading and regionally correlated challenges is escalating. Given the high complexity and large traffic load of optical networks, these correlated challenges do great damage to reliable network communication. We start our research by proposing a critical region identification mechanism and studying different vulnerability scales using real-world physical network topologies. We further propose geographical diversity and incorporate it into a new graph resilience metric, cTGGD (compensated Total Geographical Graph Diversity), which can characterize and differentiate the resilience levels of different physical networks. We formulate the path geodiverse problem (PGD) and propose two heuristics that solve it with lower complexity than the optimal algorithm. The geodiverse paths are optimized with a delay-skew optimization formulation for optimal traffic allocation. We implement GeoDivRP in ns-3 to employ the optimized paths and demonstrate their effectiveness compared to OSPF equal-cost multi-path (ECMP) routing in terms of both throughput and overall link utilization. From the attacker's perspective, we analyze the mechanisms attackers could use to maximize attack impact with a limited budget, and we demonstrate the effectiveness of different network restoration plans.
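
To make the geodiversity notion concrete, the sketch below picks, from a set of candidate paths, the pair whose intermediate nodes are geographically farthest apart; it is a toy illustration using NetworkX, not GeoDivRP's algorithm or the PGD heuristics.

```python
import itertools
import math
import networkx as nx

def path_separation(G, path_a, path_b):
    """Minimum pairwise distance between intermediate nodes of two paths
    (nodes are assumed to carry a 'pos' attribute with coordinates)."""
    inner_a = path_a[1:-1] or path_a
    inner_b = path_b[1:-1] or path_b
    return min(math.dist(G.nodes[u]["pos"], G.nodes[v]["pos"])
               for u in inner_a for v in inner_b)

def most_geodiverse_pair(G, src, dst, k=10):
    """Among the k shortest simple paths, return the most separated pair."""
    candidates = list(itertools.islice(nx.shortest_simple_paths(G, src, dst), k))
    return max(itertools.combinations(candidates, 2),
               key=lambda pair: path_separation(G, *pair))

# Toy four-node topology: two disjoint routes from a to d.
G = nx.Graph()
for n, pos in {"a": (0, 0), "b": (0, 1), "c": (0, -1), "d": (1, 0)}.items():
    G.add_node(n, pos=pos)
G.add_edges_from([("a", "b"), ("b", "d"), ("a", "c"), ("c", "d")])
print(most_geodiverse_pair(G, "a", "d", k=4))
```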


DARSHAN RAMESH

Configuration of Routing Protocols on Routers using Quagga

When & Where:


246 Nichols Hall

Committee Members:

Joseph Evans, Chair
Victor Frost
Glenn Prescott


Abstract

With the increasing number of devices being connected to the network, connecting those devices efficiently is very important, and routing protocols have evolved over time to meet this need. I have used Mininet and Quagga to implement routing protocols in a topology with ten routers and eleven host machines. Initially, the routers are given a basic configuration: their interfaces are brought administratively up and IP addresses are assigned. Static routes are configured on the routers using the Quagga zebra daemon. Given the overhead observed, static routing is replaced with RIPv2 using the Quagga ripd daemon, and RIPv2 features such as MD5 authentication and split horizon are implemented. RIPv2 is then replaced with the OSPF routing protocol, and the differences between static and dynamic routing are observed. More complex OSPF configurations are implemented using the Quagga ospfd daemon, and the best route to a neighboring router is changed using the OSPF cost attribute. Next, the networks in the lab are assumed to belong to different autonomous systems and BGP is implemented using the Quagga bgpd daemon. Routing updates are filtered using access lists, and the path to a neighboring router is changed using BGP attributes such as MED, weight, AS_PATH, and local_pref. Load balancing is also implemented, and the results are verified using traceroute and the routing tables.
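
A minimal Mininet sketch of this style of setup appears below; the two-router topology is a hypothetical miniature of the project's ten-router network, and starting the Quagga daemons (zebra plus ripd/ospfd/bgpd) on each router node is elided.

```python
from mininet.net import Mininet
from mininet.topo import Topo
from mininet.cli import CLI

class TwoRouterTopo(Topo):
    def build(self):
        h1 = self.addHost("h1", ip="10.0.1.10/24")
        h2 = self.addHost("h2", ip="10.0.2.10/24")
        # Routers are plain hosts with IP forwarding enabled; in the
        # project, Quagga daemons would run on them for RIPv2/OSPF/BGP.
        r1 = self.addHost("r1")
        r2 = self.addHost("r2")
        self.addLink(h1, r1)
        self.addLink(r1, r2)
        self.addLink(r2, h2)

if __name__ == "__main__":
    net = Mininet(topo=TwoRouterTopo())
    net.start()
    for r in ("r1", "r2"):
        net.get(r).cmd("sysctl -w net.ipv4.ip_forward=1")
    CLI(net)   # inspect routes interactively, e.g. `r1 ip route`
    net.stop()
```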


RUXIN XIE

Single-Fiber-Laser-Based Multimodal Coherent Raman System

When & Where:


250 Nichols Hall

Committee Members:

Ron Hui, Chair
Chris Allen
Shannon Blunt
Victor Frost
Carey Johnson

Abstract

A single-fiber-laser-based coherent Raman scattering (CRS) spectroscopy and microscopy system can automatically maintain frequency synchronization between the pump and Stokes beams, which dramatically simplifies the setup. The Stokes frequency shift is generated by soliton self-frequency shift (SSFS) in a photonic crystal fiber. The impact of pulse chirping on the signal power reduction of coherent anti-Stokes Raman scattering (CARS) and stimulated Raman scattering (SRS) has been investigated through theoretical analysis and experiment, and strategies for system optimization are discussed.
Our multimodal system provides measurement diversity among CARS, SRS, and photothermal imaging, which allows comparison among the modes and offers complementary information. The distributions of hemoglobin in human red blood cells and of lipids in sliced mouse brain samples have been imaged, and the frequency and power dependence of the photothermal signal is characterized.
Instead of using an intensity-modulated pump, the polarization-switched SRS method is applied to our system by switching the polarization of the pump. Based on the polarization dependence of the third-order susceptibility of the material, this method can eliminate the nonresonant photothermal signal from the resonant SRS signal. Red blood cells and sliced mouse brain samples were imaged to demonstrate the capability of the proposed technique; the results show that polarization-switched SRS removes most of the photothermal signal.


VENU GOPAL BOMMU

Performance Analysis of Various Implementations of Machine Learning Algorithms

When & Where:


2001B Eaton Hall

Committee Members:

Jerzy Grzymala-Busse, Chair
Luke Huan
Bo Luo


Abstract

Rapid development in technology and database systems has resulted in the production and storage of large amounts of data. With such an enormous increase in data over the last few decades, data mining has become a useful tool for discovering the knowledge hidden in large data sets. Domain experts often use machine learning algorithms to find theories that would explain their data.
In this project we compare the Weka implementations of CART and C4.5 with their original implementations on different data sets from the University of California, Irvine (UCI) repository. The comparison is carried out in terms of accuracy, decision tree complexity, and area under the ROC curve (AUC). Results from our experiments show that the decision tree complexity of C4.5 is much higher than that of CART, and that the original implementations of these algorithms perform slightly better than their corresponding Weka implementations in terms of accuracy and AUC.
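
The evaluation methodology is easy to illustrate; the sketch below uses scikit-learn's CART-style decision trees as a stand-in (the project itself compared Weka's Java implementations with the original CART and C4.5 programs), computing cross-validated accuracy and AUC on a bundled UCI data set.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# A UCI data set (Wisconsin breast cancer) that ships with scikit-learn.
X, y = load_breast_cancer(return_X_y=True)

for criterion in ("gini", "entropy"):   # CART-like vs C4.5-like splitting
    clf = DecisionTreeClassifier(criterion=criterion, random_state=0)
    acc = cross_val_score(clf, X, y, cv=10, scoring="accuracy").mean()
    auc = cross_val_score(clf, X, y, cv=10, scoring="roc_auc").mean()
    print(f"{criterion}: accuracy={acc:.3f}, AUC={auc:.3f}")
```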