Defense Notices


All students and faculty are welcome to attend the final defense of EECS graduate students completing their M.S. or Ph.D. degrees. Defense notices for M.S./Ph.D. presentations for this year and several previous years are listed below in reverse chronological order.

Students who are nearing the completion of their M.S./Ph.D. research should schedule their final defenses through the EECS graduate office at least THREE WEEKS PRIOR to their presentation date so that there is time to complete the degree requirements check, and post the presentation announcement online.

Upcoming Defense Notices

Andrew Riachi

An Investigation Into The Memory Consumption of Web Browsers and A Memory Profiling Tool Using Linux Smaps

When & Where:


Nichols Hall, Room 246 (Executive Conference Room)

Committee Members:

Prasad Kulkarni, Chair
Perry Alexander
Drew Davidson
Heechul Yun

Abstract

Web browsers are notorious for consuming large amounts of memory. Yet, they have become the dominant framework for writing GUIs because the web languages are ergonomic for programmers and have a cross-platform reach. These benefits are so enticing that even a large portion of mobile apps, which have to run on resource-constrained devices, are running a web browser under the hood. Therefore, it is important to keep the memory consumption of web browsers as low as practicable.

In this thesis, we investigate the memory consumption of web browsers, in particular, compared to applications written in native GUI frameworks. We introduce smaps-profiler, a tool to profile the overall memory consumption of Linux applications that can report memory usage other profilers simply do not measure. Using this tool, we conduct experiments which suggest that most of the extra memory usage compared to native applications could be due to the size of the web browser program itself. We discuss our experiments and findings, and conclude that even more rigorous studies are needed to profile GUI applications.
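For readers unfamiliar with the Linux smaps interface mentioned in the abstract, the following is a minimal sketch of how per-mapping fields in `/proc/<pid>/smaps` can be totaled. It is not the smaps-profiler tool itself; the function names and the sample text (including the path `/usr/bin/example`) are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict

# Matches smaps field lines such as "Pss:                 100 kB".
# Mapping header lines and non-kB fields (e.g. VmFlags) do not match.
FIELD_RE = re.compile(r"^(\w+):\s+(\d+) kB$")


def sum_smaps_fields(smaps_text):
    """Sum every kB-valued field across all mappings in smaps-formatted text."""
    totals = defaultdict(int)
    for line in smaps_text.splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            totals[match.group(1)] += int(match.group(2))
    return dict(totals)


def read_process_smaps(pid):
    """Read /proc/<pid>/smaps (Linux only) and return per-field totals in kB."""
    with open(f"/proc/{pid}/smaps") as smaps_file:
        return sum_smaps_fields(smaps_file.read())


# Hypothetical two-mapping excerpt, used so the sketch runs on any platform.
SAMPLE = """\
00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/example
Size:                328 kB
Rss:                 200 kB
Pss:                 100 kB
7f0000000000-7f0000021000 rw-p 00000000 00:00 0
Size:                132 kB
Rss:                  64 kB
Pss:                  64 kB
"""

totals = sum_smaps_fields(SAMPLE)
print(totals["Rss"], totals["Pss"])
```

Pss (proportional set size) divides shared pages among the processes that map them, which is why it is often preferred over Rss when comparing multi-process applications such as browsers.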


Elizabeth Wyss

A New Frontier for Software Security: Diving Deep into npm

When & Where:


Eaton Hall, Room 2001B

Committee Members:

Drew Davidson, Chair
Alex Bardas
Fengjun Li
Bo Luo
J. Walker

Abstract

Open-source package managers (e.g., npm for Node.js) have become an established component of modern software development. Rather than creating applications from scratch, developers may employ modular software dependencies and frameworks--called packages--to serve as building blocks for writing larger applications. Package managers make this process easy. With a simple command line directive, developers are able to quickly fetch and install packages across vast open-source repositories. npm--the largest of such repositories--alone hosts millions of unique packages and serves billions of package downloads each week. 

However, the widespread code sharing resulting from open-source package managers also presents novel security implications. Vulnerable or malicious code hiding deep within package dependency trees can be leveraged downstream to attack both software developers and the end-users of their applications. This downstream flow of software dependencies--dubbed the software supply chain--is critical to secure.

This research provides a deep dive into the npm-centric software supply chain, exploring distinctive phenomena that impact its overall security and usability. Such factors include (i) hidden code clones--which may stealthily propagate known vulnerabilities, (ii) install-time attacks enabled by unmediated installation scripts, (iii) hard-coded URLs residing in package code, (iv) the impacts of open-source development practices, (v) package compromise via malicious updates, (vi) spammers disseminating phishing links within package metadata, and (vii) abuse of cryptocurrency protocols designed to reward the creators of high-impact packages. For each facet, tooling is presented to identify and/or mitigate potential security impacts. Ultimately, it is our hope that this research fosters greater awareness, deeper understanding, and further efforts to forge a new frontier for the security of modern software supply chains. 


Alfred Fontes

Optimization and Trade-Space Analysis of Pulsed Radar-Communication Waveforms using Constant Envelope Modulations

When & Where:


Nichols Hall, Room 246 (Executive Conference Room)

Committee Members:

Patrick McCormick, Chair
Shannon Blunt
Jonathan Owen


Abstract

Dual function radar communications (DFRC) is a method of co-designing a single radio frequency system to perform simultaneous radar and communications service. DFRC is ultimately a compromise between radar sensing performance and communications data throughput due to the conflicting requirements between the sensing and information-bearing signals.

A novel waveform-based DFRC approach is phase attached radar communications (PARC), where a communications signal is embedded onto a radar pulse via the phase modulation between the two signals. The PARC framework is used here in a new waveform design technique that designs the radar component of a PARC signal to match the PARC DFRC waveform expected power spectral density (PSD) to a desired spectral template. This provides better control over the PARC signal spectrum, which mitigates the issue of PARC radar performance degradation from spectral growth due to the communications signal. 

The characteristics of optimized PARC waveforms are then analyzed to establish a trade-space between radar and communications performance within a PARC DFRC scenario. This is done by sampling the DFRC trade-space continuum with waveforms that contain a varying degree of communications bandwidth, from a pure radar waveform (no embedded communications) to a pure communications waveform (no radar component). Radar performance, which is degraded by range sidelobe modulation (RSM) from the communications signal randomness, is measured from the PARC signal variance across pulses; data throughput is established as the communications performance metric. Comparing the values of these two measures as a function of communications symbol rate explores the trade-offs in performance between radar and communications with optimized PARC waveforms.


Qua Nguyen

Hybrid Array and Privacy-Preserving Signaling Optimization for NextG Wireless Communications

When & Where:


Zoom Defense, please email jgrisafe@ku.edu for link.

Committee Members:

Erik Perrins, Chair
Morteza Hashemi
Zijun Yao
Taejoon Kim
KC Kong

Abstract

This PhD research tackles two critical challenges in NextG wireless networks: hybrid precoder design for wideband sub-Terahertz (sub-THz) massive multiple-input multiple-output (MIMO) communications and privacy-preserving federated learning (FL) over wireless networks.

In the first part, we propose a novel hybrid precoding framework that integrates true-time delay (TTD) devices and phase shifters (PS) to counteract the beam squint effect - a significant challenge in wideband sub-THz massive MIMO systems that leads to considerable loss in array gain. Unlike previous methods that design only the TTD values while fixing the PS values and assuming unbounded time delays, our approach jointly optimizes TTD and PS values under a realistic time-delay constraint. We determine the minimum number of TTD devices required to achieve a target array gain using our proposed approach. Then, we extend the framework to multi-user wideband systems and formulate a hybrid array optimization problem aiming to maximize the minimum data rate across users. This problem is decomposed into two sub-problems: fair subarray allocation, solved via continuous domain relaxation, and subarray gain maximization, addressed via a phase-domain transformation.

The second part focuses on preserving privacy in FL over wireless networks. First, we design a differentially private FL algorithm that applies time-varying noise variance perturbation. Taking advantage of existing wireless channel noise, we jointly design differential privacy (DP) noise variances and users' transmit power to resolve the tradeoffs between privacy and learning utility. Next, we tackle two critical challenges within FL networks: (i) privacy risks arising from model updates and (ii) reduced learning utility due to quantization heterogeneity. Prior work typically addresses only one of these challenges because maintaining learning utility under both privacy risks and quantization heterogeneity is a non-trivial task. We propose an approach to improve the learning utility of privacy-preserving FL that allows clusters of devices with different quantization resolutions to participate in each FL round. Specifically, we introduce a novel stochastic quantizer (SQ) that ensures a DP guarantee and minimal quantization distortion. To address quantization heterogeneity, we introduce a cluster size optimization technique combined with a linear fusion approach to enhance model aggregation accuracy. Lastly, inspired by the information-theoretic rate-distortion framework, a privacy-distortion tradeoff problem is formulated to minimize privacy loss under a given maximum allowable quantization distortion. The optimal solution to this problem is identified, revealing that the privacy loss decreases as the maximum allowable quantization distortion increases, and vice versa.

This research advances hybrid array optimization for wideband sub-THz massive MIMO and introduces novel algorithms for privacy-preserving quantized FL with diverse precision. These contributions enable high-throughput wideband MIMO communication systems and privacy-preserving AI-native designs, aligning with the performance and privacy protection demands of NextG networks.
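The beam squint effect named in the abstract can be illustrated with a simplified textbook model; this is not the proposed precoder. The sketch assumes a uniform linear array with half-wavelength spacing at the center frequency, one ideal compensating element per antenna, and unbounded true-time delays (the very idealization the abstract says the proposed method avoids). The function name and all parameter values are hypothetical.

```python
import cmath
import math


def array_gain(num_antennas, angle_rad, freq_ratio, use_ttd):
    """Normalized array gain of a half-wavelength ULA at f = freq_ratio * fc.

    Simplified model: the channel phase at antenna n is
    -pi * (f / fc) * n * sin(angle). Phase shifters apply a phase matched
    only at fc; ideal true-time delays apply a phase that scales with f.
    """
    total = 0j
    for n in range(num_antennas):
        channel = cmath.exp(-1j * math.pi * freq_ratio * n * math.sin(angle_rad))
        if use_ttd:
            # Ideal (unbounded) true-time delay: compensation tracks frequency.
            weight = cmath.exp(1j * math.pi * freq_ratio * n * math.sin(angle_rad))
        else:
            # Phase shifter: compensation is frozen at the center frequency.
            weight = cmath.exp(1j * math.pi * n * math.sin(angle_rad))
        total += weight * channel
    return abs(total) / num_antennas


# Hypothetical setup: 64 antennas, 45-degree steering, 10% frequency offset.
ps_center = array_gain(64, math.pi / 4, 1.0, use_ttd=False)
ps_offset = array_gain(64, math.pi / 4, 1.1, use_ttd=False)
ttd_offset = array_gain(64, math.pi / 4, 1.1, use_ttd=True)
print(ps_center, ps_offset, ttd_offset)
```

In this model the phase-shifter-only array keeps full gain at the center frequency but loses most of it away from the center, while the idealized delay elements keep full gain across frequency.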


Arin Dutta

Performance Analysis of Distributed Raman Amplification with Different Pumping Configurations

When & Where:


Nichols Hall, Room 246 (Executive Conference Room)

Committee Members:

Rongqing Hui, Chair
Morteza Hashemi
Rachel Jarvis
Alessandro Salandrino
Hui Zhao

Abstract

As internet services like high-definition videos, cloud computing, and artificial intelligence keep growing, optical networks need to keep up with the demand for more capacity. Optical amplifiers play a crucial role in offsetting fiber loss and enabling long-distance wavelength division multiplexing (WDM) transmission in high-capacity systems. Various methods have been proposed to enhance the capacity and reach of fiber communication systems, including advanced modulation formats, dense wavelength division multiplexing (DWDM) over ultra-wide bands, space-division multiplexing, and high-performance digital signal processing (DSP) technologies. To maintain higher data rates along with maximizing the spectral efficiency of multi-level modulated signals, a higher Optical Signal-to-Noise Ratio (OSNR) is necessary. Despite advancements in coherent optical communication systems, the spectral efficiency of multi-level modulated signals is ultimately constrained by fiber nonlinearity. Raman amplification is an attractive solution for wide-band amplification with low noise figures in multi-band systems.

Distributed Raman Amplification (DRA) has been deployed in recent high-capacity transmission experiments to achieve a relatively flat signal power distribution along the optical path, and it offers the unique advantage of using conventional low-loss silica fibers as the gain medium, effectively transforming passive optical fibers into active or amplifying waveguides. Also, DRA provides gain at any wavelength by selecting the appropriate pump wavelength, enabling operation in signal bands outside the erbium-doped fiber amplifier (EDFA) bands. A forward (FW) Raman pumping configuration can be adopted to further improve DRA performance, as it is more efficient in improving OSNR because the optical noise is generated near the beginning of the fiber span and attenuated along the fiber. A dual-order FW pumping scheme helps to reduce the non-linear effect on the optical signal and improves OSNR by more uniformly distributing the Raman gain along the transmission span.

The major concern with Forward Distributed Raman Amplification (FW DRA) is the fluctuation in pump power, known as relative intensity noise (RIN), which transfers from the pump laser to both the intensity and phase of the transmitted optical signal as they propagate in the same direction. Additionally, another concern of FW DRA is the rise in signal optical power near the start of the fiber span, leading to an increase in the non-linear phase shift of the signal. These factors, including RIN transfer-induced noise and non-linear noise, contribute to the degradation of system performance in FW DRA systems at the receiver.

As the performance of DRA with backward pumping is well understood with relatively low impact of RIN transfer, our research is focused on the FW pumping configuration, and is intended to provide a comprehensive analysis of the system performance impact of dual order FW Raman pumping, including signal intensity and phase noise induced by the RINs of both the 1st and 2nd order pump lasers, as well as the impacts of linear and nonlinear noise. The efficiencies of pump RIN to signal intensity and phase noise transfer are theoretically analyzed and experimentally verified by applying a shallow intensity modulation to the pump laser to mimic the RIN. The results indicate that the efficiency of the 2nd order pump RIN to signal phase noise transfer can be more than 2 orders of magnitude higher than that from the 1st order pump. Then the performance of the dual order FW Raman configurations is compared with that of single order Raman pumping to understand trade-offs of system parameters. The nonlinear interference (NLI) noise is analyzed to study the overall OSNR improvement when employing a 2nd order Raman pump. Finally, a DWDM system with 16-QAM modulation is used as an example to investigate the benefit of DRA with dual order Raman pumping and with different pump RIN levels. We also consider a DRA system using a 1st order incoherent pump together with a 2nd order coherent pump. Although dual order FW pumping corresponds to a slight increase of linear amplified spontaneous emission (ASE) compared to using only a 1st order pump, its major advantage comes from the reduction of nonlinear interference noise in a DWDM system. Because the RIN of the 2nd order pump has much higher impact than that of the 1st order pump, there should be a more stringent requirement on the RIN of the 2nd order pump laser when a dual order FW pumping scheme is used for DRA for efficient fiber-optic communication. Also, the system performance analysis reveals that higher baud rate systems, like those operating at 100 Gbaud, are less affected by pump laser RIN due to the low-pass characteristics of the transfer of pump RIN to signal phase noise.


Audrey Mockenhaupt

Using Dual Function Radar Communication Waveforms for Synthetic Aperture Radar Automatic Target Recognition

When & Where:


Nichols Hall, Room 246 (Executive Conference Room)

Committee Members:

Patrick McCormick, Chair
Shannon Blunt
Jon Owen


Abstract

Pending.


Rich Simeon

Delay-Doppler Channel Estimation for High-Speed Aeronautical Mobile Telemetry Applications

When & Where:


Eaton Hall, Room 2001B

Committee Members:

Erik Perrins, Chair
Shannon Blunt
Morteza Hashemi
Jim Stiles
Craig McLaughlin

Abstract

The next generation of digital communications systems aims to operate in high-Doppler environments such as high-speed trains and non-terrestrial networks that utilize satellites in low-Earth orbit. Current generation systems use Orthogonal Frequency Division Multiplexing modulation which is known to suffer from inter-carrier interference (ICI) when different channel paths have dissimilar Doppler shifts.

A new Orthogonal Time Frequency Space (OTFS) modulation (also known as Delay-Doppler modulation) is proposed as a candidate modulation for 6G networks that is resilient to ICI. To date, OTFS demodulation designs have focused on the use cases of popular urban terrestrial channel models where path delay spread is a fraction of the OTFS symbol duration. However, wireless wide-area networks that operate in the aeronautical mobile telemetry (AMT) space can have large path delay spreads due to reflections from distant geographic features. This presents problems for existing channel estimation techniques which assume a small maximum expected channel delay, since data transmission is paused to sound the channel by an amount equal to twice the maximum channel delay. The dropout in data contributes to a reduction in spectral efficiency.

Our research addresses OTFS limitations in the AMT use case. We start with an exemplary OTFS framework with parameters optimized for AMT. Following system design, we focus on two distinct areas to improve OTFS performance in the AMT environment. First, we propose a new channel estimation technique using a pilot signal superimposed over data that can measure large delay spread channels with no penalty in spectral efficiency. A successive interference cancellation algorithm is used to iteratively improve channel estimates and jointly decode data. A second aspect of our research aims to equalize in delay-Doppler space. In the delay-Doppler paradigm, the rapid channel variations seen in the time-frequency domain are transformed into a sparse quasi-stationary channel in the delay-Doppler domain. We propose to use machine learning based on Gaussian Process Regression to take advantage of the sparse and stationary channel and learn the channel parameters to compensate for the effects of fractional Doppler, which simpler channel estimation techniques cannot mitigate. Both areas of research can advance the robustness of OTFS across all communications systems.


Past Defense Notices

MAHITHA DODDALA

Properties of Probabilistic Approximations Applied to Incomplete Data

When & Where:


2001B Eaton Hall

Committee Members:

Jerzy Grzymala-Busse, Chair
Man Kong
Bo Luo


Abstract

The main focus of the project is to discuss mining of incomplete data, which we find frequently in real-life records. For this, I considered probabilistic approximations, as they have a direct application to mining incomplete data. I have examined the results obtained from experiments conducted on eight real-life data sets taken from the University of California at Irvine Machine Learning Repository. I also investigated the properties of singleton, subset, and concept approximations and the corresponding consistencies. The main objective was to compare the global and local approximations and generalize the consistency definition for incomplete data with two interpretations of missing attribute values: lost values and "do not care" conditions. In addition to this comparison, the most useful approach among singleton, subset, and concept approximations is also tested; the conclusion is that the best approach should be selected with the help of tenfold cross validation after applying all three approaches. It is also shown that even though there exist six types of consistencies, there are only four distinct consistencies of incomplete data, as two pairs of such consistencies are equivalent.


ROHIT YADAV

Automatic Text Summarization of Email Corpus Using Importance of Sentences

When & Where:


2001B Eaton Hall

Committee Members:

Jerzy Grzymala-Busse, Chair
Prasad Kulkarni
Bo Luo


Abstract

With the advent of the Internet, the data being added online have been increasing at an enormous rate. Though search engines use information retrieval (IR) techniques to facilitate the search requests from users, the results may not always be effective or the efficiency of results according to a search query may not be high. The user has to go through certain web pages before getting to the web page he/she needs. This problem of information overload can be solved using automatic text summarization. Summarization is a process of obtaining an abridged version of documents so that the user can have a quick understanding of the document. A new technique to produce a summary of an original text is investigated in this project.
Email threads from the World Wide Web Consortium’s (W3C) sites corpus are used in this system. Our system is based on identification and extraction of important sentences from the input document. Apart from common IR features like term frequency and inverse document frequency, novel features such as Term Frequency-Inverse Document Frequency, subject words, sentence position and thematic words have also been implemented. The model consists of four stages. The pre-processing stage converts the unstructured text (anything that cannot be readily classified) into structured data (any data that resides in a fixed field within a record or file). In the first stage, each sentence is partitioned into a list of tokens and stop words are removed. The second stage is to extract the important key phrases in the text by implementing a new algorithm that ranks the candidate words. The system uses the extracted keywords/key phrases to select the important sentences. Each sentence is ranked depending on many features, such as the existence of the keywords/key phrases in it, the relation between the sentence and the title measured by a similarity measurement, and many other features. The third stage of the proposed system is to extract the sentences with the highest rank. The fourth stage is the filtering stage, where sentences from email threads are ranked as per features and summaries are generated. This system can be considered as a framework for unsupervised learning in the field of text summarization.
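As a rough illustration of extractive ranking by one of the features listed above, the sketch below scores sentences by TF-IDF, treating each sentence as its own document. It is a generic textbook formulation, not the project's algorithm: the function names, the tokenizer, and the example sentences are hypothetical, and the other features (subject words, sentence position, thematic words) are omitted.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def rank_sentences(sentences, stop_words=frozenset()):
    """Return (indices sorted by descending score, per-sentence TF-IDF scores)."""
    token_lists = [
        [tok for tok in tokenize(sentence) if tok not in stop_words]
        for sentence in sentences
    ]
    num_sentences = len(sentences)

    # Document frequency: in how many sentences each term appears.
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in token_lists:
        if not tokens:
            scores.append(0.0)
            continue
        term_counts = Counter(tokens)
        # Sum of (normalized term frequency) * (inverse document frequency).
        score = sum(
            (count / len(tokens)) * math.log(num_sentences / doc_freq[term])
            for term, count in term_counts.items()
        )
        scores.append(score)

    order = sorted(range(num_sentences), key=lambda i: scores[i], reverse=True)
    return order, scores


# Hypothetical email-like sentences.
sentences = [
    "The meeting is on Monday",
    "The meeting is on Monday",
    "Budget approval requires the director signature",
]
order, scores = rank_sentences(sentences)
print(order[0], scores)
```

Terms that occur in every sentence get an inverse document frequency of zero, so sentences made of widely repeated words rank below sentences that carry distinctive terms.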


ARJUN MUTHALAGU

Flight Search Application

When & Where:


250 Nichols Hall

Committee Members:

Prasad Kulkarni, Chair
Andy Gill
Jerzy Grzymala-Busse


Abstract

The “Flight-search” application is an AngularJS application implemented with a client-side architecture. The application displays the flight results from different airline companies based on the input parameters. The application also has custom filtering conditions and custom pagination, which a user can interact with to filter the results and limit the number of results displayed in the browser. The application uses the QPX Express API to pull data for the flight searches.


SATYA KUNDETI

A Comparison of Two Decision Tree Generating Algorithms: C4.5 and CART Based on Numerical Data

When & Where:


2001B Eaton Hall

Committee Members:

Jerzy Grzymala-Busse, Chair
Luke Huan
Bo Luo


Abstract

In Data Mining, classification of data is a challenging task. One of the most popular techniques for classifying data is decision tree induction. In this project, two decision tree generating algorithms, CART and C4.5, using their original implementations, are compared on different numerical data sets taken from the University of California Irvine (UCI). The comparative analysis of these two implementations is carried out in terms of accuracy and decision tree complexity. Results from experiments show that there is a statistically insignificant difference (5% level of significance, two-tailed test) between C4.5 and CART in terms of accuracy. On the other hand, decision trees generated by C4.5 and CART have a statistically significant difference in terms of their complexity.


NAGA ANUSHA BOMMIDI

The Comparison of Performance and Complexity of Rule Sets induced from Incomplete Data

When & Where:


317 Nichols Hall

Committee Members:

Jerzy Grzymala-Busse, Chair
Andy Gill
Prasad Kulkarni


Abstract

The main focus of this project is to identify the best interpretation of missing attribute values in terms of performance and complexity of rule sets. This report summarizes the experimental comparison of the performance and the complexity of rule sets induced from incomplete data sets with three interpretations of missing attribute values: lost values, attribute-concept values, and “do not care” conditions. Furthermore, it details the experiments conducted using the MLEM2 rule induction system on 176 data sets, using three kinds of probabilistic approximations: lower, middle and upper. The performance was evaluated using the error rate computed by ten-fold cross validation, and the complexity of rule sets was evaluated based on the size of the rule sets and the number of conditions in the rule sets. The results showed that lost values were better in terms of the performance in 10 out of 24 combinations. In addition, attribute-concept values were better in 5 out of 24 combinations, and “do not care” conditions were better in 1 combination in terms of the complexity of rule sets. Furthermore, there was not even one combination of dataset and type of approximation for which both performance and complexity of rule sets were better for one interpretation of missing attributes compared to the other two.


BLAKE BRYANT

Hacking SIEMS to Catch Hackers: Decreasing the Mean Time to Respond to Security Incidents with a Novel Threat Ontology in SIEM Software

When & Where:


2012 BEST

Committee Members:

Hossein Saiedian, Chair
Bo Luo
Gary Minden


Abstract

Information security is plagued with increasingly sophisticated and persistent threats to communication networks. The development of new threat tools or vulnerability exploits often outpaces advancements in network security detection systems. As a result, detection systems often compensate by over-reporting partial detections of routine network activity to security analysts for further review. Such alarms seldom contain adequate forensic data for analysts to accurately validate alerts to other stakeholders without lengthy investigations. As a result, security analysts often ignore the vast majority of network security alarms provided by sensors, resulting in security breaches that may have otherwise been prevented.

Security Information and Event Management (SIEM) software has been introduced recently in an effort to enable data correlation across multiple sensors, with the intent of producing a lower number of security alerts with little forensic value and a higher number of security alerts that accurately reflect malicious actions. However, the normalization frameworks found in current SIEM systems do not accurately depict modern threat activities. As a result, recent network security research has introduced the concept of a "kill chain" model designed to represent threat activities based upon patterns of action, known indicators, and methodical intrusion phases. Such a model was hypothesized by many researchers to result in the realization of the desired goals of SIEM software. 

The focus of this thesis is the implementation of a "kill chain" framework within SIEM software. A novel "kill chain" model was developed and implemented within a commercial SIEM system through modifications to the existing SIEM database. These modifications resulted in a new log ontology capable of normalizing security sensor data in accordance with modern threat research. New SIEM correlation rules were developed using the novel log ontology and compared to existing vendor-recommended correlation rules using the default model. The novel log ontology produced promising results indicating improved detection rates, more descriptive security alarms, and a lower number of false positive alarms. These improvements were assessed to provide improved visibility and more efficient investigation processes to security analysts, ultimately reducing the mean time required to detect and escalate security incidents.


SHAUN CHUA

Implementation of a Multichannel Radar Waveform Generator System Controller

When & Where:


317 Nichols Hall

Committee Members:

Carl Leuschen, Chair
Chris Allen
Fernando Rodriguez-Morales


Abstract

Waveform generation is crucial to radar system operation. There is a recent need for an 8-channel transmitter with high bandwidth chirp signals (100 MHz – 600 MHz). As such, a waveform generator (WFG) hardware module is required for this purpose. The WFG houses 4 Direct Digital Synthesizers (DDS), and an ALTERA Cyclone V FPGA that acts as its controller. The DDS of choice is the AD9915, because its Digital to Analog Converter can be clocked at a maximum rate of 2.5 GHz, allowing plenty of room to produce the high bandwidth and high frequency chirp signals desired, and also because it supports synchronization between multiple AD9915s.

The brains behind the DDS operations are the FPGA and the radar software developed in NI LabVIEW. These two aspects of the digital system grant the WFG highly configurable waveform capabilities. The configurable inputs that can be controlled by the user include: number of waveforms in a playlist, start and stop frequency (bandwidth of chirp signal), zero-pi mode, and waveform amplitude and phase control.

The FPGA acts as a DDS controller that directly configures and controls the DDS operations, while also managing and synchronizing the operations of all DDS channels. This project largely details the development of such a controller, named Multichannel Waveform Generator (MWFG) Controller, and the necessary modifications and development in the NI LabVIEW software, so that they complement each other.
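To make the start/stop frequency inputs concrete, here is a minimal software model of a linear chirp whose instantaneous frequency sweeps between two user-chosen frequencies. It is only a numerical sketch and does not represent the AD9915, the FPGA controller, or the LabVIEW software. The 100 MHz to 600 MHz sweep and 2.5 GHz sample rate echo figures from the abstract; the 1 microsecond duration and all function names are hypothetical.

```python
import math


def instantaneous_frequency(t, f_start, f_stop, duration):
    """Frequency of a linear chirp at time t (seconds), in Hz."""
    return f_start + (f_stop - f_start) * t / duration


def chirp_samples(f_start, f_stop, duration, sample_rate):
    """Return real-valued samples of a linear chirp.

    The phase is the integral of the instantaneous frequency:
    phase(t) = 2*pi * (f_start * t + 0.5 * chirp_rate * t**2).
    """
    num_samples = round(duration * sample_rate)
    chirp_rate = (f_stop - f_start) / duration  # Hz per second
    samples = []
    for n in range(num_samples):
        t = n / sample_rate
        phase = 2.0 * math.pi * (f_start * t + 0.5 * chirp_rate * t * t)
        samples.append(math.cos(phase))
    return samples


# Hypothetical 1 microsecond pulse sampled at 2.5 GHz.
samples = chirp_samples(100e6, 600e6, 1e-6, 2.5e9)
print(len(samples), samples[0])
```

The sweep tops out at 600 MHz, well below half of the 2.5 GHz sample rate, so the sampled waveform in this model is free of aliasing.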


DEEPIKA KOTA

Automatic Color Detection of Colored Wires In Electric Cables

When & Where:


2001B Eaton Hall

Committee Members:

Jim Stiles, Chair
Ron Hui
James Rowland


Abstract

An automatic color detection system checks the sequence of colored wires in electric cables that are ready to be crimped together. The system inspects flat connectors that differ in type and number of wires. This is managed automatically by a self-learning system, without requiring any manual input from the user to load new data into the machine. The system is coupled to a connector crimping machine, and once the system learns the wire order from an actual sample cable, it automatically inspects each cable assembled by the machine. The automatic detection relies on three components: 1) a self-learning system, 2) an algorithm for wire segmentation to extract colors from the captured images, and 3) an algorithm for color recognition that copes with wires under different illumination and with different insulation. The main advantage of this system is that, when cables are produced in large batches, it provides a high level of accuracy and prevents false negatives in order to guarantee defect-free production.


MOHAMMED ZIAUDDIN

Open Source Python Widget Application to Synchronize Bibliographical References Between Two BibTeX Repositories

When & Where:


246 Nichols Hall

Committee Members:

Andy Gill, Chair
Perry Alexander
Prasad Kulkarni


Abstract

BibTeX is a tool to edit and manage bibliographical references in a document. Researchers face a common problem: they have one copy of their bibliographical reference database for a specific project and a master bibliographical database file that holds all their bibliographical references. Syncing these two files is an arduous task, as one has to search and modify each reference record individually. Most of the BibTeX tools available either provide help in maintaining BibTeX bibliographies in different file formats or in searching for references in web databases, but none of them provide a way to synchronize the fields of the same reference record in the two different BibTeX database files.
The intention of this project is to create an application that helps academicians keep their bibliographical references in two different databases in sync. We have created a Python widget application that employs the Tkinter library for the GUI and the UnQLite database for data storage. This application is integrated with GitHub, allowing users to modify BibTeX files present on GitHub.


HARISH ROHINI

Using Intel Pintools to Analyze Memory Access Patterns

When & Where:


246 Nichols Hall

Committee Members:

Prasad Kulkarni, Chair
Andy Gill
Heechul Yun


Abstract

Analysis of large benchmark programs can be very difficult because their memory state changes on every run, and with billions of instructions, simulating a whole program can be extremely slow. The solution is to simulate only selected regions that are the most representative parts of a program, so that we can focus our analysis and optimizations on the regions that account for most of the program's execution. To accomplish that, we use Intel’s Pintool, a binary instrumentation framework that performs program analysis at run time; SimPoint, to get the most representative regions of a program; and PinPlay, for reproducible analysis of the program. This project uses these frameworks to simulate and analyze programs and to provide various statistics about memory allocations, memory reference traces, and allocated memory usage across the most representative regions of the program, as well as cache simulations of the representative regions.
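The kind of cache simulation mentioned above can be pictured with a small trace-driven model. The sketch below simulates a direct-mapped cache over a list of byte addresses; it does not use Pin, SimPoint, or PinPlay, and the cache geometry, function name, and address trace are hypothetical values chosen for illustration.

```python
def simulate_direct_mapped(trace, num_lines=4, line_size=64):
    """Count hits and misses of a direct-mapped cache over an address trace.

    Each address maps to exactly one cache line: the block number modulo
    the number of lines. The rest of the block number is stored as the tag.
    """
    tags = [None] * num_lines
    hits = 0
    misses = 0
    for address in trace:
        block = address // line_size
        index = block % num_lines
        tag = block // num_lines
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag  # Evict whatever was in this line.
    return hits, misses


# Hypothetical memory reference trace (byte addresses).
trace = [0, 8, 64, 0, 256, 0]
hits, misses = simulate_direct_mapped(trace)
print(hits, misses)
```

In this trace, addresses 0 and 256 compete for the same cache line, so the final access to address 0 misses even though it was cached earlier; this is the conflict-miss behavior a direct-mapped design trades for simplicity.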