Defense Notices

All students and faculty are welcome to attend the final defense of EECS graduate students completing their M.S. or Ph.D. degrees. Defense notices for M.S./Ph.D. presentations for this year and several previous years are listed below in reverse chronological order.

Students who are nearing the completion of their M.S./Ph.D. research should schedule their final defenses through the EECS graduate office at least THREE WEEKS PRIOR to their presentation date so that there is time to complete the degree requirements check, and post the presentation announcement online.

Upcoming Defense Notices

Elizabeth Wyss

A New Frontier for Software Security: Diving Deep into npm

When & Where:

Wednesday, July 23, 2025 - 9:30 AM
Eaton Hall, Room 2001B

Committee Members:

Drew Davidson, Chair
Alex Bardas
Fengjun Li
Bo Luo
J. Walker

Abstract

Open-source package managers (e.g., npm for Node.js) have become an established component of modern software development. Rather than creating applications from scratch, developers may employ modular software dependencies and frameworks--called packages--to serve as building blocks for writing larger applications. Package managers make this process easy. With a simple command line directive, developers are able to quickly fetch and install packages across vast open-source repositories. npm--the largest of such repositories--alone hosts millions of unique packages and serves billions of package downloads each week.

However, the widespread code sharing resulting from open-source package managers also presents novel security implications. Vulnerable or malicious code hiding deep within package dependency trees can be leveraged downstream to attack both software developers and the end-users of their applications. This downstream flow of software dependencies--dubbed the software supply chain--is critical to secure.

This research provides a deep dive into the npm-centric software supply chain, exploring distinctive phenomena that impact its overall security and usability. Such factors include (i) hidden code clones--which may stealthily propagate known vulnerabilities, (ii) install-time attacks enabled by unmediated installation scripts, (iii) hard-coded URLs residing in package code, (iv) the impacts of open-source development practices, (v) package compromise via malicious updates, (vi) spammers disseminating phishing links within package metadata, and (vii) abuse of cryptocurrency protocols designed to reward the creators of high-impact packages. For each facet, tooling is presented to identify and/or mitigate potential security impacts. Ultimately, it is our hope that this research fosters greater awareness, deeper understanding, and further efforts to forge a new frontier for the security of modern software supply chains.

Alfred Fontes

Optimization and Trade-Space Analysis of Pulsed Radar-Communication Waveforms using Constant Envelope Modulations

When & Where:

Tuesday, July 22, 2025 - 2:00 PM
Nichols Hall, Room 246 (Executive Conference Room)

Committee Members:

Patrick McCormick, Chair
Shannon Blunt
Jonathan Owen

Abstract

Dual function radar communications (DFRC) is a method of co-designing a single radio frequency system to perform simultaneous radar and communications service. DFRC is ultimately a compromise between radar sensing performance and communications data throughput due to the conflicting requirements between the sensing and information-bearing signals.

A novel waveform-based DFRC approach is phase attached radar communications (PARC), where a communications signal is embedded onto a radar pulse via the phase modulation between the two signals. The PARC framework is used here in a new waveform design technique that designs the radar component of a PARC signal to match the PARC DFRC waveform expected power spectral density (PSD) to a desired spectral template. This provides better control over the PARC signal spectrum, which mitigates the issue of PARC radar performance degradation from spectral growth due to the communications signal.

The characteristics of optimized PARC waveforms are then analyzed to establish a trade-space between radar and communications performance within a PARC DFRC scenario. This is done by sampling the DFRC trade-space continuum with waveforms that contain a varying degree of communications bandwidth, from a pure radar waveform (no embedded communications) to a pure communications waveform (no radar component). Radar performance, which is degraded by range sidelobe modulation (RSM) from the communications signal randomness, is measured from the PARC signal variance across pulses; data throughput is established as the communications performance metric. Comparing the values of these two measures as a function of communications symbol rate explores the trade-offs in performance between radar and communications with optimized PARC waveforms.

Arin Dutta

Performance Analysis of Distributed Raman Amplification with Different Pumping Configurations

When & Where:

Monday, July 21, 2025 - 10:00 AM
Nichols Hall, Room 246 (Executive Conference Room)

Committee Members:

Rongqing Hui, Chair
Morteza Hashemi
Rachel Jarvis
Alessandro Salandrino
Hui Zhao

Abstract

As internet services like high-definition videos, cloud computing, and artificial intelligence keep growing, optical networks need to keep up with the demand for more capacity. Optical amplifiers play a crucial role in offsetting fiber loss and enabling long-distance wavelength division multiplexing (WDM) transmission in high-capacity systems. Various methods have been proposed to enhance the capacity and reach of fiber communication systems, including advanced modulation formats, dense wavelength division multiplexing (DWDM) over ultra-wide bands, space-division multiplexing, and high-performance digital signal processing (DSP) technologies. To maintain higher data rates along with maximizing the spectral efficiency of multi-level modulated signals, a higher Optical Signal-to-Noise Ratio (OSNR) is necessary. Despite advancements in coherent optical communication systems, the spectral efficiency of multi-level modulated signals is ultimately constrained by fiber nonlinearity. Raman amplification is an attractive solution for wide-band amplification with low noise figures in multi-band systems.

Distributed Raman Amplification (DRA) have been deployed in recent high-capacity transmission experiments to achieve a relatively flat signal power distribution along the optical path and offers the unique advantage of using conventional low-loss silica fibers as the gain medium, effectively transforming passive optical fibers into active or amplifying waveguides. Also, DRA provides gain at any wavelength by selecting the appropriate pump wavelength, enabling operation in signal bands outside the Erbium doped fiber amplifier (EDFA) bands. Forward (FW) Raman pumping configuration in DRA can be adopted to further improve the DRA performance as it is more efficient in OSNR improvement because the optical noise is generated near the beginning of the fiber span and attenuated along the fiber. Dual-order FW pumping scheme helps to reduce the non-linear effect of the optical signal and improves OSNR by more uniformly distributing the Raman gain along the transmission span.

The major concern with Forward Distributed Raman Amplification (FW DRA) is the fluctuation in pump power, known as relative intensity noise (RIN), which transfers from the pump laser to both the intensity and phase of the transmitted optical signal as they propagate in the same direction. Additionally, another concern of FW DRA is the rise in signal optical power near the start of the fiber span, leading to an increase in the non-linear phase shift of the signal. These factors, including RIN transfer-induced noise and non-linear noise, contribute to the degradation of system performance in FW DRA systems at the receiver.

As the performance of DRA with backward pumping is well understood with relatively low impact of RIN transfer, our research is focused on the FW pumping configuration, and is intended to provide a comprehensive analysis on the system performance impact of dual order FW Raman pumping, including signal intensity and phase noise induced by the RINs of both 1st and the 2nd order pump lasers, as well as the impacts of linear and nonlinear noise. The efficiencies of pump RIN to signal intensity and phase noise transfer are theoretically analyzed and experimentally verified by applying a shallow intensity modulation to the pump laser to mimic the RIN. The results indicate that the efficiency of the 2nd order pump RIN to signal phase noise transfer can be more than 2 orders of magnitude higher than that from the 1st order pump. Then the performance of the dual order FW Raman configurations is compared with that of single order Raman pumping to understand trade-offs of system parameters. The nonlinear interference (NLI) noise is analyzed to study the overall OSNR improvement when employing a 2nd order Raman pump. Finally, a DWDM system with 16-QAM modulation is used as an example to investigate the benefit of DRA with dual order Raman pumping and with different pump RIN levels. We also consider a DRA system using a 1st order incoherent pump together with a 2nd order coherent pump. Although dual order FW pumping corresponds to a slight increase of linear amplified spontaneous emission (ASE) compared to using only a 1st order pump, its major advantage comes from the reduction of nonlinear interference noise in a DWDM system. Because the RIN of the 2nd order pump has much higher impact than that of the 1st order pump, there should be more stringent requirement on the RIN of the 2nd order pump laser when dual order FW pumping scheme is used for DRA for efficient fiber-optic communication. Also, the result of system performance analysis reveals that higher baud rate systems, like those operating at 100Gbaud, are less affected by pump laser RIN due to the low-pass characteristics of the transfer of pump RIN to signal phase noise.

Audrey Mockenhaupt

Using Dual Function Radar Communication Waveforms for Synthetic Aperture Radar Automatic Target Recognition

When & Where:

Friday, July 18, 2025 - 2:30 PM
Nichols Hall, Room 246 (Executive Conference Room)

Committee Members:

Patrick McCormick, Chair
Shannon Blunt
Jon Owen

Abstract

Pending.

Rich Simeon

Delay-Doppler Channel Estimation for High-Speed Aeronautical Mobile Telemetry Applications

When & Where:

Tuesday, July 15, 2025 - 11:30 AM
Eaton Hall, Room 2001B

Committee Members:

Erik Perrins, Chair
Shannon Blunt
Morteza Hashemi
Jim Stiles
Craig McLaughlin

Abstract

The next generation of digital communications systems aims to operate in high-Doppler environments such as high-speed trains and non-terrestrial networks that utilize satellites in low-Earth orbit. Current generation systems use Orthogonal Frequency Division Multiplexing modulation which is known to suffer from inter-carrier interference (ICI) when different channel paths have dissimilar Doppler shifts.

A new Orthogonal Time Frequency Space (OTFS) modulation (also known as Delay-Doppler modulation) is proposed as a candidate modulation for 6G networks that is resilient to ICI. To date, OTFS demodulation designs have focused on the use cases of popular urban terrestrial channel models where path delay spread is a fraction of the OTFS symbol duration. However, wireless wide-area networks that operate in the aeronautical mobile telemetry (AMT) space can have large path delay spreads due to reflections from distant geographic features. This presents problems for existing channel estimation techniques which assume a small maximum expected channel delay, since data transmission is paused to sound the channel by an amount equal to twice the maximum channel delay. The dropout in data contributes to a reduction in spectral efficiency.

Our research addresses OTFS limitations in the AMT use case. We start with an exemplary OTFS framework with parameters optimized for AMT. Following system design, we focus on two distinct areas to improve OTFS performance in the AMT environment. First we propose a new channel estimation technique using a pilot signal superimposed over data that can measure large delay spread channels with no penalty in spectral efficiency. A successive interference cancellation algorithm is used to iteratively improve channel estimates and jointly decode data. A second aspect of our research aims to equalize in delay-Doppler space. In the delay-Doppler paradigm, the rapid channel variations seen in the time-frequency domain is transformed into a sparse quasi-stationary channel in the delay-Doppler domain. We propose to use machine learning using Gaussian Process Regression to take advantage of the sparse and stationary channel and learn the channel parameters to compensate for the effects of fractional Doppler in which simpler channel estimation techniques cannot mitigate. Both areas of research can advance the robustness of OTFS across all communications systems.

Mohammad Ful Hossain Seikh

AAFIYA: Antenna Analysis in Frequency-domain for Impedance and Yield Assessment

When & Where:

Thursday, July 10, 2025 - 10:00 AM
Eaton Hall, Room 2001B

Committee Members:

Jim Stiles, Chair
Rachel Jarvis
Alessandro Salandrino

Abstract

This project presents AAFIYA (Antenna Analysis in Frequency-domain for Impedance and Yield Assessment), a modular Python toolkit developed to automate and streamline the characterization and analysis of radiofrequency (RF) antennas using both measurement and simulation data. Motivated by the need for reproducible, flexible, and publication-ready workflows in modern antenna research, AAFIYA provides comprehensive support for all major antenna metrics, including S-parameters, impedance, gain and beam patterns, polarization purity, and calibration-based yield estimation. The toolkit features robust data ingestion from standard formats (such as Touchstone files and beam pattern text files), vectorized computation of RF metrics, and high-quality plotting utilities suitable for scientific publication.

Validation was carried out using measurements from industry-standard electromagnetic anechoic chamber setups involving both Log Periodic Dipole Array (LPDA) reference antennas and Askaryan Radio Array (ARA) Bottom Vertically Polarized (BVPol) antennas, covering a frequency range of 50–1500 MHz. Key performance metrics, such as broadband impedance matching, S11 and S21 related calculations, 3D realized gain patterns, vector effective lengths, and cross-polarization ratio, were extracted and compared against full-wave electromagnetic simulations (using HFSS and WIPL-D). The results demonstrate close agreement between measurement and simulation, confirming the reliability of the workflow and calibration methodology.

AAFIYA’s open-source, extensible design enables rapid adaptation to new experiments and provides a foundation for future integration with machine learning and evolutionary optimization algorithms. This work not only delivers a validated toolkit for antenna research and pedagogy but also sets the stage for next-generation approaches in automated antenna design, optimization, and performance analysis.

Soumya Baddham

Battling Toxicity: A Comparative Analysis of Machine Learning Models for Content Moderation

When & Where:

Monday, July 7, 2025 - 2:00 PM
Eaton Hall, Room 2001B

Committee Members:

David Johnson, Chair
Prasad Kulkarni
Hongyang Sun

Abstract

With the exponential growth of user-generated content, online platforms face unprecedented challenges in moderating toxic and harmful comments. Due to this, Automated content moderation has emerged as a critical application of machine learning, enabling platforms to ensure user safety and maintain community standards. Despite its importance, challenges such as severe class imbalance, contextual ambiguity, and the diverse nature of toxic language often compromise moderation accuracy, leading to biased classification performance.

This project presents a comparative analysis of machine learning approaches for a Multi-Label Toxic Comment Classification System using the Toxic Comment Classification dataset from Kaggle. The study examines the performance of traditional algorithms, such as Logistic Regression, Random Forest, and XGBoost, alongside deep architectures, including Bi-LSTM, CNN-Bi-LSTM, and DistilBERT. The proposed approach utilizes word-level embeddings across all models and examines the effects of architectural enhancements, hyperparameter optimization, and advanced training strategies on model robustness and predictive accuracy.

The study emphasizes the significance of loss function optimization and threshold adjustment strategies in improving the detection of minority classes. The comparative results reveal distinct performance trade-offs across model architectures, with transformer models achieving superior contextual understanding at the cost of computational complexity. At the same time, deep learning approaches(LSTM models) offer efficiency advantages. These findings establish evidence-based guidelines for model selection in real-world content moderation systems, striking a balance between accuracy requirements and operational constraints.

Past Defense Notices

Vincent Occhiogrosso

Development of Low-Cost Microwave and RF Modules for Compact, Fine-Resolution FMCW Radars

When & Where:

Monday, January 23, 2023 - 1:30 PM
Nichols Hall, Room 317 (Richard K. Moore Conference Room)

Committee Members:

Christopher Allen, Chair
Fernando Rodriguez-Morales, (Co-Chair)
Carl Leuschen

Abstract

The Center for Remote Sensing and Integrated Systems (CReSIS) has enabled the development of several radars for measuring ice and snow depth. One of these systems is the Ultra-Wideband (UWB) Snow Radar, which operates in microwave range and can provide measurements with cm-scale vertical resolution. To date, renditions of this system demand medium to high size, weight and power (SWaP) characteristics. To facilitate a more flexible and mobile measurement setup with these systems, it became necessary to reduce the SWaP of the radar electronics. This thesis focuses on the design of several compact RF and microwave modules enabling integration of a full UWB radar system weighing < 5 lbs and consuming < 30 W of DC power. This system is suitable for operation over either 12-18 GHz or 2-8 GHz in platforms with low SWaP requirements, such as unmanned aerial systems (UAS). The modules developed as a part of this work include a VCO-based chirp generation module, downconverter modules, and a set of modules for a receiver front end, each implemented on a low-cost laminate substrate. The chirp generator uses a Phase Locked Loop (PLL) based on an architecture previously developed at CReSIS and offers a small form factor with a frequency non-linearity of 0.0013% across the operating bandwidth (12-18 GHz) using sub-millisecond pulse durations. The down-conversion modules were created to allow for system operation in the S/C frequency band (2-8 GHz) as well as the default Ku band (12-18 GHz). Additionally, an RF receiver front end was designed, which includes a microwave receiver module for de-chirping and an IF module for signal conditioning before digitization. The compactness of the receiver modules enabled the demonstration of multi-channel data acquisition without multiplexing from two different aircraft. A radar test-bed largely based on this compact system was demonstrated in the laboratory and used as part of a dual-frequency instrument for a surface-based experiment in Antarctica. The laboratory performance of the miniaturized radar is comparable to the legacy 2-8 GHz snow radar and 12-18 GHz Ku-band radar systems. The 2-8 GHz system is currently being integrated into a class-I UAS.

Tianxiao Zhang

Efficient and Effective Convolutional Neural Networks for Object Detection and Recognition

When & Where:

Friday, December 16, 2022 - 1:00 PM
Nichols Hall, Room 246

Committee Members:

Bo Luo, Chair
Prasad Kulkarni
Fengjun Li
Cuncong Zhong
Guanghui Wang

Abstract

With the development of Convolutional Neural Networks (CNNs), computer vision enters a new era and the performance of image classification, object detection, segmentation, and recognition has been significantly improved. Object detection, as one of the fundamental problems in computer vision, is a necessary component of many computer vision tasks, such as image and video understanding, object tracking, instance segmentation, etc. In object detection, we need to not only recognize all defined objects in images or videos but also localize these objects, making it difficult to perfectly realize in real-world scenarios.

In this work, we aim to improve the performance of object detection and localization by adopting more efficient and effective CNN models. (1) We propose an effective and efficient approach for real-time detection and tracking of small golf balls based on object detection and the Kalman filter. For this purpose, we have collected and labeled thousands of golf ball images to train the learning model. We also implemented several classical object detection models and compared their performance in terms of detection precision and speed. (2) To address the domain shift problem in object detection, we propose to employ generative adversarial networks (GANs) to generate new images in different domains and then concatenate the original RGB images and their corresponding GAN-generated fake images to form a 6-channel representation of the image content. (3) We propose a strategy to improve label assignment in modern object detection models. The IoU (Intersection over Union) thresholds between the pre-defined anchors and the ground truth bounding boxes are significant to the definition of the positive and negative samples. Instead of using fixed thresholds or adaptive thresholds based on statistics, we introduced the predictions into the label assignment paradigm to dynamically define positive samples and negative samples so that more high-quality samples could be selected as positive samples. The strategy reduces the discrepancy between the classification scores and the IoU scores and yields more accurate bounding boxes.

Xiangyu Chen

Toward Data Efficient Learning in Computer Vision

When & Where:

Friday, December 16, 2022 - 11:00 AM
Nichols Hall, Room 246

Committee Members:

Cuncong Zhong, Chair
Prasad Kulkarni
Fengjun Li
Bo Luo
Guanghui Wang

Abstract

Deep learning leads the performance in many areas of computer vision. Deep neural networks usually require a large amount of data to train a good model with the growing number of parameters. However, collecting and labeling a large dataset is not always realistic, e.g. to recognize rare diseases in the medical field. In addition, both collecting and labeling data are labor-intensive and time-consuming. In contrast, studies show that humans can recognize new categories with even a single example, which is apparently in the opposite direction of current machine learning algorithms. Thus, data-efficient learning, where the labeled data scale is relatively small, has attracted increased attention recently. According to the key components of machine learning algorithms, data-efficient learning algorithms can also be divided into three folders, data-based, model-based, and optimization-based. In this study, we investigate two data-based models and one model-based approach.

First, to collect more data to increase data quantity. The most direct way for data-efficient learning is to generate more data to mimic data-rich scenarios. To achieve this, we propose to integrate both spatial and Discrete Cosine Transformation (DCT) based frequency representations to finetune the classifier. In addition to the quantity, another property of data is the quality to the model, different from the quality to human eyes. As language carries denser information than natural images. To mimic language, we propose to explicitly increase the input information density in the frequency domain. The goal of model-based methods in data-efficient learning is mainly to make models converge faster. After carefully examining the self-attention modules in Vision Transformers, we discover that trivial attention covers useful non-trivial attention due to its large amount. To solve this issue, we proposed to divide attention weights into trivial and non-trivial ones by thresholds and suppress the accumulated trivial attention weights. Extensive experiments have been performed to demonstrate the effectiveness of the proposed models.

Yousif Dafalla

Web-Armour: Mitigating Reconnaissance and Vulnerability Scanning with Injecting Scan-Impeding Delays in Web Deployments

When & Where:

Thursday, December 15, 2022 - 1:00 PM
Nichols Hall, Room 250 (Gemini Room)

Committee Members:

Alex Bardas, Chair
Drew Davidson
Fengjun Li
Bo Luo
ZJ Wang

Abstract

Scanning hosts on the internet for vulnerable devices and services is a key step in numerous cyberattacks. Previous work has shown that scanning is a widespread phenomenon on the internet and commonly targets web application/server deployments. Given that automated scanning is a crucial step in many cyberattacks, it would be beneficial to make it more difficult for adversaries to perform such activity.

In this work, we propose Web-Armour, a mitigation approach to adversarial reconnaissance and vulnerability scanning of web deployments. The proposed approach relies on injecting scanning impeding delays to infrequently or rarely used portions of a web deployment. Web-Armour has two goals: First, increase the cost for attackers to perform automated reconnaissance and vulnerability scanning; Second, introduce minimal to negligible performance overhead to benign users of the deployment. We evaluate Web-Armour on live environments, operated by real users, and on different controlled (offline) scenarios. We show that Web-Armour can effectively lead to thwarting reconnaissance and internet-wide scanning.

Sandhya Kandaswamy

An Empirical Evaluation of Multi-Resource Scheduling for Moldable Workflows

When & Where:

Friday, December 9, 2022 - 3:30 PM
Eaton Hall, Room 2001B

Committee Members:

Hongyang Sun, Chair
Suzanne Shontz
Heechul Yun

Abstract

Resource scheduling plays a vital role in High-Performance Computing (HPC) systems. However, most scheduling research in HPC has focused on only a single type of resource (e.g., computing cores or I/O resources). With the advancement in hardware architectures and the increase in data-intensive HPC applications, there is a need to simultaneously embrace a diverse set of resources (e.g., computing cores, cache, memory, I/O, and network resources) in the design of runtime schedulers for improving the overall application performance. This thesis performs an empirical evaluation of a recently proposed multi-resource scheduling algorithm for minimizing the overall completion time (or makespan) of computational workflows comprised of moldable parallel jobs. Moldable parallel jobs allow the scheduler to select the resource allocations at launch time and thus can adapt to the available system resources (as compared to rigid jobs) while staying easy to design and implement (as compared to malleable jobs). The algorithm was proven to have a worst-case approximation ratio that grows linearly with the number of resource types for moldable workflows. In this thesis, a comprehensive set of simulations is conducted to empirically evaluate the performance of the algorithm using synthetic workflows generated by DAGGEN and moldable jobs that exhibit different speedup profiles. The results show that the algorithm fares better than the theoretical bound predicts, and it consistently outperforms two baseline heuristics under a variety of parameter settings, illustrating its robust practical performance.

Bernaldo Luc

FPGA Implementation of an FFT-Based Carrier Frequency Estimation Algorithm

When & Where:

Friday, December 9, 2022 - 10:00 AM
Eaton Hall, Room 2001B

Committee Members:

Erik Perrins, Chair
Morteza Hashemi
Rongqing Hui

Abstract

Carrier synchronization is an essential part of digital communication systems. In essence, carrier synchronization is the process of estimating and correcting any carrier phase and frequency differences between the transmitted and received signals. Typically, carrier synchronization is achieved using a phase lock loop (PLL) system; however, this method is unreliable when experiencing frequency offsets larger than 30 kHz. This thesis evaluates the FPGA implementation of a combined FFT and PLL-based carrier phase synchronization system. The algorithm includes non-data-aided, FFT-based, frequency estimator used to initialize a data-aided, PLL-based phase estimator. The frequency estimator algorithm employs a resource-efficient strategy of averaging several small FFTs instead of using one large FFT, which results in a rough estimate of the frequency offset. Since it is initialized with a rough frequency estimate, this hybrid design allows the PLL to start in a state close to frequency lock and focus mainly on phase synchronization. The results show that the algorithm demonstrates comparable performance, based on performance metrics such as bit-error rate (BER) and estimator error variance, to alternative frequency estimation strategies and simulation models. Moreover, the FFT-initialized PLL approach improves the frequency acquisition range of the PLL while achieving similar BER performance as the PLL-only system.

Rakshitha Vidhyashankar

An empirical study of temporal knowledge graph and link prediction using longitudinal editorial data

When & Where:

Thursday, December 8, 2022 - 2:30 PM
Eaton Hall, Room 2001B

Committee Members:

Zijun Yao, Chair
Prasad Kulkarni
Hongyang Sun

Abstract

Natural Language Processing (NLP) is an application of Machine Learning (ML) which focuses on deriving useful and underlying facts through the semantics in articles to automatically extract insights about how information can be pictured, presented, and interpreted. Knowledge graphs, as a promising medium for carrying the structured linguistical piece, can be a desired target for learning and visualization through artificial neural networks, in order to identify the absent information and understand the hidden transitive relationship among them. In this study, we aim to construct Temporal Knowledge Graphs of sematic information to facilitate better visualization of editorial data. Further, A neural network-based approach for link prediction is carried out on the constructed knowledge graphs. This study uses news articles in English language, from New York Times (NYT) collected over a period of time for experiments. The sentences in these articles can be decomposed into Part-Of-Speech (POS) Tags to give a triple t = {sub, pred, obj}. A directed Graph G (V, E) is constructed using POS tags, such that the Set of Vertices is the grammatical constructs that appear in the sentence and the Set of Edges is the directed relation between the constructs. The main challenge that arises with knowledge graphs is the storage constraints that arise in lieu of storing the graph information. The study proposes ways by which this can be handled. Once these graphs are constructed, a neural architecture is trained to learn the graph embeddings which can be utilized to predict the potentially missing links which are transitive in nature. The results are evaluated using learning-to-rank metrics such Mean Reciprocal Rank (MRR).

Jace Kline

A Framework for Assessing Decompiler Inference Accuracy of Source-Level Program Constructs

When & Where:

Thursday, December 8, 2022 - 9:30 AM
Eaton Hall, Room 2001B

Committee Members:

Prasad Kulkarni, Chair
Perry Alexander
Bo Luo

Abstract

Decompilation is the process of reverse engineering a binary program into an equivalent source code representation with the objective to recover high-level program constructs such as functions, variables, data types, and control flow mechanisms. Decompilation is applicable in many contexts, particularly for security analysts attempting to decipher the construction and behavior of malware samples. However, due to the loss of information during compilation, this process is naturally speculative and thus is prone to inaccuracy. This inherent speculation motivates the idea of an evaluation framework for decompilers.

In this work, we present a novel framework to quantitatively evaluate the inference accuracy of decompilers, regarding functions, variables, and data types. Within our framework, we develop a domain-specific language (DSL) for representing such program information from any "ground truth" or decompiler source. Using our DSL, we implement a strategy for comparing ground truth and decompiler representations of the same program. Subsequently, we extract and present insightful metrics illustrating the accuracy of decompiler inference regarding functions, variables, and data types, over a given set of benchmark programs. We leverage our framework to assess the correctness of the Ghidra decompiler when compared to ground truth information scraped from DWARF debugging information. We perform this assessment over a subset of the GNU Core Utilities (Coreutils) programs and discuss our findings.

Jaypal Singh

EvalIt: Skill Evaluation using block chain

When & Where:

Wednesday, December 7, 2022 - 2:00 PM
Eaton Hall, Room 2001B

Committee Members:

Drew Davidson, Chair
David Johnson
Hongyang Sun

Abstract

Skills validation is a key issue when hiring workers. Companies and universities often face difficulties in determining an applicant's skills because certification of the skills claimed by an applicant is usually not readily verifiable and verification is costly. Also, from applicant's perspective, skill evaluation from industry expert is valuable instead of learning a generalized course with certification. Most of the certification programs are easy and proved not so fruitful in learning the required work skills. Blockchain has been proposed in the literature for functional verification and tamper-proof information storage in a decentralized way. "EvalIt" is a blockchain-based Dapp that addresses the above issues and guarantees some desirable properties. The Dapp facilitates skill evaluation efforts through payments using tokens that it collects from payments made by users of the platform.

Soma Pal

Properties of Profile-guided Compiler Optimization with GCC and LLVM

When & Where:

Wednesday, December 7, 2022 - 10:00 AM
Eaton Hall, Room 2001B

Committee Members:

Prasad Kulkarni, Chair
Mohammad Alian
Tamzidul Hoque

Abstract

Profile-guided optimizations (PGO) are a class of sophisticated compiler transformations that employ information regarding the profile or execution time behavior of a program to improve program performance, typically speed. PGOs for popular language platforms, like C, C++, and Java, are generally regarded as a mature and mainstream technology and are supported by most standard compilers. Consequently, properties and characteristics of PGOs are assumed to be established and known but have rarely been systematically studied with multiple mainstream compilers.

The goal of this work is to explore and report some important properties of PGOs in mainstream compilers, specifically GCC and LLVM in this work. We study the performance delivered by PGOs at the program and function-level, impact of different execution profiles on PGO performance, and compare relative PGO benefit delivered by different mainstream compilers. We also built the experimental framework to conduct this research. We expect that our work will help focus future research and assist in building frameworks to field PGOs in actual systems.