Defense Notices


All students and faculty are welcome to attend the final defense of EECS graduate students completing their M.S. or Ph.D. degrees. Defense notices for M.S./Ph.D. presentations for this year and several previous years are listed below in reverse chronological order.

Students who are nearing the completion of their M.S./Ph.D. research should schedule their final defenses through the EECS graduate office at least THREE WEEKS PRIOR to their presentation date so that there is time to complete the degree requirements check and to post the presentation announcement online.

Upcoming Defense Notices

Andrew Riachi

An Investigation Into The Memory Consumption of Web Browsers and A Memory Profiling Tool Using Linux Smaps

When & Where:


Nichols Hall, Room 246 (Executive Conference Room)

Committee Members:

Prasad Kulkarni, Chair
Perry Alexander
Drew Davidson
Heechul Yun

Abstract

Web browsers are notorious for consuming large amounts of memory. Yet, they have become the dominant framework for writing GUIs because web languages are ergonomic for programmers and have cross-platform reach. These benefits are so enticing that even a large portion of mobile apps, which have to run on resource-constrained devices, run a web browser under the hood. Therefore, it is important to keep the memory consumption of web browsers as low as practicable.

In this thesis, we investigate the memory consumption of web browsers, in particular compared to applications written in native GUI frameworks. We introduce smaps-profiler, a tool to profile the overall memory consumption of Linux applications that can report memory usage other profilers simply do not measure. Using this tool, we conduct experiments that suggest that most of the extra memory usage compared to native applications could be due to the size of the web browser program itself. We discuss our experiments and findings, and conclude that even more rigorous studies are needed to profile GUI applications.
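
As context for the profiling approach named in the title, the sketch below shows the kind of per-mapping accounting that /proc/&lt;pid&gt;/smaps enables on Linux: summing the proportional set size (Pss) reported for each mapping. It only illustrates the data source and is not the smaps-profiler tool itself.

```python
#!/usr/bin/env python3
"""Minimal sketch: sum proportional set size (Pss) per mapping from /proc/<pid>/smaps.
This is not the smaps-profiler tool described above; it only illustrates the kind of
per-mapping accounting that the smaps interface makes possible."""
import re
import sys
from collections import defaultdict

HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s")   # mapping header: "addr-addr perms offset dev inode [path]"

def pss_by_mapping(pid):
    totals = defaultdict(int)                    # kB of Pss attributed to each mapped path
    current = "[anonymous]"
    with open(f"/proc/{pid}/smaps") as f:
        for line in f:
            if HEADER.match(line):
                parts = line.split()
                current = parts[5] if len(parts) > 5 else "[anonymous]"
            elif line.startswith("Pss:"):
                totals[current] += int(line.split()[1])   # value is reported in kB
    return totals

if __name__ == "__main__":
    for path, kb in sorted(pss_by_mapping(sys.argv[1]).items(), key=lambda x: -x[1])[:10]:
        print(f"{kb:10d} kB  {path}")
```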


Elizabeth Wyss

A New Frontier for Software Security: Diving Deep into npm

When & Where:


Eaton Hall, Room 2001B

Committee Members:

Drew Davidson, Chair
Alex Bardas
Fengjun Li
Bo Luo
J. Walker

Abstract

Open-source package managers (e.g., npm for Node.js) have become an established component of modern software development. Rather than creating applications from scratch, developers may employ modular software dependencies and frameworks--called packages--to serve as building blocks for writing larger applications. Package managers make this process easy. With a simple command line directive, developers are able to quickly fetch and install packages across vast open-source repositories. npm--the largest of such repositories--alone hosts millions of unique packages and serves billions of package downloads each week. 

However, the widespread code sharing resulting from open-source package managers also presents novel security implications. Vulnerable or malicious code hiding deep within package dependency trees can be leveraged downstream to attack both software developers and the end-users of their applications. This downstream flow of software dependencies--dubbed the software supply chain--is critical to secure.

This research provides a deep dive into the npm-centric software supply chain, exploring distinctive phenomena that impact its overall security and usability. Such factors include (i) hidden code clones--which may stealthily propagate known vulnerabilities, (ii) install-time attacks enabled by unmediated installation scripts, (iii) hard-coded URLs residing in package code, (iv) the impacts of open-source development practices, (v) package compromise via malicious updates, (vi) spammers disseminating phishing links within package metadata, and (vii) abuse of cryptocurrency protocols designed to reward the creators of high-impact packages. For each facet, tooling is presented to identify and/or mitigate potential security impacts. Ultimately, it is our hope that this research fosters greater awareness, deeper understanding, and further efforts to forge a new frontier for the security of modern software supply chains. 
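
As one concrete illustration of the install-time attack surface mentioned above, npm automatically runs lifecycle scripts such as preinstall, install, and postinstall during installation. The sketch below flags a package whose package.json declares such scripts; it is a toy check, not the tooling developed in this dissertation.

```python
#!/usr/bin/env python3
"""Toy check for install-time lifecycle scripts in an npm package manifest.
Not the tooling from this dissertation; just a sketch of one signal it discusses."""
import json
import sys

INSTALL_HOOKS = ("preinstall", "install", "postinstall")   # run automatically by `npm install`

def install_scripts(package_json_path):
    with open(package_json_path) as f:
        manifest = json.load(f)
    scripts = manifest.get("scripts", {})
    return {name: cmd for name, cmd in scripts.items() if name in INSTALL_HOOKS}

if __name__ == "__main__":
    hooks = install_scripts(sys.argv[1])
    if hooks:
        print("package declares install-time scripts:")
        for name, cmd in hooks.items():
            print(f"  {name}: {cmd}")
    else:
        print("no install-time scripts found")
```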


Alfred Fontes

Optimization and Trade-Space Analysis of Pulsed Radar-Communication Waveforms using Constant Envelope Modulations

When & Where:


Nichols Hall, Room 246 (Executive Conference Room)

Committee Members:

Patrick McCormick, Chair
Shannon Blunt
Jonathan Owen


Abstract

Dual function radar communications (DFRC) is a method of co-designing a single radio frequency system to perform simultaneous radar and communications service. DFRC is ultimately a compromise between radar sensing performance and communications data throughput due to the conflicting requirements between the sensing and information-bearing signals.

A novel waveform-based DFRC approach is phase attached radar communications (PARC), where a communications signal is embedded onto a radar pulse via phase modulation between the two signals. The PARC framework is used here in a new waveform design technique that designs the radar component of a PARC signal so that the expected power spectral density (PSD) of the PARC DFRC waveform matches a desired spectral template. This provides better control over the PARC signal spectrum, which mitigates the issue of PARC radar performance degradation from spectral growth due to the communications signal.
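
In simplified notation (ours, not quoted from the thesis), a phase-attached pulse can be pictured as a constant-envelope signal whose phase is the sum of a radar phase and a communications phase:

```latex
% Simplified illustration (our notation, not taken from the thesis): a PARC-style pulse
% attaches a communications phase to the radar phase, keeping a constant envelope.
\[
  s(t) = \exp\!\big( j\,[\, \psi_{\mathrm{radar}}(t) + \phi_{\mathrm{comm}}(t) \,] \big),
  \qquad 0 \le t \le T_p ,
\]
% where $\psi_{\mathrm{radar}}(t)$ is the phase of the underlying radar waveform,
% $\phi_{\mathrm{comm}}(t)$ carries the embedded communication symbols, and $T_p$ is the
% pulse width. Shaping the radar phase then controls the expected PSD of the composite signal.
```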

The characteristics of optimized PARC waveforms are then analyzed to establish a trade-space between radar and communications performance within a PARC DFRC scenario. This is done by sampling the DFRC trade-space continuum with waveforms that contain a varying degree of communications bandwidth, from a pure radar waveform (no embedded communications) to a pure communications waveform (no radar component). Radar performance, which is degraded by range sidelobe modulation (RSM) from the communications signal randomness, is measured from the PARC signal variance across pulses; data throughput is established as the communications performance metric. Comparing the values of these two measures as a function of communications symbol rate explores the trade-offs in performance between radar and communications with optimized PARC waveforms.


Qua Nguyen

Hybrid Array and Privacy-Preserving Signaling Optimization for NextG Wireless Communications

When & Where:


Zoom Defense, please email jgrisafe@ku.edu for link.

Committee Members:

Erik Perrins, Chair
Morteza Hashemi
Zijun Yao
Taejoon Kim
KC Kong

Abstract

This PhD research tackles two critical challenges in NextG wireless networks: hybrid precoder design for wideband sub-Terahertz (sub-THz) massive multiple-input multiple-output (MIMO) communications and privacy-preserving federated learning (FL) over wireless networks.

In the first part, we propose a novel hybrid precoding framework that integrates true-time delay (TTD) devices and phase shifters (PS) to counteract the beam squint effect, a significant challenge in wideband sub-THz massive MIMO systems that leads to considerable loss in array gain. Unlike previous methods that only designed TTD values while fixing PS values and assuming unbounded time delays, our approach jointly optimizes TTD and PS values under realistic time-delay constraints. We determine the minimum number of TTD devices required to achieve a target array gain using our proposed approach. Then, we extend the framework to multi-user wideband systems and formulate a hybrid array optimization problem aiming to maximize the minimum data rate across users. This problem is decomposed into two sub-problems: fair subarray allocation, solved via continuous domain relaxation, and subarray gain maximization, addressed via a phase-domain transformation.
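
As background on why TTD devices help with beam squint (standard array-processing reasoning, not a result specific to this work), the sketch below contrasts the frequency-flat response of a phase shifter with the frequency-proportional phase of a true-time delay:

```latex
% Background sketch (standard array-processing reasoning, not a result from this work).
% For element n with spacing d and steering angle theta, ideal wideband steering needs
% the frequency-dependent phase
\[
  e^{-j 2\pi f \, n d \sin\theta / c}\quad\text{for all frequencies } f .
\]
% A true-time-delay element with delay $\tau_n = n d \sin\theta / c$ produces exactly
% $e^{-j 2\pi f \tau_n}$, matching this at every $f$, whereas a phase shifter applies a
% frequency-flat phase $e^{j\phi_n}$ that can match it only at a single frequency. The
% mismatch at other frequencies is the beam squint that the TTD+PS hybrid design counteracts.
```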

The second part focuses on preserving privacy in FL over wireless networks. First, we design a differentially-private FL algorithm that applies time-varying noise variance perturbation. Taking advantage of existing wireless channel noise, we jointly design differential privacy (DP) noise variances and users' transmit power to resolve the tradeoffs between privacy and learning utility. Next, we tackle two critical challenges within FL networks: (i) privacy risks arising from model updates and (ii) reduced learning utility due to quantization heterogeneity. Prior work typically addresses only one of these challenges, because maintaining learning utility under both privacy risks and quantization heterogeneity is a non-trivial task. We propose an approach to improve the learning utility of privacy-preserving FL that allows clusters of devices with different quantization resolutions to participate in each FL round. Specifically, we introduce a novel stochastic quantizer (SQ) that ensures a DP guarantee and minimal quantization distortion. To address quantization heterogeneity, we introduce a cluster size optimization technique combined with a linear fusion approach to enhance model aggregation accuracy. Lastly, inspired by the information-theoretic rate-distortion framework, a privacy-distortion tradeoff problem is formulated to minimize privacy loss under a given maximum allowable quantization distortion. The optimal solution to this problem is identified, revealing that the privacy loss decreases as the maximum allowable quantization distortion increases, and vice versa.
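
For readers unfamiliar with the basic mechanism, the sketch below shows the standard clipped-update Gaussian perturbation used in differentially-private federated learning, with a per-round noise schedule standing in loosely for the time-varying noise variance idea. It is generic background, not the algorithms proposed in this work; all names and values are placeholders.

```python
import numpy as np

def dp_perturb_update(update, clip_norm, noise_std, rng):
    """Generic DP treatment of one client's model update: clip its L2 norm to bound
    sensitivity, then add Gaussian noise. This mirrors the usual Gaussian mechanism in
    DP federated learning; it is NOT the specific time-varying noise-variance or
    stochastic-quantizer designs proposed in this dissertation."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    return clipped + rng.normal(0.0, noise_std, size=update.shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    update = rng.normal(size=10)                      # stand-in for a client's model update
    for t, sigma in enumerate([1.0, 0.8, 0.6]):       # hypothetical per-round noise schedule
        noisy = dp_perturb_update(update, clip_norm=1.0, noise_std=sigma, rng=rng)
        print(t, np.round(noisy, 3))
```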

This research advances hybrid array optimization for wideband sub-THz massive MIMO and introduces novel algorithms for privacy-preserving quantized FL with diverse precision. These contributions enable high-throughput wideband MIMO communication systems and privacy-preserving AI-native designs, aligning with the performance and privacy protection demands of NextG networks.


Arin Dutta

Performance Analysis of Distributed Raman Amplification with Different Pumping Configurations

When & Where:


Nichols Hall, Room 246 (Executive Conference Room)

Committee Members:

Rongqing Hui, Chair
Morteza Hashemi
Rachel Jarvis
Alessandro Salandrino
Hui Zhao

Abstract

As internet services like high-definition videos, cloud computing, and artificial intelligence keep growing, optical networks need to keep up with the demand for more capacity. Optical amplifiers play a crucial role in offsetting fiber loss and enabling long-distance wavelength division multiplexing (WDM) transmission in high-capacity systems. Various methods have been proposed to enhance the capacity and reach of fiber communication systems, including advanced modulation formats, dense wavelength division multiplexing (DWDM) over ultra-wide bands, space-division multiplexing, and high-performance digital signal processing (DSP) technologies. To maintain higher data rates along with maximizing the spectral efficiency of multi-level modulated signals, a higher Optical Signal-to-Noise Ratio (OSNR) is necessary. Despite advancements in coherent optical communication systems, the spectral efficiency of multi-level modulated signals is ultimately constrained by fiber nonlinearity. Raman amplification is an attractive solution for wide-band amplification with low noise figures in multi-band systems.

Distributed Raman Amplification (DRA) has been deployed in recent high-capacity transmission experiments to achieve a relatively flat signal power distribution along the optical path, and it offers the unique advantage of using conventional low-loss silica fibers as the gain medium, effectively transforming passive optical fibers into active or amplifying waveguides. Also, DRA provides gain at any wavelength by selecting the appropriate pump wavelength, enabling operation in signal bands outside the Erbium-doped fiber amplifier (EDFA) bands. A forward (FW) Raman pumping configuration in DRA can be adopted to further improve DRA performance, as it is more efficient in OSNR improvement because the optical noise is generated near the beginning of the fiber span and attenuated along the fiber. A dual-order FW pumping scheme helps to reduce the nonlinear effect on the optical signal and improves OSNR by more uniformly distributing the Raman gain along the transmission span.

The major concern with Forward Distributed Raman Amplification (FW DRA) is the fluctuation in pump power, known as relative intensity noise (RIN), which transfers from the pump laser to both the intensity and phase of the transmitted optical signal as they propagate in the same direction. Another concern of FW DRA is the rise in signal optical power near the start of the fiber span, leading to an increase in the nonlinear phase shift of the signal. These factors, including RIN-transfer-induced noise and nonlinear noise, contribute to the degradation of system performance in FW DRA systems at the receiver.

As the performance of DRA with backward pumping is well understood, with relatively low impact of RIN transfer, our research is focused on the FW pumping configuration and is intended to provide a comprehensive analysis of the system performance impact of dual-order FW Raman pumping, including signal intensity and phase noise induced by the RINs of both the 1st and 2nd order pump lasers, as well as the impacts of linear and nonlinear noise. The efficiencies of pump RIN to signal intensity and phase noise transfer are theoretically analyzed and experimentally verified by applying a shallow intensity modulation to the pump laser to mimic the RIN. The results indicate that the efficiency of the 2nd order pump RIN to signal phase noise transfer can be more than 2 orders of magnitude higher than that from the 1st order pump. Then the performance of the dual-order FW Raman configurations is compared with that of single-order Raman pumping to understand trade-offs of system parameters. The nonlinear interference (NLI) noise is analyzed to study the overall OSNR improvement when employing a 2nd order Raman pump. Finally, a DWDM system with 16-QAM modulation is used as an example to investigate the benefit of DRA with dual-order Raman pumping and with different pump RIN levels. We also consider a DRA system using a 1st order incoherent pump together with a 2nd order coherent pump. Although dual-order FW pumping corresponds to a slight increase of linear amplified spontaneous emission (ASE) compared to using only a 1st order pump, its major advantage comes from the reduction of nonlinear interference noise in a DWDM system. Because the RIN of the 2nd order pump has a much higher impact than that of the 1st order pump, there should be a more stringent requirement on the RIN of the 2nd order pump laser when a dual-order FW pumping scheme is used for DRA in efficient fiber-optic communication. Also, the result of the system performance analysis reveals that higher baud rate systems, like those operating at 100 Gbaud, are less affected by pump laser RIN due to the low-pass characteristics of the transfer of pump RIN to signal phase noise.


Audrey Mockenhaupt

Using Dual Function Radar Communication Waveforms for Synthetic Aperture Radar Automatic Target Recognition

When & Where:


Nichols Hall, Room 246 (Executive Conference Room)

Committee Members:

Patrick McCormick, Chair
Shannon Blunt
Jon Owen


Abstract

As machine learning (ML), artificial intelligence (AI), and deep learning continue to advance, their applications become more diverse; one such application is synthetic aperture radar (SAR) automatic target recognition (ATR). These SAR ATR networks use different forms of deep learning, such as convolutional neural networks (CNNs), to classify targets in SAR imagery. An emerging research area of SAR is dual function radar communication (DFRC), which performs both radar and communications functions using a single co-designed modulation. The utilization of DFRC emissions for SAR imaging impacts image quality, thereby influencing SAR ATR network training. Here, using the Civilian Vehicle Data Dome dataset from the AFRL, SAR ATR networks are trained and evaluated with simulated data generated using Gaussian Minimum Shift Keying (GMSK) and Linear Frequency Modulation (LFM) waveforms. The networks are used to compare how the target classification accuracy of the ATR network differs between DFRC (i.e., GMSK) and baseline (i.e., LFM) emissions. Furthermore, as is common in pulse-agile transmission structures, an effect known as 'range sidelobe modulation' is examined, along with its impact on SAR ATR. Finally, it is shown that a SAR ATR network can be trained for GMSK emissions using existing LFM datasets via two types of data augmentation.
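
For orientation, the sketch below is a minimal CNN image classifier of the general kind used for SAR ATR, written with the Keras API. The chip size, number of classes, and architecture are placeholders chosen for illustration; they are not the networks trained in this work.

```python
import tensorflow as tf

# Minimal CNN classifier for small SAR image chips. A sketch only: the 64x64 chip size,
# 10-class output, and layer sizes are placeholders, not the networks evaluated here.
def build_sar_atr_model(chip_size=64, num_classes=10):
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(chip_size, chip_size, 1)),   # single-channel SAR magnitude chip
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])

model = build_sar_atr_model()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_chips, train_labels, validation_data=(val_chips, val_labels), epochs=20)
```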


Past Defense Notices


Ronald Andrews

Evaluating the Proliferation and Pervasiveness of Leaking Sensitive Data in the Secure Shell Protocol and in Internet Protocol Camera Frameworks

When & Where:


246 Nichols Hall

Committee Members:

Alex Bardas, Chair
Fengjun Li
Bo Luo


Abstract

In George Orwell's 1984, there is fear regarding what “Big Brother” knows, since even thoughts could be “heard”. Though we are not quite to this point, it should concern us all what data we are transferring, both intentionally and unintentionally, and whether or not that data is being “leaked”. In this work, we consider the evolving landscape of IoT devices and the threat posed by the pervasive botnets that have been forming over the last several years. We look at two specific cases in this work. The first is the practical application of a botnet system actively executing a man-in-the-middle attack against SSH; the second leverages the same paradigm for eavesdropping on Internet Protocol (IP) cameras. For the latter case, we construct a web portal for interrogating IP cameras directly for information that they may be exposing.


Kevin Carr

Development of a Multichannel Wideband Radar Demonstrator

When & Where:


317 Nichols Hall, (Moore Conference Room)

Committee Members:

Carl Leuschen, Chair
Fernando Rodriguez-Morales
James Stiles


Abstract

With the rise of software-defined radios (SDRs) and the trend towards integrating more RF components into MMICs, the cost and complexity of multichannel radar development have gone down. High-speed RF data converters have seen continuous increases in both sampling rate and resolution, further rendering a growing subset of components in an RF chain unnecessary. A recent development in this trend is the Xilinx RFSoC, which integrates multiple high-speed data converters into the same package as an FPGA. The Center for Remote Sensing of Ice Sheets (CReSIS) is regularly upgrading its suite of sensor platforms, spanning from HF depth sounders to Ka-band altimeters. A radar platform was developed around the RFSoC to demonstrate the capabilities of the chip when acting as a digital backend and to evaluate its role in future radar designs at CReSIS. A new ultra-wideband (UWB) FMCW RF frontend was designed that consists of multiple transmit and receive modules operating at microwave frequencies with multi-GHz bandwidth. An antenna array was constructed out of printed-circuit elements to validate radar system performance. Firmware developed for the RFSoC enables radar features that will prove useful in future sensor platforms used for the remote sensing of snow, soil moisture, or crop canopies.

 


Ruturaj Kiran Vaidya

Implementing SoftBound on Binary Executables

When & Where:


2001 B Eaton Hall

Committee Members:

Prasad Kulkarni, Chair
Alex Bardas
Drew Davidson


Abstract

Though languages like C and C++ are known to be memory unsafe, they are still used widely in industry because of their memory management features, low-level nature, and performance benefits. Also, as most systems software has been written using these languages, replacing them with memory-safe languages altogether is currently impossible. Memory safety violations are commonplace, despite the fact that there have been numerous attempts to conquer them using source code, compiler, and post-compilation based approaches. SoftBound is a compiler-based technique that enforces spatial memory safety for C/C++ programs. However, SoftBound needs and depends on program information available in the high-level source code. The goal of our work is to develop a mechanism to efficiently and effectively implement a technique, like SoftBound, to provide spatial memory safety for binary executables. Our approach employs a combination of static-time analysis (using Ghidra) and dynamic-time instrumentation checks (using Pin). SoftBound is a pointer-based approach, which stores base and bound information per pointer. Our implementation determines the array and pointer access patterns statically using reverse engineering techniques in Ghidra. This static information is used by the Pin dynamic binary instrumentation tool to check the correctness of each load and store instruction at run-time. Our technique works without any source code support, and no hardware or compiler alterations are needed. We evaluate the effectiveness, limitations, and performance of our implementation. Our tool detects spatial memory errors in about 57% of the test cases and induces about 6% average overhead over that caused by a minimal pintool.
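
To make the pointer-based idea concrete, the toy sketch below models the core bookkeeping: every pointer carries base and bound metadata, and every load or store is checked against it before the access happens. It only illustrates the concept; the actual work instruments x86 binaries with Ghidra and Pin rather than wrapping accesses in Python.

```python
"""Toy model of SoftBound-style spatial checking (illustration only; the real system
instruments binary executables with Ghidra and Pin, not Python objects)."""

class BoundsError(Exception):
    pass

class CheckedPointer:
    """A 'pointer' carrying base/bound metadata, checked on every access."""
    def __init__(self, memory, base, bound):
        self.memory, self.base, self.bound = memory, base, bound

    def _check(self, addr, size=1):
        if addr < self.base or addr + size > self.bound:
            raise BoundsError(f"access at {addr} outside [{self.base}, {self.bound})")

    def load(self, offset):
        addr = self.base + offset
        self._check(addr)
        return self.memory[addr]

    def store(self, offset, value):
        addr = self.base + offset
        self._check(addr)
        self.memory[addr] = value

memory = bytearray(64)
p = CheckedPointer(memory, base=16, bound=32)   # a 16-byte "allocation"
p.store(0, 0x41)          # in bounds: allowed
try:
    p.store(16, 0x42)     # one past the end: rejected by the bounds check
except BoundsError as e:
    print("caught:", e)
```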


Chinmay Ratnaparkhi

A comparison of data mining based on a single local probabilistic approximation and the MLEM2 algorithm

When & Where:


2001 B Eaton Hall

Committee Members:

Jerzy Grzymala-Busse, Chair
Fengjun Li
Bo Luo


Abstract

Observational data produced in scientific experimentation and in day to day life is a valuable source of information for research. It can be challenging to extract meaningful inferences from large amounts of data. Data mining offers many algorithms to draw useful inferences from large pools of information based on observable patterns.

In this project, I have implemented one such data mining algorithm for determining a single local probabilistic approximation, which also computes the corresponding rule set, and compared it with two versions of the MLEM2 algorithm, which induce a certain rule set and a possible rule set, respectively. For experimentation, eight data sets with 35% missing values were used to induce corresponding rule sets and classify unseen cases. Two different interpretations of missing values were used, namely lost values and do not care conditions. The k-fold cross-validation technique was employed with k=10 to identify error rates in classification.

The goal of this project was to compare how accurately unseen cases are classified by the rule sets induced by each of the aforementioned algorithms. The error rate calculated from the k-fold cross-validation technique was also used to observe how each interpretation of missing values affects the rule set.
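
As background on the evaluation protocol, the sketch below computes a 10-fold cross-validation error rate with scikit-learn. The data set and classifier are placeholders; the project itself evaluates rule sets induced by the probabilistic-approximation and MLEM2 algorithms rather than a decision tree.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

# Generic 10-fold cross-validation error rate (placeholder data and classifier).
X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=10, shuffle=True, random_state=0)

errors = []
for train_idx, test_idx in kf.split(X):
    clf = DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    errors.append(1.0 - clf.score(X[test_idx], y[test_idx]))   # per-fold error rate

print(f"mean error rate over 10 folds: {np.mean(errors):.3f}")
```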


Govind Vedala

Digital Compensation of Transmission Impairments in Multi-Subcarrier Fiber Optic Transmission Systems

When & Where:


246 Nichols Hall

Committee Members:

Ron Hui, Chair
Christopher Allen
Erik Perrins
Alessandro Salandrino
Carey Johnson

Abstract

Time and again, the fiber-optic medium has proved to be the best means for transporting global data traffic, which is following an exponential growth trajectory. The rapid development of high-bandwidth applications over the past decade, based on virtual reality, 5G, and big data to name a few, has resulted in a sudden surge of research activities across the globe to maximize effective utilization of the available fiber bandwidth, which until then was supporting low-speed services like voice and low-bandwidth data traffic. To this end, higher-order modulation formats together with multi-subcarrier, superchannel-based fiber optic transmission systems have proved to enhance spectral efficiency and achieve multi-terabit-per-second data rates. However, spectrally efficient systems are extremely sensitive to transmission impairments stemming from both optical devices and the fiber itself. Therefore, such systems mandate the use of robust digital signal processing (DSP) to compensate and/or mitigate the undesired artifacts, thereby extending the transmission reach. The central theme of this dissertation is to propose and validate a few efficient DSP techniques to compensate specific impairments, as delineated in the next three paragraphs.
For short-reach applications, we experimentally demonstrate a digital compensation technique to undo semiconductor optical amplifier (SOA) and photodiode nonlinearity effects by digitally backpropagating the received signal through a virtual SOA with inverse gain characteristics, followed by an iterative algorithm to cancel signal-signal beat interference arising from the photodiode.

We characterize the phase dynamics of comb lines from a quantum-dot passively mode-locked laser based on a novel multiheterodyne coherent detection technique. In the context of a multi-subcarrier, Nyquist pulse shaped, superchannel transmission system with coherent detection, we demonstrate through measurements and numerical simulations an efficient phase noise compensation technique called “Digital Mixing” that operates using a shared pilot tone, exploiting the mutual phase coherence among the comb lines.

Finally, we propose and experimentally validate a practical pilot-aided relative phase noise compensation technique for forward-pumped distributed Raman amplified, digital subcarrier multiplexed coherent transmission systems.


Tong Xu

Real-time DSP-enabled digital subcarrier cross-connect (DSXC) for optical communication systems and networks

When & Where:


246 Nichols Hall

Committee Members:

Ron Hui, Chair
Christopher Allen
Esam Eldin Aly
Erik Perrins
Jie Han

Abstract

Elastic optical networking (EON) is intended to offer flexible channel wavelength granularity to meet the requirement of high spectral efficiency (SE) in today’s optical networks. However, optical cross-connects (OXCs) and switches based on optical wavelength division multiplexing (WDM) are not flexible enough, due to the coarse bandwidth granularity imposed by optical filtering. Thus, an OXC may not meet the requirements of many applications which require finer bandwidth granularities than those carried by an entire wavelength channel.

In order to achieve highly flexible and sufficiently fine bandwidth granularities, an electrical digital subcarrier cross-connect (DSXC) can be utilized in EON. As presented in this thesis, my research work focuses on the investigation and implementation of a real-time digital signal processing (DSP) enabled DSXC which can dynamically assign both bandwidth and power to each individual sub-wavelength channel, known as a subcarrier. This DSXC is based on digital subcarrier multiplexing (DSCM), which is a frequency division multiplexing (FDM) technique that multiplexes a large number of digitally created subcarriers on each optical wavelength. Compared with OXC based on optical WDM, DSXC based on DSCM has much finer bandwidth granularities and more flexibility for dynamic bandwidth allocation.

Based on a field programmable gate array (FPGA) hardware platform, we have designed and implemented a real-time DSP enabled DSXC which uses Nyquist FDM as the multiplexing scheme. For the first time, we demonstrated resampling filters for channel selection and frequency translation, which enabled real-time DSXC. This circuit-based DSXC supports flexible and fine data-rate subcarrier channel granularities, offering a low-latency data plane, transparency to modulation formats, and the capability of compensating transmission impairments in the digital domain. The experimentally demonstrated 8×8 DSXC makes use of a Virtex-7 FPGA platform, which supports any-to-any switching of eight subcarrier channels with mixed modulation formats and data rates. Digital resampling filters, which enable frequency selection and translation of multiple subcarrier channels, have much lower DSP complexity and reduced FPGA resource requirements (DSP slices used in the FPGA) in comparison to the traditional technique based on I/Q mixing and filtering.
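
As a conceptual, offline illustration of what channel selection and frequency translation accomplish (not the FPGA resampling-filter implementation described above), the sketch below mixes one subcarrier down to baseband, low-pass filters it, and decimates to its per-subcarrier rate. All rates and filter parameters are placeholders.

```python
import numpy as np
from scipy.signal import firwin, lfilter

# Conceptual floating-point model of selecting one subcarrier from a DSCM aggregate:
# frequency translation to baseband, anti-alias low-pass filtering, then decimation.
# The real-time DSXC realizes this with FPGA resampling filters, not this code.
fs = 1e9                 # aggregate sample rate (placeholder)
f_sc = 120e6             # center frequency of the wanted subcarrier (placeholder)
decim = 8                # decimation factor -> per-subcarrier rate of fs/decim

t = np.arange(4096) / fs
x = np.exp(2j * np.pi * f_sc * t)                # stand-in for the multiplexed DSCM signal

x_bb = x * np.exp(-2j * np.pi * f_sc * t)        # frequency translation to baseband
h = firwin(numtaps=127, cutoff=0.8 / decim)      # low-pass filter (cutoff normalized to Nyquist)
x_sel = lfilter(h, 1.0, x_bb)[::decim]           # filter and decimate: one subcarrier out

print(x_sel.shape)
```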

We have also investigated the feasibility of using the distributed arithmetic (DA) architecture for real-time DSXC to completely eliminate the need for DSP slices in the FPGA implementation. For the first time, we experimentally demonstrated the implementation of real-time frequency translation and channel selection based on the DA architecture in the same FPGA platform. Compared with resampling filters that leverage multipliers, the DA-based approach eliminates the need for DSP slices in the FPGA implementation and significantly reduces the hardware cost. In addition, by requiring only a few clock cycles, a DA-based resampling filter is significantly faster than a conventional FIR filter, whose overall latency is proportional to the filter order. The DA-based DSXC is, therefore, able to achieve not only improved spectral efficiency, programmability of multiple orthogonal subcarrier channels, and low hardware resource requirements, but also much reduced cross-connection latency when implemented in a real-time DSP hardware platform. This reduced latency of cross-connect switching can be critically important for time-sensitive applications such as 5G mobile fronthaul, cloud radio access network (C-RAN), cloud-based robot control, tele-surgery, and network gaming.


Levi Goodman

Dual Mode W-Band Radar for Range Finding, Static Clutter Suppression & Moving Target Detection

When & Where:


250 Nichols Hall

Committee Members:

Christopher Allen, Chair
Shannon Blunt
James Stiles


Abstract

Many radar applications today require accurate, real-time, unambiguous measurement of target range and radial velocity.  Obstacles that frequently prevent target detection are the presence of noise and the overwhelming backscatter from other objects, referred to as clutter.

In this thesis, a method of static clutter suppression is proposed to increase detectability of moving targets in high clutter environments.  An experimental dual-purpose, single-mode, monostatic FMCW radar, operating at 108 GHz, is used to map the range of stationary targets and determine range and velocity of moving targets.  By transmitting a triangular waveform, which consists of alternating upchirps and downchirps, the received echo signals can be separated into two complementary data sets, an upchirp data set and a downchirp data set.  In one data set, the return signals from moving targets are spectrally isolated (separated in frequency) from static clutter return signals.  The static clutter signals in that first data set are then used to suppress the static clutter in the second data set, greatly improving detectability of moving targets.  Once the moving target signals are recovered from each data set, they are then used to solve for target range and velocity simultaneously.
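
For reference, the standard triangular-FMCW relations behind that simultaneous solution are sketched below (textbook form with sign conventions that vary; these equations are not quoted from the thesis). With sweep bandwidth B, sweep duration T, and carrier wavelength lambda, the up-chirp and down-chirp beat frequencies of a target at range R with radial velocity v combine to give both quantities:

```latex
% Standard triangular-FMCW relations (textbook form; sign conventions vary).
\[
  f_{\mathrm{up}} = \frac{2BR}{cT} - \frac{2v}{\lambda}, \qquad
  f_{\mathrm{down}} = \frac{2BR}{cT} + \frac{2v}{\lambda},
\]
% so measuring the beat frequency on both halves of the triangle yields range and
% velocity simultaneously:
\[
  R = \frac{cT}{4B}\,\big(f_{\mathrm{up}} + f_{\mathrm{down}}\big), \qquad
  v = \frac{\lambda}{4}\,\big(f_{\mathrm{down}} - f_{\mathrm{up}}\big).
\]
```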

The moving target of interest for tests performed was a reusable paintball (reball).  Reball range and velocity were accurately measured at distances up to 5 meters and at speeds greater than 90 m/s (200 mph) with a deceleration of approximately 0.155 m/s/ms (meters per second per millisecond).  Static clutter suppression of up to 25 dB was achieved, while moving target signals only suffered a loss of about 3 dB.

 


Ruoting Zheng

Algorithms for Computing Maximal Consistent Blocks

When & Where:


2001 B Eaton Hall

Committee Members:

Jerzy Grzymala-Busse, Chair
Prasad Kulkarni
Bo Luo


Abstract

Rough set theory is a tool to deal with uncertain and incomplete data. It has been successfully used in classification, machine learning, and automated knowledge acquisition. A maximal consistent block, defined using rough set theory, is used for rule acquisition.

The maximal consistent block technique is applied to acquire knowledge in incomplete data sets by analyzing the structure of a similarity class.

The main objective of this project is to implement and compare the algorithms for computing maximal consistent blocks. The brute force method, recursive method, and hierarchical method were designed for data sets with missing attribute values interpreted only as “do not care” conditions. In this project, we extend these algorithms so they can be applied to arbitrary interpretations of missing attribute values, and we introduce an approach for computing maximal consistent blocks on data sets with lost values. In addition, we found that the brute force method and recursive method have problems dealing with data sets for which characteristic sets are not transitive, so the limitations of the algorithms and a simplified recursive method are provided as well.
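
For readers new to the notion, under one common formalization two cases are consistent if they agree on every attribute where neither value is missing, and a maximal consistent block is a largest set of pairwise-consistent cases. The brute-force sketch below enumerates such blocks for a tiny table; it illustrates the definition only and is not the project's implementation, and the "*" marker and toy data are placeholders.

```python
from itertools import combinations

MISSING = "*"   # "do not care" marker (placeholder convention)

def consistent(case_a, case_b):
    """Two cases are consistent if they agree on every attribute where
    neither value is missing (one common formalization)."""
    return all(a == b or a == MISSING or b == MISSING for a, b in zip(case_a, case_b))

def maximal_consistent_blocks(cases):
    """Brute force: enumerate pairwise-consistent subsets, largest first,
    and keep only the maximal ones. Exponential, so a sketch for tiny tables."""
    n = len(cases)
    blocks = []
    for size in range(n, 0, -1):
        for idxs in combinations(range(n), size):
            if all(consistent(cases[i], cases[j]) for i, j in combinations(idxs, 2)):
                if not any(set(idxs) <= b for b in blocks):   # keep only maximal sets
                    blocks.append(set(idxs))
    return blocks

cases = [("high", "yes"), ("high", "*"), ("high", "no")]   # toy attribute table
print(maximal_consistent_blocks(cases))                    # -> [{0, 1}, {1, 2}]
```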


Hao Xue

Trust and Credibility in Online Social Networks

When & Where:


246 Nichols Hall

Committee Members:

Fengjun Li, Chair
Prasad Kulkarni
Bo Luo
Cuncong Zhong
Mei Liu

Abstract

Increasing portions of people's social and communicative activities now take place in the digital world. The growth and popularity of online social networks (OSNs) have tremendously facilitated online interaction and information exchange. Not only do ordinary users benefit from OSNs, as more people now rely on online information for news, opinions, and social networking, but so do companies and business owners, who utilize OSNs as platforms for gathering feedback and for marketing activities. As OSNs enable people to communicate more effectively, a large volume of user-generated content (UGC) is produced daily. However, the freedom and ease of publishing information online have made these systems no longer sources of reliable information. Not only does biased and misleading information exist, but financial incentives also drive individual and professional spammers to insert deceptive content and promote harmful information, which jeopardizes the ecosystems of OSNs.

In this dissertation, we present our work on measuring the credibility of information and detecting content polluters in OSNs. Firstly, we assume that review spammers spend less effort in maintaining social connections and propose to utilize social relationships and rating deviations to assist the computation of the trustworthiness of users. Compared to numeric ratings, textual content contains richer information about the actual opinion of a user toward a target. Thus, we propose a content-based trust propagation framework by extracting the opinions expressed in review content. In addition, we discover that the network surrounding a user could also provide valuable information about the user himself. Lastly, we study the problem of detecting social bots by utilizing the characteristics of surrounding neighborhood networks.


Casey Sader

Taming WOLF: Building a More Functional and User-Friendly Framework

When & Where:


2001 B Eaton Hall

Committee Members:

Michael Branicky, Chair
Bo Luo
Suzanne Shontz


Abstract

Machine learning is all about automation. Many tools have been created to help data scientists automate repeated tasks and train models. These tools require varying levels of user experience to be used effectively. The "machine learning WOrk fLow management Framework" (WOLF) aims to automate the machine learning pipeline. One of its key uses is to discover which machine learning model and hyper-parameters are the best configuration for a dataset. In this project, features were explored that could be added to make WOLF behave as a full pipeline, in order to be helpful for novice and experienced data scientists alike. One feature to make WOLF more accessible is a website version that can be accessed from anywhere and makes using WOLF much more intuitive. To keep WOLF aligned with the most recent trends and models, the ability to train a neural network using the TensorFlow framework and Keras library was added. This project also introduced the ability to pickle and save trained models. A natural consequence of saving the models is the option to use them within the WOLF framework to make predictions on another collection of data. Understanding how a model makes predictions is a beneficial component of machine learning. This project aids in that understanding by calculating and reporting the relative importance of the dataset features for the given model. Incorporating all these additions makes WOLF a more functional and user-friendly framework for machine learning tasks.
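
The sketch below illustrates the two capabilities just described, model persistence via pickling and relative feature importance reporting, using scikit-learn as a stand-in. The data set, model, and file name are placeholders; this is not WOLF's actual interface.

```python
import pickle
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Placeholder illustration (not WOLF's API): persist a trained model with pickle and
# report the relative importance of each dataset feature for that model.
X, y = load_iris(return_X_y=True)
feature_names = load_iris().feature_names

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

with open("trained_model.pkl", "wb") as f:       # save ("pickle") the trained model
    pickle.dump(model, f)

with open("trained_model.pkl", "rb") as f:       # reload it later to make predictions
    restored = pickle.load(f)
print(restored.predict(X[:3]))

for name, importance in sorted(zip(feature_names, model.feature_importances_),
                               key=lambda pair: -pair[1]):
    print(f"{name:25s} {importance:.3f}")
```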