Defense Notices
All students and faculty are welcome to attend the final defense of EECS graduate students completing their M.S. or Ph.D. degrees. Defense notices for M.S./Ph.D. presentations for this year and several previous years are listed below in reverse chronological order.
Students who are nearing the completion of their M.S./Ph.D. research should schedule their final defenses through the EECS graduate office at least THREE WEEKS PRIOR to their presentation date so that there is time to complete the degree requirements check and to post the presentation announcement online.
Upcoming Defense Notices
Andrew Riachi
An Investigation Into The Memory Consumption of Web Browsers and A Memory Profiling Tool Using Linux Smaps
When & Where:
Nichols Hall, Room 246 (Executive Conference Room)
Committee Members:
Prasad Kulkarni, Chair
Perry Alexander
Drew Davidson
Heechul Yun
Abstract
Web browsers are notorious for consuming large amounts of memory. Yet, they have become the dominant framework for writing GUIs because the web languages are ergonomic for programmers and have a cross-platform reach. These benefits are so enticing that even a large portion of mobile apps, which have to run on resource-constrained devices, are running a web browser under the hood. Therefore, it is important to keep the memory consumption of web browsers as low as practicable.
In this thesis, we investigate the memory consumption of web browsers, in particular compared to applications written in native GUI frameworks. We introduce smaps-profiler, a tool to profile the overall memory consumption of Linux applications that can report memory usage other profilers simply do not measure. Using this tool, we conduct experiments which suggest that most of the extra memory usage compared to native applications could be due to the size of the web browser program itself. We discuss our experiments and findings, and conclude that even more rigorous studies are needed to profile GUI applications.
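For readers unfamiliar with the smaps interface, the sketch below shows the kind of measurement involved: summing the proportional set size (Pss) fields from /proc/<pid>/smaps on a Linux system. It is only an illustration of the data source; it is not the smaps-profiler tool described in the thesis.

```python
# Minimal sketch: total proportional set size (Pss) of a Linux process,
# read from /proc/<pid>/smaps. This is NOT the smaps-profiler tool described
# in the thesis; it only illustrates the data source such profiling builds on.

def total_pss_kib(pid: int) -> int:
    """Sum the Pss fields (in KiB) across all mappings of a process."""
    total = 0
    with open(f"/proc/{pid}/smaps") as f:
        for line in f:
            if line.startswith("Pss:"):
                # Lines look like: "Pss:                 123 kB"
                total += int(line.split()[1])
    return total

if __name__ == "__main__":
    import os
    print(f"self Pss: {total_pss_kib(os.getpid())} KiB")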
Elizabeth Wyss
A New Frontier for Software Security: Diving Deep into npm
When & Where:
Eaton Hall, Room 2001B
Committee Members:
Drew Davidson, Chair
Alex Bardas
Fengjun Li
Bo Luo
J. Walker
Abstract
Open-source package managers (e.g., npm for Node.js) have become an established component of modern software development. Rather than creating applications from scratch, developers may employ modular software dependencies and frameworks--called packages--to serve as building blocks for writing larger applications. Package managers make this process easy. With a simple command line directive, developers are able to quickly fetch and install packages across vast open-source repositories. npm--the largest of such repositories--alone hosts millions of unique packages and serves billions of package downloads each week.
However, the widespread code sharing resulting from open-source package managers also presents novel security implications. Vulnerable or malicious code hiding deep within package dependency trees can be leveraged downstream to attack both software developers and the end-users of their applications. This downstream flow of software dependencies--dubbed the software supply chain--is critical to secure.
This research provides a deep dive into the npm-centric software supply chain, exploring distinctive phenomena that impact its overall security and usability. Such factors include (i) hidden code clones--which may stealthily propagate known vulnerabilities, (ii) install-time attacks enabled by unmediated installation scripts, (iii) hard-coded URLs residing in package code, (iv) the impacts of open-source development practices, (v) package compromise via malicious updates, (vi) spammers disseminating phishing links within package metadata, and (vii) abuse of cryptocurrency protocols designed to reward the creators of high-impact packages. For each facet, tooling is presented to identify and/or mitigate potential security impacts. Ultimately, it is our hope that this research fosters greater awareness, deeper understanding, and further efforts to forge a new frontier for the security of modern software supply chains.
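As a concrete illustration of one of these facets, the sketch below flags packages whose package.json declares install-time hooks (preinstall, install, postinstall), the mechanism behind the install-time attacks mentioned above. It is a simplified illustration, not the tooling developed in this research, and the node_modules path is an assumption.

```python
# Minimal sketch: flag npm packages that declare install-time hooks in
# package.json. Such scripts run automatically on `npm install` and are one
# vector for install-time attacks. Illustration only, not this work's tooling.
import json
from pathlib import Path

INSTALL_HOOKS = {"preinstall", "install", "postinstall"}

def find_install_scripts(node_modules: str):
    for manifest in Path(node_modules).glob("**/package.json"):
        try:
            scripts = json.loads(manifest.read_text()).get("scripts", {})
        except (json.JSONDecodeError, OSError):
            continue
        hooks = INSTALL_HOOKS & scripts.keys()
        if hooks:
            yield manifest.parent.name, {h: scripts[h] for h in hooks}

if __name__ == "__main__":
    for name, hooks in find_install_scripts("node_modules"):
        print(name, hooks)
```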
Alfred Fontes
Optimization and Trade-Space Analysis of Pulsed Radar-Communication Waveforms using Constant Envelope Modulations
When & Where:
Nichols Hall, Room 246 (Executive Conference Room)
Committee Members:
Patrick McCormick, Chair
Shannon Blunt
Jonathan Owen
Abstract
Dual function radar communications (DFRC) is a method of co-designing a single radio frequency system to perform simultaneous radar and communications service. DFRC is ultimately a compromise between radar sensing performance and communications data throughput due to the conflicting requirements between the sensing and information-bearing signals.
A novel waveform-based DFRC approach is phase-attached radar communications (PARC), in which a communications signal is embedded onto a radar pulse via phase modulation between the two signals. The PARC framework is used here in a new waveform design technique that designs the radar component of a PARC signal so that the expected power spectral density (PSD) of the PARC DFRC waveform matches a desired spectral template. This provides better control over the PARC signal spectrum, mitigating the radar performance degradation caused by spectral growth due to the communications signal.
The characteristics of optimized PARC waveforms are then analyzed to establish a trade-space between radar and communications performance within a PARC DFRC scenario. This is done by sampling the DFRC trade-space continuum with waveforms that contain a varying degree of communications bandwidth, from a pure radar waveform (no embedded communications) to a pure communications waveform (no radar component). Radar performance, which is degraded by range sidelobe modulation (RSM) from the communications signal randomness, is measured from the PARC signal variance across pulses; data throughput is established as the communications performance metric. Comparing the values of these two measures as a function of communications symbol rate explores the trade-offs in performance between radar and communications with optimized PARC waveforms.
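To make the phase-attachment idea concrete, the sketch below forms a toy PARC-style emission by adding a piecewise-constant communications phase to an LFM radar phase. The LFM radar component, the PSK-style symbol phase, and all parameters are illustrative assumptions; the optimized, spectrally shaped waveforms of this work are not reproduced.

```python
# Toy illustration of phase-attached radar communications (PARC): a constant
# envelope pulse whose phase is the sum of a radar phase (here an LFM chirp,
# chosen only for illustration) and a communications phase built from random
# symbols. Parameters are arbitrary; this is not the optimized design above.
import numpy as np

fs = 100e6          # sample rate (Hz)
T = 10e-6           # pulse width (s)
B = 20e6            # LFM swept bandwidth (Hz)
n_sym = 64          # communications symbols per pulse

t = np.arange(int(fs * T)) / fs
phi_radar = np.pi * (B / T) * (t - T / 2) ** 2          # LFM phase

symbols = np.random.randint(4, size=n_sym)               # random QPSK symbols
phi_comm = np.repeat(np.pi / 2 * symbols, len(t) // n_sym)
phi_comm = np.pad(phi_comm, (0, len(t) - len(phi_comm)), mode="edge")

s = np.exp(1j * (phi_radar + phi_comm))   # unit modulus -> constant envelope
print(np.allclose(np.abs(s), 1.0))        # True: suitable for saturated PAs
```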
Qua Nguyen
Hybrid Array and Privacy-Preserving Signaling Optimization for NextG Wireless Communications
When & Where:
Zoom Defense, please email jgrisafe@ku.edu for link.
Committee Members:
Erik Perrins, Chair
Morteza Hashemi
Zijun Yao
Taejoon Kim
KC Kong
Abstract
This PhD research tackles two critical challenges in NextG wireless networks: hybrid precoder design for wideband sub-Terahertz (sub-THz) massive multiple-input multiple-output (MIMO) communications and privacy-preserving federated learning (FL) over wireless networks.
In the first part, we propose a novel hybrid precoding framework that integrates true-time-delay (TTD) devices and phase shifters (PS) to counteract the beam squint effect, a significant challenge in wideband sub-THz massive MIMO systems that leads to considerable loss in array gain. Unlike previous methods that design only the TTD values while fixing the PS values and assuming unbounded time delays, our approach jointly optimizes the TTD and PS values under a realistic time-delay constraint. Using the proposed approach, we determine the minimum number of TTD devices required to achieve a target array gain. We then extend the framework to multi-user wideband systems and formulate a hybrid array optimization problem that aims to maximize the minimum data rate across users. This problem is decomposed into two sub-problems: fair subarray allocation, solved via continuous-domain relaxation, and subarray gain maximization, addressed via a phase-domain transformation.
The second part focuses on preserving privacy in FL over wireless networks. First, we design a differentially private FL algorithm that applies time-varying noise-variance perturbation. Taking advantage of existing wireless channel noise, we jointly design the differential privacy (DP) noise variances and the users' transmit power to resolve the tradeoff between privacy and learning utility. Next, we tackle two critical challenges within FL networks: (i) privacy risks arising from model updates and (ii) reduced learning utility due to quantization heterogeneity. Prior work typically addresses only one of these challenges because maintaining learning utility under both privacy risks and quantization heterogeneity is a non-trivial task. We improve the learning utility of a privacy-preserving FL scheme that allows clusters of devices with different quantization resolutions to participate in each FL round. Specifically, we introduce a novel stochastic quantizer (SQ) that ensures a DP guarantee and minimal quantization distortion. To address quantization heterogeneity, we introduce a cluster size optimization technique combined with a linear fusion approach to enhance model aggregation accuracy. Lastly, inspired by the information-theoretic rate-distortion framework, a privacy-distortion tradeoff problem is formulated to minimize privacy loss under a given maximum allowable quantization distortion. The optimal solution to this problem is identified, revealing that the privacy loss decreases as the maximum allowable quantization distortion increases, and vice versa.
This research advances hybrid array optimization for wideband sub-THz massive MIMO and introduces novel algorithms for privacy-preserving quantized FL with diverse precision. These contributions enable high-throughput wideband MIMO communication systems and privacy-preserving AI-native designs, aligning with the performance and privacy protection demands of NextG networks.
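As a simple illustration of the privacy mechanism underlying the second part, the sketch below applies the textbook Gaussian mechanism to a clipped model update. The clipping norm, privacy parameters, and per-round application are generic assumptions; this is a baseline, not the time-varying or quantized designs proposed in this work.

```python
# Generic sketch of differentially private model-update perturbation for
# federated learning: clip the update's L2 norm, then add Gaussian noise
# calibrated by the standard (epsilon, delta) Gaussian mechanism. This is a
# textbook baseline, not the time-varying / quantized schemes of this work.
import numpy as np

def privatize_update(update, clip_norm=1.0, epsilon=1.0, delta=1e-5,
                     rng=np.random.default_rng()):
    update = np.asarray(update, dtype=float)
    # Clip so that one user's contribution has bounded L2 sensitivity.
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    # Gaussian mechanism noise scale (valid for epsilon <= 1).
    sigma = clip_norm * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return clipped + rng.normal(0.0, sigma, size=clipped.shape)

noisy = privatize_update(np.ones(10), epsilon=0.5)
```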
Arin Dutta
Performance Analysis of Distributed Raman Amplification with Different Pumping Configurations
When & Where:
Nichols Hall, Room 246 (Executive Conference Room)
Committee Members:
Rongqing Hui, Chair
Morteza Hashemi
Rachel Jarvis
Alessandro Salandrino
Hui Zhao
Abstract
As internet services like high-definition videos, cloud computing, and artificial intelligence keep growing, optical networks need to keep up with the demand for more capacity. Optical amplifiers play a crucial role in offsetting fiber loss and enabling long-distance wavelength division multiplexing (WDM) transmission in high-capacity systems. Various methods have been proposed to enhance the capacity and reach of fiber communication systems, including advanced modulation formats, dense wavelength division multiplexing (DWDM) over ultra-wide bands, space-division multiplexing, and high-performance digital signal processing (DSP) technologies. To maintain higher data rates along with maximizing the spectral efficiency of multi-level modulated signals, a higher Optical Signal-to-Noise Ratio (OSNR) is necessary. Despite advancements in coherent optical communication systems, the spectral efficiency of multi-level modulated signals is ultimately constrained by fiber nonlinearity. Raman amplification is an attractive solution for wide-band amplification with low noise figures in multi-band systems.
Distributed Raman Amplification (DRA) has been deployed in recent high-capacity transmission experiments to achieve a relatively flat signal power distribution along the optical path. It offers the unique advantage of using conventional low-loss silica fibers as the gain medium, effectively transforming passive optical fibers into active or amplifying waveguides. DRA also provides gain at any wavelength by selecting the appropriate pump wavelength, enabling operation in signal bands outside the erbium-doped fiber amplifier (EDFA) bands. A forward (FW) Raman pumping configuration can be adopted to further improve DRA performance, as it is more efficient in OSNR improvement because the optical noise is generated near the beginning of the fiber span and attenuated along the fiber. A dual-order FW pumping scheme helps reduce the nonlinear effects on the optical signal and improves OSNR by distributing the Raman gain more uniformly along the transmission span.
The major concern with Forward Distributed Raman Amplification (FW DRA) is the fluctuation in pump power, known as relative intensity noise (RIN), which transfers from the pump laser to both the intensity and phase of the transmitted optical signal as they propagate in the same direction. Another concern with FW DRA is the rise in signal optical power near the start of the fiber span, which increases the nonlinear phase shift of the signal. These factors, including RIN transfer-induced noise and nonlinear noise, contribute to the degradation of system performance at the receiver in FW DRA systems.
As the performance of DRA with backward pumping is well understood, with a relatively low impact of RIN transfer, our research focuses on the FW pumping configuration and is intended to provide a comprehensive analysis of the system performance impact of dual-order FW Raman pumping, including signal intensity and phase noise induced by the RINs of both the 1st- and 2nd-order pump lasers, as well as the impacts of linear and nonlinear noise. The efficiencies of pump RIN to signal intensity and phase noise transfer are theoretically analyzed and experimentally verified by applying a shallow intensity modulation to the pump laser to mimic the RIN. The results indicate that the efficiency of 2nd-order pump RIN to signal phase noise transfer can be more than two orders of magnitude higher than that from the 1st-order pump. The performance of dual-order FW Raman configurations is then compared with that of single-order Raman pumping to understand the trade-offs among system parameters. The nonlinear interference (NLI) noise is analyzed to study the overall OSNR improvement when employing a 2nd-order Raman pump. Finally, a DWDM system with 16-QAM modulation is used as an example to investigate the benefit of DRA with dual-order Raman pumping and with different pump RIN levels. We also consider a DRA system using a 1st-order incoherent pump together with a 2nd-order coherent pump. Although dual-order FW pumping corresponds to a slight increase in linear amplified spontaneous emission (ASE) compared to using only a 1st-order pump, its major advantage comes from the reduction of nonlinear interference noise in a DWDM system. Because the RIN of the 2nd-order pump has a much higher impact than that of the 1st-order pump, a more stringent requirement should be placed on the RIN of the 2nd-order pump laser when a dual-order FW pumping scheme is used for DRA in fiber-optic communication. The system performance analysis also reveals that higher-baud-rate systems, such as those operating at 100 Gbaud, are less affected by pump laser RIN due to the low-pass characteristics of the pump RIN to signal phase noise transfer.
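For reference, the underlying forward-pumped Raman interaction is commonly described by coupled power-evolution equations of the standard textbook form below (quoted from the general fiber-optics literature, not from this work), where P_s and P_p are the signal and pump powers, alpha_s and alpha_p the fiber attenuation coefficients, nu_s and nu_p the optical frequencies, and g_R the Raman gain efficiency; the dual-order configuration studied here adds a second, higher-order pump that amplifies the first-order pump in the same manner.

```latex
\frac{dP_s}{dz} = -\alpha_s P_s + g_R P_p P_s,
\qquad
\frac{dP_p}{dz} = -\alpha_p P_p - \frac{\nu_p}{\nu_s}\, g_R P_p P_s .
```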
Audrey Mockenhaupt
Using Dual Function Radar Communication Waveforms for Synthetic Aperture Radar Automatic Target Recognition
When & Where:
Nichols Hall, Room 246 (Executive Conference Room)
Committee Members:
Patrick McCormick, Chair
Shannon Blunt
Jon Owen
Abstract
As machine learning (ML), artificial intelligence (AI), and deep learning continue to advance, their applications become more diverse; one such application is synthetic aperture radar (SAR) automatic target recognition (ATR). These SAR ATR networks use different forms of deep learning, such as convolutional neural networks (CNN), to classify targets in SAR imagery. An emerging research area of SAR is dual function radar communication (DFRC), which performs both radar and communications functions using a single co-designed modulation. The utilization of DFRC emissions for SAR imaging impacts image quality, thereby influencing SAR ATR network training. Here, using the Civilian Vehicle Data Dome dataset from the AFRL, SAR ATR networks are trained and evaluated with simulated data generated using Gaussian Minimum Shift Keying (GMSK) and Linear Frequency Modulation (LFM) waveforms. The networks are used to compare how the target classification accuracy of the ATR network differs between DFRC (i.e., GMSK) and baseline (i.e., LFM) emissions. Furthermore, as is common in pulse-agile transmission structures, an effect known as 'range sidelobe modulation' is examined, along with its impact on SAR ATR. Finally, it is shown that a SAR ATR network can be trained for GMSK emissions using existing LFM datasets via two types of data augmentation.
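For readers unfamiliar with the baseline emission, the sketch below generates an LFM pulse and its matched-filter response with arbitrary illustrative parameters; the GMSK DFRC emissions and the SAR/ATR processing chain used in this work are not reproduced here.

```python
# Minimal sketch: baseline LFM pulse and its matched-filter (autocorrelation)
# response. Parameters are illustrative; the GMSK DFRC emissions and SAR ATR
# pipeline studied in this work are not reproduced here.
import numpy as np

fs, T, B = 200e6, 5e-6, 50e6              # sample rate, pulse width, bandwidth
t = np.arange(int(fs * T)) / fs
lfm = np.exp(1j * np.pi * (B / T) * (t - T / 2) ** 2)

mf_out = np.convolve(lfm, np.conj(lfm[::-1]))             # matched filter output
mf_db = 20 * np.log10(np.abs(mf_out) / np.abs(mf_out).max() + 1e-12)
# mf_db holds the compressed pulse; its roughly -13 dB near-in sidelobes are
# the fixed baseline against which pulse-agile range sidelobe modulation
# (which varies pulse to pulse) is judged.
```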
Past Defense Notices
AKHILESH MISHRA
Multi-look SAR Processing and Array Optimization Applied to Radio Echo Sounding of Ice Sheets
When & Where:
317 Nichols Hall
Committee Members:
Carl Leuschen, Chair
Stephen Yan
Prasad Gogineni
Abstract
Increase in sea level is a problem of global importance because of its impact on infrastructure and residents in coastal regions. Airborne and satellite observations have shown that the margins of the Greenland and Antarctic ice sheets are melting and retreating, steadily increasing their contribution to sea level rise over the last decade. To understand the ice dynamics and develop models that generate accurate estimates of the ice sheets' future contribution to sea level rise, more information on ice thickness and basal conditions is required. Airborne ice-penetrating radars are routinely deployed on long-range aircraft to perform ice thickness measurements, which are needed to derive information on bed topography and basal conditions. Acquiring useful radar reflections from the ice-bed interface is very challenging in regions where ice sheets are exhibiting the most rapid changes, because returns from the ice bed are very weak and often masked by off-nadir surface clutter. Advanced signal processing techniques, such as Synthetic Aperture Radar (SAR) and array processing, are required to filter the clutter and extract weak bed echoes buried in the noise. However, past attempts to detect these signals have not been completely successful because system- and target-induced errors in SAR and array processing are not fully compensated. SAR processing in areas with significant surface slope degrades the signal-to-noise ratio. Also, systematic and random errors in amplitude and phase between receive channels degrade the performance of the array processors used to synthesize the cross-track beam pattern.
A novel Multi-look Time Domain Back Projection (MLTDBP) parallel processor has been developed to accurately model the electromagnetic wave propagation through the ice and generate echograms with a better signal-to-noise-and-clutter ratio (SNCR) in the along-track dimension. A novel dynamic channel equalization method, based on null optimization, has been developed to adaptively calibrate the receive channels, giving an improved SNCR for the cross-track processing algorithms. Results from the two-dimensional processing algorithms have been shown to be effective in extracting weak bed echoes, sloped internal ice layers, and deep internal ice layers; these results are also used to generate a 3D ice-bed map of the fast-flowing Kangiata Nunaata Sermia (KNS) glacier in southwest Greenland.
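The back-projection step at the core of such processing can be sketched as follows for the simple free-space, single-channel case. This is a generic illustration with assumed array shapes and sign conventions; the actual MLTDBP processor additionally models refraction at the air-ice interface, multi-looking, and array processing, none of which are shown.

```python
# Generic single-channel time-domain back projection sketch (free space only).
# The MLTDBP processor described above additionally models propagation through
# ice (refraction at the air-ice interface), multi-looking, and cross-track
# array processing, none of which are shown here.
import numpy as np

def backproject(data, tx_pos, fast_time, pixels, fc, c=3e8):
    """data: (n_pulses, n_samples) range-compressed records
       tx_pos: (n_pulses, 3) platform positions
       fast_time: (n_samples,) two-way delay axis of the records
       pixels: (n_pixels, 3) image grid positions"""
    image = np.zeros(len(pixels), dtype=complex)
    for rec, pos in zip(data, tx_pos):
        delay = 2.0 * np.linalg.norm(pixels - pos, axis=1) / c   # two-way delay
        sample = np.interp(delay, fast_time, rec.real) \
               + 1j * np.interp(delay, fast_time, rec.imag)
        image += sample * np.exp(1j * 2 * np.pi * fc * delay)    # phase correction
    return image
```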
SUSOBHAN DAS
Tunable Nano-photonic Devices
When & Where:
246 Nichols Hall
Committee Members:
Ron Hui, Chair
Alessandro Salandrino
Chris Allen
Jim Stiles
Judy Wu
Abstract
High speed photonic systems and networks require electro-optic modulators to encode electronic signals onto an optical carrier. The central focus of this research is twofold. First, the tunable properties and tuning mechanisms of optical materials such as graphene, vanadium dioxide (VO2), and indium tin oxide (ITO) are characterized systematically at the 1550 nm telecommunication wavelength. Then, these materials are used to design novel nano-photonic devices with high efficiency and a miniature footprint suitable for photonic integration.
Specifically, we experimentally investigated the complex index of graphene at near-infrared (NIR) wavelengths through reflectivity measurements on a SiO2/Si substrate. The measured change of reflectivity as a function of applied gate voltage is highly consistent with the Kubo formula. Based on a fiber-optic pump-probe setup, we demonstrated that short optical pulses can be translated from the pump wavelength to the probe wavelength through the dielectric-to-metal phase transition of VO2. In this process, the optical phase modulation induced on the probe by the pump leading edge is converted into an intensity modulation through an optical frequency discriminator. We also theoretically modeled the permittivity of ITO with different doping concentrations in the NIR region.
We proposed an ultra-compact electro-optic modulator based on switching the plasmonic resonance of ITO-on-graphene "ON" and "OFF" by tuning the graphene chemical potential through electrical gating. The plasmonic resonance of ITO-on-graphene significantly enhances the field interaction with graphene, which allows a size reduction compared to graphene-based modulators without ITO. We presented a scheme for a mode-multiplexed NIR modulator by tuning the ITO permittivity as a function of carrier density through an applied voltage. The judiciously patterned ITO on top of an SOI ridge waveguide enables independent modulation of two orthogonal modes simultaneously, which enhances functionality per area. We proposed a theoretical model of a tunable anisotropic metamaterial composed of periodic layers of graphene and hafnium oxide, whose transverse permittivity can be tuned by changing the chemical potential of graphene. A novel metamaterial-assisted tunable photonic coupler is designed by inserting the proposed tunable metamaterial into the coupling region of a waveguide coupler. The coupling efficiency can be tuned by changing the permittivity of the metamaterial through electrical gating.
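For reference, the gate-voltage tunability of graphene enters through its chemical potential mu_c in the Kubo conductivity; a commonly quoted intraband (Drude-like) term, reproduced here from the general literature rather than from this work, is

```latex
\sigma_{\mathrm{intra}}(\omega) =
\frac{i e^{2} k_{B} T}{\pi \hbar^{2}\,(\omega + i/\tau)}
\left[\frac{\mu_{c}}{k_{B} T}
+ 2\ln\!\left(e^{-\mu_{c}/k_{B}T} + 1\right)\right],
```

where tau is the carrier relaxation time, T the temperature, and omega the optical angular frequency.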
PRANAV BAHL
WOLF (machine learning WOrk fLow management Framework)
When & Where:
246 Nichols Hall
Committee Members:
Luke Huan, Chair
Fengjun Li
Bo Luo
Abstract
Recently, machine learning has made great strides in many fields such as health, finance, education, and sports, which has increased the demand for machine learning systems. By definition, machine learning automates the task of learning, in terms of rule induction, classification, regression, and so on. This is then used to draw insights from data and to forecast an event before it actually takes place. Despite this automation, machine learning still does not automate the task of selecting the best algorithm(s) for a specific dataset. With the rapidly growing number of machine learning algorithms, it has become difficult for novices as well as researchers to choose the best algorithm. The crux of a machine learning system is (1) to preprocess the data to help the machine learning algorithm understand the data better; (2) to choose meaningful features, thereby reducing noise in the data; and (3) to choose the best-performing machine learning algorithm, which is done by grid searching over the hyperparameters of various machine learning algorithms and then comparing metrics across all outcomes. These are the problems addressed by Wolf.
Automation is the fuel that drives Wolf. Automating time-consuming and repeatable tasks is the defining characteristic of the project. The rising scope of Artificial Intelligence (AI) and machine learning increases the need for automation to simplify the process, helping researchers and data scientists dig deeper into the problem and understand it well rather than spending time tweaking algorithms. The positive correlation between growing intelligence and the complexity of solutions has shifted the trend from Artificial Intelligence (AI) to Automated Intelligence, the paradigm on which Wolf is based.
Wolf has been built to have an impact on a wide audience. The automation of the machine learning pipeline saves roughly 40% of the work effort spent on implementing and testing algorithms. It helps people with different levels of expertise and requirements: it helps novices identify the best combinations of algorithms without in-depth knowledge of the algorithms, and it helps researchers and businesses deepen their machine learning knowledge and find the best-performing hyperparameters.
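The algorithm and hyperparameter selection step that Wolf automates can be illustrated with the scikit-learn grid search below; the pipeline, estimators, dataset, and parameter grid are generic examples, not Wolf's actual search space or API.

```python
# Generic illustration of the kind of search Wolf automates: a preprocessing +
# feature selection + classifier pipeline whose hyperparameters are chosen by
# cross-validated grid search. The estimators and grid are examples only and
# do not reflect Wolf's actual search space or interface.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
pipe = Pipeline([("scale", StandardScaler()),
                 ("select", SelectKBest()),
                 ("clf", SVC())])
grid = {"select__k": [10, 20, 30],
        "clf__C": [0.1, 1, 10],
        "clf__kernel": ["linear", "rbf"]}
search = GridSearchCV(pipe, grid, cv=5, scoring="accuracy").fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```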
FARHAD MAHMOOD
Modeling and Analysis of Energy Efficiency in Wireless Handset Transceiver Systems
When & Where:
250 Nichols Hall
Committee Members:
Erik Perrins, Chair
Lingjia Liu
Shannon Blunt
Victor Frost
Bozenna Pasik-Duncan
Abstract
As they become a significant part of our daily lives, wireless mobile handsets have become faster and smarter. One of the main remaining user requirements is longer-lasting wireless cellular devices. Many techniques have been used to increase battery capacity (ampere-hours), but that increases safety concerns.
Instead, it is better to have mobile handsets that consume less energy, i.e., to increase energy efficiency. Therefore, in this research proposal, we study and analyze the radio frequency (RF) transceiver energy consumption, which accounts for the largest share of energy consumed in the cellular device. We consider a model with a large number of parameters in order to make it more realistic. First, the transmitter energy of a single-antenna device is considered for a fixed target probability of error at the receiver for multilevel quadrature amplitude modulation (MQAM). It is found that the power amplifier (PA) consumes the largest portion of transceiver energy due to the low efficiency of the PA.
Furthermore, when MQAM and a raised cosine filter are used, the impact of the peak-to-average ratio (PAR) on the PA becomes another source of energy waste in the PA. This issue is analyzed in this research proposal along with a number of promising solutions. The analysis of energy consumption for single-antenna devices will help us analyze the energy consumption of multiple-antenna devices. In this regard, we discuss the energy efficiency of multiple-input multiple-output (MIMO) systems with known channel state information (CSI) at the transmitter. The study of the energy efficiency of MIMO without CSI using space-time coding will be our next step.
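To illustrate the PAR issue, the sketch below compares the peak-to-average power ratio of raw 16-QAM symbols with that of the same symbols after raised-cosine pulse shaping; the roll-off, filter span, and oversampling factor are arbitrary illustrative choices, not values used in this proposal.

```python
# Illustrative sketch: peak-to-average power ratio (PAR) of 16-QAM symbols,
# before and after raised-cosine pulse shaping. Roll-off, span, and
# oversampling are arbitrary choices made for illustration only.
import numpy as np

def raised_cosine(beta, sps, span):
    """Raised-cosine impulse response over `span` symbols, `sps` samples/symbol."""
    t = np.arange(-span * sps // 2, span * sps // 2 + 1) / sps
    num = np.sinc(t) * np.cos(np.pi * beta * t)
    denom = 1.0 - (2.0 * beta * t) ** 2
    singular = np.abs(denom) < 1e-8
    # Handle the removable singularity at t = +/- 1/(2*beta).
    return np.where(singular,
                    np.pi / 4 * np.sinc(1.0 / (2.0 * beta)),
                    num / np.where(singular, 1.0, denom))

def par_db(x):
    p = np.abs(x) ** 2
    return 10 * np.log10(p.max() / p.mean())

rng = np.random.default_rng(0)
levels = np.array([-3, -1, 1, 3])
sym = rng.choice(levels, 4096) + 1j * rng.choice(levels, 4096)   # 16-QAM

sps = 8
up = np.zeros(len(sym) * sps, dtype=complex)
up[::sps] = sym
shaped = np.convolve(up, raised_cosine(beta=0.35, sps=sps, span=10), mode="same")

print(f"constellation PAR: {par_db(sym):.2f} dB")     # about 2.55 dB for 16-QAM
print(f"after RC shaping:  {par_db(shaped):.2f} dB")  # noticeably higher
```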
THEODORE LINDSEY
Interesting Rule Induction Module: Adding Support for Unknown Attribute Values
When & Where:
2001B Eaton Hall
Committee Members:
Jerzy Grzymala-Busse, Chair
Bo Luo
Prasad Kulkarni
Abstract
IRIM (Interesting Rule Induction Module) is a rule induction system designed to induce particularly strong, simple rule sets. Additionally, IRIM does not require prior discretization of numerical attribute values. IRIM does not necessarily produce consistent rules that fully describe the target concepts; however, the rules induced by IRIM often lead to novel revelations of hidden relationships in a dataset. In this paper, we extend the IRIM system to handle missing attribute values (in particular, lost and do-not-care attribute values) more thoroughly than simply ignoring the cases to which they belong. Further, we include an implementation of IRIM in the modern programming language Python, written for easy inclusion within a Python data mining package or library. The provided implementation makes use of the pandas module, which is built on a C back end for better performance than is typically achieved with pure Python.
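A minimal sketch of the two missing-value interpretations mentioned above, using the usual "?" (lost) and "*" (do-not-care) markers and pandas. The block semantics follow the standard rough-set treatment of these symbols; the code and the toy table are illustrations, not the IRIM implementation itself.

```python
# Minimal sketch of attribute-value blocks under the two missing-value
# interpretations discussed above: '?' (lost) cases are excluded from every
# block of that attribute, while '*' (do-not-care) cases fall into the block
# of every value. This follows the usual rough-set treatment and is an
# illustration, not the IRIM implementation itself.
import pandas as pd

def attribute_value_blocks(df: pd.DataFrame, attribute: str) -> dict:
    col = df[attribute]
    values = sorted(v for v in col.unique() if v not in ("?", "*"))
    blocks = {}
    for v in values:
        in_block = (col == v) | (col == "*")      # do-not-care joins every block
        blocks[v] = set(df.index[in_block])        # '?' cases are never included
    return blocks

data = pd.DataFrame({"Temperature": ["high", "?", "*", "normal", "high"],
                     "Flu":         ["yes",  "no", "yes", "no",   "yes"]})
print(attribute_value_blocks(data, "Temperature"))
```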
SATHYA MAHADEVAN
Implementation of ID3 for Data Stored in Multiple SQL Databases
When & Where:
2001B Eaton Hall
Committee Members:
Jerzy Grzymala-Busse, Chair
Man Kong
Prasad Kulkarni
Abstract
Data classification is a methodology of data mining used to retrieve meaningful information from data. A model is built from the input training set and is later used to classify new observations. One of the most widely used models is the decision tree, which uses a tree-like structure to list all possible outcomes. Decision trees are preferred for their simple structure, the little effort they require for data preparation, and their easy interpretation. This project implements ID3, an algorithm for building a decision tree using information gain. The decision tree is converted to a set of rules, and the error rate is calculated using the test dataset. The dataset is usually stored in a relational database in the form of tables. In practice, it might be desirable to store data across multiple databases. In such scenarios, retrieving and coordinating data from the databases can be a challenging task. This project provides an implementation of the ID3 algorithm with the convenience of reading data stored across multiple data sources.
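The attribute-selection step at the heart of ID3 is entropy-based information gain; the sketch below shows that computation with pandas, along with one simple way to assemble a training table from multiple SQL databases. The connection strings, table name, and "decision" column are placeholders, not the project's actual configuration.

```python
# Sketch of ID3's attribute-selection step (information gain) with pandas,
# plus one simple way to pull the training set from multiple SQL databases.
# Connection strings, table, and column names below are placeholders only.
import numpy as np
import pandas as pd
import sqlalchemy

def entropy(labels: pd.Series) -> float:
    p = labels.value_counts(normalize=True)
    return float(-(p * np.log2(p)).sum())

def information_gain(df: pd.DataFrame, attribute: str, target: str) -> float:
    remainder = sum(len(g) / len(df) * entropy(g[target])
                    for _, g in df.groupby(attribute))
    return entropy(df[target]) - remainder

# Assemble one training table from several databases (placeholder URLs).
urls = ["sqlite:///site_a.db", "sqlite:///site_b.db"]
frames = [pd.read_sql("SELECT * FROM cases", sqlalchemy.create_engine(u))
          for u in urls]
data = pd.concat(frames, ignore_index=True)

best = max((c for c in data.columns if c != "decision"),
           key=lambda c: information_gain(data, c, "decision"))
print("root attribute:", best)
```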
CHAO LAN
Inequity Coefficient and Fair Transfer Learning
When & Where:
250 Nichols Hall
Committee Members:
Luke Huan, Chair
Lingjia Liu
Bo Luo
Xintao Wu
Jin Feng
Abstract
Fair machine learning is an emerging and urgent research topic that aims to avoid discriminatory predictions against protected groups of people in real-world decision making. This project aims to advance the field in two dimensions. First, we propose a more practical measurement of individual fairness, called the inequity coefficient, which integrates the current individual fairness framework, which lacks practice, with the current situation testing practice, which lacks principle. We develop foundations for the measurement and present its practice. Second, we propose a first study of fairness in the context of transfer learning, focusing on the hypothesis transfer and multi-task settings over two tasks. We illustrate a new challenge called discriminatory transfer, where discrimination is enforced by traditional task-relatedness constraints that aim only to find accurate hypotheses. We propose a set of new algorithms that avoid discriminatory transfer across tasks or promote fairness within each task.
ROHIT BANERJEE
Extraction and Analysis of Amazon Reviews
When & Where:
246 Nichols Hall
Committee Members:
Fengjun Li, Chair
Man Kong
Bo Luo
Abstract
Amazon.com is one of the largest online retail stores in the world. Besides selling millions of products on its website, Amazon provides a variety of Web services, including the Amazon Review and Recommendation System. Users are encouraged to write product reviews to help others understand products’ features and make purchase decisions. However, product reviews, as a type of user-generated content (UGC), suffer from quality and trust problems. To help evaluate the quality of reviews, Amazon also provides users with a helpfulness vote feature so that a user can support a review that he or she considers helpful. In this project we aim to study the relationship between helpfulness votes and the ranks of reviews. In particular, we look for answers to questions such as “how do helpfulness votes affect review ranks?” and “how do review rank and its presentation mechanism affect people’s voting behavior?” To investigate these questions, we built a crawler to collect reviews and votes on reviews from Amazon on a daily basis. Then, we conducted an analysis on a dataset with over 50,000 Amazon reviews to identify voting patterns and their impact on review ranks. Our results show that there is a positive correlation between review ranks and helpfulness votes.
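The rank-versus-votes relationship described above can be tested with a simple rank correlation, as sketched below; the CSV file and column names are placeholders standing in for the crawled dataset.

```python
# Sketch of the rank/helpfulness analysis: Spearman rank correlation between
# a review's position in the listing and its helpful-vote count. The CSV file
# and column names are placeholders for the crawled dataset.
import pandas as pd
from scipy.stats import spearmanr

reviews = pd.read_csv("amazon_reviews.csv")        # placeholder file
rho, p = spearmanr(reviews["review_rank"], reviews["helpful_votes"])
print(f"Spearman rho = {rho:.3f} (p = {p:.3g})")
```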