A Machine Learning Study using Gene Expression Profiles to Distinguish Patients with Non-Small Cell Lung Cancer
Prasad Kulkarni
Hongyang Sun
Early diagnosis can effectively treat non-small cell lung cancer (NSCLC). Lung cancer cells usually have altered gene expression patterns compared to normal cells, which can be utilized to predict cancer through gene expression tests. This study analyzed gene expression values measured from 15227-probe microarray, and 290 patients consisting of cancer and control groups, to find relations between the gene expression features and lung cancer. The study explored k-means, statistical tests, and deep neural networks to obtain optimal feature representations and achieved the highest accuracy of 82%. Furthermore, a bipartite graph was built using the Bio Grid database and gene expression values, where the probe-to-probe relationship based on gene relevance was leveraged to enhance the prediction performance.