Preserving Cluster Features in Imputing High Dimensional Data with Extensive Missing Rate

Chih Lai; Carolin Poschen; Lisa Maria Steinheuer; Jorg Hackermuller

doi:10.1109/BigData55660.2022.10020732

Conference proceeding

2022 IEEE International Conference on Big Data (Big Data), pp.5295-5304

12/17/2022

DOI: https://doi.org/10.1109/BigData55660.2022.10020732

Keywords: clustering; Computational modeling; convolution operation; data imputation; global/local features; Gold; Measurement; missing data; neural network; Neural networks; support vector machine; Training; Computer Architecture; Microprocessors

Abstract

Imputing a large dataset in which the majority of values are missing across tens of thousands of predictors, while still preserving a known cluster structure in the imputed results, is a daunting task. In this study, we propose a novel two-step approach for this task. First, we use simple linear classification models to derive the global and local features of cluster structures from a template ground truth (i.e., known gold data). Second, we integrate the features extracted from each cluster into a neural network architecture and its training responses to guide the imputation process. Because our neural network uses the global and local features of the gold data when training the imputation network, we refer to it as GLIN (Global-Local Imputation Network). We test our imputation method on two high-dimensional datasets, a single-cell dataset and a movie rating dataset, with missing rates of up to 95%. Finally, we use four evaluation metrics (distance, correlation, data distribution, and predictability difference) to evaluate how well the cluster structures of the gold data are preserved in the imputed results.
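The abstract names four evaluation metrics but does not define them. The sketch below shows one plausible reading of each, for illustration only; it is not the authors' implementation. Every name here is hypothetical, and the specific choices are assumptions: centroid distance for "distance", Pearson correlation for "correlation", the two-sample Kolmogorov-Smirnov statistic for "data distribution", and the accuracy drop of a nearest-centroid classifier (a stand-in for the paper's linear classifiers) for "predictability difference".

```python
import math
from bisect import bisect_right
from collections import defaultdict


def centroids(rows, labels):
    """Mean vector of each cluster, keyed by cluster label."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return {label: [sum(col) / len(col) for col in zip(*members)]
            for label, members in groups.items()}


def centroid_distance(gold, imputed, labels):
    """Assumed 'distance' metric: mean Euclidean gap between matching centroids."""
    cg, ci = centroids(gold, labels), centroids(imputed, labels)
    return sum(math.dist(cg[k], ci[k]) for k in cg) / len(cg)


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ks_statistic(xs, ys):
    """Assumed 'distribution' metric: largest gap between the empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    return max(abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
               for v in xs + ys)


def nearest_centroid_accuracy(cents, rows, labels):
    """Fraction of rows whose nearest gold centroid carries their true label."""
    hits = 0
    for row, label in zip(rows, labels):
        predicted = min(cents, key=lambda c: math.dist(cents[c], row))
        hits += predicted == label
    return hits / len(rows)


def evaluate(gold, imputed, labels):
    """Compare an imputed matrix against the gold matrix, cluster labels known."""
    flat_g = [v for row in gold for v in row]
    flat_i = [v for row in imputed for v in row]
    cents = centroids(gold, labels)  # classifier is fit on gold data only
    return {
        "distance": centroid_distance(gold, imputed, labels),
        "correlation": pearson(flat_g, flat_i),
        "distribution": ks_statistic(flat_g, flat_i),
        "predictability_diff": nearest_centroid_accuracy(cents, gold, labels)
        - nearest_centroid_accuracy(cents, imputed, labels),
    }


# Toy data (made up for this sketch): two well-separated clusters.
gold = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
labels = [0, 0, 1, 1]
swapped = [[5.0, 5.0], [5.0, 6.0], [0.0, 0.0], [0.0, 1.0]]  # clusters exchanged
print(evaluate(gold, swapped, labels))
```

In the toy example, the swapped matrix contains exactly the same values as the gold matrix, so a distribution-only comparison cannot tell them apart, while the cluster-aware metrics can. This illustrates why an evaluation aimed at preserving cluster structure would combine several complementary measures.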

Details

Title: Preserving Cluster Features in Imputing High Dimensional Data with Extensive Missing Rate
Author/Creator: Chih Lai - University of St. Thomas - Minnesota
Carolin Poschen - University of St. Thomas - Minnesota
Lisa Maria Steinheuer - Helmholtz Centre for Environmental Research
Jorg Hackermuller - Helmholtz Centre for Environmental Research
Publication Details: 2022 IEEE International Conference on Big Data (Big Data), pp.5295-5304
Publisher: IEEE
Academic Unit: Software Engineering and Data Science
Language: English
Resource Type: Conference proceeding
Record Identifier: 991015165641703691
