Identification of Corrupted Data via $k$-Means Clustering for Function Approximation

Year:    2021

Author:    Jun Hou, Yeonjong Shin, Dongbin Xiu

CSIAM Transactions on Applied Mathematics, Vol. 2 (2021), Iss. 1 : pp. 81–107

Abstract

In addition to measurement noise, real-world data are often corrupted by unexpected internal or external errors. Corruption errors can be much larger than standard noise and can negatively affect data processing results. In this paper, we propose a method for identifying corrupted data in the context of function approximation. The method is a two-step procedure consisting of an approximation stage and an identification stage. In the approximation stage, we conduct a straightforward function approximation on the entire data set as preliminary processing. In the identification stage, a clustering algorithm is applied to the processed data to identify potentially corrupted data entries. In particular, we found the $k$-means clustering algorithm to be highly effective. Our theoretical analysis reveals that, under sufficient conditions, the proposed method can exactly identify all corrupted data entries. Numerous examples are provided to verify our theoretical findings and demonstrate the effectiveness of the method.
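The two-step procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not say which approximation scheme or which processed quantity is clustered, so everything specific here is an assumption made for illustration: a least-squares polynomial fit of degree 5 stands in for the approximation stage, the absolute residuals stand in for the "processed data", and a hand-written two-cluster Lloyd iteration stands in for $k$-means with $k = 2$. The function names `two_means_1d` and `identify_corrupted`, the test function, and the corruption pattern are all hypothetical and not taken from the paper.

```python
import numpy as np


def two_means_1d(values, n_iter=100):
    """Lloyd's algorithm with k = 2 on scalar values.

    Returns a boolean mask that is True for members of the cluster
    with the larger center. (Illustrative helper, not from the paper.)
    """
    # Initialize the two centers at the extremes of the data.
    centers = np.array([values.min(), values.max()], dtype=float)
    for _ in range(n_iter):
        # Assign each value to its nearest center.
        labels = np.abs(values[:, None] - centers[None, :]).argmin(axis=1)
        # Recompute centers; keep the old center if a cluster is empty.
        new_centers = np.array([
            values[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(2)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels == centers.argmax()


def identify_corrupted(x, y, degree=5):
    """Two-stage sketch: fit all data, then cluster the residuals."""
    # Approximation stage (assumed): least-squares polynomial fit
    # to the entire data set, corrupted entries included.
    coeffs = np.polynomial.polynomial.polyfit(x, y, degree)
    residuals = np.abs(y - np.polynomial.polynomial.polyval(x, coeffs))
    # Identification stage (assumed): split residuals into two
    # clusters and flag the cluster with the larger residuals.
    return two_means_1d(residuals)


# Synthetic data: a smooth function with small noise, plus a few
# entries corrupted by a much larger error (all values hypothetical).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
y = np.sin(np.pi * x) + 0.01 * rng.standard_normal(x.size)
corrupted_idx = np.arange(10, 200, 20)
y[corrupted_idx] += 5.0

flagged = np.flatnonzero(identify_corrupted(x, y))
print("flagged entries:", flagged)
```

In this toy setting the corruption is far larger than the noise, so the residuals of clean and corrupted entries form two well-separated groups. When corruption is smaller, denser, or concentrated in one region, a plain least-squares fit can be pulled toward the corrupted entries and the separation may fail; the paper's analysis of sufficient conditions addresses when exact identification is possible.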

Journal Article Details

Publisher Name:    Global Science Press

Language:    English

DOI:    https://doi.org/10.4208/csiam-am.2020-0212

Published online:    2021-01

Copyright:    © Global Science Press

Pages:    27

Keywords:    Data corruption, function approximation, sparse approximation, $k$-means clustering.
