Soft-Constrained Distance Preserving t-SNE

Author(s)

Joseph Balderas, Li Wang, Andrzej Korzeniowski & Ren-Cang Li

Abstract

Dimension reduction is a crucial tool for high-dimensional data analysis. Many dimension reduction techniques have been proposed, each preserving different properties of a given dataset. For data visualization, t-distributed stochastic neighbor embedding (t-SNE) is a popular method due to its ability to produce well-separated clusters. However, t-SNE suffers from some major drawbacks. In this paper, a distance preserving t-SNE (DPt-SNE) is proposed, aiming to capture the global structure of the data while maintaining its local cluster separation. The basic idea is to incorporate a set of soft constraints, i.e., relaxed versions of the expected pairwise distance preserving constraints, that regulate the low-dimensional embedding so that it preserves the global structure encoded in the given distance metric of the input data. In addition, we introduce a scaling optimization parameter to alleviate issues that arise when the difference between high- and low-dimensional distances is too large to overcome. Experimental results on six datasets confirm that DPt-SNE reveals global structure better than t-SNE while retaining competitive cluster separation.
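
The idea of adding soft distance-preserving constraints to a t-SNE-style objective can be illustrated with a minimal sketch. This is not the formulation from the paper, which the abstract does not spell out. It assumes a quadratic penalty on the mismatch between embedded distances and scaled input distances, added to the usual KL divergence. The names `lam` (penalty weight), `s` (distance scale) and `sigma` (a single Gaussian bandwidth used in place of perplexity calibration) are hypothetical, and `s` is treated here as a fixed input rather than an optimized variable.

```python
import numpy as np

def pairwise_dist(X):
    # Euclidean distance matrix between the rows of X
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def gaussian_affinities(D, sigma=1.0):
    # Simplified input affinities: one global bandwidth (assumption),
    # normalized so that all off-diagonal entries sum to 1
    P = np.exp(-(D ** 2) / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    return P / P.sum()

def soft_constrained_loss(P, D_high, Y, lam=1.0, s=1.0, eps=1e-12):
    # KL(P || Q) with Student-t affinities Q, plus a quadratic soft
    # penalty (assumed form) pulling embedded distances toward s * D_high
    n = Y.shape[0]
    off = ~np.eye(n, dtype=bool)          # mask of off-diagonal pairs
    D_low = pairwise_dist(Y)
    W = 1.0 / (1.0 + D_low ** 2)
    W[~off] = 0.0
    Q = W / W.sum()
    p = np.maximum(P[off], eps)           # clip to avoid log(0)
    q = np.maximum(Q[off], eps)
    kl = np.sum(p * np.log(p / q))
    penalty = np.sum((D_low[off] - s * D_high[off]) ** 2)
    return kl + lam * penalty

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))              # toy high-dimensional data
D_high = pairwise_dist(X)
P = gaussian_affinities(D_high)
Y = rng.normal(size=(20, 2))              # random 2-D embedding
loss = soft_constrained_loss(P, D_high, Y, lam=0.1, s=0.5)
```

In this sketch, setting `lam=0` leaves only the KL term, while larger values of `lam` weight distance preservation more heavily. The scale `s` lets the embedded distances live on a different scale than the input distances, which is the role the abstract attributes to the scaling parameter.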

Author Biographies

  • Joseph Balderas

    Department of Mathematics, University of Texas at Arlington, Arlington, TX 76019, USA

  • Li Wang

    Department of Mathematics, University of Texas at Arlington, Arlington, TX 76019, USA

  • Andrzej Korzeniowski

    Department of Mathematics, University of Texas at Arlington, Arlington, TX 76019, USA

  • Ren-Cang Li

    Department of Mathematics, University of Texas at Arlington, Arlington, TX 76019, USA

About this article

DOI

10.4208/jml.250414