
Permutation Enhances the Rigor of Single-cell Genomics Data Analysis

ORAL · Invited

Abstract

Ensuring the reliability and accuracy of genomics data analysis is critical, particularly when visualizing complex biological structures and addressing data sparsity. This talk introduces two novel statistical methods, scDEED and mcRigor, that leverage permutation-based techniques to enhance the rigor of single-cell data analyses.

scDEED ([Xia et al., 2024, Nature Communications](https://www.nature.com/articles/s41467-024-45891-y)) addresses the challenge of evaluating the reliability of two-dimensional (2D) embeddings produced by visualization methods such as t-SNE and UMAP, which are commonly used to visualize cell clusters. These methods, however, can misrepresent data structure, leading to erroneous interpretations. scDEED calculates a reliability score for each cell's embedding by comparing the cell's neighbors in the 2D embedding space with its neighbors in the pre-embedding space. Cells with low reliability scores are flagged as dubious, while those with high scores are deemed trustworthy. Additionally, scDEED provides guidance for optimizing t-SNE and UMAP hyperparameters by minimizing the number of dubious embeddings, significantly improving visualization reliability across multiple datasets.
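
The neighbor-comparison and flagging idea above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the scDEED implementation: the per-cell score here is simply the fraction of a cell's k pre-embedding nearest neighbors that remain among its k nearest neighbors in 2D; the permutation null (each feature shuffled independently across cells, then re-embedded) and the 5% quantile cutoff are assumptions chosen in the spirit of the talk's permutation theme; and PCA stands in for t-SNE or UMAP so the example runs with NumPy alone. All function names, including the `embed_with(data, h)` wrapper used for hyperparameter selection, are hypothetical.

```python
import numpy as np


def knn_indices(X: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest neighbors (Euclidean) of each row of X, excluding the row itself."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)  # a cell is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


def neighbor_overlap_scores(X: np.ndarray, Y: np.ndarray, k: int) -> np.ndarray:
    """Per-cell fraction of pre-embedding neighbors (in X) that stay neighbors in the 2D embedding Y."""
    nx, ny = knn_indices(X, k), knn_indices(Y, k)
    return np.array([len(set(a) & set(b)) / k for a, b in zip(nx, ny)])


def pca_2d(X: np.ndarray) -> np.ndarray:
    """Stand-in 2D embedding (PCA via SVD); t-SNE or UMAP would be used in practice."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :2] * S[:2]


def flag_dubious(X, embed, k=10, quantile=0.05, seed=0):
    """Flag cells whose score falls below a quantile of a permutation null (assumed cutoff)."""
    rng = np.random.default_rng(seed)
    scores = neighbor_overlap_scores(X, embed(X), k)
    # Null: shuffle each feature independently across cells, destroying cell-to-cell structure,
    # then embed the permuted data with the same method and score it the same way.
    X_perm = np.column_stack([rng.permutation(col) for col in X.T])
    null = neighbor_overlap_scores(X_perm, embed(X_perm), k)
    cutoff = np.quantile(null, quantile)
    return scores, scores < cutoff


def pick_hyperparameter(X, embed_with, candidates, k=10):
    """Return the candidate whose embedding yields the fewest dubious cells, plus all counts."""
    counts = {}
    for h in candidates:
        _, dubious = flag_dubious(X, lambda data: embed_with(data, h), k=k)
        counts[h] = int(dubious.sum())
    return min(counts, key=counts.get), counts


# Toy usage: two well-separated groups of cells in 20 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 20))
X[:50, :5] += 8.0
scores, dubious = flag_dubious(X, pca_2d, k=10)
print(int(dubious.sum()), "cells flagged as dubious")
```

Because rescaling or rotating a 2D embedding does not change nearest neighbors, this score depends only on neighborhood structure. In practice, `embed_with` would wrap a t-SNE or UMAP call and pass a candidate value such as t-SNE's perplexity or UMAP's number of neighbors.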

mcRigor ([Liu and Li, 2024, bioRxiv](https://doi.org/10.1101/2024.10.30.621093)) focuses on enhancing metacell partitioning, a common strategy in single-cell RNA-seq and ATAC-seq data analysis that addresses data sparsity by aggregating similar single cells into metacells. Existing partitioning algorithms often do not verify that each metacell is homogeneous, which risks biased results and spurious findings. mcRigor introduces a feature-correlation-based statistic that measures heterogeneity within a metacell and identifies dubious metacells composed of heterogeneous single cells. By using this statistic to optimize the hyperparameters of metacell partitioning algorithms, mcRigor enhances the reliability of downstream analyses. Moreover, mcRigor enables benchmarking and selecting the most suitable partitioning algorithm for a given dataset, supporting more robust discoveries.
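
A feature-correlation heterogeneity check of this kind can be sketched as follows. The specific statistic is an assumption for illustration, not mcRigor's published statistic: it is the mean squared off-diagonal Pearson correlation between features across a metacell's member cells. The reasoning is that cells in a homogeneous metacell differ only by noise, so features should be nearly uncorrelated, whereas a mixture of cell types induces correlation among features that differ between the types. The null distribution comes from permuting each feature independently across the member cells, which keeps each feature's values but breaks feature-feature dependence. The function names, the three-cell minimum, and the `alpha` cutoff are hypothetical.

```python
import numpy as np


def mean_sq_offdiag_corr(M: np.ndarray) -> float:
    """Mean squared Pearson correlation over all pairs of features (columns) across cells (rows)."""
    M = M[:, M.std(axis=0) > 0]  # constant features have undefined correlation
    p = M.shape[1]
    if p < 2:
        return 0.0
    C = np.corrcoef(M, rowvar=False)
    off_diag = C[~np.eye(p, dtype=bool)]
    return float(np.mean(off_diag**2))


def heterogeneity_test(M: np.ndarray, n_perm: int = 200, seed: int = 0):
    """Return (observed / mean null statistic, permutation p-value) for one metacell."""
    rng = np.random.default_rng(seed)
    observed = mean_sq_offdiag_corr(M)
    null = np.array([
        # Shuffling each feature independently keeps its values but breaks feature-feature correlation.
        mean_sq_offdiag_corr(np.column_stack([rng.permutation(col) for col in M.T]))
        for _ in range(n_perm)
    ])
    ratio = observed / null.mean() if null.mean() > 0 else 1.0
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return float(ratio), float(p_value)


def flag_dubious_metacells(X, labels, alpha=0.05, n_perm=200, seed=0):
    """Map each metacell label to True if its member cells look heterogeneous."""
    flags = {}
    for lab in np.unique(labels):
        members = X[labels == lab]
        if members.shape[0] < 3:  # too few cells to estimate correlations
            flags[lab] = False
            continue
        _, p_value = heterogeneity_test(members, n_perm=n_perm, seed=seed)
        flags[lab] = p_value < alpha
    return flags


# Toy data: metacell 0 is homogeneous; metacell 1 mixes two cell types.
rng = np.random.default_rng(0)
homogeneous = rng.poisson(5.0, size=(30, 20))
rates_a = np.full(20, 5.0)
rates_a[:10] = 20.0
rates_b = np.full(20, 5.0)
rates_b[:10] = 1.0
mixed = np.vstack([rng.poisson(rates_a, size=(15, 20)), rng.poisson(rates_b, size=(15, 20))])
X = np.vstack([homogeneous, mixed]).astype(float)
labels = np.repeat([0, 1], 30)
print(flag_dubious_metacells(X, labels))
```

Minimizing the fraction of dubious metacells alone would favor very small metacells, which do little to reduce sparsity, so choosing hyperparameters or algorithms also requires weighing metacell size; this sketch covers only the heterogeneity test.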

Publications: Xia, L., Lee, C. & Li, J.J. Statistical method scDEED for detecting dubious 2D single-cell embeddings and optimizing t-SNE and UMAP hyperparameters. Nat Commun 15, 1753 (2024). https://doi.org/10.1038/s41467-024-45891-y

Liu, P. & Li, J.J. mcRigor: a statistical method to enhance the rigor of metacell partitioning in single-cell data analysis. bioRxiv (2024). https://doi.org/10.1101/2024.10.30.621093

Presenters

  • Jingyi Jessica Li

    UCLA

Authors

  • Jingyi Jessica Li

    UCLA