Single-cell RNA sequencing typically produces reads on tens of thousands of base pairs. We say that the resulting coverage data for each cell is high-dimensional, as the data contains many fields. High dimensionality is a significant obstacle for statistical analysis, most notably for clustering methods, which perform poorly in high dimensions. Typically one applies dimension reduction to the data before passing it to a statistical method. Historically this was done using linear methods like principal component analysis (PCA). Recently, though, nonlinear dimension reduction methods such as t-distributed Stochastic Neighbour Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP) have gained a foothold as standard methodology. Such methods are capable of detecting an underlying low-dimensional “manifold” on which the data points lie and then flattening it, making the data much more convenient for statistical analysis or for visualisation.
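As a point of reference for the linear approach mentioned above, here is a minimal sketch of PCA computed with a singular value decomposition in NumPy. The data is a synthetic stand-in, not real sequencing output: the sizes (300 "cells", 2000 "genes", 3 hidden factors) and all variable names are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 300 "cells" x 2000 "genes", generated from
# only 3 hidden factors plus a small amount of noise.
n_cells, n_genes, n_factors = 300, 2000, 3
factors = rng.normal(size=(n_cells, n_factors))
loadings = rng.normal(size=(n_factors, n_genes))
X = factors @ loadings + 0.1 * rng.normal(size=(n_cells, n_genes))

# PCA: centre each column, then take the SVD of the centred matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 3
embedding = U[:, :k] * S[:k]          # coordinates of each cell on the top k components
explained = S**2 / np.sum(S**2)       # fraction of variance carried by each component

print(embedding.shape)                # (n_cells, k)
print(explained[:k].sum())            # share of variance kept by the top k components
```

Because this toy data really is close to a 3-dimensional linear subspace, three components retain almost all of the variance. Data lying on a curved manifold does not compress this neatly under a linear projection, which is the motivation for the nonlinear methods.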
In this workshop we will pick apart the mechanics of these techniques, with ample graphical demonstration, to present their capabilities and limitations. The focus will be on the UMAP algorithm, and attendees will gain a deeper understanding of the algorithm and the effect of its hyperparameters.
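To give a flavour of what a hyperparameter can change, the sketch below builds a plain k-nearest-neighbour graph, the kind of structure UMAP starts from, and counts its connected components for two neighbourhood sizes. This is not the UMAP algorithm itself: UMAP uses a weighted graph and approximate neighbour search, and the neighbourhood size is exposed as `n_neighbors` in the umap-learn package. The helper names `knn_graph` and `count_components` and the two-cluster toy data are assumptions made for illustration.

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric, unweighted k-nearest-neighbour adjacency matrix."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # a point is not its own neighbour
    idx = np.argsort(d, axis=1)[:, :k]       # the k closest points to each point
    A = np.zeros(d.shape, dtype=bool)
    rows = np.repeat(np.arange(len(X)), k)
    A[rows, idx.ravel()] = True
    return A | A.T                           # keep an edge if either end chose it

def count_components(A):
    """Count connected components with a depth-first search."""
    seen = np.zeros(len(A), dtype=bool)
    count = 0
    for start in range(len(A)):
        if seen[start]:
            continue
        count += 1
        seen[start] = True
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(A[i] & ~seen):
                seen[j] = True
                stack.append(j)
    return count

rng = np.random.default_rng(1)
# Hypothetical toy data: two well-separated clusters of 50 points in 10 dimensions.
a = rng.normal(size=(50, 10))
b = rng.normal(size=(50, 10))
b[:, 0] += 20.0
X = np.vstack([a, b])

small_k = count_components(knn_graph(X, 5))    # neighbours stay inside each cluster
large_k = count_components(knn_graph(X, 60))   # neighbours must reach the other cluster
print(small_k, large_k)
```

With a small neighbourhood the graph only sees local structure, so the two clusters are never linked. Once the neighbourhood is larger than a cluster, every point is forced to connect across the gap and the graph becomes a single piece. This is one way to picture the trade-off between local and global structure.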
Keywords: Dimension reduction, single cell, transcriptomics, UMAP, clustering, unsupervised learning
Requirements: The workshop will be presented using a Jupyter Notebook running Python. Attendees will benefit from running the notebook on their own laptop during the workshop.
Research Fellow, Biological Data Science Institute (BDSI), ANU
Dr James Nichols is a mathematician interested in numerical simulation, approximation, and statistical learning techniques. He is particularly interested in developing numerical methods that are able to tackle high dimensional problems in statistics and simulation, and methods to fit models to data.
Dr Nichols obtained his PhD in 2014 from the University of New South Wales, investigating quasi-Monte Carlo methods to simulate fluid flows in random media, applicable to modelling large aquifers like the Great Artesian Basin. Following this he was a postdoc at UNSW and Sorbonne Université, Paris, investigating a variety of problems, from simulating sub-diffusive particle movement to approximation methods for stochastic partial differential equations. Before his PhD, James was a quantitative analyst at Macquarie Bank, assessing portfolio risk and pricing exotic financial instruments, and developing a love for probability and simulation.