A few tips for using scHPL

This page will be updated soon with more tips. If you have questions in the mean time, just open a GitHub issue or send an email to l.c.m.michielsen ‘at’ tudelft.nl

Which classifier to use?

We advise to use:

the linear SVM when your integrated data still has a lot of dimensions (e.g. when you have used Seurat to integrate the datasets)
the kNN when your integrated data has less, 10-50, dimensions (e.g. when you have used scVI or Harmony to integrate the datasets)
the one-class SVM when your main focus is to find unseen cell populations. A downside of the one-class SVM, however, is that the classification performance drops.

Preparing your AnnData object

The input for the learn function is an AnnData object where the labels and batch-indicators are a column in the metadata. If you integrated your data, it can be that it’s a different slot in the object. At the moment, it is NOT possible to indicate which slot to use for scHPL. Therefore, we advise to make a new AnnData object and copy the integrated data to the ‘.X’.

treeArches

One way to integrate your data is with treeArches. treeArches is a wrapper around scHPL and scArches to easily create and update reference atlases and the corresponding hierarchy. There are two tutorials explaining how to use treeArches.