Train hierarchical classifier

Created on Wed Oct 23 11:37:16 2019

@author: Lieke

scHPL.train.train_tree(data, labels, tree: TreeNode, classifier: Literal['knn', 'svm', 'svm_occ'] = 'knn', dimred: bool = False, useRE: bool = True, FN: float = 0.5, n_neighbors: int = 50, dynamic_neighbors: bool = True, distkNN: int = 99, gpu: int | None = None)

Train a hierarchical classifier.

Parameters:
  • data (array_like) – Training data (cells x genes)

  • labels (array_like) – Cell type labels of the training data

  • tree (TreeNode) – Classification tree to train (can be built using utils.create_tree())

  • classifier (String = 'knn') – Classifier to use (either ‘svm’, ‘svm_occ’ or ‘knn’).

  • dimred (Boolean = False) – If ‘True’, PCA is applied before training the classifier.

  • useRE (Boolean = True) – If ‘True’, cells are also rejected based on the reconstruction error.

  • FN (Float = 0.5) – Percentage of false negatives allowed when determining the threshold for the reconstruction error.

  • n_neighbors (int = 50) – Number of neighbors for the kNN classifier (only used when classifier=’knn’).

  • dynamic_neighbors (bool = True) – Whether the number of neighbors for the kNN classifier may be reduced when a node contains a very small cell population. If so, k is set to min(n_neighbors, size of the smallest cell population).

  • distkNN (int = 99) – Used to determine the threshold for the maximum distance between a cell and its closest neighbor in the training set. The threshold is set to the distkNN-th percentile of the distances within the training set.

  • gpu (int | None = None) – GPU index to use for the Faiss library (only used when classifier=’knn’)
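
The two kNN-specific rules described above (n_neighbors with dynamic_neighbors, and distkNN) can be illustrated with a small standard-library sketch. This is not scHPL's implementation: the helper names effective_k, percentile and distance_threshold are hypothetical, the linear-interpolation percentile is an assumption, and the library may compute both values differently internally.

```python
from collections import Counter


def effective_k(labels, n_neighbors=50, dynamic_neighbors=True):
    """Hypothetical helper: k capped by the smallest cell population."""
    if not dynamic_neighbors:
        return n_neighbors
    smallest = min(Counter(labels).values())
    return min(n_neighbors, smallest)


def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100] (an assumption)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def distance_threshold(nearest_distances, distkNN=99):
    """Hypothetical helper: distkNN-th percentile of training distances."""
    return percentile(nearest_distances, distkNN)


# Toy labels: the smallest population has 12 cells, so k drops from 50 to 12.
labels = ["T cell"] * 120 + ["B cell"] * 80 + ["pDC"] * 12
k = effective_k(labels)

# Toy nearest-neighbor distances within the training set.
nearest = [float(i) for i in range(101)]
threshold = distance_threshold(nearest)

# Under this sketch, a cell whose closest training neighbor is farther away
# than the threshold would be rejected.
is_rejected = 120.0 > threshold
```

With dynamic_neighbors=False the sketch simply returns n_neighbors, regardless of population sizes.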

Return type:

Trained classification tree
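
The role of the FN parameter when useRE is enabled can be sketched in the same way. The code below assumes that the reconstruction-error threshold is the (100 - FN)-th percentile of the reconstruction errors of the training cells, so that roughly FN percent of them would be rejected. Both helper names are hypothetical, and the procedure scHPL actually uses to estimate the threshold may differ.

```python
def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100] (an assumption)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def reconstruction_error_threshold(train_errors, FN=0.5):
    """Hypothetical helper: about FN percent of training errors exceed it."""
    return percentile(train_errors, 100 - FN)


# Toy reconstruction errors for 1001 training cells.
train_errors = [float(i) for i in range(1001)]
re_threshold = reconstruction_error_threshold(train_errors, FN=0.5)

# Cells with a reconstruction error above the threshold would be rejected.
n_rejected = sum(err > re_threshold for err in train_errors)
```

In this toy setting, raising FN lowers the threshold, so more cells are rejected.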