Learn hierarchy

Created on Fri Nov 22 14:11:01 2019

@author: Lieke

scHPL.learn.learn_tree(data: AnnData, batch_key: str, batch_order: list, cell_type_key: str, tree: TreeNode | None = None, retrain: bool = False, batch_added: list | None = None, classifier: Literal['knn', 'svm', 'svm_occ'] = 'knn', n_neighbors: int = 50, dynamic_neighbors: bool = True, distkNN: int = 99, dimred: bool = False, useRE: bool = True, FN: float = 0.5, rej_threshold: float = 0.5, match_threshold: float = 0.25, attach_missing: bool = False, print_conf: bool = False, gpu: int | None = None)

Learn a classification tree based on multiple labeled datasets.

Parameters:
  • data (AnnData) – AnnData object containing the aligned datasets.

  • batch_key (String) – Column name in data.obs containing the batch information.

  • batch_order (List) – Order in which the batches should be added to the tree.

  • cell_type_key (String) – Column name in data.obs containing the cell-type labels.

  • tree (TreeNode = None) – Existing tree to update with the new datasets.

  • retrain (Boolean = False) – If ‘True’, the input tree is retrained (needed if the tree or the datasets have changed since the initial construction).

  • batch_added (List = None) – List that indicates which batches were used to build the existing tree.

  • classifier (String = 'knn') – Classifier to use (either ‘svm’, ‘svm_occ’ or ‘knn’).

  • n_neighbors (int = 50) – Number of neighbors for the kNN classifier (only used when classifier=’knn’).

  • dynamic_neighbors (bool = True) – If ‘True’, the number of neighbors of the kNN classifier can be reduced when a node contains a very small cell population: k is set to min(n_neighbors, size of the smallest cell population).

  • distkNN (int = 99) – Used to determine the threshold on the maximum distance between a cell and its closest neighbor in the training set. The threshold is set to the distkNN-th percentile of the distances within the training set.

  • dimred (Boolean = False) – If ‘True’, PCA is applied before training the classifier.

  • useRE (Boolean = True) – If ‘True’, cells are also rejected based on the reconstruction error.

  • FN (Float = 0.5) – Percentage of false negatives allowed when determining the threshold for the reconstruction error.

  • rej_threshold (Float = 0.5) – If the prediction probability is lower than this threshold, the cell is rejected (only used with the kNN classifier).

  • match_threshold (Float = 0.25) – Threshold to use when matching the labels.

  • attach_missing (Boolean = False) – If ‘True’, missing nodes are attached to the root node.

  • print_conf (Boolean = False) – Whether to print the confusion matrices during the matching step.

  • gpu (int = None) – GPU index to use for the Faiss library (only used when classifier=’knn’)
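The dynamic_neighbors rule can be illustrated with a minimal sketch. The helper effective_k is hypothetical (it is not part of scHPL); it only computes k = min(n_neighbors, size of the smallest cell population) from the training labels at one node, assuming a "cell population" is the set of cells sharing one label.

```python
from collections import Counter


def effective_k(labels, n_neighbors=50, dynamic_neighbors=True):
    """Hypothetical helper: number of neighbors used by a node's kNN classifier."""
    if not dynamic_neighbors:
        # Fixed k, regardless of population sizes.
        return n_neighbors
    # Size of the smallest cell population among the training labels.
    smallest = min(Counter(labels).values())
    return min(n_neighbors, smallest)


labels = ["T cell"] * 120 + ["B cell"] * 80 + ["pDC"] * 12
print(effective_k(labels))                           # 12
print(effective_k(labels, dynamic_neighbors=False))  # 50
```

Capping k this way keeps a small population (here 12 pDCs) from being outvoted simply because k exceeds its size.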
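A minimal sketch of the distkNN threshold, assuming that the "distances within the training set" are each training cell's distance to its nearest other training cell, and that the percentile uses linear interpolation (NumPy's default). The helpers percentile, nn_distance_threshold and reject_by_distance are hypothetical and use brute-force Euclidean distances; scHPL itself may compute neighbors differently (for example with Faiss when classifier='knn').

```python
import math


def percentile(values, q):
    """q-th percentile with linear interpolation (NumPy's default method)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def nn_distance_threshold(train, distkNN=99):
    """distkNN-th percentile of each training cell's distance to its nearest other training cell."""
    nn_dists = [
        min(math.dist(a, b) for j, b in enumerate(train) if j != i)
        for i, a in enumerate(train)
    ]
    return percentile(nn_dists, distkNN)


def reject_by_distance(cell, train, threshold):
    """True if the cell's closest training cell lies farther away than the threshold."""
    return min(math.dist(cell, b) for b in train) > threshold


train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
t = nn_distance_threshold(train)                 # 1.0: every nearest-neighbor distance is 1
print(reject_by_distance((0.5, 0.5), train, t))  # False
print(reject_by_distance((5.0, 5.0), train, t))  # True
```

A higher distkNN gives a looser threshold, so fewer cells far from the training data are rejected.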
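Rejection on the reconstruction error (useRE, FN) can be sketched as follows. This assumes the error is the distance between a (mean-centred) cell and its projection onto principal axes learned from the training set, and that FN percent of training cells may exceed the threshold, i.e. the threshold is the (100 - FN)-th percentile of the training errors. The helpers reconstruction_error and re_threshold are hypothetical; the PCA fit itself is not shown.

```python
import math


def reconstruction_error(x, basis):
    """Distance between x and its projection onto span(basis).

    Assumes x is already mean-centred and basis holds orthonormal vectors,
    e.g. the leading principal axes of the training data.
    """
    proj = [0.0] * len(x)
    for v in basis:
        coef = sum(xi * vi for xi, vi in zip(x, v))  # coordinate along axis v
        proj = [p + coef * vi for p, vi in zip(proj, v)]
    return math.dist(x, proj)


def re_threshold(train_errors, FN=0.5):
    """(100 - FN)-th percentile of the training errors (linear interpolation)."""
    xs = sorted(train_errors)
    pos = (len(xs) - 1) * (100 - FN) / 100
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


basis = [(1.0, 0.0)]                            # one principal axis: the x-axis
print(reconstruction_error((3.0, 4.0), basis))  # 4.0: the part off the axis
errors = [float(e) for e in range(201)]         # toy training errors 0..200
print(re_threshold(errors, FN=0.5))             # 199.0: 1 of 201 cells lies above it
```

A new cell whose reconstruction error exceeds this threshold would be rejected; a larger FN lowers the threshold and rejects more cells.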
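The rej_threshold rule for the kNN classifier can be sketched as a majority vote with rejection, assuming the prediction probability is the fraction of the k nearest neighbors that carry the winning label. The function knn_predict_with_rejection and the "Rejected" string are hypothetical placeholders, not scHPL's own output format.

```python
from collections import Counter


def knn_predict_with_rejection(neighbor_labels, rej_threshold=0.5):
    """Majority vote over a cell's k nearest neighbors, with rejection.

    The prediction probability is the fraction of neighbors carrying the
    winning label; if it is below rej_threshold the cell is rejected.
    """
    label, votes = Counter(neighbor_labels).most_common(1)[0]
    prob = votes / len(neighbor_labels)
    if prob < rej_threshold:
        return "Rejected", prob  # placeholder label for this sketch
    return label, prob


print(knn_predict_with_rejection(["T"] * 7 + ["B"] * 3))               # ('T', 0.7)
print(knn_predict_with_rejection(["T"] * 4 + ["B"] * 3 + ["NK"] * 3))  # ('Rejected', 0.4)
```

With uniform weights, this fraction equals the winning-class probability that scikit-learn's KNeighborsClassifier.predict_proba reports.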
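One plausible reading of match_threshold, sketched under assumptions: the confusion matrix between the labels of a new batch (rows) and the labels predicted by the existing tree (columns) is row-normalized, and only fractions of at least match_threshold are kept as candidate matches. The function candidate_matches and the example labels are hypothetical; how scHPL combines both prediction directions into tree updates is not reproduced here.

```python
def candidate_matches(conf, row_labels, col_labels, match_threshold=0.25):
    """Label pairs whose row-normalized confusion fraction reaches match_threshold.

    conf[i][j] counts cells labeled row_labels[i] in the new batch that the
    existing tree predicted as col_labels[j].
    """
    matches = []
    for i, row in enumerate(conf):
        total = sum(row)
        if total == 0:
            continue  # label without cells: nothing to match
        for j, count in enumerate(row):
            if count / total >= match_threshold:
                matches.append((row_labels[i], col_labels[j]))
    return matches


conf = [
    [90, 0, 10],  # CD4 T
    [60, 0, 40],  # CD8 T
    [5, 95, 0],   # B
]
rows = ["CD4 T", "CD8 T", "B"]
cols = ["T cell", "B cell", "NK cell"]
print(candidate_matches(conf, rows, cols))
# [('CD4 T', 'T cell'), ('CD8 T', 'T cell'), ('CD8 T', 'NK cell'), ('B', 'B cell')]
```

Lowering match_threshold keeps more (and noisier) label pairs; raising it keeps only the dominant ones, as with match_threshold=0.5, which drops the CD8 T to NK cell pair.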

Returns:

Trained classification tree and a list with the missing populations.