Learn hierarchy
Created on Fri Nov 22 14:11:01 2019
@author: Lieke
- scHPL.learn.learn_tree(data: AnnData, batch_key: str, batch_order: list, cell_type_key: str, tree: TreeNode | None = None, retrain: bool = False, batch_added: list | None = None, classifier: Literal['knn', 'svm', 'svm_occ'] = 'knn', n_neighbors: int = 50, dynamic_neighbors: bool = True, distkNN: int = 99, dimred: bool = False, useRE: bool = True, FN: float = 0.5, rej_threshold: float = 0.5, match_threshold: float = 0.25, attach_missing: bool = False, print_conf: bool = False, gpu: int | None = None)
Learn a classification tree based on multiple labeled datasets.
- Parameters:
data (AnnData) – AnnData object containing the aligned datasets.
batch_key (String) – Column name in data.obs containing the batch information.
batch_order (List) – List containing the order in which the batches should be added to the tree.
cell_type_key (String) – Column name in data.obs containing the cell-type labels.
tree (TreeNode = None) – Existing tree to update with the new datasets.
retrain (Boolean = False) – If ‘True’, the input tree is retrained (needed if the tree or the datasets have changed after the initial construction).
batch_added (List = None) – List that indicates which batches were used to build the existing tree.
classifier (String = 'knn') – Classifier to use (either ‘svm’, ‘svm_occ’ or ‘knn’).
n_neighbors (int = 50) – Number of neighbors for the kNN classifier (only used when classifier=‘knn’).
dynamic_neighbors (bool = True) – If ‘True’, the number of neighbors of the kNN classifier can be reduced when a node contains a very small cell population: k is set to min(n_neighbors, size of the smallest cell population).
distkNN (int = 99) – Used to determine the threshold on the maximum distance between a cell and its closest neighbor in the training set. The threshold is set to the distkNN-th percentile of the distances within the training set.
dimred (Boolean = False) – If ‘True’, PCA is applied before training the classifier.
useRE (Boolean = True) – If ‘True’, cells are also rejected based on the reconstruction error.
FN (Float = 0.5) – Percentage of false negatives allowed when determining the threshold for the reconstruction error.
rej_threshold (Float = 0.5) – If the prediction probability is lower than this threshold, the cell is rejected (only used with the kNN classifier).
match_threshold (Float = 0.25) – Threshold to use when matching the labels.
attach_missing (Boolean = False) – If ‘True’, missing nodes are attached to the root node.
print_conf (Boolean = False) – Whether to print the confusion matrices during the matching step.
gpu (int = None) – GPU index to use for the Faiss library (only used when classifier=‘knn’).
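The kNN-specific options (n_neighbors, dynamic_neighbors, distkNN, rej_threshold) can be illustrated with a small NumPy sketch. This is not scHPL's implementation: all function names are hypothetical, a brute-force Euclidean search stands in for Faiss, the distance threshold is assumed to be a percentile of each training cell's distance to its closest other training cell, and the prediction probability is assumed to be the share of the k neighbors that vote for the winning label.

```python
import numpy as np

def effective_k(y_train, n_neighbors=50, dynamic_neighbors=True):
    """Hypothetical helper: k = min(n_neighbors, smallest population size) if dynamic."""
    if not dynamic_neighbors:
        return n_neighbors
    _, counts = np.unique(y_train, return_counts=True)
    return int(min(n_neighbors, counts.min()))

def pairwise_dist(A, B):
    """Euclidean distances between every row of A and every row of B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)

def distance_threshold(X_train, distkNN=99):
    """Assumed reading: distkNN-th percentile of each training cell's
    distance to its closest other training cell."""
    d = pairwise_dist(X_train, X_train)
    np.fill_diagonal(d, np.inf)  # a cell is not its own neighbor
    return np.percentile(d.min(axis=1), distkNN)

def knn_predict(X_train, y_train, X_test, n_neighbors=50,
                dynamic_neighbors=True, distkNN=99, rej_threshold=0.5):
    """Hypothetical kNN classifier with probability- and distance-based rejection."""
    y_train = np.asarray(y_train)
    k = effective_k(y_train, n_neighbors, dynamic_neighbors)
    max_dist = distance_threshold(X_train, distkNN)
    predictions = []
    for row in pairwise_dist(X_test, X_train):
        idx = np.argsort(row)[:k]  # the k nearest training cells
        labels, votes = np.unique(y_train[idx], return_counts=True)
        best = votes.argmax()
        prob = votes[best] / len(idx)  # share of neighbors voting for the winner
        # Reject if the vote is too weak or the closest neighbor is too far away.
        if prob < rej_threshold or row[idx[0]] > max_dist:
            predictions.append("Rejected")
        else:
            predictions.append(str(labels[best]))
    return predictions
```

With two well-separated training populations, a cell far from both is rejected by the distance check, while a cell whose neighbors are split evenly between two labels is rejected as soon as rej_threshold exceeds the winning vote share.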
- Returns:
Trained classification tree and a list with the missing populations.
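The reconstruction-error rejection controlled by useRE and FN can be sketched in the same hedged way. Assumptions not taken from scHPL: the reconstruction error is the squared residual left after projecting a cell onto the top principal components of the training data, the number of components is a free parameter of this sketch, and FN is read as a percentage, so the threshold is the (100 - FN)-th percentile of the training errors and roughly FN% of training cells would be rejected as false negatives. The helper name fit_re_threshold is hypothetical.

```python
import numpy as np

def fit_re_threshold(X_train, n_components=2, FN=0.5):
    """Hypothetical helper: fit a PCA on X_train and return a function that
    computes reconstruction errors, plus the rejection threshold."""
    mean = X_train.mean(axis=0)
    # Principal axes are the right singular vectors of the centered data.
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    V = Vt[:n_components].T

    def reconstruction_error(X):
        Xc = X - mean
        residual = Xc - (Xc @ V) @ V.T  # part of each cell not captured by the PCs
        return np.sum(residual ** 2, axis=1)

    # FN treated as a percentage: about FN% of training cells lie above the threshold.
    threshold = np.percentile(reconstruction_error(X_train), 100 - FN)
    return reconstruction_error, threshold

rng = np.random.default_rng(0)
# Training cells lie close to a plane in 3-D space.
X_train = np.column_stack([rng.normal(size=200), rng.normal(size=200),
                           0.01 * rng.normal(size=200)])
recon_error, t = fit_re_threshold(X_train, n_components=2, FN=0.5)
X_new = np.array([[0.5, -0.5, 0.0],   # lies on the plane
                  [0.0, 0.0, 5.0]])   # far off the plane
rejected = recon_error(X_new) > t
```

A cell that resembles the training data is reconstructed almost perfectly and kept, whereas a cell from an unseen population leaves a large residual and is rejected even if a classifier would otherwise assign it a label.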