AMB inter-dataset
Warning: this vignette was written for scHPL v0.0.2 and should be updated.
[1]:
import os
import pandas as pd
import numpy as np
import time as tm
from scHPL import progressive_learning
In this vignette, we repeat the AMB inter-dataset experiment. We use the AMB2016 and AMB2018 datasets to construct a tree for neuronal cell populations. The aligned datasets and labels can be downloaded from https://doi.org/10.5281/zenodo.4557712
Read the data
We start by reading the datasets and their corresponding labels, adding each to a list.
In the datasets, the rows represent different cells and the columns represent the genes.
[2]:
data0 = 'integrated16.csv'
labels0 = 'Labels16.csv'
data1 = 'integrated18.csv'
labels1 = 'Labels18.csv'
data = []
labels = []
data.append((pd.read_csv(data0, index_col=0, sep=',').transpose()))
labels.append(pd.read_csv(labels0, header=0, index_col=None, sep=',', usecols = [1]))
data.append((pd.read_csv(data1, index_col=0, sep=',').transpose()))
labels.append(pd.read_csv(labels1, header=0, index_col=None, sep=',', usecols = [2]))
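Because the tree is trained on paired expression matrices and labels, it can help to confirm that each dataset has exactly one label per cell before training. The following is a minimal sketch, not part of scHPL: `check_alignment` is a hypothetical helper, and the toy lists stand in for the real files. It relies only on `len()`, which for a pandas DataFrame returns the number of rows, so the same function should also accept the `data` and `labels` lists built above.

```python
def check_alignment(data, labels):
    """Return the number of cells per dataset after verifying that
    every expression matrix has exactly one label row per cell.

    Hypothetical helper for illustration; not part of scHPL.
    """
    n_cells = []
    for i, (X, y) in enumerate(zip(data, labels)):
        if len(X) != len(y):
            raise ValueError(
                f"dataset {i}: {len(X)} cells but {len(y)} labels"
            )
        n_cells.append(len(X))
    return n_cells


# Toy stand-ins for the real files: rows are cells, columns are genes.
toy_data = [
    [[1.0, 0.0], [0.5, 2.0], [0.0, 3.0]],  # 3 cells x 2 genes
    [[2.0, 1.0], [0.0, 0.5]],              # 2 cells x 2 genes
]
toy_labels = [
    ["Sst Chodl", "Sst Chodl", "Igtp"],
    ["Sst Chodl", "Lamp5 Lhx6"],
]

n_cells = check_alignment(toy_data, toy_labels)
print(n_cells)
```

With the real data, `check_alignment(data, labels)` would raise a `ValueError` if a label file and its expression matrix disagree in length.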
Construct and train the classification tree
Next, we use hierarchical progressive learning to construct and train a classification tree. After each iteration, the updated tree is printed. If two labels match perfectly, only one of them is visible in the tree, so these perfect matches are also reported with a print statement.
In this experiment, we used a linear SVM, did not apply dimensionality reduction, and used the default threshold of 0.25. If you want to use a one-class SVM instead of a linear SVM, set classifier = 'svm_occ'.
When matching the two datasets, three populations from the AMB2018 dataset cause a complex scenario and cannot be added to the tree. Here, we set return_missing = True so that these populations are returned to the user. With return_missing = False, these populations are attached to the root instead.
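The options described above can be collected as keyword arguments, which makes it easy to switch between variants. This is only a sketch: the argument names and values are the ones used in this vignette, the dictionary names are made up for illustration, and the call to `learn_tree` is left as a comment because it needs scHPL and the loaded data.

```python
# Settings used in this vignette (values taken from the text above).
vignette_settings = {
    "classifier": "svm",      # linear SVM
    "dimred": False,          # no dimensionality reduction
    "threshold": 0.25,        # default threshold
    "return_missing": True,   # return populations that cannot be placed
}

# Variant: one-class SVM, with unplaced populations attached to the root.
variant_settings = {
    **vignette_settings,
    "classifier": "svm_occ",
    "return_missing": False,
}

# With scHPL installed and data/labels loaded, the call would look like:
# tree = progressive_learning.learn_tree(data, labels, **variant_settings)
print(variant_settings)
```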
[3]:
start = tm.time()
classifier = 'svm'
dimred = False
threshold = 0.25
tree = progressive_learning.learn_tree(data, labels,
                                       classifier = classifier,
                                       dimred = dimred,
                                       threshold = threshold,
                                       return_missing = True)
training_time = tm.time()-start
print('Training time: ', training_time)
Iteration 1
Perfect match: Lamp5 Lhx6 is now: Igtp
Perfect match: L2/3 IT VISp Adamts2 is now: L2 Ngb
Perfect match: L5 PT VISp Chrna6 is now: L5 Chrna6
Perfect match: Lamp5 Plch2 Dock5 is now: Ndnf Car4
Perfect match: Pvalb Vipr2 is now: Pvalb Cpne5
Perfect match: Lamp5 Lsp1 is now: Smad3
Perfect match: Sst Chodl is now: Sst Chodl
Perfect match: L5 IT VISp Col27a1 is now: L5 Ucma
Perfect match: L5 IT VISp Hsd11b1 Endou is now: L5a Hsd11b1
Perfect match: Sst Chrna2 Ptgdr is now: Sst Cdk6
These populations are missing from the tree:
Index(['Pvalb Sema3e Kank4', 'Sst Hpse Sema3c', 'Sst Tac1 Tacr3'], dtype='object')
Updated tree:
root
Igtp
L2 Ngb
L2/3 Ptgs2
L2/3 IT VISp Agmat
L2/3 IT VISp Rrad
L5 Chrna6
L5 Ucma
L5a Hsd11b1
L5a Tcerg1l
L5 IT VISp Whrn Tox2
L5b Cdh13
L5 PT VISp C1ql2 Cdh13
L5 PT VISp Krt80
L5b Tph2
L5 PT VISp C1ql2 Ptgfr
L5 PT VISp Lgr5
L6a Car12
L6 IT VISp Penk Col27a1
L6 IT VISp Penk Fst
L6a Mgp
L6 CT ALM Nxph2 Sla
L6 CT VISp Ctxn3 Brinp3
L6 CT VISp Gpr139
L6a Sla
L5 NP VISp Trhr Cpne7
L5 NP VISp Trhr Met
L6 CT VISp Ctxn3 Sla
L6 CT VISp Krt80 Sla
L6 CT VISp Nxph2 Wls
L6b VISp Col8a1 Rprm
L6a Syt17
L6 IT VISp Col18a1
L6 IT VISp Col23a1 Adamts2
L6b Rgs12
L6b VISp Col8a1 Rxfp1
L6b VISp Mup5
L6b Serpinb11
L6b P2ry12
L6b VISp Crh
Ndnf Car4
Ndnf Cxcl14
Lamp5 Fam19a1 Pax6
Lamp5 Fam19a1 Tmem182
Lamp5 Krt73
Lamp5 Ntn1 Npy2r
Vip Igfbp6 Pltp
Pvalb Cpne5
Pvalb Gpx3
Pvalb Gpr149 Islr
Pvalb Reln Tac1
Pvalb Th Sst
Pvalb Obox3
Pvalb Rspo2
Pvalb Akr1c18 Ntf3
Pvalb Wt1
Pvalb Reln Itm2a
Smad3
Sst Cbln4
Sst Calb2 Necab1
Sst Calb2 Pdlim5
Sst Crh 4930553C11Rik
Sst Crhr2 Efemp1
Sst Hpse Cbln4
Sst Mme Fam114a1
Sst Tac1 Htr1d
Sst Cdk6
Sst Chodl
Sst Myh8
Sst Chrna2 Glra3
Sst Myh8 Etv1
Sst Myh8 Fibin
Sst Tacstd2
Sst Rxfp1 Eya1
Sst Rxfp1 Prdm8
Sst Tac2 Tacstd2
Sst Th
Pvalb Gabrg1
Sst Esm1
Sst Nts
Vip Chat
Vip Lect1 Oxtr
Vip Lmo1 Myl1
Vip Ptprt Pkp2
Vip Gpc3
Vip Arhgap36 Hmcn1
Vip Gpc3 Slc18a3
Vip Lmo1 Fam159b
Vip Mybpc1
Vip Crispld2 Htr2c
Vip Crispld2 Kcne4
Vip Parm1
Vip Chat Htr1f
Vip Pygm C1ql1
Vip Rspo1 Itga4
Vip Sncg
Sncg Gpr50
Sncg Vip Itih5
Sncg Vip Nptx2
Vip Col15a1 Pde1a
L4 IT VISp Rspo1
L4 Arf5
L4 Ctxn3
L4 Scnn1a
L5 IT VISp Batf3
L5a Batf3
L5a Pde1c
Pvalb Tpbg 2018
Pvalb Tacr3
Pvalb Tpbg 2016
Astro Aqp4
L5 IT VISp Col6a1 Fezf2
L6 IT VISp Car3
Pvalb Calb1 Sst
Serpinf1 Aqp5 Vip
Sncg Slc17a8
Sst Nr2f2 Necab1
Sst Tac2 Myh4
Vip Igfbp4 Mab21l1
Vip Igfbp6 Car10
Vip Rspo4 Rxfp1 Chat
Training time: 360.18609166145325
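Since perfectly matched labels appear in the tree under only one name, it can be handy to turn the printed "Perfect match" lines into a lookup table. The sketch below assumes the printed output has been saved as text in exactly the format shown above; `parse_perfect_matches` is a hypothetical helper, not an scHPL function, and the sample text reuses three lines from the output of this run.

```python
def parse_perfect_matches(log_text):
    """Map each label left of 'is now:' to the label shown in the tree.

    Hypothetical helper for illustration; not part of scHPL.
    """
    prefix = "Perfect match: "
    separator = " is now: "
    matches = {}
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith(prefix) and separator in line:
            # Split once so labels containing spaces stay intact.
            old, new = line[len(prefix):].split(separator, 1)
            matches[old.strip()] = new.strip()
    return matches


# A few lines copied from the output above.
log_text = """Iteration 1
Perfect match: Lamp5 Lhx6 is now: Igtp
Perfect match: Sst Chodl is now: Sst Chodl
Perfect match: L5 IT VISp Col27a1 is now: L5 Ucma
"""

matches = parse_perfect_matches(log_text)
print(matches)
```

One way to obtain the text is to wrap the training call in `contextlib.redirect_stdout` from the standard library and read back the captured string.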