AMB inter-dataset

Warning: this vignette was written for scHPL v0.0.2 and should be updated.

[1]:
import os
import pandas as pd
import numpy as np
import time as tm
from scHPL import progressive_learning

In this vignette, we repeat the AMB inter-dataset experiment. We use the AMB2016 and AMB2018 datasets to construct a tree of neuronal cell populations. The aligned datasets and labels can be downloaded from https://doi.org/10.5281/zenodo.4557712.

Read the data

We start by reading the datasets and their corresponding labels and storing them in two lists.

Each expression matrix is transposed after reading, so that in the resulting datasets the rows represent cells and the columns represent genes.
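The code below transposes each expression matrix right after reading it, which suggests the CSV files store genes as rows and cells as columns. A minimal sketch with a small made-up matrix (the gene and cell names are hypothetical) shows what `read_csv(..., index_col=0)` followed by `.transpose()` does to that layout:

```python
import io

import pandas as pd

# Hypothetical toy file in the assumed on-disk layout: genes as rows, cells as columns.
csv_text = """,cell1,cell2,cell3
GeneA,0.5,1.2,0.0
GeneB,2.1,0.0,0.3
"""

# Same call pattern as the vignette: the first column becomes the index, then transpose.
df = pd.read_csv(io.StringIO(csv_text), index_col=0, sep=',').transpose()

print(df.shape)          # (3, 2): 3 cells (rows) x 2 genes (columns)
print(list(df.index))    # ['cell1', 'cell2', 'cell3']
print(list(df.columns))  # ['GeneA', 'GeneB']
```

After the transpose, each row is one cell, which is the orientation the label files are expected to follow (one label per row).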

[2]:
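# AMB2016: aligned expression data and cell-type labels
data0 = 'integrated16.csv'
labels0 = 'Labels16.csv'

# AMB2018: aligned expression data and cell-type labels
data1 = 'integrated18.csv'
labels1 = 'Labels18.csv'

data = []
labels = []

# Transpose the expression data so rows are cells and columns are genes.
# From each label file, keep only the column holding the labels (index 1 or 2).
data.append(pd.read_csv(data0, index_col=0, sep=',').transpose())
labels.append(pd.read_csv(labels0, header=0, index_col=None, sep=',', usecols=[1]))

data.append(pd.read_csv(data1, index_col=0, sep=',').transpose())
labels.append(pd.read_csv(labels1, header=0, index_col=None, sep=',', usecols=[2]))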
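Before training, it can be worth checking that every dataset has exactly one label per cell and that the aligned datasets share the same genes. A minimal sketch: `check_inputs` is a hypothetical helper (not part of scHPL), the shared-gene requirement is an assumption about aligned data, and the toy frames stand in for the `data` and `labels` lists built above.

```python
import pandas as pd


def check_inputs(data, labels):
    """Hypothetical helper: basic consistency checks before calling learn_tree."""
    if len(data) != len(labels):
        raise ValueError('need exactly one label table per dataset')
    for i, (X, y) in enumerate(zip(data, labels)):
        # Rows of X are cells, so there must be exactly one label row per cell.
        if X.shape[0] != y.shape[0]:
            raise ValueError(f'dataset {i}: {X.shape[0]} cells but {y.shape[0]} labels')
    # Assumption: aligned datasets should contain the same set of genes.
    genes = set(data[0].columns)
    for i, X in enumerate(data[1:], start=1):
        if set(X.columns) != genes:
            raise ValueError(f'dataset {i}: genes differ from dataset 0')
    return True


# Toy stand-ins for the real `data` and `labels` lists.
toy_data = [
    pd.DataFrame([[0.1, 0.2], [0.3, 0.4]], columns=['GeneA', 'GeneB']),
    pd.DataFrame([[0.5, 0.6]], columns=['GeneA', 'GeneB']),
]
toy_labels = [pd.DataFrame({'label': ['A', 'B']}), pd.DataFrame({'label': ['C']})]

print(check_inputs(toy_data, toy_labels))  # True
```

In the vignette itself, the call would simply be `check_inputs(data, labels)`.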

Construct and train the classification tree

Next, we use hierarchical progressive learning to construct and train a classification tree. After each iteration, the updated tree is printed. If two labels are a perfect match, only one of them remains visible in the tree. Therefore, each perfect match is also printed, in the form "Perfect match: X is now: Y", where Y is the name under which the population appears in the tree.
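Because one label of each perfect match disappears from the tree, it can help to keep the printed matches as a mapping. A minimal sketch that parses lines in the format printed by `learn_tree` in this vignette (the three lines are copied from the training output); `parse_perfect_matches` is a hypothetical helper, and that other scHPL versions print exactly this format is an assumption.

```python
import re

# Lines copied from this vignette's training output.
log = """Perfect match:  Lamp5 Lhx6 is now: Igtp
Perfect match:  L2/3 IT VISp Adamts2 is now: L2 Ngb
Perfect match:  Sst Chodl is now: Sst Chodl
"""

PATTERN = re.compile(r'^Perfect match:\s+(.+?) is now: (.+)$')


def parse_perfect_matches(text):
    """Map each matched label to the name it has in the tree."""
    mapping = {}
    for line in text.splitlines():
        m = PATTERN.match(line.strip())
        if m:
            mapping[m.group(1)] = m.group(2)
    return mapping


renamed = parse_perfect_matches(log)

# Express some AMB2018 labels under the names used in the tree;
# labels without a perfect match are kept unchanged.
amb2018_labels = ['Lamp5 Lhx6', 'Vip Chat Htr1f', 'Sst Chodl']
in_tree = [renamed.get(lab, lab) for lab in amb2018_labels]
print(in_tree)  # ['Igtp', 'Vip Chat Htr1f', 'Sst Chodl']
```

To obtain the text in practice, the printed output of the training cell could be captured with `contextlib.redirect_stdout` into an `io.StringIO` buffer.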

In this experiment, we used a linear SVM, did not apply dimensionality reduction, and used the default threshold of 0.25. To use a one-class SVM instead of a linear SVM, set classifier = 'svm_occ'.
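To give some intuition for the threshold, the sketch below assumes it acts on row-normalised confusion matrices of cross-predictions (a classifier trained on one dataset predicting the cells of the other) when the labels of the two datasets are compared. This is a simplified illustration with made-up labels and counts, not scHPL's matching code: fractions at or above the threshold become links, both directions are added, and a pair is called a "perfect match" here only if both directions link it and neither label is linked to anything else.

```python
import pandas as pd

THRESHOLD = 0.25  # same value as the vignette's threshold


def binarize(true_labels, predicted_labels, threshold=THRESHOLD):
    """Row-normalised confusion matrix: 1 where a fraction >= threshold, else 0."""
    conf = pd.crosstab(pd.Series(true_labels, name='true'),
                       pd.Series(predicted_labels, name='pred'))
    frac = conf.div(conf.sum(axis=1), axis=0)
    return (frac >= threshold).astype(int)


# Made-up cross-predictions: classifier B applied to dataset A, and vice versa.
a_true = ['a1'] * 4 + ['a2'] * 4
a_pred_by_b = ['b1'] * 4 + ['b2'] * 3 + ['b3']
b_true = ['b1'] * 4 + ['b2'] * 2 + ['b3'] * 2
b_pred_by_a = ['a1'] * 4 + ['a2'] * 4

ab = binarize(a_true, a_pred_by_b)  # rows: A labels, columns: B labels
ba = binarize(b_true, b_pred_by_a)  # rows: B labels, columns: A labels
X = ab.add(ba.T, fill_value=0)      # 2 = both directions link the pair

# Simplified rule: a one-to-one link supported in both directions.
perfect = [(a, b) for a in X.index for b in X.columns
           if X.loc[a, b] == 2
           and (X.loc[a] > 0).sum() == 1
           and (X[b] > 0).sum() == 1]
print(perfect)  # [('a1', 'b1')]
```

Here a2 is linked to both b2 and b3 (a fraction of exactly 0.25 still reaches the threshold), so the simplified rule does not count it as a perfect match. Raising the threshold would drop such weak links.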

When the two datasets are matched, three populations from the AMB2018 dataset create a complex scenario and cannot be added to the tree. Here, we set return_missing = True so that these populations are returned to the user. With return_missing = False, these populations are attached to the root instead.
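Once the missing populations are known, a natural follow-up is to see how many AMB2018 cells they cover. A minimal sketch using the three population names reported in the training output of this vignette; `toy_labels` is a made-up stand-in for the AMB2018 label column (in practice something like `labels[1].iloc[:, 0]`).

```python
import pandas as pd

# Population names reported as missing in this vignette's training output.
missing = pd.Index(['Pvalb Sema3e Kank4', 'Sst Hpse Sema3c', 'Sst Tac1 Tacr3'])

# Made-up stand-in for the AMB2018 label column.
toy_labels = pd.Series(['Sst Hpse Sema3c', 'Vip Chat', 'Sst Tac1 Tacr3', 'L4 Arf5'])

# Boolean mask: True for cells whose label is one of the missing populations.
is_missing = toy_labels.isin(missing)
print(is_missing.sum(), 'of', len(toy_labels), 'cells belong to missing populations')
print(toy_labels[is_missing].value_counts())
```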

[3]:
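start = tm.time()

classifier = 'svm'   # linear SVM; use 'svm_occ' for a one-class SVM
dimred = False       # no dimensionality reduction
threshold = 0.25     # default threshold

# Construct and train the tree; populations that cannot be added are reported
tree = progressive_learning.learn_tree(data, labels,
                                       classifier = classifier,
                                       dimred = dimred,
                                       threshold = threshold,
                                       return_missing = True)

training_time = tm.time()-start

print('Training time: ', training_time)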
Iteration  1

Perfect match:  Lamp5 Lhx6 is now: Igtp
Perfect match:  L2/3 IT VISp Adamts2 is now: L2 Ngb
Perfect match:  L5 PT VISp Chrna6 is now: L5 Chrna6
Perfect match:  Lamp5 Plch2 Dock5 is now: Ndnf Car4
Perfect match:  Pvalb Vipr2 is now: Pvalb Cpne5
Perfect match:  Lamp5 Lsp1 is now: Smad3
Perfect match:  Sst Chodl is now: Sst Chodl
Perfect match:  L5 IT VISp Col27a1 is now: L5 Ucma
Perfect match:  L5 IT VISp Hsd11b1 Endou is now: L5a Hsd11b1
Perfect match:  Sst Chrna2 Ptgdr is now: Sst Cdk6
These populations are missing from the tree:
Index(['Pvalb Sema3e Kank4', 'Sst Hpse Sema3c', 'Sst Tac1 Tacr3'], dtype='object')

Updated tree:
root
        Igtp
        L2 Ngb
        L2/3 Ptgs2
                L2/3 IT VISp Agmat
                L2/3 IT VISp Rrad
        L5 Chrna6
        L5 Ucma
        L5a Hsd11b1
        L5a Tcerg1l
                L5 IT VISp Whrn Tox2
        L5b Cdh13
                L5 PT VISp C1ql2 Cdh13
                L5 PT VISp Krt80
        L5b Tph2
                L5 PT VISp C1ql2 Ptgfr
                L5 PT VISp Lgr5
        L6a Car12
                L6 IT VISp Penk Col27a1
                L6 IT VISp Penk Fst
        L6a Mgp
                L6 CT ALM Nxph2 Sla
                L6 CT VISp Ctxn3 Brinp3
                L6 CT VISp Gpr139
        L6a Sla
                L5 NP VISp Trhr Cpne7
                L5 NP VISp Trhr Met
                L6 CT VISp Ctxn3 Sla
                L6 CT VISp Krt80 Sla
                L6 CT VISp Nxph2 Wls
                L6b VISp Col8a1 Rprm
        L6a Syt17
                L6 IT VISp Col18a1
                L6 IT VISp Col23a1 Adamts2
        L6b Rgs12
                L6b VISp Col8a1 Rxfp1
                L6b VISp Mup5
        L6b Serpinb11
                L6b P2ry12
                L6b VISp Crh
        Ndnf Car4
        Ndnf Cxcl14
                Lamp5 Fam19a1 Pax6
                Lamp5 Fam19a1 Tmem182
                Lamp5 Krt73
                Lamp5 Ntn1 Npy2r
                Vip Igfbp6 Pltp
        Pvalb Cpne5
        Pvalb Gpx3
                Pvalb Gpr149 Islr
                Pvalb Reln Tac1
                Pvalb Th Sst
        Pvalb Obox3
        Pvalb Rspo2
                Pvalb Akr1c18 Ntf3
        Pvalb Wt1
                Pvalb Reln Itm2a
        Smad3
        Sst Cbln4
                Sst Calb2 Necab1
                Sst Calb2 Pdlim5
                Sst Crh 4930553C11Rik
                Sst Crhr2 Efemp1
                Sst Hpse Cbln4
                Sst Mme Fam114a1
                Sst Tac1 Htr1d
        Sst Cdk6
        Sst Chodl
        Sst Myh8
                Sst Chrna2 Glra3
                Sst Myh8 Etv1
                Sst Myh8 Fibin
        Sst Tacstd2
                Sst Rxfp1 Eya1
                Sst Rxfp1 Prdm8
                Sst Tac2 Tacstd2
        Sst Th
                Pvalb Gabrg1
                Sst Esm1
                Sst Nts
        Vip Chat
                Vip Lect1 Oxtr
                Vip Lmo1 Myl1
                Vip Ptprt Pkp2
        Vip Gpc3
                Vip Arhgap36 Hmcn1
                Vip Gpc3 Slc18a3
                Vip Lmo1 Fam159b
        Vip Mybpc1
                Vip Crispld2 Htr2c
                Vip Crispld2 Kcne4
        Vip Parm1
                Vip Chat Htr1f
                Vip Pygm C1ql1
                Vip Rspo1 Itga4
        Vip Sncg
                Sncg Gpr50
                Sncg Vip Itih5
                Sncg Vip Nptx2
                Vip Col15a1 Pde1a
        L4 IT VISp Rspo1
                L4 Arf5
                L4 Ctxn3
                L4 Scnn1a
        L5 IT VISp Batf3
                L5a Batf3
                L5a Pde1c
        Pvalb Tpbg 2018
                Pvalb Tacr3
                Pvalb Tpbg 2016
        Astro Aqp4
        L5 IT VISp Col6a1 Fezf2
        L6 IT VISp Car3
        Pvalb Calb1 Sst
        Serpinf1 Aqp5 Vip
        Sncg Slc17a8
        Sst Nr2f2 Necab1
        Sst Tac2 Myh4
        Vip Igfbp4 Mab21l1
        Vip Igfbp6 Car10
        Vip Rspo4 Rxfp1 Chat
Training time:  360.18609166145325