{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# AMB inter-dataset\n", "\n", "Warning: this vignette was written for scHPL v. 0.0.2 and should be updated." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import os\n", "import pandas as pd\n", "import numpy as np\n", "import time as tm\n", "from scHPL import progressive_learning" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this vignette, we repeat the AMB inter-dataset experiment. We use the AMB2016 and AMB2018 datasets to construct a tree of neuronal cell populations. The aligned datasets and labels can be downloaded from https://doi.org/10.5281/zenodo.4557712" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Read the data\n", "\n", "We start by reading the datasets and their corresponding labels, and appending them to two lists.\n", "\n", "After transposing, the rows of each dataset represent cells and the columns represent genes." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Aligned AMB2016 and AMB2018 expression matrices and their cell-type labels\n", "data0 = 'integrated16.csv'\n", "labels0 = 'Labels16.csv'\n", "\n", "data1 = 'integrated18.csv'\n", "labels1 = 'Labels18.csv'\n", "\n", "data = []\n", "labels = []\n", "\n", "# AMB2016: transpose so that rows are cells and columns are genes\n", "data.append(pd.read_csv(data0, index_col=0, sep=',').transpose())\n", "# Keep only the column that holds the cell-type labels\n", "labels.append(pd.read_csv(labels0, header=0, index_col=None, sep=',', usecols=[1]))\n", "\n", "# AMB2018: same preprocessing; here the labels are stored in a different column\n", "data.append(pd.read_csv(data1, index_col=0, sep=',').transpose())\n", "labels.append(pd.read_csv(labels1, header=0, index_col=None, sep=',', usecols=[2]))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Construct and train the classification tree\n", "\n", "Next, we use hierarchical progressive learning to construct and train a classification tree. After each iteration, the updated tree is printed. If two labels match perfectly, only one of them remains visible in the tree. 
Therefore, we also report these perfect matches with a print statement.\n", "\n", "In this experiment, we used a linear SVM, did not apply dimensionality reduction, and used the default threshold of 0.25. If you want to use a one-class SVM instead of a linear SVM, set classifier = 'svm_occ'.\n", "\n", "When matching the two datasets, three populations from the AMB2018 dataset cause a complex scenario and cannot be added to the tree. Here, we set return_missing = True so that these populations are returned to the user. With return_missing = False, they are attached to the root instead." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iteration 1 \n", "\n", "Perfect match: Lamp5 Lhx6 is now: Igtp\n", "Perfect match: L2/3 IT VISp Adamts2 is now: L2 Ngb\n", "Perfect match: L5 PT VISp Chrna6 is now: L5 Chrna6\n", "Perfect match: Lamp5 Plch2 Dock5 is now: Ndnf Car4\n", "Perfect match: Pvalb Vipr2 is now: Pvalb Cpne5\n", "Perfect match: Lamp5 Lsp1 is now: Smad3\n", "Perfect match: Sst Chodl is now: Sst Chodl\n", "Perfect match: L5 IT VISp Col27a1 is now: L5 Ucma\n", "Perfect match: L5 IT VISp Hsd11b1 Endou is now: L5a Hsd11b1\n", "Perfect match: Sst Chrna2 Ptgdr is now: Sst Cdk6\n", "These populations are missing from the tree: \n", "Index(['Pvalb Sema3e Kank4', 'Sst Hpse Sema3c', 'Sst Tac1 Tacr3'], dtype='object')\n", "\n", "Updated tree:\n", "root\n", "\tIgtp\n", "\tL2 Ngb\n", "\tL2/3 Ptgs2\n", "\t\tL2/3 IT VISp Agmat\n", "\t\tL2/3 IT VISp Rrad\n", "\tL5 Chrna6\n", "\tL5 Ucma\n", "\tL5a Hsd11b1\n", "\tL5a Tcerg1l\n", "\t\tL5 IT VISp Whrn Tox2\n", "\tL5b Cdh13\n", "\t\tL5 PT VISp C1ql2 Cdh13\n", "\t\tL5 PT VISp Krt80\n", "\tL5b Tph2\n", "\t\tL5 PT VISp C1ql2 Ptgfr\n", "\t\tL5 PT VISp Lgr5\n", "\tL6a Car12\n", "\t\tL6 IT VISp Penk Col27a1\n", "\t\tL6 IT VISp Penk Fst\n", "\tL6a Mgp\n", "\t\tL6 CT ALM Nxph2 
Sla\n", "\t\tL6 CT VISp Ctxn3 Brinp3\n", "\t\tL6 CT VISp Gpr139\n", "\tL6a Sla\n", "\t\tL5 NP VISp Trhr Cpne7\n", "\t\tL5 NP VISp Trhr Met\n", "\t\tL6 CT VISp Ctxn3 Sla\n", "\t\tL6 CT VISp Krt80 Sla\n", "\t\tL6 CT VISp Nxph2 Wls\n", "\t\tL6b VISp Col8a1 Rprm\n", "\tL6a Syt17\n", "\t\tL6 IT VISp Col18a1\n", "\t\tL6 IT VISp Col23a1 Adamts2\n", "\tL6b Rgs12\n", "\t\tL6b VISp Col8a1 Rxfp1\n", "\t\tL6b VISp Mup5\n", "\tL6b Serpinb11\n", "\t\tL6b P2ry12\n", "\t\tL6b VISp Crh\n", "\tNdnf Car4\n", "\tNdnf Cxcl14\n", "\t\tLamp5 Fam19a1 Pax6\n", "\t\tLamp5 Fam19a1 Tmem182\n", "\t\tLamp5 Krt73\n", "\t\tLamp5 Ntn1 Npy2r\n", "\t\tVip Igfbp6 Pltp\n", "\tPvalb Cpne5\n", "\tPvalb Gpx3\n", "\t\tPvalb Gpr149 Islr\n", "\t\tPvalb Reln Tac1\n", "\t\tPvalb Th Sst\n", "\tPvalb Obox3\n", "\tPvalb Rspo2\n", "\t\tPvalb Akr1c18 Ntf3\n", "\tPvalb Wt1\n", "\t\tPvalb Reln Itm2a\n", "\tSmad3\n", "\tSst Cbln4\n", "\t\tSst Calb2 Necab1\n", "\t\tSst Calb2 Pdlim5\n", "\t\tSst Crh 4930553C11Rik\n", "\t\tSst Crhr2 Efemp1\n", "\t\tSst Hpse Cbln4\n", "\t\tSst Mme Fam114a1\n", "\t\tSst Tac1 Htr1d\n", "\tSst Cdk6\n", "\tSst Chodl\n", "\tSst Myh8\n", "\t\tSst Chrna2 Glra3\n", "\t\tSst Myh8 Etv1\n", "\t\tSst Myh8 Fibin\n", "\tSst Tacstd2\n", "\t\tSst Rxfp1 Eya1\n", "\t\tSst Rxfp1 Prdm8\n", "\t\tSst Tac2 Tacstd2\n", "\tSst Th\n", "\t\tPvalb Gabrg1\n", "\t\tSst Esm1\n", "\t\tSst Nts\n", "\tVip Chat\n", "\t\tVip Lect1 Oxtr\n", "\t\tVip Lmo1 Myl1\n", "\t\tVip Ptprt Pkp2\n", "\tVip Gpc3\n", "\t\tVip Arhgap36 Hmcn1\n", "\t\tVip Gpc3 Slc18a3\n", "\t\tVip Lmo1 Fam159b\n", "\tVip Mybpc1\n", "\t\tVip Crispld2 Htr2c\n", "\t\tVip Crispld2 Kcne4\n", "\tVip Parm1\n", "\t\tVip Chat Htr1f\n", "\t\tVip Pygm C1ql1\n", "\t\tVip Rspo1 Itga4\n", "\tVip Sncg\n", "\t\tSncg Gpr50\n", "\t\tSncg Vip Itih5\n", "\t\tSncg Vip Nptx2\n", "\t\tVip Col15a1 Pde1a\n", "\tL4 IT VISp Rspo1\n", "\t\tL4 Arf5\n", "\t\tL4 Ctxn3\n", "\t\tL4 Scnn1a\n", "\tL5 IT VISp Batf3\n", "\t\tL5a Batf3\n", "\t\tL5a Pde1c\n", "\tPvalb Tpbg 2018\n", "\t\tPvalb 
Tacr3\n", "\t\tPvalb Tpbg 2016\n", "\tAstro Aqp4\n", "\tL5 IT VISp Col6a1 Fezf2\n", "\tL6 IT VISp Car3\n", "\tPvalb Calb1 Sst\n", "\tSerpinf1 Aqp5 Vip\n", "\tSncg Slc17a8\n", "\tSst Nr2f2 Necab1\n", "\tSst Tac2 Myh4\n", "\tVip Igfbp4 Mab21l1\n", "\tVip Igfbp6 Car10\n", "\tVip Rspo4 Rxfp1 Chat\n", "Training time: 360.18609166145325\n" ] } ], "source": [ "start = tm.time()\n", "\n", "# Settings used in this experiment: linear SVM, no dimensionality reduction, default threshold\n", "classifier = 'svm'\n", "dimred = False\n", "threshold = 0.25\n", "\n", "# return_missing=True returns the populations that could not be added to the tree\n", "tree = progressive_learning.learn_tree(\n", "    data, labels,\n", "    classifier=classifier,\n", "    dimred=dimred,\n", "    threshold=threshold,\n", "    return_missing=True)\n", "\n", "training_time = tm.time() - start\n", "\n", "print('Training time: ', training_time)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 4 }