scprint
  • Home
  • structure
  • pre-training
  • usage example

example notebooks

  • scPRINT use case on BPH
  • scPRINT use case on BPH (part 2, GN analysis)

documentation

  • model
  • tasks
  • cli
  • embedders
  • utils
scprint
  • example notebooks
  • scPRINT use case on BPH (part 2, GN analysis)

scPRINT use case on BPH (part 2, GN analysis)¶

In this use-case, which some of the results are presented in Figure 5 of our manuscript, we perform an extensive analysis of gene networks generated by scPRINT in our previous notebook for fibroblasts of the prostate in both normal and benign prostatic hyperplasia states.

We can look at many patterns from the gene networks such as hub genes, gene modules, and gene-gene interactions. Some of them are related to highly variable and differentially expressed genes while others can only be infered by using the gene networks.

Table of content:

  1. Hub genes
  2. Network similarity
  3. Network communities
    1. Cancer version
    2. Normal version
  4. Differential Gene - Gene connections
  5. Using classification
In [1]:
Copied!
from bengrn import BenGRN
import scanpy as sc
from bengrn.base import train_classifier

from anndata.utils import make_index_unique
from grnndata import utils as grnutils
from grnndata import read_h5ad

from matplotlib import pyplot as plt
from scdataloader import utils as data_utils
import numpy as np

from pyvis import network as pnx
import networkx as nx
import scipy.sparse
import pandas as pd
import gseapy as gp
from gseapy import dotplot

%load_ext autoreload
%autoreload 2
from bengrn import BenGRN import scanpy as sc from bengrn.base import train_classifier from anndata.utils import make_index_unique from grnndata import utils as grnutils from grnndata import read_h5ad from matplotlib import pyplot as plt from scdataloader import utils as data_utils import numpy as np from pyvis import network as pnx import networkx as nx import scipy.sparse import pandas as pd import gseapy as gp from gseapy import dotplot %load_ext autoreload %autoreload 2
💡 connected lamindb: jkobject/scprint
In [ ]:
Copied!
prostate_combined = sc.read_h5ad("../data/prostate_combined_o2uniqsx.h5ad")
prostate_combined = sc.read_h5ad("../data/prostate_combined_o2uniqsx.h5ad")
In [2]:
Copied!
# load the generated gene networks from the previous notebook
grn = read_h5ad("../../data/prostate_fibro_grn.h5ad")
grn_c = read_h5ad("../../data/prostate_cancer_fibro_grn.h5ad")
#remove duplicates
grn.var.symbol = make_index_unique(grn.var.symbol.astype(str))
grn_c.var.symbol = make_index_unique(grn_c.var.symbol.astype(str))

# convert gene ids to symbols
grn.var['ensembl_id'] = grn.var.index
grn.var.index = grn.var.symbol
grn_c.var['ensembl_id'] = grn_c.var.index
grn_c.var.index = grn_c.var.symbol

grn.obs.cleaned_pred_disease_ontology_term_id.value_counts(),grn_c.obs.cleaned_pred_disease_ontology_term_id.value_counts()
# load the generated gene networks from the previous notebook grn = read_h5ad("../../data/prostate_fibro_grn.h5ad") grn_c = read_h5ad("../../data/prostate_cancer_fibro_grn.h5ad") #remove duplicates grn.var.symbol = make_index_unique(grn.var.symbol.astype(str)) grn_c.var.symbol = make_index_unique(grn_c.var.symbol.astype(str)) # convert gene ids to symbols grn.var['ensembl_id'] = grn.var.index grn.var.index = grn.var.symbol grn_c.var['ensembl_id'] = grn_c.var.index grn_c.var.index = grn_c.var.symbol grn.obs.cleaned_pred_disease_ontology_term_id.value_counts(),grn_c.obs.cleaned_pred_disease_ontology_term_id.value_counts()
Out[2]:
(cleaned_pred_disease_ontology_term_id
 normal    300
 Name: count, dtype: int64,
 cleaned_pred_disease_ontology_term_id
 benign prostatic hyperplasia    300
 Name: count, dtype: int64)

1. Hub genes¶

We will use 2 different definition of centrality to compare the hubs of the networks in both states. The first one is the degree centrality, which is the number of connections of a node. The second one is the eigenvector centrality, which is the sum of the centrality of the neighbors of a node.

For each gene we mostly interogate their meaning with genecards, e.g. https://www.genecards.org/cgi-bin/carddisp.pl?gene=CD99

In [7]:
Copied!
# between the two networks we have ~3000 genes in common over their 4000 genes
common = set(grn_c.var.symbol) & set(grn.var.symbol)
len(common)
# between the two networks we have ~3000 genes in common over their 4000 genes common = set(grn_c.var.symbol) & set(grn.var.symbol) len(common)
Out[7]:
2881
In [8]:
Copied!
# hub genes based on edge centrality (number of connections / strength of connections)
grn.grn.sum(0).sort_values(ascending=False).head(20)
# hub genes based on edge centrality (number of connections / strength of connections) grn.grn.sum(0).sort_values(ascending=False).head(20)
Out[8]:
symbol
S100A6            58.131020
TGIF2-RAB5IF      51.177635
MIF               44.534096
DNAJB9            40.953262
IGFBP7            38.629910
APOD              38.003906
BRME1             37.598698
SPARCL1           37.160854
TIMP1             36.671650
DCN               34.621376
C1S               32.746254
MGP               30.496748
nan-270           29.884596
SLC25A6           29.308729
BLOC1S5-TXNDC5    28.986681
CETN1             28.746281
FTL               28.266533
AKR1C1            27.641245
FBLN1             26.828733
MT2A              25.590197
dtype: float32
In [7]:
Copied!
# without taking in account genes that are not present in the cancer network
grn.grn.sum(0).loc[list(common)].sort_values(ascending=False).head(20)
# without taking in account genes that are not present in the cancer network grn.grn.sum(0).loc[list(common)].sort_values(ascending=False).head(20)
Out[7]:
symbol
TGIF2-RAB5IF      51.177635
IGFBP7            38.629910
APOD              38.003906
BRME1             37.598698
SPARCL1           37.160854
TIMP1             36.671650
DCN               34.621376
C1S               32.746254
MGP               30.496748
BLOC1S5-TXNDC5    28.986681
AKR1C1            27.641245
FBLN1             26.828733
MT2A              25.590197
MYOC              25.428144
FAM205A           25.296257
YBX3              24.937267
CST3              24.676405
C11orf96          18.731924
TFPI              18.429214
ATP6V0C           17.695705
dtype: float32
In [11]:
Copied!
# cancer hub genes
grn_c.grn.sum(0).sort_values(ascending=False).head(20)
# cancer hub genes grn_c.grn.sum(0).sort_values(ascending=False).head(20)
Out[11]:
symbol
HSPA1A          75.300415
MT2A            57.938267
CREM            42.478115
TGIF2-RAB5IF    39.980556
HSPE1           38.607624
CALD1           36.512047
SPOCK3          35.564945
HLA-A           33.446110
SPARCL1         33.403152
RBP1            32.482758
C1S             32.037743
BRME1-1         31.879770
FABP4           31.604069
nan-99          31.555256
LUM             30.523245
EIF4A1          29.690474
ATP6V0C         29.475002
PDZD2           27.333025
DEFA1           26.979454
CTSG            25.070097
dtype: float32
In [10]:
Copied!
# without taking in account genes not present in the normal network
grn_c.grn.sum(0).loc[list(common)].sort_values(ascending=False).head(20)
# without taking in account genes not present in the normal network grn_c.grn.sum(0).loc[list(common)].sort_values(ascending=False).head(20)
Out[10]:
symbol
HSPA1A            75.300415
MT2A              57.938267
TGIF2-RAB5IF      39.980556
SPOCK3            35.564945
HLA-A             33.446110
SPARCL1           33.403152
C1S               32.037743
nan-99            31.555256
LUM               30.523245
EIF4A1            29.690474
ATP6V0C           29.475002
DEFA1             26.979454
IGFBP7            23.982243
DCN               23.304916
MGP               22.596451
CD99              21.624668
BLOC1S5-TXNDC5    20.914230
SERPING1          20.824636
COL6A2            18.076559
THBS1             17.826540
dtype: float32
In [12]:
Copied!
# top differential hubs
TOP = 10
(set(grn_c.grn.sum(0).sort_values(ascending=False).head(TOP).index) - set(grn.grn.sum(0).sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))
# top differential hubs TOP = 10 (set(grn_c.grn.sum(0).sort_values(ascending=False).head(TOP).index) - set(grn.grn.sum(0).sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))
Out[12]:
{'HLA-A', 'HSPA1A', 'MT2A', 'SPOCK3'}
In [13]:
Copied!
TOP = 20
(set(grn_c.grn.sum(0).sort_values(ascending=False).head(TOP).index) - set(grn.grn.sum(0).sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))
TOP = 20 (set(grn_c.grn.sum(0).sort_values(ascending=False).head(TOP).index) - set(grn.grn.sum(0).sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))
Out[13]:
{'ATP6V0C', 'DEFA1', 'EIF4A1', 'HLA-A', 'HSPA1A', 'LUM', 'SPOCK3', 'nan-99'}
In [14]:
Copied!
TOP = 50
(set(grn_c.grn.sum(0).sort_values(ascending=False).head(TOP).index) - set(grn.grn.sum(0).sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))
TOP = 50 (set(grn_c.grn.sum(0).sort_values(ascending=False).head(TOP).index) - set(grn.grn.sum(0).sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))
Out[14]:
{'CD99',
 'CPE',
 'DEFA1',
 'EIF4A1',
 'HLA-A',
 'HSPA1A',
 'LGALS1',
 'LUM',
 'PYDC2',
 'SERPING1',
 'SPOCK3',
 'THBS1',
 'nan-191',
 'nan-99'}

overall we see that both are somewhat similar in the way the networks classifies main nodes but at least around half of the top 200 nodes are different between the two datasets and half of those are not due to diffent gene being used by the networks (50)

genes preferentially given as top elements in the tumor version only:

  • CD99
  • CPE
  • DEFA1
  • EIF4A1
  • HLA-A
  • HSPA1A
  • LGALS1
  • LUM
  • PYDC2
  • SERPING1
  • SPOCK3
  • THBS1
In [3]:
Copied!
# we now compute eigen centrality creating a sparse network by only keeping the top 20 neighbors for each gene in the network
TOP = 20

grnutils.get_centrality(grn, TOP, top_k_to_disp=0)
grnutils.get_centrality(grn_c, TOP, top_k_to_disp=0)
# we now compute eigen centrality creating a sparse network by only keeping the top 20 neighbors for each gene in the network TOP = 20 grnutils.get_centrality(grn, TOP, top_k_to_disp=0) grnutils.get_centrality(grn_c, TOP, top_k_to_disp=0)
Top central genes: []
Top central genes: []
Out[3]:
[]
In [12]:
Copied!
grn_c.var.centrality.sort_values(ascending=False).head(10)
grn_c.var.centrality.sort_values(ascending=False).head(10)
Out[12]:
symbol
HSPA1A     0.271584
MT2A       0.271584
CREM       0.271574
CD99       0.271517
NR4A3      0.271385
RBP1       0.271130
SOX4       0.264214
ATP6V0C    0.263235
HLA-A      0.232913
LUM        0.202007
Name: centrality, dtype: float64
In [9]:
Copied!
grn_c.var.centrality.sort_values(ascending=False).head(10)
grn_c.var.centrality.sort_values(ascending=False).head(10)
Out[9]:
symbol
HSPA1A     0.271584
MT2A       0.271584
CREM       0.271574
CD99       0.271517
NR4A3      0.271385
RBP1       0.271130
SOX4       0.264214
ATP6V0C    0.263235
HLA-A      0.232913
LUM        0.202007
Name: centrality, dtype: float64
In [10]:
Copied!
grn.var.centrality.sort_values(ascending=False).head(10)
grn.var.centrality.sort_values(ascending=False).head(10)
Out[10]:
symbol
MT2A       0.276470
SLC25A6    0.276183
CXCL8      0.275859
FTH1       0.275757
MIF        0.267968
DNAJB9     0.265635
SOX4       0.241950
RELB       0.234482
YBX3       0.216567
CST3       0.213585
Name: centrality, dtype: float64
In [13]:
Copied!
TOP = 10
(set(grn_c.var.centrality.sort_values(ascending=False).head(TOP).index) - set(grn.var.centrality.sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))
TOP = 10 (set(grn_c.var.centrality.sort_values(ascending=False).head(TOP).index) - set(grn.var.centrality.sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))
Out[13]:
{'ATP6V0C', 'CD99', 'HLA-A', 'HSPA1A', 'LUM'}
In [12]:
Copied!
TOP = 20
(set(grn_c.var.centrality.sort_values(ascending=False).head(TOP).index) - set(grn.var.centrality.sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))
TOP = 20 (set(grn_c.var.centrality.sort_values(ascending=False).head(TOP).index) - set(grn.var.centrality.sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))
Out[12]:
{'ATP6V0C',
 'CD99',
 'EIF4A1',
 'HLA-A',
 'HSPA1A',
 'LUM',
 'PAGE4',
 'RYR2',
 'SERPINF1'}
In [11]:
Copied!
TOP = 50
(set(grn_c.var.centrality.sort_values(ascending=False).head(TOP).index) - set(grn.var.centrality.sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))
TOP = 50 (set(grn_c.var.centrality.sort_values(ascending=False).head(TOP).index) - set(grn.var.centrality.sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))
Out[11]:
{'C1R',
 'CD99',
 'COL6A2',
 'HLA-A',
 'HNRNPA0',
 'HSPA1A',
 'LUM',
 'PAGE4',
 'SERPINA3',
 'SERPING1',
 'SPOCK3',
 'THBS1'}

it seems the networks are even more similar when comparing them at the level of the centrality of their nodes. this means that some are highly central while maybe not

2. Network similarity¶

We now look at the similarity of the two networks based on general overlap of their top K edges across their common nodes

In [41]:
Copied!
grn.var.index = grn.var.ensembl_id
grn_c.var.index = grn_c.var.ensembl_id
overlap = list(set(grn.var.index) & set(grn_c.var.index))
grn.var.index = grn.var.ensembl_id grn_c.var.index = grn_c.var.ensembl_id overlap = list(set(grn.var.index) & set(grn_c.var.index))
In [43]:
Copied!
K = 20
subgrn_c = grn_c.get(overlap).grn
subgrn_c = subgrn_c.apply(lambda row: row >= row.nlargest(K).min(), axis=1)

subgrn = grn.get(overlap).grn
subgrn = subgrn.apply(lambda row: row >= row.nlargest(K).min(), axis=1)
(subgrn & subgrn_c).sum(1).mean() / K
K = 20 subgrn_c = grn_c.get(overlap).grn subgrn_c = subgrn_c.apply(lambda row: row >= row.nlargest(K).min(), axis=1) subgrn = grn.get(overlap).grn subgrn = subgrn.apply(lambda row: row >= row.nlargest(K).min(), axis=1) (subgrn & subgrn_c).sum(1).mean() / K
Out[43]:
0.5055735930735931

when looking at the top 20 connection for each genes in the network we see that on average only 50% of them agree between the two networks

3. Network communities¶

Again looking at the top 20 connections for each gene in the network we use the leiden (or louvain) algorithm to find communities in both networks.

We then study the hub of these communities as well as their enrichment using enrichr and ontology databases. We will only look at communities between 20 and 200 genes

In [33]:
Copied!
TOP = 20

grnutils.get_centrality(grn, TOP, top_k_to_disp=0)
grnutils.get_centrality(grn_c, TOP, top_k_to_disp=0)
TOP = 20 grnutils.get_centrality(grn, TOP, top_k_to_disp=0) grnutils.get_centrality(grn_c, TOP, top_k_to_disp=0)
Top central genes: []
Top central genes: []
Out[33]:
[]
In [17]:
Copied!
grn_c = grnutils.compute_cluster(grn_c, 1.5, use="leiden", n_iterations=10, max_comm_size=100)
grn_c = grnutils.compute_cluster(grn_c, 1.5, use="leiden", n_iterations=10, max_comm_size=100)
In [4]:
Copied!
grn = grnutils.compute_cluster(grn, 1.5)
grn_c = grnutils.compute_cluster(grn_c, 1.5)
grn = grnutils.compute_cluster(grn, 1.5) grn_c = grnutils.compute_cluster(grn_c, 1.5)

3.1 Cancer version¶

In [30]:
Copied!
# our communities
grn.var['cluster_1.5'].value_counts().head(10)
# our communities grn.var['cluster_1.5'].value_counts().head(10)
Out[30]:
cluster_1.5
0     1620
1     1483
2      732
3      111
4       23
5        9
6        4
16       1
23       1
22       1
Name: count, dtype: int64
In [31]:
Copied!
G = grn_c.plot_subgraph(grn_c.var[grn_c.var['cluster_1.5']=='4'].index.tolist(), only=200, interactive=False)#
G = grn_c.plot_subgraph(grn_c.var[grn_c.var['cluster_1.5']=='4'].index.tolist(), only=200, interactive=False)#
['#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4'] Index(['PRDM6', 'MSANTD3', 'STAG3', 'TRHDE', 'TBX5', 'HEPH', 'GALNT16',
       'NFATC4', 'SYNDIG1', 'HSF4', 'ZNF423', 'PLA2G4C', 'PLPPR2', 'SCN1B',
       'ASPA', 'SNX15', 'BAG2', 'PTK7', 'HRH2', 'PLCL1', 'PTCH2', 'TGFB3',
       'RASL11A', 'TOX2', 'CPNE5', 'GLIS2', 'HIVEP3', 'FGF13', 'ACAP3',
       'DUSP26', 'CTIF', 'CAMK1', 'SLC9A5', 'FBN2', 'LRIG1', 'LIMD1', 'TRPC1',
       'SLC2A13', 'ADAMTS12', 'FZD7', 'AFF2', 'CACNA1D', 'MRAS', 'ST6GALNAC6',
       'FGFR4', 'PTGER1', 'PKDCC', 'SLC9B2', 'NPY1R', 'ANKRD33B', 'DCLK2',
       'PURG', 'GXYLT2', 'LRRN4CL', 'CA8', 'EXOC3L1', 'BHLHE22', 'AATK',
       'CMC4', 'BOLA2', 'CCR10', 'TCEAL2', 'SLC24A3', 'ROR1', 'SLC51B',
       'EMID1', 'AGMO', 'CENPP', 'OPTC', 'SPOCK3', 'COL27A1', 'DACT3', 'STMN3',
       'DLL1', 'SMOC1', 'COLGALT2', 'ZDBF2', 'DOK6', 'HNRNPUL2-BSCL2',
       'C3orf84', 'C2orf74', 'ARHGEF25', 'NPIPB5', 'TNFSF12-TNFSF13',
       'ZFP91-CNTF', 'nan-41', 'nan-81', 'nan-96', 'nan-98', 'HERC3',
       'nan-116', 'nan-117'],
      dtype='object', name='symbol_2')
No description has been provided for this image
In [26]:
Copied!
# if you want to draw better looking networks
pnet = pnx.Network(notebook=True, cdn_resources="remote", directed=True)
pnet.from_nx(G)
#first_node = list(G.nodes)[-1]
#pnet.get_node(first_node)['color'] = "red"
pnet.save_graph("../figures/pyvis/grn_c_4.html")
# if you want to draw better looking networks pnet = pnx.Network(notebook=True, cdn_resources="remote", directed=True) pnet.from_nx(G) #first_node = list(G.nodes)[-1] #pnet.get_node(first_node)['color'] = "red" pnet.save_graph("../figures/pyvis/grn_c_4.html")
In [32]:
Copied!
# Assuming 'node_names' contains the list of gene names.
# Note: it seems with enrichr, one can only use one gene set at a time, choose accordingly
enr = gp.enrichr(gene_list=list(G.nodes),
                 gene_sets=['KEGG_2021_Human', 'MSigDB_Hallmark_2020', 'Reactome_2022', 'Tabula_Sapiens', 'WikiPathway_2023_Human', 'TF_Perturbations_Followed_by_Expression', 'Reactome', 'PPI_Hub_Proteins', 'OMIM_Disease', 'GO_Molecular_Function_2023'],
                 organism='Human', # change accordingly
                 #description='pathway',
                 #cutoff=0.08, # test dataset, use lower value for real case  
                 background=grn_c.var.symbol.tolist(),
)
enr.res2d.head(20)
# Assuming 'node_names' contains the list of gene names. # Note: it seems with enrichr, one can only use one gene set at a time, choose accordingly enr = gp.enrichr(gene_list=list(G.nodes), gene_sets=['KEGG_2021_Human', 'MSigDB_Hallmark_2020', 'Reactome_2022', 'Tabula_Sapiens', 'WikiPathway_2023_Human', 'TF_Perturbations_Followed_by_Expression', 'Reactome', 'PPI_Hub_Proteins', 'OMIM_Disease', 'GO_Molecular_Function_2023'], organism='Human', # change accordingly #description='pathway', #cutoff=0.08, # test dataset, use lower value for real case background=grn_c.var.symbol.tolist(), ) enr.res2d.head(20)
2024-07-25 18:31:53,534 [WARNING] Input library not found: Reactome. Skip
Out[32]:
Gene_set Term P-value Adjusted P-value Old P-value Old adjusted P-value Odds Ratio Combined Score Genes
0 GO_Molecular_Function_2023 Solute:Potassium Antiporter Activity (GO:0022821) 0.000523 0.050652 0 0 inf inf SLC24A3;SLC9A5
1 GO_Molecular_Function_2023 Coreceptor Activity Involved In Wnt Signaling ... 0.001547 0.050652 0 0 86.822222 561.890021 PTK7;ROR1
2 GO_Molecular_Function_2023 Sodium Ion Transmembrane Transporter Activity ... 0.001547 0.050652 0 0 86.822222 561.890021 SLC9A5;SLC9B2
3 GO_Molecular_Function_2023 Calcium Ion Transmembrane Transporter Activity... 0.001700 0.050652 0 0 16.432584 104.795452 SLC24A3;TRPC1;CACNA1D
4 GO_Molecular_Function_2023 Calcium-Dependent Phospholipid Binding (GO:000... 0.003047 0.050652 0 0 43.400000 251.445478 CPNE5;PLA2G4C
5 GO_Molecular_Function_2023 Coreceptor Activity Involved In Wnt Signaling ... 0.003047 0.050652 0 0 43.400000 251.445478 PTK7;ROR1
6 GO_Molecular_Function_2023 Metal Cation:Proton Antiporter Activity (GO:00... 0.003047 0.050652 0 0 43.400000 251.445478 SLC9A5;SLC9B2
7 GO_Molecular_Function_2023 Sodium:Proton Antiporter Activity (GO:0015385) 0.003047 0.050652 0 0 43.400000 251.445478 SLC9A5;SLC9B2
8 GO_Molecular_Function_2023 Iron Ion Binding (GO:0005506) 0.005002 0.070586 0 0 28.925926 153.247105 HEPH;AGMO
9 GO_Molecular_Function_2023 Calcium Channel Activity (GO:0005262) 0.005307 0.070586 0 0 10.099395 52.907606 SLC24A3;TRPC1;CACNA1D
10 GO_Molecular_Function_2023 Wnt Receptor Activity (GO:0042813) 0.007391 0.089364 0 0 21.688889 106.438022 FZD7;ROR1
11 GO_Molecular_Function_2023 RNA Polymerase II-specific DNA-binding Transcr... 0.010110 0.104284 0 0 7.715135 35.445265 TBX5;DUSP26;DLL1
12 GO_Molecular_Function_2023 Phospholipase Activity (GO:0004620) 0.010193 0.104284 0 0 17.346667 79.552387 PLCL1;PLA2G4C
13 GO_Molecular_Function_2023 Monoatomic Cation Channel Activity (GO:0005261) 0.011601 0.110210 0 0 7.284644 32.465172 SLC24A3;TRPC1;CACNA1D
14 GO_Molecular_Function_2023 Sodium Channel Regulator Activity (GO:0017080) 0.013389 0.118714 0 0 14.451852 62.335739 FGF13;SCN1B
15 GO_Molecular_Function_2023 RNA Polymerase II Cis-Regulatory Region Sequen... 0.022992 0.139045 0 0 2.668376 10.066740 HSF4;HIVEP3;BHLHE22;TBX5;ZNF423;NFATC4;GLIS2
16 GO_Molecular_Function_2023 Calcium:Sodium Antiporter Activity (GO:0005432) 0.023000 0.139045 0 0 inf inf SLC24A3
17 GO_Molecular_Function_2023 Potassium:Proton Antiporter Activity (GO:0015386) 0.023000 0.139045 0 0 inf inf SLC9A5
18 GO_Molecular_Function_2023 Xylosyltransferase Activity (GO:0042285) 0.023000 0.139045 0 0 inf inf GXYLT2
19 GO_Molecular_Function_2023 Tat Protein Binding (GO:0030957) 0.023000 0.139045 0 0 inf inf DLL1
In [33]:
Copied!
# we make a plot of the enrichment
enr.res2d.Term = enr.res2d.Term.str.split("(").str[0].str.split(',').str[0]
ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(),
             column="Adjusted P-value",
             #title='normal fibro PAGE4',
             cmap=plt.cm.viridis,
             size=4, # adjust dot size
             top_term=20,
             figsize=(4,10), cutoff=0.25)
# we make a plot of the enrichment enr.res2d.Term = enr.res2d.Term.str.split("(").str[0].str.split(',').str[0] ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(), column="Adjusted P-value", #title='normal fibro PAGE4', cmap=plt.cm.viridis, size=4, # adjust dot size top_term=20, figsize=(4,10), cutoff=0.25)
/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: The 'method' keyword in Series.replace is deprecated and will be removed in a future version.
  df[self.colname].replace(
/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df[self.colname].replace(
No description has been provided for this image
In [34]:
Copied!
G = grn_c.plot_subgraph(grn_c.var[grn_c.var['cluster_1.5']=='5'].index.tolist(), only=80, interactive=False)#
G = grn_c.plot_subgraph(grn_c.var[grn_c.var['cluster_1.5']=='5'].index.tolist(), only=80, interactive=False)#
['#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4'] Index(['ABCF2', 'ITIH4', 'OLFM2', 'ICAM4', 'RIGI', 'KAZALD1', 'BST1', 'PANX1',
       'C9', 'FBXO2', 'ZNF410', 'ACVR1C', 'VAMP7', 'TOP2A', 'TAF4B',
       'KIAA1755', 'EDA', 'YPEL4', 'ENTPD3', 'GP9', 'NLGN2', 'CLCF1', 'RPRM',
       'ODF3B', 'MSC', 'NTF3', 'COL13A1', 'GP1BB', 'HSPA1A', 'TMEM158',
       'nan-82', 'NBPF19', 'SCO2', 'SMIM41'],
      dtype='object', name='symbol_2')
No description has been provided for this image
In [87]:
Copied!
pnet = pnx.Network(notebook=True, cdn_resources="remote", directed=True)
pnet.from_nx(G)
#first_node = list(G.nodes)[-1]
#pnet.get_node(first_node)['color'] = "red"
pnet.save_graph("../figures/pyvis/grn_c_5.html")
pnet = pnx.Network(notebook=True, cdn_resources="remote", directed=True) pnet.from_nx(G) #first_node = list(G.nodes)[-1] #pnet.get_node(first_node)['color'] = "red" pnet.save_graph("../figures/pyvis/grn_c_5.html")
In [44]:
Copied!
grn_c.var.loc['nan-82'] # par of the Glycoside Hydrolase Family
grn_c.var.loc['nan-82'] # par of the Glycoside Hydrolase Family
Out[44]:
uid                    4giyQrh7tkXI
symbol                       nan-82
ncbi_gene_ids                      
biotype              protein_coding
description           novel protein
synonyms                           
organism_id                       2
public_source_id                9.0
created_by_id                     1
mt                            False
ribo                          False
hb                            False
organism             NCBITaxon:9606
TFs                           False
ensembl_id          ENSG00000269590
centrality                      0.0
cluster_1.5                       5
Name: nan-82, dtype: object
In [35]:
Copied!
enr = gp.enrichr(gene_list=list(G.nodes),
                 gene_sets=['KEGG_2021_Human', 'MSigDB_Hallmark_2020', 'Reactome_2022', 'Tabula_Sapiens', 'WikiPathway_2023_Human', 'TF_Perturbations_Followed_by_Expression', 'Reactome', 'PPI_Hub_Proteins', 'OMIM_Disease', 'GO_Molecular_Function_2023'],
                 organism='Human', # change accordingly
                 #description='pathway',
                 #cutoff=0.08, # test dataset, use lower value for real case  
                 background=grn_c.var.symbol.tolist(),
)
enr.res2d.head(20)
enr = gp.enrichr(gene_list=list(G.nodes), gene_sets=['KEGG_2021_Human', 'MSigDB_Hallmark_2020', 'Reactome_2022', 'Tabula_Sapiens', 'WikiPathway_2023_Human', 'TF_Perturbations_Followed_by_Expression', 'Reactome', 'PPI_Hub_Proteins', 'OMIM_Disease', 'GO_Molecular_Function_2023'], organism='Human', # change accordingly #description='pathway', #cutoff=0.08, # test dataset, use lower value for real case background=grn_c.var.symbol.tolist(), ) enr.res2d.head(20)
2024-07-25 18:33:16,565 [WARNING] Input library not found: Reactome. Skip
Out[35]:
Gene_set Term P-value Adjusted P-value Old P-value Old adjusted P-value Odds Ratio Combined Score Genes
0 GO_Molecular_Function_2023 Death Receptor Binding (GO:0005123) 0.000416 0.028729 0 0 123.875000 964.236668 EDA;NTF3
1 GO_Molecular_Function_2023 Purine Ribonucleoside Triphosphate Binding (GO... 0.007050 0.073312 0 0 8.626100 42.739637 ABCF2;RIGI;HSPA1A
2 GO_Molecular_Function_2023 Receptor Ligand Activity (GO:0048018) 0.007732 0.073312 0 0 5.742222 27.920829 EDA;CLCF1;NTF3;HSPA1A
3 GO_Molecular_Function_2023 Activin Receptor Activity (GO:0017002) 0.008500 0.073312 0 0 inf inf ACVR1C
4 GO_Molecular_Function_2023 Gap Junction Hemi-Channel Activity (GO:0055077) 0.008500 0.073312 0 0 inf inf PANX1
5 GO_Molecular_Function_2023 NF-kappaB Binding (GO:0051059) 0.008500 0.073312 0 0 inf inf TAF4B
6 GO_Molecular_Function_2023 Hydrolase Activity, Hydrolyzing N-glycosyl Com... 0.008500 0.073312 0 0 inf inf BST1
7 GO_Molecular_Function_2023 Ubiquitin-Protein Transferase Activator Activi... 0.008500 0.073312 0 0 inf inf RIGI
8 GO_Molecular_Function_2023 DNA Binding, Bending (GO:0008301) 0.016930 0.083096 0 0 120.151515 490.060134 TOP2A
9 GO_Molecular_Function_2023 Disordered Domain Specific Binding (GO:0097718) 0.016930 0.083096 0 0 120.151515 490.060134 HSPA1A
10 GO_Molecular_Function_2023 UDP Phosphatase Activity (GO:0045134) 0.016930 0.083096 0 0 120.151515 490.060134 ENTPD3
11 GO_Molecular_Function_2023 GDP Phosphatase Activity (GO:0004382) 0.016930 0.083096 0 0 120.151515 490.060134 ENTPD3
12 GO_Molecular_Function_2023 Nucleoside Diphosphate Phosphatase Activity (G... 0.016930 0.083096 0 0 120.151515 490.060134 ENTPD3
13 GO_Molecular_Function_2023 ATP Binding (GO:0005524) 0.017220 0.083096 0 0 11.204545 45.509349 ABCF2;HSPA1A
14 GO_Molecular_Function_2023 Ubiquitin Protein Ligase Binding (GO:0031625) 0.020064 0.083096 0 0 10.265625 40.126707 RIGI;HSPA1A
15 GO_Molecular_Function_2023 Ubiquitin-Like Protein Ligase Binding (GO:0044... 0.021555 0.083096 0 0 9.852500 37.805688 RIGI;HSPA1A
16 GO_Molecular_Function_2023 Growth Factor Activity (GO:0008083) 0.023090 0.083096 0 0 9.471154 35.690549 CLCF1;NTF3
17 GO_Molecular_Function_2023 Adenyl Ribonucleotide Binding (GO:0032559) 0.024670 0.083096 0 0 9.118056 33.756535 ABCF2;HSPA1A
18 GO_Molecular_Function_2023 C3HC4-type RING Finger Domain Binding (GO:0055... 0.025290 0.083096 0 0 60.060606 220.863728 HSPA1A
19 GO_Molecular_Function_2023 Ubiquitin Binding (GO:0043130) 0.025290 0.083096 0 0 60.060606 220.863728 TOP2A
In [36]:
Copied!
enr.res2d.Term = enr.res2d.Term.str.split("(").str[0].str.split(',').str[0]
ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(),
             column="Adjusted P-value",
             #title='normal fibro PAGE4',
             cmap=plt.cm.viridis,
             size=4, # adjust dot size
             top_term=20,
             figsize=(4,10), cutoff=0.25)
enr.res2d.Term = enr.res2d.Term.str.split("(").str[0].str.split(',').str[0] ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(), column="Adjusted P-value", #title='normal fibro PAGE4', cmap=plt.cm.viridis, size=4, # adjust dot size top_term=20, figsize=(4,10), cutoff=0.25)
/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: The 'method' keyword in Series.replace is deprecated and will be removed in a future version.
  df[self.colname].replace(
/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df[self.colname].replace(
No description has been provided for this image

similar story to cluster above. this time with PTN often found in the litterature with 6 other m Studies on the role of STEAP1, OR51E2, PTN, ABCC4, NDRG2 have been relatively in-depth. For instance, STEAP1 is all-called six-transmembrane epithelial antigen of the prostate 1, which belongs to a family of metalloproteinases involved in iron and copper homeostasis and other cellular processes

has been listed with 8 other members as predictors of CAF-related gene prognostic index (CRGPI) in prostate cancer. https://www.nature.com/articles/s41598-023-36125-0#Sec11

(NDRG2, TSPAN1, PTN, APOE, OR51E2, P4HB, STEAP1 and ABCC4 were used to construct molecular subtypes and)

https://www.genecards.org/cgi-bin/carddisp.pl?gene=GSN

3.2 Normal version¶

In [9]:
Copied!
grn.var['cluster_1.5'].value_counts().head(10)
grn.var['cluster_1.5'].value_counts().head(10)
Out[9]:
cluster_1.5
0     1620
1     1483
2      732
3      111
4       23
5        9
6        4
16       1
23       1
22       1
Name: count, dtype: int64

https://www.genecards.org/cgi-bin/carddisp.pl?gene=MYOC

In [37]:
Copied!
G = grn.plot_subgraph(grn.var[grn.var['cluster_1.5']=='3'].index.tolist(), only=200, interactive=False)
G = grn.plot_subgraph(grn.var[grn.var['cluster_1.5']=='3'].index.tolist(), only=200, interactive=False)
['#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4'] Index(['TMEM132A', 'ADGRA2', 'AGPAT4', 'THAP3', 'DNAJC25', 'TRAM2', 'NTN1',
       'STAG3', 'COL4A4', 'HSD17B14',
       ...
       'nan-75', 'TRABD2B', 'nan-77', 'PCGF2', 'nan-107', 'nan-182', 'nan-192',
       'nan-195', 'nan-239', 'nan-259'],
      dtype='object', name='symbol_2', length=111)
No description has been provided for this image
In [26]:
Copied!
pnet = pnx.Network(notebook=True, cdn_resources="remote", directed=True)
pnet.from_nx(G)
#first_node = list(G.nodes)[-1]
#pnet.get_node(first_node)['color'] = "red"
pnet.save_graph("../figures/pyvis/grn_3.html")
pnet = pnx.Network(notebook=True, cdn_resources="remote", directed=True) pnet.from_nx(G) #first_node = list(G.nodes)[-1] #pnet.get_node(first_node)['color'] = "red" pnet.save_graph("../figures/pyvis/grn_3.html")
In [27]:
Copied!
grn.var.loc['nan-259'] # par of the Glycoside Hydrolase Family
grn.var.loc['nan-259'] # par of the Glycoside Hydrolase Family
Out[27]:
uid                    62JAXxQNqtOd
symbol                      nan-259
ncbi_gene_ids                 83737
biotype              protein_coding
description           novel protein
synonyms                           
organism_id                       2
public_source_id                9.0
created_by_id                     1
mt                            False
ribo                          False
hb                            False
organism             NCBITaxon:9606
TFs                           False
ensembl_id          ENSG00000289720
centrality                      0.0
cluster_1.5                       3
Name: nan-259, dtype: object
In [12]:
Copied!
grn.get(grn.var[grn.var['cluster_1.5']=='3'].index.tolist()).grn.sum(0).sort_values(ascending=False).head(3)##.grn # selenom
grn.get(grn.var[grn.var['cluster_1.5']=='3'].index.tolist()).grn.sum(0).sort_values(ascending=False).head(3)##.grn # selenom
Out[12]:
symbol
RNASEK     0.942102
SELENOM    0.292933
nan-259    0.256140
dtype: float32

RNASEK acidifying the internal vesicle important in endocytosis# https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10547866/ RNA phosphodiester bond hydrolysis --> hydrolisis pathway in the model and RNA transcription regulation

In [38]:
Copied!
# Assuming 'node_names' contains the list of gene names
enr = gp.enrichr(gene_list=list(G.nodes),
                 gene_sets=['KEGG_2021_Human', 'MSigDB_Hallmark_2020', 'Reactome_2022', 'Tabula_Sapiens', 'WikiPathway_2023_Human', 'TF_Perturbations_Followed_by_Expression', 'Reactome', 'PPI_Hub_Proteins', 'OMIM_Disease', 'GO_Molecular_Function_2023'],
                 organism='Human', # change accordingly
                 #description='pathway',
                 #cutoff=0.08, # test dataset, use lower value for real case  
                 background=grn.var.symbol.tolist(),
)
enr.res2d.head(20)
# Assuming 'node_names' contains the list of gene names enr = gp.enrichr(gene_list=list(G.nodes), gene_sets=['KEGG_2021_Human', 'MSigDB_Hallmark_2020', 'Reactome_2022', 'Tabula_Sapiens', 'WikiPathway_2023_Human', 'TF_Perturbations_Followed_by_Expression', 'Reactome', 'PPI_Hub_Proteins', 'OMIM_Disease', 'GO_Molecular_Function_2023'], organism='Human', # change accordingly #description='pathway', #cutoff=0.08, # test dataset, use lower value for real case background=grn.var.symbol.tolist(), ) enr.res2d.head(20)
2024-07-25 18:33:49,291 [WARNING] Input library not found: Reactome. Skip
/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/enrichr.py:643: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  self.results = pd.concat(self.results, ignore_index=True)
Out[38]:
Gene_set Term P-value Adjusted P-value Old P-value Old adjusted P-value Odds Ratio Combined Score Genes
0 GO_Molecular_Function_2023 Sulfotransferase Activity (GO:0008146) 0.007252 0.123369 0 0 20.138462 99.212171 CHST7;SULT1C4
1 GO_Molecular_Function_2023 RNA Polymerase II Transcription Regulatory Reg... 0.007713 0.123369 0 0 3.386005 16.472449 ZNF483;HAND2;NTN1;NFATC4;GLIS2;PURG;ZNF286A
2 GO_Molecular_Function_2023 ATPase Binding (GO:0051117) 0.013790 0.123369 0 0 13.415385 57.468682 SLC2A13;NKD2
3 GO_Molecular_Function_2023 Chondroitin Sulfotransferase Activity (GO:0034... 0.016750 0.123369 0 0 inf inf CHST7
4 GO_Molecular_Function_2023 Peptidyl-Proline 4-Dioxygenase Activity (GO:00... 0.016750 0.123369 0 0 inf inf P4HA3
5 GO_Molecular_Function_2023 Aryl Sulfotransferase Activity (GO:0004062) 0.016750 0.123369 0 0 inf inf SULT1C4
6 GO_Molecular_Function_2023 Histone Reader Activity (GO:0140566) 0.016750 0.123369 0 0 inf inf TONSL
7 GO_Molecular_Function_2023 Hydrolase Activity, Hydrolyzing N-glycosyl Com... 0.016750 0.123369 0 0 inf inf BST1
8 GO_Molecular_Function_2023 Tat Protein Binding (GO:0030957) 0.016750 0.123369 0 0 inf inf DLL1
9 GO_Molecular_Function_2023 WW Domain Binding (GO:0050699) 0.016750 0.123369 0 0 inf inf ENTREP3
10 GO_Molecular_Function_2023 RNA Polymerase II Cis-Regulatory Region Sequen... 0.016754 0.123369 0 0 3.180050 13.003634 ZNF483;HAND2;NTN1;NFATC4;GLIS2;ZNF286A
11 GO_Molecular_Function_2023 DNA-binding Transcription Activator Activity, ... 0.026719 0.134555 0 0 5.074219 18.380719 HAND2;INSL3;GLIS2
12 GO_Molecular_Function_2023 Procollagen-Proline Dioxygenase Activity (GO:0... 0.033223 0.134555 0 0 59.575758 202.825781 P4HA3
13 GO_Molecular_Function_2023 N-acetylglucosamine 6-O-sulfotransferase Activ... 0.033223 0.134555 0 0 59.575758 202.825781 CHST7
14 GO_Molecular_Function_2023 ABC-type Xenobiotic Transporter Activity (GO:0... 0.033223 0.134555 0 0 59.575758 202.825781 ABCC1
15 GO_Molecular_Function_2023 alpha-N-acetylgalactosaminide Alpha-2,6-Sialyl... 0.033223 0.134555 0 0 59.575758 202.825781 ST6GALNAC6
16 GO_Molecular_Function_2023 miRNA Binding (GO:0035198) 0.033223 0.134555 0 0 59.575758 202.825781 DND1
17 GO_Molecular_Function_2023 Myosin Heavy Chain Binding (GO:0032036) 0.033223 0.134555 0 0 59.575758 202.825781 NKD2
18 GO_Molecular_Function_2023 Estradiol 17-Beta-Dehydrogenase [NAD(P)] Activ... 0.033223 0.134555 0 0 59.575758 202.825781 HSD17B14
19 GO_Molecular_Function_2023 Solute:Proton Symporter Activity (GO:0015295) 0.033223 0.134555 0 0 59.575758 202.825781 SLC2A13
In [39]:
Copied!
enr.res2d.Term = enr.res2d.Term.str.split("(").str[0].str.split(',').str[0].str[:50]
ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(),
             column="Adjusted P-value",
             #title='normal fibro PAGE4',
             cmap=plt.cm.viridis,
             size=4, # adjust dot size
            top_term=20,
            figsize=(4,10), cutoff=0.25)
enr.res2d.Term = enr.res2d.Term.str.split("(").str[0].str.split(',').str[0].str[:50] ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(), column="Adjusted P-value", #title='normal fibro PAGE4', cmap=plt.cm.viridis, size=4, # adjust dot size top_term=20, figsize=(4,10), cutoff=0.25)
/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: The 'method' keyword in Series.replace is deprecated and will be removed in a future version.
  df[self.colname].replace(
/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df[self.colname].replace(
No description has been provided for this image

https://www.genecards.org/cgi-bin/carddisp.pl?gene=PTN

In [40]:
Copied!
G = grn.plot_subgraph(grn.var[grn.var['cluster_1.5']=='4'].index.tolist(), only=80, interactive=False)#
G = grn.plot_subgraph(grn.var[grn.var['cluster_1.5']=='4'].index.tolist(), only=80, interactive=False)#
['#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4'] Index(['TRHDE', 'PGR', 'ESR1', 'RASD2', 'SYNDIG1', 'OLFM2', 'FZD10', 'BHMT2',
       'DUSP26', 'FOXF2', 'ADAM33', 'ADAMTS12', 'HS3ST3A1', 'CHODL', 'FTCD',
       'FNDC1', 'HSPB3', 'FOXL1', 'AGMO', 'ADAMTSL2', 'S100A6', 'AKR1B10',
       'SCGB1C2'],
      dtype='object', name='symbol_2')
No description has been provided for this image
In [29]:
Copied!
pnet = pnx.Network(notebook=True, cdn_resources="remote", directed=True)
pnet.from_nx(G)
#first_node = list(G.nodes)[-1]
#pnet.get_node(first_node)['color'] = "red"
pnet.save_graph("../figures/pyvis/grn_4.html")
pnet = pnx.Network(notebook=True, cdn_resources="remote", directed=True) pnet.from_nx(G) #first_node = list(G.nodes)[-1] #pnet.get_node(first_node)['color'] = "red" pnet.save_graph("../figures/pyvis/grn_4.html")
In [41]:
Copied!
enr = gp.enrichr(gene_list=list(G.nodes),
                 gene_sets=['GO_Molecular_Function_2023', 'MSigDB_Hallmark_2020', 'Reactome_2022', 'Tabula_Sapiens', 'WikiPathway_2023_Human', 'TF_Perturbations_Followed_by_Expression', 'Reactome', 'PPI_Hub_Proteins', 'OMIM_Disease', 'GO_Molecular_Function_2023'],
                 organism='Human', # change accordingly
                 #description='pathway',
                 #cutoff=0.08, # test dataset, use lower value for real case  
                 background=grn.var.symbol.tolist(),
)
enr.res2d.head(20)
enr = gp.enrichr(gene_list=list(G.nodes), gene_sets=['GO_Molecular_Function_2023', 'MSigDB_Hallmark_2020', 'Reactome_2022', 'Tabula_Sapiens', 'WikiPathway_2023_Human', 'TF_Perturbations_Followed_by_Expression', 'Reactome', 'PPI_Hub_Proteins', 'OMIM_Disease', 'GO_Molecular_Function_2023'], organism='Human', # change accordingly #description='pathway', #cutoff=0.08, # test dataset, use lower value for real case background=grn.var.symbol.tolist(), ) enr.res2d.head(20)
2024-07-25 18:34:27,578 [WARNING] Input library not found: Reactome. Skip
Out[41]:
Gene_set Term P-value Adjusted P-value Old P-value Old adjusted P-value Odds Ratio Combined Score Genes
0 GO_Molecular_Function_2023 RNA Polymerase II General Transcription Initia... 0.000032 0.000933 0 0 inf inf FOXF2;ESR1
1 GO_Molecular_Function_2023 Estrogen Response Element Binding (GO:0034056) 0.000032 0.000933 0 0 inf inf PGR;ESR1
2 GO_Molecular_Function_2023 General Transcription Initiation Factor Bindin... 0.000095 0.001860 0 0 378.666667 3508.819727 FOXF2;ESR1
3 GO_Molecular_Function_2023 Transcription Coactivator Binding (GO:0001223) 0.000188 0.002780 0 0 189.285714 1623.428489 PGR;ESR1
4 GO_Molecular_Function_2023 Transition Metal Ion Binding (GO:0046914) 0.000446 0.005263 0 0 13.515099 104.272240 BHMT2;AGMO;ADAM33;TRHDE
5 GO_Molecular_Function_2023 Metalloendopeptidase Activity (GO:0004222) 0.000551 0.005414 0 0 22.794231 171.058857 ADAMTSL2;ADAM33;ADAMTS12
6 GO_Molecular_Function_2023 Metallopeptidase Activity (GO:0008237) 0.000672 0.005666 0 0 21.155357 154.536591 ADAMTSL2;ADAM33;ADAMTS12
7 GO_Molecular_Function_2023 Transcription Coregulator Binding (GO:0001221) 0.001111 0.008195 0 0 54.013605 367.419603 PGR;ESR1
8 GO_Molecular_Function_2023 DNA-binding Transcription Activator Activity, ... 0.001429 0.009365 0 0 15.972973 104.640056 FOXF2;PGR;ESR1
9 GO_Molecular_Function_2023 ATPase Binding (GO:0051117) 0.002016 0.011893 0 0 37.780952 234.495632 PGR;ESR1
10 GO_Molecular_Function_2023 Zinc Ion Binding (GO:0008270) 0.002284 0.012250 0 0 13.407955 81.545835 BHMT2;ADAM33;TRHDE
11 GO_Molecular_Function_2023 Cis-Regulatory Region Sequence-Specific DNA Bi... 0.004517 0.021203 0 0 6.945569 37.504924 FOXF2;PGR;FOXL1;ESR1
12 GO_Molecular_Function_2023 RNA Polymerase II Cis-Regulatory Region Sequen... 0.005523 0.021203 0 0 6.541596 34.009122 FOXF2;PGR;FOXL1;ESR1
13 GO_Molecular_Function_2023 DNA Binding, Bending (GO:0008301) 0.005750 0.021203 0 0 inf inf FOXL1
14 GO_Molecular_Function_2023 TBP-class Protein Binding (GO:0017025) 0.005750 0.021203 0 0 inf inf ESR1
15 GO_Molecular_Function_2023 [Heparan Sulfate]-Glucosamine 3-Sulfotransfera... 0.005750 0.021203 0 0 inf inf HS3ST3A1
16 GO_Molecular_Function_2023 RNA Polymerase II Transcription Regulatory Reg... 0.008164 0.028333 0 0 5.812950 27.949028 FOXF2;PGR;FOXL1;ESR1
17 GO_Molecular_Function_2023 Endopeptidase Activity (GO:0004175) 0.008882 0.028593 0 0 8.021918 37.893119 ADAMTSL2;ADAM33;ADAMTS12
18 GO_Molecular_Function_2023 DNA Binding (GO:0003677) 0.009208 0.028593 0 0 7.911486 37.086673 FOXF2;PGR;FOXL1
19 GO_Molecular_Function_2023 Heparan Sulfate Sulfotransferase Activity (GO:... 0.011468 0.030756 0 0 180.727273 807.520606 HS3ST3A1
In [42]:
Copied!
enr.res2d.Term = enr.res2d.Term.str.split("(").str[0].str.split(',').str[0].str[:50]
ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(),
             column="Adjusted P-value",
             #title='normal fibro PAGE4',
             cmap=plt.cm.viridis,
             size=4, # adjust dot size
                top_term=20,
                figsize=(4,10), cutoff=0.25)
enr.res2d.Term = enr.res2d.Term.str.split("(").str[0].str.split(',').str[0].str[:50] ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(), column="Adjusted P-value", #title='normal fibro PAGE4', cmap=plt.cm.viridis, size=4, # adjust dot size top_term=20, figsize=(4,10), cutoff=0.25)
/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: The 'method' keyword in Series.replace is deprecated and will be removed in a future version.
  df[self.colname].replace(
/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df[self.colname].replace(
No description has been provided for this image

4. Differential Gene - Gene connections¶

  • look at the connections of some key TFs in here
  • do GSEA over these connections
  • compare the connections of these same TFs for both conditions
In [5]:
Copied!
G = grn_c.plot_subgraph("PAGE4", only=100, max_genes=40, interactive=False)#
G = grn_c.plot_subgraph("PAGE4", only=100, max_genes=40, interactive=False)#
['#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#ff7f0e'] Index(['MT2A', 'HSPA1A', 'CREM', 'PTN', 'HLA-A', 'SPARCL1', 'SPOCK3', 'NR4A3',
       'APOD', 'CALD1', 'MGP', 'HSPE1', 'EIF4A1', 'CD99', 'EMP1', 'FABP4',
       'NAMPT', 'NNMT', 'RBP1', 'LUM', 'IGFBP7', 'THBS1', 'YBX3', 'NAF1',
       'DCN', 'A2M', 'CPE', 'C1R', 'ATP6V0C', 'MATN2', 'COL6A2', 'UBE2S',
       'CST3', 'RHOQ', 'UBE2C', 'C1S', 'H2AZ1', 'GPRC5A', 'HLA-C', 'LGALS1',
       'PAGE4'],
      dtype='object', name='symbol_2')