scPRINT use case on BPH (part 2, GN analysis)¶

In this use-case, which some of the results are presented in Figure 5 of our manuscript, we perform an extensive analysis of gene networks generated by scPRINT in our previous notebook for fibroblasts of the prostate in both normal and benign prostatic hyperplasia states.

We can look at many patterns from the gene networks such as hub genes, gene modules, and gene-gene interactions. Some of them are related to highly variable and differentially expressed genes while others can only be infered by using the gene networks.

Table of content:

Hub genes
Network similarity
Network communities
1. Cancer version
2. Normal version
Differential Gene - Gene connections
Using classification

In [1]:

Copied!





from bengrn import BenGRN
import scanpy as sc
from bengrn.base import train_classifier

from anndata.utils import make_index_unique
from grnndata import utils as grnutils
from grnndata import read_h5ad

from matplotlib import pyplot as plt
from scdataloader import utils as data_utils
import numpy as np

from pyvis import network as pnx
import networkx as nx
import scipy.sparse
import pandas as pd
import gseapy as gp
from gseapy import dotplot

%load_ext autoreload
%autoreload 2
from bengrn import BenGRN
import scanpy as sc
from bengrn.base import train_classifier

from anndata.utils import make_index_unique
from grnndata import utils as grnutils
from grnndata import read_h5ad

from matplotlib import pyplot as plt
from scdataloader import utils as data_utils
import numpy as np

from pyvis import network as pnx
import networkx as nx
import scipy.sparse
import pandas as pd
import gseapy as gp
from gseapy import dotplot

%load_ext autoreload
%autoreload 2 

💡 connected lamindb: jkobject/scprint

In [ ]:

Copied!

prostate_combined = sc.read_h5ad("../data/prostate_combined_o2uniqsx.h5ad")
prostate_combined = sc.read_h5ad("../data/prostate_combined_o2uniqsx.h5ad")

In [2]:

Copied!





# load the generated gene networks from the previous notebook
grn = read_h5ad("../../data/prostate_fibro_grn.h5ad")
grn_c = read_h5ad("../../data/prostate_cancer_fibro_grn.h5ad")
#remove duplicates
grn.var.symbol = make_index_unique(grn.var.symbol.astype(str))
grn_c.var.symbol = make_index_unique(grn_c.var.symbol.astype(str))

# convert gene ids to symbols
grn.var['ensembl_id'] = grn.var.index
grn.var.index = grn.var.symbol
grn_c.var['ensembl_id'] = grn_c.var.index
grn_c.var.index = grn_c.var.symbol

grn.obs.cleaned_pred_disease_ontology_term_id.value_counts(),grn_c.obs.cleaned_pred_disease_ontology_term_id.value_counts()
# load the generated gene networks from the previous notebook
grn = read_h5ad("../../data/prostate_fibro_grn.h5ad")
grn_c = read_h5ad("../../data/prostate_cancer_fibro_grn.h5ad")
#remove duplicates
grn.var.symbol = make_index_unique(grn.var.symbol.astype(str))
grn_c.var.symbol = make_index_unique(grn_c.var.symbol.astype(str))

# convert gene ids to symbols
grn.var['ensembl_id'] = grn.var.index
grn.var.index = grn.var.symbol
grn_c.var['ensembl_id'] = grn_c.var.index
grn_c.var.index = grn_c.var.symbol

grn.obs.cleaned_pred_disease_ontology_term_id.value_counts(),grn_c.obs.cleaned_pred_disease_ontology_term_id.value_counts()

Out[2]:

(cleaned_pred_disease_ontology_term_id
 normal    300
 Name: count, dtype: int64,
 cleaned_pred_disease_ontology_term_id
 benign prostatic hyperplasia    300
 Name: count, dtype: int64)

1. Hub genes¶

We will use 2 different definition of centrality to compare the hubs of the networks in both states. The first one is the degree centrality, which is the number of connections of a node. The second one is the eigenvector centrality, which is the sum of the centrality of the neighbors of a node.

For each gene we mostly interogate their meaning with genecards, e.g. https://www.genecards.org/cgi-bin/carddisp.pl?gene=CD99

In [7]:

Copied!

# between the two networks we have ~3000 genes in common over their 4000 genes
common = set(grn_c.var.symbol) & set(grn.var.symbol)
len(common)
# between the two networks we have ~3000 genes in common over their 4000 genes
common = set(grn_c.var.symbol) & set(grn.var.symbol)
len(common)

Out[7]:

In [8]:

Copied!

# hub genes based on edge centrality (number of connections / strength of connections)
grn.grn.sum(0).sort_values(ascending=False).head(20)
# hub genes based on edge centrality (number of connections / strength of connections)
grn.grn.sum(0).sort_values(ascending=False).head(20)

Out[8]:

symbol
S100A6            58.131020
TGIF2-RAB5IF      51.177635
MIF               44.534096
DNAJB9            40.953262
IGFBP7            38.629910
APOD              38.003906
BRME1             37.598698
SPARCL1           37.160854
TIMP1             36.671650
DCN               34.621376
C1S               32.746254
MGP               30.496748
nan-270           29.884596
SLC25A6           29.308729
BLOC1S5-TXNDC5    28.986681
CETN1             28.746281
FTL               28.266533
AKR1C1            27.641245
FBLN1             26.828733
MT2A              25.590197
dtype: float32

In [7]:

Copied!

# without taking in account genes that are not present in the cancer network
grn.grn.sum(0).loc[list(common)].sort_values(ascending=False).head(20)
# without taking in account genes that are not present in the cancer network
grn.grn.sum(0).loc[list(common)].sort_values(ascending=False).head(20)

Out[7]:

symbol
TGIF2-RAB5IF      51.177635
IGFBP7            38.629910
APOD              38.003906
BRME1             37.598698
SPARCL1           37.160854
TIMP1             36.671650
DCN               34.621376
C1S               32.746254
MGP               30.496748
BLOC1S5-TXNDC5    28.986681
AKR1C1            27.641245
FBLN1             26.828733
MT2A              25.590197
MYOC              25.428144
FAM205A           25.296257
YBX3              24.937267
CST3              24.676405
C11orf96          18.731924
TFPI              18.429214
ATP6V0C           17.695705
dtype: float32

In [11]:

Copied!

# cancer hub genes
grn_c.grn.sum(0).sort_values(ascending=False).head(20)
# cancer hub genes
grn_c.grn.sum(0).sort_values(ascending=False).head(20)

Out[11]:

symbol
HSPA1A          75.300415
MT2A            57.938267
CREM            42.478115
TGIF2-RAB5IF    39.980556
HSPE1           38.607624
CALD1           36.512047
SPOCK3          35.564945
HLA-A           33.446110
SPARCL1         33.403152
RBP1            32.482758
C1S             32.037743
BRME1-1         31.879770
FABP4           31.604069
nan-99          31.555256
LUM             30.523245
EIF4A1          29.690474
ATP6V0C         29.475002
PDZD2           27.333025
DEFA1           26.979454
CTSG            25.070097
dtype: float32

In [10]:

Copied!

# without taking in account genes not present in the normal network
grn_c.grn.sum(0).loc[list(common)].sort_values(ascending=False).head(20)
# without taking in account genes not present in the normal network
grn_c.grn.sum(0).loc[list(common)].sort_values(ascending=False).head(20)

Out[10]:

symbol
HSPA1A            75.300415
MT2A              57.938267
TGIF2-RAB5IF      39.980556
SPOCK3            35.564945
HLA-A             33.446110
SPARCL1           33.403152
C1S               32.037743
nan-99            31.555256
LUM               30.523245
EIF4A1            29.690474
ATP6V0C           29.475002
DEFA1             26.979454
IGFBP7            23.982243
DCN               23.304916
MGP               22.596451
CD99              21.624668
BLOC1S5-TXNDC5    20.914230
SERPING1          20.824636
COL6A2            18.076559
THBS1             17.826540
dtype: float32

In [12]:

Copied!

# top differential hubs
TOP = 10
(set(grn_c.grn.sum(0).sort_values(ascending=False).head(TOP).index) - set(grn.grn.sum(0).sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))
# top differential hubs
TOP = 10
(set(grn_c.grn.sum(0).sort_values(ascending=False).head(TOP).index) - set(grn.grn.sum(0).sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))

Out[12]:

{'HLA-A', 'HSPA1A', 'MT2A', 'SPOCK3'}

In [13]:

Copied!

TOP = 20
(set(grn_c.grn.sum(0).sort_values(ascending=False).head(TOP).index) - set(grn.grn.sum(0).sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))
TOP = 20
(set(grn_c.grn.sum(0).sort_values(ascending=False).head(TOP).index) - set(grn.grn.sum(0).sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))

Out[13]:

{'ATP6V0C', 'DEFA1', 'EIF4A1', 'HLA-A', 'HSPA1A', 'LUM', 'SPOCK3', 'nan-99'}

In [14]:

Copied!

TOP = 50
(set(grn_c.grn.sum(0).sort_values(ascending=False).head(TOP).index) - set(grn.grn.sum(0).sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))
TOP = 50
(set(grn_c.grn.sum(0).sort_values(ascending=False).head(TOP).index) - set(grn.grn.sum(0).sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))

Out[14]:

{'CD99',
 'CPE',
 'DEFA1',
 'EIF4A1',
 'HLA-A',
 'HSPA1A',
 'LGALS1',
 'LUM',
 'PYDC2',
 'SERPING1',
 'SPOCK3',
 'THBS1',
 'nan-191',
 'nan-99'}

overall we see that both are somewhat similar in the way the networks classifies main nodes but at least around half of the top 200 nodes are different between the two datasets and half of those are not due to diffent gene being used by the networks (50)

genes preferentially given as top elements in the tumor version only:

CD99
CPE
DEFA1
EIF4A1
HLA-A
HSPA1A
LGALS1
LUM
PYDC2
SERPING1
SPOCK3
THBS1

In [3]:

Copied!

# we now compute eigen centrality creating a sparse network by only keeping the top 20 neighbors for each gene in the network
TOP = 20

grnutils.get_centrality(grn, TOP, top_k_to_disp=0)
grnutils.get_centrality(grn_c, TOP, top_k_to_disp=0)
# we now compute eigen centrality creating a sparse network by only keeping the top 20 neighbors for each gene in the network
TOP = 20

grnutils.get_centrality(grn, TOP, top_k_to_disp=0)
grnutils.get_centrality(grn_c, TOP, top_k_to_disp=0)

Top central genes: []
Top central genes: []

Out[3]:

[]

In [12]:

Copied!

grn_c.var.centrality.sort_values(ascending=False).head(10)
grn_c.var.centrality.sort_values(ascending=False).head(10)

Out[12]:

symbol
HSPA1A     0.271584
MT2A       0.271584
CREM       0.271574
CD99       0.271517
NR4A3      0.271385
RBP1       0.271130
SOX4       0.264214
ATP6V0C    0.263235
HLA-A      0.232913
LUM        0.202007
Name: centrality, dtype: float64

In [9]:

Copied!

grn_c.var.centrality.sort_values(ascending=False).head(10)
grn_c.var.centrality.sort_values(ascending=False).head(10)

Out[9]:

symbol
HSPA1A     0.271584
MT2A       0.271584
CREM       0.271574
CD99       0.271517
NR4A3      0.271385
RBP1       0.271130
SOX4       0.264214
ATP6V0C    0.263235
HLA-A      0.232913
LUM        0.202007
Name: centrality, dtype: float64

In [10]:

Copied!

grn.var.centrality.sort_values(ascending=False).head(10)
grn.var.centrality.sort_values(ascending=False).head(10)

Out[10]:

symbol
MT2A       0.276470
SLC25A6    0.276183
CXCL8      0.275859
FTH1       0.275757
MIF        0.267968
DNAJB9     0.265635
SOX4       0.241950
RELB       0.234482
YBX3       0.216567
CST3       0.213585
Name: centrality, dtype: float64

In [13]:

Copied!

TOP = 10
(set(grn_c.var.centrality.sort_values(ascending=False).head(TOP).index) - set(grn.var.centrality.sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))
TOP = 10
(set(grn_c.var.centrality.sort_values(ascending=False).head(TOP).index) - set(grn.var.centrality.sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))

Out[13]:

{'ATP6V0C', 'CD99', 'HLA-A', 'HSPA1A', 'LUM'}

In [12]:

Copied!

TOP = 20
(set(grn_c.var.centrality.sort_values(ascending=False).head(TOP).index) - set(grn.var.centrality.sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))
TOP = 20
(set(grn_c.var.centrality.sort_values(ascending=False).head(TOP).index) - set(grn.var.centrality.sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))

Out[12]:

{'ATP6V0C',
 'CD99',
 'EIF4A1',
 'HLA-A',
 'HSPA1A',
 'LUM',
 'PAGE4',
 'RYR2',
 'SERPINF1'}

In [11]:

Copied!

TOP = 50
(set(grn_c.var.centrality.sort_values(ascending=False).head(TOP).index) - set(grn.var.centrality.sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))
TOP = 50
(set(grn_c.var.centrality.sort_values(ascending=False).head(TOP).index) - set(grn.var.centrality.sort_values(ascending=False).head(TOP).index)) & (set(grn_c.var.symbol) & set(grn.var.symbol))

Out[11]:

{'C1R',
 'CD99',
 'COL6A2',
 'HLA-A',
 'HNRNPA0',
 'HSPA1A',
 'LUM',
 'PAGE4',
 'SERPINA3',
 'SERPING1',
 'SPOCK3',
 'THBS1'}

it seems the networks are even more similar when comparing them at the level of the centrality of their nodes. this means that some are highly central while maybe not

2. Network similarity¶

We now look at the similarity of the two networks based on general overlap of their top K edges across their common nodes

In [41]:

Copied!

grn.var.index = grn.var.ensembl_id
grn_c.var.index = grn_c.var.ensembl_id
overlap = list(set(grn.var.index) & set(grn_c.var.index))
grn.var.index = grn.var.ensembl_id
grn_c.var.index = grn_c.var.ensembl_id
overlap = list(set(grn.var.index) & set(grn_c.var.index))

In [43]:

Copied!





K = 20
subgrn_c = grn_c.get(overlap).grn
subgrn_c = subgrn_c.apply(lambda row: row >= row.nlargest(K).min(), axis=1)

subgrn = grn.get(overlap).grn
subgrn = subgrn.apply(lambda row: row >= row.nlargest(K).min(), axis=1)
(subgrn & subgrn_c).sum(1).mean() / K
K = 20
subgrn_c = grn_c.get(overlap).grn
subgrn_c = subgrn_c.apply(lambda row: row >= row.nlargest(K).min(), axis=1)

subgrn = grn.get(overlap).grn
subgrn = subgrn.apply(lambda row: row >= row.nlargest(K).min(), axis=1)
(subgrn & subgrn_c).sum(1).mean() / K

Out[43]:

0.5055735930735931

when looking at the top 20 connection for each genes in the network we see that on average only 50% of them agree between the two networks

3. Network communities¶

Again looking at the top 20 connections for each gene in the network we use the leiden (or louvain) algorithm to find communities in both networks.

We then study the hub of these communities as well as their enrichment using enrichr and ontology databases. We will only look at communities between 20 and 200 genes

In [33]:

Copied!

TOP = 20

grnutils.get_centrality(grn, TOP, top_k_to_disp=0)
grnutils.get_centrality(grn_c, TOP, top_k_to_disp=0)
TOP = 20

grnutils.get_centrality(grn, TOP, top_k_to_disp=0)
grnutils.get_centrality(grn_c, TOP, top_k_to_disp=0)

Top central genes: []
Top central genes: []

Out[33]:

[]

In [17]:

Copied!

grn_c = grnutils.compute_cluster(grn_c, 1.5, use="leiden", n_iterations=10, max_comm_size=100)
grn_c = grnutils.compute_cluster(grn_c, 1.5, use="leiden", n_iterations=10, max_comm_size=100)

In [4]:

Copied!

grn = grnutils.compute_cluster(grn, 1.5)
grn_c = grnutils.compute_cluster(grn_c, 1.5)
grn = grnutils.compute_cluster(grn, 1.5)
grn_c = grnutils.compute_cluster(grn_c, 1.5)

3.1 Cancer version¶

In [30]:

Copied!

# our communities
grn.var['cluster_1.5'].value_counts().head(10)
# our communities
grn.var['cluster_1.5'].value_counts().head(10)

Out[30]:

cluster_1.5
0     1620
1     1483
2      732
3      111
4       23
5        9
6        4
16       1
23       1
22       1
Name: count, dtype: int64

In [31]:

Copied!

G = grn_c.plot_subgraph(grn_c.var[grn_c.var['cluster_1.5']=='4'].index.tolist(), only=200, interactive=False)#
G = grn_c.plot_subgraph(grn_c.var[grn_c.var['cluster_1.5']=='4'].index.tolist(), only=200, interactive=False)#

['#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4'] Index(['PRDM6', 'MSANTD3', 'STAG3', 'TRHDE', 'TBX5', 'HEPH', 'GALNT16',
       'NFATC4', 'SYNDIG1', 'HSF4', 'ZNF423', 'PLA2G4C', 'PLPPR2', 'SCN1B',
       'ASPA', 'SNX15', 'BAG2', 'PTK7', 'HRH2', 'PLCL1', 'PTCH2', 'TGFB3',
       'RASL11A', 'TOX2', 'CPNE5', 'GLIS2', 'HIVEP3', 'FGF13', 'ACAP3',
       'DUSP26', 'CTIF', 'CAMK1', 'SLC9A5', 'FBN2', 'LRIG1', 'LIMD1', 'TRPC1',
       'SLC2A13', 'ADAMTS12', 'FZD7', 'AFF2', 'CACNA1D', 'MRAS', 'ST6GALNAC6',
       'FGFR4', 'PTGER1', 'PKDCC', 'SLC9B2', 'NPY1R', 'ANKRD33B', 'DCLK2',
       'PURG', 'GXYLT2', 'LRRN4CL', 'CA8', 'EXOC3L1', 'BHLHE22', 'AATK',
       'CMC4', 'BOLA2', 'CCR10', 'TCEAL2', 'SLC24A3', 'ROR1', 'SLC51B',
       'EMID1', 'AGMO', 'CENPP', 'OPTC', 'SPOCK3', 'COL27A1', 'DACT3', 'STMN3',
       'DLL1', 'SMOC1', 'COLGALT2', 'ZDBF2', 'DOK6', 'HNRNPUL2-BSCL2',
       'C3orf84', 'C2orf74', 'ARHGEF25', 'NPIPB5', 'TNFSF12-TNFSF13',
       'ZFP91-CNTF', 'nan-41', 'nan-81', 'nan-96', 'nan-98', 'HERC3',
       'nan-116', 'nan-117'],
      dtype='object', name='symbol_2')

No description has been provided for this image

In [26]:

Copied!





# if you want to draw better looking networks
pnet = pnx.Network(notebook=True, cdn_resources="remote", directed=True)
pnet.from_nx(G)
#first_node = list(G.nodes)[-1]
#pnet.get_node(first_node)['color'] = "red"
pnet.save_graph("../figures/pyvis/grn_c_4.html")
# if you want to draw better looking networks
pnet = pnx.Network(notebook=True, cdn_resources="remote", directed=True)
pnet.from_nx(G)
#first_node = list(G.nodes)[-1]
#pnet.get_node(first_node)['color'] = "red"
pnet.save_graph("../figures/pyvis/grn_c_4.html")

In [32]:

Copied!





# Assuming 'node_names' contains the list of gene names.
# Note: it seems with enrichr, one can only use one gene set at a time, choose accordingly
enr = gp.enrichr(gene_list=list(G.nodes),
                 gene_sets=['KEGG_2021_Human', 'MSigDB_Hallmark_2020', 'Reactome_2022', 'Tabula_Sapiens', 'WikiPathway_2023_Human', 'TF_Perturbations_Followed_by_Expression', 'Reactome', 'PPI_Hub_Proteins', 'OMIM_Disease', 'GO_Molecular_Function_2023'],
                 organism='Human', # change accordingly
                 #description='pathway',
                 #cutoff=0.08, # test dataset, use lower value for real case  
                 background=grn_c.var.symbol.tolist(),
)
enr.res2d.head(20)
# Assuming 'node_names' contains the list of gene names.
# Note: it seems with enrichr, one can only use one gene set at a time, choose accordingly
enr = gp.enrichr(gene_list=list(G.nodes),
                 gene_sets=['KEGG_2021_Human', 'MSigDB_Hallmark_2020', 'Reactome_2022', 'Tabula_Sapiens', 'WikiPathway_2023_Human', 'TF_Perturbations_Followed_by_Expression', 'Reactome', 'PPI_Hub_Proteins', 'OMIM_Disease', 'GO_Molecular_Function_2023'],
                 organism='Human', # change accordingly
                 #description='pathway',
                 #cutoff=0.08, # test dataset, use lower value for real case  
                 background=grn_c.var.symbol.tolist(),
)
enr.res2d.head(20)

2024-07-25 18:31:53,534 [WARNING] Input library not found: Reactome. Skip

Out[32]:

	Gene_set	Term	P-value	Adjusted P-value	Odds Ratio	Combined Score	Genes
0	GO_Molecular_Function_2023	Solute:Potassium Antiporter Activity (GO:0022821)	0.000523	0.050652	inf	inf	SLC24A3;SLC9A5
1	GO_Molecular_Function_2023	Coreceptor Activity Involved In Wnt Signaling ...	0.001547	0.050652	86.822222	561.890021	PTK7;ROR1
2	GO_Molecular_Function_2023	Sodium Ion Transmembrane Transporter Activity ...	0.001547	0.050652	86.822222	561.890021	SLC9A5;SLC9B2
3	GO_Molecular_Function_2023	Calcium Ion Transmembrane Transporter Activity...	0.001700	0.050652	16.432584	104.795452	SLC24A3;TRPC1;CACNA1D
4	GO_Molecular_Function_2023	Calcium-Dependent Phospholipid Binding (GO:000...	0.003047	0.050652	43.400000	251.445478	CPNE5;PLA2G4C
5	GO_Molecular_Function_2023	Coreceptor Activity Involved In Wnt Signaling ...	0.003047	0.050652	43.400000	251.445478	PTK7;ROR1
6	GO_Molecular_Function_2023	Metal Cation:Proton Antiporter Activity (GO:00...	0.003047	0.050652	43.400000	251.445478	SLC9A5;SLC9B2
7	GO_Molecular_Function_2023	Sodium:Proton Antiporter Activity (GO:0015385)	0.003047	0.050652	43.400000	251.445478	SLC9A5;SLC9B2
8	GO_Molecular_Function_2023	Iron Ion Binding (GO:0005506)	0.005002	0.070586	28.925926	153.247105	HEPH;AGMO
9	GO_Molecular_Function_2023	Calcium Channel Activity (GO:0005262)	0.005307	0.070586	10.099395	52.907606	SLC24A3;TRPC1;CACNA1D
10	GO_Molecular_Function_2023	Wnt Receptor Activity (GO:0042813)	0.007391	0.089364	21.688889	106.438022	FZD7;ROR1
11	GO_Molecular_Function_2023	RNA Polymerase II-specific DNA-binding Transcr...	0.010110	0.104284	7.715135	35.445265	TBX5;DUSP26;DLL1
12	GO_Molecular_Function_2023	Phospholipase Activity (GO:0004620)	0.010193	0.104284	17.346667	79.552387	PLCL1;PLA2G4C
13	GO_Molecular_Function_2023	Monoatomic Cation Channel Activity (GO:0005261)	0.011601	0.110210	7.284644	32.465172	SLC24A3;TRPC1;CACNA1D
14	GO_Molecular_Function_2023	Sodium Channel Regulator Activity (GO:0017080)	0.013389	0.118714	14.451852	62.335739	FGF13;SCN1B
15	GO_Molecular_Function_2023	RNA Polymerase II Cis-Regulatory Region Sequen...	0.022992	0.139045	2.668376	10.066740	HSF4;HIVEP3;BHLHE22;TBX5;ZNF423;NFATC4;GLIS2
16	GO_Molecular_Function_2023	Calcium:Sodium Antiporter Activity (GO:0005432)	0.023000	0.139045	inf	inf	SLC24A3
17	GO_Molecular_Function_2023	Potassium:Proton Antiporter Activity (GO:0015386)	0.023000	0.139045	inf	inf	SLC9A5
18	GO_Molecular_Function_2023	Xylosyltransferase Activity (GO:0042285)	0.023000	0.139045	inf	inf	GXYLT2
19	GO_Molecular_Function_2023	Tat Protein Binding (GO:0030957)	0.023000	0.139045	inf	inf	DLL1

In [33]:

Copied!





# we make a plot of the enrichment
enr.res2d.Term = enr.res2d.Term.str.split("(").str[0].str.split(',').str[0]
ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(),
             column="Adjusted P-value",
             #title='normal fibro PAGE4',
             cmap=plt.cm.viridis,
             size=4, # adjust dot size
             top_term=20,
             figsize=(4,10), cutoff=0.25)
# we make a plot of the enrichment
enr.res2d.Term = enr.res2d.Term.str.split("(").str[0].str.split(',').str[0]
ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(),
             column="Adjusted P-value",
             #title='normal fibro PAGE4',
             cmap=plt.cm.viridis,
             size=4, # adjust dot size
             top_term=20,
             figsize=(4,10), cutoff=0.25)

/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: The 'method' keyword in Series.replace is deprecated and will be removed in a future version.
  df[self.colname].replace(
/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df[self.colname].replace(

In [34]:

Copied!

G = grn_c.plot_subgraph(grn_c.var[grn_c.var['cluster_1.5']=='5'].index.tolist(), only=80, interactive=False)#
G = grn_c.plot_subgraph(grn_c.var[grn_c.var['cluster_1.5']=='5'].index.tolist(), only=80, interactive=False)#

['#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4'] Index(['ABCF2', 'ITIH4', 'OLFM2', 'ICAM4', 'RIGI', 'KAZALD1', 'BST1', 'PANX1',
       'C9', 'FBXO2', 'ZNF410', 'ACVR1C', 'VAMP7', 'TOP2A', 'TAF4B',
       'KIAA1755', 'EDA', 'YPEL4', 'ENTPD3', 'GP9', 'NLGN2', 'CLCF1', 'RPRM',
       'ODF3B', 'MSC', 'NTF3', 'COL13A1', 'GP1BB', 'HSPA1A', 'TMEM158',
       'nan-82', 'NBPF19', 'SCO2', 'SMIM41'],
      dtype='object', name='symbol_2')

In [87]:

Copied!





pnet = pnx.Network(notebook=True, cdn_resources="remote", directed=True)
pnet.from_nx(G)
#first_node = list(G.nodes)[-1]
#pnet.get_node(first_node)['color'] = "red"
pnet.save_graph("../figures/pyvis/grn_c_5.html")
pnet = pnx.Network(notebook=True, cdn_resources="remote", directed=True)
pnet.from_nx(G)
#first_node = list(G.nodes)[-1]
#pnet.get_node(first_node)['color'] = "red"
pnet.save_graph("../figures/pyvis/grn_c_5.html")

In [44]:

Copied!

grn_c.var.loc['nan-82'] # par of the Glycoside Hydrolase Family
grn_c.var.loc['nan-82'] # par of the Glycoside Hydrolase Family

Out[44]:

uid                    4giyQrh7tkXI
symbol                       nan-82
ncbi_gene_ids                      
biotype              protein_coding
description           novel protein
synonyms                           
organism_id                       2
public_source_id                9.0
created_by_id                     1
mt                            False
ribo                          False
hb                            False
organism             NCBITaxon:9606
TFs                           False
ensembl_id          ENSG00000269590
centrality                      0.0
cluster_1.5                       5
Name: nan-82, dtype: object

In [35]:

Copied!





enr = gp.enrichr(gene_list=list(G.nodes),
                 gene_sets=['KEGG_2021_Human', 'MSigDB_Hallmark_2020', 'Reactome_2022', 'Tabula_Sapiens', 'WikiPathway_2023_Human', 'TF_Perturbations_Followed_by_Expression', 'Reactome', 'PPI_Hub_Proteins', 'OMIM_Disease', 'GO_Molecular_Function_2023'],
                 organism='Human', # change accordingly
                 #description='pathway',
                 #cutoff=0.08, # test dataset, use lower value for real case  
                 background=grn_c.var.symbol.tolist(),
)
enr.res2d.head(20)
enr = gp.enrichr(gene_list=list(G.nodes),
                 gene_sets=['KEGG_2021_Human', 'MSigDB_Hallmark_2020', 'Reactome_2022', 'Tabula_Sapiens', 'WikiPathway_2023_Human', 'TF_Perturbations_Followed_by_Expression', 'Reactome', 'PPI_Hub_Proteins', 'OMIM_Disease', 'GO_Molecular_Function_2023'],
                 organism='Human', # change accordingly
                 #description='pathway',
                 #cutoff=0.08, # test dataset, use lower value for real case  
                 background=grn_c.var.symbol.tolist(),
)
enr.res2d.head(20)

2024-07-25 18:33:16,565 [WARNING] Input library not found: Reactome. Skip

Out[35]:

	Gene_set	Term	P-value	Adjusted P-value	Odds Ratio	Combined Score	Genes
0	GO_Molecular_Function_2023	Death Receptor Binding (GO:0005123)	0.000416	0.028729	123.875000	964.236668	EDA;NTF3
1	GO_Molecular_Function_2023	Purine Ribonucleoside Triphosphate Binding (GO...	0.007050	0.073312	8.626100	42.739637	ABCF2;RIGI;HSPA1A
2	GO_Molecular_Function_2023	Receptor Ligand Activity (GO:0048018)	0.007732	0.073312	5.742222	27.920829	EDA;CLCF1;NTF3;HSPA1A
3	GO_Molecular_Function_2023	Activin Receptor Activity (GO:0017002)	0.008500	0.073312	inf	inf	ACVR1C
4	GO_Molecular_Function_2023	Gap Junction Hemi-Channel Activity (GO:0055077)	0.008500	0.073312	inf	inf	PANX1
5	GO_Molecular_Function_2023	NF-kappaB Binding (GO:0051059)	0.008500	0.073312	inf	inf	TAF4B
6	GO_Molecular_Function_2023	Hydrolase Activity, Hydrolyzing N-glycosyl Com...	0.008500	0.073312	inf	inf	BST1
7	GO_Molecular_Function_2023	Ubiquitin-Protein Transferase Activator Activi...	0.008500	0.073312	inf	inf	RIGI
8	GO_Molecular_Function_2023	DNA Binding, Bending (GO:0008301)	0.016930	0.083096	120.151515	490.060134	TOP2A
9	GO_Molecular_Function_2023	Disordered Domain Specific Binding (GO:0097718)	0.016930	0.083096	120.151515	490.060134	HSPA1A
10	GO_Molecular_Function_2023	UDP Phosphatase Activity (GO:0045134)	0.016930	0.083096	120.151515	490.060134	ENTPD3
11	GO_Molecular_Function_2023	GDP Phosphatase Activity (GO:0004382)	0.016930	0.083096	120.151515	490.060134	ENTPD3
12	GO_Molecular_Function_2023	Nucleoside Diphosphate Phosphatase Activity (G...	0.016930	0.083096	120.151515	490.060134	ENTPD3
13	GO_Molecular_Function_2023	ATP Binding (GO:0005524)	0.017220	0.083096	11.204545	45.509349	ABCF2;HSPA1A
14	GO_Molecular_Function_2023	Ubiquitin Protein Ligase Binding (GO:0031625)	0.020064	0.083096	10.265625	40.126707	RIGI;HSPA1A
15	GO_Molecular_Function_2023	Ubiquitin-Like Protein Ligase Binding (GO:0044...	0.021555	0.083096	9.852500	37.805688	RIGI;HSPA1A
16	GO_Molecular_Function_2023	Growth Factor Activity (GO:0008083)	0.023090	0.083096	9.471154	35.690549	CLCF1;NTF3
17	GO_Molecular_Function_2023	Adenyl Ribonucleotide Binding (GO:0032559)	0.024670	0.083096	9.118056	33.756535	ABCF2;HSPA1A
18	GO_Molecular_Function_2023	C3HC4-type RING Finger Domain Binding (GO:0055...	0.025290	0.083096	60.060606	220.863728	HSPA1A
19	GO_Molecular_Function_2023	Ubiquitin Binding (GO:0043130)	0.025290	0.083096	60.060606	220.863728	TOP2A

In [36]:

Copied!





enr.res2d.Term = enr.res2d.Term.str.split("(").str[0].str.split(',').str[0]
ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(),
             column="Adjusted P-value",
             #title='normal fibro PAGE4',
             cmap=plt.cm.viridis,
             size=4, # adjust dot size
             top_term=20,
             figsize=(4,10), cutoff=0.25)
enr.res2d.Term = enr.res2d.Term.str.split("(").str[0].str.split(',').str[0]
ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(),
             column="Adjusted P-value",
             #title='normal fibro PAGE4',
             cmap=plt.cm.viridis,
             size=4, # adjust dot size
             top_term=20,
             figsize=(4,10), cutoff=0.25)

/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: The 'method' keyword in Series.replace is deprecated and will be removed in a future version.
  df[self.colname].replace(
/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df[self.colname].replace(

similar story to cluster above. this time with PTN often found in the litterature with 6 other m Studies on the role of STEAP1, OR51E2, PTN, ABCC4, NDRG2 have been relatively in-depth. For instance, STEAP1 is all-called six-transmembrane epithelial antigen of the prostate 1, which belongs to a family of metalloproteinases involved in iron and copper homeostasis and other cellular processes

has been listed with 8 other members as predictors of CAF-related gene prognostic index (CRGPI) in prostate cancer. https://www.nature.com/articles/s41598-023-36125-0#Sec11

(NDRG2, TSPAN1, PTN, APOE, OR51E2, P4HB, STEAP1 and ABCC4 were used to construct molecular subtypes and)

https://www.genecards.org/cgi-bin/carddisp.pl?gene=GSN

3.2 Normal version¶

In [9]:

Copied!

grn.var['cluster_1.5'].value_counts().head(10)
grn.var['cluster_1.5'].value_counts().head(10)

Out[9]:

cluster_1.5
0     1620
1     1483
2      732
3      111
4       23
5        9
6        4
16       1
23       1
22       1
Name: count, dtype: int64

https://www.genecards.org/cgi-bin/carddisp.pl?gene=MYOC

In [37]:

Copied!

G = grn.plot_subgraph(grn.var[grn.var['cluster_1.5']=='3'].index.tolist(), only=200, interactive=False)
G = grn.plot_subgraph(grn.var[grn.var['cluster_1.5']=='3'].index.tolist(), only=200, interactive=False)

['#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4'] Index(['TMEM132A', 'ADGRA2', 'AGPAT4', 'THAP3', 'DNAJC25', 'TRAM2', 'NTN1',
       'STAG3', 'COL4A4', 'HSD17B14',
       ...
       'nan-75', 'TRABD2B', 'nan-77', 'PCGF2', 'nan-107', 'nan-182', 'nan-192',
       'nan-195', 'nan-239', 'nan-259'],
      dtype='object', name='symbol_2', length=111)

In [26]:

Copied!





pnet = pnx.Network(notebook=True, cdn_resources="remote", directed=True)
pnet.from_nx(G)
#first_node = list(G.nodes)[-1]
#pnet.get_node(first_node)['color'] = "red"
pnet.save_graph("../figures/pyvis/grn_3.html")
pnet = pnx.Network(notebook=True, cdn_resources="remote", directed=True)
pnet.from_nx(G)
#first_node = list(G.nodes)[-1]
#pnet.get_node(first_node)['color'] = "red"
pnet.save_graph("../figures/pyvis/grn_3.html")

In [27]:

Copied!

grn.var.loc['nan-259'] # par of the Glycoside Hydrolase Family
grn.var.loc['nan-259'] # par of the Glycoside Hydrolase Family

Out[27]:

uid                    62JAXxQNqtOd
symbol                      nan-259
ncbi_gene_ids                 83737
biotype              protein_coding
description           novel protein
synonyms                           
organism_id                       2
public_source_id                9.0
created_by_id                     1
mt                            False
ribo                          False
hb                            False
organism             NCBITaxon:9606
TFs                           False
ensembl_id          ENSG00000289720
centrality                      0.0
cluster_1.5                       3
Name: nan-259, dtype: object

In [12]:

Copied!

grn.get(grn.var[grn.var['cluster_1.5']=='3'].index.tolist()).grn.sum(0).sort_values(ascending=False).head(3)##.grn # selenom
grn.get(grn.var[grn.var['cluster_1.5']=='3'].index.tolist()).grn.sum(0).sort_values(ascending=False).head(3)##.grn # selenom

Out[12]:

symbol
RNASEK     0.942102
SELENOM    0.292933
nan-259    0.256140
dtype: float32

RNASEK acidifying the internal vesicle important in endocytosis# https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10547866/ RNA phosphodiester bond hydrolysis --> hydrolisis pathway in the model and RNA transcription regulation

In [38]:

Copied!





# Assuming 'node_names' contains the list of gene names
enr = gp.enrichr(gene_list=list(G.nodes),
                 gene_sets=['KEGG_2021_Human', 'MSigDB_Hallmark_2020', 'Reactome_2022', 'Tabula_Sapiens', 'WikiPathway_2023_Human', 'TF_Perturbations_Followed_by_Expression', 'Reactome', 'PPI_Hub_Proteins', 'OMIM_Disease', 'GO_Molecular_Function_2023'],
                 organism='Human', # change accordingly
                 #description='pathway',
                 #cutoff=0.08, # test dataset, use lower value for real case  
                 background=grn.var.symbol.tolist(),
)
enr.res2d.head(20)
# Assuming 'node_names' contains the list of gene names
enr = gp.enrichr(gene_list=list(G.nodes),
                 gene_sets=['KEGG_2021_Human', 'MSigDB_Hallmark_2020', 'Reactome_2022', 'Tabula_Sapiens', 'WikiPathway_2023_Human', 'TF_Perturbations_Followed_by_Expression', 'Reactome', 'PPI_Hub_Proteins', 'OMIM_Disease', 'GO_Molecular_Function_2023'],
                 organism='Human', # change accordingly
                 #description='pathway',
                 #cutoff=0.08, # test dataset, use lower value for real case  
                 background=grn.var.symbol.tolist(),
)
enr.res2d.head(20)

2024-07-25 18:33:49,291 [WARNING] Input library not found: Reactome. Skip
/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/enrichr.py:643: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  self.results = pd.concat(self.results, ignore_index=True)

Out[38]:

	Gene_set	Term	P-value	Adjusted P-value	Odds Ratio	Combined Score	Genes
0	GO_Molecular_Function_2023	Sulfotransferase Activity (GO:0008146)	0.007252	0.123369	20.138462	99.212171	CHST7;SULT1C4
1	GO_Molecular_Function_2023	RNA Polymerase II Transcription Regulatory Reg...	0.007713	0.123369	3.386005	16.472449	ZNF483;HAND2;NTN1;NFATC4;GLIS2;PURG;ZNF286A
2	GO_Molecular_Function_2023	ATPase Binding (GO:0051117)	0.013790	0.123369	13.415385	57.468682	SLC2A13;NKD2
3	GO_Molecular_Function_2023	Chondroitin Sulfotransferase Activity (GO:0034...	0.016750	0.123369	inf	inf	CHST7
4	GO_Molecular_Function_2023	Peptidyl-Proline 4-Dioxygenase Activity (GO:00...	0.016750	0.123369	inf	inf	P4HA3
5	GO_Molecular_Function_2023	Aryl Sulfotransferase Activity (GO:0004062)	0.016750	0.123369	inf	inf	SULT1C4
6	GO_Molecular_Function_2023	Histone Reader Activity (GO:0140566)	0.016750	0.123369	inf	inf	TONSL
7	GO_Molecular_Function_2023	Hydrolase Activity, Hydrolyzing N-glycosyl Com...	0.016750	0.123369	inf	inf	BST1
8	GO_Molecular_Function_2023	Tat Protein Binding (GO:0030957)	0.016750	0.123369	inf	inf	DLL1
9	GO_Molecular_Function_2023	WW Domain Binding (GO:0050699)	0.016750	0.123369	inf	inf	ENTREP3
10	GO_Molecular_Function_2023	RNA Polymerase II Cis-Regulatory Region Sequen...	0.016754	0.123369	3.180050	13.003634	ZNF483;HAND2;NTN1;NFATC4;GLIS2;ZNF286A
11	GO_Molecular_Function_2023	DNA-binding Transcription Activator Activity, ...	0.026719	0.134555	5.074219	18.380719	HAND2;INSL3;GLIS2
12	GO_Molecular_Function_2023	Procollagen-Proline Dioxygenase Activity (GO:0...	0.033223	0.134555	59.575758	202.825781	P4HA3
13	GO_Molecular_Function_2023	N-acetylglucosamine 6-O-sulfotransferase Activ...	0.033223	0.134555	59.575758	202.825781	CHST7
14	GO_Molecular_Function_2023	ABC-type Xenobiotic Transporter Activity (GO:0...	0.033223	0.134555	59.575758	202.825781	ABCC1
15	GO_Molecular_Function_2023	alpha-N-acetylgalactosaminide Alpha-2,6-Sialyl...	0.033223	0.134555	59.575758	202.825781	ST6GALNAC6
16	GO_Molecular_Function_2023	miRNA Binding (GO:0035198)	0.033223	0.134555	59.575758	202.825781	DND1
17	GO_Molecular_Function_2023	Myosin Heavy Chain Binding (GO:0032036)	0.033223	0.134555	59.575758	202.825781	NKD2
18	GO_Molecular_Function_2023	Estradiol 17-Beta-Dehydrogenase [NAD(P)] Activ...	0.033223	0.134555	59.575758	202.825781	HSD17B14
19	GO_Molecular_Function_2023	Solute:Proton Symporter Activity (GO:0015295)	0.033223	0.134555	59.575758	202.825781	SLC2A13

In [39]:

Copied!





enr.res2d.Term = enr.res2d.Term.str.split("(").str[0].str.split(',').str[0].str[:50]
ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(),
             column="Adjusted P-value",
             #title='normal fibro PAGE4',
             cmap=plt.cm.viridis,
             size=4, # adjust dot size
            top_term=20,
            figsize=(4,10), cutoff=0.25)
enr.res2d.Term = enr.res2d.Term.str.split("(").str[0].str.split(',').str[0].str[:50]
ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(),
             column="Adjusted P-value",
             #title='normal fibro PAGE4',
             cmap=plt.cm.viridis,
             size=4, # adjust dot size
            top_term=20,
            figsize=(4,10), cutoff=0.25)

/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: The 'method' keyword in Series.replace is deprecated and will be removed in a future version.
  df[self.colname].replace(
/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df[self.colname].replace(

https://www.genecards.org/cgi-bin/carddisp.pl?gene=PTN

In [40]:

Copied!

G = grn.plot_subgraph(grn.var[grn.var['cluster_1.5']=='4'].index.tolist(), only=80, interactive=False)#
G = grn.plot_subgraph(grn.var[grn.var['cluster_1.5']=='4'].index.tolist(), only=80, interactive=False)#

['#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4'] Index(['TRHDE', 'PGR', 'ESR1', 'RASD2', 'SYNDIG1', 'OLFM2', 'FZD10', 'BHMT2',
       'DUSP26', 'FOXF2', 'ADAM33', 'ADAMTS12', 'HS3ST3A1', 'CHODL', 'FTCD',
       'FNDC1', 'HSPB3', 'FOXL1', 'AGMO', 'ADAMTSL2', 'S100A6', 'AKR1B10',
       'SCGB1C2'],
      dtype='object', name='symbol_2')

In [29]:

Copied!





pnet = pnx.Network(notebook=True, cdn_resources="remote", directed=True)
pnet.from_nx(G)
#first_node = list(G.nodes)[-1]
#pnet.get_node(first_node)['color'] = "red"
pnet.save_graph("../figures/pyvis/grn_4.html")
pnet = pnx.Network(notebook=True, cdn_resources="remote", directed=True)
pnet.from_nx(G)
#first_node = list(G.nodes)[-1]
#pnet.get_node(first_node)['color'] = "red"
pnet.save_graph("../figures/pyvis/grn_4.html")

In [41]:

Copied!





enr = gp.enrichr(gene_list=list(G.nodes),
                 gene_sets=['GO_Molecular_Function_2023', 'MSigDB_Hallmark_2020', 'Reactome_2022', 'Tabula_Sapiens', 'WikiPathway_2023_Human', 'TF_Perturbations_Followed_by_Expression', 'Reactome', 'PPI_Hub_Proteins', 'OMIM_Disease', 'GO_Molecular_Function_2023'],
                 organism='Human', # change accordingly
                 #description='pathway',
                 #cutoff=0.08, # test dataset, use lower value for real case  
                 background=grn.var.symbol.tolist(),
)
enr.res2d.head(20)
enr = gp.enrichr(gene_list=list(G.nodes),
                 gene_sets=['GO_Molecular_Function_2023', 'MSigDB_Hallmark_2020', 'Reactome_2022', 'Tabula_Sapiens', 'WikiPathway_2023_Human', 'TF_Perturbations_Followed_by_Expression', 'Reactome', 'PPI_Hub_Proteins', 'OMIM_Disease', 'GO_Molecular_Function_2023'],
                 organism='Human', # change accordingly
                 #description='pathway',
                 #cutoff=0.08, # test dataset, use lower value for real case  
                 background=grn.var.symbol.tolist(),
)
enr.res2d.head(20)

2024-07-25 18:34:27,578 [WARNING] Input library not found: Reactome. Skip

Out[41]:

	Gene_set	Term	P-value	Adjusted P-value	Odds Ratio	Combined Score	Genes
0	GO_Molecular_Function_2023	RNA Polymerase II General Transcription Initia...	0.000032	0.000933	inf	inf	FOXF2;ESR1
1	GO_Molecular_Function_2023	Estrogen Response Element Binding (GO:0034056)	0.000032	0.000933	inf	inf	PGR;ESR1
2	GO_Molecular_Function_2023	General Transcription Initiation Factor Bindin...	0.000095	0.001860	378.666667	3508.819727	FOXF2;ESR1
3	GO_Molecular_Function_2023	Transcription Coactivator Binding (GO:0001223)	0.000188	0.002780	189.285714	1623.428489	PGR;ESR1
4	GO_Molecular_Function_2023	Transition Metal Ion Binding (GO:0046914)	0.000446	0.005263	13.515099	104.272240	BHMT2;AGMO;ADAM33;TRHDE
5	GO_Molecular_Function_2023	Metalloendopeptidase Activity (GO:0004222)	0.000551	0.005414	22.794231	171.058857	ADAMTSL2;ADAM33;ADAMTS12
6	GO_Molecular_Function_2023	Metallopeptidase Activity (GO:0008237)	0.000672	0.005666	21.155357	154.536591	ADAMTSL2;ADAM33;ADAMTS12
7	GO_Molecular_Function_2023	Transcription Coregulator Binding (GO:0001221)	0.001111	0.008195	54.013605	367.419603	PGR;ESR1
8	GO_Molecular_Function_2023	DNA-binding Transcription Activator Activity, ...	0.001429	0.009365	15.972973	104.640056	FOXF2;PGR;ESR1
9	GO_Molecular_Function_2023	ATPase Binding (GO:0051117)	0.002016	0.011893	37.780952	234.495632	PGR;ESR1
10	GO_Molecular_Function_2023	Zinc Ion Binding (GO:0008270)	0.002284	0.012250	13.407955	81.545835	BHMT2;ADAM33;TRHDE
11	GO_Molecular_Function_2023	Cis-Regulatory Region Sequence-Specific DNA Bi...	0.004517	0.021203	6.945569	37.504924	FOXF2;PGR;FOXL1;ESR1
12	GO_Molecular_Function_2023	RNA Polymerase II Cis-Regulatory Region Sequen...	0.005523	0.021203	6.541596	34.009122	FOXF2;PGR;FOXL1;ESR1
13	GO_Molecular_Function_2023	DNA Binding, Bending (GO:0008301)	0.005750	0.021203	inf	inf	FOXL1
14	GO_Molecular_Function_2023	TBP-class Protein Binding (GO:0017025)	0.005750	0.021203	inf	inf	ESR1
15	GO_Molecular_Function_2023	[Heparan Sulfate]-Glucosamine 3-Sulfotransfera...	0.005750	0.021203	inf	inf	HS3ST3A1
16	GO_Molecular_Function_2023	RNA Polymerase II Transcription Regulatory Reg...	0.008164	0.028333	5.812950	27.949028	FOXF2;PGR;FOXL1;ESR1
17	GO_Molecular_Function_2023	Endopeptidase Activity (GO:0004175)	0.008882	0.028593	8.021918	37.893119	ADAMTSL2;ADAM33;ADAMTS12
18	GO_Molecular_Function_2023	DNA Binding (GO:0003677)	0.009208	0.028593	7.911486	37.086673	FOXF2;PGR;FOXL1
19	GO_Molecular_Function_2023	Heparan Sulfate Sulfotransferase Activity (GO:...	0.011468	0.030756	180.727273	807.520606	HS3ST3A1

In [42]:

Copied!





enr.res2d.Term = enr.res2d.Term.str.split("(").str[0].str.split(',').str[0].str[:50]
ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(),
             column="Adjusted P-value",
             #title='normal fibro PAGE4',
             cmap=plt.cm.viridis,
             size=4, # adjust dot size
                top_term=20,
                figsize=(4,10), cutoff=0.25)
enr.res2d.Term = enr.res2d.Term.str.split("(").str[0].str.split(',').str[0].str[:50]
ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(),
             column="Adjusted P-value",
             #title='normal fibro PAGE4',
             cmap=plt.cm.viridis,
             size=4, # adjust dot size
                top_term=20,
                figsize=(4,10), cutoff=0.25)

/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: The 'method' keyword in Series.replace is deprecated and will be removed in a future version.
  df[self.colname].replace(
/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df[self.colname].replace(

4. Differential Gene - Gene connections¶

look at the connections of some key TFs in here
do GSEA over these connections
compare the connections of these same TFs for both conditions

In [5]:

Copied!

G = grn_c.plot_subgraph("PAGE4", only=100, max_genes=40, interactive=False)#
G = grn_c.plot_subgraph("PAGE4", only=100, max_genes=40, interactive=False)#

['#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#ff7f0e'] Index(['MT2A', 'HSPA1A', 'CREM', 'PTN', 'HLA-A', 'SPARCL1', 'SPOCK3', 'NR4A3',
       'APOD', 'CALD1', 'MGP', 'HSPE1', 'EIF4A1', 'CD99', 'EMP1', 'FABP4',
       'NAMPT', 'NNMT', 'RBP1', 'LUM', 'IGFBP7', 'THBS1', 'YBX3', 'NAF1',
       'DCN', 'A2M', 'CPE', 'C1R', 'ATP6V0C', 'MATN2', 'COL6A2', 'UBE2S',
       'CST3', 'RHOQ', 'UBE2C', 'C1S', 'H2AZ1', 'GPRC5A', 'HLA-C', 'LGALS1',
       'PAGE4'],
      dtype='object', name='symbol_2')

In [96]:

Copied!





pnet = pnx.Network(notebook=True, cdn_resources="remote", directed=True)
pnet.from_nx(G)
first_node = list(G.nodes)[-1]
pnet.get_node(first_node)['color'] = "red"
pnet.save_graph("../figures/pyvis/grn_c_PAGE4.html")
pnet = pnx.Network(notebook=True, cdn_resources="remote", directed=True)
pnet.from_nx(G)
first_node = list(G.nodes)[-1]
pnet.get_node(first_node)['color'] = "red"
pnet.save_graph("../figures/pyvis/grn_c_PAGE4.html")

In [22]:

Copied!





# Assuming 'node_names' contains the list of gene names
enr = gp.enrichr(gene_list=list(G.nodes),
                 gene_sets=[
                #'KEGG_2021_Human', 
                 #'MSigDB_Hallmark_2020',
                 #'Reactome_2022', 
                 #'Tabula_Sapiens', 
                 'WikiPathway_2023_Human', #'TF_Perturbations_Followed_by_Expression', 
                 #'PPI_Hub_Proteins', 
                 #'OMIM_Disease', 
                 #'GO_Molecular_Function_2023'
                 ],
                 organism='Human', # change accordingly
                 #description='pathway',
                # cutoff=0.02, # test dataset, use lower value for real case  
                 background=grn.var.symbol.tolist(),
)
enr.res2d.head(10)
enr.res2d.Term = enr.res2d.Term.str.split("(").str[0].str.split('R-HSA').str[0].str.split(' WP').str[0].str[:50]
ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(),
             column="Adjusted P-value",
             title='normal fibro PAGE4',
             cmap=plt.cm.viridis,
             size=4, # adjust dot size
             top_term=20,
             figsize=(4,10), cutoff=0.25)
# Assuming 'node_names' contains the list of gene names
enr = gp.enrichr(gene_list=list(G.nodes),
                 gene_sets=[
                #'KEGG_2021_Human', 
                 #'MSigDB_Hallmark_2020',
                 #'Reactome_2022', 
                 #'Tabula_Sapiens', 
                 'WikiPathway_2023_Human', #'TF_Perturbations_Followed_by_Expression', 
                 #'PPI_Hub_Proteins', 
                 #'OMIM_Disease', 
                 #'GO_Molecular_Function_2023'
                 ],
                 organism='Human', # change accordingly
                 #description='pathway',
                # cutoff=0.02, # test dataset, use lower value for real case  
                 background=grn.var.symbol.tolist(),
)
enr.res2d.head(10)
enr.res2d.Term = enr.res2d.Term.str.split("(").str[0].str.split('R-HSA').str[0].str.split(' WP').str[0].str[:50]
ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(),
             column="Adjusted P-value",
             title='normal fibro PAGE4',
             cmap=plt.cm.viridis,
             size=4, # adjust dot size
             top_term=20,
             figsize=(4,10), cutoff=0.25)

/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: The 'method' keyword in Series.replace is deprecated and will be removed in a future version.
  df[self.colname].replace(
/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df[self.colname].replace(

In [139]:

Copied!





enr.res2d.Term = enr.res2d.Term.str.split("(").str[0]
ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(),
             column="Adjusted P-value",
             title='normal fibro PAGE4',
             cmap=plt.cm.viridis,
             size=4, # adjust dot size
             figsize=(4,5), cutoff=0.25)
enr.res2d.Term = enr.res2d.Term.str.split("(").str[0]
ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(),
             column="Adjusted P-value",
             title='normal fibro PAGE4',
             cmap=plt.cm.viridis,
             size=4, # adjust dot size
             figsize=(4,5), cutoff=0.25)

/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: The 'method' keyword in Series.replace is deprecated and will be removed in a future version.
  df[self.colname].replace(
/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df[self.colname].replace(

In [17]:

Copied!

G = grn.plot_subgraph("PAGE4", only=100, max_genes=40, interactive=False)
G = grn.plot_subgraph("PAGE4", only=100, max_genes=40, interactive=False)

['#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#ff7f0e'] Index(['nan-270', 'S100A6', 'MT2A', 'SPARCL1', 'MIF', 'SLC25A6', 'IGFBP7',
       'APOD', 'MGP', 'DCN', 'A2M', 'PTN', 'RELB', 'DNAJB9', 'AKR1C1', 'C1S',
       'CST3', 'CXCL8', 'FTH1', 'BRME1', 'FAM205A', 'TIMP1', 'YBX3', 'NPPB',
       'TGIF2-RAB5IF', 'FBLN1', 'MTDH', 'IGFBP5', 'TFPI', 'MAP1B', 'COL6A2',
       'HNRNPA0', 'EIF4A1', 'CCL21', 'FTL', 'CETN1', 'NME2', 'LSAMP', 'ACR',
       'ATP6V0C', 'PAGE4'],
      dtype='object', name='symbol_2')

In [98]:

Copied!





pnet = pnx.Network(notebook=True, cdn_resources="remote", directed=True)
pnet.from_nx(G)
first_node = list(G.nodes)[-1]
pnet.get_node(first_node)['color'] = "red"
pnet.save_graph("../figures/pyvis/grn_PAGE4.html")
pnet = pnx.Network(notebook=True, cdn_resources="remote", directed=True)
pnet.from_nx(G)
first_node = list(G.nodes)[-1]
pnet.get_node(first_node)['color'] = "red"
pnet.save_graph("../figures/pyvis/grn_PAGE4.html")

In [24]:

Copied!





# Assuming 'node_names' contains the list of gene names
enr = gp.enrichr(gene_list=list(G.nodes),
                 gene_sets=['KEGG_2021_Human', 
                 #'MSigDB_Hallmark_2020',
                 #'Reactome_2022',
                 #'Tabula_Sapiens',
                 #'WikiPathway_2023_Human',
                 # 'TF_Perturbations_Followed_by_Expression', 
                 # 'Reactome', 
                 #'PPI_Hub_Proteins', 
                 #'OMIM_Disease',
                 'GO_Molecular_Function_2023'
                 ],
                 organism='Human', # change accordingly
                 #description='pathway',
                # cutoff=0.02, # test dataset, use lower value for real case  
                 background=grn.var.symbol.tolist(),
)
enr.res2d.head(20)
# Assuming 'node_names' contains the list of gene names
enr = gp.enrichr(gene_list=list(G.nodes),
                 gene_sets=['KEGG_2021_Human', 
                 #'MSigDB_Hallmark_2020',
                 #'Reactome_2022',
                 #'Tabula_Sapiens',
                 #'WikiPathway_2023_Human',
                 # 'TF_Perturbations_Followed_by_Expression', 
                 # 'Reactome', 
                 #'PPI_Hub_Proteins', 
                 #'OMIM_Disease',
                 'GO_Molecular_Function_2023'
                 ],
                 organism='Human', # change accordingly
                 #description='pathway',
                # cutoff=0.02, # test dataset, use lower value for real case  
                 background=grn.var.symbol.tolist(),
)
enr.res2d.head(20)

Out[24]:

	Gene_set	Term	P-value	Adjusted P-value	Odds Ratio	Combined Score	Genes
0	GO_Molecular_Function_2023	Peptidase Inhibitor Activity (GO:0030414)	0.000020	0.001566	104.105263	1128.623016	CST3;TIMP1;TFPI
1	GO_Molecular_Function_2023	Transition Metal Ion Binding (GO:0046914)	0.000463	0.016191	9.025463	69.301557	ACR;MT2A;FTH1;TIMP1;FTL
2	GO_Molecular_Function_2023	Double-Stranded RNA Binding (GO:0003725)	0.000607	0.016191	101.461538	751.496153	EIF4A1;MTDH
3	GO_Molecular_Function_2023	Endopeptidase Inhibitor Activity (GO:0004866)	0.001010	0.019977	18.306502	126.278211	CST3;TIMP1;TFPI
4	GO_Molecular_Function_2023	Endopeptidase Regulator Activity (GO:0061135)	0.001498	0.019977	50.705128	329.757829	CST3;TFPI
5	GO_Molecular_Function_2023	Ferrous Iron Binding (GO:0008198)	0.001498	0.019977	50.705128	329.757829	FTH1;FTL
6	GO_Molecular_Function_2023	Protease Binding (GO:0002020)	0.002465	0.027606	12.944079	77.736001	CST3;ACR;TIMP1
7	GO_Molecular_Function_2023	Iron Ion Binding (GO:0005506)	0.002761	0.027606	33.786325	199.079000	FTH1;FTL
8	GO_Molecular_Function_2023	Calcium Ion Binding (GO:0005509)	0.003252	0.028910	7.400664	42.393777	CETN1;S100A6;SPARCL1;FBLN1
9	GO_Molecular_Function_2023	Kinase Binding (GO:0019900)	0.005211	0.038377	9.688322	50.930509	PTN;RELB;HNRNPA0
10	GO_Molecular_Function_2023	RNA Binding (GO:0003723)	0.005277	0.038377	5.000000	26.222182	EIF4A1;YBX3;DCN;MTDH;HNRNPA0
11	GO_Molecular_Function_2023	Chemokine Activity (GO:0008009)	0.007445	0.039047	18.405594	90.191859	CXCL8;CCL21
12	GO_Molecular_Function_2023	Chemokine Receptor Binding (GO:0042379)	0.007445	0.039047	18.405594	90.191859	CXCL8;CCL21
13	GO_Molecular_Function_2023	Metal Ion Binding (GO:0046872)	0.008569	0.039047	5.523471	26.289761	CETN1;S100A6;SPARCL1;FBLN1
14	GO_Molecular_Function_2023	Protein Kinase Binding (GO:0019901)	0.009285	0.039047	7.734868	36.193807	PTN;RELB;HNRNPA0
15	GO_Molecular_Function_2023	mRNA 3'-UTR Binding (GO:0003730)	0.009893	0.039047	15.566075	71.852090	YBX3;HNRNPA0
16	GO_Molecular_Function_2023	NF-kappaB Binding (GO:0051059)	0.010250	0.039047	inf	inf	MTDH
17	GO_Molecular_Function_2023	Phosphatase Inhibitor Activity (GO:0019212)	0.010250	0.039047	inf	inf	PTN
18	GO_Molecular_Function_2023	Phosphotransferase Activity, Phosphate Group A...	0.010250	0.039047	inf	inf	NME2
19	GO_Molecular_Function_2023	Nucleobase-Containing Compound Kinase Activity...	0.010250	0.039047	inf	inf	NME2

In [30]:

Copied!





enr.res2d.Term = enr.res2d.Term.str.split("(").str[0]
ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(),
             column="Adjusted P-value",
             title='normal fibro PAGE4',
             cmap=plt.cm.viridis,
             size=4, # adjust dot size
             figsize=(4,5), cutoff=0.25)
enr.res2d.Term = enr.res2d.Term.str.split("(").str[0]
ax = dotplot(enr.res2d.replace([np.inf, -np.inf], 0).dropna(),
             column="Adjusted P-value",
             title='normal fibro PAGE4',
             cmap=plt.cm.viridis,
             size=4, # adjust dot size
             figsize=(4,5), cutoff=0.25)

/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: The 'method' keyword in Series.replace is deprecated and will be removed in a future version.
  df[self.colname].replace(
/home/ml4ig1/miniconda3/envs/scprint/lib/python3.10/site-packages/gseapy/plot.py:689: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df[self.colname].replace(

5. Using classification¶

Here we used the network given by the mean of all heads in our scPRINT model. We could also use other scPRINT versions to see if the results are consistent. but we could also use the classification mechanism presented in our manuscript and other notebooks to get to possibly more specialized networks.

In [ ]:

Copied!

grn_normal = grn_inferer(layer=list(range(model.nlayers))[:], cell_type="naive B cell")
grn_normal, m, omni_cls = train_classifier(grn_normal, C=0.5, train_size=0.9, class_weight={
                                    1: 200, 0: 1}, shuffle=True)
grn_normal = grn_inferer(layer=list(range(model.nlayers))[:], cell_type="naive B cell")
grn_normal, m, omni_cls = train_classifier(grn_normal, C=0.5, train_size=0.9, class_weight={
                                    1: 200, 0: 1}, shuffle=True)

number of expressed genes in this cell type: 16801

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

Predicting: 0it [00:00, ?it/s]

true elem 2115 ...
doing regression....
metrics {'used_heads': 25, 'precision': 0.0, 'random_precision': 0.0006433550820644942, 'recall': 0.0, 'predicted_true': 177.0, 'number_of_true': 219.0, 'epr': 0.0}

In [ ]:

Copied!

# highlight differential links on genes that are expressed in both 
grn_normal.varp['all'] = grn_normal.varp['GRN'].copy()
grn_normal.varp['GRN'] = grn_normal.varp['classified']
# highlight differential links on genes that are expressed in both 
grn_normal.varp['all'] = grn_normal.varp['GRN'].copy()
grn_normal.varp['GRN'] = grn_normal.varp['classified']

Notes on findings and their relation to the literature¶

CAV1-expressing CAFs contribute to invasion and metastasis in breast cancer24 loss of caveolin1 (CAV1) is found in metabolically reprogrammed CAFs that promote tumorigenesis

discoidin domain-containing receptor 2 (DDR2)21 and integrin α11β122, have emerged to identify CAFs in the context of a specific TME https://www.nature.com/articles/s12276-023-01013-0

normal fibroblast populations marked by the expression of Gli1 and Hoxb6

CAFs, remain perpetually activated with a high capacity for ECM synthesis and microenvironmental remodeling, leading to stromal desmoplasia, a phenomenon characterized by increased deposition of ECM components in tumors. In this regard, CAFs share many basic characteristics, such as a secretory phenotype and capacity to synthesize ECM components, with fibroblasts found in nonmalignant tissue fibrosis. Therefore, the classic markers found to be expressed in fibroblasts, including α-SMA, vimentin, desmin, fibroblast-specific protein 1 (FSP1; also known as S100A4)

Fibrogenesis is part of a normal protective response to tissue injury that can become irreversible and progressive, leading to fatal diseases. Senescent cells are a main driver of fibrotic diseases through their secretome, known as senescence-associated secretory phenotype (SASP) CAFs (vCAFs), cycling CAFs (cCAFs), and developmental CAFs (dCAFs)

The stem-like ‘universal’ type of fibroblast cell, marked by expression of peptidase inhibitor 16 (Pi16) and Col15a, found in the steady state across tissues, activated fibroblasts, as observed by the development of LRRC15+ CAFs in PDAC.

. SOCS1: Suppressor of cytokine signaling; STAT3: Signal transducer and activator of transcription 3; LDH: Lactate dehydrogenase; PYCR1: pyrroline-5-carboxylate reductase

senescence, and multiple types of fibrotic diseases in mice and humans are characterized by the accumulation of iron. https://www.nature.com/articles/s42255-023-00928-2

Current evidence indicate that cancer-associated fibroblasts (CAFs) play an important role in prostate cancer (PCa) development and progression. playing a role in invasion.

NDRG2, TSPAN1, PTN, APOE, OR51E2, P4HB, STEAP1 and ABCC4 were used to construct molecular subtypes and CAF-related gene prognostic index (CRGPI)

Studies on the role of STEAP1, OR51E2, PTN, ABCC4, NDRG2 have been relatively in-depth. For instance, STEAP1 is all-called six-transmembrane epithelial antigen of the prostate 1, which belongs to a family of metalloproteinases involved in iron and copper homeostasis and other cellular processes37. STEAP1 is overexpressed on the plasma membrane of PCa cells and is associated with PCa invasiveness and metastasis50. The possible mechanism of STEAP1 promoting tumor proliferation and metastasis is by acting as a channel for small molecules that are involved in intercellular communication OR51E2 perhaps could be used as one of the biomarkers for PCa53,54,55,56 https://www.nature.com/articles/s41598-023-36125-0#Sec11

The protein encoded by ABCC4 is a member of the superfamily of ATP-binding cassette (ABC) transporters, which is also called multidrug resistance protein 4 (MRP4)64. MRP4 was reported to be associated with drug resistance of PCa in many studies65,66,67

PRRX1, OSR1, FOXD1, HOXC4,LHX9, TBX3 and TWIST2, targeted five or more TFs inthe fibroblast TRN. This influential-set explained 62.5% ofthe total number of regulatory edges and they collectivelytargeted 15 out of 18 TFs represented in the network. Upona closer examination of the fibroblastic network, the inter-actions among OSR1–PRRX1–TWIST2 were notably co-ordinated, regulating one another in both directions https://www.researchgate.net/publication/264633205_A_transient_disruption_of_fibroblastic_transcriptional_regulatory_network_facilitates_trans-differentiation

FGF receptors are important in PCa initiation and progression. These endocrine FGFs require alpha-Klotho (KL) and/or beta-Klotho (KLB), two related single-pass transmembrane proteins restricted in their tissue distribution. FGF19 is expressed in primary and metastatic PCa tissues where it functions as an autocrine growth factor. https://www.e-cancer.fr/Professionnels-de-sante/Veille-bibliographique/Nota-Bene-Cancer/NBC-174/The-endocrine-fibroblast-growth-factor-FGF19-promotes-prostate-cancer-progression

• Prostate cancer stroma contains diverse populations of fibroblasts. • Carcinoma associated fibroblasts can promote prostate carcinogenesis. • Interaction among fibroblast subsets modulate stromal-epithelial paracrine signaling. The stroma in the normal prostate is predominantly composed of smooth muscle cells while in cancer this is partially replaced by fibroblasts and myofibroblasts6. These fibroblastic cells are responsible for collagen and extracellular matrix (ECM) deposition, changing local organ stiffness

iCAF, myCAF, apCAF

It has been shown that SDF-1/CXCL12, secreted by CAF, acts via TGFβ–mediated upregulation of CXCR4 in epithelial cells to activate Akt signaling associated with tumorigenesis. Overexpression of SDF-1/CXCL12 and TGFβ1 in benign human prostate fibroblasts has been shown to induce malignant transformation of benign prostate epithelial cells and formation of highly invasive tumors in vivo

increased cadherin 2 (CDH2) mRNA expression in LNCaP cells

Upregulation of DNA methyltransferases (DNMT) consistent with aberrant promoter hypermethylation of tumor-suppressor genes in multiple cancer types including PCa has been reported

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8788937/#:~:text=The%20stroma%20in%20the%20normal,deposition%2C%20changing%20local%20organ%20stiffness.

CD10 and GPR77 can define a human CAF subset that sustains cancer stemness

Fibroblasts are also fundamental for inducing angiogenesis by secreting angiogenic factors such as vascular endothelial growth factor (VEGF) and matrix proteins Epithelial cells release TGFβ ligands, Kallikrein-related peptidase-4 (KLK4), CAFs are characterized by molecular markers that are upregulated compared to normal fibroblasts, such as fibroblast activation protein (FAP), PDGFR-β, fibroblast-specific protein-1 (FSP-1) (also known as S100A4)

ACTA2 α-smooth actin (α-SMA) Upregulated RT-qPCR, IHC on tissue microarrays [42,43] ASPN Asporin Upregulated Tag-based RNA profiling, microarray profiling, IHC, RT-qPCR [38,44,45] CAV1 Caveolin-1 Downregulated Tag profiling, IHC, RT-qPCR [38] COL1A1 Collagen Type-I Upregulated RT-qPCR, IHC [42,43] CXCL12 Stromal cell-derived factor 1 (SDF1)/ (C-X-C motif chemokine ligand 12 (CXCL12) Upregulated Tag-profiling, RT-qPCR, ELISA [38,46] FAP Fibroblast activation protein Upregulated IHC [42] FGF2, FGF7, FGF10 Fibroblast growth factor-2/-7/-10 Upregulated RT-qPCR, Western Blot, IHC [43,47,48] FN1 Fibronectin Upregulated Tag profiling, IHC, RT-qPCR [38] ITGA1 Integrin-α1 (CD49a) Upregulated IHC [49] OGN Osteoglycin Upregulated Tag-profiling, IHC, RT-qPCR [38] PDGFRB Platelet-derived growth factor receptor β Upregulated Microarray profiling [44,50] POSTN Periostin Upregulated Microarray profiling, IHC [44] S100A4 Fibroblast-specific protein 1 (FSP1)/S100 Calcium Binding Protein A4 (S100A4) Upregulated Immunofluorescence, RT-qPCR, Western Blot [51,52,53] S100A6 S100 Calcium Binding Protein A6 Downregulated Tag profiling, IHC, RT-qPCR [38] SPARC Secreted Protein Acidic and Cysteine Rich Up/downregulated Microarray profiling, Tag-profiling [38,44] STC1 Stanniocalcin 1 Downregulated Tag profiling, IHC, RT-qPCR [38] THY1 Cluster of differentiation 90 (CD90) antigen Upregulated IHC [54] TNC Tenascin C Upregulated RT-qPCR, IHC [42,43] VIM Vimentin Upregulated IHC

Benign prostatic hyperplasia (BPH) is a non-malignant growth of the prostate, typically occurring in older men with an occurrence of 80–90% of men in their 70s BPH develops in the transition zone differently from PCa foci, which usually develop in the peripheral zone

ESR1= . Here we studied the role of CAF estrogen receptor alpha (ERα) and found that it could protect against PCa invasion. ERα could function through a CAF–epithelial interaction via selectively upregulating thrombospondin 2 (Thbs2) and downregulating matrix metalloproteinase 3 (MMP3) at the protein and messenger RNA levels https://academic.oup.com/carcin/article/35/6/1301/449417

ACTA2: we found that “reactive CAFs” induce shear resistance to prostate tumor cells via intercellular contact and soluble derived factors. The reactive CAFs showed higher expression of α-smooth muscle actin (α-SMA) and fibroblast activation protein (FAP) compared to differentiated CAFs https://www.oncotarget.com/article/27510/

Proteins that exhibited a significant increase in CAF versus NPF were enriched for the functional categories “cell adhesion” and the “extracellular matrix.” The CAF phosphoproteome exhibited enhanced phosphorylation of proteins associated with the “spliceosome” and “actin binding.” COL1A1/2 and COL5A1; the receptor tyrosine kinase discoidin domain-containing receptor 2 (DDR2), a receptor for fibrillar collagens; and lysyl oxidase-like 2 (LOXL2) https://www.mcponline.org/article/S1535-9476(20)31548-6/fulltext

•Cancer-associated fibroblasts (CAFs) exhibit a highly aligned cytoskeleton and extracellular matrix. •CAFs are stiffer than normal fibroblasts from the prostate. •Benign prostate epithelial cells are more compliant after co-culture with CAFs. •Benign epithelial cells are more invasive and proliferative in presence of CAFs. https://www.sciencedirect.com/science/article/pii/S2590006420300338

These findings suggest that CAFs exosomes drive PCa metastasis via the miR-500a-3p/FBXW7/HSF1 axis in a hypoxic microenvironmen https://www.nature.com/articles/s41417-024-00742-2

https://www.genecards.org/cgi-bin/carddisp.pl?gene=CCT8L2 CCT8L2 (Chaperonin Containing TCP1 Subunit 8 Like 2) is a Protein Coding gene. Gene Ontology (GO) annotations related to this gene include calcium-activated potassium channel activity and monoatomic anion channel activity. An important paralog of this gene is CCT8. related to the meta transmembrane activity via

Possible molecular chaperone; assists the folding of proteins upon ATP hydrolysis. The TRiC complex mediates the folding of WRAP53/TCAB1, thereby regulating telomere maintenance (PubMed:25467444). As part of the TRiC complex may play a role in the assembly of BBSome, a complex involved in ciliogenesis regulating transports vesicles to the cilia

https://www.genecards.org/cgi-bin/carddisp.pl?gene=TIMP1 TIMP Metallopeptidase Inhibitor 1 inhibitors of the matrix metalloproteinases (MMPs), a group of peptidases involved in degradation of the extracellular matrix. In addition to its inhibitory role against most of the known MMPs, the encoded protein is able to promote cell proliferation

connected to TLL2: zinc-dependent metalloprotease part of the metzincin protein complex which also is Among its related pathways are Collagen chain trimerization and Extracellular matrix organization.

PDXP, SEPTIN5 cytoskeletal organization and regulation

PRAME inhibits the signaling of retinoic acid and mediates the regulation of selenoproteins

many genes involved in cancer. indeed, it is known that 80% adult male above 70 years old will have such hyperplastic prostate. This is due to scenesence. and that Iron accumulation drives fibrosis, senescence and the senescence-associated secretory phenotype. (of which we have some proteins present here.) A mechanism that this clustered is one of genes that strongly regulates each other in this fibroblasts from pre-cancerous lesions via fibrotic iron accumulation and a secretory mechanism GJC1 (exchange of things between cells (here likely metals?))

Unknown genes like SMIM40. integral membrane protein could be linked to this phenotype