biopipen.ns.scrna

Tools to analyze single-cell RNA data.

- SeuratLoading (Proc) — Seurat - Loading data
- SeuratPreparing (Proc) — Load, prepare and apply QC to data, using Seurat
- SeuratClustering (Proc) — Determine the clusters of cells without reference using the Seurat FindClusters procedure
- SeuratSubClustering (Proc) — Find clusters of a subset of cells
- SeuratClusterStats (Proc) — Statistics of the clustering
- ModuleScoreCalculator (Proc) — Calculate the module scores for each cell
- CellsDistribution (Proc) — Distribution of cells (e.g. in a TCR clone) from different groups for each cluster
- SeuratMetadataMutater (Proc) — Mutate the metadata of the Seurat object
- DimPlots (Proc) — Seurat - Dimensional reduction plots
- MarkersFinder (Proc) — Find markers between different groups of cells
- TopExpressingGenes (Proc) — Find the top expressing genes in each cluster
- ExprImputation (Proc) — Impute the dropout values in scRNA-seq data
- SCImpute (Proc) — Impute the dropout values in scRNA-seq data
- SeuratFilter (Proc) — Filter cells from a Seurat object
- SeuratSubset (Proc) — Subset a Seurat object into multiple Seurat objects
- SeuratSplit (Proc) — Split a Seurat object into multiple Seurat objects
- Subset10X (Proc) — Subset 10X data, mostly used for testing
- SeuratTo10X (Proc) — Write a Seurat object to 10X format
- ScFGSEA (Proc) — Gene set enrichment analysis for cells in different groups using fgsea
- CellTypeAnnotation (Proc) — Annotate the cell clusters; currently, four ways are supported
- SeuratMap2Ref (Proc) — Map the Seurat object to a reference
- RadarPlots (Proc) — Radar plots for cell proportions in different clusters
- MetaMarkers (Proc) — Find markers among three or more groups of cells, using one-way ANOVA or the Kruskal-Wallis test
- Seurat2AnnData (Proc) — Convert a Seurat object to AnnData
- AnnData2Seurat (Proc) — Convert AnnData to a Seurat object
- ScSimulation (Proc) — Simulate single-cell data using splatter
- CellCellCommunication (Proc) — Cell-cell communication inference
- CellCellCommunicationPlots (Proc) — Visualization for cell-cell communication inference
biopipen.ns.scrna.SeuratLoading(*args, **kwds) → Proc

Seurat - Loading data

Deprecated; superseded by SeuratPreparing.
cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary from the docstring by default.
dirsig — When checking the signature for caching, whether we should walk through the content of the directory. This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common options across jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy — How to deal with the errors: retry, ignore, or halt.
- halt: halt the whole pipeline, submitting no new jobs
- terminate: just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
forks — How many jobs to run simultaneously.
input — The keys for the input channel.
input_data — The input data (will be computed for dependent processes).
lang — The language for the script to run. Should be the path to the interpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships.
num_retries — How many times to retry the jobs once an error occurs.
order — The execution order for this process. The bigger the number, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts().
output — The output keys for the output channel (the data will be computed).
output_data — The output data (to pass to the next processes).
plugin_opts — Options for process-level plugins.
requires — The dependency processes.
scheduler — The scheduler to run the jobs.
scheduler_opts — The options for the scheduler.
script — The script template for the process.
submission_batch — How many jobs to be submitted simultaneously.
template — Define the template engine to use. This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine (currently jinja2 and liquidpy are supported) or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.
metafile — The metadata of the samples. A tab-delimited file; two columns are required:
- Sample: the sample names.
- RNAData: the path of the data for each sample. The path will be read by Read10X() from Seurat.

rdsfile — The RDS file with a list of Seurat objects.

qc — The QC filter for each sample. This will be passed to subset(obj, subset = <qc>). For example: nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5
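A minimal sketch of how this deprecated process might be configured; the sample names and paths are made up for illustration:

```toml
# metafile: a tab-delimited samplesheet, e.g.
#   Sample    RNAData
#   S1        /path/to/S1/filtered_feature_bc_matrix
#   S2        /path/to/S2/filtered_feature_bc_matrix

[SeuratLoading.envs]
# Passed to subset(obj, subset = <qc>) for each sample
qc = "nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5"
```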
__init_subclass__() — Do the requirements inferring since we need them to build up the process relationship
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself
gc() — GC process for the process to save memory after it's done
init() — Init all other properties and jobs
log(level, msg, *args, logger) — Log message for the process
run() — Run the process
pipen.proc.ProcMeta(name, bases, namespace, **kwargs)

Meta class for Proc

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons
__instancecheck__(cls, instance) — Override for isinstance(instance, cls)
__repr__(cls) (str) — Representation for the Proc subclasses
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls)
register(cls, subclass) — Register a virtual subclass of an ABC

register(cls, subclass)

Register a virtual subclass of an ABC.
Returns the subclass, to allow usage as a class decorator.

__instancecheck__(cls, instance)

Override for isinstance(instance, cls).

__subclasscheck__(cls, subclass)

Override for issubclass(subclass, cls).

__repr__(cls) → str

Representation for the Proc subclasses.

__call__(cls, *args, **kwds)

Make sure Proc subclasses are singletons.

*args (Any) and **kwds (Any) — Arguments for the constructor

Returns: The Proc instance

from_proc(proc, name=None, desc=None, envs=None, envs_depth=None, cache=None, export=None, error_strategy=None, num_retries=None, forks=None, input_data=None, order=None, plugin_opts=None, requires=None, scheduler=None, scheduler_opts=None, submission_batch=None)

Create a subclass of Proc using another Proc subclass or Proc itself.

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process; will overwrite the parent's. The items that are not specified will be inherited.
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
error_strategy (str, optional) — How to deal with the errors: retry, ignore, or halt.
- halt: halt the whole pipeline, submitting no new jobs
- terminate: just terminate the job itself
num_retries (int, optional) — How many times to retry the jobs once an error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process; only effective when this process is a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options; unspecified items will be inherited
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new scheduler to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options; unspecified items will be inherited
submission_batch (int, optional) — How many jobs to be submitted simultaneously

Returns: The new process class

__init_subclass__()

Do the requirements inferring since we need them to build up the process relationship.

init()

Init all other properties and jobs.

gc()

GC process for the process to save memory after it's done.

log(level, msg, *args, logger=<LoggerAdapter pipen.core (WARNING)>)

Log message for the process.

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

run()

Run the process.
biopipen.ns.scrna.SeuratPreparing(*args, **kwds) → Proc

Load, prepare and apply QC to data, using Seurat
This process will:
- Prepare the Seurat object
- Apply QC to the data
- Integrate the data from different samples

See also:
- https://satijalab.org/seurat/articles/pbmc3k_tutorial.html#standard-pre-processing-workflow-1
- https://satijalab.org/seurat/articles/integration_introduction

This process reads the scRNA-seq data based on the information provided by SampleInfo, specifically the paths specified by the RNAData column.
Those paths should be either paths to directories containing matrix.mtx, barcodes.tsv and features.tsv files that can be loaded by Seurat::Read10X(), or paths to h5 files that can be loaded by Seurat::Read10X_h5().
Each sample is loaded individually and then merged into one Seurat object, on which QC is then performed.
In order to perform QC, some additional columns are added to the meta data of the Seurat object. They are:

- percent.mt: The percentage of mitochondrial genes.
- percent.ribo: The percentage of ribosomal genes.
- percent.hb: The percentage of hemoglobin genes.
- percent.plat: The percentage of platelet genes.

For integration, two routes are available:
- Performing integration on datasets normalized with SCTransform
- Using NormalizeData and FindIntegrationAnchors

/// Note
When using SCTransform, the default assay will be set to SCT in the output, rather than RNA.
If you are using cca or rpca integration, the default assay will be integrated.
///

/// Note
From biopipen v0.23.0, this requires Seurat v5.0.0 or higher.
///
metafile — The metadata of the samples. A tab-delimited file; two columns are required:
- Sample: the sample names.
- RNAData: the path of the data for each sample. The path will be read by Read10X() from Seurat, or it can be the path to an h5 file that can be read by Read10X_h5() from Seurat.

rdsfile — The RDS file with the Seurat object with all samples integrated. Note that the cell ids are prefixed with the sample names. QC plots will be saved in <job.outdir>/plots.
DoubletFinder (ns) — Arguments to run DoubletFinder. See also https://demultiplexing-doublet-detecting-docs.readthedocs.io/en/latest/DoubletFinder.html.
- PCs (type=int): Number of PCs to use for the doubletFinder function.
- doublets (type=float): Number of expected doublets as a proportion of the pool size.
- pN (type=float): Number of doublets to simulate as a proportion of the pool size.
- ncores (type=int): Number of cores to use for DoubletFinder::paramSweep. Set to None to use envs.ncores. Since parallelization of the function usually exhausts memory, if a big envs.ncores does not work for DoubletFinder, set this to a smaller number.
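A hedged sketch of a DoubletFinder setup; the values are purely illustrative, not recommendations:

```toml
[SeuratPreparing.envs]
doublet_detector = "DoubletFinder"

[SeuratPreparing.envs.DoubletFinder]
PCs = 10          # number of PCs for the doubletFinder function
doublets = 0.075  # expected doublets as a proportion of the pool size
pN = 0.25         # proportion of simulated doublets
ncores = 4        # keep small; paramSweep can exhaust memory
```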
FindVariableFeatures (ns) — Arguments for FindVariableFeatures(). object is specified internally, and a dash (-) in the key will be replaced with a dot (.).
IntegrateLayers (ns) — Arguments for IntegrateLayers(). object is specified internally, and a dash (-) in the key will be replaced with a dot (.). When use_sct is True, normalization-method defaults to SCT.
- method (choice): The method to use for integration.
  - CCAIntegration: Use Seurat::CCAIntegration.
  - CCA / cca: Same as CCAIntegration.
  - RPCAIntegration: Use Seurat::RPCAIntegration.
  - RPCA / rpca: Same as RPCAIntegration.
  - HarmonyIntegration: Use Seurat::HarmonyIntegration.
  - Harmony / harmony: Same as HarmonyIntegration.
  - FastMNNIntegration: Use Seurat::FastMNNIntegration.
  - FastMNN / fastmnn: Same as FastMNNIntegration.
  - scVIIntegration: Use Seurat::scVIIntegration.
  - scVI / scvi: Same as scVIIntegration.
- Other arguments: See https://satijalab.org/seurat/reference/integratelayers
NormalizeData (ns) — Arguments for NormalizeData(). object is specified internally, and a dash (-) in the key will be replaced with a dot (.).
RunPCA (ns) — Arguments for RunPCA(). object and features are specified internally, and a dash (-) in the key will be replaced with a dot (.).
- npcs (type=int): The number of PCs to compute. For each sample, npcs will be no larger than the number of columns - 1.
- Other arguments: See https://satijalab.org/seurat/reference/runpca
SCTransform (ns) — Arguments for SCTransform(). object is specified internally, and a dash (-) in the key will be replaced with a dot (.).
- return-only-var-genes: Whether to return only variable genes.
- min_cells: The minimum number of cells that a gene must be expressed in to be kept. A hidden argument of SCTransform to filter genes. If you want to keep all genes in the RNA assay, you can set min_cells to 0 and return-only-var-genes to False. See https://github.com/satijalab/seurat/issues/3598#issuecomment-715505537
- Other arguments: See https://satijalab.org/seurat/reference/sctransform
ScaleData (ns) — Arguments for ScaleData(). object and features are specified internally, and a dash (-) in the key will be replaced with a dot (.).
cache (type=auto) — Whether to cache the information at different steps. If True, the Seurat object will be cached in the job output directory, which will not be cleaned up when the job is rerun. The cached Seurat object will be saved as a <signature>.<kind>.RDS file, where <signature> is the signature determined by the input and envs of the process. See https://github.com/satijalab/seurat/issues/7849, https://github.com/satijalab/seurat/issues/5358 and https://github.com/satijalab/seurat/issues/6748 for more details, also about reproducibility issues. To not use the cached Seurat object, you can either set cache to False or delete the cached file at <signature>.RDS in the cache directory.
cell_qc — Filter expression to filter cells, using tidyseurat::filter(). Available QC keys include nFeature_RNA, nCount_RNA, percent.mt, percent.ribo, percent.hb, and percent.plat.

/// Tip | Example

```toml
[SeuratPreparing.envs]
cell_qc = "nFeature_RNA > 200 & percent.mt < 5"
```

will keep cells with more than 200 genes and less than 5% mitochondrial gene content.
///
cell_qc_per_sample (flag) — Whether to perform cell QC per sample or not. If True, the cell QC will be performed per sample, and applied to each sample before merging.
doublet_detector (choice) — The doublet detector to use.
- none: Do not use any doublet detector.
- DoubletFinder / doubletfinder: Use DoubletFinder to detect doublets.
- scDblFinder / scdblfinder: Use scDblFinder to detect doublets.
gene_qc (ns) — Filter genes. gene_qc is applied after cell_qc.
- min_cells: The minimum number of cells that a gene must be expressed in to be kept.
- excludes: The genes to exclude. Multiple genes can be specified by comma-separated values, or as a list.

/// Tip | Example

```toml
[SeuratPreparing.envs]
gene_qc = { min_cells = 3 }
```

will keep genes that are expressed in at least 3 cells.
///
ncores (type=int) — Number of cores to use. Used in future::plan(strategy = "multicore", workers = <ncores>) to parallelize some Seurat procedures.
no_integration (flag) — Whether to skip integration or not.
scDblFinder (ns) — Arguments to run scDblFinder.
- dbr (type=float): The expected doublet rate.
- ncores (type=int): Number of cores to use for scDblFinder. Set to None to use envs.ncores.
- Other arguments: See https://rdrr.io/bioc/scDblFinder/man/scDblFinder.html
use_sct (flag) — Whether to use the SCTransform routine to integrate samples or not. Before the following procedures, the RNA layer will be split by samples. If False, the standard route (NormalizeData, FindVariableFeatures, ScaleData and RunPCA, as configured above) will be performed before integration; see https://satijalab.org/seurat/articles/seurat5_integration#layers-in-the-seurat-v5-object and https://satijalab.org/seurat/articles/pbmc3k_tutorial.html. If True, SCTransform will be performed instead of the normalization and scaling steps. A configuration sketch for both routes follows the requirements below.
r-bracer — check: {{proc.lang}} <(echo "library(bracer)")
r-future — check: {{proc.lang}} <(echo "library(future)")
r-seurat — check: {{proc.lang}} <(echo "library(Seurat)")
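A minimal sketch of the two integration routes, assuming the method names listed under IntegrateLayers above; which route and method suit your data is left to you:

```toml
# Route 1: standard normalization + RPCA integration
[SeuratPreparing.envs]
use_sct = false

[SeuratPreparing.envs.IntegrateLayers]
method = "rpca"

# Route 2: SCTransform-based integration (uncomment to use instead)
# [SeuratPreparing.envs]
# use_sct = true
# [SeuratPreparing.envs.SCTransform]
# return-only-var-genes = false
# min_cells = 0
```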
biopipen.ns.scrna.SeuratClustering(*args, **kwds) → Proc

Determine the clusters of cells without reference using the Seurat FindClusters procedure.
srtobj — The Seurat object loaded by SeuratPreparing

rdsfile — The Seurat object with the cluster information at seurat_clusters. If SCTransform was used, the default assay will be reset to RNA.
FindClusters (ns) — Arguments for FindClusters(). object is specified internally, and a dash (-) in the key will be replaced with a dot (.). The cluster labels will be saved in seurat_clusters and prefixed with "c"; the first cluster will be "c1", instead of "c0".
- resolution (type=auto): The resolution of the clustering. You can have multiple resolutions as a list or as a comma-separated string. Ranges are also supported; for example, 0.1:0.5:0.1 will generate 0.1, 0.2, 0.3, 0.4, 0.5. The step can be omitted, defaulting to 0.1. The results will be saved in seurat_clusters_<resolution>. The final resolution will be used to define the clusters at seurat_clusters.
- Other arguments: See https://satijalab.org/seurat/reference/findclusters
FindNeighbors (ns) — Arguments for FindNeighbors(). object is specified internally, and a dash (-) in the key will be replaced with a dot (.).
- reduction: The reduction to use. If not provided, sobj@misc$integrated_new_reduction will be used.
- Other arguments: See https://satijalab.org/seurat/reference/findneighbors
RunUMAP (ns) — Arguments for RunUMAP(). object is specified internally, and a dash (-) in the key will be replaced with a dot (.). dims=N will be expanded to dims=1:N; the maximal value of N will be the minimum of N and the number of columns - 1 for each sample.
- dims (type=int): The number of PCs to use.
- reduction: The reduction to use for UMAP. If not provided, sobj@misc$integrated_new_reduction will be used.
- Other arguments: See https://satijalab.org/seurat/reference/runumap
SCTransform (ns) — Arguments for SCTransform(). If you want to re-scale the data by regressing on some variables, Seurat::SCTransform will be called. If nothing is specified, Seurat::SCTransform will not be called.
- vars-to-regress: The variables to regress on.
- Other arguments: See https://satijalab.org/seurat/reference/sctransform
ScaleData (ns) — Arguments for ScaleData(). If you want to re-scale the data by regressing on some variables, Seurat::ScaleData will be called. If nothing is specified, Seurat::ScaleData will not be called.
- vars-to-regress: The variables to regress on.
- Other arguments: See https://satijalab.org/seurat/reference/scaledata
cache (type=auto) — Whether to cache the information at different steps. If True, the Seurat object will be cached in the job output directory, which will not be cleaned up when the job is rerun. The cached Seurat object will be saved as a <signature>.<kind>.RDS file, where <signature> is the signature determined by the input and envs of the process. See https://github.com/satijalab/seurat/issues/7849, https://github.com/satijalab/seurat/issues/5358 and https://github.com/satijalab/seurat/issues/6748 for more details, also about reproducibility issues. To not use the cached Seurat object, you can either set cache to False or delete the cached file at <signature>.RDS in the cache directory.
ncores (type=int;order=-100) — Number of cores to use. Used in future::plan(strategy = "multicore", workers = <ncores>) to parallelize some Seurat procedures. See also: https://satijalab.org/seurat/articles/future_vignette.html
r-dplyr — check: {{proc.lang}} <(echo "library(dplyr)")
r-seurat — check: {{proc.lang}} <(echo "library(Seurat)")
r-tidyr — check: {{proc.lang}} <(echo "library(tidyr)")
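A minimal sketch of a multi-resolution run, using the range syntax described under FindClusters above; the values are illustrative:

```toml
[SeuratClustering.envs]
ncores = 4

[SeuratClustering.envs.FindClusters]
# Generates seurat_clusters_0.2 ... seurat_clusters_1;
# the final resolution defines seurat_clusters
resolution = "0.2:1:0.2"

[SeuratClustering.envs.RunUMAP]
dims = 30  # expanded to dims = 1:30
```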
biopipen.ns.scrna.SeuratSubClustering(*args, **kwds) → Proc

Find clusters of a subset of cells.
It's unlike Seurat::FindSubCluster, which only finds subclusters of a single cluster. Instead, this process performs the whole clustering procedure on the subset of cells. One can use metadata to specify the subset of cells to perform clustering on.
For the subset of cells, the reductions will be re-performed, and then the clustering will be performed on them. The reduction will be saved in sobj@reduction$sub_umap_<casename> of the original object, and the clustering will be saved in the metadata of the original object using the casename as the column name.
srtobj — The Seurat object

rdsfile — The Seurat object with the subclustering information.
FindClusters (ns) — Arguments for FindClusters(). object is specified internally, and a dash (-) in the key will be replaced with a dot (.). The cluster labels will be prefixed with "s"; the first cluster will be "s1", instead of "s0".
- resolution (type=auto): The resolution of the clustering. You can have multiple resolutions as a list or as a comma-separated string. Ranges are also supported; for example, 0.1:0.5:0.1 will generate 0.1, 0.2, 0.3, 0.4, 0.5. The step can be omitted, defaulting to 0.1. The results will be saved in <casename>_<resolution>. The final resolution will be used to define the clusters at <casename>.
- Other arguments: See https://satijalab.org/seurat/reference/findclusters
FindNeighbors (ns) — Arguments for FindNeighbors(). object is specified internally, and a dash (-) in the key will be replaced with a dot (.).
- reduction: The reduction to use. If not provided, sobj@misc$integrated_new_reduction will be used.
- Other arguments: See https://satijalab.org/seurat/reference/findneighbors
RunUMAP (ns) — Arguments for RunUMAP(). object is specified internally as the subset object, and a dash (-) in the key will be replaced with a dot (.). dims=N will be expanded to dims=1:N; the maximal value of N will be the minimum of N and the number of columns - 1 for each sample.
- dims (type=int): The number of PCs to use.
- reduction: The reduction to use for UMAP. If not provided, sobj@misc$integrated_new_reduction will be used.
- Other arguments: See https://satijalab.org/seurat/reference/runumap
cache (type=auto) — Whether to cache the information at different steps. If True, the Seurat object will be cached in the job output directory, which will not be cleaned up when the job is rerun. The cached Seurat object will be saved as a <signature>.<kind>.RDS file, where <signature> is the signature determined by the input and envs of the process. See https://github.com/satijalab/seurat/issues/7849, https://github.com/satijalab/seurat/issues/5358 and https://github.com/satijalab/seurat/issues/6748 for more details, also about reproducibility issues. To not use the cached Seurat object, you can either set cache to False or delete the cached file at <signature>.RDS in the cache directory.
cases (type=json) — The cases to perform subclustering. Keys are the names of the cases and values are the dicts inherited from envs except mutaters and cache. If empty, a case with the name subcluster will be created with default parameters. A configuration sketch follows this list.
mutaters (type=json) — The mutaters to mutate the metadata to subset the cells. The mutaters will be applied in the order specified.
ncores (type=int;order=-100) — Number of cores to use. Used in future::plan(strategy = "multicore", workers = <ncores>) to parallelize some Seurat procedures.
subset — An expression to subset the cells; will be passed to tidyseurat::filter().
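A minimal sketch of subclustering only the T cells; the is_tcell helper column and the cluster_annotation column are hypothetical and must exist in (or be mutated into) your metadata:

```toml
[SeuratSubClustering.envs]
# Hypothetical helper column marking T cells
mutaters = { is_tcell = "cluster_annotation %in% c('CD4 T', 'CD8 T')" }
subset = "is_tcell"

# A case named "tcell_subcluster" overriding the clustering resolution
[SeuratSubClustering.envs.cases.tcell_subcluster.FindClusters]
resolution = 0.8
```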
biopipen.ns.scrna.SeuratClusterStats(*args, **kwds) → Proc

Statistics of the clustering.

Including the number/fraction of cells in each cluster, the gene expression values and dimension reduction plots. It's also possible to perform stats on TCR clones/clusters or other metadata for each T-cell cluster.
Examples:

Number of cells in each cluster:

```toml
[SeuratClusterStats.envs.stats]
# suppose you have nothing set in `envs.stats_defaults`
# otherwise, the settings will be inherited here
nCells_All = { }
```

Number of cells in each cluster by groups:

```toml
[SeuratClusterStats.envs.stats]
nCells_Sample = { group-by = "Sample" }
```

Violin plots for the gene expressions:

```toml
[SeuratClusterStats.envs.features]
features = "CD4,CD8A"
# Remove the dots in the violin plots
vlnplots = { pt-size = 0, kind = "vln" }
# Don't use the default genes
vlnplots_1 = { features = ["FOXP3", "IL2RA"], pt-size = 0, kind = "vln" }
```

Dimension reduction plot with labels:

```toml
[SeuratClusterStats.envs.dimplots.Idents]
label = true
label-box = true
repel = true
```
srtobj — The Seurat object loaded by SeuratClustering

outdir — The output directory. Different types of plots will be saved in different subdirectories; for example, clustree plots will be saved in the clustrees subdirectory. For each case in envs.clustrees, both the png and pdf files will be saved.
clustrees (type=json) — The cases for clustree plots. Keys are the names of the plots and values are the dicts inherited from envs.clustrees_defaults except prefix. There is no default case for clustrees; see the sketch after the defaults below.
clustrees_defaults (ns) — The default parameters for the clustree plots.
- devpars (ns): The device parameters for the clustree plot.
  - res (type=int): The resolution of the plots.
  - height (type=int): The height of the plots.
  - width (type=int): The width of the plots.
- prefix: A string indicating the columns containing the clustering information. The trailing dot is not necessary and will be added automatically. When _auto, clustrees will be plotted when there is FindClusters or FindClusters.* in obj@commands; the latter is generated by SeuratSubClustering. This will be ignored when envs.clustrees is specified.
- Other arguments: passed to clustree::clustree(). See https://rdrr.io/cran/clustree/man/clustree.html
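A hedged sketch of a clustree case; the case name and the prefix value are illustrative, and the prefix should match the clustering columns actually present in your object:

```toml
[SeuratClusterStats.envs.clustrees.resolutions]
prefix = "seurat_clusters"  # illustrative; the trailing dot is added automatically

[SeuratClusterStats.envs.clustrees.resolutions.devpars]
res = 100
height = 1200
width = 1000
```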
dimplots (type=json) — The dimensional reduction plots. Keys are the titles of the plots and values are the dicts inherited from envs.dimplots_defaults. They can also have other parameters from Seurat::DimPlot.
dimplots_defaults (ns) — The default parameters for dimplots.
- use (choice): The function from which package to use for the plot.
  - seurat / Seurat: Use Seurat::DimPlot.
  - scp / SCP: Use SCP::CellDimPlot.
  - scp3d / SCP3D: Use SCP::CellDimPlot3D.
- ident: The identity to use. If it is from subclustering (reduction sub_umap_<ident> exists), this reduction will be used if reduction is set to dim or auto.
- group-by: Same as ident if not specified; defines how the points are colored.
- na_group: The group name for NA values; use None to ignore NA values. When use is scp, any non-None value will be translated as show_na = True for SCP::CellDimPlot. You can use show_na directly for SCP::CellDimPlot. This option is ignored when use is scp3d.
- split-by: The column name in metadata to split the cells into different plots. Not supported for scp3d.
- shape-by: The column name in metadata to use as the shape. Ignored if use is scp or scp3d.
- subset: An expression to subset the cells; will be passed to tidyseurat::filter().
- devpars (ns): The device parameters for the plots.
  - res (type=int): The resolution of the plots.
  - height (type=int): The height of the plots.
  - width (type=int): The width of the plots.
- reduction (choice): Which dimensionality reduction to use.
  - dim: Use Seurat::DimPlot. First searches for umap, then tsne, then pca. If ident is from subclustering, sub_umap_<ident> will be used.
  - auto: Same as dim.
  - umap: Use Seurat::UMAPPlot.
  - tsne: Use Seurat::TSNEPlot.
  - pca: Use Seurat::PCAPlot.
- theme_use: The theme to use for the plot.
- Other arguments: See https://satijalab.org/seurat/reference/dimplot
features (type=json) — The plots for features, including gene expressions and columns from metadata. Keys are the titles of the cases and values are the dicts inherited from envs.features_defaults. They can also have other parameters from each Seurat function used by kind. Note that for an argument name with a dot (.), you should use a dash (-) instead.
features_defaults (ns) — The default parameters for features (see the dot plot sketch after this list).
- features: The features to plot. It can be either a string of comma-separated features, a list of features, a file path with the file:// prefix with features (one per line), or an integer to use the top N features from VariableFeatures(srtobj).
- ident: The column name in metadata to use as the identity. If it is from subclustering (reduction sub_umap_<ident> exists), the reduction will be used.
- cluster_orderby (type=auto): The order of the clusters to show on the plot. An expression passed to dplyr::summarise() on the grouped data frame (by seurat_clusters). The summary stat will be passed to dplyr::arrange() to order the clusters. It's applied on the whole meta.data before grouping and subsetting. For example, you can order the clusters by the activation score of the cluster with desc(mean(ActivationScore, na.rm = TRUE)), supposing you have a column ActivationScore in the metadata. You may also specify the literal order of the clusters by a list of strings.
- subset: An expression to subset the cells; will be passed to tidyseurat::filter().
- devpars (ns): The device parameters for the plots. Does not work for table.
  - res (type=int): The resolution of the plots.
  - height (type=int): The height of the plots.
  - width (type=int): The width of the plots.
- plus: The extra elements to add to the ggplot object. Does not work for table.
- group-by: Group cells in different ways (for example, orig.ident). Works for ridge, vln, and dot. It also works for feature as shape.by being passed to Seurat::FeaturePlot.
- split-by: The column name in metadata to split the cells into different plots. It works for vln, feature, and dot.
- assay: The assay to use.
- layer: The layer to use.
- reduction: The reduction to use. Only works for feature.
- section: The section to put the plot in the report. If not specified, the case title will be used.
- ncol (type=int): The number of columns for the plots.
- kind (choice): The kind of the plot or table.
  - ridge / ridgeplot: Use Seurat::RidgePlot.
  - vln / vlnplot / violin / violinplot: Use Seurat::VlnPlot.
  - feature / featureplot: Use Seurat::FeaturePlot.
  - dot / dotplot: Use Seurat::DotPlot.
  - bar / barplot: Bar plot on an aggregated feature. The features must be a single feature, which will be either an existing feature or an expression passed to dplyr::summarise() (grouped by ident) on the existing features to create a new feature.
  - heatmap: Use Seurat::DoHeatmap.
  - avgheatmap: Plot the average expression of the features in each cluster as a heatmap.
  - table: The table for the features; only gene expressions are supported (supported keys: ident, subset, and features).
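Beyond the violin example above, a dot plot case might look like this; the case name and marker genes are placeholders:

```toml
[SeuratClusterStats.envs.features.MarkerDotPlot]
kind = "dot"
features = "CD3E,CD4,CD8A,FOXP3"  # placeholder marker genes
ident = "seurat_clusters"
```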
hists (type=json) — The cases for histograms. Keys are the names of the plots and values are the dicts inherited from envs.hists_defaults. There is no default case.
hists_defaults (ns) — The default parameters for histograms (see the sketch after this list). This will plot histograms for the number of cells along x. For example, you can plot the number of cells along a cell activity score.
- x: The column name in metadata to plot as the x-axis. The NA values will be removed. It could be either numeric or factor/character.
- x_order (list): The order of the x-axis; only works for factor/character x. You can also use it to subset x (showing only a subset of the values of x).
- cells_by: A column name in metadata to group the cells. The NA values will be removed. It should be a factor/character. If not specified, all cells will be used.
- cells_order (list): The order of the cell groups for the plots. It should be a list of strings. You can also use cells_orderby and cells_n to determine the order.
- cells_orderby: An expression passed to dplyr::arrange() to order the cell groups.
- cells_n: The number of cell groups to show. Ignored if cells_order is specified.
- ncol (type=int): The number of columns for the plots, split by cells_by.
- subset: An expression to subset the cells; will be passed to dplyr::filter().
- each: Whether to plot each group separately.
- bins: The number of bins to use; only works for numeric x.
- plus (list): The extra elements to add to the ggplot object.
- devpars (ns): The device parameters for the plots.
  - res (type=int): The resolution of the plots.
  - height (type=int): The height of the plots.
  - width (type=int): The width of the plots.
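A minimal sketch of a histogram case, assuming a hypothetical numeric metadata column ActivationScore:

```toml
[SeuratClusterStats.envs.hists.ActivationHist]
x = "ActivationScore"        # hypothetical numeric column
cells_by = "seurat_clusters" # group cells by cluster
bins = 30
```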
mutaters (type=json) — The mutaters to mutate the metadata to subset the cells. The mutaters will be applied in the order specified.
ngenes (type=json) — The number of genes expressed in each cell. Keys are the names of the plots and values are the dicts inherited from envs.ngenes_defaults.
ngenes_defaults (ns) — The default parameters for ngenes, i.e. the default parameters to plot the number of genes expressed in each cell.
- ident: The column name in metadata to use as the identity.
- group-by: The column name in metadata to group the cells. Dodge position will be used to separate the groups.
- split-by: The column name in metadata to split the cells into different plots.
- subset: An expression to subset the cells; will be passed to tidyseurat::filter().
- devpars (ns): The device parameters for the plots.
  - res (type=int): The resolution of the plots.
  - height (type=int): The height of the plots.
  - width (type=int): The width of the plots.
stats (type=json) — The number/fraction of cells to plot. Keys are the names of the plots and values are dicts inherited from `envs.stats_defaults`. Here are some examples (see also the TOML sketch below):

```
{
    "nCells_All": {},
    "nCells_Sample": {"group-by": "Sample"},
    "fracCells_Sample": {"frac": True, "group-by": "Sample"},
}
```

stats_defaults (ns) — The default parameters for `stats`. This is to do some basic statistics on the clusters; for more comprehensive analysis, see `RadarPlots` and `CellsDistribution`. The parameters from the cases can overwrite the default parameters.
- frac (choice): How to calculate the fraction of cells.
  - group: calculate the fraction in each group. The total fraction of the cells of idents in each group will be 1. When `group-by` is not specified, it is the same as `all`.
  - ident: calculate the fraction in each ident. The total fraction of the cells of groups in each ident will be 1. Only works when `group-by` is specified.
  - cluster: alias of `ident`.
  - all: calculate the fraction against all cells.
  - none: do not calculate the fraction; use the number of cells instead.
- pie (flag): Also output a pie chart?
- circos (flag): Also output a circos plot?
- table (flag): Whether to output a table (in tab-delimited format) in the report.
- transpose (flag): Whether to transpose the cluster and group, that is, use the group as the x-axis and the cluster to fill the plot. For the circos plot, when transposed, the arrows will be drawn from the idents (by `ident`) to the groups (by `group-by`). Only works when `group-by` is specified.
- position (choice): The position of the bars. Does not work for pie and circos plots.
  - stack: Use `position_stack()`.
  - fill: Use `position_fill()`.
  - dodge: Use `position_dodge()`.
  - auto: Use `stack` when there are more than 5 groups, otherwise use `dodge`.
- ident: The column name in metadata to use as the identity.
- group-by: The column name in metadata to group the cells. NOT supported for pie charts.
- split-by: The column name in metadata to split the cells into different plots. NOT supported for circos plots.
- subset: An expression to subset the cells, which will be passed to `dplyr::filter()` on the metadata.
- circos_labels_rot (flag): Whether to rotate the labels in the circos plot, in case the labels are too long.
- circos_devpars (ns): The device parameters for the circos plots.
  - res (type=int): The resolution of the plots.
  - height (type=int): The height of the plots.
  - width (type=int): The width of the plots.
- pie_devpars (ns): The device parameters for the pie charts.
  - res (type=int): The resolution of the plots.
  - height (type=int): The height of the plots.
  - width (type=int): The width of the plots.
- devpars (ns): The device parameters for the plots.
  - res (type=int): The resolution of the plots.
  - height (type=int): The height of the plots.
  - width (type=int): The width of the plots.
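The same kind of cases can be written in a pipeline configuration file. A minimal TOML sketch, assuming hypothetical metadata columns `ActivityScore` and `Sample` (the case names are arbitrary):

```toml
[SeuratClusterStats.envs.hists.ActivityScore_hist]
x = "ActivityScore"          # hypothetical numeric metadata column
cells_by = "seurat_clusters" # split the histograms by cluster
bins = 30                    # bins only apply to a numeric x

[SeuratClusterStats.envs.stats.nCells_Sample]
group-by = "Sample"          # number of cells per sample in each cluster

[SeuratClusterStats.envs.stats.fracCells_Sample]
frac = "group"               # one of the documented frac choices
group-by = "Sample"
```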
Requires:
- r-seurat — check: `{{proc.lang}} -e "library(Seurat)"`
biopipen.ns.scrna.ModuleScoreCalculator(*args, **kwds) → Proc

Calculate the module scores for each cell.

The module scores are calculated by `Seurat::AddModuleScore()`, or by `Seurat::CellCycleScoring()` for cell cycle scores.

The module scores are calculated as the average expression levels of each program on the single-cell level, subtracted by the aggregated expression of control feature sets. All analyzed features are binned based on averaged expression, and the control features are randomly selected from each bin.
Input:
- srtobj: The seurat object loaded by `SeuratClustering`

Output:
- rdsfile: The seurat object with module scores added to the metadata
Envs:
defaults (ns) — The default parameters for `modules`.
- features: The features to calculate the scores on. Multiple features should be separated by commas. You can also specify `cc.genes` or `cc.genes.updated.2019` to use the cell cycle genes to calculate cell cycle scores; in that case, three columns will be added to the metadata: `S.Score`, `G2M.Score` and `Phase`. Only one type of cell cycle scores can be calculated at a time.
- nbin (type=int): Number of bins of aggregate expression levels for all analyzed features.
- ctrl (type=int): Number of control features selected from the same bin per analyzed feature.
- k (flag): Use feature clusters returned from `DoKMeans`.
- assay: The assay to use.
- seed (type=int): Set a random seed.
- search (flag): Search for symbol synonyms for features in `features` that don't match features in the object?
- keep (flag): Keep the scores for each feature? Only works for non-cell-cycle scores.
- agg (choice): The aggregation function to use. Only works for non-cell-cycle scores.
  - mean: The mean of the expression levels
  - median: The median of the expression levels
  - sum: The sum of the expression levels
  - max: The max of the expression levels
  - min: The min of the expression levels
  - var: The variance of the expression levels
  - sd: The standard deviation of the expression levels

modules (type=json) — The modules to calculate the scores for. Keys are the names of the expression programs and values are dicts inherited from `envs.defaults`. Here are some examples:

```
{
    "CellCycle": {"features": "cc.genes.updated.2019"},
    "Exhaustion": {"features": "HAVCR2,ENTPD1,LAYN,LAG3"},
    "Activation": {"features": "IFNG"},
    "Proliferation": {"features": "STMN1,TUBB"},
}
```

For `CellCycle`, the columns `S.Score`, `G2M.Score` and `Phase` will be added to the metadata. `S.Score` and `G2M.Score` are the cell cycle scores for each cell, and `Phase` is the cell cycle phase for each cell.

You can also add Diffusion Components (DC) to the modules. For example, `{"DC": {"features": 2, "kind": "diffmap"}}` will perform diffusion map as a reduction and add the first 2 components as `DC_1` and `DC_2` to the metadata. `diffmap` is a shortcut for `diffusion_map`. Other key-value pairs will be passed to `destiny::DiffusionMap()`. You can later plot the diffusion map by using `reduction = "DC"` in `envs.dimplots` of `SeuratClusterStats`. This requires the `SingleCellExperiment` and `destiny` R packages.
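For configuration files, the same modules can be sketched in TOML (values copied from the examples above; only the case names are arbitrary):

```toml
[ModuleScoreCalculator.envs.modules.CellCycle]
features = "cc.genes.updated.2019"

[ModuleScoreCalculator.envs.modules.Exhaustion]
features = "HAVCR2,ENTPD1,LAYN,LAG3"

[ModuleScoreCalculator.envs.modules.DC]
# diffusion components as described above: the first 2 components
# are added to the metadata as DC_1 and DC_2
features = 2
kind = "diffmap"
```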
biopipen.ns.scrna.CellsDistribution(*args, **kwds) → Proc

Distribution of cells (i.e. in a TCR clone) from different groups for each cluster.

This generates a set of pie charts with the proportion of cells in each cluster. Rows are the cell identities (i.e. TCR clones or TCR clusters) and columns are the groups (i.e. clinic groups).
```toml
[CellsDistribution.envs.mutaters]
# Add a Patient1_Tumor_Expanded column with the CDR3.aa
# that expands in the tumor of patient 1
Patient1_Tumor_Expanded = '''
  expanded(., region, "Tumor", subset = patient == "Lung1", uniq = FALSE)
'''

[CellsDistribution.envs.cases.Patient1_Tumor_Expanded]
cells_by = "Patient1_Tumor_Expanded"
cells_orderby = "desc(CloneSize)"
group_by = "region"
group_order = [ "Tumor", "Normal" ]
```
Input:
- srtobj: The seurat object in RDS format

Output:
- outdir: The output directory. The results for each case will be saved in a subdirectory.
Envs:
cases (type=json;order=99) — If you have multiple cases, you can specify them here. Keys are the names of the cases and values are the options above except `mutaters`. If some options are not specified, the options in `envs` will be used. If no cases are specified, a default case will be used with the case name `DEFAULT`.
cells_by — The column name in metadata to group the cells for the rows of the plot. If your cell groups have overlapping cells, you can also use multiple columns, separated by commas (`,`). These columns will be concatenated to form the cell groups. The overlapping cells will be counted multiple times for different groups, so make sure the cell group names in different columns are unique.
cells_n (type=int) — The max number of groups to show for each cell group identity (row). Ignored if `cells_order` is specified.
cells_order (list) — The order of the cells (rows) to show on the plot.
cells_orderby — An expression passed to `dplyr::arrange()` to order the cells (rows) of the plot. Only works when `cells_order` is not specified. The data frame passed to `dplyr::arrange()` is grouped by `cells_by` before ordering. You can have multiple expressions separated by semicolons (`;`); the expressions will be parsed by `rlang::parse_exprs()`. Four extra columns are added to the metadata for ordering the rows in the plot:
- `CloneSize`: The size (number of cells) of clones (identified by `cells_by`)
- `CloneGroupSize`: The clone size in each group (identified by `group_by`)
- `CloneClusterSize`: The clone size in each cluster (identified by `seurat_clusters`)
- `CloneGroupClusterSize`: The clone size in each group and cluster (identified by `group_by` and `seurat_clusters`)
cluster_orderby — The order of the clusters to show on the plot. An expression passed to `dplyr::summarise()` on the grouped data frame (by `seurat_clusters`). The summary stat will be passed to `dplyr::arrange()` to order the clusters. It is applied on the whole meta.data before grouping and subsetting. For example, you can order the clusters by the activation score of the cluster: `desc(mean(ActivationScore, na.rm = TRUE))`, supposing you have a column `ActivationScore` in the metadata.
descr — The description of the case, which will be shown in the report.
devpars (ns) — The device parameters for the plots of pie charts.
- res (type=int): The resolution of the plots
- height (type=int): The height of the plots
- width (type=int): The width of the plots
each — The column name in metadata to separate the cells into different plots.
group_by — The column name in metadata to group the cells for the columns of the plot.
group_order (list) — The order of the groups (columns) to show on the plot.
hm_devpars (ns) — The device parameters for the heatmaps.
- res (type=int): The resolution of the heatmaps
- height (type=int): The height of the heatmaps
- width (type=int): The width of the heatmaps
mutaters (type=json) — The mutaters to mutate the metadata. Keys are the names of the mutaters and values are the R expressions passed to `dplyr::mutate()` to mutate the metadata.
There are also 4 helper functions, `expanded`, `collapsed`, `emerged` and `vanished`, which can be used to identify the expanded/collapsed/emerged/vanished groups (i.e. TCR clones). See also https://pwwang.github.io/immunopipe/configurations/#mutater-helpers. For example, you can use `{"Patient1_Tumor_Expanded_Clones": "expanded(., Source, 'Tumor', subset = Patient == 'Patient1', uniq = FALSE)"}` to create a new column in metadata named `Patient1_Tumor_Expanded_Clones` with the expanded clones in the tumor sample (compared to the normal sample) of patient 1. The values in this column will be `NA` for other clones. Those functions take the following arguments:
- `df`: The metadata data frame. You can use `.` to refer to it.
- `group.by`: The column name in metadata to group the cells.
- `idents`: The first group or both groups of cells to compare (values in the `group.by` column). If only the first group is given, the rest of the cells (with non-NA values in the `group.by` column) will be used as the second group.
- `subset`: An expression to subset the cells, which will be passed to `dplyr::filter()`. Default is `TRUE` (no filtering).
- `each`: A column name (without quotes) in metadata to split the cells. Each comparison will be done for each value in this column (typically each patient or subject).
- `id`: The column name in metadata for the group ids (i.e. `CDR3.aa`).
- `compare`: Either a (numeric) column name (i.e. `Clones`) in metadata to compare between groups, or `.n` to compare the number of cells in each group. If a numeric column is given, the values should be the same for all cells in the same group; this will not be checked (only the first value is used). It is helpful to use `Clones` to take the raw clone size from the TCR data, in case the cells are not completely mapped to the RNA data. Also, if you have `subset` set or NAs in the `group.by` column, you should use `.n` to compare the number of cells in each group.
- `uniq`: Whether to return unique ids or not. Default is `TRUE`. If `FALSE`, you can mutate the metadata frame with the returned ids, for example, `df |> mutate(expanded = expanded(...))`.
- `debug`: Return the data frame with intermediate columns instead of the ids. Default is `FALSE`.
- `order`: The expression passed to `dplyr::arrange()` to order the intermediate data frame and get the ids in order accordingly. The intermediate data frame includes the following columns:
  - `<id>`: The ids of clones (i.e. `CDR3.aa`).
  - `<each>`: The values in the `each` column.
  - `ident_1`: The size of clones in the first group.
  - `ident_2`: The size of clones in the second group.
  - `.diff`: The difference between the sizes of clones in the first and second groups.
  - `.sum`: The sum of the sizes of clones in the first and second groups.
  - `.predicate`: Whether the clone is expanded/collapsed/emerged/vanished.
- `include_emerged`: Whether to include the emerged group for `expanded` (only works for `expanded`). Default is `FALSE`.
- `include_vanished`: Whether to include the vanished group for `collapsed` (only works for `collapsed`). Default is `FALSE`.
You can also use `top()` to get the top clones (i.e. the clones with the largest size) in each group. For example, you can use `{"Patient1_Top10_Clones": "top(subset = Patient == 'Patient1', uniq = FALSE)"}` to create a new column in metadata named `Patient1_Top10_Clones`. The values in this column will be `NA` for other clones. This function takes the following arguments:
- `df`: The metadata data frame. You can use `.` to refer to it.
- `id`: The column name in metadata for the group ids (i.e. `CDR3.aa`).
- `n`: The number of top clones to return. Default is `10`. If n < 1, it will be treated as a percentage of the size of the group. Specify `0` to get all clones.
- `compare`: Either a (numeric) column name (i.e. `Clones`) in metadata to compare between groups, or `.n` to compare the number of cells in each group. If a numeric column is given, the values should be the same for all cells in the same group; this will not be checked (only the first value is used). It is helpful to use `Clones` to take the raw clone size from the TCR data, in case the cells are not completely mapped to the RNA data. Also, if you have `subset` set or NAs in the `group.by` column, you should use `.n` to compare the number of cells in each group.
- `subset`: An expression to subset the cells, which will be passed to `dplyr::filter()`. Default is `TRUE` (no filtering).
- `each`: A column name (without quotes) in metadata to split the cells. Each comparison will be done for each value in this column (typically each patient or subject).
- `uniq`: Whether to return unique ids or not. Default is `TRUE`. If `FALSE`, you can mutate the metadata frame with the returned ids, for example, `df |> mutate(expanded = expanded(...))`.
- `debug`: Return the data frame with intermediate columns instead of the ids. Default is `FALSE`.
- `with_ties`: Whether to include ties (i.e. clones with the same size as the last clone) or not. Default is `FALSE`.
overlap
(list) — Plot the overlap of cell groups (values ofcells_by
) in different casesunder the same section. The section must have at least 2 cases, each case should have a singlecells_by
column.prefix_each
(flag) — Whether to prefix theeach
column name to thevalue as the case/section name.section
— The section to show in the report. This allows different cases to be put in the same section in report.Only works wheneach
is not specified. Thesection
is used to collect cases and put the results under the same directory and the same section in report. Wheneach
for a case is specified, thesection
will be ignored and case name will be used assection
. The cases will be the expanded values ineach
column. Whenprefix_each
is True, the column name specified byeach
will be prefixed to each value as directory name and expanded case name.subset
— An expression to subset the cells, will be passed todplyr::filter()
on metadata.This will be applied prior toeach
.
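A further hedged sketch combining a `top()` mutater with a per-patient split; `region` and `patient` are the same example metadata columns used above, and the values are illustrative:

```toml
[CellsDistribution.envs.mutaters]
# top 10 clones overall, returned per cell so the column can be
# used as a cell grouping
Top10_Clones = "top(n = 10, uniq = FALSE)"

[CellsDistribution.envs.cases.Top10_Clones]
cells_by = "Top10_Clones"
group_by = "region"
each = "patient"   # one plot per patient
```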
Requires:
- r-dplyr — check: `{{proc.lang}} -e "library(dplyr)"`
- r-seurat — check: `{{proc.lang}} -e "library(Seurat)"`
- r-tidyr — check: `{{proc.lang}} -e "library(tidyr)"`
biopipen.ns.scrna.SeuratMetadataMutater(*args, **kwds) → Proc

Mutate the metadata of the seurat object.
Input:
- metafile: Additional metadata. A tab-delimited file with meta columns as columns and cells as rows.
- srtobj: The seurat object loaded by `SeuratPreparing`

Output:
- rdsfile: The seurat object with the additional metadata
Envs:
mutaters (type=json) — The mutaters to mutate the metadata. The key-value pairs will be passed to `dplyr::mutate()` to mutate the metadata. The same 4 helper functions, `expanded`, `collapsed`, `emerged` and `vanished` (to identify the expanded/collapsed/emerged/vanished groups, i.e. TCR clones), as well as `top()` (to get the top clones in each group), are available here with the same arguments; see the `mutaters` documentation under `CellsDistribution` above, and https://pwwang.github.io/immunopipe/configurations/#mutater-helpers.
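A minimal TOML sketch, assuming hypothetical metadata columns `Sample` and `SampleType`; the helper call follows the pattern documented above:

```toml
[SeuratMetadataMutater.envs.mutaters]
# plain dplyr expression (the "Tumor" naming of Sample is assumed)
SampleType = "if_else(grepl('Tumor', Sample), 'Tumor', 'Normal')"
# helper function, as documented for CellsDistribution
Tumor_Expanded = "expanded(., SampleType, 'Tumor', uniq = FALSE)"
```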
Requires:
- r-dplyr — check: `{{proc.lang}} <(echo "library(dplyr)")`
- r-seurat — check: `{{proc.lang}} <(echo "library(Seurat)")`
- r-tibble — check: `{{proc.lang}} <(echo "library(tibble)")`
biopipen.ns.scrna.DimPlots(*args, **kwds) → Proc

Seurat - Dimensional reduction plots.
Input:
- configfile: A toml configuration file with "cases". If this is given, `envs.cases` will be overridden.
- name: The name of the job, used in the report.
- srtobj: The seurat object in RDS format

Output:
- outdir: The output directory

Envs:
cases — The cases for the dim plots. Keys are the names and values are the arguments to `Seurat::DimPlot()`.
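A hedged TOML sketch of two cases; `reduction` and `group.by` are standard `Seurat::DimPlot()` arguments, and the values are illustrative (note that keys containing dots must be quoted in TOML):

```toml
[DimPlots.envs.cases.'UMAP by cluster']
reduction = "umap"
"group.by" = "seurat_clusters"

[DimPlots.envs.cases.'UMAP by sample']
reduction = "umap"
"group.by" = "Sample"   # hypothetical metadata column
```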
biopipen.ns.scrna.MarkersFinder(*args, **kwds) → Proc

Find markers between different groups of cells.

When only `group-by` is specified as `"seurat_clusters"` in `envs.cases`, the markers will be found for all the clusters.

You can also find the differentially expressed genes between any two groups of cells by setting `group-by` to a different column name in metadata. Follow `envs.cases` for more details.
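A hedged TOML sketch of the two usage patterns just described; `Source`, `Tumor` and `Normal` are illustrative metadata values, not prescribed names:

```toml
# markers for every cluster, "group vs rest"
[MarkersFinder.envs.cases.Cluster_Markers]
group-by = "seurat_clusters"

# differential expression between two specific groups
[MarkersFinder.envs.cases.Tumor_vs_Normal]
group-by = "Source"
ident-1 = "Tumor"
ident-2 = "Normal"
```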
Input:
- srtobj: The seurat object loaded by `SeuratPreparing`. If you have your `Seurat` object prepared by yourself, you can also use it here, but you should make sure that the object has been processed by `PrepSCTFindMarkers` if the data was normalized using `SCTransform`.

Output:
- outdir: The output directory for the markers and plots
Envs:
assay — The assay to use.
cache (type=auto) — Where to cache the `FindAllMarkers` results. If `True`, cache to the `outdir` of the job. If `False`, don't cache. Otherwise, specify the directory to cache to. Only works when `use_presto` is `False` (presto works fast enough).
cases (type=json) — If you have multiple cases, you can specify them here. The keys are the names of the cases and the values are the above options except `ncores` and `mutaters`. If some options are not specified, the default values specified above will be used. If no cases are specified, the default case will be added with the default values under `envs` with the name `DEFAULT`.
dbs (list) — The dbs to do enrichment analysis for significant markers. See all libraries at https://maayanlab.cloud/Enrichr/#libraries
dotplot (ns) — Arguments for `Seurat::DotPlot()`. Use `-` in place of `.` in the argument name; for example, use `group-bar` instead of `group.bar`. Note that `object`, `features`, and `group-by` are already specified by this process, so you don't need to specify them here.
- maxgenes (type=int): The maximum number of genes to plot.
- devpars (ns): The device parameters for the plots.
  - res (type=int): The resolution of the plots.
  - height (type=int): The height of the plots.
  - width (type=int): The width of the plots.
- Other arguments: See https://satijalab.org/seurat/reference/doheatmap
each — The column name in metadata to separate the cells into different cases.
group-by — The column name in metadata to group the cells. If only `group-by` is specified, and `ident-1` and `ident-2` are not, markers will be found for all groups in this column in a "group vs rest" manner. The `NA` group will be ignored.
ident-1 — The first group of cells to compare.
ident-2 — The second group of cells to compare. If not provided, the rest of the cells are used for `ident-2`.
(type=json) — The mutaters to mutate the metadataThere are also also 4 helper functions,expanded
,collapsed
,emerged
andvanished
, which can be used to identify the expanded/collpased/emerged/vanished groups (i.e. TCR clones). See also https://pwwang.github.io/immunopipe/configurations/#mutater-helpers. For example, you can use{"Patient1_Tumor_Collapsed_Clones": "expanded(., Source, 'Tumor', subset = Patent == 'Patient1', uniq = FALSE)"}
to create a new column in metadata namedPatient1_Tumor_Collapsed_Clones
with the collapsed clones in the tumor sample (compared to the normal sample) of patient 1. The values in this columns for other clones will beNA
. Those functions take following arguments:- *
- df: The metadata data frame. You can use . to refer to it.
- group.by: The column name in metadata to group the cells.
- idents: The first group or both groups of cells to compare (values in the group.by column). If only the first group is given, the rest of the cells (with non-NA in the group.by column) will be used as the second group.
- subset: An expression to subset the cells, will be passed to dplyr::filter(). Default is TRUE (no filtering).
- each: A column name (without quotes) in metadata to split the cells. Each comparison will be done for each value in this column (typically each patient or subject).
- id: The column name in metadata for the group ids (i.e. CDR3.aa).
- compare: Either a (numeric) column name (i.e. Clones) in metadata to compare between groups, or .n to compare the number of cells in each group.
  If a numeric column is given, the values should be the same for all cells in the same group. This will not be checked (only the first value is used).
  It is helpful to use Clones to use the raw clone size from TCR data, in case the cells are not completely mapped to RNA data.
  Also, if you have subset set or NAs in the group.by column, you should use .n to compare the number of cells in each group.
- uniq: Whether to return unique ids or not. Default is TRUE. If FALSE, you can mutate the metadata frame with the returned ids, for example, df |> mutate(expanded = expanded(...)).
- debug: Return the data frame with intermediate columns instead of the ids. Default is FALSE.
- order: The expression passed to dplyr::arrange() to order the intermediate data frame and get the ids in order accordingly.
  The intermediate data frame includes the following columns:
  - <id>: The ids of clones (i.e. CDR3.aa).
  - <each>: The values in the each column.
  - ident_1: The size of clones in the first group.
  - ident_2: The size of clones in the second group.
  - .diff: The difference between the sizes of clones in the first and second groups.
  - .sum: The sum of the sizes of clones in the first and second groups.
  - .predicate: Whether the clone is expanded/collapsed/emerged/vanished.
- include_emerged: Whether to include the emerged group for expanded (only works for expanded). Default is FALSE.
- include_vanished: Whether to include the vanished group for collapsed (only works for collapsed). Default is FALSE.
There is also a helper function top(), to get the top clones (i.e. the clones with the largest size) in each group.
For example, you can use {"Patient1_Top10_Clones": "top(subset = Patient == 'Patient1', uniq = FALSE)"} to create a new column in metadata named Patient1_Top10_Clones.
The values in this column for other clones will be NA.
This function takes the following arguments:
- df: The metadata data frame. You can use . to refer to it.
- id: The column name in metadata for the group ids (i.e. CDR3.aa).
- n: The number of top clones to return. Default is 10.
  If n < 1, it will be treated as the percentage of the size of the group. Specify 0 to get all clones.
- compare: Either a (numeric) column name (i.e. Clones) in metadata to compare between groups, or .n to compare the number of cells in each group.
  If a numeric column is given, the values should be the same for all cells in the same group. This will not be checked (only the first value is used).
  It is helpful to use Clones to use the raw clone size from TCR data, in case the cells are not completely mapped to RNA data.
  Also, if you have subset set or NAs in the group.by column, you should use .n to compare the number of cells in each group.
- subset: An expression to subset the cells, will be passed to dplyr::filter(). Default is TRUE (no filtering).
- each: A column name (without quotes) in metadata to split the cells. Each comparison will be done for each value in this column (typically each patient or subject).
- uniq: Whether to return unique ids or not. Default is TRUE. If FALSE, you can mutate the metadata frame with the returned ids, for example, df |> mutate(top_clones = top(...)).
- debug: Return the data frame with intermediate columns instead of the ids. Default is FALSE.
- with_ties: Whether to include ties (i.e. clones with the same size as the last clone) or not. Default is FALSE.
ncores (type=int) — Number of cores to use for parallel computing for some Seurat procedures.
Used in future::plan(strategy = "multicore", workers = <ncores>) to parallelize some Seurat procedures.
See also: https://satijalab.org/seurat/articles/future_vignette.html
overlap (json) — The sections to do overlapping analysis, including Venn diagram and UpSet plot.
The Venn diagram and UpSet plot will be plotted for the overlapping of significant markers between different cases.
The keys of this option are the names of the sections; the values are dicts of options with keys venn and upset, whose values will be inherited from envs.overlap_defaults, recursively.
You can set envs.overlap.<section>.venn to False/None to disable the Venn diagram for the section.
This option works when each is specified; in such a case, the sections will be the case names.
It does not work for the cases where ident-1 is not specified. In case you want to do such analysis for those cases, you should enumerate the idents in different cases and specify them here.
overlap_defaults (ns) — The default options for the overlapping analysis.
- venn (ns): The options for the Venn diagram.
  A Venn diagram can only be plotted for sections with no more than 4 cases.
  - devpars (ns): The device parameters for the plots.
    - res (type=int): The resolution of the plots.
    - height (type=int): The height of the plots.
    - width (type=int): The width of the plots.
- upset (ns): The options for the UpSet plot.
  - devpars (ns): The device parameters for the plots.
    - res (type=int): The resolution of the plots.
    - height (type=int): The height of the plots.
    - width (type=int): The width of the plots.
prefix_each (flag) — Whether to prefix the each column name to the value as the case/section name.
prefix_group (flag) — When neither ident-1 nor ident-2 is specified, should we prefix the group name to the section name?
rest (ns) — Rest arguments for Seurat::FindMarkers().
Use - to replace . in the argument name. For example, use min-pct instead of min.pct.
This only works when use_presto is False.
section — The section name for the report. It must not contain a colon (:).
Ignored when each is not specified and ident-1 is specified.
When neither each nor ident-1 is specified, the case name will be used as the section name.
If each is specified, the section name will be constructed from each and the case name.
The section is used to collect cases and put the results under the same directory and the same section in the report.
When each for a case is specified, the section will be ignored and the case name will be used as the section; the cases will be the expanded values in the each column.
When prefix_each is True, the column name specified by each will be prefixed to each value as the directory name and the expanded case name.
sigmarkers — An expression passed to dplyr::filter() to filter the significant markers for enrichment analysis.
Available variables are p_val, avg_log2FC, pct.1, pct.2 and p_val_adj.
For example, "p_val_adj < 0.05 & abs(avg_log2FC) > 1" selects markers with adjusted p-value < 0.05 and absolute log2 fold change > 1.
subset — An expression to subset the cells for each case.
volcano_genes (type=auto) — The genes to label in the volcano plot if they are significant markers.
If True, all significant markers will be labeled. If False, no genes will be labeled.
Otherwise, specify the genes to label: either a string of comma-separated genes, or a list of genes.
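To make the options above concrete, here is a minimal sketch (not the canonical usage) of wiring MarkersFinder into a pipen pipeline in Python. The input path, the metadata columns (Source, Patient) and the Enrichr library are hypothetical placeholders for your own data.

    # A minimal sketch, assuming a Seurat object RDS prepared by
    # SeuratPreparing; all paths and column names are hypothetical.
    from pipen import Pipen
    from biopipen.ns.scrna import MarkersFinder

    class MyMarkersFinder(MarkersFinder):
        # Used here as a start process for illustration; in a real
        # pipeline this would typically depend on SeuratPreparing.
        input_data = ["/path/to/prepared_sobj.rds"]  # hypothetical path
        envs = {
            # Flag the expanded tumor clones of Patient1 with the
            # expanded() helper described above
            "mutaters": {
                "Patient1_Tumor_Expanded_Clones":
                    "expanded(., Source, 'Tumor', "
                    "subset = Patient == 'Patient1', uniq = FALSE)",
            },
            # Markers for each value of this column vs the rest
            "group-by": "Patient1_Tumor_Expanded_Clones",
            "sigmarkers": "p_val_adj < 0.05 & abs(avg_log2FC) > 1",
            "dbs": ["KEGG_2021_Human"],  # hypothetical Enrichr library
        }

    if __name__ == "__main__":
        Pipen("MarkersPipeline").set_starts(MyMarkersFinder).run()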
biopipen.ns.scrna.TopExpressingGenes(*args, **kwds) → Proc
Find the top expressing genes in each cluster
Input:
srtobj — The seurat object in RDS format
Output:
outdir — The output directory for the tables and plots
Envs:
cases (type=json) — If you have multiple cases, you can specify them here.
The keys are the names of the cases and the values are the above options except mutaters.
If some options are not specified, the default values specified above will be used.
If no cases are specified, the default case will be added with the default values under envs with the name DEFAULT.
dbs (list) — The dbs to do enrichment analysis for significant markers.
See https://maayanlab.cloud/Enrichr/#libraries for all available libraries.
each — The column name in metadata to separate the cells into different cases. When specified, ident must be specified.
group-by — The column name in metadata to group the cells.
ident — The group of cells to find the top expressing genes for.
The cells will be selected by the group-by column with this ident value in metadata.
If not provided, the top expressing genes will be found for all groups of cells in the group-by column.
mutaters (type=json) — The mutaters to mutate the metadata.
n (type=int) — The number of top expressing genes to find.
prefix_each (flag) — Whether to prefix the each column name to the value as the case/section name.
section — The section name for the report.
Only used when each is not specified and ident is specified; otherwise, the section name will be constructed from each and group-by.
If DEFAULT, and it's the only section, it is not included in the case/section names.
subset — An expression to subset the cells for each case.
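As a sketch of how these options fit together, one might request the top genes of a single cluster like this; the cluster column, ident value and library are hypothetical:

    # A minimal sketch, assuming a clustered Seurat object with a
    # seurat_clusters metadata column (hypothetical).
    from biopipen.ns.scrna import TopExpressingGenes

    class TopGenesCluster0(TopExpressingGenes):
        envs = {
            "group-by": "seurat_clusters",
            "ident": "0",   # only cells of cluster 0
            "n": 100,       # report the 100 top expressing genes
            "dbs": ["GO_Biological_Process_2023"],  # hypothetical library
        }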
biopipen.ns.scrna.ExprImputation(*args, **kwds) → Proc
This process imputes the dropout values in scRNA-seq data.
It takes the Seurat object as input and outputs the Seurat object with imputed expression data.
References:
- Linderman, George C., Jun Zhao, and Yuval Kluger. "Zero-preserving imputation of scRNA-seq data using low-rank approximation." BioRxiv (2018): 397588.
- Li, Wei Vivian, and Jingyi Jessica Li. "An accurate and robust imputation method scImpute for single-cell RNA-seq data." Nature communications 9.1 (2018): 997.
- Dijk, David van, et al. "MAGIC: A diffusion-based imputation method reveals gene-gene interactions in single-cell RNA-sequencing data." BioRxiv (2017): 111591.
Input:
infile — The input file in RDS format of Seurat object
Output:
outfile — The output file in RDS format of Seurat object.
Note that with rmagic and alra, the original default assay will be renamed to RAW and the imputed RNA assay will be renamed to RNA and set as the default assay.
Envs:
alra_args (type=json) — The arguments for RunALRA()
rmagic_args (ns) — The arguments for rmagic
- python: The python path where magic-impute is installed.
scimpute_args (ns) — The arguments for scimpute
- drop_thre (type=float): The dropout threshold
- kcluster (type=int): Number of clusters to use
- ncores (type=int): Number of cores to use
- refgene: The reference gene file
tool (choice) — Either alra, scimpute or rmagic
- alra: Use RunALRA() from Seurat
- scimpute: Use scImpute() from scImpute
- rmagic: Use magic() from Rmagic
Requires:
magic-impute —
- if: {{proc.envs.tool == "rmagic"}}
- check: {{proc.envs.rmagic_args.python}} -c "import magic"
r-dplyr —
- if: {{proc.envs.tool == "scimpute"}}
- check: {{proc.lang}} <(echo "library(dplyr)")
r-rmagic —
- if: {{proc.envs.tool == "rmagic"}}
- check: |
    {{proc.lang}} <(
      echo "
        tryCatch(
          { setwd(dirname(Sys.getenv('CONDA_PREFIX'))) },
          error = function(e) NULL
        );
        library(Rmagic)
      "
    )
r-scimpute —
- if: {{proc.envs.tool == "scimpute"}}
- check: {{proc.lang}} <(echo "library(scImpute)")
r-seurat —
- check: {{proc.lang}} <(echo "library(Seurat)")
r-seuratwrappers —
- if: {{proc.envs.tool == "alra"}}
- check: {{proc.lang}} <(echo "library(SeuratWrappers)")
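A minimal sketch of selecting the imputation tool via envs; the interpreter path is a hypothetical placeholder and only matters when tool is rmagic:

    # A minimal sketch, assuming magic-impute is installed in the given
    # (hypothetical) python environment.
    from biopipen.ns.scrna import ExprImputation

    class ImputeWithRmagic(ExprImputation):
        envs = {
            "tool": "rmagic",  # one of: alra, scimpute, rmagic
            # Used by the magic-impute requirement check above
            "rmagic_args": {"python": "/path/to/python"},
        }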
biopipen.ns.scrna.SCImpute(*args, **kwds) → Proc
Impute the dropout values in scRNA-seq data.
Deprecated. Use ExprImputation instead.
Input:
groupfile — The file to subset the matrix or label the cells. Could be an output from ImmunarchFilter.
infile — The input file for imputation. Either a SeuratObject or a matrix of count/TPM.
Output:
outfile — The output matrix
Envs:
infmt — The input format. Either seurat or matrix.
biopipen.ns.scrna.SeuratFilter(*args, **kwds) → Proc
Filtering cells from a seurat object
Input:
filters — The filters to apply. Could be a file or string in TOML, or a python dictionary, with the following keys:
- mutaters: Create new columns in the metadata
- filter: An R expression that will be passed to subset(sobj, subset = ...) to filter the cells
srtobj — The seurat object in RDS
Output:
outfile — The filtered seurat object in RDS
Envs:
invert — Invert the selection?
Requires:
r-dplyr —
- check: {{proc.lang}} <(echo "library('dplyr')")
r-seurat —
- check: {{proc.lang}} <(echo "library('Seurat')")
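For instance, a minimal sketch of the filters input as a Python dictionary; the metadata columns used (percent.mt, seurat_clusters) are hypothetical:

    # A minimal sketch of a `filters` specification.
    filters = {
        # mutaters: create helper columns in the metadata first
        "mutaters": {"high_mito": "percent.mt > 10"},
        # filter: an R expression passed to subset(sobj, subset = ...)
        "filter": "!high_mito & seurat_clusters != '3'",
    }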
biopipen.ns.scrna.SeuratSubset(*args, **kwds) → Proc
Subset a seurat object into multiple seurat objects
Input:
srtobj — The seurat object in RDS
subsets — The subsettings to apply. Could be a file or string in TOML, or a python dictionary, with the following keys:
- <name>: Name of the case
  - mutaters: Create new columns in the metadata
  - subset: An R expression that will be passed to subset(sobj, subset = ...)
  - groupby: The column to group by; each value will be a case.
    If groupby is given, subset will be ignored and each value of the groupby column will be a case.
Output:
outdir — The output directory with the subset seurat objects
Envs:
ignore_nas — Ignore NA values?
Requires:
r-dplyr —
- check: {{proc.lang}} <(echo "library('dplyr')")
r-seurat —
- check: {{proc.lang}} <(echo "library('Seurat')")
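A minimal sketch of the subsets input as a Python dictionary; the case names and metadata columns (Sample, seurat_clusters) are hypothetical:

    # A minimal sketch of a `subsets` specification.
    subsets = {
        "per_sample": {
            # groupby: one output Seurat object per value of Sample
            "groupby": "Sample",
        },
        "selected_clusters": {
            # subset: an R expression passed to subset(sobj, subset = ...)
            "subset": "seurat_clusters %in% c('1', '2')",
        },
    }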
biopipen.ns.scrna.SeuratSplit(*args, **kwds) → Proc
Split a seurat object into multiple seurat objects
Input:
by — The metadata column to split by
srtobj — The seurat object in RDS
Output:
outdir — The output directory with the split seurat objects
Envs:
by — The metadata column to split by. Ignored if by is given in the input.
recell — Rename the cell ids using the by column.
A string of an R function taking the original cell ids and the by values.
biopipen.ns.scrna.Subset10X(*args, **kwds) → Proc
Subset 10X data, mostly used for testing
Requires r-matrix to load matrix.mtx.gz
indir — The input directory
outdir — The output directory
feats_to_keep — The features/genes to keep. The final features list will be feats_to_keep + nfeats.
ncells — The number of cells to keep. If <=1, it will be treated as the percentage of cells to keep.
nfeats — The number of features to keep. If <=1, it will be treated as the percentage of features to keep.
seed — The seed for the random number generator
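For illustration, a subclass pinning these envs down might look like the following sketch. The gene names and values are hypothetical; unspecified envs keep their defaults.

from biopipen.ns.scrna import Subset10X

class Subset10XTest(Subset10X):
    """Subsample a 10X run into a small test dataset."""
    envs = {
        "ncells": 0.1,    # <=1 is interpreted as a fraction: keep 10% of cells
        "nfeats": 1000,   # keep 1000 features on top of feats_to_keep
        "feats_to_keep": ["CD3E", "CD4", "CD8A"],  # always keep these genes
        "seed": 8525,     # make the subsampling reproducible
    }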
__init_subclass__() — Do the requirements inferring, since we need them to build up the process relationship
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself
gc() — GC process for the process to save memory after it's done
init() — Init all other properties and jobs
log(level, msg, *args, logger) — Log message for the process
run() — Run the process
biopipen.ns.scrna.SeuratTo10X(*args, **kwds) → Proc
Write a Seurat object to 10X format, using write10xCounts from DropletUtils.
cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary from the docstring by default.
dirsig — When checking the signature for caching, whether we should walk through the content of the directory. This can be time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common options across jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy — How to deal with the errors: retry, ignore, halt.
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
forks — How many jobs to run simultaneously?
input — The keys for the input channel.
input_data — The input data (will be computed for dependent processes).
lang — The language for the script to run. Should be the path to the interpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships.
num_retries — How many times to retry the jobs once an error occurs.
order — The execution order for this process. The bigger the number is, the later the process will be executed. Default: 0. Note that dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts().
output — The output keys for the output channel (the data will be computed).
output_data — The output data (to pass to the next processes).
plugin_opts — Options for process-level plugins.
requires — The dependency processes.
scheduler — The scheduler to run the jobs.
scheduler_opts — The options for the scheduler.
script — The script template for the process.
submission_batch — How many jobs to be submitted simultaneously.
template — Define the template engine to use. This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine (currently jinja2 and liquidpy are supported) or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.
srtobj — The seurat object in RDS
outdir — The output directory. When envs.split_by is specified, a subdirectory will be created for each distinct value of that column; otherwise, the matrices will be written to the output directory.
version — The version of the 10X format
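A minimal sketch of running this process on its own. The input path is a placeholder, and the pipeline name is arbitrary.

from pipen import Pipen
from biopipen.ns.scrna import SeuratTo10X

class SeuratTo10XV3(SeuratTo10X):
    """Write the matrices in 10X v3 layout."""
    envs = {"version": "3"}

# "sobj.rds" is a hypothetical path to your saved Seurat object
Pipen(name="seurat-to-10x").set_starts(SeuratTo10XV3).set_data(["sobj.rds"]).run()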
__init_subclass__() — Do the requirements inferring, since we need them to build up the process relationship
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself
gc() — GC process for the process to save memory after it's done
init() — Init all other properties and jobs
log(level, msg, *args, logger) — Log message for the process
run() — Run the process
biopipen.ns.scrna.ScFGSEA(*args, **kwds) → Proc
Gene set enrichment analysis for cells in different groups using fgsea.
This process performs Gene Set Enrichment Analysis (GSEA) on the expression data, based on a variety of groupings, including those from the metadata as well as from the scTCR-seq data.
The GSEA is done using the fgsea package, which can quickly and accurately calculate arbitrarily low GSEA p-values for a collection of gene sets. The fgsea package is based on the fast algorithm for preranked GSEA described in Subramanian et al. 2005.
For each case, the process will generate a table with the enrichment scores for each gene set, and GSEA plots for the top gene sets.
cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary from the docstring by default.
dirsig — When checking the signature for caching, whether we should walk through the content of the directory. This can be time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common options across jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy — How to deal with the errors: retry, ignore, halt.
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
forks — How many jobs to run simultaneously?
input — The keys for the input channel.
input_data — The input data (will be computed for dependent processes).
lang — The language for the script to run. Should be the path to the interpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships.
num_retries — How many times to retry the jobs once an error occurs.
order — The execution order for this process. The bigger the number is, the later the process will be executed. Default: 0. Note that dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts().
output — The output keys for the output channel (the data will be computed).
output_data — The output data (to pass to the next processes).
plugin_opts — Options for process-level plugins.
requires — The dependency processes.
scheduler — The scheduler to run the jobs.
scheduler_opts — The options for the scheduler.
script — The script template for the process.
submission_batch — How many jobs to be submitted simultaneously.
template — Define the template engine to use. This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine (currently jinja2 and liquidpy are supported) or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.
srtobj — The seurat object in RDS format
outdir — The output directory for the results and plots
cases (type=json;order=99) — If you have multiple cases, you can specify them here. The keys are the names of the cases and the values are the above options except mutaters. If some options are not specified, the default values specified above will be used. If no cases are specified, a default case will be added with the name DEFAULT.
each — The column name in metadata to separate the cells into different subsets on which to do the analysis.
eps (type=float) — This parameter sets the boundary for calculating the p-value. See https://rdrr.io/bioc/fgsea/man/fgseaMultilevel.html
gmtfile — The pathways in GMT format, with the gene names/ids in the same format as the seurat object. One could also use a URL to a GMT file, for example from https://download.baderlab.org/EM_Genesets/current_release/Human/symbol/Pathways/.
group-by — The column name in metadata to group the cells.
ident-1 — The first group of cells to compare.
ident-2 — The second group of cells to compare. If not provided, the rest of the cells that are not NAs in the group-by column are used for ident-2.
maxsize (type=int) — Maximal size of a gene set to test. All pathways above the threshold are excluded.
method (choice) — The method to do the preranking.
- - signal_to_noise: Signal to noise, (mean1 - mean2) / (sd1 + sd2). The larger the difference of the means (scaled by the standard deviations), the more distinct the gene expression is in each phenotype and the more the gene acts as a "class marker".
- - s2n: Alias of signal_to_noise.
- - abs_signal_to_noise: The absolute value of signal_to_noise.
- - abs_s2n: Alias of abs_signal_to_noise.
- - t_test: T test. Uses the difference of means scaled by the standard deviation and number of samples.
- - ratio_of_classes: Also referred to as fold change. Uses the ratio of class means to calculate fold change for natural scale data.
- - diff_of_classes: Difference of class means. Uses the difference of class means to calculate fold change for natural scale data.
- - log2_ratio_of_classes: Log2 ratio of class means. Uses the log2 ratio of class means to calculate fold change for natural scale data. This is the recommended statistic for calculating fold change for log scale data.
minsize (type=int) — Minimal size of a gene set to test. All pathways below the threshold are excluded.
mutaters (type=json) — The mutaters to mutate the metadata. The key-value pairs will be passed to dplyr::mutate() to mutate the metadata.
There are also 4 helper functions, expanded, collapsed, emerged and vanished, which can be used to identify the expanded/collapsed/emerged/vanished groups (i.e. TCR clones). See also https://pwwang.github.io/immunopipe/configurations/#mutater-helpers. For example, you can use {"Patient1_Tumor_Expanded_Clones": "expanded(., Source, 'Tumor', subset = Patient == 'Patient1', uniq = FALSE)"} to create a new column in metadata named Patient1_Tumor_Expanded_Clones with the clones expanded in the tumor sample (compared to the normal sample) of patient 1. The values in this column for other clones will be NA. Those functions take the following arguments:
- - df: The metadata data frame. You can use the . to refer to it.
- - group.by: The column name in metadata to group the cells.
- - idents: The first group or both groups of cells to compare (values in the group.by column). If only the first group is given, the rest of the cells (with non-NA in the group.by column) will be used as the second group.
- - subset: An expression to subset the cells; will be passed to dplyr::filter(). Default is TRUE (no filtering).
- - each: A column name (without quotes) in metadata to split the cells. Each comparison will be done for each value in this column (typically each patient or subject).
- - id: The column name in metadata for the group ids (i.e. CDR3.aa).
- - compare: Either a (numeric) column name (i.e. Clones) in metadata to compare between groups, or .n to compare the number of cells in each group. If a numeric column is given, the values should be the same for all cells in the same group. This will not be checked (only the first value is used). It is helpful to use Clones to use the raw clone size from TCR data, in case the cells are not completely mapped to RNA data. Also, if you have subset set or NAs in the group.by column, you should use .n to compare the number of cells in each group.
- - uniq: Whether to return unique ids or not. Default is TRUE. If FALSE, you can mutate the metadata frame with the returned ids, for example df |> mutate(expanded = expanded(...)).
- - debug: Return the data frame with intermediate columns instead of the ids. Default is FALSE.
- - order: The expression passed to dplyr::arrange() to order the intermediate data frame and get the ids in order accordingly. The intermediate data frame includes the following columns:
  - <id>: The ids of clones (i.e. CDR3.aa).
  - <each>: The values in the each column.
  - ident_1: The size of clones in the first group.
  - ident_2: The size of clones in the second group.
  - .diff: The difference between the sizes of clones in the first and second groups.
  - .sum: The sum of the sizes of clones in the first and second groups.
  - .predicate: Showing whether the clone is expanded/collapsed/emerged/vanished.
- - include_emerged: Whether to include the emerged group for expanded (only works for expanded). Default is FALSE.
- - include_vanished: Whether to include the vanished group for collapsed (only works for collapsed). Default is FALSE.
There is also a helper top() to get the top clones (i.e. the clones with the largest size) in each group. For example, you can use {"Patient1_Top10_Clones": "top(subset = Patient == 'Patient1', uniq = FALSE)"} to create a new column in metadata named Patient1_Top10_Clones. The values in this column for other clones will be NA. This function takes the following arguments:
- - df: The metadata data frame. You can use the . to refer to it.
- - id: The column name in metadata for the group ids (i.e. CDR3.aa).
- - n: The number of top clones to return. Default is 10. If n < 1, it will be treated as the percentage of the size of the group. Specify 0 to get all clones.
- - compare: Either a (numeric) column name (i.e. Clones) in metadata to compare between groups, or .n to compare the number of cells in each group. If a numeric column is given, the values should be the same for all cells in the same group. This will not be checked (only the first value is used). It is helpful to use Clones to use the raw clone size from TCR data, in case the cells are not completely mapped to RNA data. Also, if you have subset set or NAs in the group.by column, you should use .n to compare the number of cells in each group.
- - subset: An expression to subset the cells; will be passed to dplyr::filter(). Default is TRUE (no filtering).
- - each: A column name (without quotes) in metadata to split the cells. Each comparison will be done for each value in this column (typically each patient or subject).
- - uniq: Whether to return unique ids or not. Default is TRUE. If FALSE, you can mutate the metadata frame with the returned ids.
- - debug: Return the data frame with intermediate columns instead of the ids. Default is FALSE.
- - with_ties: Whether to include ties (i.e. clones with the same size as the last clone) or not. Default is FALSE.
ncores (type=int) — Number of cores for parallelization. Passed to nproc of fgseaMultilevel().
prefix_each (flag) — Whether to prefix the each column name to the values as the case/section name.
rest (type=json;order=98) — Rest arguments for fgsea(). See also https://rdrr.io/bioc/fgsea/man/fgseaMultilevel.html
section — The section name for the report. Only used when each is not specified; otherwise, the section name will be constructed from each and its value. This allows different cases to be put into the same section in the report. The section is used to collect cases and put the results under the same directory and the same section in the report. When each for a case is specified, the section will be ignored and the case name will be used as the section. The cases will be the expanded values in the each column. When prefix_each is True, the column name specified by each will be prefixed to each value as the directory name and the expanded case name.
subset — An expression to subset the cells.
top (type=auto) — Do the GSEA table and enrichment plot for the top N pathways. If it is < 1, it will be applied to padj, selecting pathways with padj < top.
bioconductor-fgsea — check: {{proc.lang}} -e "library(fgsea)"
r-seurat — check: {{proc.lang}} -e "library(Seurat)"
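To make the options above concrete, here is an illustrative configuration; see also the mutater helpers described in envs.mutaters. The GMT file path and the metadata columns (Source, Patient) are hypothetical.

from biopipen.ns.scrna import ScFGSEA

class GSEATumorVsBlood(ScFGSEA):
    """GSEA comparing Tumor vs Blood cells, repeated per patient."""
    envs = {
        "gmtfile": "hallmark.symbols.gmt",  # gene sets, same id type as the object
        "group-by": "Source",               # metadata column defining the groups
        "ident-1": "Tumor",                 # first group of cells
        "ident-2": "Blood",                 # second group; defaults to the non-NA rest
        "each": "Patient",                  # repeat the analysis for each patient
        "method": "signal_to_noise",        # preranking statistic
        "minsize": 10,                      # skip gene sets smaller than this
        "maxsize": 500,                     # and larger than this
        "ncores": 4,                        # passed to nproc of fgseaMultilevel()
    }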
__init_subclass__() — Do the requirements inferring, since we need them to build up the process relationship
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself
gc() — GC process for the process to save memory after it's done
init() — Init all other properties and jobs
log(level, msg, *args, logger) — Log message for the process
run() — Run the process
biopipen.ns.scrna.CellTypeAnnotation(*args, **kwds) → Proc
Annotate the cell clusters, using scType, hitype, scCATCH, celltypist, or direct assignment (see envs.tool).
The annotated cell types will replace the original seurat_clusters column in the metadata, so that the downstream processes will use the annotated cell types.
The old seurat_clusters column will be renamed to seurat_clusters_id.
If you are using ScType, scCATCH, or hitype, a text file containing the mapping from the old seurat_clusters to the new cell types will be generated and saved to cluster2celltype.tsv under <workdir>/<pipeline_name>/CellTypeAnnotation/0/output/.
cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary from the docstring by default.
dirsig — When checking the signature for caching, whether we should walk through the content of the directory. This can be time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common options across jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy — How to deal with the errors: retry, ignore, halt.
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
forks — How many jobs to run simultaneously?
input — The keys for the input channel.
input_data — The input data (will be computed for dependent processes).
lang — The language for the script to run. Should be the path to the interpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships.
num_retries — How many times to retry the jobs once an error occurs.
order — The execution order for this process. The bigger the number is, the later the process will be executed. Default: 0. Note that dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts().
output — The output keys for the output channel (the data will be computed).
output_data — The output data (to pass to the next processes).
plugin_opts — Options for process-level plugins.
requires — The dependency processes.
scheduler — The scheduler to run the jobs.
scheduler_opts — The options for the scheduler.
script — The script template for the process.
submission_batch — How many jobs to be submitted simultaneously.
template — Define the template engine to use. This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine (currently jinja2 and liquidpy are supported) or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.
[CellTypeAnnotation.envs]
tool = "direct"
cell_types = ["CellType1", "CellType2", "-", "CellType4"]
The cell types will be assigned as:
0 -> CellType1
1 -> CellType2
2 -> 2
3 -> CellType4
sobjfile — The seurat object
outfile — The rds file of the seurat object with cell types annotated. A text file containing the mapping from the old seurat_clusters to the new cell types will be generated and saved to cluster2celltype.tsv under the job output directory.
cell_types (list) — The cell types to use for direct annotation. You can use "-" or "" as a placeholder for clusters whose original cell types (seurat_clusters) you want to keep. If the length of cell_types is shorter than the number of clusters, the remaining clusters will keep their original cell types. You can also use NA to remove clusters from downstream analysis. This only works when envs.newcol is not specified.
Note: if tool is direct and cell_types is not specified or is an empty list, the original cell types will be kept and nothing will be changed.
celltypist_args (ns) — The arguments for celltypist::celltypist() if tool is celltypist.
- - model: The path to the model file.
- - python: The python path where celltypist is installed.
- - majority_voting: When true, refines cell identities within local subclusters after an over-clustering approach, at the cost of increased runtime.
- - over_clustering (type=auto): The column name in metadata to use as clusters for majority voting. Set to False to disable over-clustering.
- - assay: When converting a Seurat object to AnnData, the assay to use. If the input is h5seurat, this defaults to RNA. If the input is a Seurat object in RDS, this defaults to the default assay.
hitype_db — The database to use for hitype. Compatible with sctype_db. See also https://pwwang.github.io/hitype/articles/prepare-gene-sets.html. You can also use built-in databases, including hitypedb_short, hitypedb_full, and hitypedb_pbmc3k.
hitype_tissue — The tissue to use for hitype. Available tissues should be in the first column (tissueType) of hitype_db. If not specified, all rows in hitype_db will be used.
merge (flag) — Whether to merge the clusters with the same cell types. Otherwise, a suffix will be added to the cell types (i.e. .1, .2, etc).
newcol — The new column name to store the cell types. If not specified, the seurat_clusters column will be overwritten. If specified, the original seurat_clusters column will be kept and Idents will be kept as the original seurat_clusters.
outtype (choice) — The output file type. Currently only works for celltypist; an RDS file will be generated for other tools.
- - input: Use the same file type as the input.
- - rds: Use an RDS file.
- - h5seurat: Use an h5seurat file.
- - h5ad: Use an AnnData file.
sccatch_args (ns) — The arguments for scCATCH::findmarkergene() if tool is sccatch.
- - species: The species of the cells.
- - cancer: If the sample is from cancer tissue, the cancer type may be defined.
- - tissue: Tissue origin of the cells must be defined.
- - marker: The marker genes for cell type identification.
- - if_use_custom_marker (flag): Whether to use custom marker genes. If True, no species, cancer, or tissue is needed.
- - …: Other arguments for scCATCH::findmarkergene().
You can pass an RDS file to sccatch_args.marker to work as a custom marker. If so, if_use_custom_marker will be set to TRUE automatically.
sctype_db — The database to use for sctype. Check examples at https://github.com/IanevskiAleksandr/sc-type/blob/master/ScTypeDB_full.xlsx
sctype_tissue — The tissue to use for sctype. Available tissues should be in the first column (tissueType) of sctype_db. If not specified, all rows in sctype_db will be used.
tool (choice) — The tool to use for cell type annotation.
- - sctype: Use scType to annotate cell types. See https://github.com/IanevskiAleksandr/sc-type
- - hitype: Use hitype to annotate cell types. See https://github.com/pwwang/hitype
- - sccatch: Use scCATCH to annotate cell types. See https://github.com/ZJUFanLab/scCATCH
- - celltypist: Use celltypist to annotate cell types. See https://github.com/Teichlab/celltypist
- - direct: Directly assign cell types
r-HGNChelper — if: {{proc.envs.tool == 'sctype'}}; check: {{proc.lang}} -e "library(HGNChelper)"
r-dplyr — if: {{proc.envs.tool == 'sctype'}}; check: {{proc.lang}} -e "library(dplyr)"
r-openxlsx — if: {{proc.envs.tool == 'sctype'}}; check: {{proc.lang}} -e "library(openxlsx)"
r-seurat — if: {{proc.envs.tool == 'sctype'}}; check: {{proc.lang}} -e "library(Seurat)"
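As an illustrative sketch, annotating with one of the built-in hitype databases mentioned above; the database choice and options are examples, not recommendations.

from biopipen.ns.scrna import CellTypeAnnotation

class AnnotatePBMC(CellTypeAnnotation):
    """Annotate PBMC clusters with hitype."""
    envs = {
        "tool": "hitype",
        "hitype_db": "hitypedb_pbmc3k",  # one of the built-in databases
        "merge": True,                   # merge clusters with the same cell type
    }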
__init_subclass__() — Do the requirements inferring, since we need them to build up the process relationship
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself
gc() — GC process for the process to save memory after it's done
init() — Init all other properties and jobs
log(level, msg, *args, logger) — Log message for the process
run() — Run the process
biopipen.ns.scrna.SeuratMap2Ref(*args, **kwds) → Proc
Map the seurat object to a reference.
See: https://satijalab.org/seurat/articles/integration_mapping.html and https://satijalab.org/seurat/articles/multimodal_reference_mapping.html
cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary from the docstring by default.
dirsig — When checking the signature for caching, whether we should walk through the content of the directory. This can be time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common options across jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy — How to deal with the errors: retry, ignore, halt.
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
forks — How many jobs to run simultaneously?
input — The keys for the input channel.
input_data — The input data (will be computed for dependent processes).
lang — The language for the script to run. Should be the path to the interpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships.
num_retries — How many times to retry the jobs once an error occurs.
order — The execution order for this process. The bigger the number is, the later the process will be executed. Default: 0. Note that dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts().
output — The output keys for the output channel (the data will be computed).
output_data — The output data (to pass to the next processes).
plugin_opts — Options for process-level plugins.
requires — The dependency processes.
scheduler — The scheduler to run the jobs.
scheduler_opts — The options for the scheduler.
script — The script template for the process.
submission_batch — How many jobs to be submitted simultaneously.
template — Define the template engine to use. This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine (currently jinja2 and liquidpy are supported) or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.
sobjfile — The seurat object
outfile — The rds file of the seurat object with cell types annotated. Note that the reduction name for the mapping will be ref.umap; to visualize the mapping, you should use ref.umap as the reduction name.
FindTransferAnchors (ns) — Arguments for FindTransferAnchors()
- - normalization-method (choice): Name of normalization method used.
  - LogNormalize: Log-normalize the data matrix
  - SCT: Scale data using the SCTransform method
  - auto: Automatically detect the normalization method. See envs.refnorm.
- - reference-reduction: Name of dimensional reduction to use from the reference if running the pcaproject workflow. Optionally enables reuse of a precomputed reference dimensional reduction.
- - …: See https://satijalab.org/seurat/reference/findtransferanchors. Note that the hyphen (-) will be transformed into a dot (.) for the keys.
MapQuery (ns) — Arguments for MapQuery()
- - reference-reduction: Name of reduction to use from the reference for neighbor finding
- - reduction-model: DimReduc object that contains the umap model.
- - refdata (type=json): Extra data to transfer from the reference to the query.
- - …: See https://satijalab.org/seurat/reference/mapquery. Note that the hyphen (-) will be transformed into a dot (.) for the keys.
MappingScore (ns) — Arguments for MappingScore()
- - …: See https://satijalab.org/seurat/reference/mappingscore. Note that the hyphen (-) will be transformed into a dot (.) for the keys.
NormalizeData (ns) — Arguments for NormalizeData()
- - normalization-method: Normalization method.
- - …: See https://satijalab.org/seurat/reference/normalizedata. Note that the hyphen (-) will be transformed into a dot (.) for the keys.
SCTransform (ns) — Arguments for SCTransform()
- - do-correct-umi (flag): Place corrected UMI matrix in assay counts layer?
- - do-scale (flag): Whether to scale residuals to have unit variance?
- - do-center (flag): Whether to center residuals to have mean zero?
- - …: See https://satijalab.org/seurat/reference/sctransform. Note that the hyphen (-) will be transformed into a dot (.) for the keys.
ident — The name of the ident for the query, transferred from envs.use of the reference.
mutaters (type=json) — The mutaters to mutate the metadata. This is helpful when we want to create new columns for split_by.
ncores (type=int;order=-100) — Number of cores to use. When split_by is used, this will be the number of cores for each object to map to the reference. When split_by is not used, this is used in future::plan(strategy = "multicore", workers = <ncores>) to parallelize some Seurat procedures. See also: https://satijalab.org/seurat/archive/v3.0/future_vignette.html
ref — The reference seurat object file. Either an RDS file or an h5seurat file that can be loaded by Seurat::LoadH5Seurat(). The file type is determined by the extension: .rds or .RDS for an RDS file, .h5seurat or .h5 for an h5seurat file.
refnorm (choice) — Normalization method the reference used. The same method will be used for the query.
- - NormalizeData: Using NormalizeData.
- - SCTransform: Using SCTransform.
- - auto: Automatically detect the normalization method. If the default assay of the reference is SCT, then SCTransform will be used.
skip_if_normalized — Skip normalization if the query is already normalized. Since the object is supposed to be generated by SeuratPreparing, it is already normalized. However, a different normalization method may have been used. If the reference is normalized by the same method as the query, the normalization can be skipped; otherwise, it cannot. The normalization method used for the query set is determined by the default assay: if SCT, then SCTransform is used; otherwise, NormalizeData is used. You can set this to False to force re-normalization (with or without the arguments previously used).
split_by — The column name in metadata to split the query into multiple objects. This helps when the original query is too large to process.
use — A column name of metadata from the reference (e.g. celltype.l1, celltype.l2) to transfer to the query as the cell types (ident) for downstream analysis. This field is required. If you want to transfer multiple columns, you can use envs.MapQuery.refdata.
r-seurat — check: {{proc.lang}} -e "library(Seurat)"
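For example, mapping to the multimodal PBMC reference from the Seurat vignette might be configured like this sketch. The reference file and column names follow that vignette; they are not shipped with this package.

from biopipen.ns.scrna import SeuratMap2Ref

class MapToPBMCRef(SeuratMap2Ref):
    """Map query cells onto a precomputed PBMC reference."""
    envs = {
        "ref": "pbmc_multimodal.h5seurat",  # reference object (.rds or .h5seurat)
        "use": "celltype.l2",               # reference column transferred as idents
        "refnorm": "auto",                  # SCT reference -> SCTransform the query
        "ncores": 8,
        "MapQuery": {
            "reference-reduction": "spca",  # hyphens become dots on the R side
            "reduction-model": "wnn.umap",  # DimReduc carrying the umap model
        },
    }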
__init_subclass__() — Do the requirements inferring, since we need them to build up the process relationship
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself
gc() — GC process for the process to save memory after it's done
init() — Init all other properties and jobs
log(level, msg, *args, logger) — Log message for the process
run() — Run the process
biopipen.ns.scrna.RadarPlots(*args, **kwds) → Proc
Radar plots for cell proportions in different clusters.
This process generates radar plots for the clusters of T cells. It explores the proportion of cells in different groups (e.g. Tumor vs Blood) in different T-cell clusters.
cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary from the docstring by default.
dirsig — When checking the signature for caching, whether we should walk through the content of the directory. This can be time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common options across jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy — How to deal with the errors: retry, ignore, halt.
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
forks — How many jobs to run simultaneously?
input — The keys for the input channel.
input_data — The input data (will be computed for dependent processes).
lang — The language for the script to run. Should be the path to the interpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships.
num_retries — How many times to retry the jobs once an error occurs.
order — The execution order for this process. The bigger the number is, the later the process will be executed. Default: 0. Note that dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts().
output — The output keys for the output channel (the data will be computed).
output_data — The output data (to pass to the next processes).
plugin_opts — Options for process-level plugins.
requires — The dependency processes.
scheduler — The scheduler to run the jobs.
scheduler_opts — The options for the scheduler.
script — The script template for the process.
submission_batch — How many jobs to be submitted simultaneously.
template — Define the template engine to use. This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine (currently jinja2 and liquidpy are supported) or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.
Let's say we have metadata like this:
Cell | Source | Timepoint | seurat_clusters
-----|--------|-----------|----------------
A    | Blood  | Pre       | 0
B    | Blood  | Pre       | 0
C    | Blood  | Post      | 1
D    | Blood  | Post      | 1
E    | Tumor  | Pre       | 2
F    | Tumor  | Pre       | 2
G    | Tumor  | Post      | 3
H    | Tumor  | Post      | 3
With the configuration:
[RadarPlots.envs]
by = "Source"
we will get a radar plot with one curve per source.
We can use each to separate the cells into different cases:
[RadarPlots.envs]
by = "Source"
each = "Timepoint"
Then we will have two radar plots, one for Pre and one for Post.
Use cluster_order to change the order of the clusters and show only the first 3 clusters:
[RadarPlots.envs]
by = "Source"
cluster_order = ["2", "0", "1"]
breaks = [0, 50, 100] # also change the breaks
srtobj — The seurat object in RDS format
outdir — The output directory for the plots
bar_devpars
(ns) — The parameters forpng()
for the barplot- - res (type=int): The resolution of the plot
- - height (type=int): The height of the plot
- - width (type=int): The width of the plot
breakdown
— An additional column with groups to break down the cellsdistribution in each cluster. For example, if you want to see the distribution of the cells in each cluster in different samples. In this case, you should have multiple values in eachby
. These values won't be plotted in the radar plot, but a barplot will be generated with the mean value of each group and the error bar.breaks
(list;itype=int) — breaks of the radar plots, from 0 to 100.If not given, the breaks will be calculated automatically.by
— Which column to use to separate the cells in different groups.NA
s will be ignored. For example, If you have a column namedSource
that marks the source of the cells, and you want to separate the cells intoTumor
andBlood
groups, you can setby
toSource
. The there will be two curves in the radar plot, one forTumor
and one forBlood
.cases
(type=json) — The cases for the multiple radar plots.Keys are the names of the cases and values are the arguments for the plots (each
,by
,order
,breaks
,direction
,ident
,cluster_order
anddevpars
). If not cases are given, a default case will be used, with the keyDEFAULT
. The keys must be valid string as part of the file name.cluster_order
(list) — The order of the clusters.You may also use it to filter the clusters. If not given, all clusters will be used. If the cluster names are integers, use them directly for the order, even though a prefixCluster
is added on the plot.colors
— The colors for the groups inby
. If not specified,the default colors will be used. Multiple colors can be separated by comma (,
). You can specifybiopipen
to use thebiopipen
palette.devpars
(ns) — The parameters forpng()
- - res (type=int): The resolution of the plot
- - height (type=int): The height of the plot
- - width (type=int): The width of the plot
direction
(choice) — Direction to calculate the percentages.- - inter-cluster: the percentage of the cells in all groups
in each cluster (percentage adds up to 1 for each cluster). - - intra-cluster: the percentage of the cells in all clusters.
(percentage adds up to 1 for each group).
- - inter-cluster: the percentage of the cells in all groups
each
— A column with values to separate all cells in different casesWhen specified, the case will be expanded to multiple cases for each value in the column. If specified,section
will be ignored, and the case name will be used as the section name.ident
— The column name of the cluster information.mutaters
(type=json) — Mutaters to mutate the metadata of theseurat object. Keys are the column names and values are the expressions to mutate the columns. These new columns will be used to define your cases.order
(list) — The order of the values inby
. You can also limit(filter) the values we have inby
. For example, if columnSource
has valuesTumor
,Blood
,Spleen
, and you only want to plotTumor
andBlood
, you can setorder
to["Tumor", "Blood"]
. This will also haveTumor
as the first item in the legend andBlood
as the second item.prefix_each
(flag) — Whether to prefix theeach
column name to the values as thecase/section name.section
— If you want to put multiple cases into a same sectionin the report, you can set this option to the name of the section. Only used in the report.subset
— The subset of the cells to do the analysis.test
(choice) — The test to use to calculate the p values.If there are more than 2 groups inby
, the p values will be calculated pairwise group by group. Only works whenbreakdown
is specified andby
has 2 groups or more.- - wilcox: Wilcoxon rank sum test
- - t: T test
- - none: No test will be performed
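Putting these options together, a configuration sketch with multiple cases and a breakdown barplot might look like the following (the case names and the Sample column are illustrative, not defaults):
[RadarPlots.envs]
breakdown = "Sample"   # barplot of per-sample means with error bars
test = "wilcox"        # pairwise test between the by groups
[RadarPlots.envs.cases.BySource]
by = "Source"
breaks = [0, 25, 50, 75, 100]
[RadarPlots.envs.cases.ByTimepoint]
by = "Timepoint"
direction = "intra-cluster"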
biopipen.ns.scrna.MetaMarkers(*args, **kwds) → Proc
Find markers between three or more groups of cells, using one-way ANOVA or the Kruskal-Wallis test.
Sometimes, you may want to find the markers for cells from more than 2 groups. In this case, you can use this process to find the markers for the groups and do enrichment analysis for the markers. Each marker is examined using either one-way ANOVA or the Kruskal-Wallis test. The p values are adjusted using the specified method. The significant markers are then used for enrichment analysis via the Enrichr API.
Other than the markers and the enrichment analysis as outputs, this process also generates violin plots for the top 10 markers.
srtobj — The seurat object loaded by SeuratPreparing
outdir — The output directory for the markers
cases (type=json) — If you have multiple cases, you can specify them here. The keys are the names of the cases and the values are the above options except ncores and mutaters. If some options are not specified, the default values specified above will be used. If no cases are specified, a default case will be added with the default values under envs, with the name DEFAULT.
dbs (list) — The dbs to do enrichment analysis for significant markers. See https://maayanlab.cloud/Enrichr/#libraries for all available libraries.
each — The column name in metadata to separate the cells into different cases.
group-by — The column name in metadata to group the cells. If only group-by is specified, and idents are not specified, markers will be found for all groups in this column. The NA group will be ignored.
idents — The groups of cells to compare; values should be in the group-by column.
method (choice) — The method for the test.
- anova: One-way ANOVA
- kruskal: Kruskal-Wallis test
mutaters (type=json) — The mutaters to mutate the metadata. The key-value pairs will be passed to dplyr::mutate() to mutate the metadata. There are also 4 helper functions, expanded, collapsed, emerged and vanished, which can be used to identify the expanded/collapsed/emerged/vanished groups (i.e. TCR clones). See also https://pwwang.github.io/immunopipe/configurations/#mutater-helpers. For example, you can use {"Patient1_Tumor_Collapsed_Clones": "expanded(., Source, 'Tumor', subset = Patient == 'Patient1', uniq = FALSE)"} to create a new column in metadata named Patient1_Tumor_Collapsed_Clones with the collapsed clones in the tumor sample (compared to the normal sample) of patient 1. The values in this column for other clones will be NA. Those functions take the following arguments:
- df: The metadata data frame. You can use . to refer to it.
- group.by: The column name in metadata to group the cells.
- idents: The first group or both groups of cells to compare (values in the group.by column). If only the first group is given, the rest of the cells (with non-NA in the group.by column) will be used as the second group.
- subset: An expression to subset the cells, will be passed to dplyr::filter(). Default is TRUE (no filtering).
- each: A column name (without quotes) in metadata to split the cells. Each comparison will be done for each value in this column (typically each patient or subject).
- id: The column name in metadata for the group ids (i.e. CDR3.aa).
- compare: Either a (numeric) column name (i.e. Clones) in metadata to compare between groups, or .n to compare the number of cells in each group. If a numeric column is given, the values should be the same for all cells in the same group. This will not be checked (only the first value is used). It is helpful to use Clones to use the raw clone size from TCR data, in case the cells are not completely mapped to RNA data. Also, if you have subset set or NAs in the group.by column, you should use .n to compare the number of cells in each group.
- uniq: Whether to return unique ids or not. Default is TRUE. If FALSE, you can mutate the metadata frame with the returned ids. For example, df |> mutate(expanded = expanded(...)).
- debug: Return the data frame with intermediate columns instead of the ids. Default is FALSE.
- order: The expression passed to dplyr::arrange() to order the intermediate data frame and get the ids in order accordingly. The intermediate data frame includes the following columns:
  - <id>: The ids of clones (i.e. CDR3.aa).
  - <each>: The values in the each column.
  - ident_1: The size of clones in the first group.
  - ident_2: The size of clones in the second group.
  - .diff: The difference between the sizes of clones in the first and second groups.
  - .sum: The sum of the sizes of clones in the first and second groups.
  - .predicate: Showing whether the clone is expanded/collapsed/emerged/vanished.
- include_emerged: Whether to include the emerged group for expanded (only works for expanded). Default is FALSE.
- include_vanished: Whether to include the vanished group for collapsed (only works for collapsed). Default is FALSE.
There is also a helper function top() to get the top clones (i.e. the clones with the largest size) in each group. For example, you can use {"Patient1_Top10_Clones": "top(subset = Patient == 'Patient1', uniq = FALSE)"} to create a new column in metadata named Patient1_Top10_Clones. The values in this column for other clones will be NA. This function takes the following arguments:
- df: The metadata data frame. You can use . to refer to it.
- id: The column name in metadata for the group ids (i.e. CDR3.aa).
- n: The number of top clones to return. Default is 10. If n < 1, it will be treated as the percentage of the size of the group. Specify 0 to get all clones.
- compare: Either a (numeric) column name (i.e. Clones) in metadata to compare between groups, or .n to compare the number of cells in each group. If a numeric column is given, the values should be the same for all cells in the same group. This will not be checked (only the first value is used). It is helpful to use Clones to use the raw clone size from TCR data, in case the cells are not completely mapped to RNA data. Also, if you have subset set or NAs in the group.by column, you should use .n to compare the number of cells in each group.
- subset: An expression to subset the cells, will be passed to dplyr::filter(). Default is TRUE (no filtering).
- each: A column name (without quotes) in metadata to split the cells. Each comparison will be done for each value in this column (typically each patient or subject).
- uniq: Whether to return unique ids or not. Default is TRUE. If FALSE, you can mutate the metadata frame with the returned ids. For example, df |> mutate(expanded = expanded(...)).
- debug: Return the data frame with intermediate columns instead of the ids. Default is FALSE.
- with_ties: Whether to include ties (i.e. clones with the same size as the last clone) or not. Default is FALSE.
ncores (type=int) — Number of cores to use to parallelize over genes
p_adjust (choice) — The method to adjust the p values, which can be used to filter the significant markers. See also https://rdrr.io/r/stats/p.adjust.html
- holm: Holm-Bonferroni method
- hochberg: Hochberg method
- hommel: Hommel method
- bonferroni: Bonferroni method
- BH: Benjamini-Hochberg method
- BY: Benjamini-Yekutieli method
- fdr: FDR method of Benjamini-Hochberg
- none: No adjustment
prefix_each (flag) — Whether to add the each value as a prefix to the case name.
section — The section name for the report. Works only when each is not specified; otherwise, the section name will be constructed from each and group-by. If DEFAULT, and it's the only section, it is not included in the case/section names. The section is used to collect cases and put the results under the same directory and the same section in the report. When each for a case is specified, the section will be ignored and the case name will be used as the section. The cases will be the expanded values in the each column. When prefix_each is True, the column name specified by each will be prefixed to each value as the directory name and expanded case name.
sigmarkers — An expression passed to dplyr::filter() to filter the significant markers for enrichment analysis. The default is p.value < 0.05. If method = 'anova', the variables that can be used for filtering are: sumsq, meansq, statistic, p.value and p_adjust. If method = 'kruskal', the variables that can be used for filtering are: statistic, p.value and p_adjust.
subset — The subset of the cells to do the analysis. An expression passed to dplyr::filter().
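A minimal configuration sketch (the grouping column and the Enrichr library are illustrative):
[MetaMarkers.envs]
group-by = "Timepoint"
method = "kruskal"
p_adjust = "BH"
sigmarkers = "p_adjust < 0.05"  # uses the variables documented above
dbs = ["KEGG_2021_Human"]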
biopipen.ns.scrna.Seurat2AnnData(*args, **kwds) → Proc
Convert a Seurat object to AnnData
sobjfile — The seurat object file, in RDS or h5seurat format
outfile — The AnnData file
assay — The assay to use for AnnData. If not specified, the default assay will be used.
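A minimal configuration sketch (the assay name is illustrative):
[Seurat2AnnData.envs]
assay = "RNA"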
biopipen.ns.scrna.AnnData2Seurat(*args, **kwds) → Proc
Convert an AnnData file to a Seurat object
adfile — The AnnData file
outfile — The seurat object file in RDS format
assay — The assay to use to convert to the seurat object.
dotplot_check (type=auto) — Whether to do a check with Seurat::DotPlot to see if the conversion is successful. Set to False to disable the check. If True, the top 10 variable genes will be used for the check. You can also give a list of genes, or a string of comma (,) separated genes, to use for the check. Only works for outtype = 'rds'.
outtype (choice) — The output file type.
- rds: RDS file
- h5seurat: h5seurat file
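A minimal configuration sketch (the gene list for the check is illustrative):
[AnnData2Seurat.envs]
outtype = "rds"
dotplot_check = "CD3E,CD4,CD8A"  # or true/false, or a list of genes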
biopipen.ns.scrna.ScSimulation(*args, **kwds) → Proc
Simulate single-cell data using splatter.
seed — The seed for the simulation. You could also use a string as the seed; the actual seed will then be generated by digest::digest2int(). So this could also work as a unique identifier for the simulation (i.e. sample ID).
outfile — The output Seurat object/SingleCellExperiment in RDS format
method (choice) — Which simulation method to use. Options are:
- single: produces a single population
- groups: produces distinct groups (e.g. cell types)
- paths: selects cells from continuous trajectories (e.g. differentiation processes)
ncells (type=int) — The number of cells to simulate
ngenes (type=int) — The number of genes to simulate
nspikes (type=int) — The number of spike-ins to simulate. When ngenes, ncells and nspikes are not specified, the default params from mockSCE() will be used. By default, ngenes = 2000, ncells = 200, and nspikes = 100.
outtype (choice) — The output file type.
- seurat: Seurat object
- singlecellexperiment: SingleCellExperiment object
- sce: alias for singlecellexperiment
params (ns) — Other parameters for the simulation. The parameters are initialized with splatEstimate(mockSCE()) and then updated with the given parameters. See https://rdrr.io/bioc/splatter/man/SplatParams.html. Hyphens (-) will be transformed into dots (.) for the keys.
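A minimal configuration sketch (parameter values are illustrative; hyphenated keys are translated to the dotted SplatParams names):
[ScSimulation.envs]
method = "groups"
ncells = 500
ngenes = 1000
[ScSimulation.envs.params]
group-prob = [0.5, 0.5]  # passed to splatter as group.prob
de-prob = 0.1            # passed to splatter as de.prob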
biopipen.ns.scrna.CellCellCommunication(*args, **kwds) → Proc
Cell-cell communication inference
This is implemented based on LIANA, a Python package for cell-cell communication inference that provides a list of existing methods, including CellPhoneDB, Connectome, log2FC, NATMI, SingleCellSignalR, Rank_Aggregate, Geometric Mean, scSeqComm, and CellChat.
You can also try python -c 'import liana; liana.mt.show_methods()' to see the methods available.
Note that this process does not do any visualization. You can use CellCellCommunicationPlots to visualize the results.
sobjfile — The seurat object file in RDS or h5seurat format, or an AnnData file.
outfile — The output file with the 'liana_res' data frame. Stats are provided for both ligand and receptor entities. More specifically: ligand and receptor are the two entities that potentially interact. As a reminder, CCC events are not limited to secreted signalling, but we refer to them as ligand and receptor for simplicity. In the case of heteromeric complexes, the ligand and receptor columns represent the subunit with minimum expression, while *_complex corresponds to the actual complex, with subunits being separated by _. The source and target columns represent the source/sender and target/receiver cell identity for each interaction, respectively.
- *_props: represents the proportion of cells that express the entity. By default, any interaction in which either entity is expressed in less than 10% of cells per cell type is considered a false positive, under the assumption that since CCC occurs between cell types, a sufficient proportion of cells within should express the genes.
- *_means: entity expression mean per cell type.
- lr_means: mean ligand-receptor expression, as a measure of ligand-receptor interaction magnitude.
- cellphone_pvals: permutation-based p-values, as a measure of interaction specificity.
* — Other arguments for the method. The arguments are passed to the method directly. See the method documentation for more details, and also help(liana.mt.<method>.__call__) in Python.
assay — The assay to use for the analysis. Only works for Seurat object.
expr_prop (type=float) — Minimum expression proportion for the ligands and receptors (+ their subunits) in the corresponding cell identities. Set to 0 to return unfiltered results.
groupby — The column name in metadata to group the cells. Typically, this column should be the cluster id.
method (choice) — The method to use for cell-cell communication inference.
- CellPhoneDB: Use the CellPhoneDB method. Magnitude score: lr_means; specificity score: cellphone_pvals.
- Connectome: Use the Connectome method.
- log2FC: Use the log2FC method.
- NATMI: Use the NATMI method.
- SingleCellSignalR: Use the SingleCellSignalR method.
- Rank_Aggregate: Use the Rank_Aggregate method.
- Geometric_Mean: Use the Geometric Mean method.
- scSeqComm: Use the scSeqComm method.
- CellChat: Use the CellChat method.
- cellphonedb: alias for CellPhoneDB
- connectome: alias for Connectome
- log2fc: alias for log2FC
- natmi: alias for NATMI
- singlesignaler: alias for SingleCellSignalR
- rank_aggregate: alias for Rank_Aggregate
- geometric_mean: alias for Geometric_Mean
- scseqcomm: alias for scSeqComm
- cellchat: alias for CellChat
min_cells (type=int) — Minimum cells (per cell identity if grouped by groupby) to be considered for downstream analysis.
n_perms (type=int) — Number of permutations for the permutation test. Relevant only for permutation-based methods (e.g., CellPhoneDB). If 0 is passed, no permutation testing is performed.
ncores (type=int) — The number of cores to use.
rscript — The path to the Rscript executable used to convert the RDS file to AnnData. If in.sobjfile is an RDS file, it will be converted to an AnnData file (h5ad). You need Seurat, SeuratDisk and digest installed.
seed (type=int) — The seed for the random number generator.
species (choice) — The species of the cells.
- human: Human cells; the 'consensus' resource will be used.
- mouse: Mouse cells; the 'mouseconsensus' resource will be used.
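A minimal configuration sketch (the groupby column is illustrative):
[CellCellCommunication.envs]
method = "cellphonedb"
groupby = "seurat_clusters"
species = "human"
expr_prop = 0.1
n_perms = 1000
ncores = 4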
biopipen.ns.scrna.CellCellCommunicationPlots(*args, **kwds) → Proc
Visualization for cell-cell communication inference.
The R package CCPlotR is used to visualize the results.
cccfile — The output file from CellCellCommunication, or a tab-separated file with the following columns: source, target, ligand, receptor, and score. If so, in.expfile can be provided where exp_df is needed.
expfile — The expression file with the expression of ligands and receptors. Columns include: cell_type, gene and mean_exp.
outdir — The output directory for the plots.
cases (type=json) — The cases for the plots. The keys are the names of the cases and the values are the arguments for the plots. The arguments include:
- kind: one of arrow, circos, dotplot, heatmap, network, and sigmoid.
- devpars: The parameters for png() for the plot, including res, width, and height.
- section: The section name for the report to group the plots.
- *: Other arguments for the cc_<kind> function in CCPlotR. See the documentation for more details, or use ?CCPlotR::cc_<kind> in R.
score_col — The column name in the input file that contains the score, if the input file is from CellCellCommunication. Two alias columns are added in the result file of CellCellCommunication, mag_score and spec_score, which are the magnitude and specificity scores.
subset — An expression to pass to dplyr::filter() to subset the ccc data.
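A minimal configuration sketch (the case name and the subset expression are illustrative):
[CellCellCommunicationPlots.envs]
score_col = "mag_score"
subset = "source != target"  # e.g. drop self-communication
[CellCellCommunicationPlots.envs.cases.Dotplot]
kind = "dotplot"
[CellCellCommunicationPlots.envs.cases.Dotplot.devpars]
res = 100
width = 1000
height = 800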