SeuratPreparing¶
Load, prepare and apply QC to data, using Seurat
This process will:
- Prepare the Seurat object
- Apply QC to the data
- Integrate the data from different samples

See also:
- https://satijalab.org/seurat/articles/pbmc3k_tutorial.html#standard-pre-processing-workflow-1
- https://satijalab.org/seurat/articles/integration_introduction
This process reads the scRNA-seq data based on the information provided by SampleInfo, specifically the paths specified by the RNAData column.
Those paths should be either paths to directories containing matrix.mtx, barcodes.tsv and features.tsv files that can be loaded by Seurat::Read10X(), or paths to h5 files that can be loaded by Seurat::Read10X_h5().
Each sample is loaded individually, the samples are then merged into one Seurat object, and QC is performed.
To perform QC, some additional columns are added to the metadata of the Seurat object:
- percent.mt: The percentage of mitochondrial genes.
- percent.ribo: The percentage of ribosomal genes.
- percent.hb: The percentage of hemoglobin genes.
- percent.plat: The percentage of platelet genes.
For integration, two routes are available:
- Performing integration on datasets normalized with SCTransform
- Using NormalizeData and FindIntegrationAnchors
Note
When using SCTransform, the default assay will be set to SCT in the output, rather than RNA.
If you are using cca or rpca integration, the default assay will be integrated.
Note
From biopipen v0.23.0, this requires Seurat v5.0.0 or higher.
See also Preparing the input.
Environment Variables¶
- ncores (type=int): Default: 1.
  Number of cores to use.
  Used in future::plan(strategy = "multicore", workers = <ncores>) to parallelize some Seurat procedures.
- cell_qc: Filter expression to filter cells, using tidyseurat::filter().
  Including the columns added above, all available QC keys include nFeature_RNA, nCount_RNA, percent.mt, percent.ribo, percent.hb, and percent.plat.
  Example:
      [SeuratPreparing.envs]
      cell_qc = "nFeature_RNA > 200 & percent.mt < 5"
  will keep cells with more than 200 genes and less than 5% mitochondrial genes.
- cell_qc_per_sample (flag): Default: False.
  Whether to perform cell QC per sample or not.
  If True, the cell QC will be applied to each sample before merging.
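As a rough sketch of how the two cell-level options could be combined, the fragment below applies one filter expression to each sample before merging. The thresholds are illustrative only, and the layout assumes the same [SeuratPreparing.envs] section used in the cell_qc example.

```toml
[SeuratPreparing.envs]
# Illustrative thresholds only; tune them for your own data
cell_qc = "nFeature_RNA > 200 & percent.mt < 5 & percent.hb < 1"
# Apply the filter to each sample before the samples are merged
cell_qc_per_sample = true
```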
- gene_qc (ns): Filter genes. gene_qc is applied after cell_qc.
  - min_cells: Default: 0.
    The minimum number of cells that a gene must be expressed in to be kept.
  - excludes: Default: [].
    The genes to exclude. Multiple genes can be specified as comma-separated values or as a list.
  Example:
      [SeuratPreparing.envs]
      gene_qc = { min_cells = 3 }
  will keep genes that are expressed in at least 3 cells.
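A minimal sketch of using both gene_qc keys together, assuming the same configuration layout as the example for min_cells. The gene names are placeholders chosen for illustration, not a recommendation.

```toml
[SeuratPreparing.envs]
# Keep genes expressed in at least 3 cells and drop two named genes.
# The gene names are placeholders for illustration.
gene_qc = { min_cells = 3, excludes = ["MALAT1", "XIST"] }
```

According to the description of excludes, the same two genes could presumably also be given as a single comma-separated string.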
- use_sct (flag): Default: False.
  Whether to use the SCTransform routine to integrate samples or not.
  Before the following procedures, the RNA layer will be split by samples.
  If False, the following procedures will be performed in order:
  * NormalizeData.
  * FindVariableFeatures.
  * ScaleData.
  See https://satijalab.org/seurat/articles/seurat5_integration#layers-in-the-seurat-v5-object and https://satijalab.org/seurat/articles/pbmc3k_tutorial.html
  If True, the following procedure will be performed:
  * SCTransform.
  See https://satijalab.org/seurat/articles/seurat5_integration#perform-streamlined-one-line-integrative-analysis
- no_integration (flag): Default: False.
  Whether to skip integration or not.
- NormalizeData (ns): Arguments for NormalizeData().
  object is specified internally, and any "-" in a key will be replaced with ".".
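To illustrate the key renaming described for NormalizeData, here is a small sketch. It assumes the namespace is written as an inline table like gene_qc, and it uses the normalization.method and scale.factor arguments of Seurat's NormalizeData() with their usual default values.

```toml
[SeuratPreparing.envs]
# "normalization-method" is passed on as normalization.method,
# and "scale-factor" as scale.factor
NormalizeData = { normalization-method = "LogNormalize", scale-factor = 10000 }
```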
- FindVariableFeatures (ns): Arguments for FindVariableFeatures().
  object is specified internally, and any "-" in a key will be replaced with ".".
- ScaleData (ns): Arguments for ScaleData().
  object and features are specified internally, and any "-" in a key will be replaced with ".".
  - <more>: See https://satijalab.org/seurat/reference/scaledata
- RunPCA (ns): Arguments for RunPCA().
  object and features are specified internally, and any "-" in a key will be replaced with ".".
  - npcs (type=int): The number of PCs to compute.
    For each sample, npcs will be no larger than the number of columns minus 1.
  - <more>: See https://satijalab.org/seurat/reference/runpca
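The namespaces for the non-SCT route can be set side by side. The sketch below is one possible combination, assuming inline tables as in the gene_qc example. The selection.method and nfeatures arguments belong to Seurat's FindVariableFeatures(), and all values are illustrative.

```toml
[SeuratPreparing.envs]
# "selection-method" is passed on as selection.method
FindVariableFeatures = { selection-method = "vst", nfeatures = 2000 }
# Request 30 PCs; per the npcs description, the value is capped per sample
RunPCA = { npcs = 30 }
```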
- SCTransform (ns): Arguments for SCTransform().
  object is specified internally, and any "-" in a key will be replaced with ".".
  - return-only-var-genes: Default: True.
    Whether to return only variable genes.
  - min_cells: Default: 5.
    The minimum number of cells that a gene must be expressed in to be kept.
    A hidden argument of SCTransform to filter genes.
    To keep all genes in the RNA assay, set min_cells to 0 and return-only-var-genes to False.
    See https://github.com/satijalab/seurat/issues/3598#issuecomment-715505537
  - <more>: See https://satijalab.org/seurat/reference/sctransform
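The advice on keeping all genes can be sketched as a configuration fragment. This assumes the SCT route is enabled through use_sct and that the namespace is written as an inline table like gene_qc.

```toml
[SeuratPreparing.envs]
# Use the SCTransform route instead of NormalizeData and related steps
use_sct = true
# Keep all genes from the RNA assay, following the min_cells description
SCTransform = { return-only-var-genes = false, min_cells = 0 }
```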
- IntegrateLayers (ns): Arguments for IntegrateLayers().
  object is specified internally, and any "-" in a key will be replaced with ".".
  When use_sct is True, normalization-method defaults to SCT.
  - method (choice): Default: harmony.
    The method to use for integration.
    - CCAIntegration: Use Seurat::CCAIntegration.
    - CCA: Same as CCAIntegration.
    - cca: Same as CCAIntegration.
    - RPCAIntegration: Use Seurat::RPCAIntegration.
    - RPCA: Same as RPCAIntegration.
    - rpca: Same as RPCAIntegration.
    - HarmonyIntegration: Use Seurat::HarmonyIntegration.
    - Harmony: Same as HarmonyIntegration.
    - harmony: Same as HarmonyIntegration.
    - FastMNNIntegration: Use Seurat::FastMNNIntegration.
    - FastMNN: Same as FastMNNIntegration.
    - fastmnn: Same as FastMNNIntegration.
    - scVIIntegration: Use Seurat::scVIIntegration.
    - scVI: Same as scVIIntegration.
    - scvi: Same as scVIIntegration.
  - <more>: See https://satijalab.org/seurat/reference/integratelayers
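As a small sketch of choosing a non-default integration method, the fragment below selects RPCA through one of the documented aliases. It assumes the same [SeuratPreparing.envs] layout as the other examples on this page.

```toml
[SeuratPreparing.envs]
# "rpca" is listed as an alias of RPCAIntegration
IntegrateLayers = { method = "rpca" }
```

With cca or rpca, the note near the top of this page says the default assay of the output will be integrated.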
- doublet_detector (choice): Default: none.
  The doublet detector to use.
  - none: Do not use any doublet detector.
  - DoubletFinder: Use DoubletFinder to detect doublets.
  - doubletfinder: Same as DoubletFinder.
  - scDblFinder: Use scDblFinder to detect doublets.
  - scdblfinder: Same as scDblFinder.
- DoubletFinder (ns): Arguments to run DoubletFinder.
  See also https://demultiplexing-doublet-detecting-docs.readthedocs.io/en/latest/DoubletFinder.html.
  - PCs (type=int): Default: 10.
    Number of PCs to use for the doubletFinder function.
  - doublets (type=float): Default: 0.075.
    Number of expected doublets as a proportion of the pool size.
  - pN (type=float): Default: 0.25.
    Number of doublets to simulate as a proportion of the pool size.
  - ncores (type=int): Default: 1.
    Number of cores to use for DoubletFinder::paramSweep.
    Set to None to use envs.ncores.
    Since parallelizing this function often exhausts memory, set this to a smaller number if a large envs.ncores does not work for DoubletFinder.
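The memory advice for DoubletFinder can be sketched as follows: a larger process-wide ncores with a smaller value for the parameter sweep. The core counts are illustrative. The remaining values are the documented defaults written out explicitly, and the inline-table layout is assumed from the gene_qc example.

```toml
[SeuratPreparing.envs]
# Cores for the Seurat procedures in general (illustrative value)
ncores = 8
doublet_detector = "DoubletFinder"
# Fewer cores for DoubletFinder::paramSweep to limit memory use
DoubletFinder = { PCs = 10, doublets = 0.075, pN = 0.25, ncores = 2 }
```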
- scDblFinder (ns): Arguments to run scDblFinder.
  - dbr (type=float): Default: 0.075.
    The expected doublet rate.
  - ncores (type=int): Default: 1.
    Number of cores to use for scDblFinder.
    Set to None to use envs.ncores.
  - <more>: See https://rdrr.io/bioc/scDblFinder/man/scDblFinder.html.
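A minimal sketch of switching the detector to scDblFinder, assuming the same configuration layout as the other examples. The doublet rate shown is an illustrative value rather than a recommendation.

```toml
[SeuratPreparing.envs]
doublet_detector = "scDblFinder"
# Illustrative expected doublet rate; the documented default is 0.075
scDblFinder = { dbr = 0.05, ncores = 1 }
```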
- cache (type=auto): Default: /tmp.
  Whether to cache the information at different steps.
  If True, the Seurat object will be cached in the job output directory, which will not be cleaned up when the job is rerun.
  The cached Seurat object will be saved as a <signature>.<kind>.RDS file, where <signature> is the signature determined by the input and envs of the process.
  See https://github.com/satijalab/seurat/issues/7849, https://github.com/satijalab/seurat/issues/5358 and https://github.com/satijalab/seurat/issues/6748 for more details, including reproducibility issues.
  To not use the cached Seurat object, you can either set cache to False or delete the cached file at <signature>.RDS in the cache directory.
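For completeness, a sketch of turning caching off in the configuration. It assumes that a TOML boolean is read as the False value described for cache.

```toml
[SeuratPreparing.envs]
# Disable caching so that no cached Seurat object is reused
cache = false
```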
Metadata¶
Here is a demonstration of the basic metadata for the Seurat object. Future processes will use it and/or add more metadata to the Seurat object.