SeuratPreparing

Load, prepare and apply QC to data, using Seurat

This process will:
- Prepare the Seurat object.
- Apply QC to the data.
- Integrate the data from different samples.

This process reads the scRNA-seq data based on the information provided by SampleInfo, specifically the paths specified in the RNAData column.
Those paths should be either paths to directories containing matrix.mtx, barcodes.tsv and features.tsv files that can be loaded by Seurat::Read10X(), or paths to h5 files that can be loaded by Seurat::Read10X_h5().

Each sample is loaded individually, then all samples are merged into one Seurat object, on which QC is performed.

In order to perform QC, some additional columns are added to the metadata of the Seurat object. They are:

percent.mt: The percentage of mitochondrial genes.
percent.ribo: The percentage of ribosomal genes.
percent.hb: The percentage of hemoglobin genes.
percent.plat: The percentage of platelet genes.

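For integration, two routes are available, selected by envs.use_sct:

- Normalizing with NormalizeData, FindVariableFeatures and ScaleData (the default).
- Normalizing with SCTransform.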

Note

When using SCTransform, the default assay of the output object is SCT rather than RNA.
If you are using cca or rpca integration, the default assay will be integrated.

Note

From biopipen v0.23.0, this requires Seurat v5.0.0 or higher.

Input

metafile: The metadata of the samples, as a tab-delimited file. Two columns are required:
- Sample: the sample names.
- RNAData: the path to the data of each sample. The path is either a directory that can be read by Read10X() from Seurat, or an h5 file that can be read by Read10X_h5() from Seurat.

Output

rdsfile: Default: {{in.metafile | stem}}.seurat.RDS.
The RDS file with the Seurat object, with all samples integrated.
Note that the cell IDs are prefixed with the sample names. QC plots are saved in <job.outdir>/before-qc and <job.outdir>/after-qc.

Environment Variables

ncores (type=int): Default: 1.
Number of cores to use.
Used in future::plan(strategy = "multicore", workers = <ncores>) to parallelize some Seurat procedures.
cell_qc: Filter expression to filter cells, using tidyseurat::filter().
Available QC keys include nFeature_RNA, nCount_RNA, and the columns added above: percent.mt, percent.ribo, percent.hb, and percent.plat.
For example:
```
[SeuratPreparing.envs]
cell_qc = "nFeature_RNA > 200 & percent.mt < 5"
```
will keep cells with more than 200 genes and less than 5% mitochondrial genes.
cell_qc_per_sample (flag): Default: False.
Whether to perform cell QC per sample or not.
If True, the cell QC will be performed per sample, and the QC will be applied to each sample before merging.
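A minimal configuration sketch, assuming the same TOML layout as the examples in this section, that applies the cell filter to each sample separately before the samples are merged:

```toml
[SeuratPreparing.envs]
# Keep cells with more than 200 genes and less than 5% mitochondrial genes
cell_qc = "nFeature_RNA > 200 & percent.mt < 5"
# Apply cell_qc to each sample before merging, instead of to the merged object
cell_qc_per_sample = true
```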
gene_qc (ns): Filter genes.
gene_qc is applied after cell_qc.
- min_cells: Default: 0.
  The minimum number of cells that a gene must be expressed in to be kept.
- excludes: Default: [].
  The genes to exclude. Multiple genes can be specified as comma-separated values or as a list.
  For example:
```
[SeuratPreparing.envs]
gene_qc = { min_cells = 3 }
```
  will keep genes that are expressed in at least 3 cells.
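The two gene filters can be combined. In this sketch the gene names are hypothetical placeholders, not recommendations:

```toml
[SeuratPreparing.envs]
# Keep genes expressed in at least 3 cells, then drop the listed genes
# (GENE_A and GENE_B are placeholders; substitute real gene symbols)
gene_qc = { min_cells = 3, excludes = ["GENE_A", "GENE_B"] }
```

Because excludes also accepts comma-separated values, excludes = "GENE_A,GENE_B" should be equivalent.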
use_sct (flag): Default: False.
Whether to use the SCTransform routine to integrate samples or not.
Before the following procedures, the RNA layer is split by sample.
If False, the following procedures are performed in order:
* NormalizeData.
* FindVariableFeatures.
* ScaleData.
See https://satijalab.org/seurat/articles/seurat5_integration#layers-in-the-seurat-v5-object and https://satijalab.org/seurat/articles/pbmc3k_tutorial.html

If True, the following procedure is performed:
* SCTransform.
See https://satijalab.org/seurat/articles/seurat5_integration#perform-streamlined-one-line-integrative-analysis
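To switch to the SCTransform route, a minimal sketch using the same TOML layout:

```toml
[SeuratPreparing.envs]
# Normalize each sample with SCTransform instead of
# NormalizeData -> FindVariableFeatures -> ScaleData
use_sct = true
```

With this setting, the default assay of the output object is SCT rather than RNA.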
no_integration (flag): Default: False.
Whether to skip integration or not.
NormalizeData (ns): Arguments for NormalizeData().
object is specified internally, and - in the keys is replaced with a dot (.), so normalization-method becomes normalization.method.
- <more>: See https://satijalab.org/seurat/reference/normalizedata
FindVariableFeatures (ns): Arguments for FindVariableFeatures().
object is specified internally, and - in the keys is replaced with a dot (.).
- <more>: See https://satijalab.org/seurat/reference/findvariablefeatures
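Because - in the keys is replaced with a dot, Seurat arguments with dotted names can be written with dashes. This sketch simply restates Seurat's documented defaults for these arguments. It is meant to show the key mapping, not to suggest tuned values:

```toml
[SeuratPreparing.envs.NormalizeData]
# Passed as NormalizeData(normalization.method = "LogNormalize", scale.factor = 10000)
normalization-method = "LogNormalize"
scale-factor = 10000

[SeuratPreparing.envs.FindVariableFeatures]
# Passed as FindVariableFeatures(selection.method = "vst", nfeatures = 2000)
selection-method = "vst"
nfeatures = 2000
```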
ScaleData (ns): Arguments for ScaleData().
object and features are specified internally, and - in the keys is replaced with a dot (.).
- <more>: See https://satijalab.org/seurat/reference/scaledata
RunPCA (ns): Arguments for RunPCA().
object and features are specified internally, and - in the keys is replaced with a dot (.).
- npcs (type=int): The number of PCs to compute.
  For each sample, npcs is capped at the number of cells (columns) minus 1.
- <more>: See https://satijalab.org/seurat/reference/runpca
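A sketch requesting a specific number of PCs. The value 30 is an arbitrary illustration:

```toml
[SeuratPreparing.envs.RunPCA]
# Request 30 PCs; a sample with fewer cells is capped at (cells - 1) PCs
npcs = 30
```

For example, a sample with only 20 cells would get at most 19 PCs under the cap described above.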
SCTransform (ns): Arguments for SCTransform().
object is specified internally, and - in the keys is replaced with a dot (.).
- return-only-var-genes: Default: True.
  Whether to return only variable genes.
- min_cells: Default: 5.
  The minimum number of cells that a gene must be expressed in to be kept.
  A hidden argument of SCTransform to filter genes.
  To keep all genes in the RNA assay, set min_cells to 0 and return-only-var-genes to False.
  See https://github.com/satijalab/seurat/issues/3598#issuecomment-715505537
- <more>: See https://satijalab.org/seurat/reference/sctransform
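Following the advice above for keeping all genes, a sketch of the corresponding settings. They only take effect when use_sct is enabled:

```toml
[SeuratPreparing.envs]
use_sct = true

[SeuratPreparing.envs.SCTransform]
# Do not drop genes detected in few cells (hidden SCTransform argument)
min_cells = 0
# Passed as return.only.var.genes = FALSE: return all genes, not only variable ones
return-only-var-genes = false
```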
IntegrateLayers (ns): Arguments for IntegrateLayers().
object is specified internally, and - in the keys is replaced with a dot (.).
When use_sct is True, normalization-method defaults to SCT.
- method (choice): Default: harmony.
  The method to use for integration.
  - CCAIntegration: Use Seurat::CCAIntegration.
  - CCA: Same as CCAIntegration.
  - cca: Same as CCAIntegration.
  - RPCAIntegration: Use Seurat::RPCAIntegration.
  - RPCA: Same as RPCAIntegration.
  - rpca: Same as RPCAIntegration.
  - HarmonyIntegration: Use Seurat::HarmonyIntegration.
  - Harmony: Same as HarmonyIntegration.
  - harmony: Same as HarmonyIntegration.
  - FastMNNIntegration: Use Seurat::FastMNNIntegration.
  - FastMNN: Same as FastMNNIntegration.
  - fastmnn: Same as FastMNNIntegration.
  - scVIIntegration: Use Seurat::scVIIntegration.
  - scVI: Same as scVIIntegration.
  - scvi: Same as scVIIntegration.
- <more>: See https://satijalab.org/seurat/reference/integratelayers
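A sketch selecting a different integration method. Here "rpca" is one of the aliases listed above for Seurat::RPCAIntegration:

```toml
[SeuratPreparing.envs.IntegrateLayers]
# Use RPCA integration instead of the default harmony
method = "rpca"
```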
doublet_detector (choice): Default: none.
The doublet detector to use.
- none: Do not use any doublet detector.
- DoubletFinder: Use DoubletFinder to detect doublets.
- doubletfinder: Same as DoubletFinder.
- scDblFinder: Use scDblFinder to detect doublets.
- scdblfinder: Same as scDblFinder.
DoubletFinder (ns): Arguments to run DoubletFinder.
See also https://demultiplexing-doublet-detecting-docs.readthedocs.io/en/latest/DoubletFinder.html.
- PCs (type=int): Default: 10.
  Number of PCs to use for 'doubletFinder' function.
- doublets (type=float): Default: 0.075.
  Number of expected doublets as a proportion of the pool size.
- pN (type=float): Default: 0.25.
  Number of doublets to simulate as a proportion of the pool size.
- ncores (type=int): Default: 1.
  Number of cores to use for DoubletFinder::paramSweep.
  Set to None to use envs.ncores.
  Because parallelizing this function often exhausts memory, set this to a smaller number if a large envs.ncores does not work for DoubletFinder.
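A sketch of the advice above. A larger envs.ncores is used for general Seurat procedures, while DoubletFinder gets fewer cores to limit memory usage. The core counts are illustrative values only:

```toml
[SeuratPreparing.envs]
# Cores for future::plan() in general Seurat procedures (illustrative value)
ncores = 8
doublet_detector = "DoubletFinder"

[SeuratPreparing.envs.DoubletFinder]
PCs = 10
doublets = 0.075
# Fewer cores for DoubletFinder::paramSweep to reduce memory pressure
ncores = 2
```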
scDblFinder (ns): Arguments to run scDblFinder.
- dbr (type=float): Default: 0.075.
  The expected doublet rate.
- ncores (type=int): Default: 1.
  Number of cores to use for scDblFinder.
  Set to None to use envs.ncores.
- <more>: See https://rdrr.io/bioc/scDblFinder/man/scDblFinder.html.
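A sketch enabling scDblFinder instead. The doublet rate of 0.1 is an illustrative value, not a recommendation:

```toml
[SeuratPreparing.envs]
doublet_detector = "scDblFinder"

[SeuratPreparing.envs.scDblFinder]
# Expected doublet rate (illustrative; the default is 0.075)
dbr = 0.1
```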
cache (type=auto): Default: /tmp.
Whether to cache the information at different steps.
If True, the Seurat object is cached in the job output directory, which is not cleaned up when the job reruns.
The cached Seurat object is saved as a <signature>.<kind>.RDS file, where <signature> is the signature determined by the input and envs of the process.
See https://github.com/satijalab/seurat/issues/7849, https://github.com/satijalab/seurat/issues/5358 and https://github.com/satijalab/seurat/issues/6748 for more details, including reproducibility issues.
To not use the cached Seurat object, either set cache to False or delete the cached <signature>.<kind>.RDS file in the cache directory.
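A minimal sketch that turns caching off, so every step is recomputed on rerun:

```toml
[SeuratPreparing.envs]
# Do not read or write cached Seurat objects
cache = false
# cache = true   # alternatively: cache in the job output directory
```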

Metadata

The following demonstrates the basic metadata of the Seurat object. Downstream processes use it and/or add more metadata to the Seurat object.

[Figure: SeuratPreparing-metadata]