SeuratPreparing

Load, prepare and apply QC to data, using Seurat

This process will:

  • Prepare the Seurat object.
  • Apply QC to the data.
  • Integrate the data from different samples.

See also:

  • https://satijalab.org/seurat/articles/pbmc3k_tutorial.html#standard-pre-processing-workflow-1
  • https://satijalab.org/seurat/articles/integration_introduction

This process reads the scRNA-seq data based on the information provided by SampleInfo, specifically the paths in the RNAData column.
Each path should be either a directory containing matrix.mtx, barcodes.tsv and features.tsv files that can be loaded by Seurat::Read10X(), or an h5 file that can be loaded by Seurat::Read10X_h5().

Each sample is loaded individually, the samples are merged into one Seurat object, and QC is then performed.

To support QC, the following columns are added to the metadata of the Seurat object:

  • percent.mt: The percentage of counts from mitochondrial genes.
  • percent.ribo: The percentage of counts from ribosomal genes.
  • percent.hb: The percentage of counts from hemoglobin genes.
  • percent.plat: The percentage of counts from platelet genes.
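
The added columns can be referenced by name in the cell_qc filter expression described under Environment Variables. A minimal sketch; the thresholds below are arbitrary, illustrative values, not recommendations from this process:

```toml
[SeuratPreparing.envs]
# Illustrative thresholds only; tune them for your tissue and chemistry.
# The expression filters cells on their metadata, including the percent.* columns added above.
cell_qc = "nFeature_RNA > 200 & nCount_RNA > 500 & percent.mt < 15 & percent.hb < 5"
```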

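For integration, two routes are available, selected by use_sct (see Environment Variables):

  • Standard route (use_sct = False): NormalizeData, FindVariableFeatures and ScaleData.
  • SCTransform route (use_sct = True): SCTransform.

In both routes, the samples are then integrated with IntegrateLayers(), unless no_integration is True.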

Note

When using SCTransform, the default assay of the output will be SCT rather than RNA.
If you are using cca or rpca integration, the default assay will be integrated.
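
For example, the SCTransform route combined with RPCA integration could be selected with a configuration like the sketch below (use_sct and IntegrateLayers are documented under Environment Variables; the choice of rpca here is only illustrative):

```toml
[SeuratPreparing.envs]
# Run SCTransform instead of NormalizeData/FindVariableFeatures/ScaleData
use_sct = true

[SeuratPreparing.envs.IntegrateLayers]
# "rpca" is an alias of RPCAIntegration; the default method is harmony.
# With use_sct = true, normalization-method defaults to SCT.
method = "rpca"
```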

Note

From biopipen v0.23.0, this requires Seurat v5.0.0 or higher.

See also Preparing the input.

Environment Variables

  • ncores (type=int): Default: 1.
    Number of cores to use.
    Used in future::plan(strategy = "multicore", workers = <ncores>) to parallelize some Seurat procedures.
  • cell_qc: Filter expression used to filter cells, applied with filter() from tidyseurat.
    Available QC keys include nFeature_RNA, nCount_RNA, percent.mt, percent.ribo, percent.hb, and percent.plat.

    Example

    For example:

    [SeuratPreparing.envs]
    cell_qc = "nFeature_RNA > 200 & percent.mt < 5"
    
    will keep cells with more than 200 detected genes and less than 5% mitochondrial counts.

  • cell_qc_per_sample (flag): Default: False.
    Whether to perform cell QC per sample.
    If True, cell QC is applied to each sample individually, before the samples are merged.

  • gene_qc (ns): Filter genes.
    gene_qc is applied after cell_qc.

    • min_cells: Default: 0.
      The minimum number of cells that a gene must be expressed in to be kept.
    • excludes: Default: [].
      The genes to exclude. Multiple genes can be specified as comma-separated values or as a list.

    Example

    [SeuratPreparing.envs]
    gene_qc = { min_cells = 3 }

    will keep genes that are expressed in at least 3 cells.

  • use_sct (flag): Default: False.
    Whether to use the SCTransform routine to integrate samples.
    Before either of the following routes runs, the RNA layer is split by sample.

    If False, the following procedures are performed in order:
      • NormalizeData
      • FindVariableFeatures
      • ScaleData
    See https://satijalab.org/seurat/articles/seurat5_integration#layers-in-the-seurat-v5-object and https://satijalab.org/seurat/articles/pbmc3k_tutorial.html

    If True, the following procedure is performed:
      • SCTransform
    See https://satijalab.org/seurat/articles/seurat5_integration#perform-streamlined-one-line-integrative-analysis

  • no_integration (flag): Default: False.
    Whether to skip integration or not.

  • NormalizeData (ns): Arguments for NormalizeData().
    object is specified internally, and - in argument names is replaced with a dot (.); for example, normalization-method becomes normalization.method.
  • FindVariableFeatures (ns): Arguments for FindVariableFeatures().
    object is specified internally, and - in argument names is replaced with a dot (.).
  • ScaleData (ns): Arguments for ScaleData().
    object and features are specified internally, and - in argument names is replaced with a dot (.).
  • RunPCA (ns): Arguments for RunPCA().
    object and features are specified internally, and - in argument names is replaced with a dot (.).
  • SCTransform (ns): Arguments for SCTransform().
    object is specified internally, and - in argument names is replaced with a dot (.).
  • IntegrateLayers (ns): Arguments for IntegrateLayers().
    object is specified internally, and - in argument names is replaced with a dot (.).
    When use_sct is True, normalization-method defaults to SCT.
    • method (choice): Default: harmony.
      The method to use for integration.
      • CCAIntegration: Use Seurat::CCAIntegration.
      • CCA: Same as CCAIntegration.
      • cca: Same as CCAIntegration.
      • RPCAIntegration: Use Seurat::RPCAIntegration.
      • RPCA: Same as RPCAIntegration.
      • rpca: Same as RPCAIntegration.
      • HarmonyIntegration: Use Seurat::HarmonyIntegration.
      • Harmony: Same as HarmonyIntegration.
      • harmony: Same as HarmonyIntegration.
      • FastMNNIntegration: Use Seurat::FastMNNIntegration.
      • FastMNN: Same as FastMNNIntegration.
      • fastmnn: Same as FastMNNIntegration.
      • scVIIntegration: Use Seurat::scVIIntegration.
      • scVI: Same as scVIIntegration.
      • scvi: Same as scVIIntegration.
    • <more>: See https://satijalab.org/seurat/reference/integratelayers
  • doublet_detector (choice): Default: none.
    The doublet detector to use.
    • none: Do not use any doublet detector.
    • DoubletFinder: Use DoubletFinder to detect doublets.
    • doubletfinder: Same as DoubletFinder.
    • scDblFinder: Use scDblFinder to detect doublets.
    • scdblfinder: Same as scDblFinder.
  • DoubletFinder (ns): Arguments to run DoubletFinder.
    See also https://demultiplexing-doublet-detecting-docs.readthedocs.io/en/latest/DoubletFinder.html.
    • PCs (type=int): Default: 10.
      Number of PCs to use for the doubletFinder() function.
    • doublets (type=float): Default: 0.075.
      Number of expected doublets as a proportion of the pool size.
    • pN (type=float): Default: 0.25.
      Number of doublets to simulate as a proportion of the pool size.
    • ncores (type=int): Default: 1.
      Number of cores to use for DoubletFinder::paramSweep.
      Set to None to use envs.ncores.
      Because parallelizing this function often exhausts memory, set this to a smaller number if a large envs.ncores does not work for DoubletFinder.
  • scDblFinder (ns): Arguments to run scDblFinder.
  • cache (type=auto): Default: /tmp.
    Whether and where to cache the Seurat object at different steps.
    If True, the Seurat object is cached in the job output directory, which is not cleaned up when the job is rerun. With the default, /tmp is used as the cache directory.
    The cached Seurat object is saved as a <signature>.<kind>.RDS file, where <signature> is determined by the input and envs of the process.
    See https://github.com/satijalab/seurat/issues/7849, https://github.com/satijalab/seurat/issues/5358 and https://github.com/satijalab/seurat/issues/6748 for more details, including reproducibility issues.
    To avoid using the cached Seurat object, either set cache to False or delete the cached <signature>.<kind>.RDS file in the cache directory.
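
The cell and gene QC options above can be combined in one configuration. A sketch, in which GENE1 and GENE2 are hypothetical placeholder gene names:

```toml
[SeuratPreparing.envs]
cell_qc = "nFeature_RNA > 200 & percent.mt < 5"
# Apply cell_qc to each sample before the samples are merged
cell_qc_per_sample = true
# Applied after cell_qc: keep genes expressed in at least 3 cells and drop two
# hypothetical genes (a comma-separated string such as "GENE1,GENE2" also works)
gene_qc = { min_cells = 3, excludes = ["GENE1", "GENE2"] }
```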
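
Arguments for the Seurat functions are given as sub-tables named after the functions. A sketch, assuming the standard NormalizeData(), FindVariableFeatures() and RunPCA() arguments; the values are illustrative. Argument names use - instead of ., likely because TOML parses a dotted bare key such as normalization.method as a nested table:

```toml
# Passed to NormalizeData() as normalization.method and scale.factor
[SeuratPreparing.envs.NormalizeData]
normalization-method = "LogNormalize"
scale-factor = 10000

# Passed to FindVariableFeatures() as nfeatures
[SeuratPreparing.envs.FindVariableFeatures]
nfeatures = 2000

# Passed to RunPCA() as npcs
[SeuratPreparing.envs.RunPCA]
npcs = 30
```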
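
A sketch of enabling DoubletFinder while running its parameter sweep on fewer cores than the rest of the process; the core counts are illustrative, and the other DoubletFinder values are the documented defaults:

```toml
[SeuratPreparing.envs]
ncores = 8
doublet_detector = "DoubletFinder"

[SeuratPreparing.envs.DoubletFinder]
PCs = 10
doublets = 0.075
pN = 0.25
# Fewer cores than envs.ncores for DoubletFinder::paramSweep, to limit memory use
ncores = 2
```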
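
A sketch of the cache setting. TOML does not allow a key to be set twice, so the alternatives are commented out; the directory path is hypothetical, and passing a path assumes the behavior suggested by the /tmp default:

```toml
[SeuratPreparing.envs]
# Cache the Seurat object in the job output directory
cache = true

# Alternatives (keep only one `cache` key):
# cache = false               # do not cache
# cache = "/path/to/cache"    # hypothetical directory for the <signature>.<kind>.RDS files
```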

Metadata

Here is a demonstration of the basic metadata of the Seurat object. Downstream processes use this metadata and may add more to the Seurat object.

[Figure: SeuratPreparing metadata]