CellRanger pipeline
Including two pipelines: CellRangerCountPipeline
and CellRangerVdjPipeline
.
Pipeline overview
Each pipeline contains two processes.
CellRangerCount
/CellRangerVdj
: Run cellranger count/vdj on each sample.CellRangerSummary
: Summarize the results from each sample.
Input files
Each sample should have a set of fastq files, in the format of:
[
# fastq files for sample 1
[
"sample1_S1_L001_R1_001.fastq.gz",
"sample1_S1_L001_R2_001.fastq.gz",
"sample1_S1_L002_R1_001.fastq.gz",
"sample1_S1_L002_R2_001.fastq.gz",
],
# fastq files for sample 2
[
"sample2_S1_L001_R1_001.fastq.gz",
"sample2_S1_L001_R2_001.fastq.gz",
"sample2_S1_L002_R1_001.fastq.gz",
"sample2_S1_L002_R2_001.fastq.gz",
],
...
]
If the ids cannot be inferred from the fastq file names, or you want to use a different id than the inferred one, you can specify the ids in the input:
input = [
# fastq files for sample 1
[
"sample1_S1_L001_R1_001.fastq.gz",
"sample1_S1_L001_R2_001.fastq.gz",
"sample1_S1_L002_R1_001.fastq.gz",
"sample1_S1_L002_R2_001.fastq.gz",
],
# fastq files for sample 2
[
"sample2_S1_L001_R1_001.fastq.gz",
"sample2_S1_L001_R2_001.fastq.gz",
"sample2_S1_L002_R1_001.fastq.gz",
"sample2_S1_L002_R2_001.fastq.gz",
],
...
]
ids = [
"sampleA",
"sampleB",
...
]
Configurations
input
: The input fastq files for each sample. SeeInput files
for details.ids
: The ids for each sample. SeeInput files
for details.
Other than the input, you should provide other configurations to the processes to each individual process. Check the documentation of each process for more details.
Reference
To run the pipeline, you need to provide the reference genome for the cellranger pipeline. You can provide the reference genome in the configuration:
[CellRangerCount.envs]
ref = "/path/to/reference"
To obtain the reference genome, please refer to:
- https://www.10xgenomics.com/support/software/cell-ranger/latest/analysis/inputs/cr-inputs-overview#count
- https://www.10xgenomics.com/support/software/cell-ranger/latest/analysis/inputs/cr-inputs-overview#vdj
You may also make your own reference by cellranger mkref
for gene expression. See the cellranger documentation:
- https://www.10xgenomics.com/support/software/cell-ranger/latest/analysis/inputs/cr-3p-references
Also check out docker/cellranger_pipeline/make-examples.sh to see how the references are prepared.
Docker image
You can use the docker image [biopipen/cellranger-pipeline
] to run the pipeline. The image contains
the cellranger software and the biopipen package. It is also built with an example dataset for you to
test the pipeline:
/example/example-data/Sample_X_S1_L001_R1_001.fastq.gz
/example/example-data/Sample_X_S1_L001_R2_001.fastq.gz
/example/example-data/Sample_Y_S1_L001_R1_001.fastq.gz
/example/example-data/Sample_Y_S1_L001_R2_001.fastq.gz
/example/example-data/Sample_Y_S1_L002_R1_001.fastq.gz
/example/example-data/Sample_Y_S1_L002_R2_001.fastq.gz
A sample configuration file is also provided at /biopipen/docker/cellranger_pipeline/CellrangerCountPipeline.config.toml
.
Note that the docker image was not built with the reference genome. You need to provide the reference genome by yourself.