Pipseeker pipeline
Pipeline overview
Each pipeline contains two processes.
PipseekerFull: Run pipseeker full on each sample.PipseekerSummary: Summarize the results from each sample.
Input files
Each sample should have a set of fastq files, in the format of:
[
# fastq files for sample 1
[
"sample1_R1.fastq.gz",
"sample1_R2.fastq.gz",
],
# fastq files for sample 2
[
"sample2_R1.fastq.gz",
"sample2_R2.fastq.gz",
],
...
]
If the ids cannot be inferred from the fastq file names, or you want to use a different id than the inferred one, you can specify the ids in the input:
input = [
# fastq files for sample 1
[
"sample1_R1.fastq.gz",
"sample1_R2.fastq.gz",
],
# fastq files for sample 2
[
"sample2_R1.fastq.gz",
"sample2_R2.fastq.gz",
],
...
]
ids = [
"sampleA",
"sampleB",
...
]
Configurations
input: The input fastq files for each sample. SeeInput filesfor details.ids: The ids for each sample. SeeInput filesfor details.
Other than the input, you should provide other configurations to the processes to each individual process. Check the documentation of each process for more details.
Reference
To run the pipeline, you need to provide the reference genome for the pipseeker pipeline. You can provide the reference genome in the configuration:
[PipseekerFull.envs]
ref = "/path/to/reference"
To obtain the reference genome, please refer to:
You can also provide cell type annotation reference:
Docker image
You can use the docker image biopipen/pipseeker-pipeline to run the pipeline. The image contains
the pipseeker software and the biopipen package. It is also built with an example dataset for you to
test the pipeline:
/example/example-data/Sample1_Sub_R1.fastq.gz
/example/example-data/Sample1_Sub_R2.fastq.gz
/example/example-data/Sample2_Sub_R1.fastq.gz
/example/example-data/Sample2_Sub_R2.fastq.gz
A reference for test is also provided at /example/example-data/STAR_test_index.
A sample configuration file is also provided at /biopipen/docker/pipseeker_pipeline/PipseekerPipeline.config.toml.
Note that the docker image was not built with the reference genome. You need to provide the reference genome by yourself.