Getting started

You can find the necessary files and source code for this tutorial in the example repository.

In this tutorial, we will show you how to run the immunopipe pipeline on a small dataset of 6 patients from 3 groups: colitis (n=2), non-colitis (n=2) and control (n=2). The dataset is part of the data used in the publication below:

We are using a small subset of the data to make the tutorial run faster. The full dataset can be downloaded from the Gene Expression Omnibus (GEO) under accession GSE144469.

Download and prepare the data

The data can be downloaded and prepared by running the following commands:

# Clone the example repository
git clone https://github.com/pwwang/immunopipe-example.git

# Enter the example directory
cd immunopipe-example

# Download and prepare the data
bash prepare-data.sh
# The data from GSE144469 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE144469)
# will be downloaded and extracted into:
#
#   ./prepared-data/C1
#   ./prepared-data/C2
#   ...
#

You may also check other files in the data/ directory, especially the samples.txt file, which contains the sample information for the dataset we prepared above.
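As a quick sanity check before configuring the pipeline, the sketch below prints the first few lines of the sample information file. It assumes you are in the immunopipe-example directory and that the file lives at data/samples.txt, as referenced in this tutorial; the variable names are illustrative only.

```shell
# Path taken from the tutorial text; adjust it if your layout differs.
samples="data/samples.txt"

if [ -f "$samples" ]; then
    samples_found="yes"
    # Show the header and the first few rows
    head -n 5 "$samples"
    # Count the lines in the file (header included)
    wc -l < "$samples"
else
    samples_found="no"
    echo "Sample file not found: $samples (are you in the immunopipe-example directory?)"
fi
```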

Prepare the configuration file

To run the pipeline, we need to either prepare a configuration file (recommended) or pass the arguments directly on the command line. Here we will use a configuration file. See also Configurations for more details.

As explained in the Configurations page, we can provide a configuration file with a minimal set of configuration items to get the pipeline running. The only required configuration item is the input file for the SampleInfo process. However, here we want to give the pipeline a different name and output directory to distinguish it from other runs with a different set of configurations.

The configuration file must be in TOML format. We can create a file named ImmunopipeMinimal.config.toml with the following content:

name = "ImmunopipeMinimal"
outdir = "minimal"

[SampleInfo.in]
infile = [ "data/samples.txt" ]
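If you prefer to create the file from the terminal, one way is a shell here-document. This is only a convenience sketch; it writes exactly the content shown above into the current directory, which is assumed to be the example repository root.

```shell
# Write the minimal configuration to a file in the current directory.
# The quoted 'EOF' delimiter keeps the shell from expanding anything inside.
cat > ImmunopipeMinimal.config.toml <<'EOF'
name = "ImmunopipeMinimal"
outdir = "minimal"

[SampleInfo.in]
infile = [ "data/samples.txt" ]
EOF

# Print the file back to confirm its content
cat ImmunopipeMinimal.config.toml
```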

Run the pipeline

The easiest way to run the pipeline is inside a container. Depending on the container runtime you have available, use one of the following commands to run the pipeline with the configuration file we just created:

# Option 1: docker
docker run \
    --rm -w /workdir -v .:/workdir \
    justold/immunopipe:master \
    @ImmunopipeMinimal.config.toml

# Option 2: singularity
singularity run \
    --pwd /workdir -B .:/workdir,/tmp -c -e --writable-tmpfs \
    docker://justold/immunopipe:master \
    @ImmunopipeMinimal.config.toml

# Option 3: apptainer
apptainer run \
    --pwd /workdir -B .:/workdir,/tmp -c -e --unsquash --writable-tmpfs \
    docker://justold/immunopipe:master \
    @ImmunopipeMinimal.config.toml

Tip

The docker, singularity and apptainer commands map the current directory (.) to the /workdir directory in the container. For the detailed directory structure in the container, please refer to The directory structure in the container.

Tip

If you want to install and run the pipeline without docker, please refer to the Installation and Running the pipeline pages for more details.

Note

You need at least 16 GB of memory to run the pipeline with the example dataset and the minimal configuration.

You may also need to decrease ncores of some processes to avoid running out of memory. For example:

[SeuratClusteringOfAllCells.envs]
- ncores = 8
+ ncores = 4
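As a sketch, the override above could be appended to the minimal configuration file from the terminal. This assumes the file does not already contain a [SeuratClusteringOfAllCells.envs] section, since TOML does not allow the same table to be defined twice; if it does, edit the existing section instead.

```shell
# Append the ncores override to the configuration file.
# Assumption: no [SeuratClusteringOfAllCells.envs] section exists in the file yet.
cat >> ImmunopipeMinimal.config.toml <<'EOF'

[SeuratClusteringOfAllCells.envs]
ncores = 4
EOF

# Show the resulting section to confirm the change
grep -A 1 'SeuratClusteringOfAllCells.envs' ImmunopipeMinimal.config.toml
```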

Check the results

With this "minimal" configuration file, only a subset of the processes will be run. See also Enabling/Disabling processes. The results will be saved in the minimal directory, and the reports can be opened at minimal/REPORTS/index.html with a web browser.
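A small check like the following can confirm that the run produced a report. It is a sketch that assumes the output directory is minimal, as set by outdir in the configuration file, and that the command is run from the directory where the pipeline was launched.

```shell
# Report location, derived from outdir = "minimal" in the configuration file
report="minimal/REPORTS/index.html"

if [ -f "$report" ]; then
    report_status="found"
    echo "Report found: $report"
    # List the top-level contents of the output directory
    ls minimal
else
    report_status="missing"
    echo "Report not found: $report (has the pipeline finished, and is this the directory it was launched from?)"
fi
```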

You can also visit the following link to see the reports of the pipeline we just ran:

http://imp.pwwang.com/minimal/REPORTS/index.html

Next steps

You may read through this documentation to learn more about the pipeline and how to configure it. The example repository also contains a configuration file named Immunopipe.config.toml, with more processes enabled. You can use it to run the pipeline with the dataset prepared above. Check out the following link for the reports:

http://imp.pwwang.com/output/REPORTS/index.html

Note

The results produced by these example configuration files are for demonstration purposes only. They are not intended to be used for any scientific analysis.

You may also want to try other routes of the pipeline with the prepared data. These routes are defined in:

  • ImmunopipeMinimalNoTCR.config.toml: The configuration for minimal analyses without scTCR-seq data.
  • ImmunopipeMinimalSupervised.config.toml: The configuration for minimal analyses with supervised clustering of T cells.
  • ImmunopipeNoTCR.config.toml: The configuration for full analyses without scTCR-seq data.
  • ImmunopipeWSNoTCR.config.toml: The configuration for full analyses without scTCR-seq data, but with selection of T cells.
  • ImmunopipeSupervised.config.toml: The configuration for full analyses with supervised clustering of T cells.
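To try one of these routes, the docker command from "Run the pipeline" can be reused with a different configuration file. The helper below is a hypothetical convenience wrapper, not part of immunopipe; it only checks that the file exists and then passes it to the same docker invocation used earlier in this tutorial.

```shell
# Hypothetical helper (not provided by immunopipe): run the pipeline in docker
# with the configuration file given as the first argument.
run_route() {
    config="$1"
    if [ ! -f "$config" ]; then
        echo "Configuration file not found: $config" >&2
        return 1
    fi
    # Same docker command as in "Run the pipeline", with the config swapped in
    docker run \
        --rm -w /workdir -v .:/workdir \
        justold/immunopipe:master \
        "@$config"
}

# Example, run from the example repository root:
# run_route ImmunopipeMinimalNoTCR.config.toml
```

The same pattern should carry over to the singularity and apptainer commands by replacing only the last argument.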

Also check out the gallery for more real-world examples.