# FAQ

## `immunopipe`: command not found?

Please make sure you have installed `immunopipe` with the right `python`. If you used `pip` to install `immunopipe`, make sure that `pip` is associated with the right `python`.

You may try `/path/to/python -m pip install -U immunopipe` to ensure `immunopipe` is installed with the `python` you want.

If `immunopipe` still can't be found from the command line, try `/path/to/python -m immunopipe`.
## Why am I getting "Error writing to connection: No space left on device" while running `ImmunarchLoading`?

If the pipeline complains about "No space left on device" at `ImmunarchLoading` and you are sure that your working directory is far from full, it is likely that your temporary directory does not have enough space. The `ImmunarchLoading` process creates a temporary directory to store intermediate files, and the default temporary directory is `/tmp`. Make sure that you have enough space in `/tmp`, or change the temporary directory by setting the environment variable of the process: `envs.tmpdir`.
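For example, a minimal configuration sketch (the path below is a placeholder; point it to any directory with enough free space):

```toml
[ImmunarchLoading.envs]
# Placeholder path; use a directory on a filesystem with enough free space
tmpdir = "/path/to/bigger/tmp"
```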
It is also possible that you are running the pipeline in a docker container and the container does not have enough space in `/tmp`. In that case, you can use the `-v` option of docker to mount a local directory to `/tmp` in the container. For example:

```shell
docker run --rm -w /workdir -v .:/workdir -v path/to/tmp:/tmp \
    #                                        ^^^^^^^^^^^^^^^^^
    justold/immunopipe:<tag> @config.toml
```
If you are using `singularity`/`apptainer`, you can use the `-B` option to bind a local directory to `/tmp` in the container.

```shell
singularity run \
    --pwd /workdir -B .:/workdir -c -e --writable-tmpfs \
    -B path/to/tmp:/tmp \
    #  ^^^^^^^^^^^^^^^^^
    docker://justold/immunopipe:<tag> \
    @config.toml
```
```shell
apptainer run \
    --pwd /workdir -B .:/workdir -c -e --unsquash --writable-tmpfs \
    -B path/to/tmp:/tmp \
    #  ^^^^^^^^^^^^^^^^^
    docker://justold/immunopipe:<tag> \
    @config.toml
```
## Why does the pipeline stop at `SeuratClusteringOfAllCells` and family without a clear error message?

This is likely because the pipeline is running out of memory. The `SeuratClusteringOfAllCells` and family processes (e.g. `SeuratClustering`) run a series of `Seurat` functions to perform the clustering, especially the `IntegrateData` and `FindIntegrationAnchors` functions, and `IntegrateLayers` with `Seurat` v5.
Please see the following issues for more details:

- https://github.com/satijalab/seurat/issues/3326
- https://github.com/satijalab/seurat/issues/1720
- https://github.com/satijalab/seurat/issues/2828
- https://github.com/satijalab/seurat/issues/1254
- https://github.com/satijalab/seurat/issues/7027

Also check out the tips by the Seurat team:
https://satijalab.org/seurat/articles/integration_large_datasets
Two possible solutions are:

- Use `reduction = "rpca"` for `FindIntegrationAnchors` under `[SeuratClusteringOfAllCells.envs.FindIntegrationAnchors]` (see the sketch after this list).
- Use reference-based integration (`reference = [1, 2]`) for `FindIntegrationAnchors` under `[SeuratClusteringOfAllCells.envs.FindIntegrationAnchors]`.
- For `Seurat` v5, use the corresponding parameters for `IntegrateLayers`.
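For example, a minimal configuration sketch combining the first two options (the sample indices are placeholders; adjust them to your own data):

```toml
# Reciprocal PCA plus reference-based integration for FindIntegrationAnchors
[SeuratClusteringOfAllCells.envs.FindIntegrationAnchors]
reduction = "rpca"
reference = [1, 2]  # placeholder sample indices
```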
**Tip:** You can also pass a list of sample names instead of the sample indices. For example, `reference = ["sample1", "sample2"]` under `[SeuratPreparing.envs.IntegrateLayers]` uses `sample1` and `sample2` as the reference samples.

See also the description about `IntegrateLayers` here.
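As a sketch with placeholder sample names:

```toml
[SeuratPreparing.envs.IntegrateLayers]
reference = ["sample1", "sample2"]  # placeholder sample names
```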
## Can I run one of the processes from the pipeline separately if I have the input files prepared?

Only for some of the processes. `immunopipe` depends on `biopipen`, and most of the processes in `immunopipe` are subclasses of processes in `biopipen`. You can run the processes in `biopipen` separately by:

```shell
pipen run scrna SeuratClustering [options]
```
Note that only the processes from `biopipen` can be run separately. The processes in `immunopipe` are not designed to be run separately. For example, the `SeuratClusteringOfAllCells` process in `immunopipe` is a subclass of the `SeuratClustering` process in `biopipen`, specialized for the `immunopipe` pipeline. If you want to run a similar process separately, you should use the `SeuratClustering` process in `biopipen` instead.
Like `immunopipe`, you can either provide a configuration file:

```shell
pipen run scrna SeuratClustering @config.toml
```

or specify the options on the command line:

```shell
pipen run scrna SeuratClustering --in.srtobj path/to/srtobj.RDS ...
```

You can also use the `-h`/`--help` option to see the brief options of the process, or `-h+`/`--help+` to see the full options.
## How to run the pipeline on a cluster?

To run the pipeline on a cluster, it's recommended to install the pipeline locally so that the cluster nodes can access it.

`immunopipe` is built on top of `pipen` and `xqute`. A set of schedulers are supported by default:

- `local`: Run the pipeline locally.
- `slurm`: Run the pipeline on a slurm cluster.
- `sge`: Run the pipeline on an SGE cluster.
- `ssh`: Run the pipeline on a remote host via ssh.
The scheduler can be specified via `scheduler` (with its options in `scheduler_opts`) for the whole pipeline or for a specific process. For example, to run the whole pipeline on a slurm cluster, you can use the following configuration file:

```toml
scheduler = "slurm"

[scheduler_opts]
sbatch_partition = "1-day"
```
To run a specific process on a slurm cluster, you can use the following configuration file:

```toml
[<Process>]
scheduler = "slurm"

[<Process>.scheduler_opts]
sbatch_partition = "1-day"
```
You can also use profiles to switch between different schedulers. See also https://pwwang.github.io/pipen/configurations/#profiles
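For example, a minimal profile sketch, assuming pipen's profile layout where each top-level table in the configuration file names a profile that can be selected on the command line (see the link above for the authoritative details):

```toml
# "default" is the base profile; "cluster" switches to slurm
[default]
scheduler = "local"

[cluster]
scheduler = "slurm"

[cluster.scheduler_opts]
sbatch_partition = "1-day"
```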
Unlike a locally installed pipeline, when using a docker image to run the pipeline on a cluster, the whole pipeline needs to be submitted as a single job. For example, to run the pipeline on a slurm cluster using `apptainer`, you can use `srun` to submit the job:
```shell
srun <srun options> \
    apptainer run --pwd /workdir -B /path/to/workdir:/workdir,/tmp -c -e --unsquash --writable-tmpfs \
    -B /path/to/tmp:/tmp \
    docker://justold/immunopipe:<tag> \
    @config.toml
```
If you are using docker or one of its alternatives, please also refer to: https://slurm.schedmd.com/containers.html
## Do I have to re-run the entire pipeline if I want to change some parameters?

If you want to change some parameters for a specific process, just modify the configuration file and re-run the pipeline. The pipeline will detect the changes and only re-run the necessary processes. For example, if you change some environment variables for `ScFGSEA`, the prior processes, such as the ones for clustering and differential expression analysis, will be cached and will not be re-run.
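For instance, a hypothetical change of this kind (only `ScFGSEA` and its downstream processes would be re-run on the next invocation; the value is just for illustration, see the last FAQ entry):

```toml
# Change an environment variable of ScFGSEA only; upstream clustering and
# differential expression processes remain cached.
[ScFGSEA.envs]
method = "diff_of_classes"
```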
## Why am I getting this error when running with `apptainer`: `FATAL: no SIF writable overlay partition found in /tmp/apptainer_cache_xxx/...`?

You may need to add the `--unsquash` option for `apptainer run`.
## How can I use data with soft links while using a docker image to run the pipeline?

The container does not have direct access to the host filesystem. You need to mount the directory containing the real data into the container.

For example, if your real data is under `/path/to/data`, you can mount it to `/data` in the container (using the `-v /path/to/data:/data` option for `docker`, or the `-B /path/to/data:/data` option for `singularity`/`apptainer`).
Then you can use `/data` in the container to access the data under `/path/to/data` on the host. Also remember to change the paths of `RNAData` and `TCRData` in the file (e.g. `samples.txt`) that is passed to the `SampleInfo` process.
Other than `/data`, there are other directories you can use for mounting inside the container, including `/mnt` and `/tmp`, in case you want to mount multiple directories.

See also The directory structure in the container.
## Why am I getting a `disk quota exceeded` error while pulling the docker image using `apptainer`, with still plenty of space on the disk?
It's probably because the cache directory of `apptainer` is full or over quota. You can use a different cache directory by setting the environment variable `APPTAINER_CACHEDIR`. For example:

```shell
export APPTAINER_CACHEDIR=/path/to/cache
apptainer pull docker://justold/immunopipe:<tag>
```

See also: https://apptainer.org/docs/user/main/build_env.html#cache-folders
## `Unable to fork: Cannot allocate memory` or `long vectors not supported yet` during clustering using Seurat?

This is likely because the pipeline is running out of memory. The `SeuratClusteringOfAllCells` and family processes (e.g. `SeuratClustering`) run a series of `Seurat` functions to perform the clustering, especially the `IntegrateData` and `FindIntegrationAnchors` functions, and `IntegrateLayers` with `Seurat` v5.
You can try to set `envs.ncores` to a smaller number to reduce the memory usage. For example:

```toml
[SeuratClusteringOfAllCells.envs]
ncores = 4  # instead of 16
```
The other strategy is to use reference-based integration (`reference = [1, 2]`) for `IntegrateLayers` with the method `rpca` or `cca`. See also the description about `IntegrateLayers` here. For example:

```toml
[SeuratPreparing.envs.IntegrateLayers]
method = "rpca"
reference = [1, 2]  # You can also use sample names instead of indices
```
See also the issues listed above for more details.
## Got the error `Not all stats values are finite numbers` while running `ScFGSEA`?

It's probably because there are too many missing values in the expression matrix, so `signal_to_noise` is not able to rank the genes properly. You can try a different method for gene preranking. For example:
```toml
[ScFGSEA.envs]
method = "diff_of_classes"
```