CDR3Clustering
Cluster the TCR/BCR clones by their CDR3 sequences.
You can disable this process by removing the entire CDR3Clustering section
from the config file.
This process clusters TCR/BCR clones based on their CDR3 sequences.
It uses either GIANA:
Zhang, Hongyi, Xiaowei Zhan, and Bo Li. "GIANA allows computationally-efficient TCR clustering and multi-disease repertoire classification by isometric transformation." Nature Communications 12.1 (2021): 1-11.
or ClusTCR:
Valkiers, Sebastiaan, Max Van Houcke, Kris Laukens, and Pieter Meysman. "ClusTCR: a Python interface for rapid clustering of large sets of CDR3 sequences with unknown antigen specificity." Bioinformatics (2021).
Both methods are based on the Faiss Clustering Library, which provides
efficient similarity search and clustering of dense vectors, so they yield
similar results.
A text file will be generated with the cluster assignments for each cell, together
with the immunarch object (in R) that holds the cluster assignments in the
CDR3_Cluster column. This information will then be merged into a Seurat object
for further downstream analysis.
The cluster assignments are prefixed with S_ or M_ to indicate whether a
cluster has only one unique CDR3 sequence or multiple CDR3 sequences.
Note that a cluster with S_ prefix may still have multiple cells, as the same
CDR3 sequence may be shared by multiple cells.
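The prefix rule above can be illustrated with a minimal Python sketch. The function name `label_clusters`, the two input mappings, and the numeric cluster ids are hypothetical and only serve the illustration; the actual process may format its cluster names differently.

```python
from collections import defaultdict


def label_clusters(cell_to_cdr3, cdr3_to_cluster):
    """Prefix each raw cluster id with S_ or M_ (illustrative only).

    cell_to_cdr3:    hypothetical mapping of cell barcode -> CDR3 sequence
    cdr3_to_cluster: hypothetical mapping of CDR3 sequence -> raw cluster id
    """
    # Collect the unique CDR3 sequences that fall into each raw cluster.
    members = defaultdict(set)
    for cdr3, cluster in cdr3_to_cluster.items():
        members[cluster].add(cdr3)

    labels = {}
    for cell, cdr3 in cell_to_cdr3.items():
        cluster = cdr3_to_cluster[cdr3]
        # S_: exactly one unique CDR3 in the cluster; M_: more than one.
        prefix = "S_" if len(members[cluster]) == 1 else "M_"
        labels[cell] = f"{prefix}{cluster}"
    return labels


# Made-up example data: cell1 and cell2 share the same CDR3 sequence.
cells = {
    "cell1": "CAVRDSNYQLIW",
    "cell2": "CAVRDSNYQLIW",
    "cell3": "CASSLGQGYEQYF",
    "cell4": "CASSLGTGYEQYF",
}
clusters = {"CAVRDSNYQLIW": 1, "CASSLGQGYEQYF": 2, "CASSLGTGYEQYF": 2}
labels = label_clusters(cells, clusters)
print(labels)
```

In this sketch, cell1 and cell2 both receive an S_ label because their cluster contains a single unique CDR3 sequence, even though two cells carry it.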
Input
screpfile: The TCR/BCR data object loaded by scRepertoire::CombineTCR(), scRepertoire::CombineBCR() or scRepertoire::CombineExpression()
Output
outfile: Default: {{in.screpfile | stem}}.tcr_clustered.qs.
The scRepertoire object in qs format with TCR/BCR cluster information.
Column CDR3_Cluster will be added to the metadata.
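As a rough illustration of how the default output name is derived from the input, here is a small Python sketch. It assumes the `stem` filter behaves like `pathlib.Path.stem` (directory and last extension removed); the helper `default_outfile` and the example path are hypothetical.

```python
from pathlib import Path


def default_outfile(screpfile: str) -> str:
    # Assumption: `stem` drops the directory and only the last extension.
    return f"{Path(screpfile).stem}.tcr_clustered.qs"


# Hypothetical input path, used only for illustration.
outfile_name = default_outfile("/data/run1/combined.qs")
print(outfile_name)  # combined.tcr_clustered.qs
```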
Environment Variables
type (choice): Default: auto.
The type of the data.
- TCR: T cell receptor data
- BCR: B cell receptor data
- auto: Automatically detect the type from the data.
  Try to find TRB or IGH genes in the CTgene column to determine whether it is TCR or BCR data.
tool (choice): Default: GIANA.
The tool used to do the clustering, either GIANA or ClusTCR.
For GIANA, using TRBV mutations is not supported.
- GIANA: by Li lab at UT Southwestern Medical Center
- ClusTCR: by Sebastiaan Valkiers et al.
python: Default: python.
The path to the python executable with GIANA's dependencies or clusTCR installed, depending on the tool you choose.
within_sample (flag): Default: True.
Whether to cluster the TCR/BCR clones within each sample.
When in.screpfile is a Seurat object, the samples are marked by the Sample column in the metadata.
args (type=json): Default: {}.
The arguments for the clustering tool.
For GIANA, they will be passed to python GIAna.py (see https://github.com/s175573/GIANA#usage).
For ClusTCR, they will be passed to clustcr.Clustering(...) (see https://svalkiers.github.io/clusTCR/docs/clustering/how-to-use.html#clustering).
chain (choice): Default: both.
The TCR/BCR chain to use for clustering.
- heavy: The heavy chain, TRB for TCR, IGH for BCR.
  For TCR, TRB is the second sequence in CTaa, separated by _, if the input is a Seurat object; otherwise, it is extracted from the cdr3_aa2 column.
  For BCR, IGH is the first sequence in CTaa, separated by _, if the input is a Seurat object; otherwise, it is extracted from the cdr3_aa1 column.
- light: The light chain, TRA for TCR, IGL/IGK for BCR.
  For TCR, TRA is the first sequence in CTaa, separated by _, if the input is a Seurat object; otherwise, it is extracted from the cdr3_aa1 column.
  For BCR, IGL/IGK is the second sequence in CTaa, separated by _, if the input is a Seurat object; otherwise, it is extracted from the cdr3_aa2 column.
- TRA: Only the TRA chain for TCR (light chain).
- TRB: Only the TRB chain for TCR (heavy chain).
- IGH: Only the IGH chain for BCR (heavy chain).
- IGLK: Only the IGL/IGK chain for BCR (light chain).
- both: Both sequences from the heavy and light chains (CTaa column).
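The type detection and chain selection rules described for type and chain can be sketched in Python as follows. This is only an illustration of the documented rules, not the process's actual code: the names `detect_type`, `select_cdr3` and `CHAIN_INDEX` are hypothetical, the example values are made up, and returning None for a missing chain is an assumption.

```python
# Position of each chain within CTaa ("first_second"), following the rules
# listed for the chain option. All names here are hypothetical.
CHAIN_INDEX = {
    ("TCR", "heavy"): 1, ("TCR", "TRB"): 1,
    ("TCR", "light"): 0, ("TCR", "TRA"): 0,
    ("BCR", "heavy"): 0, ("BCR", "IGH"): 0,
    ("BCR", "light"): 1, ("BCR", "IGLK"): 1,
}


def detect_type(ctgene_values):
    """Guess TCR vs BCR by looking for TRB or IGH genes in CTgene values."""
    for value in ctgene_values:
        if not value:
            continue
        if "TRB" in value:
            return "TCR"
        if "IGH" in value:
            return "BCR"
    raise ValueError("Neither TRB nor IGH genes found in CTgene")


def select_cdr3(ctaa, receptor_type, chain="both"):
    """Pick the CDR3 sequence(s) to cluster from a CTaa value."""
    if chain == "both":
        return ctaa  # use the full CTaa value (both chains)
    key = (receptor_type, chain)
    if key not in CHAIN_INDEX:
        raise ValueError(f"chain {chain!r} is not valid for {receptor_type}")
    parts = ctaa.split("_")
    index = CHAIN_INDEX[key]
    # Assumption: return None when the requested chain is absent.
    return parts[index] if index < len(parts) else None


# Made-up example values in the "first_second" layout.
ctgene = ["TRAV1-2.TRAJ33.TRAC_TRBV6-1.TRBD1.TRBJ2-7.TRBC2"]
ctaa = "CAVRDSNYQLIW_CASSLGQGYEQYF"

receptor = detect_type(ctgene)
trb = select_cdr3(ctaa, receptor, "TRB")
tra = select_cdr3(ctaa, receptor, "light")
print(receptor, trb, tra)
```

Note that heavy maps to the second position for TCR but to the first position for BCR, which is why the lookup is keyed on both the receptor type and the chain name.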