Single Cell TCR Presets and Tag Pattern Set Up. #1336
-
Hello, I am using a scTCR protocol that uses dual indexing to barcode each cell. Basically, the protocol states, "the amplification reaction contains flanking rhPCR primers (Supplementary Table 2) that incorporate index sequences into the final amplification products and append the P5 and P7 sequences needed for Illumina-based sequencing. By using distinct index sequences for different single-cell libraries, a single PCR step specifically amplifies CDR3 segments and introduces barcodes for each sample. The 768 primers listed in Supplementary Table 2 enable dual indexing for pools of up to 384 samples." The amplicon resembles the read structure shown in the generic amplicon preset guides. In this case, would it suffice to provide just the index files, I1 and I2, without the tag pattern?
Replies: 6 comments
-
Hi, does this protocol imply that you have a distinct pair of FASTQ files for every sorted cell?
-
Yes, I believe that is the logic behind the protocol.
-
If you have multiple pairs of FASTQ files representing single-cell data, you can use the following preset:
mixcr analyze generic-lt-single-cell-amplicon \
--species hsa \
--rna \
--floating-left-alignment-boundary \
--floating-right-alignment-boundary C \
input_sample1_{{CELL:a}}_{{R}}.fastq.gz \
result
Note that in the input file pattern, the {{CELL:a}} part of each file name is captured as the cell barcode, and {{R}} matches the R1/R2 read files.
For example, here is a list of input filenames that will be aggregated by the pattern above:
> ls
input_sample1_A1_R1.fastq.gz
input_sample1_A1_R2.fastq.gz
input_sample1_A2_R1.fastq.gz
input_sample1_A2_R2.fastq.gz
input_sample1_A3_R1.fastq.gz
input_sample1_A3_R2.fastq.gz
input_sample1_A4_R1.fastq.gz
input_sample1_A4_R2.fastq.gz
...
A1, A2, A3, and A4 will be used as cell barcodes.
Again, if you can share some data obtained using the protocol from this publication, I can create and test a dedicated preset for it. I tried to request the raw data from the authors a few months ago, but I did not get a reply.
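Before running the preset, it can help to confirm that every cell barcode implied by the file names has both an R1 and an R2 file. The sketch below is not part of MiXCR; `check_pairs` is a hypothetical helper that only mimics the naming scheme shown above (`<prefix>_<CELL>_R1.fastq.gz` / `<prefix>_<CELL>_R2.fastq.gz`), and it assumes cell names contain no spaces.

```shell
# Hypothetical helper, not a MiXCR command.
# Usage: check_pairs <prefix> <file names...>
# Prints one line per cell: "<cell> paired" or "<cell> incomplete".
check_pairs() {
    prefix="$1"; shift
    printf '%s\n' "$@" | awk -v prefix="$prefix" '
        {
            name = $0
            sub(/^.*\//, "", name)                  # drop any directory part
            head = prefix "_"
            if (index(name, head) != 1) next        # file has a different prefix
            rest = substr(name, length(head) + 1)   # e.g. A1_R1.fastq.gz
            if (rest !~ /_R[12]\.fastq\.gz$/) next  # not an R1/R2 FASTQ file
            # "_R1.fastq.gz" is 12 characters long
            cell = substr(rest, 1, length(rest) - 12)
            mate = substr(rest, length(rest) - 10, 2)
            if (mate == "R1") r1[cell] = 1; else r2[cell] = 1
            cells[cell] = 1
        }
        END {
            for (c in cells)
                print c, (((c in r1) && (c in r2)) ? "paired" : "incomplete")
        }
    ' | sort
}
```

For the listing above, `check_pairs input_sample1 *.fastq.gz` would report each of A1 to A4 as paired; any cell reported as incomplete is worth fixing before the analysis.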
-
Hi,
If you don't see certain chains in some of the wells, this issue requires a deeper investigation. It's quite common with single-cell data for things not to go according to plan. Sometimes chains do not amplify even though the primers were added, and sometimes cross-well contamination during sorting or PCR leads to multiple chains being identified in a single cell. Additionally, some cells do biologically express two different TRA chains.
The preset mentioned above applies a filter that, for every cell and each chain (TRAD/TRB/TRG/IGH/IGL), retains only those clones whose cumulative frequency is 95% or more, as measured by the number of reads containing CDR3. However, this shouldn't lead to the disappearance of all clones belonging to one type of chain.
Since we are awaiting your data, I will investigate these issues manually and tailor the preset for optimal performance, addressing the protocol's specific issues.
To answer your questions:
As soon as we receive your data, I will prepare the preset for you and add it to the development version. Also, please share the file names that raise your concern (where you were able to identify both chains with the separate FASTQ files approach but not with the new MiXCR preset), so that I can pay close attention to them.
Sincerely,
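To make the 95% cumulative-frequency filter mentioned above more concrete, here is a minimal sketch of one plausible reading of it: within a single cell and chain, clones are ranked by read count and kept until their running total reaches 95% of the reads. This is an illustration only, not MiXCR's actual implementation; `top_cumulative` and the two-column input format are assumptions made for the example.

```shell
# Hypothetical illustration, not MiXCR code.
# Usage: top_cumulative <fraction> < table
# Input lines: "<clone id> <read count>" for ONE cell and ONE chain.
top_cumulative() {
    sort -k2,2nr | awk -v keep="$1" '
        { id[NR] = $1; n[NR] = $2; total += $2 }
        END {
            for (i = 1; i <= NR; i++) {
                print id[i], n[i]                # keep this clone
                cum += n[i]
                if (cum >= keep * total) break   # target fraction reached
            }
        }
    '
}
```

With made-up counts of 90, 8, and 2 reads for three clones and a fraction of 0.95, the first two clones are kept and the third is dropped. Under this reading the top clone of a chain always survives, which is consistent with the remark that the filter should not remove every clone of one chain type.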
-
Hi, The command below produces reliable results:
Below is a table displaying the number of aligned reads per chain for cells where only one of the chains (TRA or TRB) has been assembled.
From this data, we can observe that cells missing one of the chains tend to have a low number of reads for these chains. Sometimes, if the reads are of good quality and consistent in their clone sequence, MiXCR manages to recover a clone. Also, if we see a significant number of reads for one chain from the same cell and only a few for the other, it suggests that the latter might not be trustworthy. Given the protocol pipeline, we would typically expect similar coverage for both chains in every cell. An exception is the cell 003-1_S193. It has a significant number of reads for both TRA and TRB. However, a detailed examination of the TRA clone reveals the absence of a complete CDR3 sequence because the end of the V gene (CAVR) is trimmed, so no cysteine is present. By adjusting certain parameters, we can potentially lower the quality thresholds to capture more clones (e.g., for 004-7_S247 where the number of reads for TRA is significant).
A similar table would then look like this:
I believe that these results are reasonable for these cells. While it's feasible to tweak the parameters further, doing so may introduce more noise into the data. Please let me know if you have any other questions.
Sincerely,
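The coverage reasoning above (a chain with very few aligned reads in a cell is less trustworthy) can be applied to a per-cell read table with a small script. The sketch below assumes a whitespace-separated table with the columns cell, TRA reads, and TRB reads; that layout, the `flag_low_coverage` name, and the threshold value are all hypothetical and would need to be adapted to the real export.

```shell
# Hypothetical helper, not a MiXCR command.
# Usage: flag_low_coverage <min reads per chain> < table
# Input lines: "<cell> <TRA reads> <TRB reads>"
flag_low_coverage() {
    awk -v min="$1" '
        {
            status = "ok"
            if ($2 < min && $3 < min) status = "both_low"
            else if ($2 < min)        status = "TRA_low"
            else if ($3 < min)        status = "TRB_low"
            print $1, $2, $3, status
        }
    '
}
```

For example, with a threshold of 10 and made-up rows such as `cellB 3 600`, the row is tagged `TRA_low`, marking that cell as one where a missing TRA clone is probably explained by coverage rather than by the filter.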
Hi,
First of all, we have improved the consensus algorithm. To be more precise, it now better handles this protocol. Please use the latest development version, which you can download from here.