Analyze paired-chain FASTA data with custom sequence identifiers #1893
Hi, thank you for creating this great tool! I have a FASTA file from a completed study containing heavy- and light-chain sequences for each cell. The sequence identifiers (FASTA header lines) follow the pattern ">numeric-cellid_contig_ig[hk]", where 'numeric-cellid' is a 19-digit number that uniquely identifies a cell. I'd like to use MiXCR to align and clonotype these sequences in a paired-chain fashion, and I need guidance on how to use the Sample Tag to analyze this data effectively. To accomplish this, I believe I need to:

Is this the correct approach for analyzing paired-chain data from a FASTA file with custom identifiers in the header line? Any guidance or examples for handling this type of data would be greatly appreciated. Thank you!
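The header convention described above lends itself to a quick sanity check before any MiXCR run: group the headers by cell ID and see which cells actually carry both a heavy and a light chain. The Python sketch below is independent of MiXCR. It assumes headers look like `>1234567890123456789_contig_igh`, treats the middle contig label as free text, and only recognizes the `igh`/`igk` suffixes from the stated pattern (lambda chains would need the regex extended). The function names `group_chains_by_cell` and `paired_cells` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed header layout: optional ">", a 19-digit cell ID, "_<contig label>_",
# then the chain suffix "igh" or "igk". The contig label format is not stated,
# so it is matched as free text.
HEADER_RE = re.compile(r"^>?(?P<cell>\d{19})_(?P<contig>.+)_(?P<chain>ig[hk])$")


def group_chains_by_cell(headers):
    """Map each 19-digit cell ID to the set of chain suffixes seen for it."""
    cells = defaultdict(set)
    for header in headers:
        match = HEADER_RE.match(header.strip())
        if match is None:
            raise ValueError(f"unexpected header: {header!r}")
        cells[match.group("cell")].add(match.group("chain"))
    return dict(cells)


def paired_cells(cells):
    """Return sorted cell IDs that have both a heavy (igh) and a kappa (igk) chain."""
    return sorted(cell for cell, chains in cells.items() if {"igh", "igk"} <= chains)


if __name__ == "__main__":
    example = [
        ">1234567890123456789_contig_igh",
        ">1234567890123456789_contig_igk",
        ">9876543210987654321_contig_igh",
    ]
    cells = group_chains_by_cell(example)
    print(paired_cells(cells))  # ['1234567890123456789']
```

Cells that appear with only one chain (like the second example cell) are worth counting before clonotyping, since they cannot contribute a heavy/light pair.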
Hi, do the sequences you have cover the full VDJRegion of the receptor?
Hi, I've been trying different patterns to analyze the data by cell. According to the documentation, I should be able to tag the samples directly from the sequencing read headers using a regular expression. I also made a sample_table.tsv that looks like this and ran the following code.
However, it returned this error:
How can I use a regex to extract the cell ID from the FASTA header lines? Thanks for helping me!
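Before handing a pattern to MiXCR, it can help to confirm outside MiXCR that the regex really captures the cell ID from every header. The Python sketch below uses only the standard `re` module; MiXCR's own tag-pattern syntax is not reproduced here. It assumes the header layout from the question (19-digit cell ID, an underscore-separated contig label, and an `igh`/`igk` suffix), and the names `CELL_RE` and `read_tagged_fasta` are hypothetical.

```python
import io
import re

# Hypothetical regex for the header layout described in the question
# (the leading ">" is stripped before matching): 19-digit cell ID,
# then "_<contig label>_", then the chain suffix igh or igk.
CELL_RE = re.compile(r"^(?P<cell>\d{19})_.+_(?P<chain>ig[hk])$")


def _to_record(header, seq_parts):
    """Extract (cell_id, chain, sequence) from one FASTA header and its sequence lines."""
    match = CELL_RE.match(header)
    if match is None:
        raise ValueError(f"header does not match the expected pattern: {header!r}")
    return match.group("cell"), match.group("chain"), "".join(seq_parts)


def read_tagged_fasta(handle):
    """Yield (cell_id, chain, sequence) for every record in a FASTA text stream."""
    header, seq_parts = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield _to_record(header, seq_parts)
            header, seq_parts = line[1:], []
        else:
            seq_parts.append(line)  # multi-line sequences are concatenated
    if header is not None:
        yield _to_record(header, seq_parts)


if __name__ == "__main__":
    # Toy input with placeholder sequences; for real data, pass open("your_file.fasta")
    # where "your_file.fasta" is a hypothetical path.
    toy = io.StringIO(
        ">1234567890123456789_contig_igh\nCAGGTG\nCAGCTG\n"
        ">1234567890123456789_contig_igk\nGACATC\n"
    )
    for record in read_tagged_fasta(toy):
        print(record)
```

Any header that does not fit the pattern raises a `ValueError` naming the offending line, which makes it easy to spot the records a cell-ID regex would silently miss.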
Hi, sorry for the delay. I’ve created a custom preset for you. Unzip the attached YAML file and place it in the ~/.mixcr/presets/ folder, or simply keep it in the directory where you run MiXCR.
To run the preset, use:
There’s no need to specify additional parameters, as the pattern is already included.
By default, the species is set to human, but you can change it with the `--species` parameter if needed.

fasta-single-cell-preset.yaml.zip