HBA phasing problems #29

Open
mnfrandre opened this issue Jan 9, 2025 · 9 comments

Comments

@mnfrandre

Dear Xiao Chen,

Thanks for your good phasing tools!
I was using Paraphase 3.1.1 to call genotypes for the HBA genes. The alpha3.7 deletion and anti3.7 genotypes were called well (with hba_del and hba_dup, respectively). However, genotypes for the 4.2 deletion and duplication could not be called. Here is my IGV screenshot:
CTFR_test2_HBA

Apparently, this sample has an HBA2 deletion (4.2 deletion) that is supported by breakpoints. Any ideas about this problem?

Thank you very much!

@mnfrandre
Author

Also, attached is my hg39 mapped bam file.
hba4.2del_test.zip

@xiao-chen-xc
Collaborator

Hi @mnfrandre Paraphase is currently not designed to pick up the 4.2 deletion. I'll see if it's possible to add it into Paraphase. Would you mind re-uploading your bam file? I downloaded it but was not able to unzip it.

@mnfrandre
Author

hba4.2del_test.hg38.zip

@xiao-chen-xc Many thanks!

@xiao-chen-xc
Collaborator

This data is very useful. Thanks @mnfrandre! I have added 4.2 deletion and duplication to HBA calling in Paraphase. This will be available in Paraphase Version 3.2, which I plan to release in about 7-10 days.

@mnfrandre
Author

@xiao-chen-xc Good news! Thanks for your work!

@xiao-chen-xc
Collaborator

Hi @mnfrandre Version 3.2 is released. Please try it and let me know if anything can be improved.

@mnfrandre
Author

Hi @xiao-chen-xc

Sorry for the late response. I tried Paraphase (v3.2.1) with my samples. Now I can genotype the 4.2 deletion and duplication, even in combined genotypes! Here is a sample with a genotype of -α3.7/αααanti4.2.

Image

However, I have a new question about allele links in version 3.2. I got results for "alleles_final" in the JSON output with v3.1, but found empty values with v3.2.1. Any help figuring out "alleles_final"?

The attached file is the test data above.
alpha3.7del4.2dup.zip

@xiao-chen-xc
Collaborator

Hi @mnfrandre, there is a check in v3.2.1 that if all haplotypes are phased onto one allele, the allele call will be empty (because we expect two alleles). This usually happens when one haplotype is present in two identical copies and it links all other haplotypes into one allele. In your case it's the homologyhap1: it has twice the depth of the other haplotypes, indicating that it's on both chromosomes. For the HBA region, this could unfortunately be quite common, as we are phasing very short regions and there are not enough SNPs to differentiate the two alleles.
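To make the depth argument above concrete, here is a minimal Python sketch that flags haplotypes whose depth is roughly double the typical depth. It is only an illustration, not Paraphase's actual check: the function name, the 1.5x threshold, the `hba_` prefix on the homology haplotype name, and the depth values are all assumptions made up for this example.

```python
from statistics import median


def likely_two_copy_haplotypes(depths, ratio_threshold=1.5):
    """Return haplotype names whose depth is at least ratio_threshold times the median depth.

    `depths` is a hypothetical mapping of haplotype name -> read depth.
    """
    baseline = median(depths.values())
    if baseline <= 0:
        return []
    return sorted(
        name for name, depth in depths.items()
        if depth / baseline >= ratio_threshold
    )


# Made-up depths for illustration only (not taken from the sample in this thread).
example_depths = {
    "hba_homologyhap1": 30,
    "hba_3p7delhap1": 15,
    "hba_4p2duphap1": 14,
    "hba_hba2hap1": 16,
    "hba_hba1hap1": 15,
}

print(likely_two_copy_haplotypes(example_depths))
```

With these made-up numbers, only the homology haplotype is flagged, which matches the situation described above where one haplotype sits on both chromosomes.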

I think the fix for this is to omit homology haplotypes during the allele phasing step (homology haplotypes are not biologically meaningful anyway). I implemented that and got "alleles_final": [ [ "hba_3p7delhap1", "hba_4p2duphap1" ], [ "hba_hba2hap1", "hba_hba1hap1" ] ]. This is really cool. I had thought that the 3p7 deletion and the 4p2 duplication were on different alleles, but they turn out to be in cis. So the genotype in this sample should be aa/aa instead. I'll add the code changes to the next release.

In the meantime, it's possible to figure out the alleles by looking at the hap_links field of the JSON. It tells you which haplotypes are linked to which haplotypes by reads.
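As a rough sketch of how one might group haplotypes from link information by hand, the Python snippet below treats haplotypes as graph nodes, read links as edges, and reports connected components while skipping homology haplotypes. The layout of `hap_links` assumed here (a mapping from haplotype name to a list of linked haplotype names) is a guess and should be checked against your own JSON; the example links are made up to mirror the result described above, and the function name is hypothetical.

```python
from collections import defaultdict


def group_linked_haplotypes(hap_links, skip_substring="homology"):
    """Group haplotypes into connected components of the link graph.

    `hap_links` is ASSUMED to map haplotype name -> list of linked haplotype names.
    Haplotypes whose name contains `skip_substring` are left out of the graph.
    """
    def is_skipped(name):
        return bool(skip_substring) and skip_substring in name

    # Build an undirected adjacency map, ignoring skipped haplotypes.
    graph = defaultdict(set)
    for hap, partners in hap_links.items():
        if is_skipped(hap):
            continue
        graph[hap]  # make sure unlinked haplotypes still appear as nodes
        for partner in partners:
            if is_skipped(partner):
                continue
            graph[hap].add(partner)
            graph[partner].add(hap)

    # Depth-first search to collect each connected component.
    groups, seen = [], set()
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        groups.append(sorted(component))
    return groups


# Hypothetical links, invented to mirror the case discussed in this thread.
example_links = {
    "hba_homologyhap1": ["hba_3p7delhap1", "hba_hba2hap1"],
    "hba_3p7delhap1": ["hba_homologyhap1", "hba_4p2duphap1"],
    "hba_4p2duphap1": ["hba_3p7delhap1"],
    "hba_hba2hap1": ["hba_homologyhap1", "hba_hba1hap1"],
    "hba_hba1hap1": ["hba_hba2hap1"],
}

for group in group_linked_haplotypes(example_links):
    print(group)
```

Passing `skip_substring=None` keeps the homology haplotype in the graph; with the example links above, everything then collapses into a single group, which is the one-allele situation that triggers the empty call.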

Lastly, thank you for pointing me to these wonderful data. I have not had access to 4p2 variants before. It's fascinating to see these different SV configurations. Please let me know if you run into any other problems. I'm sure there exist SV configurations that I have not considered.

@mnfrandre
Author

Hi @xiao-chen-xc Thanks for your detailed explanation! Very exciting results!
Yes, the above sample is very rare, but it is a real-world case. I have encountered several complex genotypes in the HBA region that could be hard to haplotype. Also, the 3p7 deletion and 3p7 duplication, which seem like mutually exclusive genotypes, could be on the same allele in certain cases. So it would be really helpful if Paraphase could figure out the real alleles.

Again, thanks for your work!
