Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discrepancies in Ancestry Dosage Estimates and SNP Count Between elai-mt (v1.21) and elai (v0.99) #3

Open
Caoyu819 opened this issue Feb 20, 2025 · 6 comments

Comments

@Caoyu819
Copy link

Hi,

Thank you for taking the time to read my message.

I have been inferring the ancestry dosage of my plant using elai-mt (v1.21) recently and have compared the results with the analysis from an earlier version of elai (v0.99).

However, I have observed some discrepancies between the two versions. Specifically, the estimated ancestry dosage for each SNP differs between the two versions (with elai-mt generally producing smaller estimates than elai), and the number of SNPs reported in the sampleID.snpinfo.txt file also varies. In general, elai-mt reports more SNPs than elai.

Here are the commands I used for both analyses:

For elai-mt (v1.21):

singularity exec -B $ROOT:/opt/ $IMAGE/debian/debian11.sif /opt/bin/elai-mt \ -g $dir_inp/gt.mex_30.biallelic.chr1.recode.geno.txt -p 10 \ -g $dir_inp/gt.par_30.biallelic.chr1.recode.geno.txt -p 11 \ -g $dir_inp/chr1/${index}.chr1.recode.geno.txt -p 1 \ -pos $dir_inp/chr1.posi \ -o ${index}.chr1 \ -s 20 -C 2 -c 2 -mg 6000 -nthreads 2 \ -exclude-nopos -exclude-miss 0 -exclude-maf 0.05 -exclude-miss1

For elai (v0.99):

/public/home/maize/software/elai/src/elai -g $dir_inp/gt.mex_30.biallelic.chr1.recode.geno.txt -p 10 \ -g $dir_inp/gt.par_30.biallelic.chr1.recode.geno.txt -p 11 \ -g $dir_inp/chr1/${index}.chr1.recode.geno.txt -p 1 \ -pos $dir_inp/chr1.posi \ -o ${index}.chr1 \ -s 20 -C 2 -c 2 -mg 6000 \ -exclude-nopos -exclude-miss 0 -exclude-maf 0.05 -exclude-miss1
I have attached the output files from both versions for your reference.

Additionally, I noticed that the maf column in the sampleID.snpinfo.txt file generated by elai-mt contains only three values (0, 0.5, or 1), which doesn't seem to make sense to me. Could this be due to something I missed in the parameter settings?

I would greatly appreciate any suggestions or guidance you could offer.

Thank you for your consideration.

Best wishes,
Yu

sent2github.zip

@haplotype
Copy link
Owner

haplotype commented Feb 20, 2025 via email

@Caoyu819
Copy link
Author

Hi,

Thank you very much for your prompt response.

Apologies for the earlier mistake in attaching the wrong files for elai-mt. I have now attached the correct files. Thank you for your patience.

sent2github.zip

I have run both versions with the same input files and filtering parameters, using --exclude-maf 0.05 and --exclude-miss 0. However, the number of SNPs in the snpinfo.txt file of elai-mt is slightly higher than in elai.

Regarding the discrepancy in snpinfo.txt, I understand that allele frequencies in elai-mt are estimated using the cohort alone. When you have a single diploid sample, you would expect allele frequency estimates of 0, 0.5, or 1. In elai, on the other hand, allele frequencies are calculated by taking into account the training samples.

Thanks for the clarification. I have a follow-up question based on your response: If I have 30 diploid samples in both parent panels and only one diploid sample in the admixed population (which I need to run), does this mean that the allele frequencies in elai are calculated based on 30*2+1 samples, while in elai-mt, they are only calculated from the 1 sample? If so, why are SNPs with a minor allele frequency (MAF) of 0.0 still used in the analysis in elai-mt instead of being filtered out with --exclude-maf 0.05?

Lastly, since I intend to run a single diploid sample each time and am more concerned with the features of each individual sample, would it make sense for me to perform SNP filtering prior to running elai/elai-mt, so that I don’t need to use --exclude-maf and --exclude-miss for further filtering during the analysis?

Thank you again for your help.

Best regards,
Yu.

@haplotype
Copy link
Owner

haplotype commented Feb 21, 2025 via email

@Caoyu819
Copy link
Author

Ok, thanks a lot for all your valuable response!

Best regards,
Yu

@Caoyu819
Copy link
Author

Hi,

Apologies for disturbing you again.

I attempted to run elai (v0.99) on a dataset containing more SNPs (331,393 SNPs, with --exclude-miss 0.1) compared to a previous dataset (46,755 SNPs, with --exclude-miss 0). However, the results were unusual, and it seems the MCMC did not converge after 20 steps (-s 20). Please refer to the attached files for more details.

elai-v0.99_exMiss0.1.zip

The command I used to run ELAI (v0.99) is as follows:
/public/home/maize/software/elai/src/elai -g $dir_inp/gt.mex_30.biallelic.chr1.recode.geno.txt -p 10 \ -g $dir_inp/gt.par_30.biallelic.chr1.recode.geno.txt -p 11 \ -g $dir_inp/chr1/${index}.chr1.recode.geno.txt -p 1 \ -pos $dir_inp/chr1.posi \ -o ${index}.chr1 \ -s 20 -C 2 -c 2 -mg 6000 \ -exclude-nopos -exclude-miss 0.1 -exclude-maf 0.05 -exclude-miss1

I also tried the following command by adding an extra dash before options like -exclude-miss, and using -mixgen to replace -mg:
/public/home/maize/software/elai/src/elai -g $dir_inp/gt.mex_30.biallelic.chr1.recode.geno.txt -p 10 \ -g $dir_inp/gt.par_30.biallelic.chr1.recode.geno.txt -p 11 \ -g $dir_inp/chr1/${index}.chr1.recode.geno.txt -p 1 \ -pos $dir_inp/chr1.posi \ -o ${index}.chr1 \ -s 20 -C 2 -c 2 -mixgen 6000 \ --exclude-nopos --exclude-miss 0.1 --exclude-maf 0.05 --exclude-miss1

However, the results were identical, and it seems that something went wrong in both cases.

Could it be that the 331,393 SNPs are too large for ELAI (v0.99) to handle? Or could there be an issue with the parameters I'm using?

Thank you again for your time. Any advice or suggestions would be greatly appreciated.

Best wishes,
Yu

@haplotype
Copy link
Owner

haplotype commented Feb 21, 2025 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants