-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Discrepancies in Ancestry Dosage Estimates and SNP Count Between elai-mt (v1.21) and elai (v0.99) #3
Comments
Hi, thanks for the bug report.
From the log file in your attachment, you used different parameter values
(0.01 vs 0.05) in --exclude-maf for elai-mt and elai.
Please use the same value and rerun to see if the inconsistency persists.
As for the discrepancy in snpinfo.txt, allele frequencies in elai-mt were
estimated from cohort alone. When you have one diploid sample, you expect
0,0.5,1 to be the allele frequency estimates. In elai, the allele
frequencies were estimated by taking in account training samples. Sorry
for the inconsistency.
Thank you.
Grant
…On Thu, Feb 20, 2025 at 2:20 AM Caoyu819 ***@***.***> wrote:
Hi,
Thank you for taking the time to read my message.
I have been inferring the ancestry dosage of my plant using elai-mt
(v1.21) recently and have compared the results with the analysis from an
earlier version of elai (v0.99).
However, I have observed some discrepancies between the two versions.
Specifically, the estimated ancestry dosage for each SNP differs between
the two versions (with elai-mt generally producing smaller estimates than
elai), and the number of SNPs reported in the sampleID.snpinfo.txt file
also varies. In general, elai-mt reports more SNPs than elai.
Here are the commands I used for both analyses:
For elai-mt (v1.21):
singularity exec -B $ROOT:/opt/ $IMAGE/debian/debian11.sif
/opt/bin/elai-mt \ -g $dir_inp/gt.mex_30.biallelic.chr1.recode.geno.txt -p
10 \ -g $dir_inp/gt.par_30.biallelic.chr1.recode.geno.txt -p 11 \ -g
$dir_inp/chr1/${index}.chr1.recode.geno.txt -p 1 \ -pos $dir_inp/chr1.posi
\ -o ${index}.chr1 \ -s 20 -C 2 -c 2 -mg 6000 -nthreads 2 \ -exclude-nopos
-exclude-miss 0 -exclude-maf 0.05 -exclude-miss1
For elai (v0.99):
/public/home/maize/software/elai/src/elai -g
$dir_inp/gt.mex_30.biallelic.chr1.recode.geno.txt -p 10 \ -g
$dir_inp/gt.par_30.biallelic.chr1.recode.geno.txt -p 11 \ -g
$dir_inp/chr1/${index}.chr1.recode.geno.txt -p 1 \ -pos $dir_inp/chr1.posi
\ -o ${index}.chr1 \ -s 20 -C 2 -c 2 -mg 6000 \ -exclude-nopos
-exclude-miss 0 -exclude-maf 0.05 -exclude-miss1
I have attached the output files from both versions for your reference.
Additionally, I noticed that the maf column in the sampleID.snpinfo.txt
file generated by elai-mt contains only three values (0, 0.5, or 1), which
doesn't seem to make sense to me. Could this be due to something I missed
in the parameter settings?
I would greatly appreciate any suggestions or guidance you could offer.
Thank you for your consideration.
Best wishes,
Yu
sent2github.zip
<https://github.com/user-attachments/files/18882272/sent2github.zip>
—
Reply to this email directly, view it on GitHub
<#3>, or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACWHWL3WCM54AA5WL5DOSPD2QV63DAVCNFSM6AAAAABXQASWG6VHI2DSMVQWIX3LMV43ASLTON2WKOZSHA3DKMRZGUYTEMQ>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
[image: Caoyu819]*Caoyu819* created an issue (haplotype/ELAI#3)
<#3>
Hi,
Thank you for taking the time to read my message.
I have been inferring the ancestry dosage of my plant using elai-mt
(v1.21) recently and have compared the results with the analysis from an
earlier version of elai (v0.99).
However, I have observed some discrepancies between the two versions.
Specifically, the estimated ancestry dosage for each SNP differs between
the two versions (with elai-mt generally producing smaller estimates than
elai), and the number of SNPs reported in the sampleID.snpinfo.txt file
also varies. In general, elai-mt reports more SNPs than elai.
Here are the commands I used for both analyses:
For elai-mt (v1.21):
singularity exec -B $ROOT:/opt/ $IMAGE/debian/debian11.sif
/opt/bin/elai-mt \ -g $dir_inp/gt.mex_30.biallelic.chr1.recode.geno.txt -p
10 \ -g $dir_inp/gt.par_30.biallelic.chr1.recode.geno.txt -p 11 \ -g
$dir_inp/chr1/${index}.chr1.recode.geno.txt -p 1 \ -pos $dir_inp/chr1.posi
\ -o ${index}.chr1 \ -s 20 -C 2 -c 2 -mg 6000 -nthreads 2 \ -exclude-nopos
-exclude-miss 0 -exclude-maf 0.05 -exclude-miss1
For elai (v0.99):
/public/home/maize/software/elai/src/elai -g
$dir_inp/gt.mex_30.biallelic.chr1.recode.geno.txt -p 10 \ -g
$dir_inp/gt.par_30.biallelic.chr1.recode.geno.txt -p 11 \ -g
$dir_inp/chr1/${index}.chr1.recode.geno.txt -p 1 \ -pos $dir_inp/chr1.posi
\ -o ${index}.chr1 \ -s 20 -C 2 -c 2 -mg 6000 \ -exclude-nopos
-exclude-miss 0 -exclude-maf 0.05 -exclude-miss1
I have attached the output files from both versions for your reference.
Additionally, I noticed that the maf column in the sampleID.snpinfo.txt
file generated by elai-mt contains only three values (0, 0.5, or 1), which
doesn't seem to make sense to me. Could this be due to something I missed
in the parameter settings?
I would greatly appreciate any suggestions or guidance you could offer.
Thank you for your consideration.
Best wishes,
Yu
sent2github.zip
<https://github.com/user-attachments/files/18882272/sent2github.zip>
—
Reply to this email directly, view it on GitHub
<#3>, or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACWHWL3WCM54AA5WL5DOSPD2QV63DAVCNFSM6AAAAABXQASWG6VHI2DSMVQWIX3LMV43ASLTON2WKOZSHA3DKMRZGUYTEMQ>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
Hi, Thank you very much for your prompt response. Apologies for the earlier mistake in attaching the wrong files for elai-mt. I have now attached the correct files. Thank you for your patience. I have run both versions with the same input files and filtering parameters, using --exclude-maf 0.05 and --exclude-miss 0. However, the number of SNPs in the snpinfo.txt file of elai-mt is slightly higher than in elai.
Thanks for the clarification. I have a follow-up question based on your response: If I have 30 diploid samples in both parent panels and only one diploid sample in the admixed population (which I need to run), does this mean that the allele frequencies in elai are calculated based on 30*2+1 samples, while in elai-mt, they are only calculated from the 1 sample? If so, why are SNPs with a minor allele frequency (MAF) of 0.0 still used in the analysis in elai-mt instead of being filtered out with --exclude-maf 0.05? Lastly, since I intend to run a single diploid sample each time and am more concerned with the features of each individual sample, would it make sense for me to perform SNP filtering prior to running elai/elai-mt, so that I don’t need to use --exclude-maf and --exclude-miss for further filtering during the analysis? Thank you again for your help. Best regards, |
Filter SNP beforehand is a good idea.
Your dataset is small and you should use ELAI.
ELAI-mt was not extensively tested for input and my focus was on making
sure multithreading EM run was consistent with ELAI. I will try to fix the
bug the opportunity presents itself, but this has low priority.
…On Thu, Feb 20, 2025 at 21:46 Caoyu819 ***@***.***> wrote:
Hi,
Thank you very much for your prompt response.
Apologies for the earlier mistake in attaching the wrong files for
elai-mt. I have now attached the correct files. Thank you for your patience.
sent2github.zip
<https://github.com/user-attachments/files/18898211/sent2github.zip>
I have run both versions with the same input files and filtering
parameters, using --exclude-maf 0.05 and --exclude-miss 0. However, the
number of SNPs in the snpinfo.txt file of elai-mt is slightly higher than
in elai.
Regarding the discrepancy in snpinfo.txt, I understand that allele
frequencies in elai-mt are estimated using the cohort alone. When you have
a single diploid sample, you would expect allele frequency estimates of 0,
0.5, or 1. In elai, on the other hand, allele frequencies are calculated by
taking into account the training samples.
Thanks for the clarification. I have a follow-up question based on your
response: If I have 30 diploid samples in both parent panels and only one
diploid sample in the admixed population (which I need to run), does this
mean that the allele frequencies in elai are calculated based on 30*2+1
samples, while in elai-mt, they are only calculated from the 1 sample? If
so, why are SNPs with a minor allele frequency (MAF) of 0.0 still used in
the analysis in elai-mt instead of being filtered out with --exclude-maf
0.05?
Lastly, since I intend to run a single diploid sample each time and am
more concerned with the features of each individual sample, would it make
sense for me to perform SNP filtering prior to running elai/elai-mt, so
that I don’t need to use --exclude-maf and --exclude-miss for further
filtering during the analysis?
Thank you again for your help.
Best regards,
Yu.
—
Reply to this email directly, view it on GitHub
<#3 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACWHWL3M6XAO5ENNKPIO52L2Q2HRTAVCNFSM6AAAAABXQASWG6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMNZTGIYTOMRTGA>
.
You are receiving this because you commented.Message ID:
***@***.***>
[image: Caoyu819]*Caoyu819* left a comment (haplotype/ELAI#3)
<#3 (comment)>
Hi,
Thank you very much for your prompt response.
Apologies for the earlier mistake in attaching the wrong files for
elai-mt. I have now attached the correct files. Thank you for your patience.
sent2github.zip
<https://github.com/user-attachments/files/18898211/sent2github.zip>
I have run both versions with the same input files and filtering
parameters, using --exclude-maf 0.05 and --exclude-miss 0. However, the
number of SNPs in the snpinfo.txt file of elai-mt is slightly higher than
in elai.
Regarding the discrepancy in snpinfo.txt, I understand that allele
frequencies in elai-mt are estimated using the cohort alone. When you have
a single diploid sample, you would expect allele frequency estimates of 0,
0.5, or 1. In elai, on the other hand, allele frequencies are calculated by
taking into account the training samples.
Thanks for the clarification. I have a follow-up question based on your
response: If I have 30 diploid samples in both parent panels and only one
diploid sample in the admixed population (which I need to run), does this
mean that the allele frequencies in elai are calculated based on 30*2+1
samples, while in elai-mt, they are only calculated from the 1 sample? If
so, why are SNPs with a minor allele frequency (MAF) of 0.0 still used in
the analysis in elai-mt instead of being filtered out with --exclude-maf
0.05?
Lastly, since I intend to run a single diploid sample each time and am
more concerned with the features of each individual sample, would it make
sense for me to perform SNP filtering prior to running elai/elai-mt, so
that I don’t need to use --exclude-maf and --exclude-miss for further
filtering during the analysis?
Thank you again for your help.
Best regards,
Yu.
—
Reply to this email directly, view it on GitHub
<#3 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACWHWL3M6XAO5ENNKPIO52L2Q2HRTAVCNFSM6AAAAABXQASWG6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMNZTGIYTOMRTGA>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
Ok, thanks a lot for all your valuable response! Best regards, |
Hi, Apologies for disturbing you again. I attempted to run elai (v0.99) on a dataset containing more SNPs (331,393 SNPs, with --exclude-miss 0.1) compared to a previous dataset (46,755 SNPs, with --exclude-miss 0). However, the results were unusual, and it seems the MCMC did not converge after 20 steps (-s 20). Please refer to the attached files for more details. The command I used to run ELAI (v0.99) is as follows: I also tried the following command by adding an extra dash before options like -exclude-miss, and using -mixgen to replace -mg: However, the results were identical, and it seems that something went wrong in both cases. Could it be that the 331,393 SNPs are too large for ELAI (v0.99) to handle? Or could there be an issue with the parameters I'm using? Thank you again for your time. Any advice or suggestions would be greatly appreciated. Best wishes, |
Try -C 2 -c 10 -mg 1000
The -c specify number of lower layer clusters, corresponding to number of
ancestral haplotypes. you can try a larger number if the computation is not
a problem.
You can also try -w 20 -s 0, which invoke linear approximation for the EM
algorithm. It’s much faster.
…On Fri, Feb 21, 2025 at 04:29 Caoyu819 ***@***.***> wrote:
Hi,
Apologies for disturbing you again.
I attempted to run elai (v0.99) on a dataset containing more SNPs (331,393
SNPs, with --exclude-miss 0.1) compared to a previous dataset (46,755 SNPs,
with --exclude-miss 0). However, the results were unusual, and it seems the
MCMC did not converge after 20 steps (-s 20). Please refer to the attached
files for more details.
elai-v0.99_exMiss0.1.zip
<https://github.com/user-attachments/files/18904161/elai-v0.99_exMiss0.1.zip>
The command I used to run ELAI (v0.99) is as follows:
/public/home/maize/software/elai/src/elai -g
$dir_inp/gt.mex_30.biallelic.chr1.recode.geno.txt -p 10 \ -g
$dir_inp/gt.par_30.biallelic.chr1.recode.geno.txt -p 11 \ -g
$dir_inp/chr1/${index}.chr1.recode.geno.txt -p 1 \ -pos $dir_inp/chr1.posi
\ -o ${index}.chr1 \ -s 20 -C 2 -c 2 -mg 6000 \ -exclude-nopos
-exclude-miss 0.1 -exclude-maf 0.05 -exclude-miss1
I also tried the following command by adding an extra dash before options
like -exclude-miss, and using -mixgen to replace -mg:
/public/home/maize/software/elai/src/elai -g
$dir_inp/gt.mex_30.biallelic.chr1.recode.geno.txt -p 10 \ -g
$dir_inp/gt.par_30.biallelic.chr1.recode.geno.txt -p 11 \ -g
$dir_inp/chr1/${index}.chr1.recode.geno.txt -p 1 \ -pos $dir_inp/chr1.posi
\ -o ${index}.chr1 \ -s 20 -C 2 -c 2 -mixgen 6000 \ --exclude-nopos
--exclude-miss 0.1 --exclude-maf 0.05 --exclude-miss1
However, the results were identical, and it seems that something went
wrong in both cases.
Could it be that the 331,393 SNPs are too large for ELAI (v0.99) to
handle? Or could there be an issue with the parameters I'm using?
Thank you again for your time. Any advice or suggestions would be greatly
appreciated.
Best wishes,
Yu
—
Reply to this email directly, view it on GitHub
<#3 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACWHWL6UYRHYQFPEQ7S4KAL2Q3WXFAVCNFSM6AAAAABXQASWG6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMNZUGAZTEOBSHE>
.
You are receiving this because you commented.Message ID:
***@***.***>
[image: Caoyu819]*Caoyu819* left a comment (haplotype/ELAI#3)
<#3 (comment)>
Hi,
Apologies for disturbing you again.
I attempted to run elai (v0.99) on a dataset containing more SNPs (331,393
SNPs, with --exclude-miss 0.1) compared to a previous dataset (46,755 SNPs,
with --exclude-miss 0). However, the results were unusual, and it seems the
MCMC did not converge after 20 steps (-s 20). Please refer to the attached
files for more details.
elai-v0.99_exMiss0.1.zip
<https://github.com/user-attachments/files/18904161/elai-v0.99_exMiss0.1.zip>
The command I used to run ELAI (v0.99) is as follows:
/public/home/maize/software/elai/src/elai -g
$dir_inp/gt.mex_30.biallelic.chr1.recode.geno.txt -p 10 \ -g
$dir_inp/gt.par_30.biallelic.chr1.recode.geno.txt -p 11 \ -g
$dir_inp/chr1/${index}.chr1.recode.geno.txt -p 1 \ -pos $dir_inp/chr1.posi
\ -o ${index}.chr1 \ -s 20 -C 2 -c 2 -mg 6000 \ -exclude-nopos
-exclude-miss 0.1 -exclude-maf 0.05 -exclude-miss1
I also tried the following command by adding an extra dash before options
like -exclude-miss, and using -mixgen to replace -mg:
/public/home/maize/software/elai/src/elai -g
$dir_inp/gt.mex_30.biallelic.chr1.recode.geno.txt -p 10 \ -g
$dir_inp/gt.par_30.biallelic.chr1.recode.geno.txt -p 11 \ -g
$dir_inp/chr1/${index}.chr1.recode.geno.txt -p 1 \ -pos $dir_inp/chr1.posi
\ -o ${index}.chr1 \ -s 20 -C 2 -c 2 -mixgen 6000 \ --exclude-nopos
--exclude-miss 0.1 --exclude-maf 0.05 --exclude-miss1
However, the results were identical, and it seems that something went
wrong in both cases.
Could it be that the 331,393 SNPs are too large for ELAI (v0.99) to
handle? Or could there be an issue with the parameters I'm using?
Thank you again for your time. Any advice or suggestions would be greatly
appreciated.
Best wishes,
Yu
—
Reply to this email directly, view it on GitHub
<#3 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACWHWL6UYRHYQFPEQ7S4KAL2Q3WXFAVCNFSM6AAAAABXQASWG6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMNZUGAZTEOBSHE>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
Hi,
Thank you for taking the time to read my message.
I have been inferring the ancestry dosage of my plant using elai-mt (v1.21) recently and have compared the results with the analysis from an earlier version of elai (v0.99).
However, I have observed some discrepancies between the two versions. Specifically, the estimated ancestry dosage for each SNP differs between the two versions (with elai-mt generally producing smaller estimates than elai), and the number of SNPs reported in the sampleID.snpinfo.txt file also varies. In general, elai-mt reports more SNPs than elai.
Here are the commands I used for both analyses:
For elai-mt (v1.21):
singularity exec -B $ROOT:/opt/ $IMAGE/debian/debian11.sif /opt/bin/elai-mt \ -g $dir_inp/gt.mex_30.biallelic.chr1.recode.geno.txt -p 10 \ -g $dir_inp/gt.par_30.biallelic.chr1.recode.geno.txt -p 11 \ -g $dir_inp/chr1/${index}.chr1.recode.geno.txt -p 1 \ -pos $dir_inp/chr1.posi \ -o ${index}.chr1 \ -s 20 -C 2 -c 2 -mg 6000 -nthreads 2 \ -exclude-nopos -exclude-miss 0 -exclude-maf 0.05 -exclude-miss1
For elai (v0.99):
/public/home/maize/software/elai/src/elai -g $dir_inp/gt.mex_30.biallelic.chr1.recode.geno.txt -p 10 \ -g $dir_inp/gt.par_30.biallelic.chr1.recode.geno.txt -p 11 \ -g $dir_inp/chr1/${index}.chr1.recode.geno.txt -p 1 \ -pos $dir_inp/chr1.posi \ -o ${index}.chr1 \ -s 20 -C 2 -c 2 -mg 6000 \ -exclude-nopos -exclude-miss 0 -exclude-maf 0.05 -exclude-miss1
I have attached the output files from both versions for your reference.
Additionally, I noticed that the maf column in the sampleID.snpinfo.txt file generated by elai-mt contains only three values (0, 0.5, or 1), which doesn't seem to make sense to me. Could this be due to something I missed in the parameter settings?
I would greatly appreciate any suggestions or guidance you could offer.
Thank you for your consideration.
Best wishes,
Yu
sent2github.zip
The text was updated successfully, but these errors were encountered: