Add --timeout-second and retry to callVariant #844

Merged: 3 commits merged on Feb 8, 2024

@zhuchcn (Member) commented Feb 8, 2024

Description

I have a transcript with 375 SNVs/indels that does not finish even after hours. The limiting step is calling miscleaved peptides from the PCG after cleavage. Setting a global max_variants_per_node doesn't make much sense to me, so I implemented a timeout function: for each transcript, if it can't be finished within a given time, the run stops and retries with a lower max_variants_per_node (and additional_variants_per_misc).

The --timeout-seconds argument is added to callVariant and defaults to 30 minutes (1800 seconds).
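The PR text does not show how the per-transcript timeout is implemented. One common Python approach is a SIGALRM timer that interrupts a long-running call; the sketch below assumes that approach. `call_with_timeout` is a hypothetical helper, and it only works on POSIX systems and in the main thread.

```python
import signal
import time
from typing import Any, Callable


def _raise_timeout(signum, frame):
    # Signal handler: turn the alarm into an exception inside the running call.
    raise TimeoutError("call exceeded its time budget")


def call_with_timeout(func: Callable[..., Any], seconds: float, *args, **kwargs) -> Any:
    """Hypothetical helper: run func(*args, **kwargs), raising TimeoutError
    if it takes longer than `seconds`. POSIX-only, main-thread-only (SIGALRM)."""
    previous = signal.signal(signal.SIGALRM, _raise_timeout)
    if previous is None:  # handler was not installed from Python
        previous = signal.SIG_DFL
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return func(*args, **kwargs)
    finally:
        # Cancel any pending alarm and restore the previous handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)


if __name__ == "__main__":
    print(call_with_timeout(lambda: 42, 1.0))  # finishes in time
    try:
        call_with_timeout(time.sleep, 0.2, 5)  # would sleep 5 s, stopped at ~0.2 s
    except TimeoutError as err:
        print("timed out:", err)
```

In callVariant, the wrapped call would be the per-transcript work, with `seconds` taken from --timeout-seconds; a `TimeoutError` is the cue to retry with lower settings.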

--max-variants-per-node and --additional-variants-per-misc now accept multiple values, which are used as the "retry strategy". If the --max-variants-per-node values run out, retries continue with the previous value minus 1 until it reaches 0, at which point an error is raised. For example, with --max-variants-per-node 7 5, the retries will be 7 -> 5 -> 4 -> 3 -> 2 -> 1 -> error (which should never happen).
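The fallback rule for --max-variants-per-node can be sketched as a generator. `max_variants_schedule` is a hypothetical name, not necessarily how the PR structures it; the caller is assumed to raise an error once the generator is exhausted.

```python
from typing import Iterator, List


def max_variants_schedule(values: List[int]) -> Iterator[int]:
    """Hypothetical helper: yield max_variants_per_node values to try.

    User-supplied values come first; after that, keep decrementing the
    last value by 1 while it is still above 0.
    """
    if not values:
        raise ValueError("at least one value is required")
    yield from values
    current = values[-1] - 1
    while current > 0:
        yield current
        current -= 1


def next_setting(schedule: Iterator[int]) -> int:
    """Take the next value, raising once the schedule is exhausted (the 'error' step)."""
    try:
        return next(schedule)
    except StopIteration:
        raise RuntimeError("max_variants_per_node reached 0; giving up") from None
```

For example, `list(max_variants_schedule([7, 5]))` gives `[7, 5, 4, 3, 2, 1]`, matching the 7 -> 5 -> 4 -> 3 -> 2 -> 1 sequence; a seventh `next_setting` call would raise.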

--additional-variants-per-misc works slightly differently: if its values run out, 0 is used. So by default it will be 2 -> 0.
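The zero-padding rule for --additional-variants-per-misc amounts to appending an endless run of 0s to the supplied values. `additional_variants_schedule` is a hypothetical helper used only for illustration.

```python
from itertools import chain, islice, repeat
from typing import List


def additional_variants_schedule(values: List[int], n_attempts: int) -> List[int]:
    """Hypothetical helper: the additional_variants_per_misc value for each
    of the first n_attempts attempts, padding with 0 once values run out."""
    return list(islice(chain(values, repeat(0)), n_attempts))
```

With the default single value of 2 and two attempts, this yields `[2, 0]`, i.e. the 2 -> 0 sequence described above.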

Closes #...

Checklist

  • This PR does NOT contain PHI or germline genetic data. A repo may need to be deleted if such data is uploaded. Disclosing PHI is a major problem.
  • This PR does NOT contain molecular files, compressed files, output files such as images (e.g. .png, .jpeg), .pdf, .RData, .xlsx, .doc, .ppt, or other non-plain-text files. To automatically exclude such files using a .gitignore file, see here for example.
  • I have read the code review guidelines and the code review best practice on GitHub check-list.
  • The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].
  • I have added the major changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.
  • All test cases passed locally.

@zhuchcn zhuchcn marked this pull request as ready for review February 8, 2024 17:08
@zhuchcn zhuchcn requested a review from lydiayliu February 8, 2024 17:08
@lydiayliu (Collaborator) commented

Interesting. Can you input something like --max-variants-per-node 7,7,7,5,5,5,3,3,3,2,2,2,1,1,1 and --additional-variants-per-misc 2,1,0,2,1,0,2,1,0,2,1,0,2,1,0 to get a grid-search effect?

@zhuchcn (Member, Author) commented Feb 8, 2024

You can, but this retry mechanism is only triggered when a transcript times out. Setting too many retry cycles will probably make the total run time very long. For example, if the transcript only finishes with 5-0, it will have to time out 5 times before that attempt. If we set --timeout-seconds to 900, this one transcript could take up to 90 minutes (6 attempts × 15 minutes) to finish (which isn't super bad).
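A minimal sketch of the grid-style retry loop and its worst-case cost, assuming each attempt either finishes or raises `TimeoutError` after the full timeout. `retry_grid`, `run_with_retries`, and `worst_case_seconds` are hypothetical names; for simplicity the sketch omits the decrement-by-1 tail that applies once the --max-variants-per-node values run out.

```python
from itertools import chain, repeat
from typing import Any, Callable, List, Tuple


def retry_grid(max_variants: List[int], additional: List[int]) -> List[Tuple[int, int]]:
    """Pair the two lists attempt by attempt; `additional` is padded with 0."""
    return list(zip(max_variants, chain(additional, repeat(0))))


def run_with_retries(run: Callable[[int, int], Any],
                     grid: List[Tuple[int, int]]) -> Tuple[int, Any]:
    """Call run(max_variants, additional) for each pair until one does not time out.

    Returns (1-based attempt number, result)."""
    for attempt, (m, a) in enumerate(grid, start=1):
        try:
            return attempt, run(m, a)
        except TimeoutError:
            continue  # only a timeout triggers the next, cheaper setting
    raise RuntimeError("every retry setting timed out")


def worst_case_seconds(attempt: int, timeout_seconds: float) -> float:
    """Upper bound on wall time if `attempt` is the first to finish: every
    earlier attempt burns a full timeout, and the last one uses at most one more."""
    return attempt * timeout_seconds


if __name__ == "__main__":
    grid = retry_grid([7, 7, 7, 5, 5, 5], [2, 1, 0, 2, 1, 0])

    def fake_run(m: int, a: int) -> str:
        # Stand-in for the per-transcript work: pretend only 5-0 finishes in time.
        if (m, a) != (5, 0):
            raise TimeoutError
        return "done"

    attempt, result = run_with_retries(fake_run, grid)
    print(attempt, result, worst_case_seconds(attempt, 900) / 60, "minutes")
```

Under these assumptions, 5-0 is the sixth pair in the grid, so the bound is 6 × 900 s = 90 minutes, matching the estimate above.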

Btw, I made these two arguments accept multiple values, so you don't need commas; just separate the values with spaces, like this:

--max-variants-per-node 7 7 7 5 5 5 --additional-variants-per-misc 2 1 0 2 1 0
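Space-separated multiple values like these are what `argparse` provides with `nargs="+"`. The parser below is an illustrative sketch, not callVariant's actual argument setup: the 1800-second timeout and the `[2]` default for --additional-variants-per-misc follow the description above, while `default=[7]` for --max-variants-per-node and `prog="callVariant"` are assumptions.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Illustrative parser; defaults marked "assumed" are not taken from the PR.
    parser = argparse.ArgumentParser(prog="callVariant")
    parser.add_argument("--timeout-seconds", type=int, default=1800)  # 30 minutes
    parser.add_argument("--max-variants-per-node", type=int, nargs="+",
                        default=[7])  # assumed default
    parser.add_argument("--additional-variants-per-misc", type=int, nargs="+",
                        default=[2])  # "by default it will be 2 -> 0"
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)
```

Parsing the command line above gives `max_variants_per_node == [7, 7, 7, 5, 5, 5]` and `additional_variants_per_misc == [2, 1, 0, 2, 1, 0]`; argparse converts the dashes in option names to underscores.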

@zhuchcn zhuchcn merged commit 01dea81 into main Feb 8, 2024
2 checks passed
@zhuchcn zhuchcn deleted the czhu-fix-call-variant branch February 8, 2024 19:47