Skip to content

feat: added requeue option to client #136

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Sep 27, 2024
Merged

Conversation

slaube-jgu
Copy link
Collaborator

@slaube-jgu slaube-jgu commented Aug 15, 2024

added option for users to indicate the sbatch requeue flag if the cluster configuration does no re-queuing on default

this is a boolean flag. We wait for the interface_plugin to be fixed.

Summary by CodeRabbit

  • New Features
    • Introduced a new option to allow requeuing of preempted or failed jobs in the job execution settings.
    • Added a --requeue flag to the job submission command for enhanced control over job handling.

@slaube-jgu slaube-jgu requested a review from cmeesters August 15, 2024 09:15
Copy link
Contributor

coderabbitai bot commented Aug 15, 2024

Walkthrough

The recent changes introduce a requeue boolean field to the ExecutorSettings class, allowing users to specify whether preempted or failed jobs should be requeued. This field is initialized with a default value of False and integrates with the SLURM job scheduler by adding the --requeue flag to the sbatch command when enabled. The logic in the run_job method has been updated to reflect this new functionality.

Changes

Files Change Summary
snakemake_executor_plugin_slurm/__init__.py Introduced a requeue boolean field to ExecutorSettings, defaulting to False, and modified run_job to append --requeue to the command if the flag is enabled.
tests/tests.py Renamed TestWorkflowsBase to TestWorkflows, updated get_executor_settings to return ExecutorSettings, and added TestWorkflowsRequeue subclass to test the requeue functionality.

Possibly related PRs

  • feat: in job stability #137: This PR involves modifications to the warn_on_jobcontext method in the ExecutorSettings class, which may interact with job submission processes, although it does not directly relate to the new requeue functionality.
  • fix: logfile quoting and scancel error handling #140: This PR alters the run_job method in the ExecutorSettings class, which is directly related to job submission and may be impacted by the new requeue feature introduced in the main PR.

Poem

🐰 In code we hop, with settings bright,
A requeue flag brings jobs to light.
When failures strike, don’t fear the fall,
With SLURM’s help, we’ll stand up tall!
So here’s to changes, swift and neat,
Let’s celebrate with a happy beat! 🎉


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@cmeesters cmeesters marked this pull request as ready for review August 15, 2024 11:35
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 6dad273 and 42b4471.

Files selected for processing (1)
  • snakemake_executor_plugin_slurm/init.py (2 hunks)
Additional comments not posted (2)
snakemake_executor_plugin_slurm/__init__.py (2)

43-54: LGTM! The requeue field is well-defined.

The requeue field is correctly added with a default value of False and includes helpful metadata for documentation.


157-158: LGTM! The --requeue flag integration is correct.

The run_job method correctly appends the --requeue flag based on the requeue setting, enhancing job management flexibility.

@cmeesters
Copy link
Member

@johanneskoester I requested your review, because locally testworkflows run just fine and our change yields valid Python types, where the CI thinks a None-Type is given. Ideas?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 42b4471 and 4184610.

Files selected for processing (1)
  • snakemake_executor_plugin_slurm/init.py (2 hunks)
Additional comments not posted (2)
snakemake_executor_plugin_slurm/__init__.py (2)

45-56: LGTM!

The code changes are approved. The new requeue field is correctly added to the ExecutorSettings class with appropriate metadata.


160-162: LGTM!

The code changes are approved. The --requeue argument is correctly appended to the sbatch command based on the requeue setting.

To verify the impact of the requeue setting on the job execution process, please run the following script:

Verification successful

Verification Successful

The code correctly appends the --requeue flag to the sbatch command when the requeue setting is enabled. This implementation is verified and aligns with the expected behavior.

  • Location: snakemake_executor_plugin_slurm/__init__.py
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the impact of the `requeue` setting on the job execution process.

# Test: Search for the `--requeue` flag in the `sbatch` command. Expect: The flag is present only when the `requeue` setting is enabled.
rg --type python -A 5 $'if self.settings.requeue:\n\s+call \+= " --requeue"'

Length of output: 268


Script:

#!/bin/bash
# Description: Verify the impact of the `requeue` setting on the job execution process.

# Test: Search for the `--requeue` flag in the `sbatch` command. Expect: The flag is present only when the `requeue` setting is enabled.
rg --type python --multiline -A 5 $'if self.settings.requeue:\n\s+call \+= " --requeue"'

Length of output: 119


Script:

#!/bin/bash
# Description: Verify the impact of the `requeue` setting on the job execution process.

# Test: Search for the `--requeue` flag in the `sbatch` command. Expect: The flag is present only when the `requeue` setting is enabled.
rg --multiline -g '*.py' -A 5 $'if self.settings.requeue:\n\s+call \+= " --requeue"'

Length of output: 604

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 4184610 and 5774a85.

Files selected for processing (1)
  • snakemake_executor_plugin_slurm/init.py (2 hunks)
Files skipped from review as they are similar to previous changes (1)
  • snakemake_executor_plugin_slurm/init.py

@cmeesters
Copy link
Member

The tests still fail - while working locally, the CI claims

FAILED tests/tests.py::TestWorkflowsBase::test_simple_workflow - AttributeError: 'NoneType' object has no attribute 'requeue'

@cmeesters
Copy link
Member

ah, wanted to look into this CI issue - turns out, I cannot find this branch. WTF???

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
snakemake_executor_plugin_slurm/__init__.py (1)

Line range hint 45-160: Consider adding a test for the new requeue feature

The implementation of the new requeue option looks good and addresses the PR objective. To ensure the feature works as expected and to prevent potential regressions in the future, consider adding a test case that verifies the correct behavior of the requeue option.

This test could:

  1. Create a mock ExecutorSettings instance with requeue=True.
  2. Call the run_job method with these settings.
  3. Assert that the generated sbatch command includes the --requeue option.

Adding such a test would increase the robustness of this new feature.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 71e9f12 and 1e34bac.

📒 Files selected for processing (1)
  • snakemake_executor_plugin_slurm/init.py (2 hunks)
🔇 Additional comments (3)
snakemake_executor_plugin_slurm/__init__.py (3)

45-56: LGTM: New requeue option added correctly

The new requeue boolean field in the ExecutorSettings class is well-implemented. It follows the existing pattern for other fields, has a clear help text explaining its purpose and effect, and uses a default value of False to ensure backward compatibility.


158-160: LGTM: requeue option integrated correctly

The requeue option is properly integrated into the run_job method. It correctly checks the workflow settings before adding the --requeue option to the sbatch command. The placement and implementation of this new feature are appropriate and align with the existing code structure.


Line range hint 1-1000: Summary: New requeue feature implemented correctly

The changes in this file successfully implement the new requeue feature for the SLURM executor plugin. The implementation is correct, minimal, and consistent with the existing codebase. It addresses the PR objective of allowing users to specify the SBATCH requeue flag.

Key points:

  1. The requeue option is added to the ExecutorSettings class with appropriate metadata.
  2. The run_job method is updated to include the --requeue flag when the option is enabled.
  3. The changes are backward-compatible and do not introduce any apparent issues.

A suggestion has been made to add a test for this new feature to ensure its correct behavior and prevent future regressions.

Note: The CI issues mentioned in the PR comments do not appear to be directly related to these changes and may require separate investigation.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
tests/tests.py (3)

8-15: LGTM: TestWorkflows class changes are appropriate.

The renaming of the class from TestWorkflowsBase to TestWorkflows and the update to get_executor_settings method are consistent with the PR objectives. The method now returns a concrete ExecutorSettings instance, which is an improvement over returning None.

Consider adding a comment explaining why the default ExecutorSettings() is used here, to provide context for future developers.


18-20: LGTM: TestWorkflowsRequeue class is well-implemented.

The new TestWorkflowsRequeue class correctly inherits from TestWorkflows and overrides the get_executor_settings method to test the new requeue functionality. This aligns well with the PR objectives.

Consider adding a docstring to the TestWorkflowsRequeue class explaining its purpose and how it differs from the base TestWorkflows class. This would improve code documentation and make the intent clearer for future developers.


Line range hint 1-20: Consider expanding test coverage for requeue functionality.

The changes in this file provide a good foundation for testing the new requeue option. However, to ensure comprehensive coverage of the new functionality, consider adding the following:

  1. Test cases that specifically verify the behavior when requeue is set to True vs False.
  2. Edge cases, such as testing with multiple job submissions or with different job statuses (e.g., preempted, failed).
  3. Integration tests that verify the correct flags are passed to the SLURM scheduler when requeue is enabled.

These additional tests would help ensure the robustness of the new feature and catch potential issues early in the development process.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between f123fd9 and b2a44de.

📒 Files selected for processing (1)
  • tests/tests.py (1 hunks)
🔇 Additional comments (1)
tests/tests.py (1)

5-5: LGTM: Import statement for ExecutorSettings is correct.

The import of ExecutorSettings from snakemake_executor_plugin_slurm is necessary for the changes in the get_executor_settings methods and is placed correctly in the import section.

@johanneskoester johanneskoester merged commit b0ff160 into main Sep 27, 2024
5 checks passed
@johanneskoester johanneskoester deleted the feat/requeue-flag branch September 27, 2024 14:31
johanneskoester pushed a commit that referenced this pull request Sep 27, 2024
🤖 I have created a release *beep* *boop*
---


##
[0.11.0](v0.10.2...v0.11.0)
(2024-09-27)


### Features

* added requeue option to client
([#136](#136))
([b0ff160](b0ff160))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@cmeesters
Copy link
Member

I see. Cool! Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants