
disable ray tests for latest torch release #2328

Merged
merged 2 commits from no-ray-260 into main on Feb 12, 2025

Conversation

winglian
Collaborator

Description

Disabling the ray tests on the latest torch release (2.6.0) for now. Per @erictang000:

It looks like the issue is actually with triton 3.2.0 and ray.

Bug details:
- deepspeed uses the @triton.autotuner decorator, which gets initialized at import time in triton 3.2.0.
- triton 3.2.0 adds logic to the autotuner constructor that checks torch.cuda.is_available().
- The ray TrainTrainable CPU-only actor, which packages environments for the RayTrainWorker GPU actors, tries to import deepspeed; torch.cuda.is_available() is False there, and the new triton version fails on that check (illustrated in the sketch below).
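For illustration only (this is not the actual deepspeed or triton source): a minimal sketch of the failure mode above, where a decorator that probes for CUDA when its module is imported breaks any CPU-only process, such as Ray's TrainTrainable actor, that merely imports the library. The `autotune` stand-in and `fused_kernel` below are hypothetical.

```python
import torch


def autotune(configs):
    """Hypothetical stand-in for an import-time autotuner decorator.

    The property being illustrated: the autotuner is constructed when the
    decorated module is *imported*, and its constructor assumes CUDA is usable.
    """
    def decorator(fn):
        # Runs at import time of the module defining the kernel,
        # not when the kernel is actually launched.
        if not torch.cuda.is_available():
            raise RuntimeError(
                "autotuner constructed where torch.cuda.is_available() is False"
            )
        return fn
    return decorator


# A library module (think: deepspeed's triton kernels) applies the decorator at
# module scope, so a bare `import` of that module already triggers the check above.
@autotune(configs=[])
def fused_kernel(x):
    return x
```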

Pinning triton to 3.1.0 works around the issue for now, but that might cause other problems with torch 2.6.0, since triton==3.2.0 is a dependency of that release. We're looking into the best way to resolve this, either on our end or by opening an issue on deepspeed asking for a way to avoid the offending triton autotuner code.
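As a rough sketch of what "disable ray tests for the latest torch release" can look like in a pytest suite (the marker name, test name, and exact version bound are assumptions for illustration, not the literal diff in this PR):

```python
import pytest
import torch
from packaging import version

# Skip the ray integration tests on torch >= 2.6.0, which bundles triton 3.2.0.
skip_ray_on_torch_2_6 = pytest.mark.skipif(
    version.parse(torch.__version__) >= version.parse("2.6.0"),
    reason="ray + deepspeed import fails under triton 3.2.0 (ships with torch 2.6.0)",
)


@skip_ray_on_torch_2_6
def test_ray_train_smoke():
    # Placeholder for an actual ray training test in the suite.
    ...
```

The other stopgap mentioned above would be pinning `triton==3.1.0` in the project's requirements, at the cost of diverging from torch 2.6.0's own dependency.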

winglian merged commit 3004631 into main on Feb 12, 2025
14 checks passed
winglian deleted the no-ray-260 branch on February 12, 2025 at 23:29