[Bugfix] Remove TritonPlaceholder from sys.modules #17317
Conversation
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
vllm/triton_utils/importing.py
Outdated
```python
            "Ignore import error when loading "
            "%s: %s", module_info.name, e)
        continue
```
The problem seems more like TritonPlaceholder is a non-standard way of doing things, it's generally not OK to monkey patch a module like that. I don't know why we needed TritonPlaceholder -- is there something else we can do there?
> I don't know why we needed TritonPlaceholder

TritonPlaceholder was introduced in #15099 (comment) to avoid unintentional triton import errors on non-GPU backends when importing modules that contain unused Triton ops (triton imports have broken them several times).

The root issue is that TritonPlaceholder is added to sys.modules before torch.compile is called, which causes the torch triton check `_is_triton_available` to return a false positive.

In torch 2.7, I think we could simply hack `_is_triton_available` together with TritonPlaceholder, if torch had migrated all of its triton checks to that method. Unfortunately, torch 2.6 still has some triton checks that don't use this method, which will always conflict with TritonPlaceholder: https://github.com/pytorch/pytorch/blob/1eba9b3aa3c43f86f4a2c807ac8e12c4a7767340/torch/_inductor/runtime/triton_heuristics.py#L52-L58
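To illustrate the failure mode, here is a minimal sketch (not vLLM's actual code): once any module object is registered under `sys.modules["triton"]`, a plain `import triton` succeeds, so any availability check based on import success reports triton as present even when it is not installed.

```python
import sys
import types

# Stand-in for what registering TritonPlaceholder did: put a bare
# module object under the real package's name (hypothetical sketch).
sys.modules["triton"] = types.ModuleType("triton")


def is_triton_available() -> bool:
    # Import-based availability check, similar in spirit to torch's
    # triton checks: import success is taken as "triton is installed".
    try:
        import triton  # noqa: F401
        return True
    except ImportError:
        return False


# Reports True regardless of whether real triton is installed.
print(is_triton_available())  # True
```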
Another solution is to further implement TritonPlaceholder so that it is compatible with torch's triton check. Let me try refactoring TritonPlaceholder this way...
> Another solution is to further implement TritonPlaceholder so that it is compatible with torch's triton check. Let me try refactoring TritonPlaceholder this way...

@Isotr0py sorry for this bug, I noticed it too. I think we can simulate the behavior of torch._inductor in TritonPlaceholder for the case where torch can't find triton. Any discussion is welcome, or feel free to assign this to me.
https://github.com/pytorch/pytorch/blob/1eba9b3aa3c43f86f4a2c807ac8e12c4a7767340/torch/_inductor/runtime/triton_heuristics.py#L81-L94
Why can't vLLM use `try: import triton except ImportError:` blocks wherever it imports triton? Putting a placeholder that is not actually triton into sys.modules breaks all third-party libraries that do standard things with triton (e.g. their own `try: import triton except ImportError:` blocks, or even just accessing fields that are not in your placeholder).
> Why can't vLLM use `try: import triton except ImportError:` blocks wherever it imports triton?

Even a file that only contains triton functions is referenced by many other files scattered across the project, so modifying all of them would introduce much larger changes.

Hmmm, per #15099 (comment), using `try: import triton except ImportError:` blocks would introduce large changes across the whole project, and it could also break non-GPU backends again whenever a developer forgets to add the block when implementing a triton function.

I would prefer to simulate the behavior of torch._inductor.runtime when it can't find triton in TritonPlaceholder, because the triton check in torch._inductor also puts placeholders in place when it can't find triton.
I would strongly prefer not monkey-patching torch._inductor.runtime.

> I prefer to simulate the behavior of torch._inductor.runtime when it can't find triton in TritonPlaceholder, because it looks like the triton check of torch._inductor also puts some placeholders in place when it can't find triton.

There's a difference between putting a placeholder locally (what torch._inductor does) and globally (what vLLM does, by modifying sys.modules["triton"]). Local placeholders don't break other libraries, while the global placeholder (sys.modules["triton"]) does. We should fix this if we can. One idea is to refactor all of vLLM's triton imports to import triton from a single file in vLLM that does the try-except.
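The single-file idea could look something like the following sketch (a hypothetical `vllm/triton_utils/__init__.py`; the module path and `HAS_TRITON` name here are assumptions for illustration): the try-except lives in exactly one place, nothing is written into sys.modules, and every other file imports from this module instead of importing triton directly.

```python
# Hypothetical central import module: the only place in the project
# that touches the real triton package. Importers elsewhere would do
# `from vllm.triton_utils import triton, tl, HAS_TRITON`.
try:
    import triton
    import triton.language as tl
    HAS_TRITON = True
except ImportError:
    # Local fallback: importers must check HAS_TRITON before use.
    triton = None
    tl = None
    HAS_TRITON = False

__all__ = ["triton", "tl", "HAS_TRITON"]
```

Because the fallback values stay local to this module, third-party libraries doing their own `import triton` are unaffected.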
> I would strongly prefer not monkey-patching torch._inductor.runtime.
>
> There's a difference between putting a placeholder locally (what torch._inductor does) and globally (what vLLM does, by modifying sys.modules["triton"]). Local placeholders don't break other libraries, while the global placeholder does. We should fix this if we can. One idea is to refactor all of vLLM's triton imports to import triton from a single file in vLLM that does the try-except.

This looks reasonable, and using try-except would be safer. However, importing triton from a single file in vLLM would break static code checks, and it does need many more changes in vLLM: we would also have to make dummy decorators for triton.jit, etc. You can see #17446 as a reference; I happened to keep the previous code there.
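A dummy decorator of that kind could be sketched as follows (a hypothetical stand-in, not the #17446 implementation): it accepts both bare `@jit` and parameterized `@jit(...)` usage, like the real `triton.jit`, and only fails if the decorated kernel is actually called.

```python
# Hypothetical dummy replacement for triton.jit on triton-less
# backends: decorating succeeds, launching raises.
def dummy_jit(fn=None, **kwargs):
    def wrap(f):
        def unavailable(*args, **kw):
            raise RuntimeError("triton is not installed")
        return unavailable
    if fn is None:
        return wrap        # used as @dummy_jit(num_warps=4)
    return wrap(fn)        # used as @dummy_jit


@dummy_jit
def kernel_a():
    pass


@dummy_jit(num_warps=4)
def kernel_b():
    pass
```

Both kernels can be imported and decorated without triton present; calling either one raises RuntimeError.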
Hmm, I'm wondering if we can introduce import hooks instead of putting placeholders into sys.modules, so that the placeholder is only used inside the vLLM project.
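An import hook along those lines could be sketched with a meta path finder (a hypothetical illustration of the idea, using only the standard importlib machinery): appended at the end of `sys.meta_path`, it only fires when the real triton package cannot be found by the normal finders.

```python
import importlib.abc
import importlib.machinery
import sys
import types


class _PlaceholderLoader(importlib.abc.Loader):
    # Creates an empty module object in place of the real package.
    def create_module(self, spec):
        return types.ModuleType(spec.name)

    def exec_module(self, module):
        pass


class TritonFallbackFinder(importlib.abc.MetaPathFinder):
    """Serve a placeholder for `triton` via an import hook instead of
    writing a placeholder into sys.modules up front."""

    def find_spec(self, fullname, path=None, target=None):
        if fullname != "triton":
            return None
        return importlib.machinery.ModuleSpec(fullname,
                                              _PlaceholderLoader())


# Appending (not prepending) means the real triton wins if installed;
# the fallback is only consulted after every other finder fails.
sys.meta_path.append(TritonFallbackFinder())

import triton  # real triton if installed, otherwise the placeholder
```

Note the caveat: a finder on `sys.meta_path` still affects every importer in the process, so truly limiting the fallback to vLLM's own imports would need extra logic (e.g. inspecting the importing frame), which this comment leaves open.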
Signed-off-by: Isotr0py <2037008807@qq.com>
vllm/triton_utils/importing.py
Outdated
```python
# Replace triton submodules with dummy objects to keep compatibility with
# torch triton check
triton_modules_with_objects = {
    "triton.compiler": ["CompiledKernel"],
    "triton.runtime.autotuner": ["OutOfResources"],
    "triton.runtime.jit": ["KernelInterface"],
}
for module_name, dummy_objects in triton_modules_with_objects.items():
    sys.modules[module_name] = TritonModulePlaceholder(
        module_name, dummy_objects=dummy_objects)
```
@MengqingCao We can simulate the behavior of torch._inductor.runtime.triton_heuristics, but it's a little inflexible. Is there a better way to achieve this?
I think we could use a recursive method to define the attributes of the placeholder. However, it's difficult to define its `__call__`, because the triton-related placeholders in torch._inductor.runtime.triton_heuristics differ (some of them are assigned `None`, the others `Object`). And there may exist other types of placeholders in torch inductor.
```python
import types
from typing import Optional


class TritonModulePlaceholder(types.ModuleType):

    def __init__(
        self,
        name: str,
        dummy_objects: Optional[list[str]] = None,
    ):
        super().__init__(name)
        if dummy_objects is not None:
            for obj_name in dummy_objects:
                setattr(self, obj_name, object)

    def __getattr__(self, name):
        return TritonModulePlaceholder(f"{self.__name__}.{name}")
```
Do you have any idea based on this? @Isotr0py
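For reference, the recursive `__getattr__` in that class makes chained attribute access produce nested placeholders automatically. A quick runnable illustration (restating the class from the comment above so the snippet is self-contained):

```python
import types
from typing import Optional


class TritonModulePlaceholder(types.ModuleType):
    """Placeholder module: unknown attributes become deeper placeholders."""

    def __init__(self, name: str,
                 dummy_objects: Optional[list[str]] = None):
        super().__init__(name)
        if dummy_objects is not None:
            for obj_name in dummy_objects:
                setattr(self, obj_name, object)

    def __getattr__(self, name):
        # Only called for attributes not already set on the module,
        # so explicitly registered dummy objects take precedence.
        return TritonModulePlaceholder(f"{self.__name__}.{name}")


triton = TritonModulePlaceholder("triton")
print(triton.runtime.jit.__name__)  # triton.runtime.jit
```

This covers arbitrary submodule paths without enumerating them, which is exactly the inflexibility the hard-coded `triton_modules_with_objects` dict runs into.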
This PR (commit 8eccfbb) fixes
Signed-off-by: Isotr0py <2037008807@qq.com>
@zou3519 @MengqingCao Since lots of non-GPU backends are currently broken due to this issue, I decided to remove the placeholders from sys.modules. I keep the following pattern, so files that use triton can import the placeholders explicitly:

```python
from vllm.triton_utils.importing import (HAS_TRITON, TritonPlaceholder,
                                         TritonLanguagePlaceholder)

if HAS_TRITON:
    import triton
    import triton.language as tl
else:
    triton = TritonPlaceholder()
    tl = TritonLanguagePlaceholder()
```

Anyway, we can leave this modification to be done in #17446.
partial fix looks good to me, thank you
Okay, I will make this change in #17446 after merging this PR. LGTM now, thanks!
Can we update the test plan with this partial fix?
Fix the non-GPU breakage first.
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
[Bugfix] Remove TritonPlaceholder from sys.modules (vllm-project#17317)
FIX #17309 (link existing issues this PR will resolve)
- Import torch._inductor.runtime intentionally, to trigger all torch triton checks before the placeholder is introduced.
- torch 2.7 migrates its triton checks to _is_triton_available, but torch 2.6 also used a bad triton check in triton_heuristics.py, which will always be affected by the placeholder 😢: https://github.com/pytorch/pytorch/blob/1eba9b3aa3c43f86f4a2c807ac8e12c4a7767340/torch/_inductor/runtime/triton_heuristics.py#L52-L58

Tested on x86 CPU environment by uninstalling triton intentionally: