Implement GLU using internal views to avoid copying #11295
Conversation
GLU requires slicing the input Tensor into two halves. Currently, we accomplish this by copying, because ExecuTorch does not support views in general (it requires Tensors to be contiguous). However, nothing stops us from implementing [the ATen implementation that uses views](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/GatedLinearUnit.cpp#L35) entirely internally to the op. To support this, I added `support_noncontiguous_tensors` as an optional template argument to `BroadcastIndexesRange` and plumbed it through to the `elementwise_util` functions as an optional `SupportNoncontiguousTensors` parameter.

ghstack-source-id: fac946b
ghstack-comment-id: 2932190540
Pull-Request-resolved: #11295
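To illustrate the idea (this is a minimal standalone sketch, not the actual ExecuTorch kernel; the names `HalfView` and `glu_1d` are hypothetical), each half of the input can be treated as an offset, strided view over the parent buffer, so GLU can be evaluated element by element without materializing copies of the halves:

```cpp
// Sketch only: compute glu(x) = first_half * sigmoid(second_half) over a
// contiguous (outer, dim) buffer by reading through two non-owning "views"
// of the halves instead of copying them out.
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// A non-owning view over one half of a tensor split along the last dimension:
// a base pointer into the parent storage plus the parent's row stride.
struct HalfView {
  const float* data;   // points into the parent tensor's storage (no copy)
  size_t row_stride;   // parent's extent along the split dim (= 2 * half)
  float at(size_t i, size_t j) const { return data[i * row_stride + j]; }
};

std::vector<float> glu_1d(const std::vector<float>& x, size_t outer, size_t dim) {
  const size_t half = dim / 2;
  HalfView a{x.data(), dim};          // first half
  HalfView b{x.data() + half, dim};   // second half: same buffer, offset base
  std::vector<float> out(outer * half);
  for (size_t i = 0; i < outer; ++i) {
    for (size_t j = 0; j < half; ++j) {
      const float gate = 1.0f / (1.0f + std::exp(-b.at(i, j)));
      out[i * half + j] = a.at(i, j) * gate;
    }
  }
  return out;
}

int main() {
  // Example: one row [1, 2, 3, 4] split along the last dim gives
  // glu = [1 * sigmoid(3), 2 * sigmoid(4)].
  std::vector<float> x = {1.f, 2.f, 3.f, 4.f};
  std::vector<float> y = glu_1d(x, /*outer=*/1, /*dim=*/4);
  std::printf("%f %f\n", y[0], y[1]);
}
```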
second_half,
utils::SupportedTensorDtypes::FLOATHBF16,
out,
utils::internal::SupportNoncontiguousTensors());
why didn't you pass the support_non_contiguous_tensors here as a template parameter?
the argument exists because specifying it as a template parameter is messy; see other thread
const Tensor& b,
SupportedTensorDtypes b_dtypes,
const Tensor& out,
SupportNoncontiguousTensors) {
Why do you need to add this overload, with this confusing anonymous argument, rather than adding a template parameter with a default value of false (i.e., just passing along the `support_noncontiguous_tensors` template parameter in the original implementation above)?
> adding a template parameter with a default value = false

I tried this first, but AFAICT there's nowhere to put that template parameter -- template parameters with defaults need to go at the end, and then specifying a value for the parameter would require you to explicitly specify the type `Op`, which requires uglifying your code significantly if `Op` is a lambda type.

> confusing anonymous argument

This type of thing is in the standard; for example, `std::in_place_t`.
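To make the tradeoff concrete, here is a minimal standalone sketch (hypothetical names, not the actual ExecuTorch signatures): with a defaulted template parameter, overriding the default forces the caller to spell out the deduced `Op` type, which is painful when `Op` is a lambda, while a tag-type overload keeps the call site clean.

```cpp
// Option A vs. Option B, in isolation. The SupportNoncontiguousTensors tag
// defined here is a stand-in, not the real utils::internal type.
struct SupportNoncontiguousTensors {};  // empty tag type, in the spirit of std::in_place_t

// Option A: defaulted non-type template parameter after the deduced Op.
template <typename Op, bool support_noncontiguous = false>
void apply_elementwise(Op op) {
  op(support_noncontiguous);
}

// Option B: tag-type overload; no template arguments need to be spelled out.
template <typename Op>
void apply_elementwise(Op op, SupportNoncontiguousTensors) {
  op(true);
}

int main() {
  auto op = [](bool noncontiguous) { (void)noncontiguous; /* elementwise work */ };

  // Option A: overriding the default means naming Op explicitly, which for a
  // lambda degenerates into decltype gymnastics at every call site.
  apply_elementwise<decltype(op), true>(op);

  // Option B: the call site stays clean; the tag argument selects the behavior.
  apply_elementwise(op, SupportNoncontiguousTensors());
}
```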
Thankfully this is in namespace `internal`, so if we come up with a better idea, we can fix it later. Given that, I will go ahead and land based on Gregory's accept.

noting that CI is green
…#11464) This reverts commit 1f52982, which was breaking internal tests: [D75980858](https://www.internalfb.com/diff/D75980858)