
Implement GLU using internal views to avoid copying #11295

Merged

swolchok merged 6 commits into main from gh/swolchok/444/head on Jun 4, 2025

Conversation

@swolchok swolchok commented Jun 2, 2025

GLU requires slicing the input Tensor into two halves. Currently, we accomplish this by copying; ExecuTorch does not support views in general because it requires Tensors to be contiguous. However, nothing stops us from following [the ATen implementation, which uses views](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/GatedLinearUnit.cpp#L35), as long as the views stay entirely internal to the op.

To support this, I added `support_noncontiguous_tensors` as an optional template argument to BroadcastIndexesRange and plumbed it through to the elementwise_util functions as an optional `SupportNoncontiguousTensors` parameter.
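For intuition, here is a minimal standalone sketch of the view-based idea. It is illustrative only: it does not use the ExecuTorch kernel APIs, and the name `glu_last_dim` is made up. Each half of the input is addressed through explicit offsets and strides rather than materialized as a copy.

```cpp
// Standalone illustration of GLU over the last dimension without copying:
// each half of the input is addressed through explicit offsets/strides,
// i.e. treated as an internal "view". Not the ExecuTorch kernel itself.
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

// glu(x)[i, j] = first_half[i, j] * sigmoid(second_half[i, j])
std::vector<float> glu_last_dim(
    const std::vector<float>& x, std::size_t rows, std::size_t cols) {
  const std::size_t half = cols / 2;
  std::vector<float> out(rows * half);
  for (std::size_t i = 0; i < rows; ++i) {
    // Viewed as rows x half tensors, the halves are noncontiguous: their
    // row stride is `cols`, not `half`. We simply index accordingly.
    const float* a = x.data() + i * cols;        // first half of row i
    const float* b = x.data() + i * cols + half; // second half of row i
    for (std::size_t j = 0; j < half; ++j) {
      out[i * half + j] = a[j] / (1.0f + std::exp(-b[j]));
    }
  }
  return out;
}

int main() {
  const std::vector<float> x = {1, 2, 3, 4, 5, 6, 7, 8}; // 2 x 4 input
  for (float v : glu_last_dim(x, 2, 4)) {                // 2 x 2 output
    std::cout << v << ' ';
  }
  std::cout << '\n';
}
```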

swolchok added 2 commits June 2, 2025 12:38
[ghstack-poisoned]
[ghstack-poisoned]

swolchok commented Jun 2, 2025

Stack from ghstack (oldest at bottom):

@swolchok swolchok requested a review from manuelcandales as a code owner June 2, 2025 19:38

pytorch-bot bot commented Jun 2, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11295

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 77 Pending

As of commit 07621c9 with merge base 0e35c30:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

swolchok added a commit that referenced this pull request Jun 2, 2025

ghstack-source-id: fac946b
ghstack-comment-id: 2932190540
Pull-Request-resolved: #11295
@facebook-github-bot facebook-github-bot added the CLA Signed label Jun 2, 2025
@swolchok swolchok requested a review from GregoryComer June 2, 2025 19:39
swolchok added 2 commits June 3, 2025 12:35
[ghstack-poisoned]
[ghstack-poisoned]
swolchok added a commit that referenced this pull request Jun 3, 2025

ghstack-source-id: d101928
ghstack-comment-id: 2932190540
Pull-Request-resolved: #11295
[ghstack-poisoned]
swolchok added a commit that referenced this pull request Jun 3, 2025

ghstack-source-id: acf03bb
ghstack-comment-id: 2932190540
Pull-Request-resolved: #11295
```cpp
second_half,
utils::SupportedTensorDtypes::FLOATHBF16,
out,
utils::internal::SupportNoncontiguousTensors());
```
Contributor

why didn't you pass the support_non_contiguous_tensors here as a template parameter?

Contributor Author

the argument exists because specifying it as a template parameter is messy; see other thread

```cpp
const Tensor& b,
SupportedTensorDtypes b_dtypes,
const Tensor& out,
SupportNoncontiguousTensors) {
```
Contributor

@manuelcandales manuelcandales Jun 4, 2025

Why do you need to add this overload, with this confusing anonymous argument, rather than adding a template parameter with a default value = false (i.e. just pass along the `support_noncontiguous_tensors` template parameter, in the original implementation above)?

Contributor Author

@swolchok swolchok Jun 4, 2025

> adding a template parameter with a default value = false

I tried this first, but AFAICT there's nowhere to put that template parameter -- template parameters with defaults need to go at the end, and then specifying a value for the parameter would require you to explicitly specify the type Op, which requires uglifying your code significantly if Op is a lambda type.

> confusing anonymous argument

this type of thing is in the standard. For example, std::in_place_t.
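To make the trade-off concrete, here is a minimal sketch of the two options being discussed. The names (`apply_fn_tpl`, `apply_fn`, and the `SupportNoncontiguousTensors` tag shown here) are hypothetical and do not match the real elementwise_util signatures.

```cpp
// Hypothetical illustration of the API design discussed above; these names
// are made up and do not match the actual elementwise_util functions.

struct SupportNoncontiguousTensors {};  // empty tag type, in the spirit of std::in_place_t

// Option A: trailing defaulted template parameter. Because defaulted
// parameters must come after Op, overriding the default forces the caller
// to name Op explicitly.
template <typename Op, bool support_noncontiguous = false>
void apply_fn_tpl(Op op) {
  (void)op;  // elided: dispatch on support_noncontiguous
}

// Option B: tag-dispatch overload. Op stays deduced; the extra argument
// selects the noncontiguous-capable path.
template <typename Op>
void apply_fn(Op op) {
  (void)op;  // elided: contiguous-only path
}
template <typename Op>
void apply_fn(Op op, SupportNoncontiguousTensors) {
  (void)op;  // elided: strided-indexing path
}

int main() {
  auto op = [](float a, float b) { return a * b; };
  // Option A only works here because the lambda was first bound to a named
  // variable; with an inline lambda there is no way to spell Op.
  apply_fn_tpl<decltype(op), true>(op);
  // Option B: no explicit template arguments needed.
  apply_fn(op, SupportNoncontiguousTensors{});
}
```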

Contributor Author

thankfully this is in namespace internal, so if we come up with a better idea, we can fix it later. Given that, I will go ahead and land based on Gregory's accept.

@swolchok swolchok requested a review from manuelcandales June 4, 2025 17:26

swolchok commented Jun 4, 2025

noting that CI is green

Base automatically changed from gh/swolchok/443/head to main June 4, 2025 17:26
[ghstack-poisoned]
swolchok added a commit that referenced this pull request Jun 4, 2025

ghstack-source-id: 56a0405
ghstack-comment-id: 2932190540
Pull-Request-resolved: #11295
@swolchok swolchok added the release notes: ops & kernels label Jun 4, 2025
@swolchok swolchok merged commit 1f52982 into main Jun 4, 2025
96 checks passed
@swolchok swolchok deleted the gh/swolchok/444/head branch June 4, 2025 20:13
lucylq added a commit that referenced this pull request Jun 7, 2025
lucylq added a commit that referenced this pull request Jun 7, 2025