
Implement GLU using internal views to avoid copying #11295

Merged

swolchok merged 6 commits into main from gh/swolchok/444/head on Jun 4, 2025

Conversation

@swolchok swolchok commented Jun 2, 2025

GLU requires slicing the input Tensor into two halves. Currently, we accomplish this by copying; ExecuTorch does not support views in general because it requires Tensors to be contiguous. However, nothing stops us from following [the ATen implementation, which uses views](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/GatedLinearUnit.cpp#L35), as long as the views stay entirely internal to the op.

To support this, I added `support_noncontiguous_tensors` as an optional template argument to BroadcastIndexesRange and plumbed it through to the elementwise_util functions as an optional `SupportNoncontiguousTensors` parameter.
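For intuition, here is a minimal standalone sketch of the view-based idea. It is illustrative only: it does not use the ExecuTorch kernel APIs, and the name `glu_last_dim` is made up. Each half of the input is addressed through explicit offsets and strides rather than materialized as a copy.

```cpp
// Standalone illustration of GLU over the last dimension without copying:
// each half of the input is addressed through explicit offsets/strides,
// i.e. treated as an internal "view". Not the ExecuTorch kernel itself.
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

// glu(x)[i, j] = first_half[i, j] * sigmoid(second_half[i, j])
std::vector<float> glu_last_dim(
    const std::vector<float>& x, std::size_t rows, std::size_t cols) {
  const std::size_t half = cols / 2;
  std::vector<float> out(rows * half);
  for (std::size_t i = 0; i < rows; ++i) {
    // Viewed as rows x half tensors, the halves are noncontiguous: their
    // row stride is `cols`, not `half`. We simply index accordingly.
    const float* a = x.data() + i * cols;        // first half of row i
    const float* b = x.data() + i * cols + half; // second half of row i
    for (std::size_t j = 0; j < half; ++j) {
      out[i * half + j] = a[j] / (1.0f + std::exp(-b[j]));
    }
  }
  return out;
}

int main() {
  const std::vector<float> x = {1, 2, 3, 4, 5, 6, 7, 8}; // 2 x 4 input
  for (float v : glu_last_dim(x, 2, 4)) {                // 2 x 2 output
    std::cout << v << ' ';
  }
  std::cout << '\n';
}
```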

swolchok added 2 commits June 2, 2025 12:38
[ghstack-poisoned]
[ghstack-poisoned]

swolchok commented Jun 2, 2025

Stack from ghstack (oldest at bottom):

@swolchok swolchok requested a review from manuelcandales as a code owner June 2, 2025 19:38

pytorch-bot bot commented Jun 2, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11295

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 77 Pending

As of commit 07621c9 with merge base 0e35c30:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

swolchok added a commit that referenced this pull request Jun 2, 2025

ghstack-source-id: fac946b
ghstack-comment-id: 2932190540
Pull-Request-resolved: #11295
@facebook-github-bot facebook-github-bot added the CLA Signed label Jun 2, 2025
@swolchok swolchok requested a review from GregoryComer June 2, 2025 19:39
swolchok added 2 commits June 3, 2025 12:35
[ghstack-poisoned]
[ghstack-poisoned]
swolchok added a commit that referenced this pull request Jun 3, 2025

ghstack-source-id: d101928
ghstack-comment-id: 2932190540
Pull-Request-resolved: #11295
[ghstack-poisoned]
swolchok added a commit that referenced this pull request Jun 3, 2025

ghstack-source-id: acf03bb
ghstack-comment-id: 2932190540
Pull-Request-resolved: #11295
```cpp
second_half,
utils::SupportedTensorDtypes::FLOATHBF16,
out,
utils::internal::SupportNoncontiguousTensors());
```
Contributor

why didn't you pass the support_non_contiguous_tensors here as a template parameter?

Contributor Author

the argument exists because specifying it as a template parameter is messy; see other thread

```cpp
const Tensor& b,
SupportedTensorDtypes b_dtypes,
const Tensor& out,
SupportNoncontiguousTensors) {
```
Contributor

@manuelcandales manuelcandales Jun 4, 2025

Why do you need to add this overload, with this confusing anonymous argument, rather than adding a template parameter with a default value = false (i.e. just pass along the `support_noncontiguous_tensors` template parameter, in the original implementation above)?

Contributor Author

@swolchok swolchok Jun 4, 2025

> adding a template parameter with a default value = false

I tried this first, but AFAICT there's nowhere to put that template parameter -- template parameters with defaults need to go at the end, and then specifying a value for the parameter would require you to explicitly specify the type Op, which requires uglifying your code significantly if Op is a lambda type.

> confusing anonymous argument

this type of thing is in the standard. For example, std::in_place_t.
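To make the trade-off concrete, here is a minimal sketch of the two options being discussed. The names (`apply_fn_tpl`, `apply_fn`, and the `SupportNoncontiguousTensors` tag shown here) are hypothetical and do not match the real elementwise_util signatures.

```cpp
// Hypothetical illustration of the API design discussed above; these names
// are made up and do not match the actual elementwise_util functions.

struct SupportNoncontiguousTensors {};  // empty tag type, in the spirit of std::in_place_t

// Option A: trailing defaulted template parameter. Because defaulted
// parameters must come after Op, overriding the default forces the caller
// to name Op explicitly.
template <typename Op, bool support_noncontiguous = false>
void apply_fn_tpl(Op op) {
  (void)op;  // elided: dispatch on support_noncontiguous
}

// Option B: tag-dispatch overload. Op stays deduced; the extra argument
// selects the noncontiguous-capable path.
template <typename Op>
void apply_fn(Op op) {
  (void)op;  // elided: contiguous-only path
}
template <typename Op>
void apply_fn(Op op, SupportNoncontiguousTensors) {
  (void)op;  // elided: strided-indexing path
}

int main() {
  auto op = [](float a, float b) { return a * b; };
  // Option A only works here because the lambda was first bound to a named
  // variable; with an inline lambda there is no way to spell Op.
  apply_fn_tpl<decltype(op), true>(op);
  // Option B: no explicit template arguments needed.
  apply_fn(op, SupportNoncontiguousTensors{});
}
```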

Contributor Author

thankfully this is in namespace internal, so if we come up with a better idea, we can fix it later. Given that, I will go ahead and land based on Gregory's accept.

@swolchok swolchok requested a review from manuelcandales June 4, 2025 17:26

swolchok commented Jun 4, 2025

noting that CI is green

Base automatically changed from gh/swolchok/443/head to main June 4, 2025 17:26
[ghstack-poisoned]
swolchok added a commit that referenced this pull request Jun 4, 2025

ghstack-source-id: 56a0405
ghstack-comment-id: 2932190540
Pull-Request-resolved: #11295
@swolchok swolchok added the release notes: ops & kernels label Jun 4, 2025
@swolchok swolchok merged commit 1f52982 into main Jun 4, 2025
96 checks passed
@swolchok swolchok deleted the gh/swolchok/444/head branch June 4, 2025 20:13
lucylq added a commit that referenced this pull request Jun 7, 2025
lucylq added a commit that referenced this pull request Jun 7, 2025