docs/source-fabric/advanced/compile.rst (+18 −1)
@@ -3,7 +3,7 @@ Speed up models by compiling them
 #################################
 
 Compiling your PyTorch model can result in significant speedups, especially on the latest generations of GPUs.
-This guide shows you how to apply ``torch.compile`` correctly in your code.
+This guide shows you how to apply `torch.compile <https://pytorch.org/docs/stable/generated/torch.compile.html>`_ correctly in your code.
 
 .. note::
 
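For readers skimming this diff, the pattern the Fabric page documents is to compile the model first and then hand it to Fabric. A minimal sketch under those assumptions (the ``torch.nn.Linear`` below stands in for a real model; only ``torch.compile`` and the Fabric setup calls are used):

.. code-block:: python

    import torch
    from lightning.fabric import Fabric

    fabric = Fabric(accelerator="auto", devices=1)
    fabric.launch()

    model = torch.nn.Linear(128, 128)   # stand-in for your real model
    model = torch.compile(model)        # compile first ...
    model = fabric.setup(model)         # ... then let Fabric wrap it and re-apply the compilation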
@@ -223,6 +223,9 @@ On PyTorch 2.2 and later, ``torch.compile`` will detect dynamism automatically a
 Numbers produced with NVIDIA A100 SXM4 40GB, PyTorch 2.2.0, CUDA 12.1.
 
 
+If you still see recompilation issues after dealing with the aforementioned cases, there is a `Compile Profiler in PyTorch <https://pytorch.org/docs/stable/torch.compiler_troubleshooting.html#excessive-recompilation>`_ for further investigation.
+
+
 ----
 
 
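The Compile Profiler mentioned in the added line can be attached as the ``torch.compile`` backend to find out what keeps triggering recompilation. A hedged sketch, assuming the profiler is importable as shown on the linked troubleshooting page:

.. code-block:: python

    import torch
    from torch._dynamo.utils import CompileProfiler  # location per the linked troubleshooting guide

    model = torch.nn.Linear(16, 16)

    prof = CompileProfiler()
    compiled = torch.compile(model, backend=prof)   # the profiler records graph breaks and recompilations
    compiled(torch.randn(4, 16))
    compiled(torch.randn(8, 16))                    # a new input shape may trigger a recompile
    print(prof.report())                            # summary of recompilation causes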
@@ -301,4 +304,18 @@ However, should you have issues compiling DDP and FSDP models, you can opt out o
     model = fabric.setup(model, _reapply_compile=False)
 
 
+----
+
+
+********************
+Additional Resources
+********************
+
+Here are a few resources for further reading after you complete this tutorial:
+
+- `PyTorch 2.0 Paper <https://pytorch.org/blog/pytorch-2-paper-tutorial/>`_
+- `GenAI with PyTorch 2.0 blog post series <https://pytorch.org/blog/accelerating-generative-ai-4/>`_
+- `Training Production AI Models with PyTorch 2.0 <https://pytorch.org/blog/training-production-ai-models/>`_
+- `Empowering Models with Performance: The Art of Generalized Model Transformation Approach <https://pytorch.org/blog/empowering-models-performance/>`_
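The ``_reapply_compile=False`` context line above is the opt-out described in the surrounding Fabric docs: by default Fabric re-applies ``torch.compile`` on top of its DDP/FSDP wrappers, and the flag disables that when it causes problems. A short sketch of the opt-out, assuming a two-device DDP setup:

.. code-block:: python

    import torch
    from lightning.fabric import Fabric

    fabric = Fabric(strategy="ddp", devices=2)
    fabric.launch()

    model = torch.compile(torch.nn.Linear(128, 128))
    # Keep the user-applied compilation as-is; do not re-apply it over the DDP wrapper.
    model = fabric.setup(model, _reapply_compile=False)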
docs/source-pytorch/advanced/compile.rst (+20 −3)
@@ -3,7 +3,7 @@ Speed up models by compiling them
 #################################
 
 Compiling your LightningModule can result in significant speedups, especially on the latest generations of GPUs.
-This guide shows you how to apply ``torch.compile`` correctly in your code.
+This guide shows you how to apply `torch.compile <https://pytorch.org/docs/stable/generated/torch.compile.html>`_ correctly in your code.
 
 .. note::
 
@@ -192,6 +192,8 @@ However, when this is not possible, you can request PyTorch to compile the code
 A model compiled with ``dynamic=True`` will typically be slower than a model compiled with static shapes, but it will avoid the extreme cost of recompilation every iteration.
 On PyTorch 2.2 and later, ``torch.compile`` will detect dynamism automatically and you should no longer need to set this.
 
+If you still see recompilation issues after dealing with the aforementioned cases, there is a `Compile Profiler in PyTorch <https://pytorch.org/docs/stable/torch.compiler_troubleshooting.html#excessive-recompilation>`_ for further investigation.
+
 
 ----
 
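To make the ``dynamic=True`` sentence above concrete: on PyTorch 2.1 and earlier you can request dynamic-shape compilation up front when input shapes change between steps. A minimal sketch:

.. code-block:: python

    import torch

    model = torch.nn.Linear(32, 2)

    # dynamic=True trades some speed for not recompiling on every new input shape;
    # on PyTorch 2.2+ dynamism is detected automatically and the flag is usually unnecessary.
    compiled = torch.compile(model, dynamic=True)

    for batch_size in (8, 16, 32):   # varying shapes would otherwise trigger recompilation
        compiled(torch.randn(batch_size, 32))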
@@ -251,9 +253,9 @@ Always compare the speed and memory usage of the compiled model against the orig
 Limitations
 ***********
 
-There are a few limitations you should be aware of when using ``torch.compile`` in conjunction with the Trainer:
+There are a few limitations you should be aware of when using ``torch.compile`` **in conjunction with the Trainer**:
 
-* ``torch.compile`` currently does not get reapplied over DDP/FSDP, meaning distributed operations can't benefit from speed ups at the moment.
+* The Trainer currently does not reapply ``torch.compile`` over DDP/FSDP, meaning distributed operations can't benefit from speed ups at the moment.
   This limitation will be lifted in the future.
 
 * In some cases, using ``self.log()`` in your LightningModule will cause compilation errors.
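The limitations in this hunk apply to the usual Trainer workflow of compiling the whole LightningModule before calling ``trainer.fit``. A hedged sketch of that workflow with a toy module and dataset (all names below are illustrative, not taken from the diff):

.. code-block:: python

    import torch
    import lightning as L
    from torch.utils.data import DataLoader, TensorDataset

    class LitModel(L.LightningModule):
        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(32, 2)

        def training_step(self, batch, batch_idx):
            x, y = batch
            return torch.nn.functional.mse_loss(self.layer(x), y)

        def configure_optimizers(self):
            return torch.optim.SGD(self.parameters(), lr=0.1)

    model = torch.compile(LitModel())   # compile the LightningModule, then hand it to the Trainer
    data = DataLoader(TensorDataset(torch.randn(64, 32), torch.randn(64, 2)), batch_size=16)
    trainer = L.Trainer(max_epochs=1, logger=False, enable_checkpointing=False)
    trainer.fit(model, data)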
@@ -270,4 +272,19 @@ There are a few limitations you should be aware of when using ``torch.compile``
         self.model = torch.compile(self.model)
         ...
 
+
+----
+
+
+********************
+Additional Resources
+********************
+
+Here are a few resources for further reading after you complete this tutorial:
+
+- `PyTorch 2.0 Paper <https://pytorch.org/blog/pytorch-2-paper-tutorial/>`_
+- `GenAI with PyTorch 2.0 blog post series <https://pytorch.org/blog/accelerating-generative-ai-4/>`_
+- `Training Production AI Models with PyTorch 2.0 <https://pytorch.org/blog/training-production-ai-models/>`_
+- `Empowering Models with Performance: The Art of Generalized Model Transformation Approach <https://pytorch.org/blog/empowering-models-performance/>`_
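The ``self.model = torch.compile(self.model)`` context line above is the workaround that hunk documents: compile only the inner network inside the LightningModule so that ``self.log()`` calls stay outside the compiled region. A sketch of that shape (the module below is illustrative):

.. code-block:: python

    import torch
    import lightning as L

    class LitModel(L.LightningModule):
        def __init__(self):
            super().__init__()
            self.model = torch.nn.Linear(32, 2)
            # Compile only the inner network; training_step and its self.log() calls
            # remain outside the compiled region.
            self.model = torch.compile(self.model)

        def training_step(self, batch, batch_idx):
            x, y = batch
            loss = torch.nn.functional.mse_loss(self.model(x), y)
            self.log("train_loss", loss)
            return loss

        def configure_optimizers(self):
            return torch.optim.SGD(self.parameters(), lr=0.1)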