
Commit d56ab95

Merge branch 'master' into removal/colossal
2 parents be57bb1 + 63188f9 commit d56ab95


236 files changed: +1998 -1359 lines changed


.github/workflows/ci-examples-app.yml (+1 -1)

@@ -67,7 +67,7 @@ jobs:
        run: python .actions/assistant.py replace_oldest_ver

      - name: pip wheels cache
-       uses: actions/cache/restore@v3
+       uses: actions/cache/restore@v4
        with:
          path: ${{ env.PYPI_CACHE_DIR }}
          key: pypi_wheels

.github/workflows/ci-tests-app.yml (+1 -1)

@@ -73,7 +73,7 @@ jobs:
        run: python .actions/assistant.py replace_oldest_ver

      - name: pip wheels cache
-       uses: actions/cache/restore@v3
+       uses: actions/cache/restore@v4
        with:
          path: ${{ env.PYPI_CACHE_DIR }}
          key: pypi_wheels

.github/workflows/ci-tests-fabric.yml (+1 -1)

@@ -114,7 +114,7 @@ jobs:
          done

      - name: pip wheels cache
-       uses: actions/cache/restore@v3
+       uses: actions/cache/restore@v4
        with:
          path: ${{ env.PYPI_CACHE_DIR }}
          key: pypi_wheels

.github/workflows/ci-tests-pytorch.yml (+2 -2)

@@ -120,7 +120,7 @@ jobs:
          cat requirements/pytorch/base.txt

      - name: pip wheels cache
-       uses: actions/cache/restore@v3
+       uses: actions/cache/restore@v4
        with:
          path: ${{ env.PYPI_CACHE_DIR }}
          key: pypi_wheels
@@ -161,7 +161,7 @@ jobs:
        cache-key: "pypi_wheels"

      - name: Cache datasets
-       uses: actions/cache@v3
+       uses: actions/cache@v4
        with:
          path: Datasets
          key: pl-dataset

.github/workflows/code-checks.yml (+1 -1)

@@ -34,7 +34,7 @@ jobs:
          python-version: "3.10.6"

      - name: Mypy cache
-       uses: actions/cache@v3
+       uses: actions/cache@v4
        with:
          path: .mypy_cache
          key: mypy-${{ hashFiles('requirements/typing.txt') }}

.github/workflows/docs-build.yml (+1 -1)

@@ -80,7 +80,7 @@ jobs:
          pip install lai-sphinx-theme -U -f ${PYPI_LOCAL_DIR}

      - name: pip wheels cache
-       uses: actions/cache/restore@v3
+       uses: actions/cache/restore@v4
        with:
          path: ${{ env.PYPI_CACHE_DIR }}
          key: pypi_wheels

.pre-commit-config.yaml (+3 -1)

@@ -84,10 +84,12 @@ repos:
          - flake8-return

  - repo: https://github.com/astral-sh/ruff-pre-commit
-   rev: "v0.1.15"
+   rev: "v0.2.0"
    hooks:
      - id: ruff
        args: ["--fix", "--preview"]
+     - id: ruff-format
+       args: ["--preview"]

  - repo: https://github.com/executablebooks/mdformat
    rev: 0.7.17

docs/source-fabric/fundamentals/convert.rst (+15)

@@ -90,6 +90,21 @@ Check out our before-and-after example for `image classification <https://github
 ----


+****************
+Optional changes
+****************
+
+Here are a few optional upgrades you can make to your code, if applicable:
+
+- Replace ``torch.save()`` and ``torch.load()`` with Fabric's :doc:`save and load methods <../guide/checkpoint/checkpoint>`.
+- Replace collective operations from ``torch.distributed`` (barrier, broadcast, etc.) with Fabric's :doc:`collective methods <../advanced/distributed_communication>`.
+- Use Fabric's :doc:`no_backward_sync() context manager <../advanced/gradient_accumulation>` if you implemented gradient accumulation.
+- Initialize your model under the :doc:`init_module() <../advanced/model_init>` context manager.
+
+
+----
+
+
 **********
 Next steps
 **********
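The "Optional changes" list added above maps each suggestion to an existing Fabric API. As a rough illustration only (not part of the diff), a minimal loop using those calls might look like the sketch below; the model, optimizer, data, and file names are placeholder stand-ins.

```python
# Minimal sketch of the optional Fabric upgrades listed above; all names here
# are illustrative placeholders, not taken from the Lightning documentation.
import torch
from lightning.fabric import Fabric

fabric = Fabric(accelerator="cpu", devices=1)
fabric.launch()

# init_module(): instantiate the model directly on the target device/dtype
with fabric.init_module():
    model = torch.nn.Linear(32, 2)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model, optimizer = fabric.setup(model, optimizer)

batches = [torch.randn(8, 32) for _ in range(8)]  # placeholder data
for step, batch in enumerate(batches):
    accumulating = (step + 1) % 4 != 0
    # no_backward_sync(): skip gradient synchronization on accumulation steps
    with fabric.no_backward_sync(model, enabled=accumulating):
        loss = model(batch).sum()
        fabric.backward(loss)  # replaces loss.backward()
    if not accumulating:
        optimizer.step()
        optimizer.zero_grad()

# Fabric collectives and checkpointing replace torch.distributed / torch.save
fabric.barrier()
fabric.save("checkpoint.ckpt", {"model": model, "optimizer": optimizer})
```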

docs/source-pytorch/cli/lightning_cli_advanced_3.rst (+10)

@@ -197,6 +197,7 @@ Since the init parameters of the model have as a type hint a class, in the confi
             decoder: Instance of a module for decoding
         """
         super().__init__()
+        self.save_hyperparameters()
         self.encoder = encoder
         self.decoder = decoder

@@ -216,6 +217,13 @@ If the CLI is implemented as ``LightningCLI(MyMainModel)`` the configuration wou

 It is also possible to combine ``subclass_mode_model=True`` and submodules, thereby having two levels of ``class_path``.

+.. tip::
+
+    By having ``self.save_hyperparameters()`` it becomes possible to load the model from a checkpoint. Simply do
+    ``ModelClass.load_from_checkpoint("path/to/checkpoint.ckpt")``. In the case of using ``subclass_mode_model=True``,
+    then load it like ``LightningModule.load_from_checkpoint("path/to/checkpoint.ckpt")``. ``save_hyperparameters`` is
+    optional and can be safely removed if there is no need to load from a checkpoint.
+

 Fixed optimizer and scheduler
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -279,6 +287,7 @@ An example of a model that uses two optimizers is the following:
     class MyModel(LightningModule):
         def __init__(self, optimizer1: OptimizerCallable, optimizer2: OptimizerCallable):
             super().__init__()
+            self.save_hyperparameters()
             self.optimizer1 = optimizer1
             self.optimizer2 = optimizer2

@@ -318,6 +327,7 @@ that uses dependency injection for an optimizer and a learning scheduler is:
             scheduler: LRSchedulerCallable = torch.optim.lr_scheduler.ConstantLR,
         ):
             super().__init__()
+            self.save_hyperparameters()
             self.optimizer = optimizer
             self.scheduler = scheduler
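To make the new tip concrete, here is a small sketch (illustrative only, not taken from the docs page) of a module that calls ``save_hyperparameters()`` and can later be restored from a checkpoint; the class name, argument, and checkpoint path are placeholders.

```python
# Illustrative sketch of the tip added above; placeholder names throughout.
import torch
from lightning.pytorch import LightningModule


class MyMainModel(LightningModule):
    def __init__(self, hidden_dim: int = 64):
        super().__init__()
        self.save_hyperparameters()  # records the init args inside checkpoints
        self.layer = torch.nn.Linear(hidden_dim, 2)


# Restoring later (requires an existing checkpoint file):
# model = MyMainModel.load_from_checkpoint("path/to/checkpoint.ckpt")
# With subclass_mode_model=True, the base class can be used instead:
# model = LightningModule.load_from_checkpoint("path/to/checkpoint.ckpt")
```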

docs/source-pytorch/conf.py (+2 -1)

@@ -92,7 +92,7 @@ def _load_py_module(name: str, location: str) -> ModuleType:
 assist_local.AssistantCLI.pull_docs_files(
     gh_user_repo="Lightning-AI/lightning-Habana",
     target_dir="docs/source-pytorch/integrations/hpu",
-    checkout="refs/tags/1.3.0",
+    checkout="refs/tags/1.4.0",
 )

 # Copy strategies docs as single pages
@@ -610,4 +610,5 @@ def package_list_from_file(file):
     "https://deepgenerativemodels.github.io/assets/slides/cs236_lecture11.pdf",
     "https://www.intel.com/content/www/us/en/products/docs/processors/what-is-a-gpu.html",
     "https://www.microsoft.com/en-us/research/blog/zero-infinity-and-deepspeed-unlocking-unprecedented-model-scale-for-deep-learning-training/",  # noqa: E501
+    "https://stackoverflow.com/questions/66640705/how-can-i-install-grpcio-on-an-apple-m1-silicon-laptop",
 ]

docs/source-pytorch/debug/debugging_basic.rst (+10 -10)

@@ -114,11 +114,11 @@ this generate a table like:

 .. code-block:: text

-      | Name  | Type        | Params
-    ----------------------------------
-    0 | net   | Sequential  | 132 K
-    1 | net.0 | Linear      | 131 K
-    2 | net.1 | BatchNorm1d | 1.0 K
+      | Name  | Type        | Params | Mode
+    -------------------------------------------
+    0 | net   | Sequential  | 132 K  | train
+    1 | net.0 | Linear      | 131 K  | train
+    2 | net.1 | BatchNorm1d | 1.0 K  | train

 To add the child modules to the summary add a :class:`~lightning.pytorch.callbacks.model_summary.ModelSummary`:

@@ -162,10 +162,10 @@ With the input array, the summary table will include the input and output layer

 .. code-block:: text

-      | Name  | Type        | Params | In sizes  | Out sizes
-    --------------------------------------------------------------
-    0 | net   | Sequential  | 132 K  | [10, 256] | [10, 512]
-    1 | net.0 | Linear      | 131 K  | [10, 256] | [10, 512]
-    2 | net.1 | BatchNorm1d | 1.0 K  | [10, 512] | [10, 512]
+      | Name  | Type        | Params | Mode  | In sizes  | Out sizes
+    ----------------------------------------------------------------------
+    0 | net   | Sequential  | 132 K  | train | [10, 256] | [10, 512]
+    1 | net.0 | Linear      | 131 K  | train | [10, 256] | [10, 512]
+    2 | net.1 | BatchNorm1d | 1.0 K  | train | [10, 512] | [10, 512]

 when you call ``.fit()`` on the Trainer. This can help you find bugs in the composition of your layers.
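For context, a rough sketch of how the summary tables shown in this diff are produced; the module and sizes below are placeholders chosen only to mirror the documented example, not part of the diff itself.

```python
# Rough sketch reproducing the summary tables above with placeholder values.
import torch
from lightning.pytorch import LightningModule, Trainer
from lightning.pytorch.callbacks import ModelSummary


class LitModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(torch.nn.Linear(256, 512), torch.nn.BatchNorm1d(512))
        # Setting example_input_array adds the "In sizes" / "Out sizes" columns
        self.example_input_array = torch.zeros(10, 256)

    def forward(self, x):
        return self.net(x)


# ModelSummary(max_depth=-1) also lists the child modules (net.0, net.1)
trainer = Trainer(callbacks=[ModelSummary(max_depth=-1)], max_epochs=1)
# trainer.fit(LitModel(), train_dataloaders=...)  # the summary prints when fit starts
```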

examples/app/dag/app.py (+3 -3)

@@ -65,9 +65,9 @@ def __init__(self, models_paths: list):
         )

         # Step 3: Create the work to train the models_paths in parallel.
-        self.dict = Dict(
-            **{model_path.split(".")[-1]: ModelWork(model_path, parallel=True) for model_path in models_paths}
-        )
+        self.dict = Dict(**{
+            model_path.split(".")[-1]: ModelWork(model_path, parallel=True) for model_path in models_paths
+        })

         # Step 4: Some element to track components progress.
         self.has_completed = False

examples/app/server/app.py (+5 -7)

@@ -20,13 +20,11 @@ def setup(self):
     def predict(self, request):
         image = base64.b64decode(request.image.encode("utf-8"))
         image = Image.open(io.BytesIO(image))
-        transforms = torchvision.transforms.Compose(
-            [
-                torchvision.transforms.Resize(224),
-                torchvision.transforms.ToTensor(),
-                torchvision.transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
-            ]
-        )
+        transforms = torchvision.transforms.Compose([
+            torchvision.transforms.Resize(224),
+            torchvision.transforms.ToTensor(),
+            torchvision.transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
+        ])
         image = transforms(image)
         image = image.to(self._device)
         prediction = self._model(image.unsqueeze(0))

examples/app/server_with_auto_scaler/app.py (+5 -7)

@@ -34,13 +34,11 @@ def setup(self):
         self._model = torchvision.models.resnet18(pretrained=True).to(self._device)

     def predict(self, requests: BatchRequestModel):
-        transforms = torchvision.transforms.Compose(
-            [
-                torchvision.transforms.Resize(224),
-                torchvision.transforms.ToTensor(),
-                torchvision.transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
-            ]
-        )
+        transforms = torchvision.transforms.Compose([
+            torchvision.transforms.Resize(224),
+            torchvision.transforms.ToTensor(),
+            torchvision.transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
+        ])
         images = []
         for request in requests.inputs:
             image = app.components.serve.types.image.Image.deserialize(request.image)

examples/fabric/dcgan/train_fabric.py (+8 -9)

@@ -4,6 +4,7 @@
 Code adapted from the official PyTorch DCGAN tutorial:
 https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html
 """
+
 import os
 import time
 from pathlib import Path
@@ -55,14 +56,12 @@ def main():
         root=dataroot,
         split="all",
         download=True,
-        transform=transforms.Compose(
-            [
-                transforms.Resize(image_size),
-                transforms.CenterCrop(image_size),
-                transforms.ToTensor(),
-                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
-            ]
-        ),
+        transform=transforms.Compose([
+            transforms.Resize(image_size),
+            transforms.CenterCrop(image_size),
+            transforms.ToTensor(),
+            transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
+        ]),
     )

     # Create the dataloader
@@ -227,7 +226,7 @@ def __init__(self):
             nn.ReLU(True),
             # state size. (ngf) x 32 x 32
             nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
-            nn.Tanh()
+            nn.Tanh(),
             # state size. (nc) x 64 x 64
         )

examples/fabric/dcgan/train_torch.py (+8 -9)

@@ -4,6 +4,7 @@
 Code adapted from the official PyTorch DCGAN tutorial:
 https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html
 """
+
 import os
 import random
 import time
@@ -55,14 +56,12 @@ def main():
         root=dataroot,
         split="all",
         download=True,
-        transform=transforms.Compose(
-            [
-                transforms.Resize(image_size),
-                transforms.CenterCrop(image_size),
-                transforms.ToTensor(),
-                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
-            ]
-        ),
+        transform=transforms.Compose([
+            transforms.Resize(image_size),
+            transforms.CenterCrop(image_size),
+            transforms.ToTensor(),
+            transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
+        ]),
     )

     # Create the dataloader
@@ -236,7 +235,7 @@ def __init__(self):
             nn.ReLU(True),
             # state size. (ngf) x 32 x 32
             nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
-            nn.Tanh()
+            nn.Tanh(),
             # state size. (nc) x 64 x 64
         )

examples/fabric/meta_learning/train_fabric.py (+1)

@@ -14,6 +14,7 @@
 Run it with:
     lightning run model train_fabric.py --accelerator=cuda --devices=2 --strategy=ddp
 """
+
 import cherry
 import learn2learn as l2l
 import torch

examples/fabric/meta_learning/train_torch.py (+1)

@@ -15,6 +15,7 @@
 Run it with:
     torchrun --nproc_per_node=2 --standalone train_torch.py
 """
+
 import os
 import random

examples/fabric/reinforcement_learning/train_fabric.py (+4 -8)

@@ -84,14 +84,10 @@ def main(args: argparse.Namespace):
     )

     # Environment setup
-    envs = gym.vector.SyncVectorEnv(
-        [
-            make_env(
-                args.env_id, args.seed + rank * args.num_envs + i, rank, args.capture_video, logger.log_dir, "train"
-            )
-            for i in range(args.num_envs)
-        ]
-    )
+    envs = gym.vector.SyncVectorEnv([
+        make_env(args.env_id, args.seed + rank * args.num_envs + i, rank, args.capture_video, logger.log_dir, "train")
+        for i in range(args.num_envs)
+    ])
     assert isinstance(envs.single_action_space, gym.spaces.Discrete), "only discrete action space is supported"

     # Define the agent and the optimizer and setup them with Fabric

examples/fabric/reinforcement_learning/train_fabric_decoupled.py (+3 -3)

@@ -59,9 +59,9 @@ def player(args, world_collective: TorchCollective, player_trainer_collective: T
     )

     # Environment setup
-    envs = gym.vector.SyncVectorEnv(
-        [make_env(args.env_id, args.seed + i, 0, args.capture_video, log_dir, "train") for i in range(args.num_envs)]
-    )
+    envs = gym.vector.SyncVectorEnv([
+        make_env(args.env_id, args.seed + i, 0, args.capture_video, log_dir, "train") for i in range(args.num_envs)
+    ])
     assert isinstance(envs.single_action_space, gym.spaces.Discrete), "only discrete action space is supported"

     # Define the agent

examples/fabric/reinforcement_learning/train_torch.py (+11 -13)

@@ -142,19 +142,17 @@ def main(args: argparse.Namespace):
     )

     # Environment setup
-    envs = gym.vector.SyncVectorEnv(
-        [
-            make_env(
-                args.env_id,
-                args.seed + global_rank * args.num_envs + i,
-                global_rank,
-                args.capture_video,
-                logger.log_dir if global_rank == 0 else None,
-                "train",
-            )
-            for i in range(args.num_envs)
-        ]
-    )
+    envs = gym.vector.SyncVectorEnv([
+        make_env(
+            args.env_id,
+            args.seed + global_rank * args.num_envs + i,
+            global_rank,
+            args.capture_video,
+            logger.log_dir if global_rank == 0 else None,
+            "train",
+        )
+        for i in range(args.num_envs)
+    ])
     assert isinstance(envs.single_action_space, gym.spaces.Discrete), "only discrete action space is supported"

     # Define the agent and the optimizer and setup them with DistributedDataParallel

examples/pytorch/basics/autoencoder.py (+1)

@@ -16,6 +16,7 @@
 To run: python autoencoder.py --trainer.max_epochs=50

 """
+
 from os import path
 from typing import Optional, Tuple
