
Commit f7aad69

Merge branch 'master' into feature/trainer-compile-fn

2 parents fc59439 + 4eda5a0

35 files changed (+50 -82 lines)

.github/CONTRIBUTING.md (+1 -1)

@@ -189,7 +189,7 @@ We welcome any useful contribution! For your convenience here's a recommended wo
 #### How can I help/contribute?

 All types of contributions are welcome - reporting bugs, fixing documentation, adding test cases, solving issues, and preparing bug fixes.
-To get started with code contributions, look for issues marked with the label [good first issue](https://github.com/Lightning-AI/lightning/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22) or chose something close to your domain with the label [help wanted](https://github.com/Lightning-AI/lightning/issues?q=is%3Aopen+is%3Aissue+label%3A%22help+wanted%22). Before coding, make sure that the issue description is clear and comment on the issue so that we can assign it to you (or simply self-assign if you can).
+To get started with code contributions, look for issues marked with the label [good first issue](https://github.com/Lightning-AI/pytorch-lightning/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22) or chose something close to your domain with the label [help wanted](https://github.com/Lightning-AI/pytorch-lightning/issues?q=is%3Aopen+is%3Aissue+label%3A%22help+wanted%22). Before coding, make sure that the issue description is clear and comment on the issue so that we can assign it to you (or simply self-assign if you can).

 #### Is there a recommendation for branch names?

docs/source-fabric/_templates/theme_variables.jinja (+1 -1)

@@ -1,6 +1,6 @@
 {%- set external_urls = {
 'github': 'https://github.com/Lightning-AI/lightning',
-'github_issues': 'https://github.com/Lightning-AI/lightning/issues',
+'github_issues': 'https://github.com/Lightning-AI/pytorch-lightning/issues',
 'contributing': 'https://github.com/Lightning-AI/lightning/blob/master/.github/CONTRIBUTING.md',
 'governance': 'https://lightning.ai/docs/pytorch/latest/community/governance.html',
 'docs': 'https://lightning.ai/docs/fabric/',

docs/source-fabric/links.rst (+1 -1)

@@ -1,3 +1,3 @@
-.. _PyTorchJob: https://www.kubeflow.org/docs/components/training/pytorch/
+.. _PyTorchJob: https://www.kubeflow.org/docs/components/trainer/legacy-v1/user-guides/pytorch/
 .. _Kubeflow: https://www.kubeflow.org
 .. _Trainer: https://lightning.ai/docs/pytorch/stable/common/trainer.html

docs/source-pytorch/_templates/theme_variables.jinja (+1 -1)

@@ -1,6 +1,6 @@
 {%- set external_urls = {
 'github': 'https://github.com/Lightning-AI/lightning',
-'github_issues': 'https://github.com/Lightning-AI/lightning/issues',
+'github_issues': 'https://github.com/Lightning-AI/pytorch-lightning/issues',
 'contributing': 'https://github.com/Lightning-AI/lightning/blob/master/.github/CONTRIBUTING.md',
 'governance': 'https://lightning.ai/docs/pytorch/latest/community/governance.html',
 'docs': 'https://lightning.ai/docs/pytorch/latest/',

docs/source-pytorch/accelerators/accelerator_prepare.rst (+1 -1)

@@ -123,7 +123,7 @@ It is possible to perform some computation manually and log the reduced result o

 # When you call `self.log` only on rank 0, don't forget to add
 # `rank_zero_only=True` to avoid deadlocks on synchronization.
-# Caveat: monitoring this is unimplemented, see https://github.com/Lightning-AI/lightning/issues/15852
+# Caveat: monitoring this is unimplemented, see https://github.com/Lightning-AI/pytorch-lightning/issues/15852
 if self.trainer.is_global_zero:
     self.log("my_reduced_metric", mean, rank_zero_only=True)
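
In context, the pattern this docs snippet documents looks roughly like the following (a minimal sketch; the module name, metric name, and the `all_gather` reduction are illustrative, not part of this diff):

```python
from lightning.pytorch import LightningModule


class RankZeroLoggingModel(LightningModule):  # hypothetical module for illustration
    def validation_step(self, batch, batch_idx):
        local_value = batch.float().mean()
        # Reduce across all processes first, so every rank participates in the sync.
        mean = self.all_gather(local_value).mean()
        # Then log on rank 0 only; `rank_zero_only=True` avoids the deadlock
        # the comment above warns about.
        if self.trainer.is_global_zero:
            self.log("my_reduced_metric", mean, rank_zero_only=True)
```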

docs/source-pytorch/accelerators/gpu_intermediate.rst (-4)

@@ -25,10 +25,6 @@ Lightning supports multiple ways of doing distributed training.
 .. note::
     If you request multiple GPUs or nodes without setting a strategy, DDP will be automatically used.

-For a deeper understanding of what Lightning is doing, feel free to read this
-`guide <https://towardsdatascience.com/9-tips-for-training-lightning-fast-neural-networks-in-pytorch-8e63a502f565>`_.
-
-
 ----

docs/source-pytorch/advanced/ddp_optimizations.rst (+1 -1)

@@ -58,7 +58,7 @@ On a Multi-Node Cluster, Set NCCL Parameters
 ********************************************

 `NCCL <https://developer.nvidia.com/nccl>`__ is the NVIDIA Collective Communications Library that is used by PyTorch to handle communication across nodes and GPUs.
-There are reported benefits in terms of speedups when adjusting NCCL parameters as seen in this `issue <https://github.com/Lightning-AI/lightning/issues/7179>`__.
+There are reported benefits in terms of speedups when adjusting NCCL parameters as seen in this `issue <https://github.com/Lightning-AI/pytorch-lightning/issues/7179>`__.
 In the issue, we see a 30% speed improvement when training the Transformer XLM-RoBERTa and a 15% improvement in training with Detectron2.
 NCCL parameters can be adjusted via environment variables.
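
The tuning the docs describe amounts to exporting NCCL environment variables before the distributed run starts. A sketch (the values are illustrative and hardware-dependent, not from this diff):

```python
import os

# Must be set before torch.distributed initializes NCCL.
# The linked issue reports gains from tuning these per cluster and NIC.
os.environ["NCCL_NSOCKS_PERTHREAD"] = "4"
os.environ["NCCL_SOCKET_NTHREADS"] = "2"
```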

docs/source-pytorch/advanced/model_parallel/deepspeed.rst (+1 -1)

@@ -319,7 +319,7 @@ Additionally, DeepSpeed supports offloading to NVMe drives for even larger model
     )
     trainer.fit(model)

-When offloading to NVMe you may notice that the speed is slow. There are parameters that need to be tuned based on the drives that you are using. Running the `aio_bench_perf_sweep.py <https://github.com/microsoft/DeepSpeed/blob/master/csrc/aio/py_test/aio_bench_perf_sweep.py>`__ script can help you to find optimum parameters. See the `issue <https://github.com/microsoft/DeepSpeed/issues/998>`__ for more information on how to parse the information.
+When offloading to NVMe you may notice that the speed is slow. There are parameters that need to be tuned based on the drives that you are using. Running the `aio_bench_perf_sweep.py <https://github.com/microsoft/DeepSpeed/blob/master/csrc/aio/py_test/aio_bench_perf_sweep.py>`__ script can help you to find optimum parameters. See the `issue <https://github.com/deepspeedai/DeepSpeed/issues/998>`__ for more information on how to parse the information.

 .. _deepspeed-activation-checkpointing:
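
The NVMe-offload setup this paragraph refers to looks roughly like the following sketch (the `nvme_path`, device count, and precision are illustrative; parameter names follow Lightning's `DeepSpeedStrategy` as documented in this page):

```python
from lightning.pytorch import Trainer
from lightning.pytorch.strategies import DeepSpeedStrategy

# Stage-3 offload to NVMe; `nvme_path` should point at a fast local NVMe mount.
trainer = Trainer(
    accelerator="gpu",
    devices=4,
    strategy=DeepSpeedStrategy(
        stage=3,
        offload_optimizer=True,
        offload_parameters=True,
        remote_device="nvme",
        offload_params_device="nvme",
        offload_optimizer_device="nvme",
        nvme_path="/local_nvme",
    ),
    precision="16-mixed",
)
```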

docs/source-pytorch/data/alternatives.rst (+1 -1)

@@ -90,7 +90,7 @@ the desired GPU in your pipeline. When moving data to a specific device, you can
 WebDataset
 ^^^^^^^^^^

-The `WebDataset <https://webdataset.github.io/webdataset>`__ makes it easy to write I/O pipelines for large datasets.
+The `WebDataset <https://github.com/webdataset/webdataset>`__ makes it easy to write I/O pipelines for large datasets.
 Datasets can be stored locally or in the cloud. ``WebDataset`` is just an instance of a standard IterableDataset.
 The webdataset library contains a small wrapper (``WebLoader``) that adds a fluid interface to the DataLoader (and is otherwise identical).
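
As a quick illustration of that fluid interface (a sketch; the shard URL pattern and sample keys are made up):

```python
import webdataset as wds

# Stream samples from sharded tar files; the URL below is illustrative.
url = "https://example.com/shards/train-{000000..000099}.tar"
dataset = wds.WebDataset(url).decode("pil").to_tuple("jpg", "cls")

# WebLoader wraps the standard DataLoader with the same chainable style.
loader = wds.WebLoader(dataset, batch_size=64, num_workers=4)
```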

docs/source-pytorch/data/iterables.rst (+1 -1)

@@ -50,7 +50,7 @@ To choose a different mode, you can use the :class:`~lightning.pytorch.utilities


 Currently, the ``trainer.predict`` method only supports the ``"sequential"`` mode, while ``trainer.fit`` method does not support it.
-Support for this feature is tracked in this `issue <https://github.com/Lightning-AI/lightning/issues/16830>`__.
+Support for this feature is tracked in this `issue <https://github.com/Lightning-AI/pytorch-lightning/issues/16830>`__.

 Note that when using the ``"sequential"`` mode, you need to add an additional argument ``dataloader_idx`` to some specific hooks.
 Lightning will `raise an error <https://github.com/Lightning-AI/lightning/pull/16837>`__ informing you of this requirement.
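
Concretely, the required signature change looks something like this (a minimal sketch; the module name and hook body are illustrative):

```python
from lightning.pytorch import LightningModule


class SequentialModeModel(LightningModule):  # hypothetical module for illustration
    def test_step(self, batch, batch_idx, dataloader_idx=0):
        # With multiple iterables in "sequential" mode, hooks like this one must
        # accept `dataloader_idx`; otherwise Lightning raises the error linked above.
        metric = batch.float().mean()  # illustrative computation
        self.log(f"metric/dataloader_{dataloader_idx}", metric)
```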

docs/source-pytorch/links.rst (+1 -1)

@@ -1,2 +1,2 @@
-.. _PyTorchJob: https://www.kubeflow.org/docs/components/training/pytorch/
+.. _PyTorchJob: https://www.kubeflow.org/docs/components/trainer/legacy-v1/user-guides/pytorch/
 .. _Kubeflow: https://www.kubeflow.org

docs/source-pytorch/versioning.rst (+2 -2)

@@ -61,8 +61,8 @@ For API removal, renaming or other forms of backwards-incompatible changes, the
 #. From that version onward, the deprecation warning gets converted into a helpful error, which will remain until next major release.

 This policy is not strict. Shorter or longer deprecation cycles may apply to some cases.
-For example, in the past DDP2 was removed without a deprecation process because the feature was broken and unusable beyond fixing as discussed in `#12584 <https://github.com/Lightning-AI/lightning/issues/12584>`_.
-Also, `#10410 <https://github.com/Lightning-AI/lightning/issues/10410>`_ is an example that a longer deprecation applied to. We deprecated the accelerator arguments, such as ``Trainer(gpus=...)``, in 1.7, however, because the APIs were so core that they would impact almost all use cases, we decided not to introduce the breaking change until 2.0.
+For example, in the past DDP2 was removed without a deprecation process because the feature was broken and unusable beyond fixing as discussed in `#12584 <https://github.com/Lightning-AI/pytorch-lightning/issues/12584>`_.
+Also, `#10410 <https://github.com/Lightning-AI/pytorch-lightning/issues/10410>`_ is an example that a longer deprecation applied to. We deprecated the accelerator arguments, such as ``Trainer(gpus=...)``, in 1.7, however, because the APIs were so core that they would impact almost all use cases, we decided not to introduce the breaking change until 2.0.

 Compatibility matrix
 ********************

examples/fabric/reinforcement_learning/train_fabric_decoupled.py (+1 -1)

@@ -274,7 +274,7 @@ def trainer(
     if group_rank == 0:
         metrics = {}

-    # Lerning rate annealing
+    # Learning rate annealing
     if args.anneal_lr:
         linear_annealing(optimizer, update, num_updates, args.learning_rate)
     if group_rank == 0:
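
For reference, a `linear_annealing` helper like the one called here is typically implemented along these lines (a sketch reconstructed from the call site, not taken from this diff):

```python
from torch.optim import Optimizer


def linear_annealing(optimizer: Optimizer, update: int, num_updates: int, initial_lr: float) -> None:
    # Decay the learning rate linearly from `initial_lr` toward 0 over all updates.
    frac = 1.0 - (update - 1.0) / num_updates
    for param_group in optimizer.param_groups:
        param_group["lr"] = frac * initial_lr
```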

src/lightning/__setup__.py (+1 -2)

@@ -98,14 +98,13 @@ def _setup_args() -> dict[str, Any]:
         "entry_points": {
             "console_scripts": [
                 "fabric = lightning.fabric.cli:_main",
-                "lightning = lightning.fabric.cli:_legacy_main",
             ],
         },
         "setup_requires": [],
         "install_requires": install_requires,
         "extras_require": _prepare_extras(),
         "project_urls": {
-            "Bug Tracker": "https://github.com/Lightning-AI/lightning/issues",
+            "Bug Tracker": "https://github.com/Lightning-AI/pytorch-lightning/issues",
             "Documentation": "https://lightning.ai/lightning-docs",
             "Source Code": "https://github.com/Lightning-AI/lightning",
         },

src/lightning/fabric/CHANGELOG.md (+7 -1)

@@ -4,6 +4,12 @@ All notable changes to this project will be documented in this file.

 The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

+## [unreleased] - YYYY-MM-DD
+
+### Removed
+
+- Removed legacy support for `lightning run model`. Use `fabric run` instead. ([#20588](https://github.com/Lightning-AI/pytorch-lightning/pull/20588))
+
 ## [2.5.0] - 2024-12-19

 ### Added

@@ -331,7 +337,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 ### Fixed

 - Fixed computing the next version folder in `CSVLogger` ([#17139](https://github.com/Lightning-AI/lightning/pull/17139))
-- Fixed inconsistent settings for FSDP Precision ([#17670](https://github.com/Lightning-AI/lightning/issues/17670))
+- Fixed inconsistent settings for FSDP Precision ([#17670](https://github.com/Lightning-AI/pytorch-lightning/issues/17670))


 ## [2.0.2] - 2023-04-24

src/lightning/fabric/cli.py (-21)

@@ -14,8 +14,6 @@
 import logging
 import os
 import re
-import subprocess
-import sys
 from argparse import Namespace
 from typing import Any, Optional

@@ -50,25 +48,6 @@ def _get_supported_strategies() -> list[str]:
 if _CLICK_AVAILABLE:
     import click

-    def _legacy_main() -> None:
-        """Legacy CLI handler for fabric.
-
-        Raises deprecation warning and runs through fabric cli if necessary, else runs the entrypoint directly
-
-        """
-        hparams = sys.argv[1:]
-        if len(hparams) >= 2 and hparams[0] == "run" and hparams[1] == "model":
-            print(
-                "`lightning run model` is deprecated and will be removed in future versions."
-                " Please call `fabric run` instead."
-            )
-            _main()
-            return
-
-        if _LIGHTNING_SDK_AVAILABLE:
-            subprocess.run([sys.executable, "-m", "lightning_sdk.cli.entrypoint"] + hparams)
-            return
-
     @click.group()
     def _main() -> None:
         pass

src/lightning/fabric/plugins/environments/kubeflow.py (+1 -1)

@@ -28,7 +28,7 @@ class KubeflowEnvironment(ClusterEnvironment):
     This environment, unlike others, does not get auto-detected and needs to be passed to the Fabric/Trainer
     constructor manually.

-    .. _PyTorchJob: https://www.kubeflow.org/docs/components/training/pytorch/
+    .. _PyTorchJob: https://www.kubeflow.org/docs/components/trainer/legacy-v1/user-guides/pytorch/
     .. _Kubeflow: https://www.kubeflow.org

     """

src/lightning/fabric/strategies/launchers/multiprocessing.py (+1 -1)

@@ -78,7 +78,7 @@ def __init__(
     def is_interactive_compatible(self) -> bool:
         # The start method 'spawn' is not supported in interactive environments
         # The start method 'fork' is the only one supported in Jupyter environments, with constraints around CUDA
-        # initialization. For more context, see https://github.com/Lightning-AI/lightning/issues/7550
+        # initialization. For more context, see https://github.com/Lightning-AI/pytorch-lightning/issues/7550
         return self._start_method == "fork"

     @override
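
From the user side, this fork-only compatibility is what the notebook-friendly strategies rely on; roughly (a sketch, assuming the fork-based `ddp_notebook` strategy alias and an illustrative device count):

```python
from lightning.fabric import Fabric

# "ddp_notebook" uses the fork start method, the only one that works in
# Jupyter-style environments (with the CUDA-initialization caveat above).
fabric = Fabric(accelerator="cpu", devices=2, strategy="ddp_notebook")
fabric.launch()
```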

src/lightning/fabric/strategies/launchers/subprocess_script.py (+2 -2)

@@ -156,7 +156,7 @@ def _check_can_spawn_children(self) -> None:


 def _basic_subprocess_cmd() -> Sequence[str]:
-    import __main__  # local import to avoid https://github.com/Lightning-AI/lightning/issues/15218
+    import __main__  # local import to avoid https://github.com/Lightning-AI/pytorch-lightning/issues/15218

     if __main__.__spec__ is None:  # pragma: no-cover
         return [sys.executable, os.path.abspath(sys.argv[0])] + sys.argv[1:]

@@ -167,7 +167,7 @@ def _hydra_subprocess_cmd(local_rank: int) -> tuple[Sequence[str], str]:
     from hydra.core.hydra_config import HydraConfig
     from hydra.utils import get_original_cwd, to_absolute_path

-    import __main__  # local import to avoid https://github.com/Lightning-AI/lightning/issues/15218
+    import __main__  # local import to avoid https://github.com/Lightning-AI/pytorch-lightning/issues/15218

     # when user is using hydra find the absolute path
     if __main__.__spec__ is None:  # pragma: no-cover

src/lightning/pytorch/CHANGELOG.md (+7 -7)

@@ -199,27 +199,27 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Fixed handling checkpoint dirpath suffix in NeptuneLogger ([#18863](https://github.com/Lightning-AI/lightning/pull/18863))
 - Fixed an edge case where `ModelCheckpoint` would alternate between versioned and unversioned filename ([#19064](https://github.com/Lightning-AI/lightning/pull/19064))
 - Fixed broadcast at initialization in `MPIEnvironment` ([#19074](https://github.com/Lightning-AI/lightning/pull/19074))
-- Fixed the tensor conversion in `self.log` to respect the default dtype ([#19046](https://github.com/Lightning-AI/lightning/issues/19046))
+- Fixed the tensor conversion in `self.log` to respect the default dtype ([#19046](https://github.com/Lightning-AI/pytorch-lightning/issues/19046))


 ## [2.1.2] - 2023-11-15

 ### Fixed

-- Fixed an issue causing permission errors on Windows when attempting to create a symlink for the "last" checkpoint ([#18942](https://github.com/Lightning-AI/lightning/issues/18942))
-- Fixed an issue where Metric instances from `torchmetrics` wouldn't get moved to the device when using FSDP ([#18954](https://github.com/Lightning-AI/lightning/issues/18954))
-- Fixed an issue preventing the user to `Trainer.save_checkpoint()` an FSDP model when `Trainer.test/validate/predict()` ran after `Trainer.fit()` ([#18992](https://github.com/Lightning-AI/lightning/issues/18992))
+- Fixed an issue causing permission errors on Windows when attempting to create a symlink for the "last" checkpoint ([#18942](https://github.com/Lightning-AI/pytorch-lightning/issues/18942))
+- Fixed an issue where Metric instances from `torchmetrics` wouldn't get moved to the device when using FSDP ([#18954](https://github.com/Lightning-AI/pytorch-lightning/issues/18954))
+- Fixed an issue preventing the user to `Trainer.save_checkpoint()` an FSDP model when `Trainer.test/validate/predict()` ran after `Trainer.fit()` ([#18992](https://github.com/Lightning-AI/pytorch-lightning/issues/18992))


 ## [2.1.1] - 2023-11-06

 ### Fixed

 - Fixed an issue when replacing an existing `last.ckpt` file with a symlink ([#18793](https://github.com/Lightning-AI/lightning/pull/18793))
-- Fixed an issue when `BatchSizeFinder` `steps_per_trial` parameter ends up defining how many validation batches to run during the entire training ([#18394](https://github.com/Lightning-AI/lightning/issues/18394))
-- Fixed an issue saving the `last.ckpt` file when using `ModelCheckpoint` on a remote filesystem and no logger is used ([#18867](https://github.com/Lightning-AI/lightning/issues/18867))
+- Fixed an issue when `BatchSizeFinder` `steps_per_trial` parameter ends up defining how many validation batches to run during the entire training ([#18394](https://github.com/Lightning-AI/pytorch-lightning/issues/18394))
+- Fixed an issue saving the `last.ckpt` file when using `ModelCheckpoint` on a remote filesystem and no logger is used ([#18867](https://github.com/Lightning-AI/pytorch-lightning/issues/18867))
 - Refined the FSDP saving logic and error messaging when path exists ([#18884](https://github.com/Lightning-AI/lightning/pull/18884))
-- Fixed an issue parsing the version from folders that don't include a version number in `TensorBoardLogger` and `CSVLogger` ([#18897](https://github.com/Lightning-AI/lightning/issues/18897))
+- Fixed an issue parsing the version from folders that don't include a version number in `TensorBoardLogger` and `CSVLogger` ([#18897](https://github.com/Lightning-AI/pytorch-lightning/issues/18897))


 ## [2.1.0] - 2023-10-11

src/lightning/pytorch/callbacks/stochastic_weight_avg.py (+1 -1)

@@ -354,7 +354,7 @@ def _clear_schedulers(trainer: "pl.Trainer") -> None:
         # Note that this relies on the callback state being restored before the scheduler state is
         # restored, and doesn't work if restore_checkpoint_after_setup is True, but at the time of
         # writing that is only True for deepspeed which is already not supported by SWA.
-        # See https://github.com/Lightning-AI/lightning/issues/11665 for background.
+        # See https://github.com/Lightning-AI/pytorch-lightning/issues/11665 for background.
         if trainer.lr_scheduler_configs:
             assert len(trainer.lr_scheduler_configs) == 1
             trainer.lr_scheduler_configs.clear()

src/lightning/pytorch/plugins/precision/xla.py (+1 -1)

@@ -79,7 +79,7 @@ def optimizer_step(  # type: ignore[override]
             # we lack coverage here so disable this - something to explore if there's demand
             raise MisconfigurationException(
                 "Skipping backward by returning `None` from your `training_step` is not implemented with XLA."
-                " Please, open an issue in `https://github.com/Lightning-AI/lightning/issues`"
+                " Please, open an issue in `https://github.com/Lightning-AI/pytorch-lightning/issues`"
                 " requesting this feature."
             )
         return closure_result

src/lightning/pytorch/strategies/launchers/multiprocessing.py (+2 -2)

@@ -88,7 +88,7 @@ def __init__(
     def is_interactive_compatible(self) -> bool:
         # The start method 'spawn' is not supported in interactive environments
         # The start method 'fork' is the only one supported in Jupyter environments, with constraints around CUDA
-        # initialization. For more context, see https://github.com/Lightning-AI/lightning/issues/7550
+        # initialization. For more context, see https://github.com/Lightning-AI/pytorch-lightning/issues/7550
         return self._start_method == "fork"

     @override

@@ -111,7 +111,7 @@ def launch(self, function: Callable, *args: Any, trainer: Optional["pl.Trainer"]
         if self._start_method == "spawn":
             _check_missing_main_guard()
         if self._already_fit and trainer is not None and trainer.state.fn == TrainerFn.FITTING:
-            # resolving https://github.com/Lightning-AI/lightning/issues/18775 will lift this restriction
+            # resolving https://github.com/Lightning-AI/pytorch-lightning/issues/18775 will lift this restriction
            raise NotImplementedError(
                 "Calling `trainer.fit()` twice on the same Trainer instance using a spawn-based strategy is not"
                 " supported. You can work around this limitation by creating a new Trainer instance and passing the"
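
The workaround named in this error message looks roughly like the following (a sketch using the demo `BoringModel`; trainer arguments are illustrative):

```python
from lightning.pytorch import Trainer
from lightning.pytorch.demos.boring_classes import BoringModel

model = BoringModel()

# First fit() with a spawn-based strategy works fine.
trainer = Trainer(strategy="ddp_spawn", accelerator="cpu", devices=2, max_epochs=1)
trainer.fit(model)

# Calling fit() again on the same Trainer raises NotImplementedError;
# instead, create a fresh Trainer instance for the second run.
trainer = Trainer(strategy="ddp_spawn", accelerator="cpu", devices=2, max_epochs=2)
trainer.fit(model)
```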
