docs/source-fabric/guide/multi_node/cloud.rst (+88 -53)
@@ -1,14 +1,13 @@
 :orphan:
 
-##########################
-Run in the Lightning Cloud
-##########################
+#############################################
+Run single or multi-node on Lightning Studios
+#############################################
 
 **Audience**: Users who don't want to waste time on cluster configuration and maintenance.
 
-
-The Lightning AI cloud is a platform where you can build, train, finetune and deploy models without worrying about infrastructure, cost management, scaling, and other technical headaches.
-In this guide, and within just 10 minutes, you will learn how to run a Fabric training script across multiple nodes in the cloud.
+`Lightning Studios <https://lightning.ai>`_ is a cloud platform where you can build, train, finetune and deploy models without worrying about infrastructure, cost management, scaling, and other technical headaches.
+This guide shows you how easy it is to run a Fabric training script across multiple machines on Lightning Studios.
 
 
 ----
@@ -19,13 +18,8 @@ Initial Setup
 *************
 
 First, create a free `Lightning AI account <https://lightning.ai/>`_.
-Then, log in from the CLI:
-
-.. code-block:: bash
-
-    lightning login
-
-A page opens in your browser where you can follow the instructions to complete the setup.
+You get free credits every month you can spend on GPU compute.
+To use machines with multiple GPUs or run jobs across machines, you need to be on the `Pro or Teams plan <https://lightning.ai/pricing>`_.
 
 
 ----
@@ -35,66 +29,107 @@ A page opens in your browser where you can follow the instructions to complete t
 Launch multi-node training in the cloud
 ***************************************
 
-**Step 1:** Put your code inside a ``lightning.app.core.work.LightningWork``:
[...]
+        for batch_idx, batch in enumerate(train_dataloader):
+            input, target = batch
+            output = model(input, target)
+            loss = F.nll_loss(output, target.view(-1))
+            fabric.backward(loss)
+            optimizer.step()
+            optimizer.zero_grad()
+
+            if batch_idx % 10 == 0:
+                fabric.print(f"iteration: {batch_idx} - loss {loss.item():.4f}")
+
+
+    if __name__ == "__main__":
+        main()
+
+|
 
-**Step 2:** Init a ``lightning.app.core.app.LightningApp`` with the ``FabricMultiNode`` component.
-Configure the number of nodes, the number of GPUs per node, and the type of GPU:
+**Step 3:** Remove hardcoded accelerator settings if any and let Lightning automatically set them for you. No other changes are required in your script.
 
 .. code-block:: python
-    :emphasize-lines: 5,7
-    :caption: app.py
 
-    # 2. Create the app with the FabricMultiNode component inside
[...]
+**Step 4:** Install dependencies and download all necessary data. Test that your script runs in the Studio first. If it runs in the Studio, it will run in multi-node!
 
-    lightning run app app.py --cloud
+|
 
-This command will upload your Python file and then opens the app admin view, where you can see the logs of what's happening.
+**Step 5:** Open the Multi-Machine Training (MMT) app. Type the command to run your script, select the machine type and how many machines you want to launch it on. Click "Run" to start the job.
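
Step 3 in the diff above asks for hardcoded accelerator settings to be removed so that Lightning can set them. The code block that follows Step 3 is not visible in this view, so the snippet below is only a rough sketch of what that step might look like in a script, not the example from the PR. ``portable_fabric_kwargs`` and ``build_fabric`` are hypothetical helper names introduced for illustration. ``accelerator``, ``devices`` and ``num_nodes`` are real ``Fabric`` constructor arguments, and the sketch assumes these are the settings Step 3 refers to.

```python
# Hypothetical helper (not part of Lightning): drop hardware settings that a
# script may have hardcoded, so they can be chosen at launch time instead.
HARDWARE_KEYS = ("accelerator", "devices", "num_nodes")


def portable_fabric_kwargs(kwargs):
    """Return a copy of ``kwargs`` without hardcoded hardware settings."""
    return {k: v for k, v in kwargs.items() if k not in HARDWARE_KEYS}


def build_fabric(**kwargs):
    """Create a Fabric object, leaving the hardware settings at their defaults."""
    import lightning as L  # imported lazily; needs the `lightning` package

    return L.Fabric(**portable_fabric_kwargs(kwargs))


if __name__ == "__main__":
    # Before: L.Fabric(accelerator="cuda", devices=8, num_nodes=2, precision="bf16-mixed")
    hardcoded = {
        "accelerator": "cuda",
        "devices": 8,
        "num_nodes": 2,
        "precision": "bf16-mixed",
    }
    # After: only the hardware-independent settings remain.
    print(portable_fabric_kwargs(hardcoded))
```

In most scripts the simpler route is to delete the three arguments by hand and call ``L.Fabric()`` with whatever hardware-independent options remain, such as ``precision``.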