Multiple redundant calls to generate_example() when using multiple GPUs
Bug description
Issue Description
When training with multiple devices using Fabric, the generate_example() function is redundantly called on every GPU/rank, leading to inefficient resource utilization. Each rank performs the identical computation, yet only the output from rank 0 is actually displayed through fabric.print(). This causes significant delays when using many devices.

Reproduction Steps
Run litgpt/finetune/lora.py (lines 368-736) with additional logging; a sketch of the kind of logging used is shown below.
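The exact logging added for the reproduction is not included in this report. A minimal sketch of what it could look like, using Fabric's real global_rank attribute (the generate_example() argument list is assumed from the surrounding call site):

```python
# Hypothetical logging inserted just before the call site in
# litgpt/finetune/lora.py. Plain print() runs on every rank (unlike
# fabric.print()), so each GPU announces itself and the redundant
# calls become visible in the logs.
print(f"[Rank {fabric.global_rank}] Generating example...")
generate_example(fabric, model, tokenizer, eval, data)
```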
Evidence
The logs show multiple "[Rank N] Generating example..." messages followed by each rank performing the same operations:
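An illustrative reconstruction with four GPUs (the actual excerpt is not preserved in this report; the message format is the one quoted above):

```
[Rank 0] Generating example...
[Rank 1] Generating example...
[Rank 3] Generating example...
[Rank 2] Generating example...
```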
Looking at the implementation, each GPU generates the same example independently, but ultimately only rank 0's output is displayed in the logs via fabric.print() in generate_example().
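This is standard Fabric behavior: fabric.print() emits only on global rank 0, so the duplicated work never shows up in the output. A condensed, self-contained illustration of the pattern (not the actual litgpt source):

```python
# Every rank executes the expensive step, but fabric.print() writes only
# from global rank 0, hiding the redundancy.
import lightning as L

fabric = L.Fabric(accelerator="cpu", devices=4)  # 4 processes stand in for 4 GPUs
fabric.launch()
result = f"expensive generation on rank {fabric.global_rank}"  # stand-in for generate()
fabric.print(result)  # printed once, from rank 0 only
```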
Expected Behavior
Only rank 0 should generate the example, or the work should be properly distributed across devices with results gathered at rank 0. The current implementation wastes GPU resources by redundantly performing the same computation across all devices.
Suggested Fix
Modify the code to call generate_example() only on rank 0, or implement proper distribution of this workload; a sketch of the rank-0 guard follows.
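A minimal sketch of the guard, assuming generate_example() takes (fabric, model, tokenizer, eval, data) as at the existing call site; fabric.global_rank and fabric.barrier() are standard Lightning Fabric APIs:

```python
# Sketch only: run the expensive generation on a single rank.
if fabric.global_rank == 0:
    generate_example(fabric, model, tokenizer, eval, data)
fabric.barrier()  # keep the other ranks in sync while rank 0 generates
```

One caveat: if the model is sharded across devices (e.g., FSDP), the forward pass inside generate_example() itself involves collectives on every rank, and a rank-0 guard would deadlock. In that case all ranks still need to enter the function, and the savings would have to come from distributing different prompts across ranks instead.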
What operating system are you using?
Linux
LitGPT Version