Proper way to log things when using DDP #6501

Hi all,
Sorry we did not get back to you sooner. Let me try to answer some of your questions:

  1. Is validation_epoch_end only called on rank 0?

No, it is called by all processes.
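To make the consequence concrete, here is a rough plain-Python simulation (no GPUs, no Lightning) of a hook that runs once per process. The `WORLD_SIZE` constant, the shard values, and the `log_lines` list are all made up for illustration, and the function is only a stand-in for the real hook. The point is that any side effect you want exactly once (printing, writing a file) needs a rank check. In Lightning the analogous check is typically `self.trainer.is_global_zero`.

```python
# Plain-Python simulation of a hook that is called on every DDP process.
WORLD_SIZE = 4  # hypothetical number of processes


def validation_epoch_end(rank, outputs, log_lines):
    """Stand-in for the real hook; it runs once per process."""
    epoch_loss = sum(outputs) / len(outputs)
    if rank == 0:  # guard side effects so they happen only once
        log_lines.append(f"val_loss={epoch_loss:.3f}")
    return epoch_loss


calls = []
log_lines = []
for rank in range(WORLD_SIZE):
    # each rank only sees its own shard of the validation outputs
    shard = [0.1 * (rank + 1), 0.2 * (rank + 1)]
    calls.append(validation_epoch_end(rank, shard, log_lines))

print(len(calls))      # the hook ran on every rank
print(len(log_lines))  # only rank 0 wrote a log line
```

Note that the line written by rank 0 only reflects rank 0's shard, which is why a cross-process reduction is needed if you want the value for the whole validation set.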

  2. What does the sync_dist flag do?

Here is the essential code:
https://github.com/PyTorchLightning/pytorch-lightning/blob/a72a7992a283f2eb5183d129a8cf6466903f1dc8/pytorch_lightning/core/step_result.py#L108-L115
If sync_dist=True, it will by default call the sync_ddp function, which sums the value across all processes using torch.distributed.all_reduce:
https://github.com/PyTorchLightning/pytorch-lightning/blob/a72a7992a283f2eb5183d129a8cf6466903f1dc8/pytorch_lightning/utilities/distributed.py#L120
Use this …
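For anyone unsure what a sum all-reduce does, here is a small plain-Python sketch of the semantics. It is not Lightning's sync_ddp code and it does not call torch at all: `all_reduce_sum` is a hypothetical helper and the per-rank values are invented. In a real job the equivalent call would be `torch.distributed.all_reduce(tensor, op=torch.distributed.ReduceOp.SUM)`, which updates the tensor in place on every process.

```python
def all_reduce_sum(per_rank_values):
    """Simulate a sum all-reduce: every rank ends up with the same total."""
    total = sum(per_rank_values)
    return [total for _ in per_rank_values]


# hypothetical local values, one per process
per_rank_loss = [1.0, 2.0, 3.0, 6.0]

summed = all_reduce_sum(per_rank_loss)
# dividing the summed value by the world size turns it into a mean
mean = [value / len(per_rank_loss) for value in summed]

print(summed)  # [12.0, 12.0, 12.0, 12.0]
print(mean)    # [3.0, 3.0, 3.0, 3.0]
```

After the reduction every process holds the same number, so it no longer matters which rank does the actual logging.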

Answer selected by jandonov