Interesting result when applying SGD_DGP demo #2346
findoctorlin started this conversation in General · Replies: 1 comment
-
The adaptive learning rates of Adam are very much necessary for GPs. We have found this to be true even for simple exact GPs.
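A minimal, self-contained way to see this on a toy problem (a sketch, not from the thread; the data, model, and settings below are purely illustrative): fit the same exact GP once with Adam and once with plain SGD at the same learning rate and step count, and compare the final negative marginal log likelihood.

```python
import torch
import gpytorch

# Toy data: a noisy sine, just to have something to fit.
train_x = torch.linspace(0, 1, 100)
train_y = torch.sin(train_x * 6.28) + 0.1 * torch.randn(train_x.size(0))


class ToyExactGP(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )


def fit(optimizer_cls, lr=0.01, n_iter=200):
    """Train the toy exact GP with the given optimizer and return the final loss."""
    likelihood = gpytorch.likelihoods.GaussianLikelihood()
    model = ToyExactGP(train_x, train_y, likelihood)
    model.train()
    likelihood.train()
    # model.parameters() already includes the likelihood's noise parameter.
    optimizer = optimizer_cls(model.parameters(), lr=lr)
    mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
    for _ in range(n_iter):
        optimizer.zero_grad()
        loss = -mll(model(train_x), train_y)  # negative marginal log likelihood
        loss.backward()
        optimizer.step()
    return loss.item()


# Same learning rate and number of steps; only the optimizer differs.
print("Adam final loss:", fit(torch.optim.Adam))
print("SGD  final loss:", fit(torch.optim.SGD))
```

With a single global step size, SGD has to serve raw hyperparameters whose gradients live on very different scales, which is where Adam's per-parameter scaling tends to help.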
-
Hi everyone,
I was checking the DGP example code from https://docs.gpytorch.ai/en/stable/examples/05_Deep_Gaussian_Processes/Deep_Gaussian_Processes.html
In the original code the number of epochs is 20 and the learning rate is 0.01 for the Adam optimizer, and the result visualization looks pretty good:

Training set: loss (negative log likelihood): -0.6341
Test set: RMSE: 0.09876643121242523, NLL: -0.6846765875816345
But when I use SGD with the same number of epochs and learning rate (it doesn't necessarily make sense to keep them the same, but the RMSE and NLL come out similar to the training with Adam), the result visualization looks like this:

Training set: loss (negative log likelihood): 0.1388
Test set: RMSE: 0.2517728805541992, NLL: 0.06959884613752365
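Roughly, the only change against the tutorial's training loop is the optimizer line; here is a sketch (assuming `model`, `train_loader`, and `train_x` are defined as in the linked notebook):

```python
import torch
from gpytorch.mlls import DeepApproximateMLL, VariationalELBO

# `model` (the DeepGP), `train_x`, and `train_loader` come from the linked
# tutorial; only the optimizer is swapped, with the same learning rate.
# optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  # tutorial setting
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)      # my change

mll = DeepApproximateMLL(VariationalELBO(model.likelihood, model, train_x.shape[-2]))

num_epochs = 20
for epoch in range(num_epochs):
    for x_batch, y_batch in train_loader:
        optimizer.zero_grad()
        output = model(x_batch)
        loss = -mll(output, y_batch)
        loss.backward()
        optimizer.step()
```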
It looks like with SGD the model can statistically converge, but the prediction is a flat line, which suggests the DGP model is underfitting. Yet with Adam and with SGD the loss on the training set, the RMSE on the test set, and the NLL on the test set all look quite similar. I wonder why this happens with SGD?
Discussion is welcome!