Interesting result when applying SGD_DGP demo #2346
findoctorlin started this conversation in General · Replies: 1 comment
-
The adaptive learning rates of Adam are very much necessary for GPs. We have found this to be true even for simple exact GPs.
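A minimal, self-contained way to see this on a toy problem (a sketch, not from the thread; the data, model, and settings below are purely illustrative): fit the same exact GP once with Adam and once with plain SGD at the same learning rate and step count, and compare the final negative marginal log likelihood.

```python
import torch
import gpytorch

# Toy data: a noisy sine, just to have something to fit.
train_x = torch.linspace(0, 1, 100)
train_y = torch.sin(train_x * 6.28) + 0.1 * torch.randn(train_x.size(0))


class ToyExactGP(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )


def fit(optimizer_cls, lr=0.01, n_iter=200):
    """Train the toy exact GP with the given optimizer and return the final loss."""
    likelihood = gpytorch.likelihoods.GaussianLikelihood()
    model = ToyExactGP(train_x, train_y, likelihood)
    model.train()
    likelihood.train()
    # model.parameters() already includes the likelihood's noise parameter.
    optimizer = optimizer_cls(model.parameters(), lr=lr)
    mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
    for _ in range(n_iter):
        optimizer.zero_grad()
        loss = -mll(model(train_x), train_y)  # negative marginal log likelihood
        loss.backward()
        optimizer.step()
    return loss.item()


# Same learning rate and number of steps; only the optimizer differs.
print("Adam final loss:", fit(torch.optim.Adam))
print("SGD  final loss:", fit(torch.optim.SGD))
```

With a single global step size, SGD has to serve raw hyperparameters whose gradients live on very different scales, which is where Adam's per-parameter scaling tends to help.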
-
Hi everyone,
I was checking the DGP example code from https://docs.gpytorch.ai/en/stable/examples/05_Deep_Gaussian_Processes/Deep_Gaussian_Processes.html
In the original code the number of epochs is 20 and the learning rate is 0.01 for the Adam optimizer, and the result visualization looks pretty good:

Training set: loss (negative log likelihood): -0.6341
Test set: RMSE: 0.09876643121242523, NLL: -0.6846765875816345
But when I use SGD with the same number of epochs and learning rate (it doesn't necessarily make sense to keep them the same, but the RMSE and NLL come out similar to the training with Adam), the result visualization looks like this:

Training set: loss (negative log likelihood): 0.1388
Test set: RMSE: 0.2517728805541992, NLL: 0.06959884613752365
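Roughly, the only change against the tutorial's training loop is the optimizer line; here is a sketch (assuming `model`, `train_loader`, and `train_x` are defined as in the linked notebook):

```python
import torch
from gpytorch.mlls import DeepApproximateMLL, VariationalELBO

# `model` (the DeepGP), `train_x`, and `train_loader` come from the linked
# tutorial; only the optimizer is swapped, with the same learning rate.
# optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  # tutorial setting
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)      # my change

mll = DeepApproximateMLL(VariationalELBO(model.likelihood, model, train_x.shape[-2]))

num_epochs = 20
for epoch in range(num_epochs):
    for x_batch, y_batch in train_loader:
        optimizer.zero_grad()
        output = model(x_batch)
        loss = -mll(output, y_batch)
        loss.backward()
        optimizer.step()
```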
It looks like with SGD the model can statistically converge, but the prediction is a flat line, which suggests the DGP model is underfitting. Yet with Adam and with SGD the loss on the training set, the RMSE on the test set, and the NLL on the test set all look quite similar. I wonder why this happens with SGD?
Discussion is welcome!