Right now there is no feedback on whether the step sizes are well chosen, even though this should become apparent during optimization:
Steps that are too large lead to parameter bouncing, with large changes in the overall gradient direction. This should be visible in the relation between m and v: m is the running average of g and v of g^2, so sign flips cancel in m but not in v, and |m/sqrt(v)| ~ 0.
Steps that are too small lead to very smooth trajectories that need many iterations, with |m/sqrt(v)| ~ 1.
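As a rough illustration, a diagnostic along these lines could summarize |m|/sqrt(v) per parameter array. This is only a sketch: the function name, the eps guard, and the use of the median as the summary statistic are assumptions, not existing API.

```python
import numpy as np

def step_quality(m, v, eps=1e-8):
    """Summarize |m| / sqrt(v) for one parameter array.

    m is the running average of the gradient g and v of g^2, so the ratio
    sits near 0 when successive gradients cancel (steps too large) and
    near 1 when they keep pointing the same way (steps too small).
    """
    ratio = np.abs(m) / (np.sqrt(v) + eps)
    return np.median(ratio)
```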
We could re-introduce Parameter.converged with a Python enum of FAST, RIGHT, SLOW, so that the user can inspect which parameter step sizes should be adjusted in case of trouble with convergence. An alternative would be a helper function to that effect.
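A minimal sketch of that enum plus helper, assuming one possible mapping in which FAST flags steps that are too large (bouncing) and SLOW steps that are too small; the thresholds and the function name are placeholders, not a fixed proposal.

```python
import numpy as np
from enum import Enum

class Convergence(Enum):
    FAST = 1   # |m/sqrt(v)| ~ 0: steps too large, parameter bounces
    RIGHT = 2  # neither extreme: step size looks appropriate
    SLOW = 3   # |m/sqrt(v)| ~ 1: steps too small, smooth but slow

def classify_step(m, v, lower=0.1, upper=0.9, eps=1e-8):
    """Map the median |m| / sqrt(v) of one parameter to a Convergence flag.

    The lower/upper thresholds are placeholders and would need tuning.
    """
    ratio = np.median(np.abs(m) / (np.sqrt(v) + eps))
    if ratio < lower:
        return Convergence.FAST
    if ratio > upper:
        return Convergence.SLOW
    return Convergence.RIGHT
```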
Separate but related. We currently store m, v, vhat in Parameter, each of which has the same shape as the parameter array. When we pickle the source, they get saved so that we can restart with them. Besides the storage requirement, it is unclear whether that is the best way to restart with more sources in Blend, because the current sources have already converged. It's hard for a minor source (like a newly revealed detection) to fend for itself on equal footing. Empirically, this is still better than zeroing all of the stored gradient-related quantities.
But maybe there's a middle ground: set the new step sizes to c * step * m / sqrt(vhat), so that the first iteration after the restart moves by roughly the previous step (times some constant c, TBD), while gradients are computed from scratch for all sources.
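A minimal sketch of that restart, assuming the scalar step and the stored m, vhat of a converged parameter are available. Taking |m| assumes step sizes must be non-negative, which the proposal leaves open, and c defaults to 1 only as a placeholder.

```python
import numpy as np

def restart_step_sizes(step, m, vhat, c=1.0, eps=1e-8):
    """Per-element step sizes for a warm restart with additional sources.

    The first iteration after the restart then moves by roughly
    c * step * m / sqrt(vhat), i.e. c times the previous update,
    while m, v, vhat are re-initialized and gradients are accumulated
    from scratch for all sources.
    """
    # |m| assumes step sizes should be non-negative; sign handling is TBD.
    return c * step * np.abs(m) / (np.sqrt(vhat) + eps)
```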