In the paper, they say that they tie all the latent transformer weights. However, in this implementation, the latent transformer in the first layer is not shared with the rest.
Hi Phil,
I'd like to confirm the reason behind this design choice:
perceiver-pytorch/perceiver_pytorch/perceiver_pytorch.py, lines 194 to 210 at commit c3d505a
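For context, the weight tying in this repo is driven by a memoizing constructor (`cache_fn`) plus a per-layer `_cache` flag. Here is a condensed, runnable sketch of that pattern, with `nn.Linear` stand-ins for the real attention/feed-forward constructors (the actual code is at the lines linked above); it shows why the first layer ends up with its own weights:

```python
import torch.nn as nn

def cache_fn(f):
    # memoize a module constructor: with _cache=True, every call after the
    # first returns the same instance, i.e. those layers share weights
    cache = None
    def cached_fn(*args, _cache=True, **kwargs):
        nonlocal cache
        if not _cache:
            return f(*args, **kwargs)
        if cache is None:
            cache = f(*args, **kwargs)
        return cache
    return cached_fn

# stand-ins for the real latent attention / feed-forward constructors
get_latent_attn = cache_fn(lambda: nn.Linear(512, 512))
get_latent_ff = cache_fn(lambda: nn.Linear(512, 512))

depth, weight_tie_layers = 6, True
layers = nn.ModuleList([])
for i in range(depth):
    # layer 0 always builds fresh modules, so the first latent
    # transformer never shares weights with the rest
    should_cache = i > 0 and weight_tie_layers
    layers.append(nn.ModuleList([
        get_latent_attn(_cache=should_cache),
        get_latent_ff(_cache=should_cache),
    ]))

assert layers[0][0] is not layers[1][0]  # first layer is untied
assert layers[1][0] is layers[2][0]      # later layers are tied
```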
It should probably be something like:
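My reading of the suggestion: drop the `i > 0` guard for the latent blocks only, so the latent transformer is tied from the first layer onward while the initial cross-attention stays unshared, matching the paper. Roughly (using the repo's constructor names; details approximate):

```python
for i in range(depth):
    should_cache = i > 0 and weight_tie_layers
    self.layers.append(nn.ModuleList([
        get_cross_attn(_cache = should_cache),
        get_cross_ff(_cache = should_cache),
        # tie the latent transformer across all layers, including the first
        get_latent_attn(_cache = weight_tie_layers),
        get_latent_ff(_cache = weight_tie_layers)
    ]))
```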
What do you think?