Nice work!
Your implementation reaches a BLEU score of 22+. However, my implementation only gets a BLEU score of 0.16+ on the test dataset. Comparing it with your work, I found that changing the concatenation from torch.cat((Y_t, o_pre), dim=1) to torch.cat((o_pre, Y_t), dim=1) only reaches a BLEU score of 0.16+.
Would you mind sharing why you concatenate Y_t and o_pre in that order?
Thank you!
Thank you for your kind words!
I don't think the order of the features affects the model's performance in this case. In my implementation, suppose (Y_t, o_prev) has corresponding weights (W_y, W_o).
After training, if I change the order to (o_prev, Y_t) and also reorder the weights to (W_o, W_y), the output is the same: Y_t * W_y + o_prev * W_o = o_prev * W_o + Y_t * W_y
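The reordering argument can be checked with a small sketch. It uses a plain torch.nn.Linear as a stand-in for whatever layer consumes the concatenated input, which is an assumption and not necessarily the layer in the actual model. The sizes and the names layer_a and layer_b are hypothetical, chosen only for illustration.

```python
import torch

torch.manual_seed(0)

# Hypothetical sizes, for illustration only.
batch, embed_dim, hidden_dim, out_dim = 4, 6, 5, 3

Y_t = torch.randn(batch, embed_dim)
o_prev = torch.randn(batch, hidden_dim)

# Layer that consumes the input in the order (Y_t, o_prev).
layer_a = torch.nn.Linear(embed_dim + hidden_dim, out_dim)

# Layer that consumes the input in the order (o_prev, Y_t).
layer_b = torch.nn.Linear(hidden_dim + embed_dim, out_dim)

with torch.no_grad():
    # Split layer_a's weight into the columns that multiply Y_t and o_prev.
    W_y = layer_a.weight[:, :embed_dim]
    W_o = layer_a.weight[:, embed_dim:]
    # Reorder the columns to match the swapped input order.
    layer_b.weight.copy_(torch.cat((W_o, W_y), dim=1))
    layer_b.bias.copy_(layer_a.bias)

out_a = layer_a(torch.cat((Y_t, o_prev), dim=1))
out_b = layer_b(torch.cat((o_prev, Y_t), dim=1))

# The outputs agree up to floating-point rounding.
same = torch.allclose(out_a, out_b, atol=1e-5)
print(same)
```

This only shows that the two orders can represent the same function once the weights are permuted. It says nothing about how a model trained from scratch with the other order will behave.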
But if you use your order and train from scratch, I think your model performs differently only because that order (which affects the initial weights corresponding to (o_prev, Y_t)) doesn't work well with the default random seed.
You can try training a little longer, or set a different random seed, and tell me the BLEU score you get! 💪