This repository was archived by the owner on Jul 6, 2021. It is now read-only.
This repository is really remarkable! Recently I have also been trying to re-implement the pointer-generator and a transformer-based summarization model, and there are some issues I would like to discuss with you. I am training on CNN/DM with the transformer ("Attention Is All You Need" version):
I set batch_size=16 and lr=0.15 with Adagrad, just as in the pointer-generator setup. After training for 27 epochs, the training loss is 3.96 and the dev loss is 3.95. Is this normal? The loss is decreasing really slowly.
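For reference, this is roughly how I configure the optimizer (a minimal PyTorch sketch; `model` is just a placeholder transformer, and the accumulator value is the one from the pointer-generator paper, not code from this repo):

```python
import torch

# Placeholder transformer; my actual model follows "Attention Is All You Need".
model = torch.nn.Transformer(d_model=512, nhead=8,
                             num_encoder_layers=6, num_decoder_layers=6)

optimizer = torch.optim.Adagrad(
    model.parameters(),
    lr=0.15,                        # same lr as the pointer-generator setup
    initial_accumulator_value=0.1,  # value used by See et al. (2017); default is 0.0
)
```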
For decoding, which strategy do you think works better: top-k/top-p sampling or beam search?
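Just to make sure we mean the same thing by top-k/top-p, here is a rough sketch of the filtering step I have in mind (standard nucleus-sampling-style masking, not code from this repo):

```python
import torch

def top_k_top_p_filter(logits, top_k=50, top_p=0.9):
    """Mask logits outside the top-k / nucleus (top-p) set before sampling."""
    if top_k > 0:
        kth_value = torch.topk(logits, top_k).values[..., -1, None]
        logits = logits.masked_fill(logits < kth_value, float("-inf"))
    if top_p < 1.0:
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        cum_probs = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)
        remove = cum_probs > top_p
        remove[..., 1:] = remove[..., :-1].clone()  # keep the token that crosses top_p
        remove[..., 0] = False                      # always keep the most likely token
        remove = remove.scatter(-1, sorted_idx, remove)
        logits = logits.masked_fill(remove, float("-inf"))
    return logits

# Usage: sample the next token from the filtered distribution, e.g.
# next_token = torch.multinomial(torch.softmax(top_k_top_p_filter(step_logits), -1), 1)
```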
Are you using label smoothing in the loss function? I find that a lot of similar code uses label smoothing.
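What I mean by label smoothing is something along these lines (a rough sketch; the padding index is just an assumption, and recent PyTorch also exposes a `label_smoothing` argument on `nn.CrossEntropyLoss`):

```python
import torch

def label_smoothed_nll_loss(log_probs, target, eps=0.1, ignore_index=0):
    """(1 - eps) weight on the gold token, eps spread uniformly over the vocabulary."""
    nll = -log_probs.gather(-1, target.unsqueeze(-1)).squeeze(-1)
    smooth = -log_probs.mean(dim=-1)          # uniform prior over the vocabulary
    loss = (1.0 - eps) * nll + eps * smooth
    pad_mask = target.eq(ignore_index)        # assumed padding id; adjust to your vocab
    loss = loss.masked_fill(pad_mask, 0.0)
    return loss.sum() / (~pad_mask).sum()     # mean over non-padding tokens
```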