Never approximate e^x in transformer models #1237

bobqianic started this conversation in Show and tell
A few days ago, I came across a method ("A Fast, Compact Approximation of the Exponential Function") that can accelerate the computation of e^x while losing no more than 6% accuracy. The reason I wanted to speed up e^x is that I found `whisper_process_logits` takes up 95% of the sampling time under the default `beam_size` and `best_of` settings, and 62% of that time is spent in the `expf()` calls used by the `softmax` computation. In my tests on an i7-12700H, the method from the paper gave a 3.3x speedup, reducing a single evaluation from 2.14 ns to 0.645 ns. However, I found that this method is not usable in Whisper: transformers are more sensitive to the error than I expected. Output quality degrades as `-bs` and `-bo` increase, and sometimes large chunks of repeated content appear. So if you want to accelerate a transformer's `softmax`, try not to sacrifice the accuracy of the `expf()` function.

Before Correction:

After Correction:
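For reference, the method in question is Schraudolph's trick of writing a scaled-and-shifted copy of x directly into the exponent bits of an IEEE-754 floating-point number. Below is a minimal single-precision sketch of the idea; the paper itself works with doubles, so the constants here (2^23/ln 2 for the scale, and a bias tuned to reduce the average error) are the commonly circulated float adaptation, not values taken from the paper:

```c
// Sketch of a Schraudolph-style fast expf() (single-precision adaptation).
// 12102203 ≈ 2^23 / ln(2) scales x into the exponent field;
// 1064866805 ≈ 127 * 2^23 minus a correction term that trades the one-sided
// bias for a smaller average relative error.
// Only valid for x in roughly [-87, 88]; outside that range the resulting
// bit pattern is meaningless.
#include <stdint.h>
#include <stdio.h>
#include <math.h>

static inline float fast_expf(float x) {
    union { float f; int32_t i; } u;
    u.i = (int32_t)(12102203.0f * x) + 1064866805;
    return u.f;
}

int main(void) {
    // Compare against expf() over the range softmax typically sees
    // (logits shifted so the maximum is 0).
    float max_rel_err = 0.0f;
    for (float x = -20.0f; x <= 0.0f; x += 0.001f) {
        float approx = fast_expf(x);
        float exact  = expf(x);
        float rel    = fabsf(approx - exact) / exact;
        if (rel > max_rel_err) max_rel_err = rel;
    }
    printf("max relative error on [-20, 0]: %.2f%%\n", max_rel_err * 100.0f);
    return 0;
}
```

Even with a tuned bias constant, the relative error stays at a few percent everywhere, and softmax renormalizes those per-token errors into shifted probabilities, which is presumably why the degradation compounds as beam search explores more candidates.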