confidence scores output from the LM #57
Hi, thanks for the question. As for the AM, we decided not to include confidences out of the box, since there is no unique way to calculate them. Using the frame-level annotations and averaging the probabilities, or something similar, is probably the best bet here. As for respecting the LM and hotwords, it gets a bit more complicated, since neither is really normalized in a good way, and it would probably depend heavily on the downstream task. Open to suggestions, though, if you have a strong use case.
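The frame-level averaging suggested above can be sketched roughly as follows. This is an illustrative heuristic, not part of the pyctcdecode API; the `word_frames` spans are assumed to come from the frame offsets that `decode_beams` returns.

```python
import numpy as np

def am_word_confidences(logits, word_frames):
    """Rough AM confidence per word: average the max softmax
    probability over the frames attributed to each word.

    logits: (T, V) array of raw CTC logits from the acoustic model.
    word_frames: list of (start_frame, end_frame) spans, one per word.
    """
    # softmax over the vocabulary axis
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    # confidence of the argmax label at each frame
    frame_conf = probs.max(axis=-1)
    return [float(frame_conf[s:e].mean()) for s, e in word_frames]
```

Averaging the per-frame argmax probability is only one of several plausible choices; one could also average only over non-blank frames, or use the probability of the emitted label rather than the argmax.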
Hi @gkucsko, thank you very much for your reply. I can get the confidence from the e2e AM by averaging the frame-level probabilities, as you mentioned. But with the LM, understanding the confidence with which a word is predicted could shed light on the contribution of the LM (not just perplexity) and help us decide whether a particular word is suitable for further processing in SLU tasks. If the contributions of the individual modules can be segregated at the word level, there should be a way to trace back the individual word confidences from the top beam.
I'd also be very interested in this addition! I think it should be relatively easy to additionally return the score computed in pyctcdecode/pyctcdecode/decoder.py, line 326 (commit 9071d50), which defines the lm_score + am_score probability that is given by pyctcdecode, no? What do you think @gkucsko?
Also cc @lopez86 :-)
The main problem with using the score from pyctcdecode/pyctcdecode/decoder.py, line 498 (commit 9071d50), for confidence scoring is that it is not at all normalized by length; e.g., a longer transcription will necessarily have a lower lm_score. One could normalize the score by the number of words, but I wonder whether it's better to take the minimum over the words, as described here.
Also related: https://discuss.huggingface.co/t/confidence-scores-self-training-for-wav2vec2-ctc-models-with-lm-pyctcdecode/17052
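The two normalization strategies discussed above (averaging vs taking the minimum over words) can be compared with a small sketch. The per-word log-probabilities are a hypothetical input here; pyctcdecode does not currently expose them.

```python
import math

def utterance_confidence(word_logp, mode="mean"):
    """Turn per-word log-probabilities into a length-independent
    utterance confidence in [0, 1].

    mode="mean": geometric mean over words, i.e. exp(avg log-prob),
        which normalizes the total score by the number of words.
    mode="min": weakest-link view, where the least confident word
        bounds the confidence of the whole utterance.
    """
    if mode == "mean":
        return math.exp(sum(word_logp) / len(word_logp))
    if mode == "min":
        return math.exp(min(word_logp))
    raise ValueError(f"unknown mode: {mode}")
```

The "min" variant is more conservative: a single unreliable word drags the whole score down, which is often what you want when filtering pseudo-labels for self-training.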
Is there a way to get the confidence scores (word/sub-word level) as part of the output as well?
With decode_beams, it is possible to get the time information for alignment purposes and the KenLM state, in addition to the segment-level probabilities. It would be a nice addition if word-level confidence scores were also exposed. Since these are calculated from the AM and LM (and optionally hotwords), we could do fine-grained analysis at the word level to remove or emphasize some words, as desired.
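In the meantime, the word/frame alignment that decode_beams already returns can be combined with the input log-probabilities to approximate word-level confidences outside the library. This sketch assumes a beam has the shape `(text, lm_state, text_frames, logit_score, lm_score)`, with `text_frames` a list of `(word, (start_frame, end_frame))` pairs; the per-word averaging itself is our own heuristic, not a pyctcdecode feature.

```python
import numpy as np

def word_confidences(log_probs, beam):
    """Derive per-word confidences from one beam of decode_beams output.

    log_probs: the (T, V) log-probability matrix that was fed to the
        decoder.
    beam: a (text, lm_state, text_frames, logit_score, lm_score) tuple.
    Returns a dict mapping each word to the mean per-frame argmax
    probability over its aligned frames.
    """
    _text, _state, text_frames, _logit_score, _lm_score = beam
    frame_conf = np.exp(log_probs).max(axis=-1)
    return {
        word: float(frame_conf[s:e].mean()) if e > s else 0.0
        for word, (s, e) in text_frames
    }
```

Note this only reflects the AM side of the decision; folding in the LM contribution per word would require changes inside the decoder, as discussed above.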