confidence scores output from the LM #57

Open
Jiltseb opened this issue Mar 16, 2022 · 5 comments
Labels
enhancement (New feature or request)

Comments

@Jiltseb

Jiltseb commented Mar 16, 2022

Is there a way to also get confidence scores (at the word/sub-word level) as output?
With decode_beams, it is possible to get the timing information for alignment purposes and the KenLM state, in addition to the segment-level probabilities. It would be a nice addition if word-level confidence scores were also exposed. Since these are calculated from the AM and LM (and optionally hotwords), we could do fine-grained analysis at the word level to remove or emphasize particular words as desired.

@gkucsko
Contributor

gkucsko commented Mar 22, 2022

Hi, thanks for the question. For the AM, we decided not to include confidences out of the box, since there is no unique way to calculate them. Using the frame-level annotations and averaging the probabilities (or something similar) is probably the best bet here. Respecting the LM and hotwords gets a bit more complicated, since neither is really normalized in a good way, and the right choice would probably depend heavily on the downstream task. Open to suggestions, though, if you have a strong use case.
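
For illustration, here is a minimal sketch of that averaging, assuming `decoder` is a `BeamSearchDecoderCTC` instance and `logits` is the `(time, vocab)` acoustic-model output as a NumPy array. decode_beams returns beams of the form (text, lm_state, text_frames, logit_score, lm_score), where text_frames is a list of (word, (start_frame, end_frame)) pairs; the exact span handling below is one of several reasonable choices, not part of the library:

```python
import numpy as np

def word_am_confidences(decoder, logits):
    """Word-level AM confidence: mean top-symbol probability per word's frame span."""
    # top beam: (text, lm_state, text_frames, logit_score, lm_score)
    text, _, text_frames, logit_score, lm_score = decoder.decode_beams(logits)[0]
    # softmax per frame, in case the logits are unnormalized
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    return [
        # guard against empty spans; the inclusive/exclusive end semantics
        # are an assumption in this sketch
        (word, float(probs[start:max(end, start + 1)].max(axis=1).mean()))
        for word, (start, end) in text_frames
    ]
```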

@Jiltseb
Author

Jiltseb commented Mar 23, 2022

Hi @gkucsko, thank you very much for your reply. I can get the confidence from the end-to-end AM by averaging the frame-level probabilities, as you mentioned. With the LM, though, knowing the confidence with which a word was predicted could shed light on the LM's contribution (not just perplexity) and help us decide whether a particular word is suitable for further processing in SLU tasks. If the contributions of the individual modules can be separated at the word level, there should be a way to trace individual word confidences back from the top beam.

@patrickvonplaten
Contributor

I'd also be very interested in this addition!

I think it should be relatively easy to additionally return the lm_score + am_score that pyctcdecode gives each word, no?
I'm not sure I understand the code 100%, but this line here:

logit_score + lm_score,

defines the lm_score + am_score probability that pyctcdecode assigns, no?

The am_score corresponds to logit_score, and if I understand correctly this is just \sum_{i \in [word_start, word_end]} \log(logit[i]), while lm_score is the language model score returned by KenLM, weighted by alpha and beta, no?
So if we could just save those scores in some kind of list, that would be very helpful IMO.
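
For instance, a rough sketch (not the library's API) of how such per-word LM scores could be collected outside the decoder, by re-scoring the top beam's text with KenLM; the model path and the alpha/beta values below are placeholders standing in for the decoder's own settings, and pyctcdecode appears to convert KenLM's log10 scores to natural log before weighting:

```python
import math
import kenlm

model = kenlm.Model("path/to/kenlm.arpa")  # assumption: the same LM the decoder uses
alpha, beta = 0.5, 1.5                     # assumption: the decoder's alpha/beta weights

def per_word_lm_scores(text):
    """Return (word, weighted LM log-score) for each word in `text`."""
    words = text.split()
    # full_scores yields (log10 prob, ngram order, oov) per word, plus one entry for </s>
    return [
        (word, alpha * log10_prob * math.log(10) + beta)
        for (log10_prob, _, _), word in zip(model.full_scores(text), words)
    ]
```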

What do you think @gkucsko ?

@patrickvonplaten
Contributor

Also cc @lopez86 :-)

@patrickvonplaten
Contributor

The main problem with using lm_score (which is already returned here:

for text, _, _, _, text_frames, _, logit_score, lm_score in trimmed_beams

) for confidence scoring is that the score is not at all normalized for length, e.g. a longer transcription will necessarily have a lower lm_score. One could normalize the score by the number of words, but I wonder whether it's better to take the minimum over the words, as described here.
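
A quick sketch comparing the two options, assuming `word_scores` holds the per-word log-scores (logit_score + lm_score contributions) of the top beam:

```python
import numpy as np

def beam_confidence(word_scores, method="mean"):
    """Collapse per-word log-scores into a single utterance confidence."""
    scores = np.asarray(word_scores, dtype=float)
    if method == "mean":
        # length-normalized: longer transcripts are no longer penalized
        return float(np.exp(scores.mean()))
    if method == "min":
        # the transcript is only as confident as its least confident word
        return float(np.exp(scores.min()))
    raise ValueError(f"unknown method: {method}")
```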

Also related: https://discuss.huggingface.co/t/confidence-scores-self-training-for-wav2vec2-ctc-models-with-lm-pyctcdecode/17052

lopez86 added the enhancement label May 26, 2022