confidence scores output from the LM #57
Hi, thanks for the question. As for the AM, we decided not to include confidences out of the box, since there is no unique way to calculate them. Using the frame-level annotations and averaging the probabilities, or something similar, is probably the best bet here. As for respecting the LM and hotwords, it gets a bit more complicated, since neither is really normalized in a good way, and it would probably depend heavily on the downstream task. Open to suggestions, though, if you have a strong use case.
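The frame-level averaging suggested above can be sketched roughly as follows. This is an illustrative heuristic, not part of the pyctcdecode API; the `word_frames` spans are assumed to come from the frame offsets that `decode_beams` returns.

```python
import numpy as np

def am_word_confidences(logits, word_frames):
    """Rough AM confidence per word: average the max softmax
    probability over the frames attributed to each word.

    logits: (T, V) array of raw CTC logits from the acoustic model.
    word_frames: list of (start_frame, end_frame) spans, one per word.
    """
    # softmax over the vocabulary axis
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    # confidence of the argmax label at each frame
    frame_conf = probs.max(axis=-1)
    return [float(frame_conf[s:e].mean()) for s, e in word_frames]
```

Averaging the per-frame argmax probability is only one of several plausible choices; one could also average only over non-blank frames, or use the probability of the emitted label rather than the argmax.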
Hi @gkucsko, thank you very much for your reply. I can get the confidence from the e2e AM by averaging the frame-level probabilities, as you mentioned. But with the LM, understanding the confidence with which a word is predicted could shed light on the contribution of the LM (not just perplexity) and help us decide whether a particular word is suitable for further processing in SLU tasks. If the contributions of the individual modules can be segregated at the word level, there should be a way to trace back the individual word confidences from the top beam.
I'd also be very interested in this addition! I think it should be relatively easy to additionally return the score computed in pyctcdecode/pyctcdecode/decoder.py, line 326 (commit 9071d50), which defines the lm_score + am_score probability that is given by pyctcdecode, no? What do you think @gkucsko?
Also cc @lopez86 :-)
The main problem with using the score from pyctcdecode/pyctcdecode/decoder.py, line 498 (commit 9071d50), for confidence scoring is that it is not at all normalized by length; e.g., a longer transcription will necessarily have a lower lm_score. One could normalize the score by the number of words, but I wonder whether it's better to take the minimum over the words, as described here.
Also related: https://discuss.huggingface.co/t/confidence-scores-self-training-for-wav2vec2-ctc-models-with-lm-pyctcdecode/17052
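The two normalization strategies discussed above (averaging vs taking the minimum over words) can be compared with a small sketch. The per-word log-probabilities are a hypothetical input here; pyctcdecode does not currently expose them.

```python
import math

def utterance_confidence(word_logp, mode="mean"):
    """Turn per-word log-probabilities into a length-independent
    utterance confidence in [0, 1].

    mode="mean": geometric mean over words, i.e. exp(avg log-prob),
        which normalizes the total score by the number of words.
    mode="min": weakest-link view, where the least confident word
        bounds the confidence of the whole utterance.
    """
    if mode == "mean":
        return math.exp(sum(word_logp) / len(word_logp))
    if mode == "min":
        return math.exp(min(word_logp))
    raise ValueError(f"unknown mode: {mode}")
```

The "min" variant is more conservative: a single unreliable word drags the whole score down, which is often what you want when filtering pseudo-labels for self-training.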
Is there a way to get the confidence scores (word/sub-word level) as part of the output as well?
With decode_beams, it is possible to get the time information for alignment purposes and the KenLM state, in addition to the segment-level probabilities. It would be a nice addition if word-level confidence scores were also exposed. Since these are calculated from the AM and LM (and optionally hotwords), we could do fine-grained analysis at the word level to remove or emphasize some words, as desired.
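In the meantime, the word/frame alignment that decode_beams already returns can be combined with the input log-probabilities to approximate word-level confidences outside the library. This sketch assumes a beam has the shape `(text, lm_state, text_frames, logit_score, lm_score)`, with `text_frames` a list of `(word, (start_frame, end_frame))` pairs; the per-word averaging itself is our own heuristic, not a pyctcdecode feature.

```python
import numpy as np

def word_confidences(log_probs, beam):
    """Derive per-word confidences from one beam of decode_beams output.

    log_probs: the (T, V) log-probability matrix that was fed to the
        decoder.
    beam: a (text, lm_state, text_frames, logit_score, lm_score) tuple.
    Returns a dict mapping each word to the mean per-frame argmax
    probability over its aligned frames.
    """
    _text, _state, text_frames, _logit_score, _lm_score = beam
    frame_conf = np.exp(log_probs).max(axis=-1)
    return {
        word: float(frame_conf[s:e].mean()) if e > s else 0.0
        for word, (s, e) in text_frames
    }
```

Note this only reflects the AM side of the decision; folding in the LM contribution per word would require changes inside the decoder, as discussed above.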