Commit dfdb98d

feat: Android 14+ language detection (#36)
* feat: android language detection
* register events
* feat: avoid re-sending language detection event if last detected language or confidence don't change
* chore: update docs for language detection
1 parent 131d0ce commit dfdb98d

6 files changed, +118 -20 lines changed

README.md (+51 -16)

@@ -24,6 +24,7 @@
 - [Polyfilling the Web SpeechRecognition API](#polyfilling-the-web-speechrecognition-api)
 - [Muting the beep sound on Android](#muting-the-beep-sound-on-android)
 - [Improving accuracy of single-word prompts](#improving-accuracy-of-single-word-prompts)
+- [Language Detection](#language-detection)
 - [Platform Compatibility Table](#platform-compatibility-table)
 - [Common Troubleshooting issues](#common-troubleshooting-issues)
 - [Android issues](#android-issues)
@@ -36,14 +37,14 @@
 - [getPermissionsAsync()](#getpermissionsasync-promisepermissionresponse)
 - [getStateAsync()](#getstateasync-promisespeechrecognitionstate)
 - [addSpeechRecognitionListener()](#addspeechrecognitionlistener)
-- [getSupportedLocales()](#getsupportedlocales-promise-locales-string-installedlocales-string-)
+- [getSupportedLocales()](#getsupportedlocales)
 - [getSpeechRecognitionServices()](#getspeechrecognitionservices-string-android-only)
 - [getDefaultRecognitionService()](#getdefaultrecognitionservice--packagename-string--android-only)
 - [getAssistantService()](#getassistantservice--packagename-string--android-only)
 - [isRecognitionAvailable()](#isrecognitionavailable-boolean)
 - [supportsOnDeviceRecognition()](#supportsondevicerecognition-boolean)
 - [supportsRecording()](#supportsrecording-boolean)
-- [androidTriggerOfflineModelDownload()](#androidtriggerofflinemodeldownload-locale-string--promise-status-opened_dialog--download_success--download_canceled-message-string-)
+- [androidTriggerOfflineModelDownload()](#androidtriggerofflinemodeldownload)
 - [setCategoryIOS()](#setcategoryios-void-ios-only)
 - [getAudioSessionCategoryAndOptionsIOS()](#getaudiosessioncategoryandoptionsios-ios-only)
 - [setAudioSessionActiveIOS()](#setaudiosessionactiveiosvalue-boolean-options--notifyothersondeactivation-boolean--void)
@@ -322,18 +323,19 @@
 Events are largely based on the [Web Speech API](https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition). The following events are supported:

 | Event Name | Description | Notes |
 | --- | --- | --- |
 | `audiostart` | Audio capturing has started | Includes the `uri` if `recordingOptions.persist` is enabled. |
 | `audioend` | Audio capturing has ended | Includes the `uri` if `recordingOptions.persist` is enabled. |
 | `end` | Speech recognition service has disconnected. | This should always be the last event dispatched, including after errors. |
 | `error` | Fired when a speech recognition error occurs. | You'll also receive an `error` event (with code "aborted") when calling `.abort()` |
 | `nomatch` | Speech recognition service returns a final result with no significant recognition. | You may have non-final results recognized. This may get emitted after cancellation. |
 | `result` | Speech recognition service returns a word or phrase that has been positively recognized. | On Android, continuous mode runs as a segmented session, meaning when a final result is reached, additional partial and final results will cover a new segment separate from the previous final result. On iOS, you should expect one final result before speech recognition has stopped. |
 | `speechstart` | Fired when any sound — recognizable speech or not — has been detected | On iOS, this will fire once in the session after a result has occurred |
 | `speechend` | Fired when speech recognized by the speech recognition service has stopped being detected. | Not supported yet on iOS |
 | `start` | Speech recognition has started | Use this event to indicate to the user when to speak. |
 | `volumechange` | Fired when the input volume changes. | Returns a value between -2 and 10 indicating the volume of the input audio. Consider anything below 0 to be inaudible. |
+| `languagedetection` | Called when the language detection (and switching) results are available. | Android 14+ only with `com.google.android.as`. Enabled with `EXTRA_ENABLE_LANGUAGE_DETECTION` in the `androidIntentOptions` when starting. It can also fire multiple times when `EXTRA_ENABLE_LANGUAGE_SWITCH` is enabled. |

 ## Handling Errors

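The segmented-session behavior noted for the `result` event means a continuous Android session can emit several final results, one per segment. A minimal sketch of tallying them into a single transcript (it mirrors the pattern used in example/App.tsx below; the event shape and `addSpeechRecognitionListener` are documented in the README):

```ts
import { addSpeechRecognitionListener } from "expo-speech-recognition";

// Each final result closes out a segment, so append it to a running
// tally; non-final results only preview the segment currently spoken.
let tally = "";

const listener = addSpeechRecognitionListener("result", (event) => {
  const transcript = event.results[0]?.transcript ?? "";
  console.log("Transcript so far:", tally + transcript);
  if (event.isFinal) {
    tally += transcript;
  }
});

// Later, when the session is done:
// listener.remove();
```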
@@ -696,6 +698,39 @@
 - For both platforms, you also may want to consider using on-device recognition. On Android this seems to work well for single-word prompts.
 - Alternatively, you may want to consider recording the recognized audio and sending it to an external service for further processing. See [Persisting Audio Recordings](#persisting-audio-recordings) for more information. Note that some services (such as the Google Speech API) may require an audio file with a duration of at least 3 seconds.

+## Language Detection
+
+> [!NOTE]
+> This feature is currently only available on Android 14+ using the `com.google.android.as` service package.
+
+You can use the `languagedetection` event to get the detected language and confidence level. This feature has a few requirements:
+
+- Android 14+ only.
+- The `com.google.android.as` (on-device recognition) service package must be selected. At the time of writing, this appears to be the only service that supports language detection.
+- You must enable `EXTRA_ENABLE_LANGUAGE_DETECTION` in the `androidIntentOptions` when starting the recognition.
+- Optional: you can enable `EXTRA_ENABLE_LANGUAGE_SWITCH` to allow the user to switch languages; however, **keep in mind that the language model needs to be downloaded for this to work**. Refer to [androidTriggerOfflineModelDownload()](#androidtriggerofflinemodeldownload) to download a model, and [getSupportedLocales()](#getsupportedlocales) to get the list of downloaded on-device locales.
+
+Example:
+
+```tsx
+import {
+  ExpoSpeechRecognitionModule,
+  useSpeechRecognitionEvent,
+} from "expo-speech-recognition";
+
+useSpeechRecognitionEvent("languagedetection", (event) => {
+  console.log("Language detected:", event.detectedLanguage); // e.g. "en-us"
+  console.log("Confidence:", event.confidence); // A value between 0.0 and 1.0
+  console.log("Top locale alternatives:", event.topLocaleAlternatives); // e.g. ["en-au", "en-gb"]
+});
+
+// Start recognition
+ExpoSpeechRecognitionModule.start({
+  androidIntentOptions: {
+    EXTRA_ENABLE_LANGUAGE_DETECTION: true,
+    EXTRA_ENABLE_LANGUAGE_SWITCH: true,
+  },
+  androidRecognitionServicePackage: "com.google.android.as", // or set "requiresOnDeviceRecognition" to true
+});
+```
+
 ## Platform Compatibility Table

 As of 7 Aug 2024, the following platforms are supported:
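Since `EXTRA_ENABLE_LANGUAGE_SWITCH` only works for locales whose offline models are installed, a pre-flight check before starting could look like the sketch below. It assumes `getSupportedLocales()` and `androidTriggerOfflineModelDownload()` are importable as standalone functions (as `supportsRecording()` is used elsewhere in the README); their return shapes are taken from the original section headings.

```ts
import {
  ExpoSpeechRecognitionModule,
  getSupportedLocales,
  androidTriggerOfflineModelDownload,
} from "expo-speech-recognition";

// Ensure the on-device model for `locale` is installed, then start a
// session with language detection and switching enabled.
async function startWithLanguageSwitching(locale: string) {
  const { installedLocales } = await getSupportedLocales();

  if (!installedLocales.includes(locale)) {
    const { status } = await androidTriggerOfflineModelDownload({ locale });
    if (status === "download_canceled") {
      return; // user backed out of the download dialog
    }
  }

  ExpoSpeechRecognitionModule.start({
    lang: locale,
    androidRecognitionServicePackage: "com.google.android.as",
    androidIntentOptions: {
      EXTRA_ENABLE_LANGUAGE_DETECTION: true,
      EXTRA_ENABLE_LANGUAGE_SWITCH: true,
    },
  });
}
```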
@@ -852,7 +887,7 @@
 listener.remove();
 ```

-### `getSupportedLocales(): Promise<{ locales: string[]; installedLocales: string[] }>`
+### `getSupportedLocales()`

 > [!NOTE]
 > Not supported on Android 12 and below
@@ -966,7 +1001,7 @@
 console.log("Recording available:", available);
 ```

-### `androidTriggerOfflineModelDownload({ locale: string }): Promise<{ status: "opened_dialog" | "download_success" | "download_canceled", message: string }>`
+### `androidTriggerOfflineModelDownload()`

 Users on Android devices will first need to download the offline model for the locale they want to use in order to use on-device speech recognition (i.e. the `requiresOnDeviceRecognition` setting in the `start` options).
android/src/main/java/expo/modules/speechrecognition/ExpoSpeechRecognitionModule.kt (+2 -0)

@@ -86,6 +86,8 @@
       "start",
       // Called when there's results (as a string array, not API compliant)
       "results",
+      // Called when the language detection (and switching) results are available.
+      "languagedetection",
       // Fired when the input volume changes
       "volumechange",
     )

android/src/main/java/expo/modules/speechrecognition/ExpoSpeechService.kt (+35 -1)

@@ -60,6 +60,9 @@
     private var delayedFileStreamer: DelayedFileStreamer? = null
     private var soundState = SoundState.INACTIVE

+    private var lastDetectedLanguage: String? = null
+    private var lastLanguageConfidence: Float? = null
+
     var recognitionState = RecognitionState.INACTIVE

     companion object {

@@ -121,6 +124,8 @@
         audioRecorder = null
         delayedFileStreamer?.close()
        delayedFileStreamer = null
+        lastDetectedLanguage = null
+        lastLanguageConfidence = null
         recognitionState = RecognitionState.STARTING
         soundState = SoundState.INACTIVE
         lastVolumeChangeEventTime = 0L

@@ -435,7 +440,7 @@
         when {
             // File URI
             sourceUri.startsWith("file://") -> File(URI(sourceUri))
-
+
             // Local file path without URI scheme
             !sourceUri.startsWith("https://") -> File(sourceUri)
@@ -581,6 +586,15 @@
             else -> 0.0f
         }

+    private fun languageDetectionConfidenceLevelToFloat(confidenceLevel: Int): Float =
+        when (confidenceLevel) {
+            SpeechRecognizer.LANGUAGE_DETECTION_CONFIDENCE_LEVEL_HIGHLY_CONFIDENT -> 1.0f
+            SpeechRecognizer.LANGUAGE_DETECTION_CONFIDENCE_LEVEL_CONFIDENT -> 0.8f
+            SpeechRecognizer.LANGUAGE_DETECTION_CONFIDENCE_LEVEL_NOT_CONFIDENT -> 0.5f
+            SpeechRecognizer.LANGUAGE_DETECTION_CONFIDENCE_LEVEL_UNKNOWN -> 0f
+            else -> 0.0f
+        }
+
     override fun onResults(results: Bundle?) {
         val resultsList = getResults(results)

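On the JS side, `confidence` therefore arrives as one of the discrete float values above rather than a continuous score. A small sketch of mapping it back to a display label (the labels and thresholds are assumptions mirroring the Kotlin mapping above, not part of the library API):

```ts
// Mirror the native confidence-level-to-float mapping back into labels.
type ConfidenceLabel =
  | "highly confident"
  | "confident"
  | "not confident"
  | "unknown";

function labelConfidence(confidence: number): ConfidenceLabel {
  if (confidence >= 1.0) return "highly confident";
  if (confidence >= 0.8) return "confident";
  if (confidence >= 0.5) return "not confident";
  return "unknown";
}
```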
@@ -614,6 +628,26 @@
         }
     }

+    override fun onLanguageDetection(results: Bundle) {
+        val detectedLanguage = results.getString(SpeechRecognizer.DETECTED_LANGUAGE)
+        val confidence = languageDetectionConfidenceLevelToFloat(results.getInt(SpeechRecognizer.LANGUAGE_DETECTION_CONFIDENCE_LEVEL))
+
+        // Only send the event if the language or confidence has changed
+        if (detectedLanguage != lastDetectedLanguage || confidence != lastLanguageConfidence) {
+            lastDetectedLanguage = detectedLanguage
+            lastLanguageConfidence = confidence
+
+            sendEvent(
+                "languagedetection",
+                mapOf(
+                    "detectedLanguage" to detectedLanguage,
+                    "confidence" to confidence,
+                    "topLocaleAlternatives" to results.getStringArrayList(SpeechRecognizer.TOP_LOCALE_ALTERNATIVES),
+                ),
+            )
+        }
+    }
+
     /**
      * For API 33: Basically same as onResults but doesn't stop
      */
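Because the native layer suppresses duplicate events, every `languagedetection` event that reaches JS represents an actual change in language or confidence. A sketch of reacting to language switches (the state handling here is illustrative, not part of the library):

```ts
import { addSpeechRecognitionListener } from "expo-speech-recognition";

// No deduplication is needed here: the native side only emits when
// the detected language or confidence changes.
let currentLanguage: string | null = null;

const subscription = addSpeechRecognitionListener(
  "languagedetection",
  (event) => {
    if (event.detectedLanguage !== currentLanguage) {
      currentLanguage = event.detectedLanguage;
      console.log("Speaker switched to:", currentLanguage);
    }
  },
);

// Call subscription.remove() when the session ends.
```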

example/App.tsx (+10 -3)

@@ -93,8 +93,6 @@
       const transcript = ev.results[0]?.transcript || "";

       setTranscription((current) => {
-        // When a final result comes in, we need to update the base transcript to build off from
-        // Because on Android and Web, multiple final results can be returned within a continuous session
         // When a final result is received, any following recognized transcripts will omit the previous final result
         const transcriptTally = ev.isFinal
           ? (current?.transcriptTally ?? "") + transcript

@@ -126,6 +124,10 @@
     console.log("[event]: nomatch");
   });

+  useSpeechRecognitionEvent("languagedetection", (ev) => {
+    console.log("[event]: languagedetection", ev);
+  });
+
   const startListening = () => {
     if (status !== "idle") {
       return;

@@ -574,7 +576,12 @@
                   : locale
               }
               active={settings.lang === locale}
-              onPress={() => handleChange("lang", locale)}
+              onPress={() =>
+                handleChange(
+                  "lang",
+                  settings.lang === locale ? undefined : locale,
+                )
+              }
             />
           );
         })}

ios/ExpoSpeechRecognitionModule.swift (+2 -0)

@@ -94,6 +94,8 @@
       "start",
       // Called when there's results (as a string array, not API compliant)
       "result",
+      // Called when the language detection (and switching) results are available.
+      "languagedetection",
       // Fired when the input volume changes
       "volumechange"
     )
