Commit 2282648

Merge branch 'main' into fix/ios-non-continuous-recognition

Committed Oct 31, 2024 · 2 parents 28a587d + 131d0ce

17 files changed: +924 -230 lines
‎README.md

(+63 -12)
@@ -19,6 +19,8 @@ expo-speech-recognition implements the iOS [`SFSpeechRecognizer`](https://develo
 - [Transcribing audio files](#transcribing-audio-files)
 - [Supported input audio formats](#supported-input-audio-formats)
 - [File transcription example](#file-transcription-example)
+- [Volume metering](#volume-metering)
+- [Volume metering example](#volume-metering-example)
 - [Polyfilling the Web SpeechRecognition API](#polyfilling-the-web-speechrecognition-api)
 - [Muting the beep sound on Android](#muting-the-beep-sound-on-android)
 - [Improving accuracy of single-word prompts](#improving-accuracy-of-single-word-prompts)
@@ -237,12 +239,15 @@
   // The maximum number of alternative transcriptions to return.
   maxAlternatives: 1,
   // [Default: false] Continuous recognition.
-  // If false on iOS, recognition will run until no speech is detected for 3 seconds.
+  // If false:
+  // - on iOS 17-, recognition will run until no speech is detected for 3 seconds.
+  // - on iOS 18+ and Android, recognition will run until a final result is received.
   // Not supported on Android 12 and below.
   continuous: true,
   // [Default: false] Prevent device from sending audio over the network. Only enabled if the device supports it.
   requiresOnDeviceRecognition: false,
   // [Default: false] Include punctuation in the recognition results. This applies to full stops and commas.
+  // Not supported on Android 12 and below. On Android 13+, only supported when on-device recognition is enabled.
   addsPunctuation: false,
   // [Default: undefined] Short custom phrases that are unique to your app.
   contextualStrings: ["Carlsen", "Nepomniachtchi", "Praggnanandhaa"],
@@ -297,6 +302,13 @@ ExpoSpeechRecognitionModule.start({
     // Default: 50ms for network-based recognition, 15ms for on-device recognition
     chunkDelayMillis: undefined,
   },
+  // Settings for volume change events.
+  volumeChangeEventOptions: {
+    // [Default: false] Whether to emit `volumechange` events when the input volume changes.
+    enabled: false,
+    // [Default: 100ms on iOS] The interval (in milliseconds) to emit `volumechange` events.
+    intervalMillis: 300,
+  },
 });

 // Stop capturing audio (and emit a final result if there is one)
@@ -310,17 +322,18 @@ ExpoSpeechRecognitionModule.abort();

 Events are largely based on the [Web Speech API](https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition). The following events are supported:

 | Event Name | Description | Notes |
 | --- | --- | --- |
 | `audiostart` | Audio capturing has started. | Includes the `uri` if `recordingOptions.persist` is enabled. |
 | `audioend` | Audio capturing has ended. | Includes the `uri` if `recordingOptions.persist` is enabled. |
 | `end` | Speech recognition service has disconnected. | This should always be the last event dispatched, including after errors. |
 | `error` | Fired when a speech recognition error occurs. | You'll also receive an `error` event (with code "aborted") when calling `.abort()`. |
 | `nomatch` | Speech recognition service returns a final result with no significant recognition. | You may have non-final results recognized. This may get emitted after cancellation. |
 | `result` | Speech recognition service returns a result: a word or phrase has been positively recognized. | On Android, continuous mode runs as a segmented session: when a final result is reached, additional partial and final results cover a new segment separate from the previous final result. On iOS, you should expect one final result before speech recognition has stopped. |
 | `speechstart` | Fired when any sound (recognizable speech or not) has been detected. | On iOS, this will fire once in the session after a result has occurred. |
 | `speechend` | Fired when speech recognized by the speech recognition service has stopped being detected. | Not supported yet on iOS. |
 | `start` | Speech recognition has started. | Use this event to indicate to the user when to speak. |
+| `volumechange` | Fired when the input volume changes. | Returns a value between -2 and 10 indicating the volume of the input audio. Consider anything below 0 to be inaudible. |

 ## Handling Errors

@@ -528,6 +541,44 @@ function TranscribeAudioFile() {
 }
 ```

+## Volume metering
+
+You can use the `volumeChangeEventOptions.enabled` option to enable volume metering. This emits a `volumechange` event with the current volume level (a value between -2 and 10). You can use this value to animate a visualization of the user's voice, or to give the user feedback about the input volume level.
+
+### Volume metering example
+
+![Volume metering example](./images/volume-metering.gif)
+
+See: [VolumeMeteringAvatar.tsx](https://github.com/jamsch/expo-speech-recognition/tree/main/example/components/VolumeMeteringAvatar.tsx) for a complete example that uses `react-native-reanimated` to animate the volume metering.
+
+```tsx
+import { Button } from "react-native";
+import {
+  ExpoSpeechRecognitionModule,
+  useSpeechRecognitionEvent,
+} from "expo-speech-recognition";
+
+function VolumeMeteringExample() {
+  useSpeechRecognitionEvent("volumechange", (event) => {
+    // A value between -2 and 10. Values <= 0 are inaudible.
+    console.log("Volume changed to:", event.value);
+  });
+
+  const handleStart = () => {
+    ExpoSpeechRecognitionModule.start({
+      lang: "en-US",
+      volumeChangeEventOptions: {
+        enabled: true,
+        // How often you want to receive the volumechange events
+        intervalMillis: 300,
+      },
+    });
+  };
+
+  return <Button title="Start" onPress={handleStart} />;
+}
+```
+
 ## Polyfilling the Web SpeechRecognition API

 > [!IMPORTANT]
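A note on consuming the `volumechange` payload documented above: the README states the value ranges from -2 to 10, with anything at or below 0 effectively inaudible. Below is a minimal TypeScript sketch of normalizing that value for a UI meter; `normalizeVolume` and `useMeterLevel` are illustrative names and not part of the package.

```ts
import { useState } from "react";
import { useSpeechRecognitionEvent } from "expo-speech-recognition";

// Illustrative helper (not part of expo-speech-recognition): maps the reported
// volume (-2 to 10, where values <= 0 are inaudible) onto a 0..1 scale.
function normalizeVolume(value: number): number {
  const clamped = Math.min(Math.max(value, 0), 10);
  return clamped / 10;
}

// Illustrative hook: keeps the latest normalized level in state so it can
// drive a progress bar or an animated style.
function useMeterLevel(): number {
  const [level, setLevel] = useState(0);
  useSpeechRecognitionEvent("volumechange", (event) => {
    setLevel(normalizeVolume(event.value));
  });
  return level;
}
```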

‎android/src/main/java/expo/modules/speechrecognition/ExpoSpeechRecognitionModule.kt

(+20 -12)
@@ -86,6 +86,8 @@ class ExpoSpeechRecognitionModule : Module() {
       "start",
       // Called when there's results (as a string array, not API compliant)
       "results",
+      // Fired when the input volume changes
+      "volumechange",
     )

     Function("getDefaultRecognitionService") {
@@ -325,26 +327,32 @@
     promise: Promise,
   ) {
     if (Build.VERSION.SDK_INT < Build.VERSION_CODES.TIRAMISU) {
-      promise.resolve(mapOf(
-        "locales" to mutableListOf<String>(),
-        "installedLocales" to mutableListOf<String>(),
-      ))
+      promise.resolve(
+        mapOf(
+          "locales" to mutableListOf<String>(),
+          "installedLocales" to mutableListOf<String>(),
+        ),
+      )
       return
     }

     if (options.androidRecognitionServicePackage == null && !SpeechRecognizer.isOnDeviceRecognitionAvailable(appContext)) {
-      promise.resolve(mapOf(
-        "locales" to mutableListOf<String>(),
-        "installedLocales" to mutableListOf<String>(),
-      ))
+      promise.resolve(
+        mapOf(
+          "locales" to mutableListOf<String>(),
+          "installedLocales" to mutableListOf<String>(),
+        ),
+      )
       return
     }

     if (options.androidRecognitionServicePackage != null && !SpeechRecognizer.isRecognitionAvailable(appContext)) {
-      promise.resolve(mapOf(
-        "locales" to mutableListOf<String>(),
-        "installedLocales" to mutableListOf<String>(),
-      ))
+      promise.resolve(
+        mapOf(
+          "locales" to mutableListOf<String>(),
+          "installedLocales" to mutableListOf<String>(),
+        ),
+      )
       return
     }

‎android/src/main/java/expo/modules/speechrecognition/ExpoSpeechRecognitionOptions.kt

(+11)
@@ -50,6 +50,17 @@ class SpeechRecognitionOptions : Record {

   @Field
   val iosCategory: Map<String, Any>? = null
+
+  @Field
+  val volumeChangeEventOptions: VolumeChangeEventOptions? = null
+}
+
+class VolumeChangeEventOptions : Record {
+  @Field
+  val enabled: Boolean? = false
+
+  @Field
+  val intervalMillis: Int? = null
 }

 class RecordingOptions : Record {
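For readers following the JavaScript side of this change, the two `@Field` properties above are what the `volumeChangeEventOptions` object passed to `ExpoSpeechRecognitionModule.start()` deserializes into. A rough TypeScript shape is sketched below for orientation only; the interface name is illustrative and may not match the package's exported types.

```ts
// Illustrative shape only; see the package's own type definitions for the real ones.
interface VolumeChangeEventOptions {
  /** Whether to emit `volumechange` events. Default: false. */
  enabled?: boolean;
  /** Minimum interval between `volumechange` events, in milliseconds. */
  intervalMillis?: number;
}

// Matches the README usage:
// ExpoSpeechRecognitionModule.start({ lang: "en-US", volumeChangeEventOptions: { enabled: true, intervalMillis: 300 } });
```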

‎android/src/main/java/expo/modules/speechrecognition/ExpoSpeechService.kt

(+23 -3)
@@ -50,6 +50,9 @@ class ExpoSpeechService(
   private var speech: SpeechRecognizer? = null
   private val mainHandler = Handler(Looper.getMainLooper())

+  private lateinit var options: SpeechRecognitionOptions
+  private var lastVolumeChangeEventTime: Long = 0L
+
   /** Audio recorder for persisting audio */
   private var audioRecorder: ExpoAudioRecorder? = null

@@ -108,6 +111,7 @@

   /** Starts speech recognition */
   fun start(options: SpeechRecognitionOptions) {
+    this.options = options
     mainHandler.post {
       log("Start recognition.")

@@ -119,6 +123,7 @@
       delayedFileStreamer = null
       recognitionState = RecognitionState.STARTING
       soundState = SoundState.INACTIVE
+      lastVolumeChangeEventTime = 0L
       try {
         val intent = createSpeechIntent(options)
         speech = createSpeechRecognizer(options)

@@ -428,11 +433,11 @@
   */
  private fun resolveSourceUri(sourceUri: String): File =
    when {
-      // Local file path without URI scheme
-      !sourceUri.startsWith("https://") && !sourceUri.startsWith("file://") -> File(sourceUri)
-
      // File URI
      sourceUri.startsWith("file://") -> File(URI(sourceUri))
+
+      // Local file path without URI scheme
+      !sourceUri.startsWith("https://") -> File(sourceUri)

      // HTTP URI - throw an error
      else -> {

@@ -454,6 +459,21 @@
   }

   override fun onRmsChanged(rmsdB: Float) {
+    if (options.volumeChangeEventOptions?.enabled != true) {
+      return
+    }
+
+    val intervalMs = options.volumeChangeEventOptions?.intervalMillis
+
+    if (intervalMs == null) {
+      sendEvent("volumechange", mapOf("value" to rmsdB))
+    } else {
+      val currentTime = System.currentTimeMillis()
+      if (currentTime - lastVolumeChangeEventTime >= intervalMs) {
+        sendEvent("volumechange", mapOf("value" to rmsdB))
+        lastVolumeChangeEventTime = currentTime
+      }
+    }
     /*
     val isSilent = rmsdB <= 0
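The handler above throttles `volumechange` events: with no `intervalMillis` configured, every RMS reading is forwarded; otherwise an event is emitted only once at least `intervalMillis` has elapsed since the last one. The same pattern in TypeScript, sketched only to illustrate the logic (the function name and signature are made up, not library API):

```ts
// Illustrative throttle mirroring the Kotlin onRmsChanged logic above.
function createVolumeThrottle(
  emit: (value: number) => void,
  intervalMillis?: number,
) {
  let lastEmitTime = 0;
  return (value: number) => {
    // No interval configured: forward every reading.
    if (intervalMillis == null) {
      emit(value);
      return;
    }
    // Otherwise emit at most once per interval.
    const now = Date.now();
    if (now - lastEmitTime >= intervalMillis) {
      emit(value);
      lastEmitTime = now;
    }
  };
}

// const onRms = createVolumeThrottle((v) => console.log("volumechange", v), 300);
```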

‎example/App.tsx

(+27 -2)
@@ -47,6 +47,7 @@ import {
   AndroidOutputFormat,
   IOSOutputFormat,
 } from "expo-av/build/Audio";
+import { VolumeMeteringAvatar } from "./components/VolumeMeteringAvatar";

 const speechRecognitionServices = getSpeechRecognitionServices();

@@ -71,7 +72,16 @@
     continuous: true,
     requiresOnDeviceRecognition: false,
     addsPunctuation: true,
-    contextualStrings: ["Carlsen", "Ian Nepomniachtchi", "Praggnanandhaa"],
+    contextualStrings: [
+      "expo-speech-recognition",
+      "Carlsen",
+      "Ian Nepomniachtchi",
+      "Praggnanandhaa",
+    ],
+    volumeChangeEventOptions: {
+      enabled: false,
+      intervalMillis: 300,
+    },
   });

   useSpeechRecognitionEvent("result", (ev) => {
@@ -140,6 +150,10 @@
     <SafeAreaView style={styles.container}>
       <StatusBar style="dark" translucent={false} />

+      {settings.volumeChangeEventOptions?.enabled ? (
+        <VolumeMeteringAvatar />
+      ) : null}
+
       <View style={styles.card}>
         <Text style={styles.text}>
           {error ? JSON.stringify(error) : "Error messages go here"}

@@ -510,6 +524,17 @@ function GeneralSettings(props: {
         checked={Boolean(settings.continuous)}
         onPress={() => handleChange("continuous", !settings.continuous)}
       />
+
+      <CheckboxButton
+        title="Volume events"
+        checked={Boolean(settings.volumeChangeEventOptions?.enabled)}
+        onPress={() =>
+          handleChange("volumeChangeEventOptions", {
+            enabled: !settings.volumeChangeEventOptions?.enabled,
+            intervalMillis: settings.volumeChangeEventOptions?.intervalMillis,
+          })
+        }
+      />
     </View>

     <View style={styles.textOptionContainer}>
@@ -714,7 +739,7 @@ function AndroidSettings(props: {
         onPress={() =>
           handleChange("androidIntentOptions", {
             ...settings.androidIntentOptions,
-            [key]: !settings.androidIntentOptions?.[key] ?? false,
+            [key]: !settings.androidIntentOptions?.[key],
           })
         }
       />
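The `VolumeMeteringAvatar` component imported above lives in `example/components/VolumeMeteringAvatar.tsx` and is not shown in this excerpt. As a rough orientation only (this is not the actual implementation), a component like it could scale an avatar image with `react-native-reanimated` based on the `volumechange` events:

```tsx
// Illustrative sketch only: the real VolumeMeteringAvatar.tsx may differ.
// Assumes react-native-reanimated v3 and the avatar image added in this commit.
import Animated, {
  useSharedValue,
  useAnimatedStyle,
  withTiming,
} from "react-native-reanimated";
import { useSpeechRecognitionEvent } from "expo-speech-recognition";

export function VolumeMeteringAvatar() {
  const scale = useSharedValue(1);

  useSpeechRecognitionEvent("volumechange", (event) => {
    // Map the -2..10 volume onto a 1..1.5 scale factor.
    const normalized = Math.min(Math.max(event.value, 0), 10) / 10;
    scale.value = withTiming(1 + normalized * 0.5, { duration: 100 });
  });

  const animatedStyle = useAnimatedStyle(() => ({
    transform: [{ scale: scale.value }],
  }));

  return (
    <Animated.Image
      source={require("../assets/avatar.png")}
      style={[{ width: 96, height: 96, borderRadius: 48 }, animatedStyle]}
    />
  );
}
```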

‎example/assets/avatar.png

18.8 KB

‎example/babel.config.js

(+2 -1)
@@ -1,9 +1,10 @@
 const path = require("path");
-module.exports = function (api) {
+module.exports = (api) => {
   api.cache(true);
   return {
     presets: ["babel-preset-expo"],
     plugins: [
+      "react-native-reanimated/plugin",
       [
         "module-resolver",
         {

0 commit comments