Commit 2d2e73d

feat: volume change event support (#37)
* feat: volume change event support
* chore: update readme
* update events doc
* refactor: rmsDB -> value, use max power in buffer
* chore: autoformat
* chore: update VolumeMeteringAvatar and docs
* update volume metering link
1 parent 1da7819 commit 2d2e73d

16 files changed, +865 -214 lines changed

README.md

+58 -11
@@ -19,6 +19,8 @@ expo-speech-recognition implements the iOS [`SFSpeechRecognizer`](https://develo
 - [Transcribing audio files](#transcribing-audio-files)
   - [Supported input audio formats](#supported-input-audio-formats)
   - [File transcription example](#file-transcription-example)
+- [Volume metering](#volume-metering)
+  - [Volume metering example](#volume-metering-example)
 - [Polyfilling the Web SpeechRecognition API](#polyfilling-the-web-speechrecognition-api)
 - [Muting the beep sound on Android](#muting-the-beep-sound-on-android)
 - [Improving accuracy of single-word prompts](#improving-accuracy-of-single-word-prompts)
@@ -299,6 +301,13 @@ ExpoSpeechRecognitionModule.start({
     // Default: 50ms for network-based recognition, 15ms for on-device recognition
     chunkDelayMillis: undefined,
   },
+  // Settings for volume change events.
+  volumeChangeEventOptions: {
+    // [Default: false] Whether to emit the `volumechange` events when the input volume changes.
+    enabled: false,
+    // [Default: 100ms on iOS] The interval (in milliseconds) to emit `volumechange` events.
+    intervalMillis: 300,
+  },
 });

 // Stop capturing audio (and emit a final result if there is one)
@@ -312,17 +321,18 @@ ExpoSpeechRecognitionModule.abort();

 Events are largely based on the [Web Speech API](https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition). The following events are supported:

-| Event Name | Description | Notes |
-| ------------- | ------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `audiostart` | Audio capturing has started | Includes the `uri` if `recordingOptions.persist` is enabled. |
-| `audioend` | Audio capturing has ended | Includes the `uri` if `recordingOptions.persist` is enabled. |
-| `end` | Speech recognition service has disconnected. | This should always be the last event dispatched, including after errors. |
-| `error` | Fired when a speech recognition error occurs. | You'll also receive an `error` event (with code "aborted") when calling `.abort()` |
-| `nomatch` | Speech recognition service returns a final result with no significant recognition. | You may have non-final results recognized. This may get emitted after cancellation. |
-| `result` | Speech recognition service returns a word or phrase has been positively recognized. | On Android, continous mode runs as a segmented session, meaning when a final result is reached, additional partial and final results will cover a new segment separate from the previous final result. On iOS, you should expect one final result before speech recognition has stopped. |
-| `speechstart` | Fired when any sound — recognizable speech or not — has been detected | On iOS, this will fire once in the session after a result has occurred |
-| `speechend` | Fired when speech recognized by the speech recognition service has stopped being detected. | Not supported yet on iOS |
-| `start` | Speech recognition has started | Use this event to indicate to the user when to speak. |
+| Event Name | Description | Notes |
+| -------------- | ------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `audiostart` | Audio capturing has started | Includes the `uri` if `recordingOptions.persist` is enabled. |
+| `audioend` | Audio capturing has ended | Includes the `uri` if `recordingOptions.persist` is enabled. |
+| `end` | Speech recognition service has disconnected. | This should always be the last event dispatched, including after errors. |
+| `error` | Fired when a speech recognition error occurs. | You'll also receive an `error` event (with code "aborted") when calling `.abort()` |
+| `nomatch` | Speech recognition service returns a final result with no significant recognition. | You may have non-final results recognized. This may get emitted after cancellation. |
+| `result` | Speech recognition service returns a word or phrase has been positively recognized. | On Android, continous mode runs as a segmented session, meaning when a final result is reached, additional partial and final results will cover a new segment separate from the previous final result. On iOS, you should expect one final result before speech recognition has stopped. |
+| `speechstart` | Fired when any sound — recognizable speech or not — has been detected | On iOS, this will fire once in the session after a result has occurred |
+| `speechend` | Fired when speech recognized by the speech recognition service has stopped being detected. | Not supported yet on iOS |
+| `start` | Speech recognition has started | Use this event to indicate to the user when to speak. |
+| `volumechange` | Fired when the input volume changes. | Returns a value between -2 and 10 indicating the volume of the input audio. Consider anything below 0 to be inaudible. |

 ## Handling Errors

@@ -530,6 +540,43 @@ function TranscribeAudioFile() {
 }
 ```

+## Volume metering
+
+You can use the `volumeChangeEventOptions.enabled` option to enable volume metering. This will emit a `volumechange` event with the current volume level (between -2 and 10) as a value. You can use this value to animate the volume metering of a user's voice, or to provide feedback to the user about the volume level.
+
+### Volume metering example
+
+![Volume metering example](./images/volume-metering.gif)
+
+See: [VolumeMeteringAvatar.tsx](https://github.com/jamsch/expo-speech-recognition/tree/main/example/components/VolumeMeteringAvatar.tsx) for a complete example that involves using `react-native-reanimated` to animate the volume metering.
+
+```tsx
+import { ExpoSpeechRecognitionModule } from "expo-speech-recognition";
+
+function VolumeMeteringAvatar() {
+  useSpeechRecognitionEvent("volumechange", (event) => {
+    console.log("Volume changed to:", event.value);
+  });
+
+  const handleStart = () => {
+    ExpoSpeechRecognitionModule.start({
+      lang: "en-US",
+      volumeChangeEventOptions: {
+        enabled: true,
+        intervalMillis: 300,
+      },
+    });
+  };
+
+  return (
+    <View>
+      <Button title="Start" onPress={handleStart} />
+      <Text>Volume: {volume}</Text>
+    </View>
+  );
+}
+```
+
 ## Polyfilling the Web SpeechRecognition API

 > [!IMPORTANT]
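To drive an animation or a level indicator from `volumechange`, the raw value usually has to be mapped onto a fixed range first. The following is a minimal sketch, assuming the -2 to 10 range and the "below 0 is inaudible" guidance stated in the README changes above. `normalizeVolume` and `MAX_VOLUME` are hypothetical names, not part of the library.

```typescript
// Hypothetical helper, not part of expo-speech-recognition.
// Maps a `volumechange` value onto the range 0..1.
const MAX_VOLUME = 10; // upper bound documented in the README

function normalizeVolume(value: number): number {
  // Treat non-finite readings and anything at or below 0 as silence.
  if (!Number.isFinite(value) || value <= 0) {
    return 0;
  }
  // Clamp to the documented maximum before scaling.
  return Math.min(value, MAX_VOLUME) / MAX_VOLUME;
}
```

In a component, the result would typically be kept in state (for example with `useState`) inside the `volumechange` handler and rendered from there, since the README snippet itself only logs `event.value`.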

android/src/main/java/expo/modules/speechrecognition/ExpoSpeechRecognitionModule.kt

+20 -12
@@ -86,6 +86,8 @@ class ExpoSpeechRecognitionModule : Module() {
             "start",
             // Called when there's results (as a string array, not API compliant)
             "results",
+            // Fired when the input volume changes
+            "volumechange",
         )

         Function("getDefaultRecognitionService") {
@@ -325,26 +327,32 @@ class ExpoSpeechRecognitionModule : Module() {
         promise: Promise,
     ) {
         if (Build.VERSION.SDK_INT < Build.VERSION_CODES.TIRAMISU) {
-            promise.resolve(mapOf(
-                "locales" to mutableListOf<String>(),
-                "installedLocales" to mutableListOf<String>(),
-            ))
+            promise.resolve(
+                mapOf(
+                    "locales" to mutableListOf<String>(),
+                    "installedLocales" to mutableListOf<String>(),
+                ),
+            )
             return
         }

         if (options.androidRecognitionServicePackage == null && !SpeechRecognizer.isOnDeviceRecognitionAvailable(appContext)) {
-            promise.resolve(mapOf(
-                "locales" to mutableListOf<String>(),
-                "installedLocales" to mutableListOf<String>(),
-            ))
+            promise.resolve(
+                mapOf(
+                    "locales" to mutableListOf<String>(),
+                    "installedLocales" to mutableListOf<String>(),
+                ),
+            )
             return
         }

         if (options.androidRecognitionServicePackage != null && !SpeechRecognizer.isRecognitionAvailable(appContext)) {
-            promise.resolve(mapOf(
-                "locales" to mutableListOf<String>(),
-                "installedLocales" to mutableListOf<String>(),
-            ))
+            promise.resolve(
+                mapOf(
+                    "locales" to mutableListOf<String>(),
+                    "installedLocales" to mutableListOf<String>(),
+                ),
+            )
             return
         }

android/src/main/java/expo/modules/speechrecognition/ExpoSpeechRecognitionOptions.kt

+11
@@ -50,6 +50,17 @@ class SpeechRecognitionOptions : Record {

     @Field
     val iosCategory: Map<String, Any>? = null
+
+    @Field
+    val volumeChangeEventOptions: VolumeChangeEventOptions? = null
+}
+
+class VolumeChangeEventOptions : Record {
+    @Field
+    val enabled: Boolean? = false
+
+    @Field
+    val intervalMillis: Int? = null
 }

 class RecordingOptions : Record {

android/src/main/java/expo/modules/speechrecognition/ExpoSpeechService.kt

+20
@@ -50,6 +50,9 @@ class ExpoSpeechService(
     private var speech: SpeechRecognizer? = null
     private val mainHandler = Handler(Looper.getMainLooper())

+    private lateinit var options: SpeechRecognitionOptions
+    private var lastVolumeChangeEventTime: Long = 0L
+
     /** Audio recorder for persisting audio */
     private var audioRecorder: ExpoAudioRecorder? = null

@@ -108,6 +111,7 @@ class ExpoSpeechService(

     /** Starts speech recognition */
     fun start(options: SpeechRecognitionOptions) {
+        this.options = options
         mainHandler.post {
             log("Start recognition.")

@@ -119,6 +123,7 @@ class ExpoSpeechService(
             delayedFileStreamer = null
             recognitionState = RecognitionState.STARTING
             soundState = SoundState.INACTIVE
+            lastVolumeChangeEventTime = 0L
             try {
                 val intent = createSpeechIntent(options)
                 speech = createSpeechRecognizer(options)
@@ -454,6 +459,21 @@ class ExpoSpeechService(
     }

     override fun onRmsChanged(rmsdB: Float) {
+        if (options.volumeChangeEventOptions?.enabled != true) {
+            return
+        }
+
+        val intervalMs = options.volumeChangeEventOptions?.intervalMillis
+
+        if (intervalMs == null) {
+            sendEvent("volumechange", mapOf("value" to rmsdB))
+        } else {
+            val currentTime = System.currentTimeMillis()
+            if (currentTime - lastVolumeChangeEventTime >= intervalMs) {
+                sendEvent("volumechange", mapOf("value" to rmsdB))
+                lastVolumeChangeEventTime = currentTime
+            }
+        }
         /*
         val isSilent = rmsdB <= 0
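The throttling added to `onRmsChanged` can be restated outside Android. The following is a TypeScript sketch of the same logic, not code from this commit: `createVolumeThrottle` and its parameters are hypothetical names, and the clock is injected so the behavior can be checked deterministically.

```typescript
type Emit = (value: number) => void;

// Hypothetical restatement of the Kotlin logic above: forward every reading
// when no interval is set, otherwise forward at most one per `intervalMs`.
function createVolumeThrottle(
  emit: Emit,
  intervalMs: number | null,
  now: () => number = Date.now,
): Emit {
  let lastEmitTime = 0; // mirrors `lastVolumeChangeEventTime = 0L`

  return (value) => {
    if (intervalMs === null) {
      emit(value);
      return;
    }
    const currentTime = now();
    if (currentTime - lastEmitTime >= intervalMs) {
      emit(value);
      lastEmitTime = currentTime;
    }
  };
}
```

Because the last-emit timestamp starts at 0, the first reading of a session is always forwarded, which is consistent with `start()` resetting `lastVolumeChangeEventTime` to `0L`.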

example/App.tsx

+27 -2
@@ -47,6 +47,7 @@ import {
   AndroidOutputFormat,
   IOSOutputFormat,
 } from "expo-av/build/Audio";
+import { VolumeMeteringAvatar } from "./components/VolumeMeteringAvatar";

 const speechRecognitionServices = getSpeechRecognitionServices();

@@ -71,7 +72,16 @@ export default function App() {
     continuous: true,
     requiresOnDeviceRecognition: false,
     addsPunctuation: true,
-    contextualStrings: ["Carlsen", "Ian Nepomniachtchi", "Praggnanandhaa"],
+    contextualStrings: [
+      "expo-speech-recognition",
+      "Carlsen",
+      "Ian Nepomniachtchi",
+      "Praggnanandhaa",
+    ],
+    volumeChangeEventOptions: {
+      enabled: false,
+      intervalMillis: 300,
+    },
   });

   useSpeechRecognitionEvent("result", (ev) => {
@@ -140,6 +150,10 @@ export default function App() {
     <SafeAreaView style={styles.container}>
       <StatusBar style="dark" translucent={false} />

+      {settings.volumeChangeEventOptions?.enabled ? (
+        <VolumeMeteringAvatar />
+      ) : null}
+
       <View style={styles.card}>
         <Text style={styles.text}>
           {error ? JSON.stringify(error) : "Error messages go here"}
@@ -510,6 +524,17 @@ function GeneralSettings(props: {
           checked={Boolean(settings.continuous)}
           onPress={() => handleChange("continuous", !settings.continuous)}
         />
+
+        <CheckboxButton
+          title="Volume events"
+          checked={Boolean(settings.volumeChangeEventOptions?.enabled)}
+          onPress={() =>
+            handleChange("volumeChangeEventOptions", {
+              enabled: !settings.volumeChangeEventOptions?.enabled,
+              intervalMillis: settings.volumeChangeEventOptions?.intervalMillis,
+            })
+          }
+        />
       </View>

       <View style={styles.textOptionContainer}>
@@ -714,7 +739,7 @@ function AndroidSettings(props: {
               onPress={() =>
                 handleChange("androidIntentOptions", {
                   ...settings.androidIntentOptions,
-                  [key]: !settings.androidIntentOptions?.[key] ?? false,
+                  [key]: !settings.androidIntentOptions?.[key],
                 })
               }
             />
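The last hunk drops `?? false` from the toggle expression. Logical NOT always produces a boolean, so the nullish-coalescing fallback could never take effect. A small sketch of the resulting behavior, using a hypothetical `Options` type and `toggle` helper that are not part of the example app:

```typescript
// Hypothetical stand-in for the intent options object in App.tsx.
type Options = { [key: string]: boolean | undefined };

// Mirrors the updated expression: a missing or undefined entry toggles to true.
function toggle(options: Options | undefined, key: string): Options {
  return { ...options, [key]: !options?.[key] };
}
```

An unset key therefore becomes `true` on the first press and flips between `true` and `false` afterwards, while other keys are carried over by the spread.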

example/assets/avatar.png

18.8 KB

example/babel.config.js

+2 -1
@@ -1,9 +1,10 @@
 const path = require("path");
-module.exports = function (api) {
+module.exports = (api) => {
   api.cache(true);
   return {
     presets: ["babel-preset-expo"],
     plugins: [
+      "react-native-reanimated/plugin",
       [
         "module-resolver",
         {
