Commit a462605

Merge remote-tracking branch 'origin/main' into fern/update-api-specs
2 parents 67f773f + 42e3226 commit a462605

14 files changed

+502
-40
lines changed

fern/apis/api/generators.yml

Lines changed: 2 additions & 2 deletions

@@ -47,7 +47,7 @@ groups:
   java-sdk:
     generators:
       - name: fernapi/fern-java-sdk
-        version: 2.9.0
+        version: 2.10.4
         output:
           location: maven
           coordinate: dev.vapi:server-sdk
@@ -60,7 +60,7 @@ groups:
   go-sdk:
     generators:
       - name: fernapi/fern-go-sdk
-        version: 0.35.1
+        version: 0.35.2
         api:
           settings:
             unions: v1

Lines changed: 246 additions & 0 deletions
@@ -0,0 +1,246 @@
## What is Voice Input Formatted?

When interacting with voice assistants, you might notice terms like `Voice Input Formatted` in call logs or system outputs. This article explains what the term means, how the feature works, and why it matters for clear, natural voice interactions.

Voice Input Formatted is a function that takes raw text from a large language model (LLM) and cleans it up so that a text-to-speech (TTS) provider can read it more naturally. It’s **on by default** in your assistant’s voice provider settings, because it helps turn things like:

- `$42.50` → `forty two dollars and fifty cents`,
- `ST` → `STREET`,
- or phone numbers → spaced digits (“1 2 3 4 5 6 7 8 9 0”).

If you prefer the raw, unchanged text, you can **turn off** these transformations, which we’ll show you later.

### Log Example

![Screenshot 2025-01-21 at 10.23.19.png](https://img.notionusercontent.com/s3/prod-files-secure%2Ffdafdda2-774c-49e6-8896-a352ff4d44f3%2Ff603f2bd-36cf-4085-a3bc-f76c89a1ef75%2FScreenshot_2025-01-21_at_10.23.19.png/size/w=2000?exp=1737581744&sig=yoEEQF-BcTTgEVBNdcZh9MWHye2moRsbUcxGPjATNX8)

## 1. Step-by-Step Transformations

When `Voice Input Formatted` runs, it calls a series of helper functions, one after another. Each one handles a different kind of text pattern. The steps run in this order:

1. **removeAngleBracketContent**
2. **removeMarkdownSymbols**
3. **removePhrasesInAsterisks**
4. **replaceNewLinesWithPeriods**
5. **replaceColonsWithPeriods**
6. **formatAcronyms**
7. **formatDollarAmounts**
8. **formatEmails**
9. **formatDates**
10. **formatTimes**
11. **formatDistances, formatUnits, formatPercentages, formatPhoneNumbers**
12. **formatNumbers**
13. **Applying Replacements**

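The ordered chain above can be pictured as a list of string-to-string functions applied left to right. The following is a minimal sketch of that idea, not the actual implementation: `runPipeline`, `Step`, and `collapseSpaces` are hypothetical names, and the two steps shown are simplified stand-ins for the real helpers.

```typescript
// Hypothetical sketch: every step takes text and returns transformed text.
type Step = (text: string) => string;

// Stand-in steps (assumptions, not the real helpers).
const replaceColonsWithPeriods: Step = (text) => text.replace(/:(?=\s)/g, ".");
const collapseSpaces: Step = (text) => text.replace(/ {2,}/g, " ").trim();

// Apply the steps in order, feeding each step's output into the next one.
function runPipeline(text: string, steps: Step[]): string {
  return steps.reduce((current, step) => step(current), text);
}

const formatted = runPipeline("price:  $42.50", [replaceColonsWithPeriods, collapseSpaces]);
console.log(formatted); // "price. $42.50"
```

Because each step sees the output of the previous one, the order of the list matters: a later step can only match what earlier steps have left in the text.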
We’ll walk through each step using a short example.

### 1.1 Example Input

```
Hello <tag> world
**Wanted** to say *hi*
We have NASA and .NET here,
call me at 123-456-7890,
price: $42.50
and the date is 2023 05 10
and time is 14:00
Distance is 5km
We might see 9999
the address is 320 ST 21 RD
my email is JOHN.DOE@example.COM
```

### 1.2 removeAngleBracketContent

- **What it does**: Removes `<anything>` unless it’s `<break>`, `<spell>`, or wrapped in double angle brackets `<< >>`.
- **Example effect**: `<tag>` gets removed.

**Result so far**:

```
Hello world
**Wanted** to say *hi*
We have NASA and .NET here,
call me at 123-456-7890,
price: $42.50
and the date is 2023 05 10
and time is 14:00
Distance is 5km
We might see 9999
the address is 320 ST 21 RD
my email is JOHN.DOE@example.COM
```

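As a rough illustration of the angle-bracket rule, here is a hypothetical re-implementation using one regular expression. The exact matching rules of the real helper are not shown in this article, so the handling of attributes and closing tags below is an assumption.

```typescript
// Hypothetical sketch of the rule described above, not the real helper.
function removeAngleBracketContent(text: string): string {
  return text.replace(/<<[^>]*>>|<[^<>]*>/g, (match) => {
    if (match.indexOf("<<") === 0) return match;          // keep << ... >>
    if (/^<\/?(break|spell)\b/.test(match)) return match; // keep <break> and <spell>
    return "";                                            // drop everything else
  });
}

console.log(removeAngleBracketContent("Hello <tag> world")); // "Hello  world" (two spaces remain)
```

In this sketch the space on each side of the removed tag is left in place, so whitespace cleanup would have to happen elsewhere.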
### 1.3 removeMarkdownSymbols

- **What it does**: Removes `_`, `` ` ``, and `~`. Some versions may also remove double asterisks, although that can instead happen in the next step.

In this example, `**Wanted**` remains if this step strictly removes only underscores, backticks, and tildes. If the code also removes `**`, the asterisks vanish here rather than in the next step. Let’s assume this step leaves them alone.

**Result**: No real change, assuming the code only targets `_`, `` ` ``, and `~`.

```
Hello world
**Wanted** to say *hi*
...
```

### 1.4 removePhrasesInAsterisks

- **What it does**: Looks for `*some text*` or `**some text**` and cuts it out, asterisks and enclosed text together.

In our text, we have `**Wanted**` and `*hi*`. Both get removed if the function is broad enough to cover single- and double-asterisk blocks.

**Result**:

```
Hello world
to say
We have NASA and .NET here,
call me at 123-456-7890,
price: $42.50
and the date is 2023 05 10
and time is 14:00
Distance is 5km
We might see 9999
the address is 320 ST 21 RD
my email is JOHN.DOE@example.COM
```

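A minimal sketch of the asterisk step, assuming it covers both single- and double-asterisk spans as in the walkthrough. The function body is a hypothetical reconstruction; only the name comes from the article.

```typescript
// Hypothetical sketch: remove *text* and **text**, including the enclosed words.
function removePhrasesInAsterisks(text: string): string {
  return text
    .replace(/\*\*[^*]+\*\*/g, "") // double-asterisk spans first
    .replace(/\*[^*]+\*/g, "");    // then single-asterisk spans
}

console.log(removePhrasesInAsterisks("**Wanted** to say *hi*")); // " to say "
```

Handling the double-asterisk form first avoids leaving stray `*` characters behind, which could happen if the single-asterisk pattern ran on `**Wanted**` directly.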
### 1.5 replaceNewLinesWithPeriods

- **What it does**: Replaces line breaks with periods and merges repeated periods.

Our text spans several lines. After this step it reads as a single line (or fewer lines), with a period where each line break was.

**Result** (roughly):

```
Hello world . to say . We have NASA and .NET here, call me at 123-456-7890, price: $42.50 and the date is 2023 05 10 and time is 14:00 Distance is 5km We might see 9999 the address is 320 ST 21 RD my email is JOHN.DOE@example.COM
```

### 1.6 replaceColonsWithPeriods

- **What it does**: `:` → `.`

Our text has `price: $42.50`. That becomes `price. $42.50`.

**Result**:

```
Hello world . to say . We have NASA and .NET here, call me at 123-456-7890, price. $42.50 ...
```

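The two punctuation steps can be sketched as follows. Both bodies are hypothetical. In particular, the colon rule here only touches a colon followed by whitespace, which is an assumption made so that a time such as `14:00` survives for the later `formatTimes` step; the article does not say how the real helper treats that case.

```typescript
// Hypothetical sketch: line breaks become " . ", then runs of periods are merged.
function replaceNewLinesWithPeriods(text: string): string {
  return text
    .replace(/\s*\n+\s*/g, " . ")  // each run of line breaks becomes one period
    .replace(/(\.\s*){2,}/g, ". ") // merge repeated periods
    .trim();
}

// Hypothetical sketch: only a colon followed by whitespace is replaced (assumption).
function replaceColonsWithPeriods(text: string): string {
  return text.replace(/:(?=\s)/g, ".");
}

console.log(replaceNewLinesWithPeriods("Hello world\nto say")); // "Hello world . to say"
console.log(replaceColonsWithPeriods("price: $42.50 at 14:00")); // "price. $42.50 at 14:00"
```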
### 1.7 formatAcronyms

- **What it does**:
  - If a token is in a known “to-lower” list (like `NASA` or `.NET`), it becomes lowercase (`nasa`, `.net`).
  - If it’s all-caps but not in that list, it might be read as spaced letters. If it contains vowels, it’s left alone.

In the example:

- `NASA` → `nasa`
- `.NET` → `.net`

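One way to read the acronym rules is sketched below. The contents of the to-lower list, the token pattern, and the vowel check are all assumptions; the article only names `NASA` and `.NET` as list members.

```typescript
// Hypothetical to-lower list; the real list is not shown in the article.
const TO_LOWER: string[] = ["NASA", ".NET"];

// Hypothetical sketch of the three rules described above.
function formatAcronyms(text: string): string {
  return text.replace(/\.?\b[A-Z]{2,}\b/g, (token) => {
    if (TO_LOWER.indexOf(token) !== -1) return token.toLowerCase(); // known: lowercase
    const dot = token.charAt(0) === "." ? "." : "";
    const word = token.slice(dot.length);
    if (TO_LOWER.indexOf(word) !== -1) return dot + word.toLowerCase();
    if (/[AEIOU]/.test(word)) return token;  // contains vowels: left alone
    return dot + word.split("").join(" ");   // otherwise: spaced letters
  });
}

console.log(formatAcronyms("We have NASA and .NET here")); // "We have nasa and .net here"
console.log(formatAcronyms("call the FBI"));               // "call the F B I"
```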
### 1.8 formatDollarAmounts

- **What it does**: `$42.50` → “forty two dollars and fifty cents.”

### 1.9 formatEmails

- **What it does**: Replaces `@` with “ at ” and `.` with “ dot ” in emails.
- `JOHN.DOE@example.COM` → `JOHN dot DOE at example dot COM`

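The email rule can be sketched by first locating email-shaped tokens and then rewriting only inside them, so that other periods (for example in `$42.50`) are untouched. The email pattern below is a deliberately simple assumption, not the one the real helper uses.

```typescript
// Hypothetical sketch: rewrite "@" and "." only inside email-shaped tokens.
function formatEmails(text: string): string {
  const emailPattern = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
  return text.replace(emailPattern, (email) =>
    email.replace(/@/g, " at ").replace(/\./g, " dot ")
  );
}

console.log(formatEmails("my email is JOHN.DOE@example.COM"));
// "my email is JOHN dot DOE at example dot COM"
```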
### 1.10 formatDates

- **What it does**: `YYYY MM DD` → e.g. “Wednesday, May 10, 2023” (if the date is valid).
- `2023 05 10` becomes “Wednesday, May 10, 2023” (the day name depends on how the code calculates it).

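A possible shape for the date step is sketched below, assuming the built-in `Date` and `toLocaleDateString` with the `en-US` locale produce the wording shown above. Whether the real helper uses these APIs is unknown, and the output format depends on the runtime's locale data.

```typescript
// Hypothetical sketch: "YYYY MM DD" becomes a spoken-style date when it is a real calendar date.
function formatDates(text: string): string {
  return text.replace(/\b(\d{4}) (\d{2}) (\d{2})\b/g, (match, y: string, m: string, d: string) => {
    const year = Number(y);
    const month = Number(m);
    const day = Number(d);
    const date = new Date(Date.UTC(year, month - 1, day));
    // Date rolls invalid input over (month 13 becomes January), so check the round trip.
    const valid =
      date.getUTCFullYear() === year &&
      date.getUTCMonth() === month - 1 &&
      date.getUTCDate() === day;
    if (!valid) return match;
    return date.toLocaleDateString("en-US", {
      weekday: "long", year: "numeric", month: "long", day: "numeric", timeZone: "UTC",
    });
  });
}

console.log(formatDates("the date is 2023 05 10")); // "the date is Wednesday, May 10, 2023"
```

Fixing the time zone to UTC keeps the weekday from shifting with the machine's local time zone.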
### 1.11 formatTimes

- **What it does**: `14:00` → `14` (since the minutes are “00,” it removes them).
- If it were `14:30`, it might become `14 30`.

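The time rule is small enough to sketch in a single replacement. This is a hypothetical version that covers only the `HH:MM` shape used in the example.

```typescript
// Hypothetical sketch: drop ":00", otherwise replace the colon with a space.
function formatTimes(text: string): string {
  return text.replace(/\b(\d{1,2}):(\d{2})\b/g, (_match, hours: string, minutes: string) =>
    minutes === "00" ? hours : hours + " " + minutes
  );
}

console.log(formatTimes("time is 14:00")); // "time is 14"
console.log(formatTimes("time is 14:30")); // "time is 14 30"
```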
### 1.12 formatDistances, formatUnits, formatPercentages, formatPhoneNumbers

- **Distances**: `5km` → “5 kilometers.”
- **Units**: e.g. `43 lb` → “forty three pounds.”
- **Percentages**: `50%` → “50 percent.”
- **PhoneNumbers**: `123-456-7890` → `1 2 3 4 5 6 7 8 9 0`.

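Three of these helpers can be sketched with one regular expression each. The patterns are assumptions chosen to reproduce the examples above: the phone pattern covers only the `NNN-NNN-NNNN` shape, and the distance sketch handles only `km` without singular/plural handling.

```typescript
// Hypothetical sketches; the real helpers likely cover more formats.
function formatDistances(text: string): string {
  return text.replace(/\b(\d+(?:\.\d+)?)\s?km\b/g, "$1 kilometers");
}

function formatPercentages(text: string): string {
  return text.replace(/(\d+(?:\.\d+)?)%/g, "$1 percent");
}

function formatPhoneNumbers(text: string): string {
  return text.replace(/\b\d{3}-\d{3}-\d{4}\b/g, (phone) =>
    phone.replace(/\D/g, "").split("").join(" ") // keep digits only, then space them out
  );
}

console.log(formatDistances("Distance is 5km"));            // "Distance is 5 kilometers"
console.log(formatPercentages("50% off"));                  // "50 percent off"
console.log(formatPhoneNumbers("call me at 123-456-7890")); // "call me at 1 2 3 4 5 6 7 8 9 0"
```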
### 1.13 formatNumbers

- **What it does**:
  - Skips year-like numbers if they’re below the current year (2025).
  - For large numbers above a cutoff (e.g. 1000 or 5000), it reads them as digits.
  - Negative numbers: `-9` → “minus nine.”
  - Decimals: `2.5` → “two point five.”

In our case, `9999` might be spelled out (nine thousand nine hundred ninety nine) or read as digits, depending on the cutoff.

`2023` appears together with `05 10`, so it’s handled by the date logic, not the plain number logic.

### 1.14 Applying Replacements (street-suffix expansions)

- **Runs last**. If you have user-defined replacements like `\bST\b` → `STREET` and `\bRD\b` → `ROAD`, they are applied after all the other steps.
- So `320 ST 21 RD` → `320 STREET 21 ROAD`.

**End Result**: A single line of text with all the expansions and transformations applied.

## 2. Formatting Plan: Customization Options

The **Formatting Plan** governs how Voice Input Formatted works. Here are the main settings you can customize:

### 2.1 Enabled

Determines whether the formatting is applied.

- **Default**: `true`
- To disable: Set `voice.chunkPlan.formatPlan.enabled = false`.

### 2.2 Number-to-Digits Cutoff

This decides when numbers are read as digits instead of words.

- **Default**: `2025` (current year).
- The code generally **doesn’t** convert year-like numbers below the current year (`2025`) into spelled-out words, so an obvious year stays as digits.
- If a number is bigger than the cutoff (`numberToDigitsCutoff`), it is read out as digits.
- Negative numbers become “minus,” decimals get “point,” etc.
- Example: With a cutoff of `2025`, numbers like `12345` will remain digits.
- To ensure larger numbers are spelled out, set the cutoff higher, like `300000`. For example:
  - `30003` → “thirty thousand and three” (with a cutoff of `300000`).

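The cutoff behaviour can be illustrated with a small sketch. Everything here is an assumption written to reproduce the examples above: `speakNumber` and `integerToWords` are hypothetical names, the word form covers whole parts below one million only, the placement of “and” is guessed from the two spelled-out examples in this article, and the year-skipping rule is not modelled.

```typescript
const ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
  "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen", "seventeen",
  "eighteen", "nineteen"];
const TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"];

// Hypothetical helper: spell out a non-negative integer below 1,000,000.
function integerToWords(n: number): string {
  if (n < 20) return ONES[n];
  if (n < 100) {
    const rest = n % 10;
    return TENS[Math.floor(n / 10)] + (rest ? " " + ONES[rest] : "");
  }
  if (n < 1000) {
    const rest = n % 100;
    return ONES[Math.floor(n / 100)] + " hundred" + (rest ? " " + integerToWords(rest) : "");
  }
  const rest = n % 1000;
  const joiner = rest > 0 && rest < 100 ? " and " : " "; // guessed from "thirty thousand and three"
  return integerToWords(Math.floor(n / 1000)) + " thousand" + (rest ? joiner + integerToWords(rest) : "");
}

// Hypothetical sketch: above the cutoff the digits are kept, otherwise the number is spelled out.
function speakNumber(raw: string, cutoff: number): string {
  const value = Number(raw);
  if (Math.abs(value) > cutoff) return raw;
  const sign = value < 0 ? "minus " : "";
  const parts = raw.replace("-", "").split(".");
  const whole = integerToWords(parseInt(parts[0], 10));
  const fraction = parts.length > 1
    ? " point " + parts[1].split("").map((d) => ONES[Number(d)]).join(" ")
    : "";
  return sign + whole + fraction;
}

console.log(speakNumber("12345", 2025));   // "12345"
console.log(speakNumber("30003", 300000)); // "thirty thousand and three"
console.log(speakNumber("-9", 2025));      // "minus nine"
console.log(speakNumber("2.5", 2025));     // "two point five"
```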
### 2.3 Replacements

Allows exact or regex-based substitutions in text.

- **Example 1**: Replace `hello` with `hi`: `{ type: 'exact', key: 'hello', value: 'hi' }`.
- **Example 2**: Replace words matching a pattern: `{ type: 'regex', regex: '\\b[a-zA-Z]{5}\\b', value: 'hi' }`.

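To show how the two replacement shapes could be applied, here is a hypothetical sketch. The object fields (`type`, `key`, `regex`, `value`) come from the examples above; `applyReplacements`, `escapeRegExp`, global matching, and case sensitivity are assumptions.

```typescript
// Shapes taken from the two examples above; everything else is hypothetical.
type Replacement =
  | { type: "exact"; key: string; value: string }
  | { type: "regex"; regex: string; value: string };

// Escape regex metacharacters so an exact key is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Apply each replacement in order to the whole text.
function applyReplacements(text: string, replacements: Replacement[]): string {
  return replacements.reduce((current, r) => {
    const pattern = r.type === "exact" ? escapeRegExp(r.key) : r.regex;
    return current.replace(new RegExp(pattern, "g"), () => r.value);
  }, text);
}

const address = applyReplacements("320 ST 21 RD", [
  { type: "regex", regex: "\\bST\\b", value: "STREET" },
  { type: "regex", regex: "\\bRD\\b", value: "ROAD" },
]);
console.log(address); // "320 STREET 21 ROAD"
```

Note the doubled backslash in the string literals: inside a JavaScript or JSON string, `"\\b"` is what produces the regex word boundary `\b`.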
### Note

Currently, only **replacements** and the **number-to-digits cutoff** are exposed for customization. Other options, such as toggling acronym replacement, cannot be configured.

## 3. How to Turn It Off

By default, the entire pipeline is **on** because it helps TTS read better. To **turn it off**, set:

```
voice.chunkPlan.enabled = false;
// or
voice.chunkPlan.formatPlan.enabled = false;
```

If either flag is `false`, `Voice Input Formatted` is **skipped**.

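The on/off rule can be expressed as a small predicate. The property paths match the ones named above; the interfaces and the `shouldFormat` function are hypothetical and model only the two `enabled` flags.

```typescript
// Hypothetical types: only the two flags named in this article are modelled.
interface FormatPlan { enabled?: boolean }
interface ChunkPlan { enabled?: boolean; formatPlan?: FormatPlan }
interface VoiceSettings { chunkPlan?: ChunkPlan }

// Formatting runs unless either flag is explicitly false (both default to on).
function shouldFormat(voice: VoiceSettings): boolean {
  const chunkPlan = voice.chunkPlan;
  if (chunkPlan === undefined) return true;
  if (chunkPlan.enabled === false) return false;
  return !(chunkPlan.formatPlan !== undefined && chunkPlan.formatPlan.enabled === false);
}

const rawOutputVoice: VoiceSettings = { chunkPlan: { formatPlan: { enabled: false } } };
console.log(shouldFormat(rawOutputVoice)); // false
console.log(shouldFormat({}));             // true
```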
## 4. Conclusion

- `Voice Input Formatted` orchestrates a chain of small functions that together fix punctuation, expand abbreviations, and make text easier to read aloud.
- You can keep it **on** for better TTS results or turn it **off** if you need the raw LLM output.
- The user-supplied replacements (like street expansions) happen **last**, so keep in mind that they operate on text the earlier steps have already transformed.
