| 1 | +## What is Voice Input Formatted? |
| 2 | + |
| 3 | +When interacting with voice assistants, you might notice terms like `Voice Input Formatted` in call logs or system outputs. This article explains what this means, how it works, and why it's important for delivering clear and natural voice interactions. |
| 4 | + |
| 5 | +Voice Input Formatted is a function that takes raw text from a large language model (LLM) and cleans it up so the text-to-speech (TTS) provider can read it more naturally. It’s **on by default** in your assistant’s voice provider settings because it converts text such as: |
| 6 | + |
| 7 | +- `$42.50` → `forty two dollars and fifty cents` |
| 8 | +- `ST` → `STREET` |
| 9 | +- phone numbers → spaced digits (“1 2 3 4 5 6 7 8 9 0”) |
| 10 | + |
| 11 | +If you prefer the raw, unchanged text, you can **turn off** these transformations, which we’ll show you later. |
| 12 | + |
| 13 | +### Log Example |
| 14 | + |
| 15 | + |
| 16 | + |
| 17 | +## 1. Step-by-Step Transformations |
| 18 | + |
| 19 | +When `Voice Input Formatted` runs, it calls a series of helper functions in sequence. Each one targets a different kind of text pattern. They run in this order: |
| 20 | + |
| 21 | +1. **removeAngleBracketContent** |
| 22 | +2. **removeMarkdownSymbols** |
| 23 | +3. **removePhrasesInAsterisks** |
| 24 | +4. **replaceNewLinesWithPeriods** |
| 25 | +5. **replaceColonsWithPeriods** |
| 26 | +6. **formatAcronyms** |
| 27 | +7. **formatDollarAmounts** |
| 28 | +8. **formatEmails** |
| 29 | +9. **formatDates** |
| 30 | +10. **formatTimes** |
| 31 | +11. **formatDistances, formatUnits, formatPercentages, formatPhoneNumbers** |
| 32 | +12. **formatNumbers** |
| 33 | +13. **Applying Replacements** |
| 34 | + |
| 35 | +We’ll walk through each step using a **short example**. |
| 36 | + |
| 37 | +### 1.1 Our Simpler Example Input |
| 38 | + |
| 39 | +``` |
| 40 | +Hello <tag> world |
| 41 | +**Wanted** to say *hi* |
| 42 | +We have NASA and .NET here, |
| 43 | +call me at 123-456-7890, |
| 44 | +price: $42.50 |
| 45 | +and the date is 2023 05 10 |
| 46 | +and time is 14:00 |
| 47 | +Distance is 5km |
| 48 | +We might see 9999 |
| 49 | +the address is 320 ST 21 RD |
| 50 | +my email is JOHN.DOE@example.COM |
| 51 | + |
| 52 | +``` |
| 53 | + |
| 54 | +### 1.2 removeAngleBracketContent |
| 55 | + |
| 56 | +- **What it does**: Removes `<anything>` unless it’s `<break>`, `<spell>`, or double angle brackets `<< >>`. |
| 57 | +- **Example effect**: `<tag>` gets removed. |
| 58 | + |
| 59 | +**Result so far**: |
| 60 | + |
| 61 | +``` |
| 62 | +Hello world |
| 63 | +**Wanted** to say *hi* |
| 64 | +We have NASA and .NET here, |
| 65 | +call me at 123-456-7890, |
| 66 | +price: $42.50 |
| 67 | +and the date is 2023 05 10 |
| 68 | +and time is 14:00 |
| 69 | +Distance is 5km |
| 70 | +We might see 9999 |
| 71 | +the address is 320 ST 21 RD |
| 72 | +my email is JOHN.DOE@example.COM |
| 73 | + |
| 74 | +``` |
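A minimal TypeScript sketch of this step, assuming a regex-based approach. The function body is hypothetical and only mirrors the rule described above; the actual implementation may differ.

```typescript
// Hypothetical sketch: drop <...> spans, but keep <break>, <spell>, and << >>.
function removeAngleBracketContent(text: string): string {
  return text.replace(/<<[^>]*>>|<[^<>]*>/g, (match: string): string => {
    if (match.slice(0, 2) === "<<") return match;            // keep << >>
    if (/^<\/?(?:break|spell)\b/.test(match)) return match;  // keep <break>, <spell>
    return "";                                               // remove everything else
  });
}
```

Note that this sketch leaves the whitespace around a removed tag untouched, so `Hello <tag> world` keeps both surrounding spaces.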
| 75 | + |
| 76 | +### 1.3 removeMarkdownSymbols |
| 77 | + |
| 78 | +- **What it does**: Removes `_`, `` ` ``, and `~`. Some versions may also remove double asterisks, though that may instead happen in the next function. |
| 79 | + |
| 80 | +In this example, there’s `**Wanted**`, which _might_ remain if we strictly only remove `_`, backticks, and tildes. If the code does remove `**` as well, it’ll vanish here or in the next step. Let’s assume it doesn’t remove them in this step. |
| 81 | + |
| 82 | +**Result**: _No real change if the code only targets `_`, `` ` ``, and `~`._ |
| 83 | + |
| 84 | +``` |
| 85 | +Hello world |
| 86 | +**Wanted** to say *hi* |
| 87 | +... |
| 88 | + |
| 89 | +``` |
| 90 | + |
| 91 | +### 1.4 removePhrasesInAsterisks |
| 92 | + |
| 93 | +- **What it does**: Looks for `*some text*` or `**some text**` and cuts it out. |
| 94 | + |
| 95 | +In our text, we have `**Wanted**` and `*hi*`. Both get removed if the function is broad enough to remove single and double-asterisk blocks. |
| 96 | + |
| 97 | +**Result**: |
| 98 | + |
| 99 | +``` |
| 100 | +Hello world |
| 101 | + to say |
| 102 | +We have NASA and .NET here, |
| 103 | +call me at 123-456-7890, |
| 104 | +price: $42.50 |
| 105 | +and the date is 2023 05 10 |
| 106 | +and time is 14:00 |
| 107 | +Distance is 5km |
| 108 | +We might see 9999 |
| 109 | +the address is 320 ST 21 RD |
| 110 | +my email is JOHN.DOE@example.COM |
| 111 | + |
| 112 | +``` |
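As a rough illustration, the removal could look like the TypeScript below. This is a sketch that assumes the broad behavior described above (both single- and double-asterisk blocks are cut); the regexes are illustrative, not taken from the real code.

```typescript
// Hypothetical sketch: cut **double** spans first, then *single* spans.
function removePhrasesInAsterisks(text: string): string {
  return text
    .replace(/\*\*[^*]+\*\*/g, "")
    .replace(/\*[^*]+\*/g, "");
}
```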
| 113 | + |
| 114 | +### 1.5 replaceNewLinesWithPeriods |
| 115 | + |
| 116 | +- **What it does**: Turns each line break into a period (`.`) and merges repeated periods. |
| 117 | + |
| 118 | +Let’s say the above text has line breaks. After this step, it’s more of a single line (or fewer lines), each newline replaced by a period. |
| 119 | + |
| 120 | +**Result** (roughly): |
| 121 | + |
| 122 | +``` |
| 123 | +Hello world . to say . We have NASA and .NET here, call me at 123-456-7890, price: $42.50 and the date is 2023 05 10 and time is 14:00 Distance is 5km We might see 9999 the address is 320 ST 21 RD my email is JOHN.DOE@example.COM |
| 124 | + |
| 125 | +``` |
| 126 | + |
| 127 | +### 1.6 replaceColonsWithPeriods |
| 128 | + |
| 129 | +- **What it does**: `:` → `.` |
| 130 | + |
| 131 | +Our text has `price: $42.50`. That becomes `price. $42.50`. |
| 132 | + |
| 133 | +**Result**: |
| 134 | + |
| 135 | +``` |
| 136 | +Hello world . to say . We have NASA and .NET here, call me at 123-456-7890, price. $42.50 ... |
| 137 | + |
| 138 | +``` |
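The two punctuation steps above can be sketched in TypeScript as follows. Two details are assumptions, not confirmed behavior: each run of line breaks becomes ` . `, and a colon sitting directly before a digit (as in `14:00`) is left alone so that the later time step can still see it.

```typescript
// Hypothetical sketch of the newline step.
function replaceNewLinesWithPeriods(text: string): string {
  return text
    .replace(/\s*\n+\s*/g, " . ")          // each run of line breaks becomes " . "
    .replace(/(\s*\.\s*){2,}/g, " . ");    // merge repeated periods into one
}

// Hypothetical sketch of the colon step; colons before a digit are kept (assumption).
function replaceColonsWithPeriods(text: string): string {
  return text.replace(/:(?!\d)/g, ".");
}
```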
| 139 | + |
| 140 | +### 1.7 formatAcronyms |
| 141 | + |
| 142 | +- **What it does**: |
| 143 | + - If something is in a known “to-lower” list (like `NASA`, `.NET`), it becomes lowercase (`nasa`, `.net`). |
| 144 | + - If it’s all-caps but not in that list, its letters might be spaced out so they are read one by one. If it contains vowels, it’s left alone. |
| 145 | + |
| 146 | +In the example: |
| 147 | + |
| 148 | +- `NASA` → `nasa` |
| 149 | +- `.NET` → `.net` |
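A TypeScript sketch of these rules, under stated assumptions: `TO_LOWER` is a made-up stand-in for the known “to-lower” list, and an acronym is taken to be two or more capital letters with an optional leading dot.

```typescript
// Hypothetical "to-lower" list; the real list is not shown in this article.
const TO_LOWER: string[] = ["NASA", ".NET"];

function formatAcronyms(text: string): string {
  return text.replace(/\.?\b[A-Z]{2,}\b/g, (word: string): string => {
    if (TO_LOWER.indexOf(word) !== -1) return word.toLowerCase(); // known: lowercase it
    if (/[AEIOU]/.test(word)) return word;                        // has a vowel: leave alone
    return word.split("").join(" ");                              // otherwise: space the letters
  });
}
```

Under these assumptions a two-letter token such as `ST` would also be spaced out, which would stop a later `\bST\b` replacement from matching; the article does not say how the real function avoids this.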
| 150 | + |
| 151 | +### 1.8 formatDollarAmounts |
| 152 | + |
| 153 | +- **What it does**: `$42.50` → “forty two dollars and fifty cents.” |
| 154 | + |
| 155 | +### 1.9 formatEmails |
| 156 | + |
| 157 | +- **What it does**: Replaces `@` with “ at ” and `.` with “ dot ” in emails. |
| 158 | +- `JOHN.DOE@example.COM` → `JOHN dot DOE at example dot COM` |
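A small TypeScript sketch of the email step. The email-matching regex is a simplification chosen for illustration, not the pattern the real code uses.

```typescript
// Hypothetical sketch: find email-like tokens, then spell out "@" and ".".
function formatEmails(text: string): string {
  return text.replace(/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g, (email: string): string =>
    email.replace(/@/g, " at ").replace(/\./g, " dot ")
  );
}
```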
| 159 | + |
| 160 | +### 1.10 formatDates |
| 161 | + |
| 162 | +- **What it does**: `YYYY MM DD` → e.g. “Wednesday, May 10, 2023” (if valid). |
| 163 | +- `2023 05 10` becomes “Wednesday, May 10, 2023” (day name depends on how the code calculates it). |
| 164 | + |
| 165 | +### 1.11 formatTimes |
| 166 | + |
| 167 | +- **What it does**: `14:00` → `14` (since the minutes are “00,” it removes them). |
| 168 | +- If it was `14:30`, it might become `14 30`. |
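The date and time steps can be sketched together in TypeScript. This assumes the `YYYY MM DD` pattern shown above and computes the weekday in UTC; the name tables and the validity check are illustrative choices, not the real code.

```typescript
const DAYS: string[] = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"];
const MONTHS: string[] = ["January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December"];

// Hypothetical sketch: "2023 05 10" -> "Wednesday, May 10, 2023" when the date is valid.
function formatDates(text: string): string {
  return text.replace(/\b(\d{4}) (\d{2}) (\d{2})\b/g,
    (match: string, y: string, m: string, d: string): string => {
      const year = Number(y);
      const month = Number(m);
      const day = Number(d);
      const date = new Date(Date.UTC(year, month - 1, day));
      // Date rolls impossible dates over (Feb 31 -> Mar 3), so reject those.
      if (date.getUTCMonth() !== month - 1 || date.getUTCDate() !== day) return match;
      return DAYS[date.getUTCDay()] + ", " + MONTHS[month - 1] + " " + day + ", " + year;
    });
}

// Hypothetical sketch: drop ":00", otherwise replace the colon with a space.
function formatTimes(text: string): string {
  return text.replace(/\b(\d{1,2}):(\d{2})\b/g,
    (_match: string, hours: string, minutes: string): string =>
      minutes === "00" ? hours : hours + " " + minutes);
}
```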
| 169 | + |
| 170 | +### 1.12 formatDistances, formatUnits, formatPercentages, formatPhoneNumbers |
| 171 | + |
| 172 | +- **Distances**: `5km` → “5 kilometers.” |
| 173 | +- **Units**: e.g. `43 lb` → “forty three pounds.” |
| 174 | +- **Percentages**: `50%` → “50 percent.” |
| 175 | +- **PhoneNumbers**: `123-456-7890` → `1 2 3 4 5 6 7 8 9 0`. |
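Three of these four helpers are simple enough to sketch in TypeScript. The patterns below are assumptions for illustration (a `NNN-NNN-NNNN` phone format, only the `km` unit); `formatUnits` is left out because it also needs number-to-word conversion.

```typescript
// Hypothetical sketch: "123-456-7890" -> "1 2 3 4 5 6 7 8 9 0".
function formatPhoneNumbers(text: string): string {
  return text.replace(/\b\d{3}-\d{3}-\d{4}\b/g, (phone: string): string =>
    phone.replace(/-/g, "").split("").join(" "));
}

// Hypothetical sketch: "50%" -> "50 percent".
function formatPercentages(text: string): string {
  return text.replace(/(\d+(?:\.\d+)?)%/g, "$1 percent");
}

// Hypothetical sketch: "5km" -> "5 kilometers" (only km is handled here).
function formatDistances(text: string): string {
  return text.replace(/\b(\d+(?:\.\d+)?)\s?km\b/g, "$1 kilometers");
}
```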
| 176 | + |
| 177 | +### 1.13 formatNumbers |
| 178 | + |
| 179 | +- **What it does**: |
| 180 | + - Skips year-like numbers if they’re below the current year (2025). |
| 181 | + - Numbers above a cutoff (e.g. 1000 or 5000) are read as digits. |
| 182 | + - Negative numbers: `-9` → “minus nine.” |
| 183 | + - Decimals: `2.5` → “two point five.” |
| 184 | + |
| 185 | +In our case, `9999` might be big enough to become spelled out (nine thousand nine hundred ninety nine) or digits spaced out, depending on the cutoff. |
| 186 | + |
| 187 | +`2023` used with `05 10` might get turned into a date, so it’s handled by the date logic, not the plain number logic. |
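To make the cutoff behavior concrete, here is a TypeScript sketch. It rests on several assumptions: numbers above the cutoff are left as plain digits (the article is ambiguous on whether they are also spaced out), the year-skipping rule is not modeled, `toWords` only aims to cover values below one million, and no “and” is inserted between groups.

```typescript
const ONES: string[] = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
  "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
  "seventeen", "eighteen", "nineteen"];
const TENS: string[] = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
  "eighty", "ninety"];

// Spell out a non-negative integer (intended for values below one million).
function toWords(n: number): string {
  if (n < 20) return ONES[n];
  if (n < 100) {
    return TENS[Math.floor(n / 10)] + (n % 10 ? " " + ONES[n % 10] : "");
  }
  if (n < 1000) {
    return ONES[Math.floor(n / 100)] + " hundred" + (n % 100 ? " " + toWords(n % 100) : "");
  }
  return toWords(Math.floor(n / 1000)) + " thousand" + (n % 1000 ? " " + toWords(n % 1000) : "");
}

// Hypothetical sketch of formatNumbers with a digits cutoff.
function formatNumbers(text: string, cutoff: number): string {
  return text.replace(/-?\d+(?:\.\d+)?/g, (token: string): string => {
    const negative = token.charAt(0) === "-";
    const parts = (negative ? token.slice(1) : token).split(".");
    const whole = Number(parts[0]);
    if (whole > cutoff) return token;        // above the cutoff: keep the digits (assumption)
    let spoken = toWords(whole);
    if (parts.length > 1) {                  // decimals are read digit by digit after "point"
      spoken += " point " + parts[1].split("").map((d: string): string => ONES[Number(d)]).join(" ");
    }
    return (negative ? "minus " : "") + spoken;
  });
}
```

With a cutoff of `2025`, `9999` stays as digits in this sketch; with a cutoff of `300000` it is spelled out.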
| 188 | + |
| 189 | +### 1.14 Applying Replacements (street-suffix expansions) |
| 190 | + |
| 191 | +- **Runs last**. If you have user-defined replacements like `\bST\b` → `STREET`, `\bRD\b` → `ROAD`, it changes them after all the other steps. |
| 192 | +- So `320 ST 21 RD` → `320 STREET 21 ROAD`. |
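A TypeScript sketch of how exact and regex replacements might be applied in order. The `Replacement` type and `applyReplacements` function are hypothetical names; only the `type`, `key`, `regex`, and `value` fields come from the replacement examples in this article.

```typescript
type Replacement =
  | { type: "exact"; key: string; value: string }
  | { type: "regex"; regex: string; value: string };

// Escape regex metacharacters so an exact key is matched literally.
function escapeRegExp(source: string): string {
  return source.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical sketch: apply each rule in turn to the whole text.
function applyReplacements(text: string, rules: Replacement[]): string {
  let out = text;
  for (const rule of rules) {
    const pattern = rule.type === "exact"
      ? new RegExp(escapeRegExp(rule.key), "g")
      : new RegExp(rule.regex, "g");
    out = out.replace(pattern, rule.value);
  }
  return out;
}

const streetRules: Replacement[] = [
  { type: "regex", regex: "\\bST\\b", value: "STREET" },
  { type: "regex", regex: "\\bRD\\b", value: "ROAD" },
];
```

Calling `applyReplacements("the address is 320 ST 21 RD", streetRules)` yields `the address is 320 STREET 21 ROAD`.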
| 193 | + |
| 194 | +**End Result**: A single line of text with all the helpful expansions and transformations done. |
| 195 | + |
| 196 | +## 2. Formatting Plan: Customization Options |
| 197 | + |
| 198 | +The **Formatting Plan** governs how Voice Input Formatted works. Here are the main settings you can customize: |
| 199 | + |
| 200 | +### 2.1 Enabled |
| 201 | + |
| 202 | +Determines whether the formatting is applied. |
| 203 | + |
| 204 | +- **Default**: `true` |
| 205 | +- To disable: Set `voice.chunkPlan.formatPlan.enabled = false`. |
| 206 | + |
| 207 | +### 2.2 Number-to-Digits Cutoff |
| 208 | + |
| 209 | +This decides when numbers are read as digits instead of words. |
| 210 | + |
| 211 | +- **Default**: `2025` (current year). |
| 212 | +- The code generally **doesn’t** convert year-like numbers below the current year (`2025`) into spelled-out words, so an obvious year stays as digits. |
| 213 | +- If a number is bigger than the cutoff (`numberToDigitsCutoff`), it is read out as digits. |
| 214 | +- Negative numbers become “minus,” decimals get “point,” etc. |
| 215 | +- Example: With a cutoff of `2025`, numbers like `12345` will remain digits. |
| 216 | +- To ensure larger numbers are spelled out, set the cutoff higher, like `300000`. For example: |
| 217 | + - `30003` → “thirty thousand and three” (with a cutoff of `300000`). |
| 218 | + |
| 219 | +### 2.3 Replacements |
| 220 | + |
| 221 | +Allows exact or regex-based substitutions in text. |
| 222 | + |
| 223 | +- **Example 1**: Replace `hello` with `hi`: `{ type: 'exact', key: 'hello', value: 'hi' }`. |
| 224 | +- **Example 2**: Replace words matching a pattern: `{ type: 'regex', regex: '\\b[a-zA-Z]{5}\\b', value: 'hi' }`. |
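Put together, the settings in this section might look like the TypeScript object below. This is only a sketch: the nesting follows the `voice.chunkPlan.formatPlan` path and `numberToDigitsCutoff` name used in this article, while the `replacements` key name is an assumption.

```typescript
// In a string literal, two backslashes produce the single backslash the regex needs.
const fiveLetterWord: string = "\\b[a-zA-Z]{5}\\b";

// Hypothetical settings object; `replacements` is an assumed key name.
const voice = {
  chunkPlan: {
    enabled: true,
    formatPlan: {
      enabled: true,
      numberToDigitsCutoff: 300000,
      replacements: [
        { type: "exact", key: "hello", value: "hi" },
        { type: "regex", regex: fiveLetterWord, value: "hi" },
      ],
    },
  },
};
```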
| 225 | + |
| 226 | +### Note |
| 227 | + |
| 228 | +Currently, only **replacements** and the **number-to-digits cutoff** are exposed for customization. Other options, such as acronym replacement, cannot be toggled. |
| 229 | + |
| 230 | +## 3. How to Turn It Off |
| 231 | + |
| 232 | +By default, the entire pipeline is **on** because it helps TTS read better. To **turn it off**, set: |
| 233 | + |
| 234 | +``` |
| 235 | +voice.chunkPlan.enabled = false; |
| 236 | +// or |
| 237 | +voice.chunkPlan.formatPlan.enabled = false; |
| 238 | +``` |
| 239 | + |
| 240 | +If either of those flags is `false`, we **skip** calling `Voice Input Formatted`. |
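The check described above can be sketched in TypeScript. `VoiceSettings` and `shouldFormat` are hypothetical names; the sketch assumes a missing flag counts as enabled, since formatting is on by default.

```typescript
// Hypothetical shape covering only the two flags mentioned above.
interface VoiceSettings {
  chunkPlan?: {
    enabled?: boolean;
    formatPlan?: { enabled?: boolean };
  };
}

// Returns false as soon as either flag is explicitly set to false.
function shouldFormat(voice: VoiceSettings): boolean {
  const chunkPlan = voice.chunkPlan;
  if (chunkPlan && chunkPlan.enabled === false) return false;
  if (chunkPlan && chunkPlan.formatPlan && chunkPlan.formatPlan.enabled === false) return false;
  return true; // on by default
}
```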
| 241 | + |
| 242 | +## 4. Conclusion |
| 243 | + |
| 244 | +- `Voice Input Formatted` orchestrates a chain of mini-functions that together fix punctuation, expand abbreviations, and make text more readable out loud. |
| 245 | +- You can keep it **on** for better TTS results or **off** if you need the raw LLM output. |
| 246 | +- The user-supplied replacements (like street expansions) run **last**, so keep that in mind if your replacement patterns depend on text that earlier steps may already have changed. |