Skip to content

feat: add voice formatting and transfer docs #162

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 21, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
246 changes: 246 additions & 0 deletions fern/assistants/voice-formatting-plan.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,246 @@
## What is Voice Input Formatted?

When interacting with voice assistants, you might notice terms like `Voice Input Formatted` in call logs or system outputs. This article explains what this means, how it works, and why it's important for delivering clear and natural voice interactions.

Voice Input Formatted is a function that takes raw text from a language model (LLM) and cleans it up so text-to-speech (TTS) provider can read it more naturally. It’s **on by default** in your assistant’s voice provider settings, because it helps turn things like:

- `$42.50` → `forty two dollars and fifty cents`
- `ST` → `STREET`,
- or phone numbers → spaced digits (“1 2 3 4 5 6 7 8 9 0”).

If you prefer the raw, unchanged text, you can **turn off** these transformations, which we’ll show you later.

### Log Example

![Screenshot 2025-01-21 at 10.23.19.png](https://img.notionusercontent.com/s3/prod-files-secure%2Ffdafdda2-774c-49e6-8896-a352ff4d44f3%2Ff603f2bd-36cf-4085-a3bc-f76c89a1ef75%2FScreenshot_2025-01-21_at_10.23.19.png/size/w=2000?exp=1737581744&sig=yoEEQF-BcTTgEVBNdcZh9MWHye2moRsbUcxGPjATNX8)

## 1. Step-by-Step Transformations

When `Voice Input Formatted` runs, it calls a bunch of helper functions in a row. Each one focuses on a different kind of text pattern. The entire process happens in this order:

1. **removeAngleBracketContent**
2. **removeMarkdownSymbols**
3. **removePhrasesInAsterisks**
4. **replaceNewLinesWithPeriods**
5. **replaceColonsWithPeriods**
6. **formatAcronyms**
7. **formatDollarAmounts**
8. **formatEmails**
9. **formatDates**
10. **formatTimes**
11. **formatDistances, formatUnits, formatPercentages, formatPhoneNumbers**
12. **formatNumbers**
13. **Applying Replacements**

We’ll walk you through them using a **shorter example** than before.

### 1.1 Our Simpler Example Input

```
Hello <tag> world
**Wanted** to say *hi*
We have NASA and .NET here,
call me at 123-456-7890,
price: $42.50
and the date is 2023 05 10
and time is 14:00
Distance is 5km
We might see 9999
the address is 320 ST 21 RD
my email is JOHN.DOE@example.COM

```

### 1.2 removeAngleBracketContent

- **What it does**: Removes `<anything>` unless it’s `<break>`, `<spell>`, or double angle brackets `<< >>`.
- **Example effect**: `<tag>` gets removed.

**Result so far**:

```
Hello world
**Wanted** to say *hi*
We have NASA and .NET here,
call me at 123-456-7890,
price: $42.50
and the date is 2023 05 10
and time is 14:00
Distance is 5km
We might see 9999
the address is 320 ST 21 RD
my email is JOHN.DOE@example.COM

```

### 1.3 removeMarkdownSymbols

- **What it does**: Removes `_`, ```, or `~`. Some versions also remove double asterisks, but that might happen in a later step (next function).

In this example, there’s `**Wanted**`, which _might_ remain if we strictly only remove `_`, backticks, and tildes. If the code does remove `**` as well, it’ll vanish here or in the next step. Let’s assume it doesn’t remove them in this step.

**Result**: _No real change if the code only targets `_` , ```, and `~`.\_

```
Hello world
**Wanted** to say *hi*
...

```

### 1.4 removePhrasesInAsterisks

- **What it does**: Looks for `some text*` or `*some text**` and cuts it out.

In our text, we have `**Wanted**` and `*hi*`. Both get removed if the function is broad enough to remove single and double-asterisk blocks.

**Result**:

```
Hello world
to say
We have NASA and .NET here,
call me at 123-456-7890,
price: $42.50
and the date is 2023 05 10
and time is 14:00
Distance is 5km
We might see 9999
the address is 320 ST 21 RD
my email is JOHN.DOE@example.COM

```

### 1.5 replaceNewLinesWithPeriods

- **What it does**: Turns line breaks into `.` or `.` and merges repeated periods.

Let’s say the above text has line breaks. After this step, it’s more of a single line (or fewer lines), each newline replaced by a period.

**Result** (roughly):

```
Hello world . to say . We have NASA and .NET here, call me at 123-456-7890, price: $42.50 and the date is 2023 05 10 and time is 14:00 Distance is 5km We might see 9999 the address is 320 ST 21 RD my email is JOHN.DOE@example.COM

```

### 1.6 replaceColonsWithPeriods

- **What it does**: `:` → `.`

Our text has `price: $42.50`. That becomes `price. $42.50`.

**Result**:

```
Hello world . to say . We have NASA and .NET here, call me at 123-456-7890, price. $42.50 ...

```

### 1.7 formatAcronyms

- **What it does**:
- If something is in a known “to-lower” list (like `NASA`, `.NET`), it becomes lowercase (`nasa`, `.net`).
- If it’s all-caps but not recognized, it might get spaced letters. If it has vowels, it’s left alone.

In the example:

- `NASA` → `nasa`
- `.NET` → `.net`

### 1.8 formatDollarAmounts

- **What it does**: `$42.50` → “forty two dollars and fifty cents.”

### 1.9 formatEmails

- **What it does**: Replaces `@` with “ at ” and `.` with “ dot ” in emails.
- `JOHN.DOE@example.COM` → `JOHN dot DOE at example dot COM`

### 1.10 formatDates

- **What it does**: `YYYY MM DD` → e.g. “Wednesday, May 10, 2023” (if valid).
- `2023 05 10` become “Wednesday, May 10, 2023” (day name depends on how the code calculates it).

### 1.11 formatTimes

- **What it does**: `14:00` → `14` (since minutes are “00,” it remove them).
- If it was `14:30`, it might become `14 30`.

### 1.12 formatDistances, formatUnits, formatPercentages, formatPhoneNumbers

- **Distances**: `5km` → “5 kilometers.”
- **Units**: e.g. `43 lb` → “forty three pounds.”
- **Percentages**: `50%` → “50 percent.”
- **PhoneNumbers**: `123-456-7890` → `1 2 3 4 5 6 7 8 9 0`.

### 1.13 formatNumbers

- **What it does**:
- Skips year-like numbers if they’re below current year(2025).
- For large numbers above a cutoff (e.g. 1000 or 5000), it reads as digits.
- Negative numbers: `9` → “minus nine.”
- Decimals: `2.5` → “two point five.”

In our case, `9999` might be big enough to become spelled out (nine thousand nine hundred ninety nine) or digits spaced out, depending on the cutoff.

`2023` used with `05 10` might get turned into a date, so it’s handled by the date logic, not the plain number logic.

### 1.14 Applying Replacements (street-suffix expansions)

- **Runs last**. If you have user-defined replacements like `\bST\b` → `STREET`, `\bRD\b` → `ROAD`, it changes them after all the other steps.
- So `320 ST 21 RD` → `320 STREET 21 ROAD`.

**End Result**: A single line of text with all the helpful expansions and transformations done.

## 2. Formatting Plan: Customization Options

The **Formatting Plan** governs how Voice Input Formatted works. Here are the main settings you can customize:

### 2.1 Enabled

Determines whether the formatting is applied.

- **Default**: `true`
- To disable: Set `voice.chunkPlan.formatPlan.enabled = false`.

### 2.2 Number-to-Digits Cutoff

This decides when numbers are read as digits instead of words.

- **Default**: `2025` (current year).
- The code generally **doesn’t** convert numbers below the current year (like `2025`) into spelled-out words, so it stays as digits if it’s obviously a year.
- If a number is bigger than the cutoff (`numberToDigitsCutoff`), it reads digits out loud.
- Negative numbers become “minus,” decimals get “point,” etc.
- Example: With a cutoff of `2025`, numbers like `12345` will remain digits.
- To ensure larger numbers are spelled out, set the cutoff higher, like `300000`. For example:
- `30003` → “thirty thousand and three” (with a cutoff of `300000`).

### 2.3 Replacements

Allows exact or regex-based substitutions in text.

- **Example 1**: Replace `hello` with `hi`:`{ type: 'exact', key: 'hello', value: 'hi' }`.
- **Example 2**: Replace words matching a pattern:`{ type: 'regex', regex: '\\\\b[a-zA-Z]{5}\\\\b', value: 'hi' }`.

### Note

Currently, only **replacements** and **number-to-digits cutoff** are exposed for customization. Other options, such as toggling acronym replacement, are not exposed to be toggled.

## 3. How to Turn It Off

By default, the entire pipeline is **on** because it helps TTS read better. To **turn it off**, set:

```
voice.chunkPlan.enabled = false;
// or
voice.chunkPlan.formatPlan.enabled = false;
```

Any of those flags being `false` means we **skip** calling `Voice Input Formatted`.

## 4. Conclusion

- `Voice Input Formatted` orchestrates a chain of mini-functions that together fix punctuation, expand abbreviations, and make text more readable out loud.
- You can keep it **on** for better TTS results or **off** if you need the raw LLM output.
- The final transformations, especially the user-supplied replacements (like street expansions), happen **last**, so keep that in mind it rely on other expansions earlier.
141 changes: 141 additions & 0 deletions fern/calls/call-dynamic-transfers.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,141 @@
## Introduction to Transfer Destinations

Transferring calls dynamically based on context is an essential feature for handling user interactions effectively. This guide walks you through creating a custom transfer tool, linking it to your assistant, and handling transfer requests with detailed examples. Whether the destination is a phone number, SIP, or another assistant, you'll learn how to configure it seamlessly.

## Step 1: Create a Custom Transfer Tool

To get started, create a transfer tool with an empty `destinations` array:

```bash
curl -X POST https://api.vapi.ai/tool \
-H "Authorization: Bearer insert-private-key-here" \
-H "Content-Type: application/json" \
-d '{
"type": "transferCall",
"destinations": [],
"function": {
"name": "dynamicDestinationTransferCall"
}
}'
```

This tool acts as a placeholder, allowing dynamic destinations to be defined at runtime.

## Step 2: Link the Tool to Your Assistant

After creating the tool, link it to your assistant. This connection enables the assistant to trigger the tool during calls.

## Step 3: Configure the Server Event

Select the `transfer-destination-request` server event in your assistant settings. This event sends a webhook to your server whenever a transfer is requested, giving you the flexibility to dynamically determine the destination.

## Step 4: Set Up Your Server

Ensure your server is ready to handle incoming requests. Update the assistant's server URL to point to your server, which will process transfer requests and respond with the appropriate destination or error.

## Step 5: Trigger the Tool and Process Requests

Use the following prompt to trigger the transfer tool:

```
[TASK]
trigger the dynamicDestinationTransferCall tool
```

When triggered, the assistant sends a `transfer-destination-request` webhook to your server. This webhook contains all the necessary call details, such as transcripts and messages, allowing your server to process the request dynamically.

**Sample Request Payload:**

```json
{
"type": "transfer-destination-request",
"artifact": {
"messages": [...],
"transcript": "Hello, how can I help you?",
"messagesOpenAIFormatted": [...]
},
"assistant": { "id": "assistant123" },
"phoneNumber": "+14155552671",
"customer": { "id": "customer456" },
"call": { "id": "call789", "status": "ongoing" }
}
```

## Step 6: Respond to Transfer Requests

Your server should respond with either a valid `destination` or an `error` to indicate why the transfer cannot be completed.

### Transfer Destination Request Response Payload

#### Assistant Destination

```json
{
"destination": {
"type": "assistant",
"message": "Connecting you to our support assistant.",
"assistantName": "SupportAssistant",
"transferMode": "rolling-history"
}
}
```

Transfers the call to another assistant, specifying how the conversation history should be handled.

#### Number Destination

```json
{
"destination": {
"type": "number",
"message": "Connecting you to our support line.",
"number": "+14155552671",
"numberE164CheckEnabled": true,
"callerId": "+14155551234",
"extension": "101"
}
}
```

Transfers the call to a specific phone number, with options for caller ID and extensions.

#### SIP Destination

```json
{
"destination": {
"type": "sip",
"message": "Connecting your call via SIP.",
"sipUri": "sip:customer-support@domain.com",
"sipHeaders": {
"X-Custom-Header": "value"
}
}
}
```

Transfers the call to a SIP URI with optional custom headers.

### Error Response

If the transfer cannot be completed, respond with an error:

```json
{
"error": "Invalid destination specified."
}
```

- **Field**: `error`
- **Description**: Provides a clear reason why the transfer failed.

## Destination or Error in Response

Every response to a transfer-destination-request must include either a `destination` or an `error`. These indicate the outcome of the transfer request:

- **Destination**: Provides details for transferring the call.
- **Error**: Explains why the transfer cannot be completed.

## Conclusion

Dynamic call transfers empower your assistant to route calls efficiently based on real-time data. By implementing this flow, you can ensure seamless interactions and provide a better experience for your users.
Loading
Loading