Reading files into chat #4

Open
wants to merge 6 commits into
base: main
2 changes: 2 additions & 0 deletions .env.sample
@@ -1,3 +1,5 @@
OPENAI_API_KEY=
PERSONALIZATION_FILE=./personalization.json
BROWSER_CUSTOMIZATION_FILE=./browser.json
SCRATCH_PAD_DIR=./scratchpad
RUN_TIME_TABLE_LOG_JSON=runtime_time_table.jsonl
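A minimal sketch of how the variables in `.env.sample` might be read at startup. The `Settings` class, the `load_settings` helper, and the fallback defaults (copied from the sample values above) are assumptions made for illustration; the PR's own loading code is not shown in this diff.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables listed in .env.sample."""

    openai_api_key: str
    personalization_file: str
    browser_customization_file: str
    scratch_pad_dir: str
    run_time_table_log_json: str


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (os.environ by default).

    The defaults mirror the sample file and are assumptions, not
    values confirmed by the project.
    """
    env = os.environ if env is None else env
    return Settings(
        openai_api_key=env.get("OPENAI_API_KEY", ""),
        personalization_file=env.get("PERSONALIZATION_FILE", "./personalization.json"),
        browser_customization_file=env.get("BROWSER_CUSTOMIZATION_FILE", "./browser.json"),
        scratch_pad_dir=env.get("SCRATCH_PAD_DIR", "./scratchpad"),
        run_time_table_log_json=env.get("RUN_TIME_TABLE_LOG_JSON", "runtime_time_table.jsonl"),
    )
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise in tests without touching the real environment.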
5 changes: 4 additions & 1 deletion README.md
@@ -1,7 +1,9 @@
# POC Python Realtime API o1 assistant
> This is a proof of concept for using OpenAI's [Realtime API](https://openai.com/index/introducing-the-realtime-api/) to chain tools, call o1-preview & o1-mini, [structure output](https://openai.com/index/introducing-structured-outputs-in-the-api/) responses, and glimpse into the future of **AI assistant powered engineering**.
>
> See video where we [use this POC](https://youtu.be/vN0t-kcPOXo)
> See video where we [use and discuss this POC](https://youtu.be/vN0t-kcPOXo)
>
> This codebase is a v0 POC. It's buggy, but it contains the core ideas for realtime personal AI assistants and AI agents.

<img src="./images/ada-is-back.png" alt="realtime-assistant" style="max-width: 800px;">

@@ -63,6 +65,7 @@ The codebase includes various utility functions for tasks such as structured out
## Improvements
> Up for a challenge? Here are some ideas on how to improve the experience:

- Organize code.
- Add interruption handling. Current version prevents it for simplicity.
- Add transcript logging.
- Make personalization.json a pydantic type.
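The last idea above could be approached roughly as follows. This is only a sketch: it uses a standard-library `dataclass` with manual checks as a stand-in for a pydantic model so that it runs without extra dependencies, and the `Personalization` class and `from_json` method are hypothetical names. The two fields are the ones that remain in `personalization.json` after this PR.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Personalization:
    """Hypothetical typed view of personalization.json."""

    ai_assistant_name: str
    human_name: str

    @classmethod
    def from_json(cls, text: str) -> "Personalization":
        data = json.loads(text)
        # Reject missing, empty, or non-string values early.
        for field_name in ("ai_assistant_name", "human_name"):
            value = data.get(field_name)
            if not isinstance(value, str) or not value:
                raise ValueError(f"{field_name} must be a non-empty string")
        return cls(
            ai_assistant_name=data["ai_assistant_name"],
            human_name=data["human_name"],
        )
```

With pydantic, the same shape would typically be a `BaseModel` subclass with two `str` fields, which would take over the validation done by hand here.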
17 changes: 17 additions & 0 deletions browser.json
@@ -0,0 +1,17 @@
{
"browser_urls": [
"https://aider.chat",
"https://simonwillison.net",
"https://hackernews.com",
"https://chat.openai.com",
"https://notebooklm.google.com",
"https://google.com",
"https://youtube.com",
"https://twitter.com",
"https://claude.ai/chat",
"https://changelog.cursor.com",
"https://gemini.google.com/u/1/",
"https://openai.com/index/introducing-the-realtime-api/"
],
"browser": "open -a /Applications/Google\\ Chrome.app %s"
}
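A sketch of how the `browser` command template in `browser.json` might be turned into an argument list. The helper names `load_browser_config` and `build_open_command` are hypothetical, and the sample config below is shortened to one URL; how the PR itself consumes the template is not shown in this diff. A command string containing `%s` is also the form that `webbrowser.get()` in Python's standard library accepts, though whether the project uses it that way is an assumption.

```python
import json
import shlex
from typing import List

# Shortened copy of browser.json (raw string so the JSON escape survives).
SAMPLE_CONFIG = r"""
{
  "browser_urls": ["https://aider.chat"],
  "browser": "open -a /Applications/Google\\ Chrome.app %s"
}
"""


def load_browser_config(text: str) -> dict:
    """Parse the config and check that the template has a URL placeholder."""
    config = json.loads(text)
    if "%s" not in config["browser"]:
        raise ValueError("browser command must contain a %s placeholder")
    return config


def build_open_command(template: str, url: str) -> List[str]:
    """Split the template shell-style, then substitute the URL for %s."""
    # shlex.split honors the backslash-escaped space in the app path.
    return [token.replace("%s", url) for token in shlex.split(template)]
```

Splitting before substituting means the URL is always passed as a single argument, so characters such as `&` or spaces in a URL are never interpreted by a shell.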
17 changes: 1 addition & 16 deletions personalization.json
@@ -1,19 +1,4 @@
{
"browser_urls": [
"https://aider.chat",
"https://simonwillison.net",
"https://hackernews.com",
"https://chat.openai.com",
"https://notebooklm.google.com",
"https://google.com",
"https://youtube.com",
"https://twitter.com",
"https://claude.ai/chat",
"https://changelog.cursor.com",
"https://gemini.google.com/u/1/",
"https://openai.com/index/introducing-the-realtime-api/"
],
"browser": "chrome",
"ai_assistant_name": "Ada",
"human_name": "Dan"
}
}
Empty file.
84 changes: 84 additions & 0 deletions src/realtime_api_async_python/audio/bidirectional_audio.py
@@ -0,0 +1,84 @@
import asyncio
import queue

import pyaudio

from ..utils.logging import logging

# Audio recording parameters
CHUNK = 1024  # frames per buffer
FORMAT = pyaudio.paInt16  # 16-bit samples
CHANNELS = 1  # mono
RATE = 24000  # sample rate in Hz


class BidirectionalAudio:
    def __init__(self):
        # Initialize state before opening the stream: PyAudio starts the
        # stream by default, so the callback may fire immediately.
        self.queue = queue.Queue()
        self.is_recording = False
        self.is_receiving = False
        self.p = pyaudio.PyAudio()
        self.stream = self.p.open(
            format=FORMAT,
            channels=CHANNELS,
            rate=RATE,
            input=True,
            frames_per_buffer=CHUNK,
            stream_callback=self.callback,
        )
        logging.info("BidirectionalAudio initialized")

    def callback(self, in_data, frame_count, time_info, status):
        # Queue microphone data only while recording and while the
        # assistant is not speaking.
        if self.is_recording and not self.is_receiving:
            self.queue.put(in_data)
        # if self.is_recording:
        #     self.queue.put(in_data)
        return (None, pyaudio.paContinue)

    def start_recording(self):
        self.is_recording = True
        logging.info("Started recording")

    def stop_recording(self):
        self.is_recording = False
        logging.info("Stopped recording")

    def start_receiving(self):
        # Receiving an assistant response also stops recording.
        self.is_receiving = True
        self.is_recording = False
        logging.info("Started receiving assistant response")

    def stop_receiving(self):
        self.is_receiving = False
        logging.info("Stopped receiving assistant response")

    def get_audio_data(self):
        # Drain everything queued so far; return None if nothing was captured.
        data = b""
        while not self.queue.empty():
            data += self.queue.get()
        return data if data else None

    def close(self):
        self.stream.stop_stream()
        self.stream.close()
        self.p.terminate()
        logging.info("BidirectionalAudio closed")

async def play_audio(audio_data):
    p = pyaudio.PyAudio()
    stream = p.open(format=FORMAT, channels=CHANNELS, rate=RATE, output=True)
    # Note: stream.write is a blocking call, so the event loop is held
    # until the data has been handed to the audio device.
    stream.write(audio_data)

    # Append 200 ms of silence to prevent popping and abrupt cut-offs
    silence_duration = 0.2  # 200ms
    silence_frames = int(RATE * silence_duration)
    silence = b"\x00" * (
        silence_frames * CHANNELS * 2
    )  # 2 bytes per sample for 16-bit audio
    stream.write(silence)

    # Pause briefly before closing the stream so the audio is fully played
    await asyncio.sleep(0.5)

    stream.stop_stream()
    stream.close()
    p.terminate()
    logging.debug("Audio playback completed")
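The flag logic in `BidirectionalAudio` can be tried without audio hardware. The sketch below isolates it with PyAudio removed; `HalfDuplexGate`, `on_mic_chunk`, `drain`, and `silence_padding` are hypothetical names introduced for illustration, not part of the PR. It shows that microphone chunks are kept only while recording and not receiving, and reproduces the arithmetic behind the silence padding in `play_audio`.

```python
import queue


class HalfDuplexGate:
    """Stand-in for BidirectionalAudio's gating, without PyAudio."""

    def __init__(self):
        self.queue = queue.Queue()
        self.is_recording = False
        self.is_receiving = False

    def on_mic_chunk(self, chunk: bytes) -> None:
        # Same condition as the stream callback in the diff.
        if self.is_recording and not self.is_receiving:
            self.queue.put(chunk)

    def drain(self):
        # Collect queued chunks without blocking; None if nothing queued.
        chunks = []
        while True:
            try:
                chunks.append(self.queue.get_nowait())
            except queue.Empty:
                break
        return b"".join(chunks) or None


def silence_padding(rate=24000, seconds=0.2, channels=1, sample_width=2) -> bytes:
    """Zero-filled PCM: frames * channels * bytes per sample."""
    return b"\x00" * (int(rate * seconds) * channels * sample_width)
```

With the values used in the diff (24000 Hz, 0.2 s, mono, 16-bit), the padding comes to 4800 frames, or 9600 bytes. Dropping microphone input while the assistant is speaking matches the README note that the current version prevents interruption for simplicity.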