Multimodal UX - Audio Component #1112

Merged: 35 commits, Feb 14, 2025

nking-1 (Collaborator) commented Feb 5, 2025

Implements an audio component for the Jupyter widget.

  1. Adds some plumbing for mock data to be passed from the audio() guidance function to the Jupyter widget.
  2. Implements a new audio player component with waveform visualization. The look and feel is still in progress, but the core functionality (start, stop, seek, and volume control) is working.
  3. Sets us up for handing off the current work to the back end team. We'll be working on image and video in the meantime.
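
As a rough illustration of the plumbing in item 1, the sketch below base64-encodes raw audio bytes so they can travel inside a JSON message and be decoded on the receiving side. The function names and the message shape (`class_name`, `value`, `is_input` keys) are assumptions made for this example, not the widget's actual wire format.

```python
import base64
import json


def encode_audio_message(audio_bytes: bytes, is_input: bool = True) -> str:
    """Pack raw audio bytes into a JSON string (hypothetical message shape)."""
    payload = {
        "class_name": "AudioOutput",  # assumed discriminator field
        "value": base64.b64encode(audio_bytes).decode("utf-8"),
        "is_input": is_input,
    }
    return json.dumps(payload)


def decode_audio_message(message: str) -> bytes:
    """Recover the raw audio bytes from a message built by encode_audio_message."""
    payload = json.loads(message)
    return base64.b64decode(payload["value"])


if __name__ == "__main__":
    placeholder = b"RIFF\x00\x00\x00\x00WAVE"  # placeholder bytes, not a valid audio file
    message = encode_audio_message(placeholder)
    assert decode_audio_message(message) == placeholder
```

Base64 is used here because JSON strings cannot carry arbitrary bytes directly; the cost is roughly a one-third increase in payload size.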

nopdive and others added 22 commits November 11, 2024 13:01
Missing import error catch in environment detection.
Moved exceptions catch into one line.
Not yet implemented, but available to call within notebooks.
Primitives were duplicating code.
Audio/image/video now have API primitives to generate from model.
Very basic but enough for rendering.
Also added sample audio/video assets (both creative commons).
This is important as we're using kernel comms (JSON) behind the scenes.
Clean-up of previous commit.
Important for package testing.
Console prints, frontend controls need to be added later.
nking-1 (Collaborator, Author) commented Feb 5, 2025

There are some large formatting changes to existing code from using the "Svelte for VS Code" extension. It seems like that's actually using Prettier under the hood. Sam and I chatted about this and will move forward with using it as our default formatter so hopefully the formatting changes won't happen again.

hudson-ai (Collaborator) commented

Besides the failing tests (😆), LGTM.

We'll have to get aligned on the API for the image, audio, etc. functions, especially in how we denote "inputs" vs "outputs", but that's non-essential for this first PR.

nking-1 (Collaborator, Author) commented Feb 13, 2025

> Besides the failing tests (😆), LGTM.

How much of a blocker are the failing tests for this PR? It'd be great to get it merged when we can (I need to fix merge conflicts now because it's been pending for a while)

hudson-ai (Collaborator) commented

> How much of a blocker are the failing tests for this PR? It'd be great to get it merged when we can (I need to fix merge conflicts now because it's been pending for a while)

@nking-1 not a blocker at all -- the only failing tests are in tests/unit/library/test_image.py, which just needs to be rewritten as we add real multimodal support. I'd just ask you to mark them as xfails or delete them tbh

nking-1 (Collaborator, Author) commented Feb 13, 2025

I rewrote the tests as pseudocode comments and just have a pass at the end - hopefully pytest will accept that. We're changing the internals so I thought this was a good balance of keeping notes about what we used to test, without keeping stale code around.
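
For readers unfamiliar with the pattern, a placeholder of this kind might look roughly like the sketch below. The test name and the comment text are hypothetical and do not reproduce the contents of tests/unit/library/test_image.py. pytest collects any function whose name starts with `test_` and reports it as passed when it returns without raising, so a body that ends in `pass` is accepted.

```python
def test_image_placeholder():
    # Hypothetical notes standing in for the removed assertions:
    #   1. Load an image from a local path or URL.
    #   2. Append it to the model state.
    #   3. Check that the resulting state contains the image payload.
    # TODO: replace with real assertions once multimodal support lands.
    pass
```

The alternative raised earlier in the thread, marking the old tests with `pytest.mark.xfail`, would keep the stale code running and report it as an expected failure rather than a pass.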

# TODO(nopdive): Mock for testing. Remove all of this code later.
bytes_data = bytes_from(src, allow_local=allow_local)
base64_string = base64.b64encode(bytes_data).decode('utf-8')
lm += AudioOutput(value=base64_string, is_input=True)
Review comment (Collaborator):

Is the is_input=True an indication that we'll eventually move to a single Audio type that will have an is_input flag much like our TextOutput object that has is_generated? (Although we still have a LiteralInput text type)
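
To make the question concrete, here is a minimal sketch of what a single audio type with a direction flag could look like. The class name `AudioChunk` and the helper `split_by_direction` are hypothetical and are not part of the library; only the `value` and `is_input` field names are taken from the snippet under review.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class AudioChunk:
    """Hypothetical unified audio type; not the library's actual class."""

    value: str      # base64-encoded audio data
    is_input: bool  # True for caller-supplied audio, False for model-generated audio


def split_by_direction(
    chunks: Iterable[AudioChunk],
) -> Tuple[List[AudioChunk], List[AudioChunk]]:
    """Separate chunks into (inputs, outputs) using the is_input flag."""
    inputs: List[AudioChunk] = []
    outputs: List[AudioChunk] = []
    for chunk in chunks:
        (inputs if chunk.is_input else outputs).append(chunk)
    return inputs, outputs
```

With one type and a flag, consumers such as a renderer branch on `is_input` instead of on the class, which mirrors the `is_generated` approach mentioned in the comment.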

nopdive (Collaborator) left a comment

LGTM. We'll do a pass to remove stub dependencies before release.

@nking-1 nking-1 merged commit 9fe8b26 into main Feb 14, 2025
55 checks passed
3 participants