Multimodal UX - Audio Component #1112

nking-1 · 2025-02-05T19:42:32Z

Implements an audio component for the Jupyter widget.

Adds some plumbing for mock data to be passed from the audio() guidance function to the Jupyter widget.
Implements a new audio player component with waveform visualization. The look and feel is still in progress, but the core functionality (start, stop, seek, and volume control) is working.
Sets us up for handing off the current work to the back end team. We'll be working on image and video in the meantime.
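For reviewers unfamiliar with waveform rendering: bar heights in a player like this are commonly derived by splitting the samples into buckets and taking the peak of each bucket. The sketch below illustrates that idea in Python. It is not the widget's actual code (the component lives in the front-end client), and `waveform_bars` and its parameters are hypothetical names chosen for illustration.

```python
from typing import List, Sequence


def waveform_bars(samples: Sequence[float], num_bars: int) -> List[float]:
    """Reduce audio samples to per-bar peak amplitudes normalized to 0..1.

    Hypothetical helper for illustration only; not part of this PR.
    """
    if num_bars <= 0:
        raise ValueError("num_bars must be positive")
    if not samples:
        return [0.0] * num_bars

    n = len(samples)
    peaks = []
    for i in range(num_bars):
        # Contiguous bucket boundaries; every bucket holds at least one sample.
        start = i * n // num_bars
        end = max(start + 1, (i + 1) * n // num_bars)
        peaks.append(max(abs(s) for s in samples[start:end]))

    loudest = max(peaks)
    if loudest == 0:
        return [0.0] * num_bars
    # Scale so the tallest bar has height 1.0.
    return [p / loudest for p in peaks]
```

Normalizing against the loudest bucket keeps quiet clips visible, at the cost of hiding absolute loudness differences between clips.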

nking-1 · 2025-02-05T19:56:09Z

There are some large formatting changes to existing code, caused by the "Svelte for VS Code" extension, which appears to use Prettier under the hood. Sam and I chatted about this and will move forward with it as our default formatter, so hopefully these formatting changes won't happen again.

hudson-ai · 2025-02-10T21:42:25Z

Besides the failing tests (😆), LGTM.

We'll have to get aligned on the API for the image, audio, etc. functions, especially on how we denote "inputs" vs. "outputs", but that's non-essential for this first PR.

nking-1 · 2025-02-13T21:15:37Z

> Besides the failing tests (😆), LGTM.

How much of a blocker are the failing tests for this PR? It'd be great to get it merged when we can (I need to fix merge conflicts now because it's been pending for a while)

hudson-ai · 2025-02-13T22:30:51Z

> How much of a blocker are the failing tests for this PR? It'd be great to get it merged when we can (I need to fix merge conflicts now because it's been pending for a while)

@nking-1 not a blocker at all -- the only failing tests are in tests/unit/library/test_image.py, which just needs to be rewritten as we add real multimodal support. I'd just ask you to mark them as xfails or delete them tbh

nking-1 · 2025-02-13T23:06:16Z

I rewrote the tests as pseudocode comments and just have a pass at the end - hopefully pytest will accept that. We're changing the internals so I thought this was a good balance of keeping notes about what we used to test, without keeping stale code around.
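A minimal sketch of what such a placeholder looks like. The test name and the comment text are hypothetical stand-ins, not the contents of tests/unit/library/test_image.py. pytest collects any `test_*` function and reports it as passed if the body raises nothing, so comments followed by `pass` are accepted.

```python
def test_image_placeholder():
    # Pseudocode notes kept from the old test (hypothetical wording):
    #   1. build a model
    #   2. append an image to it
    #   3. check that the image shows up in the model state
    # To be rewritten once real multimodal support lands.
    pass
```

The trade-off is that this reports as a pass rather than an expected failure, so the placeholder is easy to overlook later; marking the old tests with `pytest.mark.xfail`, as suggested above, keeps them visible in the test report.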

hudson-ai · 2025-02-14T18:50:29Z

guidance/library/_audio.py

+    # TODO(nopdive): Mock for testing. Remove all of this code later.
+    bytes_data = bytes_from(src, allow_local=allow_local)
+    base64_string = base64.b64encode(bytes_data).decode('utf-8')
+    lm += AudioOutput(value=base64_string, is_input=True)


Is the is_input=True an indication that we'll eventually move to a single Audio type that will have an is_input flag much like our TextOutput object that has is_generated? (Although we still have a LiteralInput text type)
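To make the question concrete, here is a sketch of what a single audio type with an `is_input` flag could look like. `AudioChunk` and its methods are hypothetical and are not guidance classes; nothing here is confirmed as the planned design. It also shows why the value is base64-encoded: per the commit history, the messages travel over kernel comms as JSON, which cannot carry raw bytes.

```python
import base64
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AudioChunk:
    """Hypothetical single audio type; not the library's AudioOutput."""

    value: str              # base64-encoded audio bytes (JSON-safe)
    is_input: bool = False  # True if user-supplied, False if model-generated

    @classmethod
    def from_bytes(cls, data: bytes, is_input: bool = False) -> "AudioChunk":
        # Raw bytes are not JSON-serializable, so encode them as ASCII text.
        return cls(value=base64.b64encode(data).decode("utf-8"), is_input=is_input)

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    def to_bytes(self) -> bytes:
        return base64.b64decode(self.value)
```

A single flagged type keeps the client-side rendering path uniform; separate input and output types would instead let the type checker catch an output being passed where an input is expected.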

nopdive

LGTM. We'll do a pass to remove stub dependencies before release.

nopdive and others added 22 commits November 11, 2024 13:01

Fix for environment detection in barebones environments.

2c47707

Missing import error catch in environment detection.

Minor clean-up of env detection.

11c60b1

Moved exceptions catch into one line.

Merge branch 'guidance-ai:main' into main

d721a5f

Merge branch 'guidance-ai:main' into main

a958ac6

Merge branch 'guidance-ai:main' into main

428b81c

Merge branch 'guidance-ai:main' into main

4aa9913

Audio and video API primitives added.

6cd2ba0

Not yet implemented, but available to call within notebooks.

Refactor of byte parsing for multi-modal primitives.

3153368

Primitives were duplicating code.

Added additional API primitives for modal outputs.

8972b04

Audio/image/video now have API primitives to generate from model.

Added gen multi modal primitives to guidance top-level API.

45ecde5

Trace nodes updated to handle audio and video.

b813d6d

Very basic but enough for rendering.

Connecting audio/video to model class.

dffec01

Also added sample audio/video assets (both creative commons).

Base64 encoding for modal messages.

0dad4b6

This is important as we're using kernel comms (JSON) behind the scenes.

Minor additions.

7338f04

Clean-up of previous commit.

Added interfaces to client for modal.

94ebcf4

Updated manifest for sample assets.

abd54fa

Important for package testing.

Audio / video added to app for client.

5557fe0

Console prints, frontend controls need to be added later.

Code reformat on some client files.

aee2656

Added bundle.

0d054f4

hacky initial audio prototype

c0df8d1

Custom audio widget draft - trying to draw waveform

48c3e7b

Waveform height bars fixed

cbf2dca
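The first two commits in this group concern environment detection: catching a missing import in barebones environments, and collapsing the exception handling into one line. A minimal sketch of that pattern follows. `in_notebook` is a hypothetical name, and the check against `ZMQInteractiveShell` is a common heuristic assumed here for illustration, not necessarily what guidance does.

```python
def in_notebook() -> bool:
    """Best-effort check for a Jupyter kernel; hypothetical helper."""
    try:
        # IPython may be absent in a barebones environment.
        from IPython import get_ipython

        shell = get_ipython()  # returns None outside an IPython session
        return shell is not None and shell.__class__.__name__ == "ZMQInteractiveShell"
    except (ImportError, AttributeError):
        # Both failure modes handled in a single except clause.
        return False
```

Grouping the exceptions in one tuple keeps the fallback path identical for every failure mode, which is usually what environment probing wants.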

nking-1 requested review from nopdive, Harsha-Nori and hudson-ai February 5, 2025 19:43

nopdive and others added 4 commits February 5, 2025 12:12

Added image mock via guidance library.

594d886

Connected from API primitives to client.

Merge branch 'multimodal-surfaces' of https://github.com/guidance-ai/…

30163f0

…guidance into multimodal-surfaces

Fix closing bracket & format

b087d2f

New bundle

487a507

nking-1 and others added 3 commits February 5, 2025 19:00

New line for audio widget

5330838

Added missing sample image.

5f9fb17

Forgot to commit this for previous.

Merge branch 'multimodal-surfaces' of https://github.com/guidance-ai/…

1c2d1a9

…guidance into multimodal-surfaces

nking-1 added 2 commits February 11, 2025 14:26

Fix some video output and image output pipeline bugs

e467879

video and image rendering on front end (no styling or controls yet)

94c0588

nking-1 added 2 commits February 13, 2025 13:22

Merge branch 'main' into multimodal-surfaces

a54a143

Fix merge regressions

08f5290

Rewrite image tests as pseudocode placeholders

2df401a

Try adding setuptools install dependency to fix build

93e69a4

hudson-ai reviewed Feb 14, 2025

nopdive approved these changes Feb 14, 2025

nking-1 merged commit 9fe8b26 into main Feb 14, 2025
55 checks passed
