Releases: pipecat-ai/pipecat
v0.0.68
Added
- Added `GoogleHttpTTSService`, which uses Google's HTTP TTS API.
- Added `TavusTransport`, a new transport implementation compatible with any Pipecat pipeline. When using `TavusTransport`, the Pipecat bot will connect to the same room as the Tavus Avatar and the user.
- Added `PlivoFrameSerializer` to support Plivo calls. A full running example has also been added to `examples/plivo-chatbot`.
- Added `UserBotLatencyLogObserver`. This is an observer that logs the latency between when the user stops speaking and when the bot starts speaking. This gives you an initial idea of how quickly the AI services respond.
- Added `SarvamTTSService`, which implements Sarvam AI's TTS API: https://docs.sarvam.ai/api-reference-docs/text-to-speech/convert.
- Added `PipelineTask.add_observer()` and `PipelineTask.remove_observer()` to allow managing observers at runtime. This is useful for cases where the task is passed around to other code components that might want to observe the pipeline dynamically. See the sketch below.
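A minimal sketch of runtime observer management, pairing the new methods with the `UserBotLatencyLogObserver` also added in this release (the observer's import path is an assumption; `stt`, `llm` and `tts` stand in for your services):

```python
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask

# Assumed import path for the observer added in this release.
from pipecat.observers.loggers.user_bot_latency_log_observer import UserBotLatencyLogObserver

task = PipelineTask(Pipeline([stt, llm, tts]))

observer = UserBotLatencyLogObserver()
task.add_observer(observer)     # start observing an already-created task
# ... later, stop observing without tearing the task down:
task.remove_observer(observer)
```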
- Added a `user_id` field to `TranscriptionMessage`. This allows identifying the user in a multi-user scenario. Note that this requires that `TranscriptionFrame` has the `user_id` properly set.
- Added new `PipelineTask` event handlers `on_pipeline_started`, `on_pipeline_stopped`, `on_pipeline_ended` and `on_pipeline_cancelled`, which correspond to `StartFrame`, `StopFrame`, `EndFrame` and `CancelFrame` respectively. See the sketch below.
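A minimal sketch of the new lifecycle handlers, reusing the `task` from the sketch above (the handler signature, receiving the task as its only argument, is an assumption):

```python
@task.event_handler("on_pipeline_started")
async def on_pipeline_started(task):
    print("StartFrame entered the pipeline")

@task.event_handler("on_pipeline_ended")
async def on_pipeline_ended(task):
    print("EndFrame reached the end of the pipeline; clean shutdown")
```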
- Added additional languages to `LmntTTSService`. Languages include: `hi`, `id`, `it`, `ja`, `nl`, `pl`, `ru`, `sv`, `th`, `tr`, `uk`, `vi`.
- Added a `model` parameter to the `LmntTTSService` constructor, allowing switching between LMNT models.
- Added `MiniMaxHttpTTSService`, which implements MiniMax's T2A API for TTS. Learn more: https://www.minimax.io/platform_overview
- A new function `FrameProcessor.setup()` has been added to allow setting up frame processors before they receive a `StartFrame`. Internally, the lifecycle is: `FrameProcessor.setup()` is called, a `StartFrame` is pushed from the beginning of the pipeline, your regular pipeline operations run, an `EndFrame` or `CancelFrame` is pushed from the beginning of the pipeline, and finally `FrameProcessor.cleanup()` is called.
- Added support for OpenTelemetry tracing in Pipecat. This initial implementation includes:
  - A `setup_tracing` method where you can specify your OpenTelemetry exporter.
  - Service decorators for STT (`@traced_stt`), LLM (`@traced_llm`), and TTS (`@traced_tts`) which trace the execution and collect properties and metrics (TTFB, token usage, character counts, etc.).
  - Class decorators that provide execution tracking; these are generic and can be used for service tracking as needed.
  - Spans that help track traces on a per-conversation and per-turn basis:

    ```
    conversation-uuid
    ├── turn-1
    │   ├── stt_deepgramsttservice
    │   ├── llm_openaillmservice
    │   └── tts_cartesiattsservice
    ...
    └── turn-n
        └── ...
    ```

  By default, Pipecat has implemented service decorators to trace execution of STT, LLM, and TTS services. You can enable tracing by setting `enable_tracing` to `True` in the `PipelineTask`, as in the sketch below.
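A minimal sketch of enabling tracing, assuming `setup_tracing` is importable from Pipecat's tracing utilities (the exact module path isn't given here) and using the standard OpenTelemetry OTLP exporter:

```python
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

from pipecat.pipeline.task import PipelineTask
from pipecat.utils.tracing.setup import setup_tracing  # assumed import path

setup_tracing(exporter=OTLPSpanExporter())  # point at your collector, e.g. Jaeger

task = PipelineTask(pipeline, enable_tracing=True)  # `pipeline` is your Pipeline
```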
- Added `TurnTrackingObserver`, which tracks the start and end of a user/bot turn pair and emits events `on_turn_started` and `on_turn_stopped`, corresponding to the start and end of a turn, respectively.
- Allow passing observers to `run_test()` when running unit tests.
Changed
- Upgraded `daily-python` to 0.19.1.
- ⚠️ Updated `SmallWebRTCTransport` to align with how other transports handle `on_client_disconnected`. Now, when the connection is closed and no reconnection is attempted, `on_client_disconnected` is called instead of `on_client_close`. The `on_client_close` callback is no longer used; use `on_client_disconnected` instead.
- `PipelineTask` now checks whether it has already been cancelled.
- Don't raise an exception if an event handler is not registered.
- Upgraded `deepgram-sdk` to 4.1.0.
- Updated `GoogleTTSService` to use Google's streaming TTS API. The default voice has also been updated, to `en-US-Chirp3-HD-Charon`.
- ⚠️ Refactored `TavusVideoService` so it acts as a proxy, sending audio to Tavus and receiving both audio and video. This makes `TavusVideoService` usable with any Pipecat pipeline and with any transport. This is a breaking change; check `examples/foundational/21a-tavus-layer-small-webrtc.py` to see how to use it.
- `DailyTransport` now uses custom microphone audio tracks instead of virtual microphones, so multiple Daily transports can now be used in the same process.
- `DailyTransport` now captures audio from individual participants instead of the whole room. This allows identifying audio frames per participant.
- Updated the default model for `AnthropicLLMService` to `claude-sonnet-4-20250514`.
- Updated the default model for `GeminiMultimodalLiveLLMService` to `models/gemini-2.5-flash-preview-native-audio-dialog`.
- `BaseTextFilter` methods `filter()`, `update_settings()`, `handle_interruption()` and `reset_interruption()` are now async.
- `BaseTextAggregator` methods `aggregate()`, `handle_interruption()` and `reset()` are now async.
- The API version for `CartesiaTTSService` and `CartesiaHttpTTSService` has been updated. Also, the `cartesia` dependency has been updated to 2.x.
- `CartesiaTTSService` and `CartesiaHttpTTSService` now support Cartesia's new `speed` parameter, which accepts the values `slow`, `normal`, and `fast`.
- `GeminiMultimodalLiveLLMService` now uses the user transcription and usage metrics provided by Gemini Live.
- `GoogleLLMService` has been updated to use `google-genai` instead of the deprecated `google-generativeai`.
Deprecated
- In `CartesiaTTSService` and `CartesiaHttpTTSService`, `emotion` has been deprecated by Cartesia. Pipecat is following suit and deprecating `emotion` as well.
Removed
- Since `GeminiMultimodalLiveLLMService` now transcribes its own audio, the `transcribe_user_audio` arg has been removed. Audio is now transcribed automatically.
- Removed the `SileroVAD` frame processor; just use `SileroVADAnalyzer` instead. Also removed the `07a-interruptible-vad.py` example.
Fixed
- Fixed a `DailyTransport` issue that prevented capturing video frames when the framerate was greater than zero.
- Fixed a `DeepgramSTTService` connection issue when the user provided their own `LiveOptions`.
- Fixed a `DailyTransport` issue that would cause images needing resize to block the event loop.
- Fixed an issue with `ElevenLabsTTSService` where changing the model or voice while the service is running wasn't working.
- Fixed an issue that would cause multiple instances of the same class to behave incorrectly if any of the given constructor arguments defaulted to a mutable value (e.g. lists, dictionaries, objects).
- Fixed an issue with `CartesiaTTSService` where `TTSTextFrame` messages weren't being emitted when the model was set to `sonic`. This resulted in the assistant context not being updated with assistant messages.
Performance
- `DailyTransport`: process audio, video and events in separate tasks.
- Don't create event handler tasks if no user event handlers have been registered.
Other
- It is now possible to run all (or most) foundational examples with multiple transports. By default, they run with P2P (Peer-To-Peer) WebRTC so you can try everything locally. You can also run them with Daily or even with a Twilio phone number.
- Added foundational examples `07y-interruptible-minimax.py` and `07z-interruptible-sarvam.py` to show how to use the `MiniMaxHttpTTSService` and `SarvamTTSService`, respectively.
- Added an `open-telemetry-tracing` example showing how to set up tracing. The example also includes Jaeger as an open source OpenTelemetry client to review traces from the example runs.
- Added foundational example `29-turn-tracking-observer.py` to show how to use the `TurnTrackingObserver`.
v0.0.67
Added
- Added `DebugLogObserver` for detailed frame logging with configurable filtering by frame type and endpoint. This observer automatically extracts and formats all frame data fields for debug logging.
- A `UserImageRequestFrame.video_source` field has been added to request an image from the desired video source.
- Added support for the AWS Nova Sonic speech-to-speech model with the new `AWSNovaSonicLLMService`. See https://docs.aws.amazon.com/nova/latest/userguide/speech.html. Note that it requires Python >= 3.12 and `pip install pipecat-ai[aws-nova-sonic]`.
- Added new AWS services `AWSBedrockLLMService` and `AWSTranscribeSTTService`.
- Added an `on_active_speaker_changed` event handler to the `DailyTransport` class.
- Added `enable_ssml_parsing` and `enable_logging` to `InputParams` in `ElevenLabsTTSService`.
- Added support to `RimeHttpTTSService` for the `arcana` model.
Changed
- Updated `ElevenLabsTTSService` to use the beta websocket API (multi-stream-input). This new API supports context IDs and cancelling those contexts, which greatly improves interruption handling.
- Observers' `on_push_frame()` now takes a single `FramePushed` argument instead of multiple arguments.
- Updated the default voice for `DeepgramTTSService` to `aura-2-helena-en`.
Deprecated
- `PollyTTSService` is now deprecated; use `AWSPollyTTSService` instead.
- Observer `on_push_frame(src, dst, frame, direction, timestamp)` is now deprecated; use `on_push_frame(data: FramePushed)` instead, as in the sketch below.
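A sketch of migrating an observer to the new signature; the import path and the `FramePushed` field names (mirroring the old positional arguments) are assumptions:

```python
from pipecat.observers.base_observer import BaseObserver, FramePushed  # assumed path

class FrameLogger(BaseObserver):
    async def on_push_frame(self, data: FramePushed):
        # One object instead of (src, dst, frame, direction, timestamp).
        print(f"{data.source} -> {data.destination}: {data.frame}")
```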
Fixed
- Fixed a `DailyTransport` issue that was causing problems when multiple audio or video sources were being captured.
- Fixed an `UltravoxSTTService` issue that would cause the service to generate all tokens as one word.
- Fixed a `PipelineTask` issue that would cause tasks to not be cancelled if the task was cancelled from outside of Pipecat.
- Fixed a `TaskManager` issue that was causing dangling tasks to be reported.
- Fixed an issue that could cause data to be sent to the transports when they were not yet ready.
- Remove custom audio tracks from `DailyTransport` before leaving.
Removed
- Removed `CanonicalMetricsService` as it's no longer maintained.
v0.0.66
Added
- Added two new input parameters to `RimeTTSService`: `pause_between_brackets` and `phonemize_between_brackets`.
- Added support for cross-platform local smart turn detection. You can use `LocalSmartTurnAnalyzer` for on-device inference using Torch.
- `BaseOutputTransport` now allows multiple destinations if the transport implementation supports it (e.g. Daily's custom tracks). With multiple destinations it is possible to send different audio or video tracks simultaneously with a single transport. To do that, set the new `Frame.transport_destination` field to your desired transport destination (e.g. a custom track name), tell the transport you want a new destination with `TransportParams.audio_out_destinations` or `TransportParams.video_out_destinations`, and the transport should take care of the rest.
- Similar to the new `Frame.transport_destination`, there's a new `Frame.transport_source` field which is set by the `BaseInputTransport` if the incoming data comes from a non-default source (e.g. custom tracks).
- `TTSService` has a new `transport_destination` constructor parameter. This parameter is used to update the `Frame.transport_destination` field of each generated `TTSAudioRawFrame`. This allows sending multiple bots' audio to multiple destinations in the same pipeline.
- Added `DailyTransportParams.camera_out_enabled` and `DailyTransportParams.microphone_out_enabled`, which allow you to enable or disable the main output camera or microphone tracks. This is useful if you only want to use custom tracks and not send the main tracks. Note that you still need `audio_out_enabled=True` or `video_out_enabled=True`.
- Added `DailyTransport.capture_participant_audio()`, which allows you to capture an audio source (e.g. "microphone", "screenAudio" or a custom track name) from a remote participant.
- Added `DailyTransport.update_publishing()`, which allows you to update the call's video and audio publishing settings (e.g. audio and video quality).
- Added `RTVIObserverParams`, which allows you to configure which RTVI messages are sent to the clients.
- Added a `context_window_compression` input param to `GeminiMultimodalLiveLLMService`, which allows you to enable a sliding context window for the session as well as set the token limit of the sliding window.
- Updated `SmallWebRTCConnection` to support `ice_servers` with credentials.
- Added `VADUserStartedSpeakingFrame` and `VADUserStoppedSpeakingFrame`, indicating when the VAD detected the user starting and stopping speaking. These events are helpful when using smart turn detection, as the user's stop time can differ from when their turn ends (signified by `UserStoppedSpeakingFrame`).
- Added `TranslationFrame`, a new frame type that contains a translated transcription.
- Added `TransportParams.audio_in_passthrough`. If set (the default), incoming audio will be pushed downstream.
- Added `MCPClient`, a way to connect to MCP servers and use the MCP servers' tools.
- Added Mem0 OSS support; in addition to Mem0 cloud support, the OSS version is now also available.
Changed
- `TransportParams.audio_mixer` now supports a string and also a dictionary to provide a mixer per destination. For example:

```python
audio_out_mixer={
    "track-1": SoundfileMixer(...),
    "track-2": SoundfileMixer(...),
    "track-N": SoundfileMixer(...),
},
```

- The `STTMuteFilter` now mutes `InterimTranscriptionFrame` and `TranscriptionFrame`, which allows the `STTMuteFilter` to be used in conjunction with transports that generate transcripts, e.g. `DailyTransport`.
- Function calls now receive a single parameter `FunctionCallParams` instead of `(function_name, tool_call_id, args, llm, context, result_callback)`, which is now deprecated. See the sketch below.
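A minimal sketch of the new single-parameter handler; the import path and field names are assumptions that mirror the deprecated positional arguments:

```python
from pipecat.services.llm_service import FunctionCallParams  # assumed path

async def fetch_weather(params: FunctionCallParams):
    # Replaces (function_name, tool_call_id, args, llm, context, result_callback).
    location = params.arguments.get("location", "unknown")
    await params.result_callback({"location": location, "conditions": "sunny"})

llm.register_function("fetch_weather", fetch_weather)  # `llm` is your LLM service
```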
- Changed the user aggregator timeout for late transcriptions from 1.0s to 0.5s (`LLMUserAggregatorParams.aggregation_timeout`). Sometimes the STT services give us more than one transcription, and these can arrive after the user has stopped speaking. We still want to include these additional transcriptions with the first one because they are part of the user turn; this timeout makes that possible.
- Short utterances not detected by VAD while the bot is speaking are now ignored. This reduces the number of bot interruptions significantly, providing a more natural conversation experience.
- Updated `GladiaSTTService` to output a `TranslationFrame` when specifying a `translation` and `translation_config`.
- STT services now pass through audio frames by default. This allows you to add audio recording without worrying about what's wrong in your pipeline when it doesn't work the first time.
- Input transports now always push audio downstream unless disabled with `TransportParams.audio_in_passthrough`. After many Pipecat releases, we realized this is the common use case. There are use cases where the input transport already provides STT and you also don't want recordings, in which case there's no need to push audio to the rest of the pipeline, but this is not a very common case.
- Added `RivaSegmentedSTTService`, which allows Riva offline/batch models, such as "canary-1b-asr", to be used in Pipecat.
Deprecated
- Function calls with parameters `(function_name, tool_call_id, args, llm, context, result_callback)` are deprecated; use a single `FunctionCallParams` parameter instead.
- `TransportParams.camera_*` parameters are now deprecated; use `TransportParams.video_*` instead.
- The `TransportParams.vad_enabled` parameter is now deprecated; use `TransportParams.audio_in_enabled` and `TransportParams.vad_analyzer` instead.
- The `TransportParams.vad_audio_passthrough` parameter is now deprecated; use `TransportParams.audio_in_passthrough` instead.
- `ParakeetSTTService` is now deprecated; use `RivaSTTService` instead, which uses the model "parakeet-ctc-1.1b-asr" by default.
- `FastPitchTTSService` is now deprecated; use `RivaTTSService` instead, which uses the model "magpie-tts-multilingual" by default.
Fixed
- Fixed an issue with `SimliVideoService` where the bot was continuously outputting audio, which prevented the `BotStoppedSpeakingFrame` from being emitted.
- Fixed an issue where `OpenAIRealtimeBetaLLMService` would add two assistant messages to the context.
- Fixed an issue with `GeminiMultimodalLiveLLMService` where the context contained tokens instead of words.
- Fixed an issue with HTTP Smart Turn handling where the service returns a 500 error. Previously, this would cause an unhandled exception. Now, a 500 error is treated as an incomplete response.
- Fixed a TTS services issue that could cause assistant output not to be aggregated to the context when also using `TTSSpeakFrame`s.
- Fixed an issue where `SmartTurnMetricsData` was reporting 0ms for inference and processing time when using the `FalSmartTurnAnalyzer`.
Other
- Added `examples/daily-custom-tracks` to show how to send and receive Daily custom tracks.
- Added `examples/daily-multi-translation` to showcase how to send multiple simultaneous translations with the same transport.
- Added 04 foundational examples for client/server transports. Also, renamed `29-livekit-audio-chat.py` to `04b-transports-livekit.py`.
- Added foundational example `13c-gladia-translation.py` showing how to use `TranscriptionFrame` and `TranslationFrame`.
v0.0.65
https://en.wikipedia.org/wiki/Saint_George%27s_Day_in_Catalonia
Added
- Added automatic hangup logic to the Telnyx serializer. This feature hangs up the Telnyx call when an `EndFrame` or `CancelFrame` is received. It is enabled by default and is configurable via the `auto_hang_up` `InputParam`.
- Added a keepalive task to `GladiaSTTService` to prevent the websocket from disconnecting after 30 seconds of no audio input.
Changed
- The `InputParams` for `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` no longer require that `stability` and `similarity_boost` be set. You can set each param individually.
- In `TwilioFrameSerializer`, `call_sid` is optional so as to avoid a breaking change. `call_sid` is required to automatically hang up.
Fixed
- Fixed an issue where `TwilioFrameSerializer` would send two hang-up commands: one for the `EndFrame` and one for the `CancelFrame`.
v0.0.64
Added
- Added automatic hangup logic to the Twilio serializer. This feature hangs up the Twilio call when an `EndFrame` or `CancelFrame` is received. It is enabled by default and is configurable via the `auto_hang_up` `InputParam`, as in the sketch below.
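A sketch of configuring the serializer; the import path and `InputParams` shape are assumptions based on this entry, and `stream_sid`/`call_sid` come from Twilio's websocket "start" message:

```python
from pipecat.serializers.twilio import TwilioFrameSerializer  # assumed path

serializer = TwilioFrameSerializer(
    stream_sid=stream_sid,
    call_sid=call_sid,  # required for the serializer to hang up the call
    params=TwilioFrameSerializer.InputParams(auto_hang_up=False),  # opt out
)
```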
- Added `SmartTurnMetricsData`, which contains end-of-turn prediction metrics, to the `MetricsFrame`. Using `MetricsFrame`, you can now retrieve prediction confidence scores and processing time metrics from the smart turn analyzers.
- Added support for Application Default Credentials in the Google services `GoogleSTTService`, `GoogleTTSService`, and `GoogleVertexLLMService`.
- Added support for Smart Turn Detection via the `turn_analyzer` transport parameter. You can now choose between `HttpSmartTurnAnalyzer()` or `FalSmartTurnAnalyzer()` for remote inference, or `LocalCoreMLSmartTurnAnalyzer()` for on-device inference using Core ML.
- `DeepgramTTSService` accepts the `base_url` argument again, allowing you to connect to an on-prem service.
- Added `LLMUserAggregatorParams` and `LLMAssistantAggregatorParams`, which allow you to control aggregator settings. You can now pass these arguments when creating aggregator pairs with `create_context_aggregator()`, as in the sketch below.
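A sketch of passing the new aggregator params; the import path is an assumption, and `aggregation_timeout` is the setting discussed elsewhere in this changelog:

```python
from pipecat.processors.aggregators.llm_response import LLMUserAggregatorParams  # assumed path

context_aggregator = llm.create_context_aggregator(
    context,  # your LLM context
    user_params=LLMUserAggregatorParams(aggregation_timeout=1.0),
)
```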
- Added `previous_text` context support to `ElevenLabsHttpTTSService`, improving speech consistency across sentences within an LLM response.
- Added word/timestamp pairs to `ElevenLabsHttpTTSService`.
- It is now possible to disable `SoundfileMixer` when created. You can then use `MixerEnableFrame` to dynamically enable it when necessary.
- Added `on_client_connected` and `on_client_disconnected` event handlers to the `DailyTransport` class. These handlers map to the same underlying Daily events as `on_participant_joined` and `on_participant_left`, respectively. This makes it easier to write a single bot pipeline that can also use other transports like `SmallWebRTCTransport` and `FastAPIWebsocketTransport`.
Changed
- `GrokLLMService` now uses `grok-3-beta` as its default model.
- Daily's REST helpers now include an `eject_at_token_exp` param, which ejects the user when their token expires. This new parameter defaults to False. Also, the default value for `enable_prejoin_ui` changed to False and `eject_at_room_exp` changed to False.
- `OpenAILLMService` and `OpenPipeLLMService` now use `gpt-4.1` as their default model.
- `SoundfileMixer` constructor arguments must now be passed as keywords.
Deprecated
- The `DeepgramSTTService` parameter `url` is now deprecated; use `base_url` instead.
Removed
- The parameters `user_kwargs` and `assistant_kwargs` for creating a context aggregator pair with `create_context_aggregator()` have been removed. Use `user_params` and `assistant_params` instead.
Fixed
- Fixed an issue that would cause websocket-based TTS services to not clean up resources properly when disconnecting.
- Fixed a `TavusVideoService` issue that was causing audio choppiness.
- Fixed an issue in `SmallWebRTCTransport` where an error was thrown if the client did not create a video transceiver.
- Fixed an issue where LLM input parameters were not applied correctly in `GoogleVertexLLMService`, causing unexpected behavior during inference.
Other
- Updated the `twilio-chatbot` example to use the auto-hangup feature.
v0.0.63
Added
- Added media resolution control to `GeminiMultimodalLiveLLMService` with the `GeminiMediaResolution` enum, allowing configuration of token usage for image processing (LOW: 64 tokens, MEDIUM: 256 tokens, HIGH: zoomed reframing with 256 tokens).
- Added Gemini's Voice Activity Detection (VAD) configuration to `GeminiMultimodalLiveLLMService` with `GeminiVADParams`, allowing fine control over speech detection sensitivity and timing, including:
  - Start sensitivity (how quickly speech is detected)
  - End sensitivity (how quickly turns end after pauses)
  - Prefix padding (milliseconds of audio to keep before speech is detected)
  - Silence duration (milliseconds of silence required to end a turn)
- Added comprehensive language support to `GeminiMultimodalLiveLLMService`, supporting over 30 languages via the `language` parameter, with proper mapping between Pipecat's `Language` enum and Gemini's language codes.
- Added support in `SmallWebRTCTransport` for detecting when remote tracks are muted.
- Added support for image capture from a video stream to the `SmallWebRTCTransport`.
- Added a new iOS client option to the `SmallWebRTCTransport` video-transform example.
- Added new processors `ProducerProcessor` and `ConsumerProcessor`. The producer processor processes frames from the pipeline and decides whether the consumers should consume them. If so, the same frame that is received by the producer is sent to the consumer. There can be multiple consumers per producer. These processors can be useful to push frames from one part of a pipeline to a different one (e.g. when using `ParallelPipeline`), as in the sketch below.
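A minimal sketch of wiring a producer/consumer pair; the import paths and constructor arguments are assumptions based on this entry:

```python
from pipecat.frames.frames import TTSAudioRawFrame
from pipecat.processors.producer_processor import ProducerProcessor  # assumed path
from pipecat.processors.consumer_processor import ConsumerProcessor  # assumed path

async def is_tts_audio(frame) -> bool:
    # Decide which frames the consumers should receive.
    return isinstance(frame, TTSAudioRawFrame)

producer = ProducerProcessor(filter=is_tts_audio)
consumer = ConsumerProcessor(producer=producer)
# Place `producer` in one branch and `consumer` in another (e.g. inside a
# ParallelPipeline) to ferry matching frames between branches.
```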
- Improvements for the `SmallWebRTCTransport`:
  - Wait until the pipeline is ready before triggering the `connected` event.
  - Queue messages if the data channel is not ready.
  - Update the aiortc dependency to fix an issue where the 'video/rtx' MIME type was incorrectly handled as a codec retransmission.
  - Avoid initial video delays.
Changed
- In `GeminiMultimodalLiveLLMService`, removed the `transcribe_model_audio` parameter in favor of Gemini Live's native output transcription support. Text transcriptions are now produced directly by the model; no configuration is required.
- Updated `GeminiMultimodalLiveLLMService`'s default `model` to `models/gemini-2.0-flash-live-001` and `base_url` to the `v1beta` websocket URL.
Fixed
- Updated `daily-python` to 0.17.0 to fix an issue that prevented running on older platforms.
- Fixed an issue where `CartesiaTTSService`'s spell feature would result in the spelled word appearing in the context as "F,O,O,B,A,R" instead of "FOOBAR".
- Fixed an issue in the Azure TTS services where the language was being set incorrectly.
- Fixed `SmallWebRTCTransport` to support dynamic values for `TransportParams.audio_out_10ms_chunks`. Previously, it only worked with 20ms chunks.
- Fixed an issue with `GeminiMultimodalLiveLLMService` where the assistant context messages had no space between words.
- Fixed an issue where `LLMAssistantContextAggregator` would prevent a `BotStoppedSpeakingFrame` from moving through the pipeline.
v0.0.62
Added
- Added the `TransportParams.audio_out_10ms_chunks` parameter to allow controlling the amount of audio sent by the output transport. It defaults to 2, so 20ms audio chunks are sent.
- Added `QwenLLMService` for Qwen integration with an OpenAI-compatible interface. Added foundational example `14q-function-calling-qwen.py`.
- Added `Mem0MemoryService`. Mem0 is a self-improving memory layer for LLM applications. Learn more at: https://mem0.ai/.
- Added `WhisperSTTServiceMLX` for Whisper transcription on Apple Silicon. See the example in `examples/foundational/13e-whisper-mlx.py`. Latency of a completed transcription using Whisper large-v3-turbo on an M4 MacBook is ~500ms.
- `GladiaSTTService` now has comprehensive support for the latest API config options, including model, language detection, preprocessing, custom vocabulary, custom spelling, translation, and message filtering options.
- Added `SmallWebRTCTransport`, a new P2P WebRTC transport. Created two examples in `p2p-webrtc`:
  - video-transform: Demonstrates sending and receiving audio/video with `SmallWebRTCTransport` using TypeScript. Includes video frame processing with OpenCV.
  - voice-agent: A minimal example of creating a voice agent with `SmallWebRTCTransport`.
- Added support to `ProtobufFrameSerializer` for sending the messages from `TransportMessageFrame` and `TransportMessageUrgentFrame`.
- Added support for a new TTS service, `PiperTTSService` (see https://github.com/rhasspy/piper/).
- It is now possible to tell whether `UserStartedSpeakingFrame` or `UserStoppedSpeakingFrame` have been generated because of emulation frames.
Changed
- `FunctionCallResultFrame`s are now system frames. This prevents function call results from being discarded during interruptions.
- Pipecat services have been reorganized into packages. Each package can have one or more of the following modules (in the future new module names might be needed) depending on the services implemented:
  - image: for image generation services
  - llm: for LLM services
  - memory: for memory services
  - stt: for Speech-To-Text services
  - tts: for Text-To-Speech services
  - video: for video generation services
  - vision: for video recognition services
- Base classes for AI services have been reorganized into modules. They can now be found in `pipecat.services.[ai_service,image_service,llm_service,stt_service,vision_service]`.
- `GladiaSTTService` now uses the `solaria-1` model by default. Other params use Gladia's default values. Added support for more language codes.
Deprecated
- All Pipecat services imports have been deprecated and a warning will be shown when using an old import. The new import should be `pipecat.services.[service].[image,llm,memory,stt,tts,video,vision]`. For example, `from pipecat.services.openai.llm import OpenAILLMService`.
- Importing AI services base classes from `pipecat.services.ai_services` is now deprecated; use one of `pipecat.services.[ai_service,image_service,llm_service,stt_service,vision_service]` instead.
- Deprecated the `language` parameter in `GladiaSTTService.InputParams` in favor of `language_config`, which better aligns with Gladia's API.
- Deprecated using `GladiaSTTService.InputParams` directly. Use the new `GladiaInputParams` class instead.
Fixed
- Fixed a `FastAPIWebsocketTransport` and `WebsocketClientTransport` issue that would cause the transport to be closed prematurely, preventing internally queued audio from being sent. The same issue could also cause an infinite loop when using an output mixer and sending an `EndFrame`, preventing the bot from finishing.
- Fixed an issue that could cause a `TranscriptionUpdateFrame` pushed because of an interruption to be discarded.
- Fixed an issue that would cause `SegmentedSTTService`-based services (e.g. `OpenAISTTService`) to try to transcribe non-spoken audio, causing invalid transcriptions.
- Fixed an issue where `GoogleTTSService` was emitting two `TTSStoppedFrame`s.
Performance
- Output transports now send 40ms audio chunks instead of 20ms. This should improve performance.
- `BotSpeakingFrame`s are now sent every 200ms. If the output transport audio chunks are longer than 200ms, they will be sent at every audio chunk.
Other
- Added foundational example `37-mem0.py` demonstrating how to use the `Mem0MemoryService`.
- Added foundational example `13e-whisper-mlx.py` demonstrating how to use the `WhisperSTTServiceMLX`.
v0.0.61
Added
- Added a new frame, `LLMSetToolChoiceFrame`, which provides a mechanism for modifying the `tool_choice` in the context.
- Added `GroqTTSService`, which provides text-to-speech functionality using Groq's API.
- Added support in `DailyTransport` for updating remote participants' `canReceive` permission via the `update_remote_participants()` method, by bumping the daily-python dependency to >= 0.16.0.
- ElevenLabs TTS services now support a sample rate of 8000.
- Added support for `instructions` in `OpenAITTSService`.
- Added support for `base_url` in `OpenAIImageGenService` and `OpenAITTSService`.
Fixed
- Fixed an issue in `RTVIObserver` that prevented handling of Google LLM context messages. The observer now processes both OpenAI-style and Google-style contexts.
- Fixed an issue in Daily involving switching virtual devices, by bumping the daily-python dependency to >= 0.16.1.
- Fixed a `GoogleAssistantContextAggregator` issue where function call placeholders were not being updated when the function call result was something other than a string.
- Fixed an issue that would cause `LLMAssistantContextAggregator` to block processing more frames while processing a function call result.
- Fixed an issue where the `RTVIObserver` would report two bot started and stopped speaking events for each bot turn.
- Fixed an issue in `UltravoxSTTService` that caused improper audio processing and incorrect LLM frame output.
Other
- Added `examples/foundational/07x-interruptible-local.py` to show how a local transport can be used.
v0.0.60
Added
- Added a `default_headers` parameter to the `BaseOpenAILLMService` constructor.
Changed
- Rolled back to `deepgram-sdk` 3.8.0, since 3.10.1 was causing connection issues.
- Changed the default `InputAudioTranscription` model to `gpt-4o-transcribe` for `OpenAIRealtimeBetaLLMService`.
Other
- Updated the `19-openai-realtime-beta.py` and `19a-azure-realtime-beta.py` examples to use the FunctionSchema format.
v0.0.59
Added
- When registering a function call, it is now possible to indicate whether you want the function call to be cancelled if there's a user interruption, via `cancel_on_interruption` (defaults to False). This is now possible because function calls are executed concurrently. See the sketch below.
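A sketch of opting in; `register_function` and `cancel_on_interruption` are named above, and the handler uses the positional signature current as of this release:

```python
async def get_news(function_name, tool_call_id, args, llm, context, result_callback):
    # A long-running call that is safe to abandon if the user interrupts.
    await result_callback({"headlines": ["..."]})

llm.register_function("get_news", get_news, cancel_on_interruption=True)
```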
- Added support for detecting idle pipelines. By default, if no activity has been detected for 5 minutes, the `PipelineTask` will be automatically cancelled. It is possible to override this behavior by passing `cancel_on_idle_timeout=False`. It is also possible to change the default timeout with `idle_timeout_secs`, or the frames that prevent the pipeline from being idle with `idle_timeout_frames`. Finally, an `on_idle_timeout` event handler will be triggered if the idle timeout is reached (whether the pipeline task is cancelled or not). See the sketch below.
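A sketch of tuning idle detection; the parameter and event names are from this entry, while using `BotSpeakingFrame` as the activity signal is illustrative:

```python
from pipecat.frames.frames import BotSpeakingFrame
from pipecat.pipeline.task import PipelineTask

task = PipelineTask(
    pipeline,                             # your Pipeline instance
    idle_timeout_secs=120,                # default is 5 minutes
    idle_timeout_frames=(BotSpeakingFrame,),
    cancel_on_idle_timeout=False,         # notify instead of cancelling
)

@task.event_handler("on_idle_timeout")
async def on_idle_timeout(task):
    print("No activity detected; consider ending the session")
```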
- Added `FalSTTService`, which provides STT using Fal's Wizper API.
- Added a `reconnect_on_error` parameter to websocket-based TTS services, as well as an `on_connection_error` event handler. `reconnect_on_error` indicates whether the TTS service should reconnect on error. `on_connection_error` will always be called if there's any error, no matter the value of `reconnect_on_error`. This allows, for example, falling back to a different TTS provider if something goes wrong with the current one.
- Added a new `SkipTagsAggregator` that extends `BaseTextAggregator` to aggregate text and skip end-of-sentence matching if the aggregated text is between start/end tags.
- Added a new `PatternPairAggregator` that extends `BaseTextAggregator` to identify content between matching pattern pairs in streamed text. This allows for detection and processing of structured content like XML-style tags that may span multiple text chunks or sentence boundaries.
- Added a new `BaseTextAggregator`. Text aggregators are used by the TTS service to aggregate LLM tokens and decide when the aggregated text should be pushed to the TTS service. They also allow the text to be manipulated while it's being aggregated. A text aggregator can be passed via `text_aggregator` to the TTS service.
- Added a new `sample_rate` constructor parameter to `TavusVideoService` to allow changing the output sample rate.
- Added a new `NeuphonicTTSService` (see https://neuphonic.com).
- Added a new `UltravoxSTTService` (see https://github.com/fixie-ai/ultravox).
- Added `on_frame_reached_upstream` and `on_frame_reached_downstream` event handlers to `PipelineTask`. These events will be called when a frame reaches the beginning or end of the pipeline, respectively. Note that by default the event handlers will not be called unless a filter is set with `PipelineTask.set_reached_upstream_filter()` or `PipelineTask.set_reached_downstream_filter()`. See the sketch below.
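A sketch of observing frames that reach the end of the pipeline; the filter must be set first or the handler never fires, per this entry (the handler signature and the tuple-of-frame-types filter are assumptions):

```python
from pipecat.frames.frames import TranscriptionFrame

task.set_reached_downstream_filter((TranscriptionFrame,))

@task.event_handler("on_frame_reached_downstream")
async def on_frame_reached_downstream(task, frame):
    print(f"Reached the end of the pipeline: {frame}")
```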
- Added support for Chirp voices in `GoogleTTSService`.
- Added a `flush_audio()` method to `FishTTSService` and `LmntTTSService`.
- Added a `set_language` convenience method to `GoogleSTTService`, allowing you to set a single language. This is in addition to the `set_languages` method, which allows you to set a list of languages.
- Added `on_user_turn_audio_data` and `on_bot_turn_audio_data` to `AudioBufferProcessor`. This gives the ability to grab the audio of only that turn for both the user and the bot.
- Added a new base class `BaseObject`, which is now the base class of `FrameProcessor`, `PipelineRunner`, `PipelineTask` and `BaseTransport`. The new `BaseObject` adds support for event handlers.
- Added support for a unified format for specifying function calling across all LLM services:

```python
# Import paths assume pipecat's adapter schemas module.
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema

weather_function = FunctionSchema(
    name="get_current_weather",
    description="Get the current weather",
    properties={
        "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA",
        },
        "format": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"],
            "description": "The temperature unit to use. Infer this from the user's location.",
        },
    },
    required=["location"],
)
tools = ToolsSchema(standard_tools=[weather_function])
```
- Added a `speech_threshold` parameter to `GladiaSTTService`.
- Allow passing user (`user_kwargs`) and assistant (`assistant_kwargs`) context aggregator parameters when using `create_context_aggregator()`. The values are passed as a mapping that will then be converted to arguments.
- Added `speed` as an `InputParam` for both `ElevenLabsTTSService` and `ElevenLabsHttpTTSService`.
- Added a new `LLMFullResponseAggregator` to aggregate full LLM completions. At every completion the `on_completion` event handler is triggered.
- Added a new frame, `RTVIServerMessageFrame`, and RTVI message `RTVIServerMessage`, which provide a generic mechanism for sending custom messages from server to client. The `RTVIServerMessageFrame` is processed by the `RTVIObserver` and will be delivered to the client's `onServerMessage` callback or `ServerMessage` event.
- Added `GoogleLLMOpenAIBetaService` for Google LLM integration with an OpenAI-compatible interface. Added foundational example `14o-function-calling-gemini-openai-format.py`.
- Added `AzureRealtimeBetaLLMService` to support Azure's OpenAI Realtime API. Added foundational example `19a-azure-realtime-beta.py`.
- Introduced `GoogleVertexLLMService`, a new class for integrating with Vertex AI Gemini models. Added foundational example `14p-function-calling-gemini-vertex-ai.py`.
- Added support in `OpenAIRealtimeBetaLLMService` for a slate of new features:
  - The `'gpt-4o-transcribe'` input audio transcription model, along with new `language` and `prompt` options specific to that model.
  - The `input_audio_noise_reduction` session property:

    ```python
    session_properties = SessionProperties(
        # ...
        input_audio_noise_reduction=InputAudioNoiseReduction(
            type="near_field"  # also supported: "far_field"
        ),
        # ...
    )
    ```

  - The `'semantic_vad'` `turn_detection` session property value, a more sophisticated model for detecting when the user has stopped speaking.
  - `on_conversation_item_created` and `on_conversation_item_updated` events:

    ```python
    @llm.event_handler("on_conversation_item_created")
    async def on_conversation_item_created(llm, item_id, item):
        ...

    @llm.event_handler("on_conversation_item_updated")
    async def on_conversation_item_updated(llm, item_id, item):
        # `item` may not always be available here
        ...
    ```

  - The `retrieve_conversation_item(item_id)` method for introspecting a conversation item on the server:

    ```python
    item = await llm.retrieve_conversation_item(item_id)
    ```
Changed
- Updated `OpenAISTTService` to use `gpt-4o-transcribe` as the default transcription model.
- Updated `OpenAITTSService` to use `gpt-4o-mini-tts` as the default TTS model.
- Function calls are now executed in tasks. This means that the pipeline will not be blocked while the function call is being executed.
- ⚠️ `PipelineTask` will now be automatically cancelled if no bot activity is happening in the pipeline. There are a few settings to configure this behavior; see the `PipelineTask` documentation for more details.
- All event handlers are now executed in separate tasks in order to prevent blocking the pipeline. Previously, if an event handler took some time to execute, the pipeline would be blocked waiting for the event handler to complete.
- Updated `TranscriptProcessor` to support text output from `OpenAIRealtimeBetaLLMService`.
- `OpenAIRealtimeBetaLLMService` and `GeminiMultimodalLiveLLMService` now push a `TTSTextFrame`.
- Updated the default model for `CartesiaTTSService` and `CartesiaHttpTTSService` to `sonic-2`.
Deprecated
- Passing a `start_callback` to `LLMService.register_function()` is now deprecated; simply move the code from the start callback to the function call.
- The `TTSService` parameter `text_filter` is now deprecated; use `text_filters` instead, which takes a list. This allows passing multiple filters that will be executed in order.
Removed
- Removed deprecated `audio.resample_audio()`; use `create_default_resampler()` instead.
- Removed deprecated `stt_service` parameter from `STTMuteFilter`.
- Removed deprecated RTVI processors; use an `RTVIObserver` instead.
- Removed deprecated `AWSTTSService`; use `PollyTTSService` instead.
- Removed deprecated field `tier` from `DailyTranscriptionSettings`; use `model` instead.
- Removed deprecated `pipecat.vad` package; use `pipecat.audio.vad` instead.
Fixed
- Fixed an assistant aggregator issue that could cause assistant text to be split into multiple chunks during function calls.
- Fixed an assistant aggregator issue that was causing assistant text to not be added to the context during function calls. This could lead to duplications.
- Fixed a `SegmentedSTTService` issue that was causing audio to be sent prematurely to the STT service. Instead of analyzing the volume in this service, we rely on VAD events, which use both VAD and volume.
- Fixed a `GeminiMultimodalLiveLLMService` issue that was causing messages to be duplicated in the context when pushing `LLMMessagesAppendFrame` frames.
- Fixed an issue with `SegmentedSTTService`-based services (e.g. `GroqSTTService`) that did not allow audio to pass through downstream.
- Fixed a `CartesiaTTSService` and `RimeTTSService` issue that would consider text between spelling-out tags an end of sentence.
- Fixed a `match_endofsentence` issue that would result in floating point numbers being considered an end of sentence.
- Fixed a `match_endofsentence` issue that would result in emails being considered an end of sentence.
- Fixed an issue w...