Releases: pipecat-ai/pipecat
v0.0.83
Added
- Added new frames `InputTransportMessageUrgentFrame` and `DailyInputTransportMessageUrgentFrame` for transport messages received from external sources.
- Added `UserSpeakingFrame`. This will be sent upstream and downstream while VAD detects the user is speaking.
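  A minimal sketch (not from the release notes) of a custom processor that reacts to the new frame; the import paths assume pipecat's usual module layout:

  ```python
  from pipecat.frames.frames import Frame, UserSpeakingFrame
  from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


  class SpeakingIndicator(FrameProcessor):
      """Logs while VAD reports that the user is speaking."""

      async def process_frame(self, frame: Frame, direction: FrameDirection):
          await super().process_frame(frame, direction)
          if isinstance(frame, UserSpeakingFrame):
              # Sent repeatedly while the user is speaking; keep handling cheap.
              print("user is speaking")
          await self.push_frame(frame, direction)
  ```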
- Expanded support for the universal `LLMContext` to more LLM services. Using the universal `LLMContext` and the associated `LLMContextAggregatorPair` is a prerequisite for using `LLMSwitcher` to switch between LLMs at runtime. Here are the newly supported services:
  - Azure
  - Cerebras
  - Deepseek
  - Fireworks AI
  - Google Vertex AI
  - Grok
  - Groq
  - Mistral
  - NVIDIA NIM
  - Ollama
  - OpenPipe
  - OpenRouter
  - Perplexity
  - Qwen
  - SambaNova
  - Together.ai
- Added support for WhatsApp User-initiated Calls.
- Added a new audio filter, `AICFilter`: speech enhancement for improving VAD/STT performance, with no ONNX dependency. See https://ai-coustics.com/sdk/
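  A hedged sketch of wiring an input audio filter into a transport; the `AICFilter` module path and constructor arguments are assumptions, while `audio_in_filter` is the usual pipecat transport parameter for input filters:

  ```python
  from pipecat.audio.filters.aic_filter import AICFilter  # module path assumed
  from pipecat.transports.base_transport import TransportParams

  params = TransportParams(
      audio_in_enabled=True,
      audio_in_filter=AICFilter(),  # enhance speech before VAD/STT
  )
  ```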
- Added a timeout around cancelling input tasks to prevent indefinite hangs when cancellation is swallowed by third-party code.
- Added `pipecat.extensions.ivr` for automated IVR system navigation with configurable goals and conversation handling. Supports DTMF input, verbal responses, and intelligent menu traversal.

  Basic usage:

  ```python
  from pipecat.extensions.ivr.ivr_navigator import IVRNavigator, IVRStatus  # IVRStatus import path assumed

  # Create IVR navigator with your goal
  ivr_navigator = IVRNavigator(
      llm=llm_service,
      ivr_prompt="Navigate to billing department to dispute a charge",
  )

  # Handle different outcomes
  @ivr_navigator.event_handler("on_conversation_detected")
  async def on_conversation(processor, conversation_history):
      # Switch to normal conversation mode
      pass

  @ivr_navigator.event_handler("on_ivr_status_changed")
  async def on_ivr_status(processor, status):
      if status == IVRStatus.COMPLETED:
          # End pipeline, transfer call, or start bot conversation
          pass
      elif status == IVRStatus.STUCK:
          # Handle navigation failure
          pass
  ```
- `BaseOutputTransport` now implements `write_dtmf()` by loading DTMF audio and sending it through the transport. This makes sending DTMF generic across all output transports.
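  A hedged sketch of queuing a DTMF tone from a pipeline task; `OutputDTMFFrame` and its `button` field are assumptions based on typical pipecat usage, while `KeypadEntry` comes from the notes below:

  ```python
  from pipecat.audio.dtmf.types import KeypadEntry
  from pipecat.frames.frames import OutputDTMFFrame  # frame name assumed

  # Any output transport can now play this via the generic write_dtmf().
  await task.queue_frame(OutputDTMFFrame(button=KeypadEntry.ONE))
  ```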
Added new config parameters to
GladiaSTTService
.- PreProcessingConfig >
audio_enhancer
to enhance audio quality. - CustomVocabularyItem >
pronunciations
andlanguage
to specify special pronunciations and in which language it will be pronounced.
Changed
- `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` are now also pushed upstream.
- `ParallelPipeline` now waits for `CancelFrame` to finish in all branches before pushing it downstream.
- Added `sip_codecs` to `DailyRoomSipParams`.
- Updated the `configure()` function in `pipecat.runner.daily` to include new args for creating SIP-enabled rooms. Additionally, added new args to control the room and token expiration durations.
- `pipecat.frames.frames.KeypadEntry` is deprecated and has been moved to `pipecat.audio.dtmf.types.KeypadEntry`.
- Updated `RimeTTSService`'s flush_audio message to conform with Rime's official API.
- Updated the default model for `CerebrasLLMService` to GPT-OSS-120B.
Removed
- Removed `StopInterruptionFrame`. This was a legacy frame that was not really used anywhere and didn't provide any useful meaning. It was only pushed after `UserStoppedSpeakingFrame`, so developers can simply use `UserStoppedSpeakingFrame` instead.
- `DailyTransport.write_dtmf()` has been removed in favor of the generic `BaseOutputTransport.write_dtmf()`.
- Removed the deprecated `DailyTransport.send_dtmf()`.
Deprecated
- Transports have been re-organized:

  ```
  pipecat.transports.network.small_webrtc        -> pipecat.transports.smallwebrtc.transport
  pipecat.transports.network.webrtc_connection   -> pipecat.transports.smallwebrtc.connection
  pipecat.transports.network.websocket_client    -> pipecat.transports.websocket.client
  pipecat.transports.network.websocket_server    -> pipecat.transports.websocket.server
  pipecat.transports.network.fastapi_websocket   -> pipecat.transports.websocket.fastapi
  pipecat.transports.services.daily              -> pipecat.transports.daily.transport
  pipecat.transports.services.helpers.daily_rest -> pipecat.transports.daily.utils
  pipecat.transports.services.livekit            -> pipecat.transports.livekit.transport
  pipecat.transports.services.tavus              -> pipecat.transports.tavus.transport
  ```

- `pipecat.frames.frames.KeypadEntry` is deprecated; use `pipecat.audio.dtmf.types.KeypadEntry` instead.
Fixed
- Fixed an issue where messages received from the transport were always being resent.
- Fixed `SmallWebRTCTransport` to not use `mid` to decide whether the transceiver should be `sendrecv`.
- Fixed an issue where Deepgram swallowed `asyncio.CancelledError` during disconnect, preventing tasks from being cancelled.
- Fixed an issue where `PipelineTask` was not cleaning up observers.
Performance
- Reduced latency and improved memory performance in `Mem0MemoryService`.
v0.0.82
Added
- Added a new `LLMRunFrame` to trigger an LLM response:

  ```python
  await task.queue_frames([LLMRunFrame()])
  ```

  This replaces `OpenAILLMContextFrame`, which you'd previously typically use like this:

  ```python
  await task.queue_frames([context_aggregator.user().get_context_frame()])
  ```

  Use this way of kicking off your conversation when you've already initialized your context and are simply instructing the bot when to go:

  ```python
  context = OpenAILLMContext(messages, tools)
  context_aggregator = llm.create_context_aggregator(context)

  # ...

  @transport.event_handler("on_client_connected")
  async def on_client_connected(transport, client):
      # Kick off the conversation.
      await task.queue_frames([LLMRunFrame()])
  ```

  Note that if you want to add new messages when kicking off the conversation, you could use `LLMMessagesAppendFrame` with `run_llm=True` instead:

  ```python
  @transport.event_handler("on_client_connected")
  async def on_client_connected(transport, client):
      # Kick off the conversation.
      await task.queue_frames([LLMMessagesAppendFrame(new_messages, run_llm=True)])
  ```

  In the rare case you don't have a context aggregator in your pipeline, then you may continue using a context frame.
- Added support for switching between audio+text and text-only modes within the same pipeline. This is done by pushing `LLMConfigureOutputFrame(skip_tts=True)` to enter text-only mode, and pushing it with `skip_tts` disabled to return to audio+text. The LLM will still generate tokens and add them to the context, but they will not be sent to TTS.
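  A hedged sketch of toggling the mode at runtime; pushing the frame from the pipeline task is an assumption, and `skip_tts=False` is assumed to be how the setting is disabled:

  ```python
  await task.queue_frame(LLMConfigureOutputFrame(skip_tts=True))   # text-only
  # ... later ...
  await task.queue_frame(LLMConfigureOutputFrame(skip_tts=False))  # back to audio+text
  ```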
- Added a `skip_tts` field to `TextFrame`. This lets a text frame bypass TTS while still being included in the LLM context. Useful for cases like structured text that isn't meant to be spoken but should still contribute to context.
- Added a `cancel_timeout_secs` argument to `PipelineTask`, which defines how long the pipeline has to complete cancellation. When `PipelineTask.cancel()` is called, a `CancelFrame` is pushed through the pipeline and must reach the end. If it does not reach the end within the specified time, a warning is shown and the wait is aborted.
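  A hedged sketch of bounding cancellation; the 5-second value is illustrative, not a documented default:

  ```python
  task = PipelineTask(
      pipeline,
      params=PipelineParams(allow_interruptions=True),
      cancel_timeout_secs=5.0,
  )
  ```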
Added a new "universal" (LLM-agnostic)
LLMContext
and accompanyingLLMContextAggregatorPair
, which will eventually replaceOpenAILLMContext
(and the other under-the-hood contexts) and the other context aggregators. The new universalLLMContext
machinery allows a single context to be shared between different LLMs, enabling runtime LLM switching and scenarios like failover.From the developer's point of view, switching to using the new universal context machinery will usually be a matter of going from this:
context = OpenAILLMContext(messages, tools) context_aggregator = llm.create_context_aggregator(context)
To this:
context = LLMContext(messages, tools) context_aggregator = LLMContextAggregatorPair(context)
To start, the universal
LLMContext
is supported with the following LLM services:OpenAILLMService
GoogleLLMService
-
Added a new
LLMSwitcher
class to enable runtime LLM switching, built atop a new genericServiceSwitcher
.Switchers take a switching strategy. The first available strategy is
ServiceSwitcherStrategyManual
.To switch LLMs at runtime, the LLMs must be sharing one instance of the new universal
LLMContext
(see above bullet).# Instantiate your LLM services llm_openai = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY")) llm_google = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY")) # Instantiate a switcher # (ServiceSwitcherStrategyManual defaults to OpenAI, as it's first in the list) llm_switcher = LLMSwitcher( llms=[llm_openai, llm_google], strategy_type=ServiceSwitcherStrategyManual ) # Create your pipeline pipeline = Pipeline( [ transport.input(), stt, context_aggregator.user(), llm_switcher, tts, transport.output(), context_aggregator.assistant(), ] ) task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True)) # ... # Whenever is appropriate, switch LLMs! await task.queue_frames([ManuallySwitchServiceFrame(service=llm_google)])
- Added an `LLMService.run_inference()` method to LLM services to enable direct, out-of-band (i.e. out-of-pipeline) inference.
Changed
- Updated `daily-python` to 0.19.8.
- `PipelineTask` now waits for `StartFrame` to reach the end of the pipeline before pushing any other frames.
- Updated `CartesiaTTSService` and `CartesiaHttpTTSService` to align with Cartesia's changes to the `speed` parameter. It now only accepts one of `slow`, `normal`, or `fast`.
- Added support to `AWSBedrockLLMService` for setting authentication credentials through environment variables.
- Updated `SarvamTTSService` to use WebSocket streaming for real-time audio generation with multiple Indian languages, with HTTP support still available via `SarvamHttpTTSService`.
Fixed
- Fixed an RTVI issue that was causing frames to be pushed before the pipeline was properly initialized.
- Fixed some `get_messages_for_logging()` implementations that were returning a JSON string instead of a list.
- Fixed a `DailyTransport` issue that prevented DTMF tones from being sent.
- Fixed a missing import in `SentryMetrics`.
- Fixed `AWSPollyTTSService` to support the AWS credential provider chain (IAM roles, IRSA, instance profiles) instead of requiring explicit environment variables.
- Fixed a `CartesiaTTSService` issue that was causing the application to hang after Cartesia's 5-minute timeout.
- Fixed an issue preventing `SpeechmaticsSTTService` from transcribing audio.
v0.0.81
Added
- Added `pipecat.extensions.voicemail`, a module for detecting voicemail vs. live conversation, primarily intended for use in outbound calling scenarios. The voicemail module is optimized for text LLMs only.
- Added new frames to the `idle_timeout_frames` arg: `TranscriptionFrame`, `InterimTranscriptionFrame`, `UserStartedSpeakingFrame`, and `UserStoppedSpeakingFrame`. These additions serve as indicators of user activity in the pipeline idle detection logic.
- Allow passing custom pipeline sink and source processors to a `Pipeline`. Pipeline source and sink processors are used to observe and control what's coming in and out of a `Pipeline` processor.
- Added `FrameProcessor.pause_processing_system_frames()` and `FrameProcessor.resume_processing_system_frames()`. These allow pausing and resuming the processing of system frames.
- Added a new `on_process_frame()` observer method, which makes it possible to know when a frame is being processed.
- Added a new `FrameProcessor.entry_processor()` method. This allows you to access the first non-compound processor in a pipeline.
- Added `FrameProcessor` properties `processors`, `next`, and `previous`.
- `ElevenLabsTTSService` now supports additional runtime changes to the `model`, `language`, and `voice_settings` parameters.
- Added `apply_text_normalization` support to `ElevenLabsTTSService` and `ElevenLabsHttpTTSService`.
- Added `MistralLLMService`, which uses Mistral's chat completion API.
- Added the ability to retry executing a chat completion after a timeout period for `OpenAILLMService` and its subclasses, `AnthropicLLMService`, and `AWSBedrockLLMService`. The LLM services accept new args: `retry_timeout_secs` and `retry_on_timeout`. This feature is disabled by default.
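  A hedged sketch of opting in to the retry behavior; the 10-second value is illustrative, and combining these args with other constructor kwargs is an assumption:

  ```python
  llm = OpenAILLMService(
      api_key=os.getenv("OPENAI_API_KEY"),
      retry_on_timeout=True,
      retry_timeout_secs=10.0,
  )
  ```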
Changed
- Updated `daily-python` to 0.19.7.
Deprecated
- `FrameProcessor.wait_for_task()` is deprecated. Use `await task` or `await asyncio.wait_for(task, timeout)` instead.
Removed
- Watchdog timers have been removed. They were introduced in 0.0.72 to help diagnose pipeline freezes. Unfortunately, they proved ineffective since they required developers to use Pipecat-specific queues, iterators, and events to correctly reset the timer, which limited their usefulness and added friction.
- Removed unused `FrameProcessor.set_parent()` and `FrameProcessor.get_parent()`.
Fixed
- Fixed an issue that would cause `PipelineRunner` and `PipelineTask` to not handle external asyncio task cancellation properly.
- Added `SpeechmaticsSTTService` exception handling on connection and sending.
- Replaced `asyncio.wait_for()` with `wait_for2.wait_for()` for Python < 3.12 because of issues regarding task cancellation (i.e. cancellation is never propagated). See https://bugs.python.org/issue42130
- Fixed an `AudioBufferProcessor` issue that would cause audio overlap when setting a max buffer size.
- Fixed an issue where `AsyncAITTSService` had very high response latency by adding `force=true` when sending the flush command.
Performance
- Improved `PipelineTask` performance by using direct mode processors and by removing unnecessary tasks.
- Improved `ParallelPipeline` performance by using direct mode, by not creating a task for each frame and each sub-pipeline, and by removing other unnecessary tasks.
- `Pipeline` performance improvements by using direct mode.
Other
- Added `14w-function-calling-mistal.py` using `MistralLLMService`.
- Added `13j-azure-transcription.py` using `AzureSTTService`.
v0.0.80
Added
- Added `GeminiTTSService`, which uses Google Gemini to generate TTS output. The Gemini model can be prompted to insert styled speech to control the TTS output.
- Added Exotel support to Pipecat's development runner. You can now connect using the runner with `uv run bot.py -t exotel` and an ngrok connection to HTTP port 7860.
- Added an `enable_direct_mode` argument to `FrameProcessor`. Direct mode is for processors that require very little I/O or compute resources, that is, processors that can perform their task almost immediately. These types of processors don't need any of the internal tasks and queues usually created by frame processors, which means overall application performance might be slightly improved. Use with care.
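  A hedged sketch of a near-instant processor opting in to direct mode; forwarding `enable_direct_mode` to the `FrameProcessor` constructor is assumed to be the intended usage:

  ```python
  class FrameCounter(FrameProcessor):
      def __init__(self):
          super().__init__(enable_direct_mode=True)
          self._count = 0

      async def process_frame(self, frame, direction):
          await super().process_frame(frame, direction)
          self._count += 1  # trivial work, safe for direct mode
          await self.push_frame(frame, direction)
  ```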
- Added TTFB metrics for `HeyGenVideoService` and `TavusVideoService`.
- Added an `endpoint_id` parameter to `AzureSTTService` (custom Endpoint ID).
Changed
- `WatchdogPriorityQueue` now requires inserted items to always be tuples, and the tuple size needs to be specified when creating the queue via the constructor's `tuple_size` argument.
- Updated Moondream to revision `2025-01-09`.
- Updated `PlayHTHttpTTSService` to no longer use the `pyht` client, removing compatibility issues with other packages. Now you can use the PlayHT HTTP service with other services, like `GoogleLLMService`.
- Updated `pyproject.toml` to once again pin `numba` to `>=0.61.2` in order to resolve package versioning issues.
- Updated `STTMuteFilter` to include `VADUserStartedSpeakingFrame` and `VADUserStoppedSpeakingFrame` in the list of frames to filter when filtering is on.
Performance
- Improved the latency of `HeyGenVideoService`.
- Improved the performance of some frame processors by using the new frame processor direct mode. In direct mode, a frame processor processes frames right away, avoiding the need for internal queues and tasks. This is useful for some simple processors. For example, in processors that wrap other processors (e.g. `Pipeline`, `ParallelPipeline`), we add one processor before and one after the wrapped processors (internally, you will see them as sources and sinks). These sources and sinks don't do any special processing; they basically forward frames. So, for these simple processors we now enable the new direct mode, which avoids creating any internal tasks (and queues) and therefore improves performance.
Fixed
- Fixed an issue with `BaseWhisperSTTService` where the language was specified as an enum and not a string.
- Fixed an issue where `SmallWebRTCTransport` ended before TTS finished.
- Fixed an issue in `OpenAIRealtimeBetaLLMService` where specifying a `text` modality in `modalities` didn't result in text being output from the model.
- Added SSML reserved character escaping to `AzureBaseTTSService` to properly handle special characters in text sent to Azure TTS. This fixes an issue where characters like `&`, `<`, `>`, `"`, and `'` in LLM-generated text would cause TTS failures.
- Fixed a `WatchdogPriorityQueue` issue that could cause an exception when comparing watchdog cancel sentinel items with other items in the queue.
- Fixed an issue that would cause system frames to not be processed with higher priority than other frames. This could cause slower interruption times.
- Fixed an issue where retrying after a websocket connection error would result in an error.
Other
- Added foundational example `19b-openai-realtime-beta-text.py`, showing how to use `OpenAIRealtimeBetaLLMService` to output text to a TTS service.
- Added vision support to release evals so we can run the foundational examples 12 series.
- Added foundational example `15a-switch-languages.py` to release evals. It is able to detect whether we switched the language properly.
- Updated foundational examples to show how to enclose complex logic (e.g. `ParallelPipeline`) into a single processor so the main pipeline becomes simpler.
- Added `07n-interruptible-gemini.py`, demonstrating how to use `GeminiTTSService`.
v0.0.79
v0.0.78
Added
- Added `enable_emulated_vad_interruptions` to `LLMUserAggregatorParams`. When user speech is emulated (e.g. when a transcription is received but VAD doesn't detect speech), this parameter controls whether the emulated speech can interrupt the bot. Default is False (emulated speech is ignored while the bot is speaking).
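  A hedged sketch of enabling the behavior; passing the params via `create_context_aggregator(user_params=...)` and the import path are assumptions about the surrounding API:

  ```python
  from pipecat.processors.aggregators.llm_response import LLMUserAggregatorParams

  context_aggregator = llm.create_context_aggregator(
      context,
      user_params=LLMUserAggregatorParams(enable_emulated_vad_interruptions=True),
  )
  ```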
- Added new `handle_sigint` and `handle_sigterm` args to `RunnerArguments`. This allows applications to know what settings they should use for the environment they are running in. Also, added `pipeline_idle_timeout_secs` to be able to control the `PipelineTask` idle timeout.
- Added a `processor` field to `ErrorFrame` to indicate the `FrameProcessor` that generated the error.
- Added new language support for `AWSTranscribeSTTService`. All languages supporting streaming input are now supported: https://docs.aws.amazon.com/transcribe/latest/dg/supported-languages.html
- Added support for Simli Trinity Avatars. A new `is_trinity_avatar` parameter has been introduced to specify whether the provided `faceId` corresponds to a Trinity avatar, which is required for optimal Trinity avatar performance.
- The development runner now handles custom `body` data for `DailyTransport`. The `body` data is passed to the Pipecat client. You can POST to the `/start` endpoint with a request body of:

  ```json
  {
    "createDailyRoom": true,
    "dailyRoomProperties": { "start_video_off": true },
    "body": { "custom_data": "value" }
  }
  ```

  The `body` information is parsed and used in the application. The `dailyRoomProperties` are currently not handled.
- Added detailed latency logging to `UserBotLatencyLogObserver`, capturing the average response time between user stop and bot start, as well as minimum and maximum response latency.
- Added Chinese, Japanese, and Korean word timestamp support to `CartesiaTTSService`.
- Added a `region` parameter to `GladiaSTTService`. Accepted values: `eu-west` (default), `us-west`.
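  A hedged sketch of selecting the region; passing `region` directly to the service constructor is an assumption, while the parameter name and accepted values come from the note above:

  ```python
  stt = GladiaSTTService(api_key=os.getenv("GLADIA_API_KEY"), region="us-west")
  ```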
Changed
- System frames are now queued. Before, system frames could be generated from any task with no guaranteed order, which caused undesired behavior. Also, it was possible to run into rare recursion issues because of the way system frames were executed (they were executed in place, meaning a call to `push_frame()` would only finish after the system frame had traversed the whole pipeline). This makes system frames more deterministic.
- Changed the default model for both `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` to `eleven_turbo_v2_5`. The rationale for this change is that the Turbo v2.5 model exhibits the most stable voice quality along with very low TTFB latency; latencies are on par with the Flash v2.5 model. Also, the Turbo v2.5 model outputs word/timestamp alignment data with correct spacing.
- The development runner's `/connect` and `/start` endpoints now both return `dailyRoom` and `dailyToken` in place of the previous `room_url` and `token`.
- Updated the `pipecat.runner.daily` utility to only take the `DAILY_API_URL` and `DAILY_SAMPLE_ROOM_URL` environment variables instead of parsing the `-u` and `-k` args, respectively.
- Updated `daily-python` to 0.19.6.
- Changed `TavusVideoService` to send audio or video frames only after the transport is ready, preventing warning messages at startup.
- The development runner now strips any provided protocol (e.g. `https://`) from the proxy address and issues a warning. It also strips trailing `/` characters.
Deprecated
- In `pipecat.runner.daily`, the `configure_with_args()` function is deprecated. Use the `configure()` function instead.
- The development runner's `/connect` endpoint is deprecated and will be removed in a future version. Use the `/start` endpoint in its place. In the meantime, both endpoints work and deliver equivalent functionality.
Fixed
- Fixed a `DailyTransport` issue that would result in an unhandled `concurrent.futures.CancelledError` when a future is cancelled.
- Fixed a `RivaSTTService` issue that would result in an unhandled `concurrent.futures.CancelledError` when a future is cancelled while reading audio chunks from the incoming audio stream.
- Fixed an issue in `BaseOutputTransport`, mainly reproducible with `FastAPIWebsocketOutputTransport` when the audio mixer was enabled, where the loop could consume 100% CPU by continuously returning without delay, preventing other asyncio tasks (such as cancellation or shutdown signals) from being processed.
- Fixed an issue where `BotStartedSpeakingFrame` and `BotStoppedSpeakingFrame` were not emitted when using `TavusVideoService` or `HeyGenVideoService`.
- Fixed an issue in `LiveKitTransport` where empty `AudioRawFrame`s were pushed down the pipeline. This resulted in warnings from the STT processor.
- Fixed `PiperTTSService` to send text as a JSON object in the request body, resolving compatibility with Piper's HTTP API.
- Fixed an issue with `TavusVideoService` where an error was thrown due to missing transcription callbacks.
- Fixed an issue in `SpeechmaticsSTTService` where the `user_id` was set to `None` when diarization was not enabled.
Performance
- Fixed an issue in `TaskObserver` (a proxy to all observers) that was degrading global performance.
Other
- Added `07aa-interruptible-soniox.py`, `07ab-interruptible-inworld-http.py`, `07ac-interruptible-asyncai.py`, and `07ac-interruptible-asyncai-http.py` release evals.
v0.0.77
Added
- Added the `InputTextRawFrame` frame type to handle user text input with Gemini Multimodal Live.
- Added `HeyGenVideoService`. This is an integration for HeyGen Interactive Avatar: a video service that handles audio streaming and requests HeyGen to generate avatar video responses. (See https://www.heygen.com/)
- Added the ability to switch voices to `RimeTTSService`.
- Added a unified development runner for building voice AI bots across multiple transports:
  - `pipecat.runner.run` – FastAPI-based development server with automatic bot discovery
  - `pipecat.runner.types` – Runner session argument types (`DailyRunnerArguments`, `SmallWebRTCRunnerArguments`, `WebSocketRunnerArguments`)
  - `pipecat.runner.utils.create_transport()` – Factory function for creating transports from session arguments
  - `pipecat.runner.daily` and `pipecat.runner.livekit` – Configuration utilities for Daily and LiveKit setups
  - Support for all transport types: Daily, WebRTC, Twilio, Telnyx, Plivo
  - Automatic telephony provider detection and serializer configuration
  - ESP32 WebRTC compatibility with SDP munging
  - Environment detection (`ENV=local`) for conditional features
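  A hedged sketch of a runner-based bot entry point; the `bot()` discovery convention and the `create_transport()` call shape reflect typical usage, but the exact signatures and the contents of `transport_params` are assumptions:

  ```python
  from pipecat.runner.types import RunnerArguments
  from pipecat.runner.utils import create_transport

  transport_params = {}  # per-transport factory config (assumed shape)

  async def bot(runner_args: RunnerArguments):
      transport = await create_transport(runner_args, transport_params)
      # ... build and run your Pipeline / PipelineTask with `transport` ...

  if __name__ == "__main__":
      from pipecat.runner.run import main
      main()  # starts the FastAPI dev server and discovers `bot()`
  ```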
- Added Async.ai TTS integration (https://async.ai/):
  - `AsyncAITTSService` – WebSocket-based streaming TTS with interruption support
  - `AsyncAIHttpTTSService` – HTTP-based streaming TTS service
  - Example scripts: `examples/foundational/07ac-interruptible-asyncai.py` (WebSocket demo) and `examples/foundational/07ac-interruptible-asyncai-http.py` (HTTP demo)
- Added `transcription_bucket` params support to the `DailyRESTHelper`.
- Added a new TTS service, `InworldTTSService`. This service provides low-latency, high-quality speech generation using Inworld's streaming API.
- Added a new field, `handle_sigterm`, to `PipelineRunner`. It defaults to `False`. This field handles SIGTERM signals. The `handle_sigint` field still defaults to `True`, but now it handles only SIGINT signals.
- Added foundational example `14u-function-calling-ollama.py` for Ollama function calling.
- Added `LocalSmartTurnAnalyzerV2`, which supports local on-device inference with the new `smart-turn-v2` turn detection model.
- Added `set_log_level` to `DailyTransport`, allowing you to set the logging level for Daily's internal logging system.
- Added `on_transcription_stopped` and `on_transcription_error` to Daily callbacks.
Changed
- Changed the default `url` for `NeuphonicTTSService` to `wss://api.neuphonic.com`, as it provides better global performance. You can set the URL to other URLs, such as the previous default: `wss://eu-west-1.api.neuphonic.com`.
- Updated `daily-python` to 0.19.5.
- `STTMuteFilter` now pushes the `STTMuteFrame` upstream and downstream, to allow for more flexible `STTMuteFilter` placement.
- Play delayed messages from `ElevenLabsTTSService` if they still belong to the current context.
- Dependency compatibility improvements: relaxed version constraints for core dependencies to support broader version ranges while maintaining stability:
  - `aiohttp`, `Markdown`, `nltk`, `numpy`, `Pillow`, `pydantic`, `openai`, `numba`: now support up to the next major version (e.g. `numpy>=1.26.4,<3`)
  - `pyht`: relaxed to `>=0.1.6` to resolve `grpcio` conflicts with `nvidia-riva-client`
  - `fastapi`: updated to support versions `>=0.115.6,<0.117.0`
  - `torch`/`torchaudio`: changed from exact pinning (`==2.5.0`) to compatible range (`~=2.5.0`)
  - `aws_sdk_bedrock_runtime`: added a Python 3.12+ constraint via environment marker
  - `numba`: reduced minimum version to `0.60.0` for better compatibility
- Changed `NeuphonicHttpTTSService` to use a POST-based request instead of the `pyneuphonic` package. This removes a package requirement, allowing Neuphonic to work with more services.
- Updated `ElevenLabsTTSService` to handle the case where `allow_interruptions=False`. Now, when interruptions are disabled, the same context ID will be used throughout the conversation.
- Updated the `deepgram` optional dependency to 4.7.0, which downgrades the "tasks cancelled error" to a debug log. This removes the log from appearing in Pipecat logs upon leaving.
- Upgraded the `websockets` implementation to the new asyncio implementation. Along with this change, we're updating support to versions >=13.1.0 and <15.0.0. All services have been updated to use the asyncio implementation.
- Updated `MiniMaxHttpTTSService` with a `base_url` arg where you can specify the Global endpoint (default) or Mainland China.
- Replaced regex-based sentence detection in `match_endofsentence` with NLTK's punkt_tab tokenizer for more reliable sentence boundary detection.
- Changed the `livekit` optional dependency's `tenacity` constraint to `tenacity>=8.2.3,<10.0.0` in order to support the `google-genai` package.
- For `LmntTTSService`, changed the default `model` to `blizzard`, LMNT's recommended model.
- Updated `SpeechmaticsSTTService`:
  - Added support for additional diarization options.
  - Added foundational example `07a-interruptible-speechmatics-vad.py`, which uses VAD detection provided by `SpeechmaticsSTTService`.
Fixed
- Fixed an `LLMUserResponseAggregator` issue where interruptions were not being handled properly.
- Fixed `PiperTTSService` to work with newer Piper GPL.
- Fixed a race condition in `FastAPIWebsocketClient` that occurred when attempting to send a message while the client was disconnecting.
- Fixed an issue in `GoogleLLMService` where interruptions did not work when an interruption strategy was used.
- Fixed an issue in `TranscriptProcessor` where newline characters could cause the transcript output to be corrupted (e.g. missing all spaces).
- Fixed an issue in `AudioBufferProcessor` when using `SmallWebRTCTransport` where, if the microphone was muted, track timing was not respected.
- Fixed an error that occurred when pushing an `LLMMessagesFrame`. Only some LLM services, like Grok, were impacted by this issue. The fix removes the optional `name` property that was being added to the message.
- Fixed an issue in `AudioBufferProcessor` that caused garbled audio when `enable_turn_audio` was enabled and audio resampling was required.
- Fixed a dependency issue for uv users where an `llvmlite` version required Python 3.9.
- Fixed an issue in `MiniMaxHttpTTSService` where the `pitch` param was the incorrect type.
- Fixed an issue with OpenTelemetry tracing where the `enable_tracing` flag did not disable the internal tracing decorator functions.
- Fixed an issue in `OLLamaLLMService` where kwargs were not passed correctly to the parent class.
- Fixed an issue in `ElevenLabsTTSService` where the word/timestamp pairs were calculating word boundaries incorrectly.
- Fixed an issue where, in some edge cases, an `EmulateUserStartedSpeakingFrame` could be created even if we didn't have a transcription.
- Fixed an issue in `GoogleLLMContext` where it would inject the `system_message` as a "user" message in cases where it was not meant to; it was only meant to do that when there were no "regular" (non-function-call) messages in the context, to ensure that inference would run properly.
- Fixed an issue in `LiveKitTransport` where the `on_audio_track_subscribed` event was never emitted.
Other
- Added new quickstart demos:
  - `examples/quickstart`: voice AI bot quickstart
  - `examples/client-server-web`: client/server starter example
  - `examples/phone-bot-twilio`: Twilio starter example
- Removed most of the examples from the pipecat repo. Examples can now be found at: https://github.com/pipecat-ai/pipecat-examples.
v0.0.76
Added
- Added `SpeechControlParamsFrame`, a new `SystemFrame` that notifies downstream processors of the VAD and turn analyzer params. This frame is pushed by the `BaseInputTransport` at start and any time a `VADParamsUpdateFrame` is received.
Changed
- Two package dependencies have been updated:
  - `numpy` now supports 1.26.0 and newer
  - `transformers` now supports 4.48.0 and newer
Fixed
- Fixed an issue with RTVI's handling of `append-to-context`.
- Fixed an issue where using audio input with a sample rate requiring resampling could result in empty audio being passed to STT services, causing errors.
- Fixed the VAD analyzer to process the full audio buffer as long as it contains more than the minimum required bytes per iteration, instead of only analyzing the first chunk.
- Fixed an issue in `ParallelPipeline` that caused errors when attempting to drain the queues.
- Fixed an issue with emulated VAD timeout inconsistency in `LLMUserContextAggregator`. Previously, emulated VAD scenarios (where a transcription is received without VAD detection) used a hardcoded `aggregation_timeout` (default 0.5s) instead of matching the VAD's `stop_secs` parameter (default 0.8s). This created different user experiences between real VAD and emulated VAD scenarios. Now, emulated VAD timeouts automatically synchronize with the VAD's `stop_secs` parameter.
- Fixed a pipeline freeze when using AWS Nova Sonic, which would occur if the user started early, while the bot was still working through `trigger_assistant_response()`.
v0.0.75 [YANKED]
This release has been yanked due to resampling issues affecting audio output quality and critical bugs impacting `ParallelPipeline` functionality.
Please upgrade to version 0.0.76 or later.
Added
- Added an `aggregate_sentences` arg to `CartesiaTTSService`, `ElevenLabsTTSService`, `NeuphonicTTSService`, and `RimeTTSService`, where the default value is True. When `aggregate_sentences` is True, the `TTSService` aggregates the LLM's streamed tokens into sentences by default. Note: setting the value to False requires a custom processor before the `TTSService` to aggregate the LLM tokens.
- Added `kwargs` to `OLLamaLLMService` to allow configuration args to be passed to Ollama.
- Added call hang-up error handling in `TwilioFrameSerializer`, which handles the case where the user has hung up before the `TwilioFrameSerializer` hangs up the call.
Changed
- Updated `RTVIObserver` and `RTVIProcessor` to match the new RTVI 1.0.0 protocol. This includes:
  - Deprecating support for all messages related to service configuration and actions.
  - Adding support for obtaining and logging data about the client, including its RTVI version and optionally included system information (OS/browser/etc.).
  - Adding support for handling the new `client-message` RTVI message through either an `on_client_message` event handler or listening for a new `RTVIClientMessageFrame`.
  - Adding support for responding to a `client-message` with a `server-response` via either a direct call on the `RTVIProcessor` or via pushing a new `RTVIServerResponseFrame`.
  - Adding built-in support for handling the new `append-to-context` RTVI message, which allows a client to add to the user or assistant LLM context. No extra code is required to support this behavior.
  - Updating all JavaScript and React client RTVI examples to use version 1.0.0 of the clients.

  Get started migrating to RTVI protocol 1.0.0 by following the migration guide: https://docs.pipecat.ai/client/migration-guide
- Refactored `AWSBedrockLLMService` and `AWSPollyTTSService` to work asynchronously using `aioboto3` instead of the `boto3` library.
- The `UserIdleProcessor` now handles the scenario where function calls take longer than the idle timeout duration. This allows you to use the `UserIdleProcessor` in conjunction with function calls that take a while to return a result.
Fixed
- Updated the `NeuphonicTTSService` to work with the updated websocket API.
- Fixed an issue with `RivaSTTService` where the watchdog feature was causing an error on initialization.
Performance
- Removed an unnecessary push task in each `FrameProcessor`.
v0.0.74 [YANKED]
This release has been yanked due to resampling issues affecting audio output quality and critical bugs impacting `ParallelPipeline` functionality.
Please upgrade to version 0.0.76 or later.
Added
- Added a new STT service, `SpeechmaticsSTTService`. This service provides real-time speech-to-text transcription using the Speechmatics API. It supports partial and final transcriptions, multiple languages, various audio formats, and speaker diarization.
- Added `normalize` and `model_id` to `FishAudioTTSService`.
- Added an `http_options` argument to `GoogleLLMService`.
- Added a `run_llm` field to the `LLMMessagesAppendFrame` and `LLMMessagesUpdateFrame` frames. If true, a context frame will be pushed, triggering the LLM to respond.
- Added a new `SOXRStreamAudioResampler` for processing audio in chunks or streams. If you write your own processor and need to use an audio resampler, use the new `create_stream_resampler()`.
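  A hedged sketch of using the stream resampler inside a custom processor; the import path and the `resample(audio, in_rate, out_rate)` signature are assumptions:

  ```python
  from pipecat.audio.utils import create_stream_resampler  # import path assumed

  resampler = create_stream_resampler()
  resampled = await resampler.resample(audio_bytes, 16000, 24000)  # audio_bytes: raw 16-bit PCM
  ```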
- Added a new `DailyParams.audio_in_user_tracks` param to allow receiving one track per user (default) or a single track from the room (all participants mixed).
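  A hedged sketch of opting in to a single mixed room track; combining this with other `DailyParams` fields is assumed:

  ```python
  params = DailyParams(audio_in_enabled=True, audio_in_user_tracks=False)
  ```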
- Added support for providing "direct" functions, which don't need an accompanying `FunctionSchema` or function definition dict. Instead, metadata (i.e. `name`, `description`, `properties`, and `required`) are automatically extracted from a combination of the function signature and docstring.

  Usage:

  ```python
  # "Direct" function
  # `params` must be the first parameter
  async def do_something(params: FunctionCallParams, foo: int, bar: str = ""):
      """
      Do something interesting.

      Args:
          foo (int): The foo to do something interesting with.
          bar (string): The bar to do something interesting with.
      """
      result = await process(foo, bar)
      await params.result_callback({"result": result})

  # ...

  llm.register_direct_function(do_something)

  # ...

  tools = ToolsSchema(standard_tools=[do_something])
  ```
- `user_id` is now populated in the `TranscriptionFrame` and `InterimTranscriptionFrame` when using a transport that provides a `user_id`, like `DailyTransport` or `LiveKitTransport`.
- Added `watchdog_coroutine()`. This is a watchdog helper for coroutines. If you have a coroutine that waits a long time for a result, you will need to wrap it with `watchdog_coroutine()` so the watchdog timers are reset regularly.
- Added a `session_token` parameter to `AWSNovaSonicLLMService`.
- Added Gemini Multimodal Live File API support for uploading, fetching, listing, and deleting files. See `26f-gemini-multimodal-live-files-api.py` for example usage.
Changed
- Updated all the services to use the new `SOXRStreamAudioResampler`, ensuring smooth transitions and eliminating clicks.
- Upgraded `daily-python` to 0.19.4.
- Updated the `google` optional dependency to use `google-genai` version `1.24.0`.
Fixed
- Fixed an issue where audio would get stuck in the queue when an interruption occurs during Azure TTS synthesis.
- Fixed a race condition that occurs in Python 3.10+ where a task could miss the `CancelledError` and continue running indefinitely, freezing the pipeline.
- Fixed an `AWSNovaSonicLLMService` issue introduced in 0.0.72.
Deprecated
- In `FishAudioTTSService`, deprecated `model` and replaced it with `reference_id`. This change better aligns with Fish Audio's variable naming and reduces confusion about what functionality the variable controls.