Overview

SambaNovaSTTService provides speech-to-text using SambaNova’s hosted Whisper API. It relies on Voice Activity Detection (VAD) to split the incoming audio into speech segments, then sends each complete segment to SambaNova’s inference platform for transcription.

Installation

To use SambaNova services, install the required dependency:
pip install "pipecat-ai[sambanova]"

Prerequisites

SambaNova Account Setup

Before using SambaNova STT services, you need:
  1. SambaNova Account: Sign up at SambaNova Cloud
  2. API Key: Generate an API key from your account dashboard
  3. Model Access: Ensure access to Whisper transcription models

Required Environment Variables

  • SAMBANOVA_API_KEY: Your SambaNova API key for authentication
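
A missing key is easier to diagnose at startup than on the first transcription request. The following is a minimal sketch of a fail-fast check; the helper name require_sambanova_api_key is hypothetical and is not part of Pipecat.

```python
import os


def require_sambanova_api_key(env=None):
    """Return the SambaNova API key, or raise a clear error if it is unset.

    Hypothetical helper for illustration only; not part of Pipecat.
    """
    # Allow a custom mapping to be passed in so the check is easy to test.
    env = os.environ if env is None else env
    key = env.get("SAMBANOVA_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SAMBANOVA_API_KEY is not set; export it before starting the bot."
        )
    return key
```

The returned value can then be passed as api_key when constructing the service.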

Configuration

SambaNovaSTTService

  • model (str, default: "Whisper-Large-v3"): Whisper model to use for transcription.
  • api_key (str, default: None): SambaNova API key. Falls back to the SAMBANOVA_API_KEY environment variable.
  • base_url (str, default: "https://api.sambanova.ai/v1"): API base URL.
  • language (Language, default: Language.EN): Language of the audio input.
  • prompt (str, default: None): Optional text to guide the model’s style or continue a previous segment.
  • temperature (float, default: None): Sampling temperature between 0 and 1. Lower values produce more deterministic results.
  • push_empty_transcripts (bool, default: False): If true, allow empty TranscriptionFrame frames to be pushed downstream instead of discarding them. This is intended for situations where VAD fires even though the user did not speak. In these cases, it is useful to know that nothing was transcribed so that the agent can resume speaking, instead of waiting longer for a transcription.
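
The effect of push_empty_transcripts can be pictured with a small standalone sketch. It models only the decision described above (forward or discard an empty result). The function name should_push_transcript is hypothetical, treating whitespace-only text as empty is an assumption, and the service's actual internals may differ.

```python
def should_push_transcript(text: str, push_empty_transcripts: bool = False) -> bool:
    """Decide whether a transcription result is forwarded downstream.

    Hypothetical model of the documented behavior; not Pipecat's code.
    Assumption: whitespace-only text counts as an empty transcript.
    """
    if text.strip():
        # Non-empty transcripts are always forwarded.
        return True
    # Empty transcripts are forwarded only when the flag is enabled.
    return push_empty_transcripts
```

With the flag left at its default, a VAD false trigger produces no downstream frame; with the flag enabled, downstream processors see an empty result and can react to it.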

Usage

Basic Setup

import os

from pipecat.services.sambanova import SambaNovaSTTService

# Read the API key from the environment rather than hard-coding it.
stt = SambaNovaSTTService(
    api_key=os.getenv("SAMBANOVA_API_KEY"),
)

With Custom Configuration

import os

from pipecat.services.sambanova import SambaNovaSTTService
from pipecat.transcriptions.language import Language

stt = SambaNovaSTTService(
    api_key=os.getenv("SAMBANOVA_API_KEY"),
    model="Whisper-Large-v3",
    language=Language.ES,  # Spanish audio input
    prompt="Transcribe the following conversation about technology.",
    temperature=0.0,  # most deterministic output
)

Notes

  • Segmented transcription: SambaNovaSTTService extends SegmentedSTTService (via BaseWhisperSTTService), processing complete audio segments after VAD detects the user has stopped speaking.
  • Whisper API compatible: Uses the OpenAI-compatible Whisper API interface hosted on SambaNova’s infrastructure.
  • Probability metrics not supported: SambaNova’s Whisper API does not support probability metrics. The include_prob_metrics parameter has no effect.
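
To make the segmented model in the first note concrete, here is a minimal sketch of the buffering pattern: audio is accumulated while the user speaks and handed off as one segment when VAD reports the end of speech. The class SegmentBuffer and its method names are hypothetical and do not mirror the actual implementation of SegmentedSTTService.

```python
class SegmentBuffer:
    """Collects audio chunks between VAD start and stop events.

    Hypothetical illustration of segmented transcription; not Pipecat's code.
    """

    def __init__(self):
        self._chunks = []
        self._speaking = False

    def on_speech_started(self):
        # Start a fresh segment when VAD detects speech.
        self._chunks.clear()
        self._speaking = True

    def on_audio(self, chunk: bytes):
        # Audio outside a speech segment is ignored.
        if self._speaking:
            self._chunks.append(chunk)

    def on_speech_stopped(self) -> bytes:
        # Return the complete segment; this is what would be sent for transcription.
        self._speaking = False
        segment = b"".join(self._chunks)
        self._chunks.clear()
        return segment
```

Because the whole segment is sent at once, a transcript becomes available only after the user stops speaking, not incrementally while they talk.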