Overview

GroqSTTService provides high-accuracy speech recognition using Groq’s hosted Whisper API with ultra-fast inference speeds. It uses Voice Activity Detection (VAD) to process speech segments efficiently for optimal performance and accuracy.

Installation

To use Groq services, install the required dependency:
pip install "pipecat-ai[groq]"

Prerequisites

Groq Account Setup

Before using Groq STT services, you need:
  1. Groq Account: Sign up at Groq Console
  2. API Key: Generate an API key from your console dashboard
  3. Model Access: Ensure access to Whisper transcription models

Required Environment Variables

  • GROQ_API_KEY: Your Groq API key for authentication

Configuration

  • model (str, default: "whisper-large-v3-turbo"): Whisper model to use for transcription.
  • api_key (str, default: None): Groq API key. If not provided, uses the GROQ_API_KEY environment variable.
  • base_url (str, default: "https://api.groq.com/openai/v1"): API base URL. Override for custom or proxied deployments.
  • language (Language, default: Language.EN): Language of the audio input.
  • prompt (str, default: None): Optional text to guide the model’s style or continue a previous segment.
  • temperature (float, default: None): Sampling temperature between 0 and 1. Lower values are more deterministic. When unset, the service uses 0.0.
  • ttfs_p99_latency (float, default: GROQ_TTFS_P99): P99 latency from speech end to final transcript, in seconds. Override for your deployment.
  • push_empty_transcripts (bool, default: False): If True, empty TranscriptionFrame instances are pushed downstream instead of being discarded. This is intended for situations where VAD fires even though the user did not speak; knowing that nothing was transcribed lets the agent resume speaking instead of waiting longer for a transcription.
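To make the defaults above concrete, here is a minimal sketch of how api_key and temperature might resolve. This is illustrative only, not the library's actual implementation; the helper name resolve_stt_settings is hypothetical, but the fallback behavior matches the documented defaults (GROQ_API_KEY from the environment, temperature of 0.0 when unset):

```python
import os

def resolve_stt_settings(api_key=None, temperature=None, env=None):
    """Illustrative default resolution for GroqSTTService-style settings."""
    env = os.environ if env is None else env
    # An explicit api_key wins; otherwise fall back to GROQ_API_KEY
    resolved_key = api_key if api_key is not None else env.get("GROQ_API_KEY")
    # An unset temperature falls back to 0.0 (most deterministic output)
    resolved_temp = 0.0 if temperature is None else temperature
    return {"api_key": resolved_key, "temperature": resolved_temp}
```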

Usage

Basic Setup

import os

from pipecat.services.groq import GroqSTTService

stt = GroqSTTService(
    api_key=os.getenv("GROQ_API_KEY"),
)

With Custom Model and Language

import os

from pipecat.services.groq import GroqSTTService
from pipecat.transcriptions.language import Language

stt = GroqSTTService(
    api_key=os.getenv("GROQ_API_KEY"),
    model="whisper-large-v3-turbo",
    language=Language.ES,
)

With Prompt and Temperature

import os

from pipecat.services.groq import GroqSTTService

stt = GroqSTTService(
    api_key=os.getenv("GROQ_API_KEY"),
    prompt="This is a conversation about artificial intelligence and machine learning.",
    temperature=0.0,
)
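Under the hood, these settings map onto form fields of an OpenAI-compatible transcription request. The sketch below is illustrative, not the service's actual code; the helper name build_transcription_fields is hypothetical, but the field names (model, language, prompt, temperature, response_format) follow OpenAI's audio transcription API, which Groq's endpoint mirrors:

```python
def build_transcription_fields(model, language=None, prompt=None, temperature=None):
    """Assemble the non-file form fields for an OpenAI-compatible
    /audio/transcriptions request, setting only the fields provided."""
    fields = {"model": model, "response_format": "json"}
    if language is not None:
        fields["language"] = language  # ISO-639-1 code, e.g. "es"
    if prompt is not None:
        fields["prompt"] = prompt
    if temperature is not None:
        fields["temperature"] = str(temperature)
    return fields
```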

Notes

  • Segmented processing: GroqSTTService inherits from SegmentedSTTService (via BaseWhisperSTTService), which buffers audio during speech (detected by VAD) and sends complete segments for transcription. This means it does not provide interim results — only final transcriptions after each speech segment.
  • Whisper API compatible: Groq uses the OpenAI-compatible Whisper API format. The service sends audio in WAV format and receives JSON transcription responses.
  • Ultra-fast inference: Groq’s LPU (Language Processing Unit) infrastructure provides significantly faster inference than CPU/GPU-based Whisper deployments, making it suitable for real-time applications despite the segmented processing approach.
  • Prompt guidance: Use the prompt parameter to provide context that helps the model with domain-specific terminology or to maintain consistency across segments.
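The "WAV format" note above can be illustrated with the standard library: a segmented STT service buffers raw PCM during speech and wraps the finished segment in a WAV container before upload. A minimal sketch, assuming 16 kHz 16-bit mono audio as is typical for speech pipelines (the helper name pcm_to_wav is illustrative):

```python
import io
import wave

def pcm_to_wav(pcm_bytes, sample_rate=16000, channels=1, sample_width=2):
    """Wrap raw 16-bit PCM audio in an in-memory WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)   # 2 bytes = 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(pcm_bytes)
    return buf.getvalue()
```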