Audio

rtvoice ships ready-to-use implementations for microphone input and speaker output. For custom hardware or special requirements (e.g. file playback, virtual devices) you can implement the abstract base classes directly.

Quickstart

from rtvoice import RealtimeAgent  # assumed top-level export; adjust if your version differs
from rtvoice.audio import MicrophoneInput, SpeakerOutput

agent = RealtimeAgent(
    audio_input=MicrophoneInput(sample_rate=24000),
    audio_output=SpeakerOutput(sample_rate=24000),
)

Built-in devices

MicrophoneInput

Bases: AudioInputDevice

Default microphone input powered by PyAudio.

Streams raw 16-bit PCM audio from the system microphone in a non-blocking loop. Requires the pyaudio package — install it with pip install rtvoice[audio].

Example
mic = MicrophoneInput(sample_rate=24000)
agent = RealtimeAgent(audio_input=mic)

Parameters:

device_index (int | None, default: None)
    PyAudio device index. Defaults to the system default input device.

sample_rate (int, default: 24000)
    Sample rate in Hz. Must match the model's expected rate (24 000 Hz).

chunk_size (int, default: 4800)
    Number of frames per read. Smaller values reduce latency.
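
To make the latency trade-off of chunk_size concrete, the sketch below works out how much audio one read covers and how large the resulting chunk is. The helper names are illustrative and not part of rtvoice, and mono audio is an assumption, since the channel count is not stated here.

```python
BYTES_PER_SAMPLE = 2  # 16-bit PCM, as documented above


def chunk_duration_ms(chunk_size: int, sample_rate: int) -> float:
    """Milliseconds of audio covered by one read of `chunk_size` frames."""
    return 1000.0 * chunk_size / sample_rate


def chunk_bytes(chunk_size: int, channels: int = 1) -> int:
    """Size in bytes of one 16-bit PCM chunk (mono by assumption)."""
    return chunk_size * channels * BYTES_PER_SAMPLE


# Default settings: 4800 frames at 24 000 Hz is 200 ms of audio per read.
default_ms = chunk_duration_ms(4800, 24000)
# A quarter of the default chunk size cuts the per-read delay to 50 ms.
small_ms = chunk_duration_ms(1200, 24000)
```

With the defaults, each chunk must fill for 200 ms before it can be delivered, so lowering chunk_size shortens that wait at the cost of more frequent reads.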

SpeakerOutput

Bases: AudioOutputDevice

Default speaker output powered by PyAudio.

Plays raw 16-bit PCM audio through the system speaker using a dedicated background thread, keeping the async event loop unblocked. Requires the pyaudio package — install it with pip install rtvoice[audio].

Example
speaker = SpeakerOutput(sample_rate=24000)
agent = RealtimeAgent(audio_output=speaker)

Parameters:

device_index (int | None, default: None)
    PyAudio device index. Defaults to the system default output device.

sample_rate (int, default: 24000)
    Sample rate in Hz. Must match the model's output rate (24 000 Hz).
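
The background-thread approach described for SpeakerOutput can be illustrated with a generic queue-and-worker pattern. This is a minimal sketch of the general technique, not the actual SpeakerOutput implementation: ThreadedPlayer and its method names are hypothetical, and the blocking write callable stands in for whatever the audio backend provides.

```python
import queue
import threading


class ThreadedPlayer:
    """Illustrative only: drains queued PCM chunks on a worker thread."""

    def __init__(self, write):
        self._write = write  # blocking callable that consumes one chunk
        self._queue = queue.Queue()  # holds bytes chunks, or None as a stop marker
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def enqueue(self, chunk: bytes) -> None:
        # Returns immediately; the blocking write happens on the worker thread.
        self._queue.put(chunk)

    def clear(self) -> None:
        # Drop every chunk that has not been handed to the writer yet.
        while True:
            try:
                self._queue.get_nowait()
            except queue.Empty:
                return

    def stop(self) -> None:
        self._queue.put(None)  # stop marker, processed after pending chunks
        self._thread.join()

    def _run(self) -> None:
        while True:
            chunk = self._queue.get()
            if chunk is None:
                return
            self._write(chunk)


# Usage with a stand-in sink that just records what was "played".
played = []
player = ThreadedPlayer(played.append)
player.start()
player.enqueue(b"\x00\x00")
player.stop()
```

Because the caller only ever touches the queue, an async event loop that enqueues chunks is never held up by a slow device write.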

Custom devices

Each built-in class implements an abstract base class. Subclass these base classes if you need to bring your own audio source or sink.

AudioInputDevice

Bases: ABC

Abstract base class for audio input devices.

Implement this interface to provide a custom microphone or audio source. The default implementation is MicrophoneInput.

is_active abstractmethod property

is_active: bool

Whether the device is currently capturing audio.

start abstractmethod async

start() -> None

Open the device and begin audio capture.

stop abstractmethod async

stop() -> None

Stop audio capture and release all device resources.

stream_chunks abstractmethod

stream_chunks() -> AsyncIterator[bytes]

Yield raw 16-bit PCM audio chunks as they become available.
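
As an illustration of the interface above, here is a minimal sketch of a custom input that replays in-memory PCM data. To keep the example runnable without rtvoice installed, it defines a local stand-in ABC that mirrors the documented members; in real code you would subclass the library's AudioInputDevice instead. BufferInput and collect are hypothetical names.

```python
import asyncio
from abc import ABC, abstractmethod
from collections.abc import AsyncIterator


class AudioInputDevice(ABC):
    """Local stand-in mirroring the documented interface (not the rtvoice class)."""

    @property
    @abstractmethod
    def is_active(self) -> bool: ...

    @abstractmethod
    async def start(self) -> None: ...

    @abstractmethod
    async def stop(self) -> None: ...

    @abstractmethod
    def stream_chunks(self) -> AsyncIterator[bytes]: ...


class BufferInput(AudioInputDevice):
    """Hypothetical source that replays 16-bit PCM bytes in fixed-size chunks."""

    def __init__(self, pcm: bytes, chunk_bytes: int = 9600) -> None:
        self._pcm = pcm
        self._chunk_bytes = chunk_bytes
        self._active = False

    @property
    def is_active(self) -> bool:
        return self._active

    async def start(self) -> None:
        self._active = True

    async def stop(self) -> None:
        self._active = False

    async def stream_chunks(self) -> AsyncIterator[bytes]:
        for offset in range(0, len(self._pcm), self._chunk_bytes):
            if not self._active:
                break  # stop() was called while streaming
            yield self._pcm[offset:offset + self._chunk_bytes]
            await asyncio.sleep(0)  # let other tasks run between chunks


async def collect(device: AudioInputDevice) -> list[bytes]:
    """Start the device, gather every chunk, and always release it."""
    await device.start()
    try:
        return [chunk async for chunk in device.stream_chunks()]
    finally:
        await device.stop()
```

Writing stream_chunks as an async generator satisfies the AsyncIterator[bytes] return type, and checking the active flag inside the loop lets stop() end the stream early.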

AudioOutputDevice

Bases: ABC

Abstract base class for audio output devices.

Implement this interface to provide a custom speaker or audio sink. The default implementation is SpeakerOutput.

is_playing abstractmethod property

is_playing: bool

Whether audio is currently being played or queued.

clear_buffer abstractmethod async

clear_buffer() -> None

Discard all queued audio to stop playback immediately.

play_chunk abstractmethod async

play_chunk(chunk: bytes) -> None

Enqueue a raw 16-bit PCM audio chunk for playback.

start abstractmethod async

start() -> None

Open the device and prepare it for playback.

stop abstractmethod async

stop() -> None

Stop playback and release all device resources.
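
A custom sink can be sketched the same way. The example below writes received audio to a WAV target using the standard-library wave module. It uses a local stand-in ABC mirroring the documented members so it runs without rtvoice installed; WavOutput is a hypothetical name, mono output is an assumption, and holding chunks in memory until stop() is a simplification chosen so that clear_buffer has something to discard.

```python
import asyncio
import io
import wave
from abc import ABC, abstractmethod


class AudioOutputDevice(ABC):
    """Local stand-in mirroring the documented interface (not the rtvoice class)."""

    @property
    @abstractmethod
    def is_playing(self) -> bool: ...

    @abstractmethod
    async def start(self) -> None: ...

    @abstractmethod
    async def stop(self) -> None: ...

    @abstractmethod
    async def play_chunk(self, chunk: bytes) -> None: ...

    @abstractmethod
    async def clear_buffer(self) -> None: ...


class WavOutput(AudioOutputDevice):
    """Hypothetical sink that stores 16-bit PCM chunks in a WAV file or file object."""

    def __init__(self, target, sample_rate: int = 24000, channels: int = 1) -> None:
        self._target = target  # path or writable, seekable binary file object
        self._sample_rate = sample_rate
        self._channels = channels  # mono by assumption
        self._pending = []  # chunks queued but not yet written
        self._writer = None

    @property
    def is_playing(self) -> bool:
        return bool(self._pending)

    async def start(self) -> None:
        self._writer = wave.open(self._target, "wb")
        self._writer.setnchannels(self._channels)
        self._writer.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        self._writer.setframerate(self._sample_rate)

    async def play_chunk(self, chunk: bytes) -> None:
        self._pending.append(chunk)

    async def clear_buffer(self) -> None:
        self._pending.clear()

    async def stop(self) -> None:
        if self._writer is not None:
            self._writer.writeframes(b"".join(self._pending))
            self._pending.clear()
            self._writer.close()
            self._writer = None


async def demo() -> bytes:
    buffer = io.BytesIO()
    out = WavOutput(buffer)
    await out.start()
    await out.play_chunk(b"\x01\x00" * 4)
    await out.clear_buffer()  # the first four frames are discarded
    await out.play_chunk(b"\x02\x00" * 3)
    await out.stop()
    return buffer.getvalue()
```

A real-time sink would normally hand each chunk to the device as it arrives rather than wait for stop(); the interface stays the same either way.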