rtvoice

rtvoice is a Python library for building real-time voice agents powered by the OpenAI Realtime API. It handles the full session lifecycle — microphone input, WebSocket streaming, turn detection, tool calling, and audio playback — so you can focus on what your agent does, not how it talks.

import asyncio
from rtvoice import RealtimeAgent

async def main():
    agent = RealtimeAgent(instructions="You are Jarvis, a helpful assistant.")
    await agent.run()

asyncio.run(main())

Features

One-class API — RealtimeAgent manages the full voice loop out of the box
Tool calling — register async functions with @tools.action(...) in seconds
Subagents — delegate complex tasks to an LLM-driven sub-agent with automatic handoff
MCP integration — connect any Model Context Protocol server via MCPServerStdio
Listener hooks — receive transcripts, speaking state, and errors through AgentListener
VAD options — semantic (default) or energy-based voice-activity detection
Inactivity timeout — automatically stop the session after a configurable silence window
Session recording — optionally save the full audio session to disk
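To make the decorator-based tool registration mentioned above more concrete, here is a minimal, self-contained sketch of the general pattern. It does not import rtvoice and is not its implementation: the `ToolRegistry` class, its `describe` and `call` methods, and the single `description` argument to `action` are all hypothetical names chosen for illustration. The real `@tools.action(...)` signature may differ, so consult the Tools guide before relying on it.

```python
import asyncio
import inspect
from typing import Any, Dict


class ToolRegistry:
    """Toy stand-in for a tool registry (hypothetical, NOT rtvoice's class)."""

    def __init__(self) -> None:
        # Maps a tool name to a (description, coroutine function) pair.
        self._actions = {}

    def action(self, description: str):
        """Decorator factory that registers an async function under its own name."""

        def decorator(func):
            if not inspect.iscoroutinefunction(func):
                raise TypeError(f"{func.__name__} must be an async function")
            self._actions[func.__name__] = (description, func)
            return func  # Return the function unchanged so it stays directly callable.

        return decorator

    def describe(self) -> Dict[str, str]:
        """Return the name-to-description mapping a model could be shown."""
        return {name: desc for name, (desc, _) in self._actions.items()}

    async def call(self, name: str, **kwargs: Any) -> Any:
        """Look up a registered tool by name and await it with keyword arguments."""
        _, func = self._actions[name]
        return await func(**kwargs)


tools = ToolRegistry()


@tools.action("Add two integers and return the sum.")
async def add(a: int, b: int) -> int:
    return a + b


# Simulate the model requesting the "add" tool with JSON-style arguments.
result = asyncio.run(tools.call("add", a=2, b=3))
```

The point of the pattern is that the decorator records the function and a human-readable description at import time, so a session can later advertise the available tools and dispatch calls by name.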

Installation

pip install rtvoice

For microphone and speaker support (requires PortAudio):

pip install "rtvoice[audio]"

Set your OpenAI API key before running:

export OPENAI_API_KEY=sk-...

Or pass it directly:

agent = RealtimeAgent(api_key="sk-...")
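If you want a missing key to fail early with a clear message rather than partway through session setup, a small helper along these lines may be useful. This is a sketch, not part of rtvoice: `resolve_api_key` is a hypothetical name, and the precedence it implements (explicit argument first, then the `OPENAI_API_KEY` environment variable) is an assumption about what you want, not documented library behavior.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return an API key, preferring the explicit value over the environment.

    Hypothetical helper for illustration; rtvoice does not ship this function.
    """
    # Fall back to the environment variable named in the installation steps.
    key = (explicit or env.get("OPENAI_API_KEY", "")).strip()
    if not key:
        raise RuntimeError(
            "No OpenAI API key found: set OPENAI_API_KEY or pass api_key=..."
        )
    return key
```

The returned value could then be passed through the constructor argument shown above, for example `RealtimeAgent(api_key=resolve_api_key())`.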

Next steps

Quickstart — a minimal working agent in 10 lines
Tools guide — register functions the model can call
Subagent guide — delegate complex tasks to a sub-agent
MCP guide — connect external tool servers
Listener guide — hook into session events for UI integration