rtvoice
rtvoice is a Python library for building real-time voice agents powered by the OpenAI Realtime API. It handles the full session lifecycle — microphone input, WebSocket streaming, turn detection, tool calling, and audio playback — so you can focus on what your agent does, not how it talks.
import asyncio

from rtvoice import RealtimeAgent


async def main():
    # Create the agent with a system prompt, then run the voice loop.
    agent = RealtimeAgent(instructions="You are Jarvis, a helpful assistant.")
    await agent.run()


asyncio.run(main())
Features
- One-class API: RealtimeAgent manages the full voice loop out of the box
- Tool calling: register async functions with @tools.action(...) in seconds
- Subagents: delegate complex tasks to an LLM-driven sub-agent with automatic handoff
- MCP integration: connect any Model Context Protocol server via MCPServerStdio
- Listener hooks: receive transcripts, speaking state, and errors through AgentListener
- VAD options: semantic (default) or energy-based voice-activity detection
- Inactivity timeout: automatically stop the session after a configurable silence window
- Session recording: optionally save the full audio session to disk
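To make the energy-based VAD and inactivity-timeout features concrete, here is a minimal, library-independent sketch of the general idea. It is not rtvoice's implementation: the function names, the 16-bit PCM frame format, and the threshold of 500 are all assumptions chosen for illustration.

```python
import math
from typing import Iterable, Optional, Sequence


def rms(frame: Sequence[int]) -> float:
    """Root-mean-square amplitude of one frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def is_speech(frame: Sequence[int], threshold: float = 500.0) -> bool:
    """Energy-based VAD: treat a frame as speech if its RMS exceeds the threshold.

    The default threshold is an illustrative value, not one taken from rtvoice.
    """
    return rms(frame) > threshold


def frames_until_timeout(
    frames: Iterable[Sequence[int]],
    max_silent_frames: int,
    threshold: float = 500.0,
) -> Optional[int]:
    """Return the index of the frame that completes the silence window.

    Counts consecutive non-speech frames and resets the count whenever
    speech is detected. Returns None if the window is never exceeded.
    """
    silent = 0
    for i, frame in enumerate(frames):
        if is_speech(frame, threshold):
            silent = 0
        else:
            silent += 1
            if silent >= max_silent_frames:
                return i
    return None
```

In this sketch a session would stop at the returned frame index. A real agent would typically express the window in seconds rather than frames; the frame-count form just keeps the example deterministic.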
Installation
For microphone and speaker support (requires PortAudio):
Set your OpenAI API key before running:
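A small pre-flight check can catch a missing key before the session starts. The sketch below assumes the key is read from the OPENAI_API_KEY environment variable, which is the name used by OpenAI's official SDKs but is not confirmed here for rtvoice; require_api_key is a hypothetical helper, not part of the library.

```python
import os
from typing import Mapping


def require_api_key(
    env: Mapping[str, str] = os.environ,
    name: str = "OPENAI_API_KEY",  # assumed variable name
) -> str:
    """Return the API key from the environment, or fail with a clear message."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before starting the agent")
    return key
```

Calling require_api_key() at the top of main() turns a confusing authentication failure mid-session into an immediate, readable error.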
Or pass it directly:
Next steps
- Quickstart — a minimal working agent in 10 lines
- Tools guide — register functions the model can call
- Subagent guide — delegate complex tasks to a sub-agent
- MCP guide — connect external tool servers
- Listener guide — hook into session events for UI integration