rtvoice

rtvoice is a Python library for building real-time voice agents powered by the OpenAI Realtime API. It handles the full session lifecycle — microphone input, WebSocket streaming, turn detection, tool calling, and audio playback — so you can focus on what your agent does, not how it talks.

import asyncio
from rtvoice import RealtimeAgent

async def main():
    agent = RealtimeAgent(instructions="You are Jarvis, a helpful assistant.")
    await agent.run()

asyncio.run(main())

Features

  • One-class API — RealtimeAgent manages the full voice loop out of the box
  • Tool calling — register async functions with @tools.action(...) in seconds
  • Subagents — delegate complex tasks to an LLM-driven sub-agent with automatic handoff
  • MCP integration — connect any Model Context Protocol server via MCPServerStdio
  • Listener hooks — receive transcripts, speaking state, and errors through AgentListener
  • VAD options — semantic (default) or energy-based voice-activity detection
  • Inactivity timeout — automatically stop the session after a configurable silence window
  • Session recording — optionally save the full audio session to disk
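To make the energy-based VAD option above more concrete, here is a minimal sketch of the general technique: compute the root-mean-square amplitude of a frame of 16-bit PCM audio and compare it to a threshold. This is an illustration only, not rtvoice's implementation; the function names `rms_energy` and `is_speech` and the threshold of 500 are assumptions chosen for the example.

```python
import math
import struct


def rms_energy(frame: bytes) -> float:
    """Return the RMS amplitude of a frame of 16-bit little-endian PCM samples."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    # Decode the raw bytes into signed 16-bit integers.
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_speech(frame: bytes, threshold: float = 500.0) -> bool:
    """Hypothetical helper: treat a frame as speech when its energy reaches the threshold."""
    return rms_energy(frame) >= threshold


# Two synthetic frames: digital silence and a loud square wave.
silence = struct.pack("<4h", 0, 0, 0, 0)
loud = struct.pack("<4h", 4000, -4000, 4000, -4000)

print(is_speech(silence), is_speech(loud))
```

A fixed threshold like this is sensitive to background noise, which is one reason a semantic detector may be a better default for conversational turn-taking.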

Installation

pip install rtvoice

For microphone and speaker support (requires PortAudio):

pip install "rtvoice[audio]"

Set your OpenAI API key before running:

export OPENAI_API_KEY=sk-...

Or pass it directly:

agent = RealtimeAgent(api_key="sk-...")
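If you want to support both options (environment variable and explicit argument) in your own application, one possible approach is a small helper that prefers an explicit key and falls back to `OPENAI_API_KEY`. The helper name `resolve_api_key` is hypothetical and not part of rtvoice; only the `api_key` parameter and the environment variable name come from the text above.

```python
import os
from typing import Optional


def resolve_api_key(explicit: Optional[str] = None) -> str:
    """Hypothetical helper: return the explicit key if given, else OPENAI_API_KEY."""
    key = explicit or os.environ.get("OPENAI_API_KEY")
    if not key:
        # Fail early with a clear message instead of at connection time.
        raise RuntimeError("Set OPENAI_API_KEY or pass an API key explicitly")
    return key


# Assumed usage with the constructor shown above:
# agent = RealtimeAgent(api_key=resolve_api_key())
```

Failing before the session starts keeps a missing key from surfacing later as a less obvious connection error.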

Next steps