Quickstart
Get a working voice agent running in under a minute.
Prerequisites
- Python 3.13+
- OPENAI_API_KEY set in your environment
- PortAudio installed (required for pyaudio)
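Before running the examples, it can help to confirm the first two prerequisites from Python itself. The following is a minimal sketch using only the standard library; check_prerequisites is a hypothetical helper name, not part of rtvoice, and it does not attempt to detect PortAudio.

```python
import os
import sys

MIN_PYTHON = (3, 13)  # taken from the prerequisites list above


def check_prerequisites(env=None, version=None):
    """Return a list of problems; an empty list means both checks passed."""
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else tuple(version)[:2]
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, "
            f"found {version[0]}.{version[1]}"
        )
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(f"Missing prerequisite: {problem}")
```

The env and version parameters exist only so the check can be exercised without changing the real environment; calling check_prerequisites() with no arguments inspects the running interpreter.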
Minimal example
import asyncio

from rtvoice import RealtimeAgent


async def main():
    agent = RealtimeAgent(
        instructions="You are Jarvis, a concise and helpful voice assistant.",
    )
    await agent.run()


asyncio.run(main())
Run it, speak into your microphone, and the agent will respond through your speakers.
Press Ctrl+C to stop the session.
What happens when you call run()
- prepare() is called automatically: MCP servers connect and subagents warm up.
- A WebSocket session opens to the OpenAI Realtime API.
- The microphone starts streaming audio to the API.
- The API detects when you finish speaking (semantic voice activity detection, or VAD, by default) and generates a response.
- Audio is streamed back and played through the speaker in real time.
- The session runs until you call stop(), an inactivity timeout fires, or the process is interrupted.
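The order of the steps above can be illustrated with a toy model. The sketch below is not rtvoice code: LifecycleSketch is a hypothetical stand-in that only records step names, and whether the real stop() is synchronous or a coroutine is an assumption to check against the API Reference.

```python
import asyncio


class LifecycleSketch:
    """Toy stand-in that records the order of steps; it is NOT rtvoice code."""

    def __init__(self):
        self.steps = []
        self._stop_requested = asyncio.Event()

    async def prepare(self):
        # In the real agent this is where MCP servers connect and subagents warm up.
        self.steps.append("prepare")

    async def run(self):
        await self.prepare()
        self.steps.append("open_session")
        self.steps.append("stream_microphone")
        try:
            # Block until stop() is called, mirroring "runs until you call stop()".
            await self._stop_requested.wait()
            self.steps.append("stop_requested")
        finally:
            self.steps.append("close_session")

    def stop(self):
        self._stop_requested.set()


async def demo():
    agent = LifecycleSketch()
    run_task = asyncio.create_task(agent.run())
    await asyncio.sleep(0.01)  # give the simulated session time to start
    agent.stop()
    await run_task
    return agent.steps


steps = asyncio.run(demo())
print(steps)
```

Running run() as a task and calling stop() from elsewhere is one common way to end a long-lived coroutine cleanly; the finally block ensures the simulated session is always closed.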
Adding your first tool
import asyncio
from typing import Annotated

from rtvoice import RealtimeAgent, Tools

tools = Tools()


@tools.action("Get the current time in a given city")
async def get_time(city: Annotated[str, "The city name"]) -> str:
    return f"It's 14:32 in {city}."  # replace with real logic


async def main():
    agent = RealtimeAgent(
        instructions="You are a helpful assistant. Answer time questions with get_time.",
        tools=tools,
    )
    await agent.run()


asyncio.run(main())
See the Tools guide for the full decorator API including long-running tools and auto-injected parameters.
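The placeholder return value in get_time could be replaced with a real lookup. The following is one possible sketch using the standard-library zoneinfo module; CITY_ZONES and describe_time are hypothetical names introduced for illustration, and the three-city table is only an example, not a complete mapping.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

# Hypothetical lookup table; a real tool would need a fuller city-to-zone mapping.
CITY_ZONES = {
    "berlin": "Europe/Berlin",
    "new york": "America/New_York",
    "tokyo": "Asia/Tokyo",
}


def describe_time(city, now=None):
    """Return a spoken-style sentence with the local time in `city`."""
    zone_name = CITY_ZONES.get(city.strip().lower())
    if zone_name is None:
        return f"I don't know the time zone for {city}."
    try:
        zone = ZoneInfo(zone_name)
    except ZoneInfoNotFoundError:
        # Raised when the system has no IANA time zone database installed.
        return f"Time zone data for {city} is not available on this system."
    # `now` is injectable so the function can be tested with a fixed instant.
    moment = (now or datetime.now(timezone.utc)).astimezone(zone)
    return f"It's {moment:%H:%M} in {city}."


print(describe_time("Tokyo"))
```

Inside the tool, the body of get_time would then become `return describe_time(city)`. Returning a sentence for unknown cities, rather than raising, gives the model something it can read back to the user.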
Printing transcripts
import asyncio

from rtvoice import RealtimeAgent, AgentListener


class ConsolePrinter(AgentListener):
    async def on_user_transcript(self, transcript: str) -> None:
        print(f"You: {transcript}")

    async def on_assistant_transcript(self, transcript: str) -> None:
        print(f"Assistant: {transcript}")


async def main():
    agent = RealtimeAgent(
        instructions="You are a concise voice assistant.",
        listener=ConsolePrinter(),
    )
    await agent.run()


asyncio.run(main())
See the Listener guide for all available callbacks.
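If the transcripts should be kept rather than only printed, the two callbacks could append to a small in-memory log. The sketch below is independent of rtvoice; TranscriptLog is a hypothetical helper, and the timestamp format is an arbitrary choice.

```python
from datetime import datetime
from typing import List, Optional, Tuple


class TranscriptLog:
    """Collects (timestamp, speaker, text) turns and renders them as plain text."""

    def __init__(self) -> None:
        self.turns: List[Tuple[datetime, str, str]] = []

    def add(self, speaker: str, text: str, at: Optional[datetime] = None) -> None:
        # `at` defaults to the current local time; tests can pass a fixed value.
        self.turns.append((at or datetime.now(), speaker, text.strip()))

    def render(self) -> str:
        return "\n".join(
            f"[{at:%H:%M:%S}] {speaker}: {text}" for at, speaker, text in self.turns
        )


log = TranscriptLog()
log.add("You", "What time is it in Berlin?", at=datetime(2024, 1, 15, 14, 32, 5))
log.add("Assistant", "It's 14:32 in Berlin.", at=datetime(2024, 1, 15, 14, 32, 7))
print(log.render())
```

In the ConsolePrinter example, on_user_transcript would call `log.add("You", transcript)` and on_assistant_transcript would call `log.add("Assistant", transcript)`.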
Auto-stop after silence
agent = RealtimeAgent(
    instructions="...",
    inactivity_timeout_enabled=True,
    inactivity_timeout_seconds=30,
)
The session ends automatically after 30 seconds without the user speaking. Useful for kiosk or embedded applications.
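Conceptually, an inactivity timeout is a countdown that restarts whenever the user speaks. The sketch below shows that idea with plain asyncio; InactivityTimer is a hypothetical class written for illustration and makes no claim about how rtvoice implements the feature internally.

```python
import asyncio


class InactivityTimer:
    """Resettable countdown: expires once touch() has not been called for `timeout` seconds."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self._deadline = 0.0

    def touch(self) -> None:
        """Record activity (for example, the user speaking) and restart the countdown."""
        self._deadline = asyncio.get_running_loop().time() + self.timeout

    async def wait_expired(self) -> None:
        """Return once the deadline passes without having been pushed back."""
        loop = asyncio.get_running_loop()
        self.touch()  # start the first countdown
        while (remaining := self._deadline - loop.time()) > 0:
            await asyncio.sleep(remaining)


async def demo() -> float:
    loop = asyncio.get_running_loop()
    timer = InactivityTimer(timeout=0.2)  # 30 in the example above; shortened here
    started = loop.time()
    waiter = asyncio.create_task(timer.wait_expired())
    await asyncio.sleep(0.1)
    timer.touch()  # simulated speech pushes the deadline back
    await waiter
    return loop.time() - started


elapsed = asyncio.run(demo())
print(f"expired after {elapsed:.2f}s")
```

Because the simulated speech arrives partway through the first countdown, the timer expires later than the bare 0.2-second timeout, which is the behaviour a kiosk would want: the session stays open as long as someone keeps talking.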
Next steps
- Tools — register functions the model can call
- Subagents — delegate complex tasks to an LLM-driven sub-agent
- MCP Servers — connect stdio-based tool servers
- Listener — react to session lifecycle events
- API Reference — full parameter list for RealtimeAgent