# gemini-voice-assistant

Voice-to-voice AI assistant using Gemini Live API. Speak to the AI and get spoken responses. Use when you want to have natural voice conversations with an AI assistant powered by Google's Gemini models.
Stars: 1,933 · Forks: 367 · Updated: March 4, 2026
## SKILL.md Frontmatter

```yaml
name: gemini-voice-assistant
description: Voice-to-voice AI assistant using Gemini Live API. Speak to the AI and get spoken responses. Use when you want to have natural voice conversations with an AI assistant powered by Google's Gemini models.
```
# Gemini Voice Assistant

A voice-to-voice AI assistant powered by Google's Gemini Live API. Speak to the AI and it responds with natural-sounding voice.
## Usage

### Text Mode

```bash
cd ~/.openclaw/agents/kashif/skills/gemini-assistant && python3 handler.py "Your question or message"
```

### Voice Mode

```bash
cd ~/.openclaw/agents/kashif/skills/gemini-assistant && python3 handler.py --audio /path/to/audio.ogg "optional context"
```
## Response Format

The handler returns a JSON response:

```json
{
  "message": "[[audio_as_voice]]\nMEDIA:/tmp/gemini_voice_xxx.ogg",
  "text": "Text response from Gemini"
}
```
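A caller can pull both fields apart with the standard library. The sample payload below mirrors the documented shape; the exact path is illustrative:

```python
import json

# Example response shaped like the one documented above (path is illustrative).
raw = '{"message": "[[audio_as_voice]]\\nMEDIA:/tmp/gemini_voice_xxx.ogg", "text": "Text response from Gemini"}'
resp = json.loads(raw)

# The plain-text reply:
text = resp["text"]

# The "message" field carries a marker line plus a MEDIA: line pointing at the Ogg file.
audio_path = None
for line in resp["message"].splitlines():
    if line.startswith("MEDIA:"):
        audio_path = line[len("MEDIA:"):]
```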
## Configuration

Set your Gemini API key:

```bash
export GEMINI_API_KEY="your-api-key-here"
```

Or create a `.env` file in the skill directory:

```
GEMINI_API_KEY=your-api-key-here
```
## Model Options

The default model is `gemini-2.5-flash-native-audio-preview-12-2025` for audio support.

To use a different model, edit `handler.py`:

```python
MODEL = "gemini-2.0-flash-exp"  # For text-only
```
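Since the handler supports both text and audio modes, one way to organize the model choice is a small selector over the two models quoted in this README. The helper and constant names are our assumption, not what `handler.py` actually contains:

```python
# Model IDs quoted in this README; the selector itself is hypothetical.
AUDIO_MODEL = "gemini-2.5-flash-native-audio-preview-12-2025"
TEXT_MODEL = "gemini-2.0-flash-exp"

def pick_model(audio_input: bool) -> str:
    """Use the native-audio model when an audio file is supplied, else the text model."""
    return AUDIO_MODEL if audio_input else TEXT_MODEL
```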
## Requirements

- `google-genai>=1.0.0`
- `numpy>=1.24.0`
- `soundfile>=0.12.0`
- `librosa>=0.10.0` (for audio input)
- FFmpeg (for audio conversion)
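FFmpeg matters because voice mode takes an `.ogg` file. One way to convert an arbitrary recording is to shell out to `ffmpeg`; the exact flags the skill uses are not documented, so the Opus encoder choice here is our assumption:

```python
import subprocess

def to_ogg_command(src: str, dst: str) -> list[str]:
    """Build an ffmpeg invocation converting any audio file to Ogg/Opus (flags are our choice)."""
    return ["ffmpeg", "-y", "-i", src, "-c:a", "libopus", dst]

# Example (requires ffmpeg on PATH):
# subprocess.run(to_ogg_command("recording.wav", "recording.ogg"), check=True)
```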
## Features
- 🎙️ Voice input/output support
- 💬 Text conversations
- 🔧 Configurable system instructions
- ⚡ Fast responses with Gemini Flash