streamware

Voice Shell - Browser-based Voice Interface

Real-time voice chat with Streamware using WebSocket and browser audio.

Quick Start

# Start voice shell server
sq voice-shell

# Or with custom port
sq voice-shell --port 9000

# Open in browser
# http://localhost:8766

Features

🎤 Voice Input (Browser STT)

🔊 Voice Output (Browser TTS)

🖥️ Shell Output Streaming

📡 Event-Driven Architecture

Browser Interface

+------------------------------------------+
|        🎤 Streamware Voice Shell         |
|  [●] Connected    [●] Voice Ready        |
+------------------------------------------+
|                    |                     |
|  🖥️ Shell Output    |  🎤 Voice Control   |
|                    |                     |
|  > detect person   |      [🎤]           |
|  ✅ Start person    |   Click to talk     |
|     detection...   |                     |
|  $ sq watch ...    |  [____________]     |
|  🎯 Watch: detect  |   Type command      |
|                    |                     |
|                    |  [✓ Yes] [✗ No]     |
|                    |  [⏹ Stop]           |
|                    |                     |
|                    |  📹 URL: rtsp://... |
|                    |  📧 Email: (not set)|
+------------------------------------------+

Voice Commands

Detection

"detect person"
"track cars for 10 minutes"
"count people and email me"

Confirmation

"yes" / "execute" / "okay"  → Confirm command
"no" / "cancel"             → Cancel command

Control

"stop"      → Stop running process
"help"      → Show help
"context"   → Show current settings

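The confirmation and control phrases listed above can be told apart from free-form detection requests with a simple lookup. The following is a minimal sketch, not Streamware's actual routing code: the function name `classify_utterance` and the exact-match rule are assumptions made for illustration.

```python
# Phrase lists copied from the Confirmation and Control sections.
CONFIRM_WORDS = {"yes", "execute", "okay"}
CANCEL_WORDS = {"no", "cancel"}
CONTROL_WORDS = {"stop", "help", "context"}


def classify_utterance(text: str) -> str:
    """Return 'confirm', 'cancel', a control word, or 'command'.

    Hypothetical helper: anything that is not an exact match for a listed
    phrase is treated as a free-form request for the LLM parser.
    """
    # Normalize case and drop trailing punctuation that STT may add.
    phrase = text.strip().lower().rstrip(".!?")
    if phrase in CONFIRM_WORDS:
        return "confirm"
    if phrase in CANCEL_WORDS:
        return "cancel"
    if phrase in CONTROL_WORDS:
        return phrase
    return "command"
```

With this rule, "Yes." is classified as a confirmation, while "detect person" falls through to the command path.
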
Architecture

┌─────────────────┐     WebSocket      ┌──────────────────┐
│   Browser UI    │◄──────────────────►│  VoiceShellServer│
│                 │                    │                  │
│  ┌───────────┐  │                    │  ┌────────────┐  │
│  │ Web Speech│  │   voice_input      │  │  LLMShell  │  │
│  │    API    │──┼───────────────────►│  │            │  │
│  │  (STT)    │  │                    │  │  ┌──────┐  │  │
│  └───────────┘  │                    │  │  │ LLM  │  │  │
│                 │   command_parsed   │  │  │Parser│  │  │
│  ┌───────────┐  │◄───────────────────┼──┤  └──────┘  │  │
│  │ Web Speech│  │                    │  │            │  │
│  │    API    │  │   tts_speak        │  │  ┌──────┐  │  │
│  │  (TTS)    │◄─┼────────────────────┼──┤  │Execut│  │  │
│  └───────────┘  │                    │  │  │ or   │  │  │
│                 │   command_output   │  │  └──────┘  │  │
│  ┌───────────┐  │◄───────────────────┼──┤            │  │
│  │  Output   │  │                    │  └────────────┘  │
│  │  Panel    │  │                    │                  │
│  └───────────┘  │                    │  ┌────────────┐  │
│                 │                    │  │ EventStore │  │
└─────────────────┘                    │  │  (Events)  │  │
                                       │  └────────────┘  │
                                       └──────────────────┘
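
The flow in the diagram (input, parse, spoken confirmation prompt, execution, streamed output) can be modelled as a small state machine. The sketch below is a toy model under stated assumptions, not the real VoiceShellServer: the class `SessionSketch`, its `parse` stub (which stands in for the LLM parser), and the placeholder command string are all hypothetical. Only the event names come from this document.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Event = Tuple[str, str]  # (event type, payload text)


@dataclass
class SessionSketch:
    """Toy model of the confirm-before-run flow (hypothetical)."""

    pending: Optional[str] = None
    outbox: List[Event] = field(default_factory=list)

    def parse(self, text: str) -> str:
        # Stand-in for the LLM parser: returns a placeholder, not a real command.
        return f"<command for: {text}>"

    def handle(self, event_type: str, payload: str = "") -> None:
        if event_type in ("voice_input", "text_input"):
            # Parse the request and ask the user to confirm it.
            self.pending = self.parse(payload)
            self.outbox.append(("command_parsed", self.pending))
            self.outbox.append(("tts_speak", "Say yes to execute."))
        elif event_type == "confirm" and self.pending:
            self.outbox.append(("command_executed", self.pending))
            # A real executor would emit one command_output per stdout line.
            self.outbox.append(("command_output", "(stdout line)"))
            self.outbox.append(("command_completed", self.pending))
            self.pending = None
        elif event_type == "cancel":
            # Drop the pending command without running anything.
            self.pending = None


if __name__ == "__main__":
    session = SessionSketch()
    session.handle("voice_input", "detect person")
    session.handle("confirm")
    print([name for name, _ in session.outbox])
```

Keeping the pending command in session state is what lets a bare "yes" or "no" refer back to the most recently parsed request.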

Event Types

Event              Direction       Description
voice_input        Client→Server   Voice transcription from browser
text_input         Client→Server   Text input from form
confirm            Client→Server   Confirm pending command
cancel             Client→Server   Cancel pending command
stop               Client→Server   Stop running process
command_parsed     Server→Client   LLM parsing result
command_executed   Server→Client   Command started
command_output     Server→Client   Command stdout line
command_error      Server→Client   Error occurred
command_completed  Server→Client   Command finished
tts_speak          Server→Client   Text for TTS
context_updated    Server→Client   Session context
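
To make the direction column concrete, here is a hedged sketch of how a client might build and validate messages for these events. This document does not specify the wire format, so the JSON envelope with `type` and `data` keys is an assumption, as are the helper names `encode_client_event` and `decode_server_event`.

```python
import json

# Event names and directions taken from the table above.
CLIENT_TO_SERVER = {"voice_input", "text_input", "confirm", "cancel", "stop"}
SERVER_TO_CLIENT = {
    "command_parsed", "command_executed", "command_output",
    "command_error", "command_completed", "tts_speak", "context_updated",
}


def encode_client_event(event_type: str, **data) -> str:
    """Serialize a client event; the {"type", "data"} envelope is assumed."""
    if event_type not in CLIENT_TO_SERVER:
        raise ValueError(f"not a client event: {event_type}")
    return json.dumps({"type": event_type, "data": data})


def decode_server_event(raw: str) -> tuple:
    """Parse a server message and check its type against the table."""
    message = json.loads(raw)
    if message.get("type") not in SERVER_TO_CLIENT:
        raise ValueError(f"unknown server event: {message.get('type')}")
    return message["type"], message.get("data", {})
```

Rejecting events that travel in the wrong direction catches mistakes such as a client trying to send `tts_speak`.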

Configuration

Environment Variables

# Default video source
SQ_DEFAULT_URL=rtsp://admin:pass@192.168.1.100:554/stream

# LLM settings
SQ_OLLAMA_URL=http://localhost:11434
SQ_MODEL=llama3.2
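
As an illustration of how these variables might be consumed, the sketch below reads them into a small settings object. It is not Streamware's loader: the `VoiceShellEnv` class and `load_env` function are hypothetical, and the fallback values are assumptions that simply reuse the example values shown in this document.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class VoiceShellEnv:
    default_url: Optional[str]
    ollama_url: str
    model: str


def load_env(env: Mapping[str, str] = os.environ) -> VoiceShellEnv:
    """Read the three SQ_* variables; the fallbacks are illustrative only."""
    return VoiceShellEnv(
        # No fallback: the video source may legitimately be unset.
        default_url=env.get("SQ_DEFAULT_URL"),
        ollama_url=env.get("SQ_OLLAMA_URL", "http://localhost:11434"),
        model=env.get("SQ_MODEL", "llama3.2"),
    )
```

Passing the environment as a mapping keeps the function easy to test with a plain dictionary instead of the real process environment.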

CLI Options

sq voice-shell --help

  --host HOST       Host to bind (default: 0.0.0.0)
  --port PORT       WebSocket port (default: 8765)
  --model MODEL     LLM model (default: llama3.2)
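
The options above map naturally onto a standard argument parser. The following sketch mirrors the documented flags and defaults with Python's `argparse`; it is an illustration of the interface, not the actual `sq` implementation, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented voice-shell options (sketch only)."""
    parser = argparse.ArgumentParser(prog="sq voice-shell")
    parser.add_argument("--host", default="0.0.0.0", help="Host to bind")
    parser.add_argument("--port", type=int, default=8765, help="WebSocket port")
    parser.add_argument("--model", default="llama3.2", help="LLM model")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--port", "9000"])
    print(args.host, args.port, args.model)
```

Note that `--port` is documented as the WebSocket port. In the example session the HTTP UI is served on 8766, one above the default WebSocket port of 8765; whether a custom `--port` shifts the UI port the same way is not stated here.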

Browser Compatibility

Browser   STT   TTS   WebSocket
Chrome    ✅    ✅    ✅
Edge      ✅    ✅    ✅
Safari    ✅    ✅    ✅
Firefox   ❌    ✅    ✅

Note: Firefox doesn't support the Web Speech API for STT. Use text input instead.

Example Session

# Start server
$ sq voice-shell
🎤 Voice Shell Server starting...
   WebSocket: ws://0.0.0.0:8765
   HTTP UI: http://localhost:8766
   Model: llama3.2

✅ Server running. Open http://localhost:8766 in browser
   Press Ctrl+C to stop

# In browser:
[User clicks 🎤 and says: "detect person and email me when found"]

> detect person and email me when found
✅ Start person detection, send email notification
   Command: sq watch --detect person --email user@example.com --notify-mode instant
🔊 Start person detection, send email notification. Say yes to execute.

[User says: "yes"]

$ sq watch --url rtsp://... --detect person --email user@example.com
🎯 Watch: detect person
   📧 Email: user@example.com

Troubleshooting

Voice not working

WebSocket connection failed

Commands not executing