Real-time voice chat with Streamware using WebSocket and browser audio.
# Start voice shell server
sq voice-shell
# Or with custom port
sq voice-shell --port 9000
# Open in browser
# http://localhost:8766
voice_input, command_parsed, command_executed, tts_speak, etc.
+------------------------------------------+
|  🎤 Streamware Voice Shell                |
|  [●] Connected    [●] Voice Ready        |
+------------------------------------------+
|                    |                     |
|  🖥️ Shell Output   |  🎤 Voice Control   |
|                    |                     |
|  > detect person   |      [🎤]           |
|  ✅ Start person   |   Click to talk     |
|  detection...      |                     |
|  $ sq watch ...    |   [____________]    |
|  🎯 Watch: detect  |   Type command      |
|                    |                     |
|                    |   [✓ Yes] [✗ No]    |
|                    |   [⏹ Stop]          |
|                    |                     |
|                    |   📹 URL: rtsp://...|
|                    |   📧 Email: (not set)|
+------------------------------------------+
"detect person"
"track cars for 10 minutes"
"count people and email me"
"yes" / "execute" / "okay" → Confirm command
"no" / "cancel" → Cancel command
"stop" → Stop running process
"help" → Show help
"context" → Show current settings
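The control phrases above map naturally onto a small lookup table. The following is a minimal sketch of how such matching could work, not Streamware's actual parser: `classify_phrase` and the action names are hypothetical, and anything that is not a control phrase is assumed to be a free-form command handed to the LLM.

```python
# Control phrases from the list above, mapped to hypothetical action names.
CONTROL_PHRASES = {
    "yes": "confirm",
    "execute": "confirm",
    "okay": "confirm",
    "no": "cancel",
    "cancel": "cancel",
    "stop": "stop",
    "help": "help",
    "context": "context",
}


def classify_phrase(text: str) -> str:
    """Return a control action, or "command" for free-form requests."""
    # Speech recognizers often add capitalization and trailing punctuation.
    normalized = text.strip().lower().rstrip(".!?")
    return CONTROL_PHRASES.get(normalized, "command")


if __name__ == "__main__":
    for phrase in ("Yes.", "cancel", "track cars for 10 minutes"):
        print(f"{phrase!r} -> {classify_phrase(phrase)}")
```

Exact matching keeps a sentence such as "stop tracking cars" from being mistaken for the bare "stop" control phrase.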
┌─────────────────┐     WebSocket      ┌──────────────────┐
│   Browser UI    │◄──────────────────►│ VoiceShellServer │
│                 │                    │                  │
│  ┌───────────┐  │                    │  ┌────────────┐  │
│  │ Web Speech│  │  voice_input       │  │ LLMShell   │  │
│  │ API       │──┼───────────────────►│  │            │  │
│  │ (STT)     │  │                    │  │  ┌──────┐  │  │
│  └───────────┘  │                    │  │  │ LLM  │  │  │
│                 │  command_parsed    │  │  │Parser│  │  │
│  ┌───────────┐  │◄───────────────────┼──┤  └──────┘  │  │
│  │ Web Speech│  │                    │  │            │  │
│  │ API       │  │  tts_speak         │  │  ┌──────┐  │  │
│  │ (TTS)     │◄─┼────────────────────┼──┤  │Execut│  │  │
│  └───────────┘  │                    │  │  │ or   │  │  │
│                 │  command_output    │  │  └──────┘  │  │
│  ┌───────────┐  │◄───────────────────┼──┤            │  │
│  │ Output    │  │                    │  └────────────┘  │
│  │ Panel     │  │                    │                  │
│  └───────────┘  │                    │  ┌────────────┐  │
│                 │                    │  │ EventStore │  │
└─────────────────┘                    │  │ (Events)   │  │
                                       │  └────────────┘  │
                                       └──────────────────┘
| Event | Direction | Description |
|---|---|---|
| `voice_input` | Client→Server | Voice transcription from browser |
| `text_input` | Client→Server | Text input from form |
| `confirm` | Client→Server | Confirm pending command |
| `cancel` | Client→Server | Cancel pending command |
| `stop` | Client→Server | Stop running process |
| `command_parsed` | Server→Client | LLM parsing result |
| `command_executed` | Server→Client | Command started |
| `command_output` | Server→Client | Command stdout line |
| `command_error` | Server→Client | Error occurred |
| `command_completed` | Server→Client | Command finished |
| `tts_speak` | Server→Client | Text for TTS |
| `context_updated` | Server→Client | Session context |
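The event names and directions in this table can be captured in a few lines of code. The sketch below assumes each WebSocket message is a JSON object with `type` and `data` fields; that envelope is an assumption made for illustration, since the wire format is not specified here, and `encode_event` and `direction` are hypothetical helpers.

```python
import json

# Event names taken from the table; the JSON envelope is an assumption.
CLIENT_EVENTS = {"voice_input", "text_input", "confirm", "cancel", "stop"}
SERVER_EVENTS = {
    "command_parsed",
    "command_executed",
    "command_output",
    "command_error",
    "command_completed",
    "tts_speak",
    "context_updated",
}


def encode_event(event_type: str, data: dict) -> str:
    """Serialize an event into the assumed {"type", "data"} envelope."""
    if event_type not in CLIENT_EVENTS | SERVER_EVENTS:
        raise ValueError(f"unknown event type: {event_type}")
    return json.dumps({"type": event_type, "data": data})


def direction(message: str) -> str:
    """Report which side is expected to send the given message."""
    event_type = json.loads(message)["type"]
    if event_type in CLIENT_EVENTS:
        return "client->server"
    if event_type in SERVER_EVENTS:
        return "server->client"
    raise ValueError(f"unknown event type: {event_type}")


if __name__ == "__main__":
    msg = encode_event("voice_input", {"text": "detect person"})
    print(msg, direction(msg))
```

Rejecting unknown event names at both ends makes a typo in an event type fail loudly instead of being silently ignored.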
# Default video source
SQ_DEFAULT_URL=rtsp://admin:pass@192.168.1.100:554/stream
# LLM settings
SQ_OLLAMA_URL=http://localhost:11434
SQ_MODEL=llama3.2
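Assuming these settings are exposed as environment variables, a loader could look like the sketch below. `VoiceShellConfig` and `load_config` are hypothetical names, and the fallback values are assumptions: `llama3.2` mirrors the documented default model, and `http://localhost:11434` is the address used in the example above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class VoiceShellConfig:
    default_url: Optional[str]  # video source; None when not set
    ollama_url: str
    model: str


def load_config(env: Mapping[str, str] = os.environ) -> VoiceShellConfig:
    """Read the SQ_* settings, falling back to assumed defaults."""
    return VoiceShellConfig(
        default_url=env.get("SQ_DEFAULT_URL"),
        ollama_url=env.get("SQ_OLLAMA_URL", "http://localhost:11434"),
        model=env.get("SQ_MODEL", "llama3.2"),
    )


if __name__ == "__main__":
    print(load_config())
```

Passing the environment in as a mapping keeps the loader easy to exercise with a plain dictionary.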
sq voice-shell --help
--host HOST Host to bind (default: 0.0.0.0)
--port PORT WebSocket port (default: 8765)
--model MODEL LLM model (default: llama3.2)
| Browser | STT | TTS | WebSocket |
|---|---|---|---|
| Chrome | ✅ | ✅ | ✅ |
| Edge | ✅ | ✅ | ✅ |
| Safari | ✅ | ✅ | ✅ |
| Firefox | ❌ | ✅ | ✅ |
Note: Firefox doesn't support the Web Speech API for STT. Use text input instead.
# Start server
$ sq voice-shell
🎤 Voice Shell Server starting...
WebSocket: ws://0.0.0.0:8765
HTTP UI: http://localhost:8766
Model: llama3.2
✅ Server running. Open http://localhost:8766 in browser
Press Ctrl+C to stop
# In browser:
[User clicks 🎤 and says: "detect person and email me when found"]
> detect person and email me when found
✅ Start person detection, send email notification
Command: sq watch --detect person --email user@example.com --notify-mode instant
🔊 Start person detection, send email notification. Say yes to execute.
[User says: "yes"]
$ sq watch --url rtsp://... --detect person --email user@example.com
🎯 Watch: detect person
📧 Email: user@example.com
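The session above follows a parse, confirm, execute pattern: nothing runs until the user says "yes". The sketch below models that flow under stated assumptions. `PendingSession` is a hypothetical class, not part of Streamware, and it only records which events would be emitted instead of sending them or running anything.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Event = Tuple[str, str]  # (event name, payload text)


@dataclass
class PendingSession:
    """Holds at most one parsed command until it is confirmed or cancelled."""

    pending: Optional[str] = None
    events: List[Event] = field(default_factory=list)

    def parsed(self, command: str, summary: str) -> None:
        # Store the command and ask the user for confirmation via TTS.
        self.pending = command
        self.events.append(("command_parsed", command))
        self.events.append(("tts_speak", f"{summary}. Say yes to execute."))

    def confirm(self) -> Optional[str]:
        # Release the pending command exactly once.
        command, self.pending = self.pending, None
        if command is not None:
            self.events.append(("command_executed", command))
        return command

    def cancel(self) -> None:
        self.pending = None


if __name__ == "__main__":
    session = PendingSession()
    session.parsed("sq watch --detect person", "Start person detection")
    print(session.confirm())
    print([name for name, _ in session.events])
```

Clearing the pending slot inside `confirm` means a repeated "yes" cannot start the same command twice.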
sq voice-shell --port 9000
ollama serve
ollama list
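One plausible reason to pass `--port` is that the default port is already taken. A minimal sketch for checking this before starting the server, using only the Python standard library; `port_is_free` is a hypothetical helper, not part of the `sq` CLI.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permission error.
            return False
        return True


if __name__ == "__main__":
    for port in (8765, 8766, 9000):
        state = "free" if port_is_free(port) else "in use"
        print(f"port {port}: {state}")
```

The check is only a snapshot: another process can still claim the port between the check and the server start.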