| Location | Issue | Solution |
|---|---|---|
| api/generate calls | 12+ places calling Ollama directly | Use llm_client.py everywhere |
| _call_vision_model | Duplicated in motion_diff.py, tracking.py | Extract to llm_client |
| TTS code | Was in live_narrator.py + voice.py | ✅ Fixed → tts.py |
| env file updates | Was in cli.py + setup.py | ✅ Fixed → setup_utils.py |
| "No significant" filtering | In 5+ components | Centralize in response validator |
```
┌────────────────────────────────────────────────┐
│                  quick_cli.py                  │
│               (121KB - main CLI)               │
└───────────────────────┬────────────────────────┘
                        │
        ┌───────────────┼───────────────┐
        ▼               ▼               ▼
  ┌───────────┐ ┌───────────────┐ ┌───────────┐
  │  core.py  │ │ components/*  │ │ config.py │
  │  (flow)   │ │ (12 modules)  │ │           │
  └───────────┘ └───────┬───────┘ └───────────┘
                        │
        ┌───────────────┼────────────────┐
        ▼               ▼                ▼
┌───────────────┐ ┌───────────┐ ┌──────────────────┐
│ llm_client.py │ │  tts.py   │ │ image_optimize.py│
│ (centralized) │ │ (unified) │ │                  │
└───────────────┘ └───────────┘ └──────────────────┘
```
Status: llm_client.py created, needs adoption

Tasks (a migration sketch follows the list):

- motion_diff.py → use llm_client.vision_query()
- tracking.py → use llm_client.vision_query()
- stream.py → use llm_client.vision_query()
- smart_monitor.py → use llm_client.vision_query()
- live_narrator.py → ✅ Done for _describe_frame_advanced
- media.py → use llm_client.vision_query()
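For reference, a hedged before/after sketch of one such migration; the exact vision_query() signature (argument order and model parameter) is an assumption:

```python
# Before: each component posts to Ollama's /api/generate directly (duplicated logic)
import base64
import requests


def describe_frame_direct(image_path: str, prompt: str) -> str:
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llava:7b", "prompt": prompt, "images": [image_b64], "stream": False},
        timeout=120,
    )
    return resp.json().get("response", "")


# After: every call site goes through the centralized client
from streamware.llm_client import vision_query  # assumed signature


def describe_frame(image_path: str, prompt: str) -> str:
    return vision_query(image_path, prompt, model="llava:7b")
```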
Goal: Don't log "nothing happened" responses.

Create response_filter.py:

```python
from typing import Optional


def is_significant_response(response: str, mode: str = "general") -> bool:
    """Check if an LLM response contains meaningful information.

    `mode` is accepted for future per-mode pattern sets; it is not used yet.
    """
    # Patterns are matched case-insensitively, so keep them lowercase.
    noise_patterns = [
        "no significant changes",
        "no movement detected",
        "no person visible",
        "nothing to report",
        "visible: no",
        "present: no",
        "changed: no",
    ]
    response_lower = response.lower()
    return not any(p in response_lower for p in noise_patterns)


def filter_for_logging(response: str) -> Optional[str]:
    """Return the response only if it is significant, else None."""
    if is_significant_response(response):
        return response
    return None
```
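A quick usage sketch of these helpers at the interpreter:

```python
>>> from streamware.response_filter import is_significant_response, filter_for_logging
>>> is_significant_response("No significant changes detected in the scene.")
False
>>> is_significant_response("A person entered through the left door.")
True
>>> filter_for_logging("VISIBLE: NO") is None   # suppressed, nothing gets logged
True
```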
Create api_server.py:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Streamware API")


class AnalyzeRequest(BaseModel):
    image_url: str
    prompt: str
    model: str = "llava:7b"


@app.post("/api/v1/analyze")
async def analyze_image(req: AnalyzeRequest):
    from .llm_client import vision_query
    result = vision_query(req.image_url, req.prompt)
    return {"result": result}


@app.post("/api/v1/live/start")
async def start_live(source: str, mode: str = "track", focus: str = "person"):
    """Start a live narrator session."""
    ...


@app.get("/api/v1/live/{session_id}/status")
async def get_live_status(session_id: str):
    """Return the status of a live narrator session."""
    ...


@app.post("/api/v1/speak")
async def speak(text: str, engine: str = "auto"):
    from .tts import speak as tts_speak  # avoid shadowing this endpoint's name
    success = tts_speak(text)
    return {"success": success}
```
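A client-side sketch of calling the API; the host/port (and running the app with uvicorn) are assumptions, and plain string parameters such as `text` arrive as query parameters under FastAPI defaults:

```python
import requests

API = "http://localhost:8000"  # assumed host/port, e.g. `uvicorn streamware.api_server:app`

# Analyze a single frame
resp = requests.post(f"{API}/api/v1/analyze", json={
    "image_url": "/tmp/frame.jpg",
    "prompt": "Describe any people visible in the frame.",
})
description = resp.json()["result"]
print(description)

# Speak the result (text is sent as a query parameter)
requests.post(f"{API}/api/v1/speak", params={"text": description})
```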
Create llm_agent.py:

```python
from .llm_client import vision_query
from .tts import speak


class StreamwareAgent:
    """Agent that can be controlled by an LLM."""

    SYSTEM_PROMPT = '''You are a Streamware automation agent.

Available commands:
- analyze_image(path, prompt) - Analyze image with vision LLM
- start_watch(url, focus) - Start watching camera
- speak(text) - Speak via TTS
- query_network() - Scan network for devices

Respond with JSON: {"action": "...", "params": {...}}
'''

    def execute(self, llm_response: dict):
        action = llm_response.get("action")
        params = llm_response.get("params", {})
        if action == "analyze_image":
            return vision_query(**params)
        elif action == "start_watch":
            # _start_watch is part of the full class; omitted here
            return self._start_watch(**params)
        elif action == "speak":
            return speak(**params)
        return {"error": f"unknown action: {action}"}
```
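A sketch of one control-loop turn: the controlling LLM's JSON reply is parsed and dispatched through the agent. The reply string below is hard-coded for illustration; in practice it would come from a chat call that includes SYSTEM_PROMPT:

```python
import json

agent = StreamwareAgent()

# Example reply from the controlling LLM (fabricated for illustration);
# parameter names must match vision_query's actual signature
llm_reply = '{"action": "analyze_image", "params": {"path": "/tmp/frame.jpg", "prompt": "Is anyone at the door?"}}'

command = json.loads(llm_reply)
result = agent.execute(command)
print(result)
```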
```
┌──────────────────────────────────────────────────────────────────┐
│                            API Layer                             │
│  ┌──────────────┐  ┌──────────────┐  ┌─────────────────────────┐ │
│  │   REST API   │  │   CLI (sq)   │  │  Python DSL (Pipeline)  │ │
│  │  (FastAPI)   │  │ (quick_cli)  │  │        (dsl.py)         │ │
│  └──────┬───────┘  └──────┬───────┘  └────────────┬────────────┘ │
└─────────┼─────────────────┼───────────────────────┼──────────────┘
          │                 │                       │
          ▼                 ▼                       ▼
┌──────────────────────────────────────────────────────────────────┐
│                        Core Service Layer                        │
│  ┌───────────────┐  ┌────────────────┐  ┌──────────────────┐     │
│  │ llm_client.py │  │   response_    │  │ session_manager  │     │
│  │   (all LLM)   │  │   filter.py    │  │ (live sessions)  │     │
│  └───────────────┘  └────────────────┘  └──────────────────┘     │
└──────────────────────────────────────────────────────────────────┘
          │                 │                       │
          ▼                 ▼                       ▼
┌──────────────────────────────────────────────────────────────────┐
│                         Component Layer                          │
│  ┌──────────┐  ┌────────────┐  ┌──────────┐  ┌─────────────────┐ │
│  │  stream  │  │  tracking  │  │  media   │  │  live_narrator  │ │
│  └──────────┘  └────────────┘  └──────────┘  └─────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
          │                 │                       │
          ▼                 ▼                       ▼
┌──────────────────────────────────────────────────────────────────┐
│                       Infrastructure Layer                       │
│  ┌───────────────┐  ┌────────────────┐  ┌──────────────────┐     │
│  │    tts.py     │  │ setup_utils.py │  │  image_optimize  │     │
│  │    (voice)    │  │   (install)    │  │ (preprocessing)  │     │
│  └───────────────┘  └────────────────┘  └──────────────────┘     │
└──────────────────────────────────────────────────────────────────┘
```
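The session_manager box in the Core Service Layer does not exist yet; a minimal sketch of what such a live-session registry could look like (module name, function names, and fields are all assumptions):

```python
# session_manager.py - hypothetical sketch of a live-session registry
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LiveSession:
    source: str
    mode: str = "track"
    focus: str = "person"
    status: str = "running"


_sessions: Dict[str, LiveSession] = {}


def start_session(source: str, mode: str = "track", focus: str = "person") -> str:
    """Register a new live session and return its id (used by /api/v1/live/start)."""
    session_id = uuid.uuid4().hex
    _sessions[session_id] = LiveSession(source=source, mode=mode, focus=focus)
    return session_id


def get_status(session_id: str) -> Optional[dict]:
    """Return session info for /api/v1/live/{session_id}/status, or None if unknown."""
    session = _sessions.get(session_id)
    if session is None:
        return None
    return {"source": session.source, "mode": session.mode, "focus": session.focus, "status": session.status}
```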
Current flow:

```
User → quick_cli.py → LiveNarratorComponent
                          │
                          ├── ffmpeg (capture)
                          ├── FrameAnalyzer (motion)
                          ├── requests.post(ollama)   ← DUPLICATE
                          └── tts.speak()
```

Target flow:

```
User → quick_cli.py → LiveNarratorComponent
                          │
                          ├── ffmpeg (capture)
                          ├── FrameAnalyzer (motion)
                          ├── llm_client.analyze_image()
                          │       │
                          │       └── response_filter.is_significant()
                          │               │
                          │               ├── True: log + tts
                          │               └── False: skip
                          └── tts.speak()  (only if significant)
```
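In code, one iteration of that target flow might look roughly like this (frame capture via ffmpeg is omitted, and the exact signatures of vision_query and speak are assumptions):

```python
from streamware.llm_client import vision_query
from streamware.response_filter import is_significant_response
from streamware.tts import speak


def narrate_frame(frame_path: str, prompt: str) -> None:
    """One iteration of the target live-narrator loop (simplified sketch)."""
    description = vision_query(frame_path, prompt)
    if not is_significant_response(description):
        return                           # False branch: skip logging and TTS
    print(f"[narrator] {description}")   # True branch: log ...
    speak(description)                   # ... and speak
```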
- response_filter.py - filter noise from LLM responses
- --guarder flag - enable LLM validation via CLI
- --lite flag - reduce memory usage
- llm_client.py (partial)

The guarder is a small LLM that validates vision model responses before logging:
```bash
# Enable via CLI
sq live narrator --url "rtsp://..." --guarder

# Or via .env
SQ_USE_GUARDER=true
SQ_GUARDER_MODEL=qwen2.5:3b
```
Recommended models (3B, fast):
| Model | Size | Speed | Quality |
|---|---|---|---|
| qwen2.5:3b | 2GB | Fast | Best |
| phi3:mini | 2.3GB | Fast | Good |
| gemma2:2b | 1.6GB | Fastest | OK |
| llama3.2:3b | 2GB | Fast | Good |
Flow:
Vision LLM (llava:7b) → Response → Guarder (qwen2.5:3b) → YES/NO → Log/Skip
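A hedged sketch of that guarder step, assuming the small model is served by the local Ollama instance; in the target design this call would presumably be routed through llm_client rather than made directly:

```python
import os
import requests

GUARDER_MODEL = os.getenv("SQ_GUARDER_MODEL", "qwen2.5:3b")


def guarder_says_significant(vision_response: str) -> bool:
    """Ask the small guarder model whether a vision response is worth logging."""
    prompt = (
        "Does the following camera description report a meaningful event? "
        "Answer with YES or NO only.\n\n" + vision_response
    )
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": GUARDER_MODEL, "prompt": prompt, "stream": False},
        timeout=60,
    )
    return resp.json().get("response", "").strip().upper().startswith("YES")
```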
Auto-install at startup:

```
sq live narrator --url "rtsp://..."

⚠️  Model qwen2.5:3b not found. Install with: ollama pull qwen2.5:3b

Guarder model 'qwen2.5:3b' is needed for smart response filtering.

Recommended models (small, fast):
  1. qwen2.5:3b - Best quality (2GB)
  2. gemma2:2b  - Fastest (1.6GB)
  3. phi3:mini  - Good balance (2.3GB)
  4. Skip       - Use regex filtering only

Install model? [1-4, default=1]: 1

Pulling qwen2.5:3b... (this may take a few minutes)
✅ qwen2.5:3b installed successfully
```
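A simplified sketch of the startup check behind that interaction; the real flow offers the numbered menu shown above, while this version only handles a yes/no choice:

```python
import subprocess


def ensure_guarder_model(model: str = "qwen2.5:3b") -> bool:
    """Verify that the guarder model is installed; offer to pull it if missing."""
    installed = subprocess.run(
        ["ollama", "list"], capture_output=True, text=True
    ).stdout
    if model in installed:
        return True
    print(f"⚠️  Model {model} not found. Install with: ollama pull {model}")
    if input("Install model now? [y/N]: ").strip().lower() != "y":
        return False  # caller falls back to regex filtering only
    subprocess.run(["ollama", "pull", model], check=True)
    print(f"✅ {model} installed successfully")
    return True
```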
```
streamware/
├── cli.py                 # Main CLI (streamware command)
├── quick_cli.py           # sq command (121KB)
├── core.py                # Flow engine
├── config.py              # Configuration management
├── dsl.py                 # Python DSL (Pipeline)
│
├── llm_client.py          # Centralized LLM client ✅
├── tts.py                 # Unified TTS module ✅
├── response_filter.py     # Smart response filtering ✅
├── image_optimize.py      # Image preprocessing
├── setup_utils.py         # Cross-platform setup ✅
│
├── prompts/               # External prompt templates
│   ├── __init__.py
│   ├── stream_diff.txt
│   ├── live_narrator_*.txt
│   └── ...
│
└── components/            # Core components
    ├── live_narrator.py
    ├── stream.py
    ├── tracking.py
    ├── motion_diff.py
    ├── smart_monitor.py
    └── ...
```