Streamware Architecture

πŸ“Š Current State Analysis

Identified Duplications

| Location | Issue | Solution |
|----------|-------|----------|
| api/generate calls | 12+ places calling Ollama directly | Use llm_client.py everywhere |
| _call_vision_model | Duplicated in motion_diff.py, tracking.py | Extract to llm_client |
| TTS code | Was in live_narrator.py + voice.py | βœ… Fixed β†’ tts.py |
| env file updates | Was in cli.py + setup.py | βœ… Fixed β†’ setup_utils.py |
| β€œNo significant” filtering | In 5+ components | Centralize in response validator |
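
For reference, the duplicated pattern behind the first two rows looks roughly like this in each component (a sketch; the model, prompt handling, and endpoint defaults are illustrative):

import base64
import requests

def _call_vision_model(image_path: str, prompt: str, model: str = "llava:7b") -> str:
    """The repeatedly copy-pasted direct call to Ollama's /api/generate."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "images": [image_b64], "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]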

Module Dependencies (Current)

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                        quick_cli.py                          β”‚
β”‚                       (121KB - main CLI)                     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                       β”‚
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β–Ό              β–Ό              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  core.py  β”‚  β”‚ components/*  β”‚  β”‚  config.py β”‚
β”‚  (flow)   β”‚  β”‚ (12 modules)  β”‚  β”‚            β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                       β”‚
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β–Ό              β–Ό              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ llm_client.py β”‚ β”‚   tts.py   β”‚ β”‚ image_optimize.pyβ”‚
β”‚ (centralized) β”‚ β”‚ (unified)  β”‚ β”‚                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

🎯 Refactoring Plan

Phase 1: Consolidate LLM Calls βœ… (Partial)

Status: llm_client.py created; components still need to adopt it (see the migration sketch after the task list)

Tasks:

  1. Update motion_diff.py β†’ use llm_client.vision_query()
  2. Update tracking.py β†’ use llm_client.vision_query()
  3. Update stream.py β†’ use llm_client.vision_query()
  4. Update smart_monitor.py β†’ use llm_client.vision_query()
  5. Update live_narrator.py β†’ βœ… Done for _describe_frame_advanced
  6. Update media.py β†’ use llm_client.vision_query()
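
After migration, each of these call sites collapses to a single helper call. A minimal sketch, assuming vision_query takes an image path and a prompt (the exact signature lives in llm_client.py):

from streamware.llm_client import vision_query

def describe_motion(frame_path: str) -> str:
    # One centralized call replaces the per-component requests.post() duplication.
    return vision_query(frame_path, "Describe any movement between the frames")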

Phase 2: Response Validation / Filtering

Goal: Don’t log β€œnothing happened” responses

Create response_filter.py:

from typing import Optional

def is_significant_response(response: str, mode: str = "general") -> bool:
    """Check if LLM response contains meaningful information."""
    # mode is reserved for mode-specific pattern sets; only the general set is shown here
    noise_patterns = [
        "no significant changes",
        "no movement detected",
        "no person visible",
        "nothing to report",
        "VISIBLE: NO",
        "PRESENT: NO",
        "CHANGED: NO",
    ]
    response_lower = response.lower()
    # lower-case both sides so patterns like "VISIBLE: NO" actually match
    return not any(p.lower() in response_lower for p in noise_patterns)

def filter_for_logging(response: str) -> Optional[str]:
    """Return response only if significant, else None."""
    if is_significant_response(response):
        return response
    return None
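
A sketch of how a component would use the filter before logging (logger name and call site are illustrative):

import logging

from streamware.response_filter import filter_for_logging

logger = logging.getLogger("streamware.live_narrator")

def log_if_significant(description: str) -> None:
    # Noise like "no significant changes" returns None and is dropped.
    significant = filter_for_logging(description)
    if significant is not None:
        logger.info(significant)

log_if_significant("No significant changes detected")          # skipped
log_if_significant("Person entered the frame from the left")   # logged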

Phase 3: REST API Server

Create api_server.py:

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Streamware API")

class AnalyzeRequest(BaseModel):
    image_url: str
    prompt: str
    model: str = "llava:7b"

@app.post("/api/v1/analyze")
async def analyze_image(req: AnalyzeRequest):
    from .llm_client import vision_query
    result = vision_query(req.image_url, req.prompt)
    return {"result": result}

@app.post("/api/v1/live/start")
async def start_live(source: str, mode: str = "track", focus: str = "person"):
    """Start live narrator session"""
    ...

@app.get("/api/v1/live/{session_id}/status")
async def get_live_status(session_id: str):
    ...

@app.post("/api/v1/speak")
async def speak(text: str, engine: str = "auto"):
    from .tts import speak
    success = speak(text)
    return {"success": success}

Phase 4: LLM-Driven Client

Create llm_agent.py:

from .llm_client import vision_query
from .tts import speak

class StreamwareAgent:
    """Agent that can be controlled by LLM."""

    SYSTEM_PROMPT = '''You are a Streamware automation agent.
    Available commands:
    - analyze_image(path, prompt) - Analyze image with vision LLM
    - start_watch(url, focus) - Start watching camera
    - speak(text) - Speak via TTS
    - query_network() - Scan network for devices

    Respond with JSON: {"action": "...", "params": {...}}
    '''

    def execute(self, llm_response: dict):
        action = llm_response.get("action")
        params = llm_response.get("params", {})

        if action == "analyze_image":
            return vision_query(**params)
        elif action == "start_watch":
            return self._start_watch(**params)
        elif action == "speak":
            return speak(**params)
        else:
            raise ValueError(f"Unknown action: {action}")
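
Usage, continuing from the class above: the controlling LLM is given SYSTEM_PROMPT plus a user instruction, and its JSON reply is parsed and dispatched (the reply string below is only an example):

import json

llm_reply = '{"action": "speak", "params": {"text": "Motion detected at the gate"}}'

agent = StreamwareAgent()
command = json.loads(llm_reply)   # the system prompt asks the LLM to answer in JSON
result = agent.execute(command)   # dispatches to speak(text=...)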

πŸ“ Target Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                         API Layer                                β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚ REST API    β”‚  β”‚ CLI (sq)    β”‚  β”‚ Python DSL (Pipeline)  β”‚ β”‚
β”‚  β”‚ (FastAPI)   β”‚  β”‚ (quick_cli) β”‚  β”‚ (dsl.py)               β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
          β”‚                β”‚                     β”‚
          β–Ό                β–Ό                     β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                      Core Service Layer                          β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ llm_client.py β”‚  β”‚ response_      β”‚  β”‚ session_manager  β”‚   β”‚
β”‚  β”‚ (all LLM)     β”‚  β”‚ filter.py      β”‚  β”‚ (live sessions)  β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
          β”‚                β”‚                     β”‚
          β–Ό                β–Ό                     β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                      Component Layer                             β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚ stream   β”‚ β”‚ tracking   β”‚ β”‚ media    β”‚ β”‚ live_narrator   β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
          β”‚                β”‚                     β”‚
          β–Ό                β–Ό                     β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                     Infrastructure Layer                         β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ tts.py        β”‚  β”‚ setup_utils.py β”‚  β”‚ image_optimize   β”‚   β”‚
β”‚  β”‚ (voice)       β”‚  β”‚ (install)      β”‚  β”‚ (preprocessing)  β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ”„ Communication Flow

Current Flow (sq live --tts)

User β†’ quick_cli.py β†’ LiveNarratorComponent
                              β”‚
                              β”œβ”€β†’ ffmpeg (capture)
                              β”œβ”€β†’ FrameAnalyzer (motion)
                              β”œβ”€β†’ requests.post(ollama) ← DUPLICATE
                              └─→ tts.speak()

Target Flow

User β†’ quick_cli.py β†’ LiveNarratorComponent
                              β”‚
                              β”œβ”€β†’ ffmpeg (capture)
                              β”œβ”€β†’ FrameAnalyzer (motion)
                              β”œβ”€β†’ llm_client.analyze_image()
                              β”‚         β”‚
                              β”‚         └─→ response_filter.is_significant()
                              β”‚                    β”‚
                              β”‚                    β”œβ”€β†’ True: log + tts
                              β”‚                    └─→ False: skip
                              └─→ tts.speak() (only if significant)
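
In code, the target loop inside live_narrator would look roughly like this; analyze_image follows the diagram (Phase 1 names the same centralized helper vision_query), and the exact signatures are assumptions:

from streamware import llm_client, tts
from streamware.response_filter import is_significant_response

def narrate_frame(frame_path: str, prompt: str) -> None:
    # Vision call goes through the centralized client, not a direct requests.post().
    description = llm_client.analyze_image(frame_path, prompt)

    # "No change" responses are filtered out before they reach the log or TTS.
    if is_significant_response(description):
        print(description)
        tts.speak(description)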

βœ… Completed

  1. βœ… response_filter.py - filter noise from LLM responses
  2. βœ… Guarder model - small LLM (qwen2.5:3b) validates responses before logging
  3. βœ… Significance check in live_narrator - skip β€œno change” responses
  4. βœ… --guarder flag - enable LLM validation via CLI
  5. βœ… --lite flag - reduce memory usage

πŸ”„ Next Steps

  1. Migrate all Ollama calls to llm_client.py (partial)
  2. Create REST API with FastAPI (optional)
  3. Create LLM Agent for natural language control

πŸ›‘οΈ Guarder Model

Small LLM that validates vision model responses before logging:

# Enable via CLI
sq live narrator --url "rtsp://..." --guarder

# Or via .env
SQ_USE_GUARDER=true
SQ_GUARDER_MODEL=qwen2.5:3b

Recommended models (3B, fast):

| Model | Size | Speed | Quality |
|-------|------|-------|---------|
| qwen2.5:3b | 2GB | Fast | Best |
| phi3:mini | 2.3GB | Fast | Good |
| gemma2:2b | 1.6GB | Fastest | OK |
| llama3.2:3b | 2GB | Fast | Good |

Flow:

Vision LLM (llava:7b) β†’ Response β†’ Guarder (qwen2.5:3b) β†’ YES/NO β†’ Log/Skip
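
Conceptually, the guarder step is one extra text-only LLM call. A minimal sketch hitting Ollama directly (in the codebase this would go through llm_client, and the prompt wording is illustrative):

import requests

GUARDER_MODEL = "qwen2.5:3b"

def guarder_approves(vision_response: str) -> bool:
    """Ask the small guarder model whether a vision response is worth logging."""
    prompt = (
        "Does the following camera description report a meaningful event? "
        "Answer only YES or NO.\n\n" + vision_response
    )
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": GUARDER_MODEL, "prompt": prompt, "stream": False},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["response"].strip().upper().startswith("YES")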

Auto-install at startup:

sq live narrator --url "rtsp://..."

⚠️  Model qwen2.5:3b not found. Install with: ollama pull qwen2.5:3b

   Guarder model 'qwen2.5:3b' is needed for smart response filtering.
   Recommended models (small, fast):
   1. qwen2.5:3b  - Best quality (2GB)
   2. gemma2:2b   - Fastest (1.6GB)
   3. phi3:mini   - Good balance (2.3GB)
   4. Skip        - Use regex filtering only

   Install model? [1-4, default=1]: 1

   Pulling qwen2.5:3b... (this may take a few minutes)
   βœ… qwen2.5:3b installed successfully
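
A sketch of the startup check behind that prompt, using the ollama CLI via subprocess (the interactive menu is simplified to a yes/no here):

import subprocess

def ensure_guarder_model(model: str = "qwen2.5:3b") -> bool:
    """Return True if the model is available locally, pulling it on request."""
    installed = subprocess.run(
        ["ollama", "list"], capture_output=True, text=True, check=True
    ).stdout
    if model in installed:
        return True
    choice = input(f"Model {model} not found. Pull it now? [y/N]: ").strip().lower()
    if choice == "y":
        subprocess.run(["ollama", "pull", model], check=True)
        return True
    return False  # fall back to regex-only filtering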

πŸ“ Module Structure

streamware/
β”œβ”€β”€ cli.py              # Main CLI (streamware command)
β”œβ”€β”€ quick_cli.py        # sq command (121KB)
β”œβ”€β”€ core.py             # Flow engine
β”œβ”€β”€ config.py           # Configuration management
β”œβ”€β”€ dsl.py              # Python DSL (Pipeline)
β”‚
β”œβ”€β”€ llm_client.py       # Centralized LLM client βœ…
β”œβ”€β”€ tts.py              # Unified TTS module βœ…
β”œβ”€β”€ response_filter.py  # Smart response filtering βœ…
β”œβ”€β”€ image_optimize.py   # Image preprocessing
β”œβ”€β”€ setup_utils.py      # Cross-platform setup βœ…
β”‚
β”œβ”€β”€ prompts/            # External prompt templates
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ stream_diff.txt
β”‚   β”œβ”€β”€ live_narrator_*.txt
β”‚   └── ...
β”‚
└── components/         # Core components
    β”œβ”€β”€ live_narrator.py
    β”œβ”€β”€ stream.py
    β”œβ”€β”€ tracking.py
    β”œβ”€β”€ motion_diff.py
    β”œβ”€β”€ smart_monitor.py
    └── ...