Guide to optimizing StreamWare for maximum performance.
StreamWare is optimized for real-time video analysis. This guide covers timing analysis, GPU acceleration, and performance tuning.
sq live narrator --url "rtsp://..." --turbo
Enables:
sq live narrator --url "rtsp://..." --dsl-only --realtime --fps 20
sq live narrator --url "rtsp://..." --realtime --fps 5
All performance parameters are now configurable through environment variables:
# YOLO Detection Sensitivity
SQ_YOLO_CONFIDENCE_THRESHOLD=0.15 # Lower = more detections, slower
SQ_YOLO_CONFIDENCE_THRESHOLD_HIGH=0.3 # Higher = fewer false positives
# Motion Detection Sensitivity
SQ_MOTION_DIFF_THRESHOLD=25 # Lower = more motion detected
SQ_MOTION_CONTOUR_MIN_AREA=100 # Higher = ignore small motion
# Frame Processing
SQ_FRAME_SCALE=0.5 # Process at 50% resolution
SQ_IMAGE_PRESET=fast # Smaller images for LLM
SQ_MAX_CONCURRENT_LLM=2 # Limit parallel LLM calls
SQ_MAX_FRAME_QUEUE=10 # Limit frame buffer size
# RAM Disk Configuration
SQ_RAMDISK_ENABLED=true # Use RAM for temporary files
SQ_RAMDISK_SIZE_MB=512 # Allocate 512MB RAM disk
SQ_RAMDISK_PATH=/dev/shm/streamware # Custom RAM disk path
Performance Impact:
Timing logs are automatically generated in real-time mode:
sq live narrator --url "rtsp://..." --realtime --turbo
# Creates: dsl_timing_*.csv
frame_num,timestamp,total_ms,capture_ms,grayscale_ms,blur_ms,diff_ms,threshold_ms,contours_ms,tracking_ms,thumbnail_ms,blobs,motion_pct
1,16:43:24.463,18.3,1.7,0.2,0.3,13.0,0.8,0.2,0.0,0.7,0,100.00
2,16:43:26.467,10.8,1.7,0.2,0.2,5.5,0.7,0.1,0.0,0.6,2,0.94
⏱️ F1: 18ms total | 0 blobs | 100.0% motion | blur:0ms | capture:2ms | diff:13ms
⏱️ F2: 11ms total | 2 blobs | 0.9% motion | blur:0ms | capture:2ms | diff:6ms
Fine-tune LLM timeouts for your hardware and network conditions:
# Fast Hardware (GPU + Fast Network)
SQ_ANALYZE_TIMEOUT=5 # Quick LLM analysis
SQ_SUMMARIZE_TIMEOUT=10 # Faster summarization
SQ_GUARDER_TIMEOUT=3 # Quick guarder checks
# Standard Hardware (CPU + Average Network)
SQ_ANALYZE_TIMEOUT=8 # Balanced timeouts
SQ_SUMMARIZE_TIMEOUT=15 # Standard summarization
SQ_GUARDER_TIMEOUT=5 # Standard guarder checks
# Slow Hardware (Old CPU + Slow Network)
SQ_ANALYZE_TIMEOUT=15 # Patient timeouts
SQ_SUMMARIZE_TIMEOUT=30 # Long summarization
SQ_GUARDER_TIMEOUT=10 # Patient guarder checks
Timeout Strategy:
Monitoring Timeouts:
# Check timeout performance
sq live narrator --url "rtsp://..." --verbose --log-format yaml
# Look for "timeout" in logs
| Step | Time | % Total |
|---|---|---|
| diff (background subtraction) | 5-15ms | 50-70% |
| capture | 1-2ms | 10-20% |
| threshold | 0.5-1ms | 5-10% |
| thumbnail | 0.5-1ms | 5-10% |
| blur/grayscale | 0.2-0.5ms | 2-5% |
| Model | Typical | Notes |
|---|---|---|
| moondream | 300-800ms | Fastest |
| llava:7b | 800-2000ms | Good quality |
| llava:13b | 2000-5000ms | Best quality |
| Component | GPU | Notes |
|---|---|---|
| YOLO detection | ✅ | CUDA acceleration |
| Ollama LLM | ✅ | GPU inference |
| OpenCV DSL | ❌ | CPU only |
| FastCapture | ⚠️ | Optional NVDEC |
# Monitor GPU
watch -n 1 nvidia-smi
# Check Ollama GPU
curl http://localhost:11434/api/tags | jq
export SQ_CAPTURE_BACKEND=opencv # Uses CUDA if available
| Variable | Default | Description |
|---|---|---|
SQ_FAST_CAPTURE |
true | Use FastCapture |
SQ_RAMDISK_PATH |
/dev/shm/streamware | Frame storage |
SQ_CAPTURE_BACKEND |
auto | opencv/ffmpeg |
SQ_GUARDER_MODEL |
gemma:2b | Filter model |
# High performance
export SQ_FAST_CAPTURE=true
export SQ_RAMDISK_PATH=/dev/shm/streamware
# Memory optimization
export SQ_LITE_MODE=true
Average: 10-15ms per frame
Effective FPS: 65-100 FPS (theoretical)
Bottleneck: Background subtraction (diff)
moondream: ~500ms (2 FPS)
llava:7b: ~1200ms (0.8 FPS)
llava:13b: ~3000ms (0.3 FPS)
DSL streaming: 5-20 FPS (smooth)
LLM analysis: 0.3-1 FPS (background)
--turbo mode--dsl-only when AI not needed--fps value--lite modebuffer_size=3Related: