Live Narrator is a Streamware component for real-time video stream analysis using AI (LLMs).
- RTSP Stream → FastCapture (FFmpeg/CV) → SmartDetect (HOG+Motion) → Vision LLM (moondream)
- FastCapture → RAM Disk (/dev/shm/)
- SmartDetect → Cache (images)
- Vision LLM → Guarder LLM (gemma:2b) → TTS (pyttsx3)
LIVE NARRATOR PIPELINE

1. CAPTURE STAGE: RTSP Stream → FastCapture (OpenCV/FFmpeg) → RAM Disk (/dev/shm/)
2. DETECTION STAGE: Frame Buffer → Motion Detect (diff %) → HOG Person Detection
3. TRACKING STAGE (NEW): Motion Regions → Object Tracker → Tracked Objects. The Object Tracker performs IoU matching, ID assignment and direction estimation.
4. ANALYSIS STAGE: Movement Context → Vision LLM (moondream) → Description (verbose)
5. FILTER STAGE: Verbose Description → Guarder LLM (gemma:2b) → Short Summary
6. OUTPUT STAGE: Filtered Response → TTS (pyttsx3) → Log (CSV/TXT)
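The output stage can be sketched as follows. The sketch assumes a CSV log with two hypothetical columns (ISO timestamp, summary text) and uses pyttsx3's `init()`, `say()` and `runAndWait()` for speech. `log_summary` and `speak` are illustrative names and are not Streamware's API.

```python
import csv
import io
from datetime import datetime, timezone
from typing import Optional, TextIO


def log_summary(out: TextIO, summary: str, when: Optional[datetime] = None) -> None:
    """Append one filtered summary as a CSV row: ISO timestamp, summary text (assumed layout)."""
    when = when or datetime.now(timezone.utc)
    csv.writer(out).writerow([when.isoformat(), summary])


def speak(summary: str) -> None:
    """Read the summary aloud; pyttsx3 is imported lazily so logging works without it."""
    import pyttsx3

    engine = pyttsx3.init()
    engine.say(summary)
    engine.runAndWait()


if __name__ == "__main__":
    buf = io.StringIO()
    log_summary(buf, "Person: walking to the door",
                datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc))
    print(buf.getvalue().strip())  # 2024-01-01T12:00:00+00:00,Person: walking to the door
```

For a real log file, open it with `open(path, "a", newline="")` so the csv module controls line endings.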
FastCapture (`fast_capture.py`): an optimized module for capturing frames from RTSP.
Features:
- RAM disk at /dev/shm/streamware for fast writes

Performance:

| Before | After |
|---|---|
| ~4000ms/frame | 0ms (from buffer) |
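The ~0ms figure comes from reading an already captured frame out of a buffer instead of opening the stream for each frame. Below is a minimal sketch of that idea. It assumes a background thread that keeps only the newest frame. `LatestFrameBuffer` and the `read_frame` callable are hypothetical names. In practice `read_frame` might wrap `cv2.VideoCapture.read()`, which returns an `(ok, frame)` pair.

```python
import threading
import time
from typing import Any, Callable, Optional, Tuple


class LatestFrameBuffer:
    """Continuously pulls frames in a background thread and keeps only the newest one."""

    def __init__(self, read_frame: Callable[[], Tuple[bool, Any]]):
        # read_frame mimics cv2.VideoCapture.read(): it returns (ok, frame).
        self._read_frame = read_frame
        self._lock = threading.Lock()
        self._frame: Optional[Any] = None
        self._running = False
        self._thread: Optional[threading.Thread] = None

    def start(self) -> "LatestFrameBuffer":
        self._running = True
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()
        return self

    def _loop(self) -> None:
        while self._running:
            ok, frame = self._read_frame()
            if ok:
                with self._lock:
                    self._frame = frame  # overwrite: older frames are dropped
            else:
                time.sleep(0.01)  # brief back-off when the source returns nothing

    def latest(self) -> Optional[Any]:
        """Return the most recent frame immediately, without waiting on the stream."""
        with self._lock:
            return self._frame

    def stop(self) -> None:
        self._running = False
        if self._thread is not None:
            self._thread.join(timeout=1.0)
```

The consumer never blocks on network I/O. It simply takes whatever frame is newest when the analysis loop is ready.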
Smart detector (`smart_detector.py`): intelligent object detection with YOLO and a fallback to HOG.
Pipeline:
Frame → Motion Detection → YOLO Detection → [fallback] HOG → [optional] Small LLM
- Motion Detection: less than 0.5% change? → SKIP
- YOLO Detection: auto-installed → fast and accurate
- HOG: no YOLO? → use HOG
- Small LLM: not a vision model? → ASSUME PRESENT
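The motion gate and the detector fallback chain can be sketched as below. The sketch rests on these assumptions:

- Frames are equally sized grayscale images given as flat lists of 0-255 values.
- A pixel counts as changed when it differs by more than a hypothetical `pixel_threshold` of 25.
- Detectors are plain callables, with `None` standing in for one that is unavailable.

Only the 0.5% skip threshold comes from the pipeline above. Real code would work on NumPy/OpenCV arrays and call ultralytics' YOLO, falling back to OpenCV's HOG people detector.

```python
from typing import Callable, List, Optional, Sequence

# A detector takes a frame and returns labels; None marks a detector that is unavailable.
Detector = Callable[[Sequence[int]], List[str]]


def changed_percent(prev: Sequence[int], curr: Sequence[int], pixel_threshold: int = 25) -> float:
    """Share of pixels (in %) whose absolute difference exceeds pixel_threshold."""
    if len(prev) != len(curr) or not curr:
        raise ValueError("frames must be non-empty and of equal size")
    changed = sum(1 for a, b in zip(prev, curr) if abs(a - b) > pixel_threshold)
    return 100.0 * changed / len(curr)


def detect(prev: Sequence[int], curr: Sequence[int],
           detectors: Sequence[Optional[Detector]], min_change: float = 0.5) -> Optional[List[str]]:
    """None means the frame was skipped; otherwise labels from the first available detector."""
    if changed_percent(prev, curr) < min_change:
        return None  # SKIP: under 0.5% of pixels changed
    for detector in detectors:  # e.g. [yolo, hog]; yolo is None if ultralytics is missing
        if detector is not None:
            return detector(curr)
    return []  # no detector available at all
```

Skipping static frames before any detector runs is what keeps the expensive stages idle when nothing moves.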
YOLO Detection (NEW, enabled by default):
- ultralytics is installed on first use

Key optimizations:
Detector comparison:

| Detector | Time | Accuracy | Requires GPU |
|---|---|---|---|
| YOLO (yolov8n) | ~10ms | ★★★★★ | No (faster with GPU) |
| HOG (OpenCV) | ~100ms | ★★★ | No |
| Small LLM | ~500ms | ★★★★ | No |
Vision LLM (moondream): the main model for image analysis.
Model choice:

| Model | Time | Quality | RAM |
|---|---|---|---|
| moondream | ~1.5s | ★★★ | 2GB |
| llava:7b | ~2-3s | ★★★★ | 4GB |
| llava:13b | ~4-5s | ★★★★★ | 8GB |
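One way to send a frame to the selected vision model is Ollama's REST endpoint `POST /api/generate`. It accepts base64-encoded images in an `images` list and, with `"stream": false`, returns a single JSON object whose `response` field holds the text. The sketch assumes Ollama's default address `http://localhost:11434`. The helper names are hypothetical, and this is not necessarily how Streamware calls the model.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumption: default local Ollama address


def build_vision_request(image_bytes: bytes, prompt: str, model: str = "moondream") -> dict:
    """Payload for Ollama's /api/generate with one base64-encoded image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # one JSON object instead of streamed chunks
    }


def describe_frame(image_bytes: bytes, prompt: str, model: str = "moondream",
                   timeout: float = 30.0) -> str:
    """POST the frame to a local Ollama server and return the model's text answer."""
    payload = json.dumps(build_vision_request(image_bytes, prompt, model)).encode("utf-8")
    req = urllib.request.Request(OLLAMA_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Swapping `model="llava:7b"` trades latency for quality, as in the table above.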
Prompt optimizations:
Guarder LLM (gemma:2b): a filter and summarizer for text responses.
Functions:
NOTE: gemma:2b is NOT a vision model. Do not use it for image analysis!
Prompt for the guarder:
Summarize in max 8 words. Focus: person.
Input: [verbose LLM response]
Output format: "Person: [what they're doing]" or "No person visible"
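The guarder step can be sketched as filling the prompt above with the vision model's verbose answer, then post-processing the reply to enforce the 8-word limit. `build_guarder_prompt` and `normalize_summary` are hypothetical helpers. The sketch only builds and cleans text. Sending the prompt to gemma:2b would use a text-only request with no images.

```python
import re

GUARDER_TEMPLATE = (
    "Summarize in max 8 words. Focus: {focus}.\n"
    "Input: {verbose}\n"
    "Output format: \"Person: [what they're doing]\" or \"No person visible\""
)


def build_guarder_prompt(verbose: str, focus: str = "person") -> str:
    """Fill the guarder template with the vision model's verbose answer."""
    return GUARDER_TEMPLATE.format(focus=focus, verbose=verbose.strip())


def normalize_summary(reply: str, max_words: int = 8) -> str:
    """Strip quotes/whitespace from the guarder reply and cap it at max_words words."""
    text = reply.strip().strip('"').strip()
    if re.search(r"\bno person\b", text, re.IGNORECASE):
        return "No person visible"  # canonical form for the negative case
    return " ".join(text.split()[:max_words])
```

Capping the reply in code, not only in the prompt, guards against small models ignoring the length instruction.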
Object Tracker (`object_tracker.py`) (NEW): a module for tracking multiple objects across frames.
Architecture:
Motion Regions → Extract Detections → IoU Association → Track Objects
Tracked Object #1:
- ID: 1
- Position: (0.3, 0.5)
- Direction: moving_right
- State: tracked
- History: [...]
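The IoU association step can be sketched as a greedy matcher. The sketch assumes:

- Boxes are normalized `(x1, y1, x2, y2)` tuples.
- A match requires IoU ≥ 0.3 (a hypothetical threshold).
- A track unmatched for more than 5 consecutive frames becomes `gone`, mirroring the "max 5 frames" rule for lost objects.

`SimpleTracker` is an illustrative class, not the implementation in `object_tracker.py`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), normalized to 0..1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Track:
    track_id: int
    box: Box
    state: str = "new"      # new -> tracked -> lost -> gone
    missed: int = 0         # consecutive frames without a matching detection
    history: List[Box] = field(default_factory=list)


class SimpleTracker:
    """Greedy IoU association with persistent IDs (illustrative sketch)."""

    def __init__(self, iou_threshold: float = 0.3, max_lost: int = 5):
        self.iou_threshold = iou_threshold
        self.max_lost = max_lost
        self.tracks: Dict[int, Track] = {}
        self._next_id = 1

    def update(self, detections: List[Box]) -> Dict[int, Track]:
        active = [t for t in self.tracks.values() if t.state != "gone"]
        # Score every (track, detection) pair; match the highest IoU first.
        pairs = sorted(
            ((iou(t.box, d), t.track_id, i) for t in active for i, d in enumerate(detections)),
            reverse=True,
        )
        matched_tracks, matched_dets = set(), set()
        for score, tid, di in pairs:
            if score < self.iou_threshold:
                break  # all remaining pairs overlap too little
            if tid in matched_tracks or di in matched_dets:
                continue
            track = self.tracks[tid]
            track.history.append(track.box)
            track.box, track.state, track.missed = detections[di], "tracked", 0
            matched_tracks.add(tid)
            matched_dets.add(di)
        # Unmatched tracks become "lost", then "gone" after more than max_lost misses.
        for track in active:
            if track.track_id not in matched_tracks:
                track.missed += 1
                track.state = "gone" if track.missed > self.max_lost else "lost"
        # Unmatched detections start new tracks with fresh IDs.
        for i, d in enumerate(detections):
            if i not in matched_dets:
                self.tracks[self._next_id] = Track(self._next_id, d)
                self._next_id += 1
        return self.tracks
```

Greedy matching is simpler than optimal (Hungarian) assignment. It is usually adequate when only a few objects are in view.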
Features:
Movement directions:

| Direction | Description |
|---|---|
| entering | Object appeared in the frame |
| exiting | Object is leaving the frame |
| moving_left | Moving left |
| moving_right | Moving right |
| approaching | Approaching the camera |
| leaving | Moving away from the camera |
| stationary | No movement |
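One way to derive these labels from two consecutive normalized boxes is sketched below:

- A first sighting is `entering`.
- A box that newly touches the frame border is `exiting`.
- Box growth or shrinkage is read as `approaching` or `leaving`.
- Horizontal center shift gives `moving_left` or `moving_right`.

The rules and thresholds (`min_shift=0.02`, `area_ratio=1.15`, `edge=0.02`) are assumptions for illustration, not the tracker's actual logic.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), normalized to 0..1


def _area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def _touches_edge(b: Box, edge: float) -> bool:
    return b[0] <= edge or b[1] <= edge or b[2] >= 1 - edge or b[3] >= 1 - edge


def classify_direction(prev: Optional[Box], curr: Box, min_shift: float = 0.02,
                       area_ratio: float = 1.15, edge: float = 0.02) -> str:
    if prev is None:
        return "entering"  # first sighting of this object
    if _touches_edge(curr, edge) and not _touches_edge(prev, edge):
        return "exiting"  # just reached the frame border
    ratio = _area(curr) / _area(prev) if _area(prev) > 0 else 1.0
    if ratio >= area_ratio:
        return "approaching"  # box grew: closer to the camera
    if ratio <= 1 / area_ratio:
        return "leaving"  # box shrank: farther from the camera
    dx = (curr[0] + curr[2]) / 2 - (prev[0] + prev[2]) / 2  # horizontal center shift
    if dx >= min_shift:
        return "moving_right"
    if dx <= -min_shift:
        return "moving_left"
    return "stationary"
```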
Object states:

| State | Description |
|---|---|
| new | Just appeared |
| tracked | Actively tracked |
| lost | Temporarily lost (max 5 frames) |
| gone | Left the frame |
Example output:
2 objects tracked. #1: Person moving right in center_middle. #2: Person stationary in left_bottom. Person #3 left.
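The example line above can be reproduced in two steps. First, map each object's normalized center onto a 3×3 grid of zone names (`left/center/right` × `top/middle/bottom`). Then join one phrase per object. The grid boundaries at 1/3 and 2/3, the `TrackInfo` tuple and the phrasing rules are assumptions inferred from the example, not the module's actual code.

```python
from typing import List, NamedTuple, Tuple


class TrackInfo(NamedTuple):
    track_id: int
    label: str                      # e.g. "Person"
    center: Tuple[float, float]     # normalized (x, y); y grows downward
    direction: str                  # e.g. "moving_right", "stationary"


def zone(x: float, y: float) -> str:
    """Map a normalized point to a 3x3 grid name such as 'center_middle'."""
    cols = ("left", "center", "right")
    rows = ("top", "middle", "bottom")
    return f"{cols[min(int(x * 3), 2)]}_{rows[min(int(y * 3), 2)]}"


def describe(tracks: List[TrackInfo], gone: List[TrackInfo]) -> str:
    """One sentence for the count, one per visible object, one per departed object."""
    n = len(tracks)
    parts = [f"{n} object{'s' if n != 1 else ''} tracked."]
    for t in tracks:
        motion = t.direction.replace("_", " ")
        parts.append(f"#{t.track_id}: {t.label} {motion} in {zone(*t.center)}.")
    for t in gone:
        parts.append(f"{t.label} #{t.track_id} left.")
    return " ".join(parts)
```

For example, `describe([TrackInfo(1, "Person", (0.5, 0.5), "moving_right"), TrackInfo(2, "Person", (0.2, 0.8), "stationary")], [TrackInfo(3, "Person", (0.9, 0.5), "exiting")])` yields exactly the sample line shown above.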
DescriptionCache (RAM):
Frame Cache (RAM disk /dev/shm/streamware):
/dev/shm/streamware
moondream
gemma:2b
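The DescriptionCache keeps LLM descriptions in memory. Below is a minimal sketch of such a cache. It assumes descriptions are keyed by a SHA-256 hash of the frame bytes and evicted least-recently-used first. `LRUDescriptionCache` and its interface are hypothetical.

```python
import hashlib
from collections import OrderedDict
from typing import Optional


class LRUDescriptionCache:
    """Maps frame content to an LLM description; evicts the least recently used entry."""

    def __init__(self, max_items: int = 256):
        self.max_items = max_items
        self._items: "OrderedDict[str, str]" = OrderedDict()

    @staticmethod
    def key(frame_bytes: bytes) -> str:
        return hashlib.sha256(frame_bytes).hexdigest()

    def get(self, frame_bytes: bytes) -> Optional[str]:
        k = self.key(frame_bytes)
        if k not in self._items:
            return None
        self._items.move_to_end(k)  # mark as recently used
        return self._items[k]

    def put(self, frame_bytes: bytes, description: str) -> None:
        k = self.key(frame_bytes)
        self._items[k] = description
        self._items.move_to_end(k)
        if len(self._items) > self.max_items:
            self._items.popitem(last=False)  # drop the least recently used entry

    def clear(self) -> None:
        self._items.clear()
```

An exact content hash only helps when the identical frame is analyzed twice, for example the same buffered frame. Matching near-duplicate frames would need a perceptual hash instead.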
animal_detector.py)
# Installed automatically on first run
ollama pull moondream # Vision model (~1.7GB)
ollama pull gemma:2b # Guarder model (~1.7GB)
# Models
SQ_MODEL=moondream
SQ_GUARDER_MODEL=gemma:2b
# Stream
SQ_STREAM_MODE=track
SQ_STREAM_FOCUS=person
SQ_STREAM_INTERVAL=3
# Optimizations
SQ_FAST_CAPTURE=true
SQ_RAMDISK_ENABLED=true
SQ_RAMDISK_PATH=/dev/shm/streamware
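A sketch of reading these variables in Python follows. It falls back to the values shown above when a variable is unset. The `NarratorConfig` dataclass, the assumption that `SQ_STREAM_INTERVAL` is in seconds, and the rule that `true`/`1`/`yes`/`on` count as enabled are all assumptions. Streamware may parse these variables differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _flag(value: str) -> bool:
    """Interpret common truthy strings as enabled (assumed convention)."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class NarratorConfig:
    model: str
    guarder_model: str
    mode: str
    focus: str
    interval_s: float          # assumed to be seconds
    fast_capture: bool
    ramdisk_enabled: bool
    ramdisk_path: str

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "NarratorConfig":
        env = os.environ if env is None else env
        return cls(
            model=env.get("SQ_MODEL", "moondream"),
            guarder_model=env.get("SQ_GUARDER_MODEL", "gemma:2b"),
            mode=env.get("SQ_STREAM_MODE", "track"),
            focus=env.get("SQ_STREAM_FOCUS", "person"),
            interval_s=float(env.get("SQ_STREAM_INTERVAL", "3")),
            fast_capture=_flag(env.get("SQ_FAST_CAPTURE", "true")),
            ramdisk_enabled=_flag(env.get("SQ_RAMDISK_ENABLED", "true")),
            ramdisk_path=env.get("SQ_RAMDISK_PATH", "/dev/shm/streamware"),
        )
```

Passing an explicit mapping to `from_env` keeps the parsing testable without touching the process environment.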
sq live narrator --url "rtsp://user:pass@ip:554/stream" --mode track --focus person --tts
sq live narrator --url "rtsp://..." --mode track --focus person --tts --verbose
sq live narrator --url "rtsp://..." --file report.html --frames-dir ./frames
capture: ~0ms (FastCapture buffer)
smart_detect: ~300ms (HOG + motion)
vision_llm: ~1500ms (moondream)
guarder_llm: ~250ms (gemma:2b)
─────────────────────────────────
Total: ~2s/frame
Throughput: ~0.5 FPS
| Stage | Before | After | Improvement |
|---|---|---|---|
| capture | 4000ms | 0ms | 100% |
| vision_llm | 4000ms | 1500ms | 62% |
| guarder_llm | 2700ms | 250ms | 91% |
| Total | 10s | 2s | 80% |
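The totals follow from the per-stage latencies. The stages sum to about 2 s per frame, which gives roughly 0.5 frames per second. Each improvement is (before - after) / before. A quick check with the numbers from the breakdown and table above:

```python
stages_ms = {"capture": 0, "smart_detect": 300, "vision_llm": 1500, "guarder_llm": 250}
total_ms = sum(stages_ms.values())
fps = 1000 / total_ms
print(total_ms, round(fps, 2))  # 2050 0.49

before_after_ms = {
    "capture": (4000, 0),
    "vision_llm": (4000, 1500),
    "guarder_llm": (2700, 250),
    "total": (10000, 2000),
}
for stage, (before, after) in before_after_ms.items():
    print(stage, f"{100 * (before - after) / before:.1f}%")
# capture 100.0%, vision_llm 62.5%, guarder_llm 90.7%, total 80.0%
```

The table rounds 62.5% and 90.7% to 62% and 91%.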
llm_no_person despite a person in the image
Cause: the guarder model (gemma:2b) is not a vision model.
Solution: update to the latest version (fixed automatically).
[Action]. [Direction/Position]
Cause: a stale cache containing examples from the prompt.
Solution: rm -f /dev/shm/streamware/*.jpg
Cause: fallback to an FFmpeg subprocess.
Solution: check that FastCapture is enabled: SQ_FAST_CAPTURE=true