Streamware includes AI-powered multimedia analysis for video, images, and audio. Related resources:
| Resource | Description |
|---|---|
| Examples: Media Processing | Working code examples |
| video_captioning.py | Video analysis demo |
| video_modes_demo.py | Compare all modes |
| Quick CLI Reference | All sq commands |
| Source: media.py | Implementation |
# Install LLaVA for vision (video/images)
ollama pull llava
# Install Whisper for audio transcription
pip install openai-whisper
# Install Bark for text-to-speech
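# (the PyPI package named "bark" is a different project, not Suno's Bark)
pip install git+https://github.com/suno-ai/bark.git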
# Describe video
sq media describe_video --file video.mp4 --model llava
# Describe image
sq media describe_image --file photo.jpg --model llava
# Transcribe audio
sq media transcribe --file audio.mp3
# Text to speech
sq media speak --text "Hello World" --output hello.wav
# Auto-detect and caption
sq media caption --file media_file.mp4
Streamware offers three modes for video analysis. See examples/media-processing/ for working code.
| Mode | Description | Best For |
|---|---|---|
| full | Coherent narrative (default) | Summaries, SEO, accessibility |
| stream | Frame-by-frame details | Documentation, training data |
| diff | Track changes between frames | Surveillance, activity tracking |
full (default): Creates a coherent narrative that tracks subjects through the video.
sq media describe_video --file video.mp4 --mode full
# Output:
{
  "mode": "full",
  "description": "The video shows a presenter explaining...",
  "num_frames": 8,
  "scenes": 8,
  "duration": "2:34"
}
stream: Detailed frame-by-frame analysis with subjects, objects, and actions.
sq media describe_video --file video.mp4 --mode stream
# Output:
{
  "mode": "stream",
  "frames": [
    {"frame": 1, "timestamp": "0:00", "description": "SUBJECTS: Person... OBJECTS: ..."},
    {"frame": 2, "timestamp": "0:15", "description": "SUBJECTS: ..."}
  ]
}
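The `stream` output can be turned into a small searchable index. The following is a minimal sketch, assuming `sq` prints exactly the JSON shape shown above and that timestamps use `M:SS` or `H:MM:SS`; the helper names `to_seconds` and `find_frames` are illustrative and not part of Streamware.

```python
import json


def to_seconds(timestamp: str) -> int:
    """Convert "M:SS" or "H:MM:SS" into a number of seconds."""
    seconds = 0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def find_frames(result: dict, keyword: str) -> list[tuple[int, str]]:
    """Return (seconds, description) for frames whose description mentions keyword."""
    needle = keyword.lower()
    return [
        (to_seconds(frame["timestamp"]), frame["description"])
        for frame in result.get("frames", [])
        if needle in frame["description"].lower()
    ]


# Sample shaped like the stream output shown above (assumed, not captured from a real run).
sample = json.loads("""
{
  "mode": "stream",
  "frames": [
    {"frame": 1, "timestamp": "0:00", "description": "SUBJECTS: Person... OBJECTS: ..."},
    {"frame": 2, "timestamp": "0:15", "description": "SUBJECTS: ..."}
  ]
}
""")

print(find_frames(sample, "person"))
```

Converting timestamps to seconds makes it easy to seek to a matching frame in a player or to sort hits across several videos.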
diff: Tracks changes between frames: what appeared, moved, or disappeared.
sq media describe_video --file video.mp4 --mode diff
# Output:
{
  "mode": "diff",
  "timeline": [
    {"frame": 1, "type": "start", "description": "Empty room..."},
    {"frame": 2, "type": "change", "changes": "NEW: Person entered..."}
  ],
  "summary": "Person enters room, sits at desk...",
  "significant_changes": 5
}
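Because each mode returns a different JSON shape, downstream code has to branch on the `mode` field. The sketch below shows one way to do that, assuming the three output shapes shown above are complete and stable; `main_text` and `change_events` are hypothetical helper names, not Streamware APIs.

```python
import json


def main_text(result: dict) -> str:
    """Pick the most summary-like text from a describe_video result."""
    mode = result.get("mode")
    if mode == "full":
        return result["description"]
    if mode == "stream":
        # One line per frame, prefixed with its timestamp.
        return "\n".join(
            f'[{frame["timestamp"]}] {frame["description"]}'
            for frame in result["frames"]
        )
    if mode == "diff":
        return result["summary"]
    raise ValueError(f"unknown mode: {mode!r}")


def change_events(result: dict) -> list[str]:
    """Return the "changes" text of every timeline entry of type "change"."""
    return [
        entry["changes"]
        for entry in result.get("timeline", [])
        if entry.get("type") == "change"
    ]


# Sample shaped like the diff output shown above (assumed, not captured from a real run).
diff_sample = json.loads("""
{
  "mode": "diff",
  "timeline": [
    {"frame": 1, "type": "start", "description": "Empty room..."},
    {"frame": 2, "type": "change", "changes": "NEW: Person entered..."}
  ],
  "summary": "Person enters room, sits at desk...",
  "significant_changes": 5
}
""")

print(main_text(diff_sample))
print(change_events(diff_sample))
```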
📚 Full documentation: examples/media-processing/README.md
#!/bin/bash
# Monitor camera with AI
while true; do
    # Capture frame
    ffmpeg -i rtsp://camera/stream -vframes 1 frame.jpg -y
    # Analyze
    desc=$(sq media describe_image --file frame.jpg | jq -r '.description')
    # Alert on person detection (-q keeps grep from printing the match)
    if echo "$desc" | grep -qi "person"; then
        sq slack security --message "⚠️ Person detected: $desc"
    fi
    sleep 5
done
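The same monitoring loop can be written in Python, which makes the match stricter: a plain substring search for "person" also fires on words such as "personal". This is a rough sketch, assuming `describe_image` prints JSON with a `description` field (as the `jq` call above implies) and that `ffmpeg` and `sq` are on `PATH`; the function names are illustrative.

```python
import json
import re
import subprocess
import time

# Whole-word match, so "personal" or "impersonate" do not trigger an alert.
PERSON = re.compile(r"\b(person|people)\b", re.IGNORECASE)


def mentions_person(description: str) -> bool:
    """Return True if the description contains "person" or "people" as a word."""
    return PERSON.search(description) is not None


def describe_frame(path: str) -> str:
    """Run the describe_image command shown above and return its description."""
    completed = subprocess.run(
        ["sq", "media", "describe_image", "--file", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(completed.stdout).get("description", "")


def monitor(stream_url: str, interval: float = 5.0) -> None:
    """Grab one frame per interval and send a Slack alert when a person is described."""
    while True:
        subprocess.run(
            ["ffmpeg", "-y", "-i", stream_url, "-vframes", "1", "frame.jpg"],
            capture_output=True, check=True,
        )
        description = describe_frame("frame.jpg")
        if mentions_person(description):
            subprocess.run(
                ["sq", "slack", "security", "--message",
                 f"Person detected: {description}"],
                check=True,
            )
        time.sleep(interval)


# Usage (runs until interrupted):
# monitor("rtsp://camera/stream")
```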
# Describe image
sq media describe_image --file photo.jpg
# Custom prompt
sq media describe_image --file artwork.jpg \
--prompt "Describe the artistic style and techniques used"
# Check if image is appropriate
desc=$(sq media describe_image --file upload.jpg | jq -r '.description')
result=$(echo "$desc" | sq llm "is this appropriate?" --analyze)
# -w matches "no" only as a whole word (not "not" or "know"); -q suppresses output
if echo "$result" | grep -qiw "no"; then
    echo "Content flagged"
    mv upload.jpg quarantine/
fi
# Basic transcription
sq media transcribe --file audio.mp3
# Save to file
sq media transcribe --file interview.mp3 --output transcript.txt
# Specific language
sq media transcribe --file spanish.mp3 --language es
#!/bin/bash
# Complete podcast workflow
# Download
sq get https://podcast.com/episode.mp3 --save episode.mp3
# Transcribe
sq media transcribe --file episode.mp3 --output transcript.txt
# Summarize
cat transcript.txt | sq llm "summarize key points" > summary.txt
# Generate blog post
sq llm "write blog post from: $(cat transcript.txt)" > blog.md
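A long transcript may not fit into a single command-line argument or a single model request. One option is to split it on word boundaries and summarize each piece through stdin, as the `cat transcript.txt | sq llm ...` line above does. This is only a sketch: the 4000-character limit is an arbitrary assumption, and `chunk_text` and `summarize_chunks` are hypothetical helpers, not part of Streamware.

```python
import subprocess


def chunk_text(text: str, max_chars: int = 4000) -> list[str]:
    """Greedily pack whitespace-separated words into chunks of at most max_chars.

    A single word longer than max_chars is kept whole in its own chunk.
    """
    chunks: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def summarize_chunks(chunks: list[str]) -> list[str]:
    """Pipe each chunk to `sq llm "summarize key points"` and collect the replies."""
    summaries = []
    for chunk in chunks:
        completed = subprocess.run(
            ["sq", "llm", "summarize key points"],
            input=chunk, capture_output=True, text=True, check=True,
        )
        summaries.append(completed.stdout.strip())
    return summaries


print(chunk_text("aa bb cc dd", max_chars=5))
```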
# Generate speech
sq media speak --text "Hello, welcome to our service" --output welcome.wav
# Long text
cat announcement.txt | sq media speak --output announcement.wav
# Analyze music properties
sq media analyze_music --file song.mp3
# Output:
{
  "tempo": 120.5,
  "duration": 180.0,
  "sample_rate": 44100
}
# Describe music mood
sq media analyze_music --file song.mp3 | \
sq llm "describe the mood and style" --analyze
# Create Flask service
cat > media_api.py << 'EOF'
from flask import Flask, request, jsonify
import subprocess
import json

app = Flask(__name__)

@app.route('/analyze/video', methods=['POST'])
def analyze_video():
    file = request.files['video']
    file.save('temp.mp4')
    result = subprocess.run(
        ['sq', 'media', 'describe_video', '--file', 'temp.mp4'],
        capture_output=True, text=True
    )
    return result.stdout

@app.route('/transcribe', methods=['POST'])
def transcribe():
    file = request.files['audio']
    file.save('temp.mp3')
    result = subprocess.run(
        ['sq', 'media', 'transcribe', '--file', 'temp.mp3'],
        capture_output=True, text=True
    )
    return result.stdout

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
EOF
# Install as service (no Docker/systemd needed!)
sq service install --name media-api --command "python media_api.py"
# Start
sq service start --name media-api
# Check status
sq service status --name media-api
# Use API
curl -X POST -F "video=@video.mp4" http://localhost:8080/analyze/video
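The handlers in `media_api.py` save every upload to the same `temp.mp4` or `temp.mp3`, so two concurrent requests can overwrite each other, and a failed `sq` run still returns an empty 200 response. The helper below sketches one way around both problems using only the standard library. `run_on_upload` is a hypothetical name, and the demo uses the Python interpreter as a stand-in for `sq` so it runs anywhere.

```python
import os
import subprocess
import sys
import tempfile


def run_on_upload(data: bytes, suffix: str, command: list[str]) -> tuple[int, str]:
    """Write data to a unique temp file, run command + [path], return (status, body)."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        completed = subprocess.run(command + [path], capture_output=True, text=True)
        if completed.returncode != 0:
            # Surface the failure instead of returning an empty success response.
            return 500, completed.stderr.strip() or "command failed"
        return 200, completed.stdout
    finally:
        os.remove(path)  # always clean up the temp file


# Demo: a stand-in command that prints the uploaded file's content.
echo = [sys.executable, "-c", "import sys; print(open(sys.argv[1]).read())"]
print(run_on_upload(b"hello", ".txt", echo))
```

Inside a Flask handler this could be called as `run_on_upload(request.files['video'].read(), '.mp4', ['sq', 'media', 'describe_video', '--file'])`, returning the body together with the status code.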
# Start service
sq service start --name media-api
# Stop service
sq service stop --name media-api
# Restart
sq service restart --name media-api
# Status
sq service status --name media-api
# List all services
sq service list
# Uninstall
sq service uninstall --name media-api
#!/bin/bash
# Complete video analysis
VIDEO="lecture.mp4"
# Visual description
visual=$(sq media describe_video --file "$VIDEO" | jq -r '.description')
# Audio transcription
audio=$(sq media transcribe --file "$VIDEO" | jq -r '.text')
# Combined summary
cat << EOF | sq llm "create comprehensive summary"
Visual Content: $visual
Spoken Content: $audio
EOF
# Transcribe multiple languages
# detect_language is not defined here; supply your own helper that prints a language code
for file in uploads/*.mp3; do
    lang=$(detect_language "$file")
    sq media transcribe --file "$file" --language "$lang" \
        --output "${file%.mp3}.txt"
done
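The loop above relies on a `detect_language` helper that is not defined in this document. One possible implementation uses the language-identification step from the `openai-whisper` Python package. This is a sketch under that assumption, not Streamware's own method; `transcript_path` and `transcribe_all` are illustrative names, and the `base` model size is an arbitrary choice.

```python
import subprocess
from pathlib import Path


def transcript_path(audio_path: str) -> Path:
    """Mirror the shell expansion ${file%.mp3}.txt: same name, .txt suffix."""
    return Path(audio_path).with_suffix(".txt")


def detect_language(model, audio_path: str) -> str:
    """Return Whisper's most probable language code for the first 30 s of audio."""
    import whisper  # imported lazily so the pure helpers work without it installed

    audio = whisper.pad_or_trim(whisper.load_audio(audio_path))
    mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)
    _, probs = model.detect_language(mel)
    return max(probs, key=probs.get)


def transcribe_all(directory: str = "uploads") -> None:
    """Detect each file's language, then call the transcribe command shown above."""
    import whisper

    model = whisper.load_model("base")  # load once and reuse for every file
    for audio in sorted(Path(directory).glob("*.mp3")):
        lang = detect_language(model, str(audio))
        subprocess.run(
            ["sq", "media", "transcribe", "--file", str(audio),
             "--language", lang, "--output", str(transcript_path(str(audio)))],
            check=True,
        )


print(transcript_path("uploads/talk.mp3"))
```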
# Complete content processing pipeline
sq media describe_video --file raw.mp4 | \
sq llm "generate social media posts" | \
sq post https://api.social.com/posts
| Model | Type | Provider | Use Case |
|---|---|---|---|
| llava | Vision-Language | Ollama | Video/Image description |
| whisper | Speech Recognition | OpenAI | Audio transcription |
| bark | TTS | Suno | Text-to-speech |
| musicgen | Music Gen | | Music generation |
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Pull LLaVA model
ollama pull llava
pip install openai-whisper
pip install git+https://github.com/suno-ai/bark.git
Monitor cameras with AI alerts
Automatically flag inappropriate content
Convert audio to searchable text
Generate captions and audio descriptions
AI-powered summaries and social posts
Analyze and categorize music libraries
AI-Powered Media Analysis with Streamware! 🎬🤖✨