Prerequisites
You only need to do this step once. scene-studio has more external-tool dependencies than any other base, so this step is heavier than the others. You only install the tools for the track you will use, so deciding on a track first lightens the load.
Which track will you use
Decide your track first. The tools to install differ.
- Generative only — ffmpeg, OPENAI_API_KEY, codex CLI
- Remix only — ffmpeg, OPENAI_API_KEY, yt-dlp, whisper-cli (optional)
- Both — all of the above
If you can't decide, install the tools for both. If you start with a single track, you can install the other track's tools later, when you begin that track.
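Once you have picked a track, a quick check can show which of its tools are still missing. The following is a minimal sketch, not part of scene-studio itself: the `TRACK` variable and the `missing_tools` helper are names made up for this example, and the tool lists are copied from the track list above.

```shell
#!/usr/bin/env bash
# Print every named command that is not found on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# TRACK is a hypothetical variable for this sketch: generative, remix, or both (default).
# whisper-cli is optional, so it is not treated as required here.
case "${TRACK:-both}" in
  generative) set -- ffmpeg codex ;;
  remix)      set -- ffmpeg yt-dlp ;;
  *)          set -- ffmpeg codex yt-dlp ;;
esac

missing="$(missing_tools "$@")"
if [ -n "$missing" ]; then
  printf 'Still missing:\n%s\n' "$missing"
else
  echo "All required tools found for track: ${TRACK:-both}"
fi

# Both tracks need the API key in the environment.
[ -n "${OPENAI_API_KEY:-}" ] || echo "OPENAI_API_KEY is not set"
```

Saved under a name of your choice, for example check-tools.sh, it could be run as `TRACK=remix bash check-tools.sh`, both before and after working through the steps on this page.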
OS-by-OS differences
| Item | macOS | Windows | Linux |
|---|---|---|---|
| Terminal you use | Terminal.app or iTerm2 | Ubuntu on WSL2 recommended | the shell your distro ships |
| Verified state | verified | install WSL2 first, then run every command inside it | verified |
On Windows, finish the WSL2 install guide first, then come back here.
1) ffmpeg — both tracks
The core tool for video composition. Needed on both tracks.

    # macOS
    brew install ffmpeg

    # Linux and WSL2
    sudo apt update && sudo apt install -y ffmpeg

Verify

    ffmpeg -version   # a line like "ffmpeg version ..." means it's fine

2) OPENAI_API_KEY — both tracks
Used for Codex image generation and the Whisper API. Prefer exporting it from a shell startup file (~/.zshrc, etc.) over putting it in .env. If you do keep it in .env, make sure that file is excluded from git tracking.

    export OPENAI_API_KEY="sk-..."

Put this line in ~/.zshrc (or your shell's equivalent startup file) and open a new terminal; it is picked up automatically from then on.
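If you go the .env route, the "excluded from git tracking" part can be handled with standard git commands. The sketch below assumes it is run from the root of the repository and that the file is named exactly .env; the `untrack_env` function name is made up for this example.

```shell
# Stop tracking .env in the current git repository without deleting the file.
untrack_env() {
  # Add .env to .gitignore only if an exact ".env" line is not already there.
  grep -qxF '.env' .gitignore 2>/dev/null || printf '.env\n' >> .gitignore

  # If .env was added to git earlier, drop it from the index; the file stays on disk.
  if git ls-files --error-unmatch .env >/dev/null 2>&1; then
    git rm --cached --quiet .env
  fi

  # Succeeds only when git now ignores .env.
  git check-ignore -q .env
}
```

After calling `untrack_env`, commit the updated .gitignore. Note that this does not erase a key from earlier commits; a key that was already pushed should be treated as exposed and rotated.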
3) Generative track — codex CLI
The generative track's image generation needs the OpenAI Codex CLI. Without it, you cannot start the generative track. Follow the official OpenAI Codex CLI docs to install it.

    codex --version   # a version number means it's fine

REPLICATE_API_TOKEN and FAL_KEY are optional. They enable the fast image backend used in preview mode; without them, preview uses only codex, which is a bit slower.
4) Remix track — yt-dlp and whisper
The remix track downloads and transcribes a source video.

    # yt-dlp (source download)
    brew install yt-dlp        # macOS
    # Linux: pip install yt-dlp or your package manager

    # whisper-cli (optional, local transcription)
    brew install whisper-cpp   # macOS, fast on M-series

Without whisper-cli, it falls back to the Whisper API via OPENAI_API_KEY. To use a local model, fetch the model file once.

    mkdir -p ~/.whisper/models
    curl -L -o ~/.whisper/models/ggml-base.en.bin \
      https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin

5) Claude Code
The official install guide is at docs.claude.com/en/docs/claude-code/quickstart.

    claude --version   # a version number means you're ready

You can use the OpenAI Codex CLI as the entry point instead of Claude Code. In that case, the AGENTS.md file at the repository root is the entry point.
6) Fonts are optional
Subtitles use Pretendard for Korean and Inter for English. If they are not installed, rendering falls back to sans-serif automatically, so they are not strictly required. For clean subtitles, install both fonts on your system.
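To see whether the two fonts are already present, one option is to query fontconfig. This is a rough sketch that assumes `fc-list` is available (it usually is on Linux and WSL2, while on macOS it typically has to be installed separately); the `has_font` helper is a name made up for this example.

```shell
# Read font family names on stdin; succeed if one equals the name in $1 (case-insensitive).
has_font() {
  # fc-list can print several comma-separated families per line, so split them first.
  tr ',' '\n' | grep -qixF -- "$1"
}

if command -v fc-list >/dev/null 2>&1; then
  for font in Pretendard Inter; do
    if fc-list : family | has_font "$font"; then
      echo "$font: installed"
    else
      echo "$font: not found, subtitles will fall back to sans-serif"
    fi
  done
else
  echo "fc-list not available; check your system font manager instead"
fi
```

The match is on the exact family name, so a family such as "Inter Tight" does not count as Inter.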
Once the tools for your track are in place, go to clone-and-install.