
First generative video

The generative track is where you build original video. It runs from plan to scenes, images, and video, and each stage stops at an approval gate. If you are using the remix track instead, go to first-remix-video.

Describe the video you want to make in natural language.

make a POV short of a robot character wrecking its morning routine

Signals such as “original,” “character,” “new video,” and “POV” route the request to the generative track (scene-flow-orchestrator). Attaching a URL routes it to remix, so for a generative video state only the requirement, with no URL.
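The routing rule above can be sketched as follows. This is only an illustration: the function name `route_request`, the `"unknown"` fallback, and the choice to let a URL take priority over keyword signals are assumptions, and the signal list contains just the four examples quoted in the text.

```python
import re

# Signal words quoted in the text; the real orchestrator may recognize more.
GENERATIVE_SIGNALS = ("original", "character", "new video", "pov")
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)


def route_request(request: str) -> str:
    """Return "remix", "generative", or "unknown" for a natural-language request."""
    # Assumption: an attached URL wins over any keyword signal.
    if URL_PATTERN.search(request):
        return "remix"
    lowered = request.lower()
    if any(signal in lowered for signal in GENERATIVE_SIGNALS):
        return "generative"
    return "unknown"


print(route_request("make a POV short of a robot character wrecking its morning routine"))
```

A request that contains both a URL and a signal word lands in remix under this sketch, which is why the text advises leaving the URL out.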

The orchestrator writes the plan. It picks one of five structures (problem_twist_solution, listicle, story_arc, curiosity_gap, transformation) and fills that structure’s slots. This step works from a validated skeleton rather than a free-form scenario. The result is saved to plan.md.
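As a rough illustration of what "a validated skeleton" could mean, the sketch below checks that a plan names one of the five structures and that no slot is left empty. The dictionary shape (`structure`, `slots`) and the slot names in the example are hypothetical; the text does not document the actual slot names or the layout of plan.md.

```python
# The five structure names come from the text; everything else is assumed.
STRUCTURES = (
    "problem_twist_solution",
    "listicle",
    "story_arc",
    "curiosity_gap",
    "transformation",
)


def validate_plan(plan: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the plan passes."""
    problems = []
    if plan.get("structure") not in STRUCTURES:
        problems.append(f"unknown structure: {plan.get('structure')!r}")
    slots = plan.get("slots", {})
    if not slots:
        problems.append("no slots filled")
    for name, text in slots.items():
        if not str(text).strip():
            problems.append(f"empty slot: {name}")
    return problems


example_plan = {
    "structure": "problem_twist_solution",
    # Hypothetical slot names, chosen only to mirror the structure name.
    "slots": {"problem": "robot oversleeps", "twist": "", "solution": "gives up, naps"},
}
print(validate_plan(example_plan))
```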

The plan is split into scenes. In scenes.json, each scene is tagged as one of hook, beat, or cta. The type determines the image tone, subtitle style, and cut length. When there is a recurring character, a character sheet is made first so that the same face is kept across every scene.
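A minimal sketch of checking scene types, assuming scenes.json is a top-level JSON array of objects with a `type` field. The field names `id`, `type`, and `text` are illustrative assumptions; only the three type values come from the text.

```python
import json

SCENE_TYPES = {"hook", "beat", "cta"}  # the three types named in the text


def load_scenes(raw_json: str) -> list[dict]:
    """Parse a scenes document and reject any scene with an unknown type."""
    scenes = json.loads(raw_json)
    for index, scene in enumerate(scenes, start=1):
        scene_type = scene.get("type")
        if scene_type not in SCENE_TYPES:
            raise ValueError(f"scene {index}: unknown type {scene_type!r}")
    return scenes


# Hypothetical scenes.json content for the robot example.
sample = """
[
  {"id": 1, "type": "hook", "text": "Alarm rings. Robot punches the wall."},
  {"id": 2, "type": "beat", "text": "Toaster launches bread into orbit."},
  {"id": 3, "type": "cta",  "text": "Follow for tomorrow's disaster."}
]
"""
scenes = load_scenes(sample)
print([scene["type"] for scene in scenes])
```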

Codex generates each scene’s image. In preview mode it quickly produces two candidates per scene. Pick the candidate you like, or say “redo the image for this scene” to regenerate just that scene.

ffmpeg composites the scene images, subtitles, and BGM in one pass into a video draft (draft_v1.mp4). At this point you watch the actual video and either approve it or roll back to any stage. “Redo from planning,” “just scene 3,” and “redo the image” all work.
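For readers curious what a one-pass composite can look like, here is a sketch that only builds an ffmpeg argument list and does not run it. The filter graph (looped stills joined with `concat`, then the `subtitles` filter) is one plausible approach, not necessarily what this pipeline does. The file names, the fixed per-scene duration, and the function name are assumptions. It also assumes all images share one resolution and that the ffmpeg build includes libass for the `subtitles` filter.

```python
def build_draft_command(
    images: list[str],
    seconds_per_scene: float,
    subtitles: str,
    bgm: str,
    output: str = "draft_v1.mp4",
) -> list[str]:
    """Build an ffmpeg command that joins stills, burns subtitles, and adds BGM."""
    if not images:
        raise ValueError("at least one scene image is required")
    cmd = ["ffmpeg", "-y"]
    for image in images:
        # Each still becomes a short looped video input of fixed duration.
        cmd += ["-loop", "1", "-t", str(seconds_per_scene), "-i", image]
    cmd += ["-i", bgm]  # the audio input comes after all image inputs
    video_pads = "".join(f"[{i}:v]" for i in range(len(images)))
    graph = (
        f"{video_pads}concat=n={len(images)}:v=1:a=0[joined];"
        f"[joined]subtitles={subtitles}[out]"
    )
    cmd += [
        "-filter_complex", graph,
        "-map", "[out]",
        "-map", f"{len(images)}:a",  # index of the BGM input
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        "-shortest",  # stop when the shorter of video and BGM ends
        output,
    ]
    return cmd


cmd = build_draft_command(["scene1.png", "scene2.png"], 3, "subs.srt", "bgm.mp3")
print(" ".join(cmd))
```

To actually render, the list could be passed to `subprocess.run(cmd, check=True)` on a machine with ffmpeg installed.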

fast-preview: through to the video without waiting on a human


fast-preview is the default in preview mode. Instead of stopping for approval at each planning gate (A, B-1, B-2), it auto-passes them and runs through to a video draft in one go. A human reviews the actual video, not intermediate JSON, once at Gate C.

What you normally see: one-line logs such as “Gate A auto-passed (fast-preview)” scroll by, and shortly afterwards draft_v1.mp4 is produced. The gates have not disappeared; all outputs remain in _workspace/, and at Gate C you can roll back to any stage.
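The gate behavior described above can be sketched as a small loop. The gate names and the auto-pass log line come from the text; the `approve` callback, the other log strings, and the function name are hypothetical.

```python
from typing import Callable

PLANNING_GATES = ("A", "B-1", "B-2")  # auto-passed under fast-preview


def run_gates(fast_preview: bool, approve: Callable[[str], bool]) -> list[str]:
    """Walk the gates in order and return the log lines a user would see."""
    log = []
    for gate in PLANNING_GATES:
        if fast_preview:
            log.append(f"Gate {gate} auto-passed (fast-preview)")
        elif approve(gate):
            log.append(f"Gate {gate} approved")
        else:
            log.append(f"Gate {gate} rejected")
            return log  # stop here; the user revises this stage
    # Gate C is never auto-passed: a human always reviews the actual video.
    log.append("Gate C approved" if approve("C") else "Gate C rejected")
    return log


asked = []


def always_yes(gate: str) -> bool:
    asked.append(gate)
    return True


log = run_gates(fast_preview=True, approve=always_yes)
print("\n".join(log))
```

With `fast_preview=True` the callback is consulted only for Gate C, which matches the single human review described in the text.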

To see the gates one at a time, say “I’ll review the gates one by one” or set fast_preview.txt to off. That returns to the per-stage approval flow.

The first video takes about 12–20 minutes in preview mode. From the second episode of a series onward, the cache is warm and this drops to under 12 minutes. Once you confirm the draft, the final render runs once more. final runs only after an explicit Gate C approval, and fast-preview is ignored in final.
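The interaction between the mode, the fast_preview.txt setting, and Gate C can be summarized in a short sketch. It assumes the setting is read as a plain string whose value `off` disables fast-preview; the actual file format is not documented here, and both function names are hypothetical.

```python
def effective_fast_preview(mode: str, setting: str = "on") -> bool:
    """Decide whether planning gates are auto-passed for this run."""
    if mode == "final":
        return False  # fast-preview is ignored in final
    # Assumption: any value other than "off" leaves the default (on) in place.
    return setting.strip().lower() != "off"


def can_run_final(gate_c_approved: bool) -> bool:
    """final runs only after an explicit Gate C approval."""
    return gate_c_approved is True


print(effective_fast_preview("preview"))         # default behavior in preview
print(effective_fast_preview("preview", "off"))  # per-stage approval flow
print(effective_fast_preview("final"))           # setting has no effect
```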

Once the video draft is out, go to publish, where you generate the publish metadata and upload the video yourself.