
The two tracks

scene-studio’s two tracks differ in input, flow, and philosophy. Generative builds a new video; remix discovers a hook in an existing one. Picking the wrong entry point throws off the whole flow, so trigger routing comes first.

User signal | Entry
URL attachment (youtube.com, youtu.be, tiktok.com, instagram.com, vimeo.com, x.com, etc.) | Remix
“remix,” “YouTube clip,” “drama splice,” “trim this video,” “cut this part” | Remix
“original,” “character,” “new video,” “POV,” “my own character” | Generative
Ambiguous (just “make a short-form video”) | Stop and ask (Is there a URL? Character/scenario?)

The key is never to guess an entry point for ambiguous input. A wrong entry leads to an accident: feeding a copyrighted video into the track with no license gate, or running the generative flow with no source.
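The routing table above can be sketched as a small function. This is a minimal illustration, not scene-studio's actual router: the names `Entry`, `route`, `URL_HOSTS`, `REMIX_WORDS`, and `GENERATIVE_WORDS` are hypothetical, the host list covers only the hosts named in the table (the table's "etc." is not expanded), and plain substring matching is assumed to be good enough for the keywords.

```python
import re
from enum import Enum


class Entry(Enum):
    REMIX = "remix"
    GENERATIVE = "generative"
    ASK = "ask"  # stop and ask the user instead of guessing


# Copied from the routing table; hypothetical constants, not an exhaustive list.
URL_HOSTS = ("youtube.com", "youtu.be", "tiktok.com", "instagram.com", "vimeo.com", "x.com")
REMIX_WORDS = ("remix", "youtube clip", "drama splice", "trim this video", "cut this part")
GENERATIVE_WORDS = ("original", "character", "new video", "pov", "my own character")

# Captures the host part of any http(s) URL, ignoring a leading "www.".
URL_RE = re.compile(r"https?://(?:www\.)?([^/\s]+)", re.IGNORECASE)


def route(message: str) -> Entry:
    """Pick an entry point, or ASK when no signal matches."""
    text = message.lower()
    hosts = [m.group(1).lower() for m in URL_RE.finditer(message)]
    # A known video host wins over any keyword.
    for host in hosts:
        if any(host == known or host.endswith("." + known) for known in URL_HOSTS):
            return Entry.REMIX
    if any(word in text for word in REMIX_WORDS):
        return Entry.REMIX
    if any(word in text for word in GENERATIVE_WORDS):
        return Entry.GENERATIVE
    # No signal: never enter a track on a guess.
    return Entry.ASK
```

The fall-through to `Entry.ASK` is the point of the sketch: an input that matches nothing triggers a question rather than a default track.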

The generative track doesn’t write a free-form scenario. It picks one of five validated structures and fills the slots.

  • problem_twist_solution — problem, twist, solution
  • listicle — list form
  • story_arc — narrative arc
  • curiosity_gap — curiosity hook
  • transformation — before and after

Free-form scenarios are forbidden because they drift toward AI-smell: given no structure, the model falls back on a flat list or a cliché. The five structures are validated skeletons, so filling their slots keeps the quality consistent.
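A plan can be checked against the five structures before any slot text is used. The sketch below assumes a plan is a dict with a `structure` name and a `slots` mapping; that shape, and the names `VALID_STRUCTURES` and `validate_plan`, are illustrative assumptions rather than the real plan.md format. Only the structure names come from the list above.

```python
# The five structure names from the list above.
VALID_STRUCTURES = frozenset({
    "problem_twist_solution",
    "listicle",
    "story_arc",
    "curiosity_gap",
    "transformation",
})


def validate_plan(plan: dict) -> list:
    """Return a list of problems; an empty list means the plan is acceptable."""
    errors = []
    structure = plan.get("structure")
    if structure not in VALID_STRUCTURES:
        # Free-form scenarios end up here and are rejected.
        errors.append(f"unknown structure: {structure!r}")
    slots = plan.get("slots")
    if not isinstance(slots, dict) or not slots:
        errors.append("slots must be a non-empty mapping")
    else:
        for name, text in slots.items():
            if not isinstance(text, str) or not text.strip():
                errors.append(f"empty slot: {name}")
    return errors
```

For example, `validate_plan({"structure": "free_form", "slots": {"x": "y"}})` reports the unknown structure instead of letting an unstructured scenario through.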

The scene type becomes a downstream branching key. Each scene is one of hook, beat, or cta, and this type determines the image tone, subtitle style, and cut length. This is L2 enforcement, not L1 prompting: the type field in scenes.json creates the branch at the code level.
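One way to picture a code-level branch on the type field is a lookup table keyed by scene type. Everything concrete here is a placeholder: the `SceneStyle` fields, the tone and subtitle labels, and the second counts are invented for illustration and are not scene-studio's real values. Only the three type names and the three things they control come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneStyle:
    image_tone: str
    subtitle_style: str
    max_seconds: float


# Hypothetical values; only the keys (hook, beat, cta) come from the document.
STYLE_BY_TYPE = {
    "hook": SceneStyle("high-contrast", "large-bold", 2.0),
    "beat": SceneStyle("neutral", "standard", 4.0),
    "cta": SceneStyle("bright", "highlighted", 3.0),
}


def style_for(scene: dict) -> SceneStyle:
    """Branch on the scene's type field; unknown types fail loudly."""
    scene_type = scene.get("type")
    if scene_type not in STYLE_BY_TYPE:
        raise ValueError(f"scene type must be one of hook, beat, cta; got {scene_type!r}")
    return STYLE_BY_TYPE[scene_type]
```

Because the branch is a dictionary lookup rather than an instruction in a prompt, a scene with a missing or misspelled type cannot silently pick up the wrong style.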

The generative flow runs from planning (plan.md) to scene breakdown (scenes.json), to images (Codex candidates a/b/c), to video (ffmpeg). When there’s a recurring character, the sheet comes before the scenes: if scenes.json has characters[], the character-sheet skill makes sheet.png first, and image-director passes it via Codex --reference to keep the same character across every scene.
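The ordering rule for recurring characters can be sketched as a function that reads the parsed scenes.json and decides whether the character sheet step is needed. The stage labels and the function name `plan_stages` are hypothetical; only the `characters` key and the order of the steps come from the description above.

```python
import json


def plan_stages(scenes_doc: dict) -> list:
    """Return the generative stages in order for one parsed scenes.json."""
    stages = ["plan", "scenes"]
    # A non-empty characters[] means the sheet must exist before any scene image.
    if scenes_doc.get("characters"):
        stages.append("character-sheet")
    stages += ["images", "video"]
    return stages


# Usage with an inline document; the scene and character fields are illustrative.
doc = json.loads('{"characters": [{"name": "Mina"}], "scenes": [{"type": "hook"}]}')
stages = plan_stages(doc)
```

With the document above, `character-sheet` is placed between `scenes` and `images`; with no `characters` key, or an empty list, the step is skipped.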

Remix doesn’t write a new scenario. The source timeline is the answer, and remix lays cuts, subtitles, zoom, BGM, and SFX on top of it. The five structures don’t apply to remix.

The flow goes from collection (downloading the source with yt-dlp) to analysis (Whisper transcription and hook detection), to auto-editing (ffmpeg auto-cut). source-analyst runs ffprobe, Whisper, and hook LLM evaluation in parallel to find the strongest spot in the source, and remix-editor cuts that spot and converts it to 9:16.
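The cut-and-convert step can be illustrated by building an ffmpeg argument list for one detected hook. This is a sketch under assumptions, not remix-editor's actual command: it assumes a landscape source, a center crop, and a 1080x1920 output, and the function name `build_cut_command` is hypothetical. It only builds the arguments and does not run ffmpeg.

```python
def build_cut_command(src: str, dst: str, start: float, duration: float,
                      width: int = 1080, height: int = 1920) -> list:
    """Build ffmpeg arguments that cut one span and convert it to 9:16."""
    if start < 0 or duration <= 0:
        raise ValueError("start must be >= 0 and duration must be > 0")
    # Center-crop to the target aspect ratio at full input height, then scale.
    # Assumes the source is wider than 9:16; a narrower source needs a different crop.
    video_filter = f"crop=ih*{width}/{height}:ih,scale={width}:{height}"
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}",     # seek to the hook before decoding
        "-i", src,
        "-t", f"{duration:.3f}",   # keep only the hook span
        "-vf", video_filter,
        "-c:a", "copy",            # leave the audio stream untouched
        dst,
    ]


cmd = build_cut_command("source.mp4", "hook.mp4", start=12.5, duration=20)
```

The list can be passed to `subprocess.run(cmd, check=True)` once the start and duration come from the analysis stage.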

Remix has license-policy L2 planted at every stage. A source with no license label is refused. The rest of the policy is Fair Use of 30% or less, at least three transformation duties, automatic source attribution, and automatic copyright-claim blocking. The only way to bypass it is for the user to explicitly accept responsibility in license_responsibility.json. The generative track doesn’t take a source, so it has no such gate.
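The checks in the license gate can be sketched as one function that returns the reasons for refusal. Several things here are assumptions: the 30% limit is read as the share of the source duration that ends up in the clip, accepted responsibility is treated as bypassing the whole gate, and the `RemixRequest` fields are hypothetical rather than the real license_responsibility.json schema. Attribution and claim blocking are left out because the text describes them as automatic.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_SOURCE_RATIO = 0.30      # "Fair Use of 30% or less", read as a duration share (assumption)
MIN_TRANSFORMATIONS = 3      # "at least three transformation duties"


@dataclass
class RemixRequest:
    license_label: Optional[str]
    source_seconds: float
    used_seconds: float
    transformations: List[str] = field(default_factory=list)
    responsibility_accepted: bool = False  # stands in for license_responsibility.json


def check_license_gate(req: RemixRequest) -> List[str]:
    """Return refusal reasons; an empty list means the request may proceed."""
    if req.source_seconds <= 0:
        raise ValueError("source_seconds must be positive")
    if req.responsibility_accepted:
        # Assumed behavior: explicit acceptance is the single bypass path.
        return []
    reasons = []
    if not req.license_label:
        reasons.append("source has no license label")
    if req.used_seconds / req.source_seconds > MAX_SOURCE_RATIO:
        reasons.append("clip uses more than 30% of the source")
    if len(set(req.transformations)) < MIN_TRANSFORMATIONS:
        reasons.append("fewer than three transformations applied")
    return reasons
```

Returning every reason at once, instead of stopping at the first failure, lets the caller show the user everything that must change before the remix can run.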

The two tracks enter differently but share several assets: caption-styling (subtitle style), publish-meta (publish metadata format), pipeline-performance (performance policy), global-virality (globalization), approval-gate (approval gates), brief-intake (requirement formatting), publish-copywriter (publish copy). Even when both tracks run on one channel, the subtitle look and publish format stay consistent.

The next section is verification. It covers the approval gate and the license gate, the pipeline-performance policies, and the security baseline S1–S8.