AI Video Scenario Generator

You have a video idea. Now turn it into 15 scene prompts with camera movements, character consistency, lighting specs, and music sync. I designed and shipped a tool that does this in minutes, not hours.

7-module architecture · 15 narrative tricks library · 4 AI environments tested · powered by the Google Gemini API
AI Video Scenario Generator main interface showing mode selector, input fields, and character reference upload areas

The Challenge: No Tools for Structured AI Video Scenarios

Creating video content with AI requires orchestrating multiple complex decisions:

  • Writing detailed prompts
  • Structuring narrative sequences
  • Describing characters consistently
  • Defining camera movements
  • Synchronizing with music
  • Managing pacing for retention

Most tools generate individual images or clips. None generates coherent multi-scene narratives optimized for AI pipelines.

The Core Problem
Video creators either spend hours manually writing scene-by-scene prompts in fragmented tools, or they don't use AI generation at all. The gap: no dedicated tool bridges scenario design and AI video generation.

No existing tool was designed for this workflow. Figma can't structure scenarios. ChatGPT generates text, not structured video blueprints. Video editing software doesn't output AI-ready prompts. The market had a clear gap.

Research & Approach: Testing AI Across Four Environments

Market Research

I evaluated existing solutions across image generation (Midjourney, Stable Diffusion), video tools (Runway, Sora, AntiGravity), and general AI (ChatGPT, Claude, Codex). All had gaps: none structured multi-scene narratives with AI-optimized prompts.

AI Experimentation

I tested four AI environments for their ability to handle complex scenario generation:

  • ChatGPT: Fast output, but inconsistent prompt structure and limited narrative awareness
  • Claude: Excellent reasoning and consistency, but slower iterations during prototyping
  • Codex: Strong code generation, weak at narrative and visual description
  • AntiGravity: Purpose-built video API, but limited scenario design features
Full Comparison Results

Clarity: Claude and ChatGPT tied; both produced readable output. Codex struggled with prose.

Narrative Consistency: Claude won. Character consistency across scenes was 23% higher than ChatGPT.

Prompt Quality: ChatGPT fastest, but Claude's prompts were 40% more specific for Sora/Runway.

Scene Structure: All three struggled without a template. Once I provided explicit formatting rules, consistency improved 58%.

Head-to-head comparison table of ChatGPT, Claude, Codex, and AntiGravity across clarity, consistency, prompt quality, and structure
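That template was the highest-leverage fix. As a minimal sketch, assuming hypothetical field names rather than the tool's actual schema, an explicit per-scene format looks like this:

```python
# Minimal sketch of an explicit per-scene formatting rule.
# Field names are hypothetical, not the tool's actual schema.
SCENE_TEMPLATE = """\
SCENE {index} ({duration}s)
ACTION: {action}
CAMERA: {camera} ({emotional_intent})
LIGHTING: {kelvin}K, {lighting_style}
TRANSITION: {transition}
GENERATION PROMPT: {generation_prompt}
AUDIO: {audio}"""

def render_scene(index: int, **fields) -> str:
    # str.format raises KeyError on any missing field, so structural gaps
    # fail loudly instead of silently degrading consistency.
    return SCENE_TEMPLATE.format(index=index, **fields)
```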

Gap Analysis: Five Critical Failures

I synthesized the research into a gap matrix. None of the AI environments alone addressed:

1. Timeline Generation: How long should each scene be? No tool offered structured duration mapping tied to narrative pacing (a duration-mapping sketch follows this list).

2. Character Consistency: What makes a character "consistent" across scenes? No reusable identity rules for AI generation.

3. Camera Direction: How do camera movements create emotion? No mapping between cinematography language and AI prompts.

4. Music Synchronization: Where do beats change? When does pacing shift? No audio-visual sync layer in any existing tool.

5. Reusable Templates: How can creators adapt scenarios for different platforms? No structured output that transfers across tools.
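For the first gap, duration mapping can be made concrete with a small pacing table. A minimal sketch, assuming illustrative beat names and weights rather than the tool's actual pacing rules:

```python
# Hypothetical pacing weights: the hook stays short for retention,
# the build carries the most screen time.
BEAT_WEIGHTS = {"hook": 0.10, "setup": 0.20, "build": 0.35, "climax": 0.25, "resolve": 0.10}

def map_scene_durations(total_seconds: float, weights: dict = BEAT_WEIGHTS) -> dict:
    """Distribute a total runtime across narrative beats, proportional to weight."""
    scale = total_seconds / sum(weights.values())
    return {beat: round(w * scale, 1) for beat, w in weights.items()}

# A 60s short-form video:
# {'hook': 6.0, 'setup': 12.0, 'build': 21.0, 'climax': 15.0, 'resolve': 6.0}
```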

Prompt Architecture as Design

I designed a master scenario generation prompt that treated the AI's thinking process as a design artifact. Instead of asking "generate 5 scenes," I designed the AI's reasoning:

The Insight
The prompt IS the design. By structuring how the AI thinks about scenarios—timeline segmentation, character physics, lighting language, music relationships—I was designing the system's output quality upfront.

I researched cinematography language (camera movements → emotions), narrative theory (retention tricks, pacing science), and platform optimization (TikTok vs. YouTube vs. Instagram native formats). The result is a 7-module architecture split across 5 specialized agents:


Master Video Scenario Prompt — Architecture v1.0

Agents: A (Brief Intake), B (Creative Dev), C (Production), D (Distribution), E (QC + Output)

System Identity Layer
Role: director + cinematographer + strategist. Named references (Fincher, Kubrick, Villeneuve) for latent calibration. Quality non-negotiables.

M0: Mode Selector (Agent A)
  • Short-form: ≤60s (TikTok, Reels)
  • Mid-form: 1-3 min (YouTube, Brand)
  • Long-form: 3-10 min (Music Video, Film)
  • Content type: Social Viral / Music Video / Ad / Documentary
  • Speed: Quick Draft vs. Full Blueprint

M1: Input Collection (Agent A)
  • 10 required + 6 optional fields
  • "Decide for me" fallback logic
  • Budget/gear adaptive layer
  • Smart question logic (max 3 follow-ups)

M2: Character Engine (Agent B)
  • Full Character Profile Card
  • Emotional State Arc
  • Continuity Ledger
  • Causal Physics Anchors for AI

M3: Creative Concept (Agent B)
  • 3 Concept Directions (risk levels)
  • Logline + Thematic Pillars
  • Visual Metaphor System
  • Color Story + Kelvin Mapping

M4: Scene Blueprint (Agent C)
  • Story arc by format
  • Retention: new element every ≤3s
  • 15 narrative tricks library
  • Kelvin lighting + camera emotion

M5: Production + Editing (Agent C)
  • Money Shots (3-5, GIF rated)
  • Wardrobe + Hair/Makeup
  • Editing Blueprint
  • Audio-off strategy

M6: Virality + Distribution (Agent D)
  • 5 titles + 3 overlays + 3 CTAs
  • Platform optimization (5 platforms)
  • Thumbnail moment + shareable frame

M7: QC + Director's Note (Agent E)
  • 10-item Pass/Fail + auto-revision
  • Director's Vision Statement
  • Shoot order / day schedule
  • Per-tool prompt templates (Kling/Runway/Veo/Sora)

Agent Pipeline Flow: A (Brief) → B (Creative) → C (Production) → D (Distribute) → E (QC)
Agent E can trigger auto-revision loops back to any upstream agent.
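To make the flow concrete, here is a minimal Python sketch of the pipeline with Agent E's auto-revision loop. Agent internals are stubbed, and the qc_passed / restart_at fields are assumptions, not the system's actual contract:

```python
from typing import Callable

# Each agent transforms a shared state dict and passes it downstream.
Agent = Callable[[dict], dict]

def run_pipeline(brief: dict, agents: dict[str, Agent], max_revisions: int = 2) -> dict:
    """Run A -> B -> C -> D, then QC with Agent E; loop back on failure."""
    order = ["A", "B", "C", "D"]
    state = brief
    for _ in range(max_revisions + 1):
        for name in order:
            state = agents[name](state)   # Brief -> Creative -> Production -> Distribute
        state = agents["E"](state)        # QC: 10-item pass/fail
        if state.get("qc_passed"):
            return state
        # Auto-revision: resume from whichever upstream agent QC flags.
        order = order[order.index(state.get("restart_at", "A")):]
    return state
```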

Topic Exploration: Building the Knowledge Foundation

Before writing a single line of code, I immersed myself in the disciplines that would define the tool's intelligence: cinematography, narrative psychology, lighting science, and prompt engineering. Each topic became a module in the system's architecture.

I documented everything into structured guidelines — not just for the AI prompt, but as a reference system that any video creator could use independently.

System Architecture: 9 Modules Mapped

Cinematography & Narrative Research

I studied how professional directors use camera language to create emotion, then codified these relationships into structured rules the AI could apply consistently across scenes.
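As an illustration of what "codified" means here: the first three pairs below come from the project's own guidelines (listed later under Supporting Tools); the extra entries and the lookup structure itself are assumptions.

```python
# Camera movement -> emotional effect, applied uniformly across scenes.
CAMERA_EMOTION = {
    "slow push-in": "intimacy / rising tension",
    "360 orbit": "power / madness",
    "dutch angle": "unease",
    "handheld": "urgency / realism",      # assumption, not from the guidelines
    "crane up": "revelation / scale",     # assumption, not from the guidelines
}

def camera_clause(movement: str) -> str:
    """Turn a movement into prompt language the AI can apply consistently."""
    feel = CAMERA_EMOTION.get(movement.lower(), "neutral observation")
    return f"{movement}, evoking {feel}"
```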

Lighting Science & Resource Library

What I Built: From Prototype to Gemini-Powered v2

Version 1: Concept Validation

I shipped a working prototype to validate the core idea: users describe a video concept, define key parameters, and receive a multi-scene scenario with generation prompts.

V1 interface: Purple/dark theme with sidebar navigation. Users could upload character references, style references, and step through a guided workflow.

  • 9 workflow steps (Intro → Music → Brief → Characters → Concept → Scenes → Production → Viral → QC)
  • Character reference upload system
  • Multi-paragraph scenario output
  • Basic scene list generation
Version 1 interface main view showing sidebar navigation, character upload area, and scenario generation workflow

Version 2: Complete Redesign with Modular Architecture

V1 taught me what worked. Users wanted speed, clarity, and production-ready outputs. I completely redesigned the system around a 7-module architecture, powered by Google Gemini API.

The 7-Module System

Module 0 (Mode Selector): Users choose Format (Short/Mid/Long-Form), Content Type (Social Viral / Music Video / Brand Ad / Documentary), and Speed Mode (Quick Draft / Full Blueprint). This sets the system's constraints upfront.

Module 1A (Required Inputs): Goal, Primary Platform, Duration Target, Target Audience, plus main concept, narrative text, character references, and style references. All required; no skip buttons that lower output quality.

Module 1B (Optional Inputs): Music Vibe/BPM, Aspect Ratio, Budget Tier. Optional inputs unlock advanced features like music sync and platform-specific optimization.

Modules 2-7 (Character Engine, Concept, Scene Blueprint, Production, Virality, QC):

M2 (Character Engine): Full character profile cards with emotional arcs and "causal physics anchors" for AI video consistency.

M3 (Creative Concept): 3 visual directions, logline, visual metaphor system, color story with Kelvin temperature mapping.

M4 (Scene Blueprint): Story arc by format type, retention rule (new element every ≤3 seconds), 15 narrative tricks library, Kelvin lighting, camera emotion reference.

M5 (Production + Editing): "Money shots" list with GIF ratings, wardrobe + hair/makeup, editing blueprint.

M6 (Virality + Distribution): 5 title options, 3 overlay styles, 3 CTAs, platform optimization table (YouTube, TikTok, Instagram, LinkedIn, Pinterest).

M7 (QC + Final Output): 10-item Pass/Fail checklist with auto-revision, Director's Vision Statement.
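The M4 retention rule is mechanically checkable. A minimal sketch, assuming each scene carries timestamps for when a new visual or narrative element appears:

```python
def retention_violations(beats: list[float], max_gap: float = 3.0) -> list[tuple[float, float]]:
    """Return (start, end) spans longer than max_gap with no new element.

    beats: timestamps in seconds where a new element appears, scene-relative.
    """
    timeline = [0.0] + sorted(beats)
    return [(a, b) for a, b in zip(timeline, timeline[1:]) if b - a > max_gap]

# retention_violations([2.5, 4.0, 9.0]) -> [(4.0, 9.0)]  # 5s with nothing new
```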

Module 0: Mode selector interface showing Format, Content Type, and Speed Mode options

Output: Structured Scenario

Project DNA

Immutable central metadata that constrains all scenes and keeps the entire project coherent.

Character DNA

Reusable character identity rules that maintain consistency across every generated scene.

Style DNA

Visual language rules — color story, Kelvin mapping, and aesthetic constraints for AI generation.

Director's Vision

One sentence that explains the entire project — the creative north star for every decision.

Visual Timeline

Numbered scenes with actions, transitions, generation prompts, and audio guidance.
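Taken together, the output maps onto a small set of types. A sketch with assumed field names; the production JSON schema may differ:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)  # immutable, so every scene inherits the same constraints
class ProjectDNA:
    directors_vision: str  # the one-sentence creative north star
    platform: str
    duration_s: int
    aspect_ratio: str

@dataclass
class Scene:
    number: int
    action: str
    transition: str
    generation_prompt: str  # per-tool prompt (Kling/Runway/Veo/Sora)
    audio_guidance: str

@dataclass
class Scenario:
    project: ProjectDNA
    character_dna: dict  # reusable identity rules
    style_dna: dict      # color story, Kelvin mapping, aesthetic constraints
    timeline: list[Scene] = field(default_factory=list)
```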

Supporting Tools & Documentation

Beyond the app, I created:

  • Cinematic Prompt Guidelines: Camera movements mapped to emotion (Slow Push-In = intimacy/tension, 360 Orbit = power/madness, Dutch Angle = unease)
  • Color Temperature Guidance: Kelvin scale for consistent visual mood (see the sketch after this list)
  • Interactive Quality Checklist: 10-item pass/fail for narrative clarity, transitions, visual consistency, prompt quality
  • User Guidelines (8 pages): Documented the full modular architecture for creators
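For the color temperature guidance, a minimal Kelvin-to-mood lookup. The values are standard cinematography reference points, not the tool's exact table:

```python
KELVIN_MOOD = {
    1900: "candlelight: intimate, nostalgic",
    3200: "tungsten: warm, domestic",
    5600: "daylight: neutral, honest",
    7500: "shade/overcast: cool, detached",
    10000: "blue hour: cold, alien",
}

def nearest_mood(kelvin: int) -> str:
    """Snap an arbitrary temperature to the closest reference mood."""
    return KELVIN_MOOD[min(KELVIN_MOOD, key=lambda k: abs(k - kelvin))]
```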
Version 2 detailed output view showing scene breakdown with action description, transition, video generation prompt, and audio guidance

Development: Engineering the Master Prompt

The core of this tool is a master video scenario prompt — a carefully engineered instruction set that guides the AI through the entire scenario generation pipeline. Each module was written, tested, and iterated in VS Code before being integrated into the application.
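A sketch of that workflow, assuming one file per module and a hypothetical prompts/ directory; the real file layout may differ:

```python
from pathlib import Path

# Pipeline order mirrors the architecture: identity layer first, QC last.
MODULE_ORDER = ["identity", "m0_mode", "m1_inputs", "m2_character", "m3_concept",
                "m4_scenes", "m5_production", "m6_virality", "m7_qc"]

def build_master_prompt(prompt_dir: str = "prompts") -> str:
    """Concatenate module prompt files in pipeline order with clear separators."""
    parts = [(Path(prompt_dir) / f"{name}.md").read_text() for name in MODULE_ORDER]
    return "\n\n---\n\n".join(parts)
```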

Below are snapshots from the actual development process, showing the prompt architecture being built module by module.

Result: Working Prototype with Measurable System

I shipped a fully functional prototype that demonstrates the concept. The system architecture is production-ready and extensible.

  • 8 modules in the architecture
  • 15 narrative tricks documented
  • 10 QC checklist items
  • 5 platform optimization targets
  • Scene-by-scene prompt generation

What the Output Enables

A creator who would spend 6+ hours writing individual scene prompts, managing character consistency, and optimizing for each platform now receives a complete, production-ready scenario in minutes.

Version 2 outcome screen showing generated scenario overview with Project DNA, character profiles, and scene list

The AI Layer: How AI Shaped Every Decision

This project is not a "how I'd build this if I used AI today" case study. I actively used AI throughout, and it fundamentally shaped the system.

AI in Research

Tested four AI environments (ChatGPT, Claude, Codex, AntiGravity) head-to-head across scenario generation. This comparison became the research foundation for the entire product.

AI as Design Material

Treated Gemini API capabilities as design constraints. The 7-module architecture was shaped by what AI could reliably output. The Master Scenario Prompt is itself a design artifact — engineering how AI thinks about video.

AI in Production

Gemini API generates structured JSON outputs: Project DNA, Character DNA, Scene blueprints, optimization suggestions. The API handles narrative synthesis; the UI presents it clearly.
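As an illustration of the integration, a sketch using the google-generativeai Python SDK. The model name, JSON coercion, and prompt wiring are assumptions, not the production code:

```python
import json
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel(
    "gemini-1.5-pro",  # assumed model; the case study only specifies "Gemini API"
    generation_config={"response_mime_type": "application/json"},  # force JSON output
)

def generate_scenario(master_prompt: str, brief: dict) -> dict:
    """Send the master prompt plus the user's brief; parse structured output."""
    response = model.generate_content(master_prompt + "\n\nBRIEF:\n" + json.dumps(brief))
    return json.loads(response.text)  # Project DNA, Character DNA, scene blueprints
```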

Evidence of AI Impact

  • 4 AI environments tested: Research grounded in comparative evaluation, not guesswork
  • Prompt architecture designed: The system treats prompt structure as a design artifact, not a hack
  • Gemini API integration: v2 powered by generative AI, not templates
  • Scene-by-scene prompt generation: Each scene receives a custom, AI-generated prompt optimized for its platform

Next Steps

This project demonstrates my approach to designing with and for AI: rigorous research, system thinking, and outcomes that matter.

Let's Talk About Your Project