AI Cinematic Direction: Structured Prompt Engineering with Kling AI
The warrior walks into the fog - not by accident, but by design.
Project Overview
This case study documents a structured approach to AI video generation, moving away from single vague text prompts toward a precise, shot-by-shot direction methodology. Using Kling AI v3, a cinematic warrior sequence was produced by engineering a multi-shot prompt with defined framing, camera movement, lighting, and mood for each individual shot.
The result is a reproducible, intentionally directed AI video: not a random generation, but a cinematographer’s workflow applied to generative AI.
Platform: Kling AI
Model: V3 Master
Duration: 10 sec
Shots: 3
Why Structured Prompt Engineering?
Most AI video tools accept a single text prompt. The problem is that one paragraph of text gives the model no structure: shots blur together, camera motion is random, and consistency between scenes breaks down.
Structured prompt engineering solves this by treating each shot as a discrete unit of instruction, much as a director would brief a cinematographer on set.
Single Text Prompt
- Unpredictable shot transitions
- Random camera movement
- Inconsistent lighting & mood
- Difficult to iterate on one part
Structured Multi-Shot Prompt
- Each shot has a defined purpose
- Explicit camera instructions
- Consistent visual language
- Easy to tweak shot by shot
How a Structured Prompt Works
Layer 1 — Shot Label
[SHOT 1 – Wide Establishing Shot]
[SHOT 2 – Medium Close-Up]
[SHOT 3 – Low Angle Tracking Shot]
Layer 2 — Subject & Action
Defines who or what is in frame and what they are doing. The subject is always introduced clearly before camera instructions.
– A lone warrior stands on a cliff at sunset, wind blowing his cape.
– His face, eyes scanning the horizon, jaw clenched.
– He turns and walks into the fog.
Layer 3 — Camera Movement
Tells the model how the virtual camera should behave. This is the most critical layer for cinematic output.
– slow camera push in
– camera holds on face, shallow depth of field.
– camera follows at ground level.
Layer 4 — Lighting & Mood
Sets the emotional tone and visual atmosphere of the shot. Lighting direction changes the entire feel.
– golden hour lighting.
– dramatic side lighting.
– cinematic slow motion, fog atmosphere.
Layer 5 — Global Style Directive
Applied once at the end, this directive sets the visual language for the entire video and keeps all shots consistent.
Style: Epic cinematic, 4K, no shaky cam, film grain
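The five layers can also be assembled programmatically. The following is a minimal Python sketch, not part of the original workflow: the `Shot` class and `render_prompt` function are hypothetical names introduced here for illustration. It orders each shot as subject, then camera, then lighting, and appends the global style directive once at the end.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    """One shot, split into the layers described above (hypothetical helper)."""
    label: str     # Layer 1: shot label
    subject: str   # Layer 2: subject and action
    camera: str    # Layer 3: camera movement
    lighting: str  # Layer 4: lighting and mood

    def render(self, number: int) -> str:
        # The subject is introduced first, then camera, then lighting.
        parts = (self.subject, self.camera, self.lighting)
        body = ", ".join(part for part in parts if part)
        return f"[SHOT {number} – {self.label}]\n{body}."


def render_prompt(shots: list[Shot], style: str) -> str:
    """Join the rendered shots and append the Layer 5 style directive once."""
    blocks = [shot.render(i) for i, shot in enumerate(shots, start=1)]
    blocks.append(f"Style: {style}")
    return "\n".join(blocks)


shots = [
    Shot("Wide Establishing Shot",
         "A lone warrior stands on a cliff at sunset, wind blowing his cape",
         "slow camera push in", "golden hour lighting"),
    Shot("Medium Close-Up",
         "His face, eyes scanning the horizon, jaw clenched",
         "camera holds on face, shallow depth of field",
         "dramatic side lighting"),
    Shot("Low Angle Tracking Shot",
         "He turns and walks into the fog",
         "camera follows at ground level",
         "cinematic slow motion, fog atmosphere"),
]

prompt = render_prompt(shots, "Epic cinematic, 4K, no shaky cam, film grain")
print(prompt)
```

Keeping the layers as separate fields means a single layer, such as the camera movement of shot 2, can be changed without retyping the rest of the prompt.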
The Prompts: JSON and Plain Text
Format 1: JSON (API / Developer Use)
{
  "model_name": "kling-v3-master",
  "duration": "10",
  "aspect_ratio": "16:9",
  "cfg_scale": 0.7,
  "negative_prompt": "blurry, shaky cam, low quality, watermark",
  "multi_shot": true,
  "shot_type": "customize",
  "multi_prompt": [
    {
      "index": 0,
      "shot_label": "Wide Establishing Shot",
      "prompt": "A lone warrior stands on a cliff at sunset, wind blowing his cape, slow camera push in, golden hour lighting",
      "duration": "3"
    },
    {
      "index": 1,
      "shot_label": "Medium Close-Up",
      "prompt": "His face, eyes scanning the horizon, jaw clenched, dramatic side lighting, shallow depth of field",
      "duration": "3"
    },
    {
      "index": 2,
      "shot_label": "Low Angle Tracking Shot",
      "prompt": "He turns and walks into the fog, camera follows at ground level, cinematic slow motion",
      "duration": "4"
    }
  ],
  "style": "Epic cinematic, 4K, film grain"
}
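For developer use, the same request body can be generated rather than hand-written, which avoids invalid JSON such as line breaks inside a string. This is a minimal Python sketch under stated assumptions: the field names simply mirror the JSON above and have not been checked against the official Kling API reference, and `build_payload` is a hypothetical helper. It also checks that the per-shot durations (3 + 3 + 4) add up to the 10-second total.

```python
import json

# (label, prompt, seconds) for each shot, copied from the sequence above.
SHOTS = [
    ("Wide Establishing Shot",
     "A lone warrior stands on a cliff at sunset, wind blowing his cape, "
     "slow camera push in, golden hour lighting", 3),
    ("Medium Close-Up",
     "His face, eyes scanning the horizon, jaw clenched, "
     "dramatic side lighting, shallow depth of field", 3),
    ("Low Angle Tracking Shot",
     "He turns and walks into the fog, camera follows at ground level, "
     "cinematic slow motion", 4),
]


def build_payload(shots, total_seconds):
    """Build the request body; field names mirror the JSON shown above."""
    planned = sum(seconds for _, _, seconds in shots)
    if planned != total_seconds:
        raise ValueError(f"shots add up to {planned}s, expected {total_seconds}s")
    return {
        "model_name": "kling-v3-master",
        "duration": str(total_seconds),
        "aspect_ratio": "16:9",
        "cfg_scale": 0.7,
        "negative_prompt": "blurry, shaky cam, low quality, watermark",
        "multi_shot": True,
        "shot_type": "customize",
        "multi_prompt": [
            {"index": i, "shot_label": label, "prompt": prompt,
             "duration": str(seconds)}
            for i, (label, prompt, seconds) in enumerate(shots)
        ],
        "style": "Epic cinematic, 4K, film grain",
    }


payload = build_payload(SHOTS, total_seconds=10)
body = json.dumps(payload, indent=2)  # always serializes to valid JSON
print(body)
```

Because `json.dumps` handles quoting and escaping, long prompts can be split across source lines in Python without breaking the serialized output.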
JSON Field Reference
| Field | Value | What It Controls |
| --- | --- | --- |
| model_name | kling-v3-master | Model version to use |
| duration | 10 | Total video length in seconds |
| aspect_ratio | 16:9 | Landscape / portrait / square |
| cfg_scale | 0.5 – 1.0 | Prompt adherence (0.7 = balanced) |
| negative_prompt | blurry, watermark etc | What to avoid generating |
| multi_shot | true | Enables multi-shot mode |
| index | 0, 1, 2… | Shot sequence order |
| shot_label | Wide Shot, Close Up… | Human-readable shot name |
| prompt | Scene description | What happens in this shot |
| duration | 3, 3, 4 | Each shot’s length in seconds |
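The field reference can double as a pre-flight checklist before a request is sent. The sketch below is one possible way to do that in Python. The function `validate_payload` is a hypothetical helper, and the 0.5 to 1.0 bound on `cfg_scale` is taken from the table above as an assumption; the API itself may accept a wider range.

```python
def validate_payload(payload: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    # cfg_scale: range assumed from the field reference table.
    cfg = payload.get("cfg_scale")
    if not isinstance(cfg, (int, float)) or not 0.5 <= cfg <= 1.0:
        problems.append("cfg_scale should be a number between 0.5 and 1.0")

    shots = payload.get("multi_prompt", [])
    if payload.get("multi_shot") and not shots:
        problems.append("multi_shot is true but multi_prompt is empty")

    # index: shots should be numbered 0, 1, 2, ... in sequence order.
    if [shot.get("index") for shot in shots] != list(range(len(shots))):
        problems.append("index values should run 0, 1, 2, ... in order")

    # duration: per-shot lengths should add up to the total length.
    try:
        total = int(payload["duration"])
        per_shot = sum(int(shot["duration"]) for shot in shots)
    except (KeyError, TypeError, ValueError):
        problems.append("every duration should be a whole number of seconds")
    else:
        if shots and per_shot != total:
            problems.append(f"shot durations total {per_shot}s, not {total}s")

    return problems


good = {
    "cfg_scale": 0.7, "multi_shot": True, "duration": "10",
    "multi_prompt": [
        {"index": 0, "duration": "3"},
        {"index": 1, "duration": "3"},
        {"index": 2, "duration": "4"},
    ],
}
bad = {
    "cfg_scale": 1.5, "multi_shot": True, "duration": "10",
    "multi_prompt": [
        {"index": 0, "duration": "3"},
        {"index": 2, "duration": "3"},
    ],
}
print(validate_payload(good))
print(validate_payload(bad))
```

Returning a list of messages rather than raising on the first failure lets every issue in a payload be fixed in one pass.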
Format 2: Plain Text (Kling UI Direct)
Paste this directly into the Kling AI web interface at app.klingai.com; no coding is required. The model recognizes the bracketed shot labels and structured layout as directorial instructions.
[SHOT 1 – Wide Establishing Shot]
A lone warrior stands on a cliff at sunset, wind blowing his cape,
slow camera push in, golden hour lighting.
[SHOT 2 – Medium Close-Up]
His face, eyes scanning the horizon, jaw clenched,
dramatic side lighting, shallow depth of field.
[SHOT 3 – Low Angle Tracking Shot]
He turns and walks into the fog, camera follows at ground level,
cinematic slow motion.
Style: Epic cinematic, 4K, no shaky cam, film grain
Negative: blurry, distorted, watermark, jerky motion
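When a JSON payload already exists, the plain-text version can be derived from it instead of being maintained by hand. The following Python sketch shows one way to do this. The function `payload_to_text` is a hypothetical helper, and the example payload is a trimmed copy of the JSON format with the field names assumed as shown there.

```python
def payload_to_text(payload: dict) -> str:
    """Render a multi-shot payload as bracketed plain text for the web UI."""
    lines = []
    for shot in sorted(payload["multi_prompt"], key=lambda s: s["index"]):
        # index is 0-based in JSON, while shot labels are numbered from 1.
        lines.append(f"[SHOT {shot['index'] + 1} – {shot['shot_label']}]")
        lines.append(shot["prompt"].rstrip(".") + ".")
    if payload.get("style"):
        lines.append(f"Style: {payload['style']}")
    if payload.get("negative_prompt"):
        lines.append(f"Negative: {payload['negative_prompt']}")
    return "\n".join(lines)


example = {
    "negative_prompt": "blurry, shaky cam, low quality, watermark",
    "multi_prompt": [
        {"index": 0, "shot_label": "Wide Establishing Shot",
         "prompt": "A lone warrior stands on a cliff at sunset, wind blowing "
                   "his cape, slow camera push in, golden hour lighting"},
        {"index": 1, "shot_label": "Medium Close-Up",
         "prompt": "His face, eyes scanning the horizon, jaw clenched, "
                   "dramatic side lighting, shallow depth of field"},
        {"index": 2, "shot_label": "Low Angle Tracking Shot",
         "prompt": "He turns and walks into the fog, camera follows at "
                   "ground level, cinematic slow motion"},
    ],
    "style": "Epic cinematic, 4K, film grain",
}

text = payload_to_text(example)
print(text)
```

The converter carries the style and negative values over unchanged, so any wording differences between the two formats need to be reconciled in the payload itself.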
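Why Both Formats?
The two formats describe the same three shots for different entry points: the JSON version is for API and developer use, while the plain-text version can be pasted straight into the Kling web interface without any coding.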
Shot Grammar Reference
Shot Types
| Shot Type | Use For | Example |
| --- | --- | --- |
| Wide / Establishing | Open a scene, show scale | “Aerial view over Mumbai at dusk” |
| Medium Shot | Character in context | “Vendor at his stall, waist up” |
| Close-Up | Emotion, detail | “His face, jaw clenched, side lit” |
| Extreme Close-Up | Tension, intimacy | “Eyes reflecting distant fire” |
| Low Angle | Power, drama | “Looking up at the warrior” |
| POV | Immersion | “First person running through crowd” |
| Tracking Shot | Motion, follow | “Camera follows from behind” |
| Drone / Aerial | Scale, beauty | “Slow descending over city” |
Camera Movements
| Movement | Effect | Prompt Keyword |
| --- | --- | --- |
| Push In | Builds intimacy / tension | “slow camera push in” |
| Pull Out | Reveals scale or isolation | “camera slowly pulls back” |
| Pan Left/Right | Follows action | “camera pans left slowly” |
| Tilt Up/Down | Reveals height or depth | “camera tilts up to sky” |
| Hold / Lock | Stability, focus | “camera holds steady” |
| Handheld | Raw, documentary feel | “handheld slight shake” |
| Follow / Track | Dynamic pursuit | “camera follows at ground level” |
Lighting Keywords
| Lighting | Mood It Creates | When to Use |
| --- | --- | --- |
| Golden hour | Warmth, hope, epic | Hero moments, beauty shots |
| Dramatic side lighting | Tension, mystery | Character close-ups |
| Neon lit | Cyberpunk, urban | City night scenes |
| Backlit / silhouette | Mysterious, poetic | Silhouette reveals |
| Soft diffused light | Gentle, emotional | Intimate scenes |
| Harsh shadows | Danger, noir | Thriller, conflict |
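The camera and lighting tables can be kept as small lookup tables so that prompts reuse the same wording every time. This is a rough Python sketch: the dictionary keys and the `direction` function are hypothetical names, and only a subset of the lighting rows is included.

```python
# Prompt keywords from the Camera Movements table.
CAMERA_KEYWORDS = {
    "push in": "slow camera push in",
    "pull out": "camera slowly pulls back",
    "pan": "camera pans left slowly",
    "tilt": "camera tilts up to sky",
    "hold": "camera holds steady",
    "handheld": "handheld slight shake",
    "follow": "camera follows at ground level",
}

# Lighting keyword -> mood it creates (subset of the Lighting Keywords table).
LIGHTING_MOODS = {
    "golden hour": "warmth, hope, epic",
    "dramatic side lighting": "tension, mystery",
    "neon lit": "cyberpunk, urban",
    "soft diffused light": "gentle, emotional",
    "harsh shadows": "danger, noir",
}


def direction(movement: str, lighting: str) -> str:
    """Combine one camera keyword and one lighting keyword into a phrase."""
    if movement not in CAMERA_KEYWORDS:
        raise ValueError(f"unknown camera movement: {movement!r}")
    if lighting not in LIGHTING_MOODS:
        raise ValueError(f"unknown lighting keyword: {lighting!r}")
    return f"{CAMERA_KEYWORDS[movement]}, {lighting}"


print(direction("push in", "golden hour"))
print(direction("follow", "harsh shadows"))
```

Rejecting unknown names early catches typos before they reach the model, where a misspelled keyword would otherwise be silently ignored or misread.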