AI Cinematic Direction: Structured Prompt Engineering with Kling AI

The warrior walks into the fog - not by accident, but by design.

Project Overview

This case study documents a structured approach to AI video generation, moving away from single vague text prompts toward a precise, shot-by-shot direction methodology. Using Kling AI v3, a cinematic warrior sequence was produced by engineering a multi-shot prompt with defined framing, camera movement, lighting, and mood for each individual shot.

The result is a reproducible, intentionally directed AI video: not a random generation, but a cinematographer’s workflow applied to generative AI.

Platform: Kling AI
Model: V3 Master
Duration: 10 seconds
Shots: 3

Why Structured Prompt Engineering?

Most AI video tools accept a single text prompt. The problem is that one paragraph of text gives the model no structure: shots blur together, camera motion is random, and consistency between scenes breaks down.

Structured prompt engineering solves this by treating each shot as a discrete unit of instruction, much as a director would brief a cinematographer on set.

Single Text Prompt

Unpredictable shot transitions
Random camera movement
Inconsistent lighting & mood
Difficult to iterate on one part

Structured Multi-Shot Prompt

Each shot has a defined purpose
Explicit camera instructions
Consistent visual language
Easy to tweak shot by shot

How a Structured Prompt Works

A structured multi-shot prompt is organized into layers. Each layer adds a level of directorial control. Understanding what each layer does, and why it matters, is what separates an intentional AI film director from someone just typing descriptions.

Layer 1 — Shot Label

Names the shot type and its position in the sequence. Signals to the model what kind of framing and scale to use.

[SHOT 1 – Wide Establishing Shot]

[SHOT 2 – Medium Close-Up]

[SHOT 3 – Low Angle Tracking Shot]

Layer 2 — Subject & Action

Defines who or what is in frame and what they are doing. The subject is always introduced clearly before camera instructions.

– A lone warrior stands on a cliff at sunset, wind blowing his cape.

– His face, eyes scanning the horizon, jaw clenched.

– He turns and walks into the fog.

Layer 3 — Camera Movement

Tells the model how the virtual camera should behave. This is the most critical layer for cinematic output.

– slow camera push in.

– camera holds on face, shallow depth of field.

– camera follows at ground level.

Layer 4 — Lighting & Mood

Sets the emotional tone and visual atmosphere of the shot. Lighting direction changes the entire feel.

– golden hour lighting.

– dramatic side lighting.

– cinematic slow motion, fog atmosphere.

Layer 5 — Global Style Directive

Applied once at the end, this directive covers the visual language for the entire video and keeps all shots consistent.

Style: Epic cinematic, 4K, no shaky cam, film grain
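The five layers can be assembled mechanically. The following is a minimal Python sketch, not part of the original workflow: the `Shot` dataclass and the `render_shot` and `render_sequence` helpers are hypothetical names chosen for illustration. It joins layers 2 to 4 under a layer 1 label and appends the layer 5 style directive once at the end.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    """One shot, split into the layers described above (hypothetical structure)."""
    label: str      # Layer 1: shot type
    subject: str    # Layer 2: subject and action
    camera: str     # Layer 3: camera movement
    lighting: str   # Layer 4: lighting and mood


def render_shot(number: int, shot: Shot) -> str:
    """Render one shot as a bracketed label followed by a one-line description."""
    parts = [shot.subject, shot.camera, shot.lighting]
    description = ", ".join(p.strip().rstrip(".") for p in parts if p.strip())
    return f"[SHOT {number} – {shot.label}]\n{description}."


def render_sequence(shots: list[Shot], style: str) -> str:
    """Join all shots, then append the global style directive (Layer 5) once."""
    blocks = [render_shot(i, s) for i, s in enumerate(shots, start=1)]
    blocks.append(f"Style: {style}")
    return "\n\n".join(blocks)


shots = [
    Shot("Wide Establishing Shot",
         "A lone warrior stands on a cliff at sunset, wind blowing his cape",
         "slow camera push in",
         "golden hour lighting"),
    Shot("Medium Close-Up",
         "His face, eyes scanning the horizon, jaw clenched",
         "camera holds on face, shallow depth of field",
         "dramatic side lighting"),
    Shot("Low Angle Tracking Shot",
         "He turns and walks into the fog",
         "camera follows at ground level",
         "cinematic slow motion, fog atmosphere"),
]

print(render_sequence(shots, "Epic cinematic, 4K, no shaky cam, film grain"))
```

Keeping the layers as separate fields means a single layer, such as the lighting of one shot, can be changed without retyping the rest of the prompt.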

The Prompts - JSON | Plain Text

Below are both the JSON format (for API / developer use) and the plain text format (for direct use in Kling’s UI). Both describe the same three shots; the structure differs, and the negative-prompt wording varies slightly between them.

Format 1: JSON (API / Developer Use)

Use this format when calling the Kling API or tools like fal.ai programmatically. Each shot is a separate object inside the multi_prompt array, giving precise control over each shot’s duration and content.

				
{
  "model_name": "kling-v3-master",
  "duration": "10",
  "aspect_ratio": "16:9",
  "cfg_scale": 0.7,
  "negative_prompt": "blurry, shaky cam, low quality, watermark",
  "multi_shot": true,
  "shot_type": "customize",
  "multi_prompt": [
    {
      "index": 0,
      "shot_label": "Wide Establishing Shot",
      "prompt": "A lone warrior stands on a cliff at sunset, wind blowing his cape, slow camera push in, golden hour lighting",
      "duration": "3"
    },
    {
      "index": 1,
      "shot_label": "Medium Close-Up",
      "prompt": "His face, eyes scanning the horizon, jaw clenched, dramatic side lighting, shallow depth of field",
      "duration": "3"
    },
    {
      "index": 2,
      "shot_label": "Low Angle Tracking Shot",
      "prompt": "He turns and walks into the fog, camera follows at ground level, cinematic slow motion",
      "duration": "4"
    }
  ],
  "style": "Epic cinematic, 4K, film grain"
}
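JSON strings cannot contain raw line breaks, and the per-shot durations need to add up to the total, so it can help to build the request body in code rather than by hand. The sketch below is one possible way to do that in Python. The `build_payload` helper is hypothetical, and the field names are copied from the example above rather than verified against the Kling API.

```python
import json


def build_payload(shots, total_seconds, style):
    """Assemble a request body shaped like the JSON above and check shot timing.

    `shots` is a list of (label, prompt, seconds) tuples. Field names mirror
    this document's example; they are not verified against the Kling API.
    """
    shot_total = sum(seconds for _, _, seconds in shots)
    if shot_total != total_seconds:
        raise ValueError(
            f"shot durations add up to {shot_total}s, expected {total_seconds}s"
        )
    return {
        "model_name": "kling-v3-master",
        "duration": str(total_seconds),
        "aspect_ratio": "16:9",
        "cfg_scale": 0.7,
        "negative_prompt": "blurry, shaky cam, low quality, watermark",
        "multi_shot": True,
        "shot_type": "customize",
        "multi_prompt": [
            {
                "index": i,
                "shot_label": label,
                "prompt": prompt,
                "duration": str(seconds),
            }
            for i, (label, prompt, seconds) in enumerate(shots)
        ],
        "style": style,
    }


shots = [
    ("Wide Establishing Shot",
     "A lone warrior stands on a cliff at sunset, wind blowing his cape, "
     "slow camera push in, golden hour lighting", 3),
    ("Medium Close-Up",
     "His face, eyes scanning the horizon, jaw clenched, "
     "dramatic side lighting, shallow depth of field", 3),
    ("Low Angle Tracking Shot",
     "He turns and walks into the fog, camera follows at ground level, "
     "cinematic slow motion", 4),
]

payload = build_payload(shots, total_seconds=10, style="Epic cinematic, 4K, film grain")
print(json.dumps(payload, indent=2))  # json.dumps always emits valid single-line strings
```

Changing one shot's length without adjusting the total raises a `ValueError` before anything is sent.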

JSON Field Reference

| Field | Value | What It Controls |
| --- | --- | --- |
| model_name | kling-v3-master | Model version to use |
| duration (top level) | 10 | Total video length in seconds |
| aspect_ratio | 16:9 | Landscape / portrait / square |
| cfg_scale | 0.5 – 1.0 | Prompt adherence (0.7 = balanced) |
| negative_prompt | blurry, watermark, etc. | What to avoid generating |
| multi_shot | true | Enables multi-shot mode |
| index | 0, 1, 2… | Shot sequence order |
| shot_label | Wide Shot, Close Up… | Human-readable shot name |
| prompt | Scene description | What happens in this shot |
| duration (per shot) | 3, 3, 4 | Each shot’s length in seconds |

Format 2: Plain Text (Kling UI Direct)

Paste this directly into the Kling AI web interface at app.klingai.com; no coding is required. The model recognizes the bracketed shot labels and structured layout as directorial instructions.

[SHOT 1 – Wide Establishing Shot]

A lone warrior stands on a cliff at sunset, wind blowing his cape,
slow camera push in, golden hour lighting.

[SHOT 2 – Medium Close-Up]

His face, eyes scanning the horizon, jaw clenched,
dramatic side lighting, shallow depth of field.

[SHOT 3 – Low Angle Tracking Shot]

He turns and walks into the fog, camera follows at ground level,
cinematic slow motion.

Style: Epic cinematic, 4K, no shaky cam, film grain

Negative: blurry, distorted, watermark, jerky motion

Why Both Formats?

The plain text format works in Kling’s free UI and is ideal for quick creative tests. The JSON format is used when automating batch generation or integrating with a production pipeline. Both encode the same shot plan, so they should produce comparable results, though individual generations will still vary.
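Because both formats carry the same shot plan, one can be derived from the other. The following Python sketch shows one way to render a JSON-style payload as bracketed plain text. The `payload_to_plain_text` function is a hypothetical helper, and it assumes the field names used in this document's JSON example.

```python
def payload_to_plain_text(payload: dict) -> str:
    """Render a multi-shot payload as bracketed plain text for pasting into a UI."""
    blocks = []
    # Sort by index so the text follows the intended shot order.
    ordered = sorted(payload["multi_prompt"], key=lambda shot: shot["index"])
    for shot in ordered:
        number = shot["index"] + 1  # JSON indexes start at 0, shot labels at 1
        blocks.append(f"[SHOT {number} – {shot['shot_label']}]\n{shot['prompt']}.")
    if payload.get("style"):
        blocks.append(f"Style: {payload['style']}")
    if payload.get("negative_prompt"):
        blocks.append(f"Negative: {payload['negative_prompt']}")
    return "\n\n".join(blocks)


example = {
    "negative_prompt": "blurry, watermark",
    "style": "Epic cinematic, 4K, film grain",
    "multi_prompt": [
        {"index": 1, "shot_label": "Medium Close-Up",
         "prompt": "His face, eyes scanning the horizon, jaw clenched"},
        {"index": 0, "shot_label": "Wide Establishing Shot",
         "prompt": "A lone warrior stands on a cliff at sunset"},
    ],
}

print(payload_to_plain_text(example))
```

Treating the JSON as the single source and generating the text from it keeps the two versions from drifting apart as shots are edited.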

Shot Grammar Reference

These are the building blocks used to construct any multi-shot AI video prompt. Mix and match across shot types, camera movements, and lighting to build your own sequences.

Shot Types

| Shot Type | Use For | Example |
| --- | --- | --- |
| Wide / Establishing | Open a scene, show scale | “Aerial view over Mumbai at dusk” |
| Medium Shot | Character in context | “Vendor at his stall, waist up” |
| Close-Up | Emotion, detail | “His face, jaw clenched, side lit” |
| Extreme Close-Up | Tension, intimacy | “Eyes reflecting distant fire” |
| Low Angle | Power, drama | “Looking up at the warrior” |
| POV | Immersion | “First person running through crowd” |
| Tracking Shot | Motion, follow | “Camera follows from behind” |
| Drone / Aerial | Scale, beauty | “Slow descending over city” |

Camera Movements

| Movement | Effect | Prompt Keyword |
| --- | --- | --- |
| Push In | Builds intimacy / tension | “slow camera push in” |
| Pull Out | Reveals scale or isolation | “camera slowly pulls back” |
| Pan Left/Right | Follows action | “camera pans left slowly” |
| Tilt Up/Down | Reveals height or depth | “camera tilts up to sky” |
| Hold / Lock | Stability, focus | “camera holds steady” |
| Handheld | Raw, documentary feel | “handheld slight shake” |
| Follow / Track | Dynamic pursuit | “camera follows at ground level” |

Lighting Keywords

| Lighting | Mood It Creates | When to Use |
| --- | --- | --- |
| Golden hour | Warmth, hope, epic | Hero moments, beauty shots |
| Dramatic side lighting | Tension, mystery | Character close-ups |
| Neon lit | Cyberpunk, urban | City night scenes |
| Backlit / silhouette | Mysterious, poetic | Silhouette reveals |
| Soft diffused light | Gentle, emotional | Intimate scenes |
| Harsh shadows | Danger, noir | Thriller, conflict |
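The mix-and-match idea behind these reference tables can be expressed as simple lookups. The Python sketch below is illustrative only: the `CAMERA` and `LIGHTING` dictionaries hold a subset of the keywords above under short keys chosen for this example, and `compose_prompt` is a hypothetical helper.

```python
# A subset of the prompt keywords from the tables above, keyed by short names.
CAMERA = {
    "push in": "slow camera push in",
    "pull out": "camera slowly pulls back",
    "hold": "camera holds steady",
    "follow": "camera follows at ground level",
}

LIGHTING = {
    "golden hour": "golden hour lighting",
    "dramatic side": "dramatic side lighting",
    "neon": "neon lit",
    "soft": "soft diffused light",
    "harsh": "harsh shadows",
}


def _lookup(table: dict, key: str, kind: str) -> str:
    """Return the prompt keyword for `key`, or fail with the known options."""
    try:
        return table[key]
    except KeyError:
        known = ", ".join(sorted(table))
        raise ValueError(f"unknown {kind} '{key}'; known options: {known}") from None


def compose_prompt(subject: str, camera: str, lighting: str) -> str:
    """Combine a subject with one camera keyword and one lighting keyword."""
    return ", ".join([
        subject,
        _lookup(CAMERA, camera, "camera movement"),
        _lookup(LIGHTING, lighting, "lighting"),
    ])


print(compose_prompt("A lone warrior stands on a cliff at sunset", "push in", "golden hour"))
```

Restricting choices to a fixed vocabulary makes it easier to keep wording consistent across shots and to spot typos before generating.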

Workflow & Results

Step-by-Step Workflow

1. Define the Story Arc

Decided on 3 acts: establish the world → reveal the character → show movement. Each shot serves a narrative function.

2. Assign Camera Grammar to Each Shot

Applied specific framing, movement, and lighting keywords — each chosen to serve the shot’s emotional purpose.

3. Write Both Formats

Translated the shot plan into both plain text (for Kling UI) and JSON (for API / future automation).

4. Run & Iterate

Generated in Kling AI, reviewed output, adjusted cfg_scale and lighting keywords for the final result.

5. Document & Publish

Archived the prompt structure, settings, and output for portfolio and future reuse in production pipelines.
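The iteration in step 4 can be prepared in advance by generating one payload per setting to compare. This is a minimal Python sketch under stated assumptions: `cfg_scale_variants` is a hypothetical helper, the 0.5 to 1.0 bounds come from this document's field reference rather than from the Kling API, and no request is actually sent.

```python
import copy


def cfg_scale_variants(base_payload: dict, scales: list[float]) -> list[dict]:
    """Return one independent copy of the payload per cfg_scale value."""
    variants = []
    for scale in scales:
        # 0.5 to 1.0 is the range given in this document's field reference.
        if not 0.5 <= scale <= 1.0:
            raise ValueError(f"cfg_scale {scale} is outside 0.5 to 1.0")
        variant = copy.deepcopy(base_payload)  # keep the base payload untouched
        variant["cfg_scale"] = scale
        variants.append(variant)
    return variants


base = {"model_name": "kling-v3-master", "cfg_scale": 0.7, "multi_prompt": []}

for variant in cfg_scale_variants(base, [0.5, 0.7, 0.9]):
    print(variant["cfg_scale"])
```

Archiving each variant alongside its output records which setting produced which result, which supports the documentation in step 5.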