Humanising Enterprise Brand Communication Through AI

Exploring how enterprises can leverage AI video production to deliver brand messaging at scale, without a camera, crew, or studio.

Concept & Intent

The video was designed to demonstrate what AI-produced brand content can feel like when it is directed rather than generated. One character. One voice. One brand message. Delivered in ten seconds.

Narrative Arc

Work → Rise → Connect

A woman at her desk, absorbed in work, present and focused. She rises and walks toward the camera with quiet confidence. By the time the close-up lands and she meets the lens, the distance between a brand and its audience has collapsed.

Wide shot establishes place and character → medium shot carries the turning point → close-up delivers the emotional and verbal payoff.

Workflow

The production ran across four tools in a deliberate sequence, each stage feeding directly into the next.

1. Voice Over: ElevenLabs

The script was voiced using Jane, a professional audiobook reader voice on ElevenLabs. The delivery was tuned to feel warm and measured, sitting slightly slower than a typical ad read to match the pace of the visuals.

2. Multi-Camera Video: Kling

Kling’s Multi-Cam feature generated the entire 10-second video in a single pass, with all three camera angles produced as one continuous output. A Global Prompt locked character, environment, and colour grade across all angles. Each Camera Prompt directed only the action, framing, and motion specific to that shot.

3. Lip Sync: Sync.so

The close-up shot was run through Sync.so using Lipsync 2.0 Pro. The trimmed voiceover audio was matched to the last three seconds of the video, syncing the character’s mouth to the final line of the script.

4. Edit: Premiere Pro

The clips were assembled in Premiere Pro with an original music track added underneath the voiceover and a simple colour correction applied across the timeline.

Video Platform: Kling V3
Voice Over Model: Eleven Multilingual v2
Lip Sync Model: Sync.so – Lipsync 2.0 Pro
Editing: Adobe Premiere Pro

The Prompts

Kling’s Multi-Cam feature works in two layers. A Global Prompt defines everything that must stay consistent across all shots: the character, the environment, the lighting, and the colour grade. Each Camera Prompt then takes over for its individual shot, directing only the action, framing, and movement for that specific angle. This separation is what makes coherent multi-shot generation possible in a single output.
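The two-layer split can be pictured as a small data structure. The sketch below is illustrative only: `CameraPrompt`, `MultiCamJob`, and `effective_prompt` are hypothetical names, not part of any Kling API, and the simple concatenation is an assumption about how to think of the layers, not a description of how Kling combines them internally. The prompt strings are shortened from the full prompts in this section.

```python
from dataclasses import dataclass, field


@dataclass
class CameraPrompt:
    """Shot-specific layer: action, framing, and movement only."""
    name: str
    text: str


@dataclass
class MultiCamJob:
    """Hypothetical container for one multi-shot generation (not a Kling object)."""
    global_prompt: str    # shared layer: character, environment, lighting, grade
    negative_prompt: str  # shared exclusions
    cameras: list[CameraPrompt] = field(default_factory=list)

    def effective_prompt(self, index: int) -> str:
        """Conceptual view of one shot: the shared layer plus that shot's own layer."""
        cam = self.cameras[index]
        return f"{self.global_prompt}\n[{cam.name}] {cam.text}"


job = MultiCamJob(
    global_prompt="Professional woman, early 30s, warm beige fitted blazer. "
                  "Modern high-rise corporate office. Warm cinematic colour grade.",
    negative_prompt="cartoon, cold blue tones, multiple people",
    cameras=[
        CameraPrompt("Camera 1", "Wide shot. Seated at desk, absorbed in work."),
        CameraPrompt("Camera 2", "Medium shot. Rises and walks toward the camera."),
        CameraPrompt("Camera 3", "Tight close-up. Direct eye contact with the lens."),
    ],
)

for i in range(len(job.cameras)):
    print(job.effective_prompt(i))
    print("---")
```

Every shot starts from the same shared text, so anything that must not drift (wardrobe, lighting, grade) is written once, and each camera entry stays short enough to describe a single beat.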

Format: Plain Text (Kling UI Direct - With Multi Camera)

[Global Prompt] (applied to all cameras)

Cinematic enterprise brand video. Ultra-realistic, 4K. Warm cinematic colour grade, amber and ivory tones. Professional woman, early 30s, warm beige fitted blazer, dark slim trousers, minimal gold jewellery, natural makeup, hair neatly tied back. Modern high-rise corporate office, sleek minimal furniture, large floor-to-ceiling windows, soft blurred city skyline. Warm ambient interior lighting. Consistent character appearance, lighting, and environment across all shots. Smooth camera movement throughout. No jump cuts. Photorealistic.

[Negative Prompt]

cartoon, animation, CGI look, cold blue tones, harsh shadows, overexposed, multiple people, character inconsistency, costume change, talking mouth, lips moving, open wide mouth, stock footage aesthetic, stiff robotic movement, blurry face, low resolution, watermark, text on screen, logo overlay

[Camera 1: Establishing Wide Shot – 0 to 3s]

Wide shot. A woman is seated at a sleek, minimal glass-top desk. Laptop open in front of her, screen glowing on her face. She is absorbed in her work, focused, thoughtful. One hand rests on the keyboard. The other holds a pen lightly. Slow rack focus from the blurred city window behind her to her face. Static camera with very subtle push-in. Warm morning light fills the frame.

Camera Motion: Slight Push In – very slow
Focus: Rack focus: background to subject
CFG Scale: 0.5

[Camera 2: Stand and Walk – 3 to 7s]

Medium shot. The same woman closes her laptop with quiet intent, deliberate, not abrupt. She places both hands on the desk and rises from her chair with calm authority. She straightens, turns her gaze directly toward the camera, and begins walking forward with purpose. Smooth dolly push-in follows her movement. Shallow depth of field, desk and office blur behind her. Her posture is composed, unhurried, and confident.

Camera Motion: Dolly Push In – medium speed, follows subject

[Camera 3: Locked Close-Up – 7 to 10s]

Tight close-up portrait. The same woman has walked toward the camera – her face fills 70% of the frame. She holds steady, direct eye contact with the lens. Expression: composed, warm, approachable authority. Lips slightly parted and relaxed, neutral, not smiling wide. Head very slightly tilted. Minimal head movement. Soft bokeh office lights form a halo behind her. Warm amber light illuminates her skin evenly.

Stable shot – only natural breathing movement. This shot must hold perfectly still for lip sync in post.
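The time ranges in the three camera headings (0 to 3s, 3 to 7s, 7 to 10s) can be checked before generation. The sketch below is a minimal example, assuming the shots are meant to be contiguous and to fill the 10-second runtime exactly; `Shot` and `check_timeline` are hypothetical helper names. It also derives the lip-sync window from the final shot, which matches the last three seconds used in the Sync.so step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    name: str
    start_s: float
    end_s: float

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


def check_timeline(shots, total_s):
    """Raise ValueError unless the shots are contiguous and span 0..total_s."""
    cursor = 0.0
    for shot in shots:
        if shot.start_s != cursor:
            raise ValueError(f"{shot.name} starts at {shot.start_s}s, expected {cursor}s")
        if shot.end_s <= shot.start_s:
            raise ValueError(f"{shot.name} has no positive duration")
        cursor = shot.end_s
    if cursor != total_s:
        raise ValueError(f"shots end at {cursor}s, expected {total_s}s")


# Time ranges taken from the camera headings above.
SHOTS = [
    Shot("Camera 1: Establishing Wide Shot", 0.0, 3.0),
    Shot("Camera 2: Stand and Walk", 3.0, 7.0),
    Shot("Camera 3: Locked Close-Up", 7.0, 10.0),
]

check_timeline(SHOTS, total_s=10.0)

# The locked close-up is the only shot sent to lip sync.
lip_sync_shot = SHOTS[-1]
print(f"Lip-sync window starts at {lip_sync_shot.start_s}s "
      f"and lasts {lip_sync_shot.duration_s}s")
```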

Output Analysis - What Worked & What Didn't

The video largely achieved its intent. Character consistency held across all three angles, the warm colour grade read as cinematic rather than corporate, and the voiceover delivery felt grounded and human.

What Worked

Global Prompt Held the Character Together

The Global Prompt architecture was the deciding factor in character consistency. Without it, multi-shot AI generation tends to drift in wardrobe, lighting, and facial features between cuts. Locking those variables at the top kept the output coherent.

Flaws & Limitations

Laptop Close Did Not Execute

Camera 2 was prompted to show the woman closing her laptop before standing. Kling did not execute this action. She rises and walks without the closing beat, which weakens the intended narrative transition from doing to deciding. This is a prompt fidelity issue: Kling V3 does not always honour specific object interactions, particularly subtle hand actions. Splitting the action into more explicit steps, or simplifying it, may improve results in the next iteration.

The 9-Second Seam

At the 9-second mark, the voiceover and the lip-synced segment end on the same frame, and a visible frame shift appears where the Sync.so output cuts back to the original Kling footage. The root cause is a timing collision: when audio and lip sync end on the same frame, any inconsistency in how Sync.so closes the mouth becomes noticeable.
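The timing collision described here can be flagged before the edit is locked. The sketch below is a rough check under stated assumptions: the 24 fps frame rate is assumed (the project's actual frame rate is not stated), and `to_frame` and `ends_collide` are hypothetical helpers rather than features of Sync.so or Premiere Pro. It converts both end times to frame numbers and reports whether they land on the same frame.

```python
def to_frame(time_s: float, fps: int) -> int:
    """Convert a timestamp in seconds to the nearest frame index."""
    return round(time_s * fps)


def ends_collide(audio_end_s: float, lipsync_end_s: float,
                 fps: int = 24, tolerance_frames: int = 0) -> bool:
    """True when the voiceover and the lip-synced segment end within
    `tolerance_frames` of each other (fps of 24 is an assumption)."""
    gap = abs(to_frame(audio_end_s, fps) - to_frame(lipsync_end_s, fps))
    return gap <= tolerance_frames


# Both ending at the 9-second mark, as observed in this video.
print(ends_collide(audio_end_s=9.0, lipsync_end_s=9.0))  # True

# Illustrative alternative: the audio ends half a second before the cut.
print(ends_collide(audio_end_s=8.5, lipsync_end_s=9.0))  # False
```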