The Human-First AI Video Playbook

Amos Bar-Joseph
March 2, 2026
4 min read

Key Takeaways

How we drove $3M in pipeline and 450K impressions in 24 hours — with $400 in AI tokens and one very important question nobody's asking

We just closed a $6M raise.

The announcement video drove $3M in pipeline and 450K impressions inside 24 hours. 150+ companies signed up in the first 48 hours. And I started getting the same DM over and over from founders, marketers, and creators who watched it:

"What tool did you use?"

I get why that's the first instinct. The results look like something that required a studio, a production team, and a serious budget. So naturally, people assume there's a secret tool. A better model. A prompt that unlocks magic.

But that question is exactly why most AI video makes you cringe.

And once you see why — you'll never approach AI video the same way again.

Here’s a glimpse into the BTS:

[Video: swan AI video - BTS intro.mp4]

Here’s the full video:

[Video: launch_vid.m4v]

Why Your Brain Rejects AI Video Before You Even Know Why

There's a concept in robotics and animation called the Uncanny Valley.

When something looks almost human — but not quite — your brain doesn't just notice. It rejects. Hard. Before your conscious mind has formed an opinion, your nervous system has already filed its verdict: fake. threat. don't trust.

This isn't a taste problem. It's not about production quality. It's a survival mechanism that's been running in humans for thousands of years. And right now, it's absolutely destroying AI video.

The cringe you feel watching most AI ads isn't because the tool was bad. It's because your nervous system is doing exactly what it was built to do — detecting something that looks human but isn't.

Here's the trap most people fall into: they try to fix this by finding a better AI model. Better lip sync. More realistic skin texture. Smoother motion.

But that's the wrong problem.

You cannot AI your way out of the Uncanny Valley. You can only human your way through it.

The Right Question

So forget "what tool did you use?"

The question that actually unlocks great AI video is this:

Who made the decisions?

In every moment of our raise video — every camera angle, every motion, every note in the soundtrack — a human made a creative decision first. The AI came after. Not to replace the decision. To produce it at scale.

That's the entire framework.

Human leads. AI amplifies.

Now let me show you exactly what that looked like in practice.

How We Made It: Decision by Decision

The Person Behind the Camera

Arian Topol is an AI creator and the creative director behind this production. Not a cameraman — a director. Every shot, every angle, every piece of direction came from him.

He filmed me on his phone. But the phone wasn't the point. His eye was the point.

Before a single AI tool was opened, Arian made dozens of creative decisions: how I should move, where the camera should sit, what energy the opening sequence needed. He directed me through every motion you see in the video.

The AI had nothing to work with until Arian gave it something worth working with.

The lesson: Before you open a single tool, answer this question — who is the human director on this production? If the answer is "nobody, the AI will figure it out" — stop. Find your Arian first.

The Image Layer: Nano Banana Pro → Video Models

We didn't start with video. We started with stills.

Arian used Nano Banana Pro to generate the photographic images that became the visual foundation of the piece. These weren't random generations. Each image was a deliberate creative decision — a specific mood, a specific aesthetic, a specific moment we wanted to capture.

Those images then became inputs for Seedance 1.5 and Kling 3.0 to animate into video clips.

This is a critical workflow insight most people miss: the best AI video starts with AI images, not AI video prompts. Images give you control over the frame before you ask the model to move anything. Garbage in, garbage out — but a beautiful still image gives the video model something extraordinary to work from.

The human decision: what does each frame need to feel like?
The AI job: make it move.
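
If you wanted to script this still-first handoff, the shape is simple: every clip begins with a human-written brief and an approved still, never a bare video prompt. Below is a minimal sketch of that orchestration in Python. The `generate_still` and `animate_still` functions are hypothetical stand-ins for whichever image and video model interfaces you use (Nano Banana Pro, Seedance, and Kling each have their own), not real SDK calls.

```python
from pathlib import Path

# Hypothetical wrappers: these stubs stand in for your image model
# (e.g. Nano Banana Pro) and video models (e.g. Seedance 1.5, Kling 3.0).
# The function names and signatures are placeholders, not real SDK calls.
def generate_still(brief: str, out_dir: Path) -> Path:
    out_dir.mkdir(exist_ok=True)
    still = out_dir / f"still_{abs(hash(brief)) % 10_000}.png"
    still.touch()  # in practice: call the image model and save the result
    return still

def animate_still(still: Path, motion_note: str) -> Path:
    clip = still.with_suffix(".mp4")
    clip.touch()  # in practice: send the approved still and motion note to the video model
    return clip

# Each shot starts as two human decisions: what the frame should feel like,
# and how it should move. The video model only ever sees an approved still.
shots = [
    {"brief": "warm dawn light, quiet confidence, founder at a desk", "motion": "slow push-in"},
    {"brief": "city skyline at dusk, momentum building", "motion": "lateral drift"},
]

clips = []
for shot in shots:
    still = generate_still(shot["brief"], Path("stills"))
    approved = input(f"Approve {still.name}? [y/n] ").strip().lower() == "y"
    if approved:  # human review gate before any video is generated
        clips.append(animate_still(still, shot["motion"]))

print(f"{len(clips)} clip(s) ready for the edit")
```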

The Performance Layer: Kling 2.6 Motion Control

For the opening monologue — the part where you see me speaking directly to camera — we used Kling 2.6 Motion Control.

But here's what actually made it work: I recorded myself performing the script first. Not just reading it. Performing it. The right tone. The right cadence. The pauses that felt natural, not generated.

Then Kling took that human performance as its reference.

This is the Uncanny Valley solution in its purest form. We didn't ask AI to generate a human performance. We gave it a human performance and asked it to produce it better. The humanity was already baked in before the model touched a single frame.

The Live Footage Layer: Higgsfield AI

Beyond the generated sequences, Arian filmed me physically moving — every motion you see in the video was performed by an actual human first.

Higgsfield AI was then layered on top of that footage.

Same pattern. Human input first. AI production second. The model had real human movement to work with — which is exactly why the output didn't trigger your nervous system's rejection reflex.

The Voice Layer: ElevenLabs

I recorded the voiceover myself. In the right tone. With the emotional weight each line needed.

Then ElevenLabs finished it.

Not replaced it. Finished it.

The soul of the delivery was already there. ElevenLabs just made it production-ready.
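
For teams who script this step, the handoff is the same idea in code: send the human take in, get the produced take out. Here is a minimal sketch assuming ElevenLabs' speech-to-speech (voice changer) REST endpoint; the API key, voice ID, and model ID are placeholders, so check the current API docs before relying on any of them.

```python
import requests

API_KEY = "YOUR_ELEVENLABS_API_KEY"   # placeholder
VOICE_ID = "YOUR_VOICE_ID"            # placeholder: the production voice

# The human-recorded take is the reference. Speech-to-speech keeps the
# pacing, pauses, and emphasis of that performance; it changes the
# production quality of the voice, not the delivery.
with open("voiceover_take.wav", "rb") as take:
    resp = requests.post(
        f"https://api.elevenlabs.io/v1/speech-to-speech/{VOICE_ID}",
        headers={"xi-api-key": API_KEY},
        files={"audio": take},
        data={"model_id": "eleven_multilingual_sts_v2"},  # assumed speech-to-speech model ID
    )

resp.raise_for_status()
with open("voiceover_final.mp3", "wb") as out:
    out.write(resp.content)
```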

The Music Layer: Suno AI

My wife sat down at her piano and recorded herself playing and singing.

We ran that through Suno AI.

Think about that for a second. The warmth in that soundtrack — the thing that makes people feel something when they watch the video — came from a human being in a room with a piano. The AI didn't compose the emotion. It produced what a human had already felt.

This is the detail people never expect. And it's the one I'm most proud of.

The Framework, Distilled

Every layer of this video followed the same pattern:

  1. A human made a creative decision — what should this feel like?
  2. A human produced the input — performance, image, recording, movement
  3. AI amplified the output — at speed, at scale, at production quality

The tools weren't the creative directors. Arian was. I was. My wife was.

The AI was the production team that would have cost us $400,000. We spent $400.
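
One way to keep yourself honest is to write the plan down as data before opening anything. The sketch below is purely illustrative (the field names and layer descriptions are ours, drawn from this post, not from any tool), but it makes the rule mechanical: no layer ships without a human decision and a human input.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    human_decision: str   # step 1: what should this feel like?
    human_input: str      # step 2: performance, image, recording, movement
    ai_tool: str          # step 3: what amplifies it to production quality

# Illustrative plan, mirroring the layers described above
plan = [
    Layer("image", "mood and framing of each still", "directed stills", "Nano Banana Pro"),
    Layer("performance", "tone, cadence, natural pauses", "recorded monologue take", "Kling 2.6 Motion Control"),
    Layer("voice", "emotional weight of each line", "self-recorded voiceover", "ElevenLabs"),
    Layer("music", "the feeling of the soundtrack", "piano and vocal recording", "Suno AI"),
]

# The Uncanny Valley check: an empty human field means AI is making the
# creative decision, and that layer needs a human before it needs a model.
for layer in plan:
    assert layer.human_decision and layer.human_input, f"{layer.name} has no human in it"
```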

What This Means for You

If you're sitting on a video idea right now, here's where to start:

Don't open a single tool until you've answered three questions:

  1. Who is the human director making the creative decisions?
  2. What human inputs am I giving the AI to work from? (performance, image, voice, movement)
  3. Where am I tempted to skip the human layer and let AI figure it out?

That third question is where the Uncanny Valley lives. Every time you let AI make the creative decision instead of just executing it — you're stepping into the valley.

Step back. Make the decision yourself. Then hand it to the model.

The Tools (In Order of Workflow)

For those who want the full stack:

  • Nano Banana Pro — AI image generation (the visual foundation)
  • Seedance 1.5 — Image-to-video animation
  • Kling 3.0 — Image-to-video animation (alongside Seedance)
  • Kling 2.6 Motion Control — Performance-referenced video generation (the monologue)
  • Higgsfield AI — AI layered on live human footage
  • ElevenLabs — Voice production from human-recorded input
  • Suno AI — Music production from human-recorded piano

Total spend: ~$400 in tokens.

But the tools aren't the point. The decisions that fed them are.

Built by Amos Bar-Joseph and Arian Topol at Swan AI.

Swan AI helps GTM teams build AI agents that amplify human performance — not replace it.
→ getswan.com
