In partnership with

You're paying $300-500 per UGC video.

Maybe more. The good creators charge $800+.

And here's what you get for that money:

  • 1 video

  • 1 script interpretation

  • 1 take (maybe 2 if you're lucky)

  • 1 creator who may or may not match your audience

  • 3-5 day turnaround

Want variations? Pay again. Different demographic? Pay again. A/B test hooks? You guessed it.

The economics are broken.

Last week, Kling 3.0 dropped. And buried in the announcement was a feature that changes everything.

It's called "Elements."

Upload a reference image of a person. The AI locks their identity across every video you generate. But here's the part most people miss: Elements isn't just for faces. Upload your SaaS dashboard, your app screen, your product packaging (and it appears in the video as a locked element too). Your AI creator actually using your actual product.

Combine that with native lip-sync in 5 languages, multi-shot storyboards (up to 6 shots), and 4K output...

You're not generating videos anymore. You're building a creator roster with your real product baked in.

I’m giving away a Skill you can use for free to make your UGC army. Here's exactly how I'm using it.

Ship the message as fast as you think

Founders spend too much time drafting the same kinds of messages. Wispr Flow turns spoken thinking into final-draft writing so you can record investor updates, product briefs, and run-of-the-mill status notes by voice. Use saved snippets for recurring intros, insert calendar links by voice, and keep comms consistent across the team. It preserves your tone, fixes punctuation, and formats lists so you send confident messages fast. Works on Mac, Windows, and iPhone. Try Wispr Flow for founders.

THE UNLOCK: Character & Product Persistence

Before Elements, AI video had a fatal flaw for UGC: consistency.

You'd generate a testimonial video. Great. Generate another with the same "person"? Completely different face. Different vibe. Different everything.

No brand would use that. UGC works because people recognize the creator. Trust compounds across videos.

Elements solves this. And it goes further than most people realize.

Upload one reference image of a person. The model creates an identity lock. Now that exact person (same face, same features, same vibe) appears in every video you generate.

But Elements supports multiple locked references per video. Upload your creator and your product (a SaaS dashboard screenshot, an app screen, product packaging) and both appear consistently. Your AI creator scrolling through your actual app. Holding your actual product. Reacting to your actual interface.

Same product. Different scripts. Different settings.

That's the foundation of a UGC factory.

THE SYSTEM: Director-First Pipeline

Here's the architecture I've been testing. The key insight: visuals before script. Design how each shot looks and moves first, then write dialogue that fits. The script serves the visuals, not the other way around.

UGC CONTENT FACTORY — DIRECTOR-FIRST PIPELINE

┌─────────────────────────────────────────┐
│  1. CHARACTER DESIGN                    │
│  → Generate reference images (Nano      │
│    Banana) matching your target demo    │
│  → Lock character in Kling Elements     │
└─────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────┐
│  2. VISUAL STORYBOARD                   │
│  → Design shot-by-shot visual sequence  │
│  → Variable durations (3s/4s/5s)        │
│  → Kling prompt engineering             │
│  → NO dialogue yet — visuals first      │
└─────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────┐
│  3. SYNCHRONIZED SCRIPT                 │
│  → Write dialogue fitted to shot        │
│    durations (~2.5 words/sec)           │
│  → Natural cadence + filler words       │
│  → Script serves the visuals            │
└─────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────┐
│  4. GENERATE + MULTIPLY                 │
│  → fal.ai Kling 3.0 with character lock │
│  → Multi-shot storyboard                │
│  → Native lip-sync                      │
│  → Variant multiplication               │
└─────────────────────────────────────────┘

Let me break down each step.

STEP 1: Build Your Creator Roster

First, you need characters that match your target audience.

(As always) I’m using Nano Banana Pro (Google's image model via Replicate) to generate reference images.

The secret: Most AI image prompts create obviously fake people. The fix is adding imperfections deliberately.

The Photorealistic Prompt Formula:

Raw iPhone [model] photo, [shot type] of [character details],
[hair with imperfections], [natural expression], [casual clothing],
[activity/pose], [specific location], [natural lighting + time],
candid moment, unfiltered, authentic Instagram aesthetic, f/1.8,
shallow depth of field, slight grain

Physical details: visible skin texture and pores, natural under-eye
area, flyaway hairs, subtle facial asymmetry, realistic hands
(5 fingers, natural pose)

Negative prompt: CGI, 3D render, perfect skin, plastic, beauty
filter, symmetrical, studio lighting, fake, artificial, model pose

Example for a DTC skincare brand:

Raw iPhone 15 Pro photo, front-facing selfie of 32 year old woman,
shoulder-length brown hair slightly messy, warm genuine smile with
slight laugh lines, wearing casual cream sweater, sitting at kitchen
counter, morning window light creating soft shadows, candid moment,
unfiltered, authentic Instagram aesthetic, f/1.8, slight grain

Physical details: visible skin texture and pores, natural under-eye
area, flyaway hairs, subtle facial asymmetry, realistic hands

Negative prompt: CGI, 3D render, perfect skin, plastic, beauty filter,
symmetrical, studio lighting, fake, artificial, influencer pose

Generate 5 variations. Pick the best. That's your creator.

The key: Natural, not perfect. Add imperfections deliberately. Flyaway hairs, under-eye texture, asymmetry. The negative prompt does half the work by telling the AI what not to do.

Once you have your reference, you pass it to the Elements system via fal.ai's API. Now every video you generate uses that locked identity.
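Under the hood, a generation call looks something like this. This is a sketch using the `fal_client` Python package; the endpoint ID and the payload field names (`elements`, `image_url`, etc.) are my assumptions for illustration, so check fal.ai's Kling model page for the exact schema:

```python
import os

# Hypothetical endpoint ID -- verify against fal.ai's model catalog.
KLING_ENDPOINT = "fal-ai/kling-video/v3/elements"

def build_arguments(prompt: str, character_url: str, product_url: str) -> dict:
    """Assemble a request payload: one shot prompt plus locked reference images
    (the creator's face and the product) as Elements."""
    return {
        "prompt": prompt,
        "elements": [  # assumed field name for locked references
            {"image_url": character_url},
            {"image_url": product_url},
        ],
        "duration": "15",
        "aspect_ratio": "9:16",
    }

args = build_arguments(
    "Woman in cream sweater picks up phone, looks at camera with genuine excitement",
    "https://example.com/creator-reference.jpg",
    "https://example.com/dashboard-screenshot.png",
)

# Only hit the API when a key is configured; otherwise just inspect the payload.
if os.environ.get("FAL_KEY"):
    import fal_client
    result = fal_client.subscribe(KLING_ENDPOINT, arguments=args)
    print(result)
```

The point is that the character and product references travel with every request, so the identity lock persists across all your generations.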

I built a roster of 6 characters:

  • Female, 28-35, casual lifestyle (general consumer)

  • Female, 22-28, trendy/energetic (Gen-Z products)

  • Male, 30-40, professional casual (B2B/productivity)

  • Male, 25-35, fitness aesthetic (health/wellness)

  • Female, 35-45, put-together mom (family/home products)

  • Male, 22-30, creator aesthetic (tech/apps)

Different products call for different creators from the roster.

STEP 2: Design Your Visual Storyboard (Director-First)

Key principle: visuals before script. Design how each shot looks and moves first, then write dialogue to fit. Most people do this backward—they write a script and try to make visuals match. That leads to talking-head videos where nothing changes visually.

This is where Kling 3.0's multi-shot feature shines. You can define up to 6 distinct shots in one generation. The AI handles transitions, shot composition, and character continuity automatically.

Variable shot durations: Each shot can be 3s, 4s, or 5s. Hooks should be punchy (3s). Demos need room to breathe (5s). Total must sum to your target duration.

Here's my testimonial storyboard template:

SHOT 1 (3s): HOOK
- Character at kitchen counter, picks up phone
- Selfie angle, looks at camera
- Expression: genuine excitement
- [Dialogue TBD — visuals first]

SHOT 2 (5s): SETUP + PRODUCT
- Same setting, character holds up product
- Shows product clearly, closer framing
- Expression: engaged, sharing
- [Dialogue TBD]

SHOT 3 (4s): PAYOFF
- Stands up, demonstrates product
- Camera angle change
- Expression: genuine conviction
- [Dialogue TBD]

SHOT 4 (3s): CTA
- Direct to camera
- Eye contact, warm smile
- Expression: recommendation
- [Dialogue TBD]

Notice: no dialogue yet. Just pure visual design. The script comes next, fitted to these durations.

The AI generates all 4 shots with smooth transitions. Same character throughout.

15 seconds of UGC in one generation.

The 2-of-4 Rule: At least 2 of your 4 shots must have a meaningful position or angle change. If every shot is "sitting at counter, talking" → transitions feel weird and there's no visual progression.

Build variety into every storyboard:

  • Shot 1: Picks up phone, starts talking

  • Shot 2: Leans in, holds up product (closer framing)

  • Shot 3: Stands up, demonstrates product

  • Shot 4: Direct to camera, different angle
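The storyboard constraints above (3/4/5-second shots summing to the target, plus the 2-of-4 rule) are easy to sanity-check in code before you spend a generation. A minimal Python sketch; the names are my own:

```python
from dataclasses import dataclass

ALLOWED_DURATIONS = {3, 4, 5}  # Kling 3.0 per-shot options

@dataclass
class Shot:
    label: str
    duration: int          # seconds
    position_change: bool  # does position/angle/framing meaningfully change?

def validate_storyboard(shots, target_duration=15):
    """Enforce the storyboard rules: allowed shot lengths, durations sum
    to the target, and at least 2 shots with a position/angle change."""
    assert all(s.duration in ALLOWED_DURATIONS for s in shots), "bad shot length"
    assert sum(s.duration for s in shots) == target_duration, "must sum to target"
    assert sum(s.position_change for s in shots) >= 2, "2-of-4 rule violated"
    return True

board = [
    Shot("HOOK", 3, True),     # picks up phone, starts talking
    Shot("SETUP", 5, True),    # leans in, holds up product (closer framing)
    Shot("PAYOFF", 4, True),   # stands up, demonstrates product
    Shot("CTA", 3, True),      # direct to camera, different angle
]
print(validate_storyboard(board))  # True when all constraints pass
```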

STEP 3: Write the Script to Fit

Now that your visual storyboard is locked, write dialogue that fits each shot's duration.

The word budget: ~2.5 words per second. A 3-second shot gets ~7 words. A 5-second shot gets ~12. Go over and the lip-sync feels rushed. Go under and there's dead air.

Shot Duration | Word Budget | Example
3 seconds     | ~7 words    | "Okay I have to tell you something"
4 seconds     | ~10 words   | "The difference after three weeks is honestly unreal"
5 seconds     | ~12 words   | "I've been using it every morning and I'm not going back"

Generic AI copy = obvious fake. Here's what actually sounds human:

The Testimonial Script (15 sec)

"Okay I need to talk about [product]. I've been using it for
[specific timeframe] and honestly? [Specific result]. Before
this I was [previous struggle]. Now I just [positive outcome]
every single time. If you're dealing with [pain point], you
need to try this."

What makes it work:

  • "Okay" and "honestly?" are filler words real people use

  • Specific timeframe (not "a while" but "three weeks")

  • Before/after structure

  • Direct recommendation at end

The Unboxing Script (15 sec)

"Look what just came in. I've been waiting for this [product].
[Opening reaction]. Oh wow okay, [first impression about
quality/design]. I'm going to try it for [timeframe] and update
you guys. But honestly [initial positive signal]. This is gonna
be good."

The Problem/Solution Script (15 sec)

"I used to [specific struggle] all the time. It was so [how it
made them feel]. Then I tried [product] and [what changed]. Now
I [positive result] without [previous hassle]. If you deal with
[pain point], this is literally the answer."

The pattern: Specific details + filler words + natural rhythm. Map every line to a shot from your beat sheet. The script serves the visuals, not the other way around.

THE PROMPTING SECRET (Think Shots, Not Keywords)

Most AI-generated UGC gets rejected by viewers in the first 2 seconds. Human brains detect unnatural movement.

We've spent our entire lives watching real people move. Our pattern recognition is insanely sophisticated. When movement is wrong, even slightly, we feel it immediately.

"Something's weird about this."
"This doesn't look right."
"Fake."

The issue: Most people prompt AI video like they prompt images. They describe what's in frame, not how it moves.

"Woman talking about skincare."

Kling generates motion. Your prompt needs to describe movement patterns that match what humans unconsciously expect.

Here's what human brains expect from real UGC:

  • Slight handheld drift (phones aren't tripods)

  • Natural weight shifts (people don't sit perfectly still)

  • Organic gestures (we talk with our hands)

  • Environmental interaction (touching products, moving objects)

  • Imperfect framing (real people don't compose perfect shots)

What makes human brains reject it:

  • Perfect stability (this was on a tripod = production)

  • Unnatural movements (AI physics that don't match reality)

  • Zero camera drift (phones always have micro-movement)

  • Robotic gestures (programmed, not natural)

The fix is prompting for the imperfections humans expect.

The 6 Elements of a Strong Prompt:

Element     | What It Is           | Example
Camera      | Shot type + movement | Handheld shoulder-cam drifts with subtle sway
Subject     | Who + action         | Woman in cream sweater picks up phone, looks at camera
Environment | Setting              | Bright modern kitchen, marble counter
Lighting    | Source + feel        | Morning window light creating soft shadows
Texture     | Physical details     | Visible coffee steam, slight hair flyaways
Emotion     | Mood                 | Genuine excitement, warm energy

The 4 Rules:

  1. Motion verbs matter: "dolly push-in" not "camera moves closer"

  2. Texture = credibility: condensation, fabric sheen, visible breath

  3. Describe temporal flow: beginning → middle → end in one shot

  4. Name real light sources: "morning window light" not "nice lighting"

Weak vs Strong:

Weak          | Strong
Woman talking | Woman in casual sweater picks up phone, looks at camera with genuine excited expression
In kitchen    | Bright modern kitchen, morning window light, marble counter slightly blurred in background
Nice lighting | Soft window light creating natural shadows, not ring-lit

Write prompts like single-shot film directions. One flowing sentence, not a keyword list.
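If you're generating many shot prompts programmatically, the six elements can be stitched into one flowing direction rather than a keyword list. A mechanical Python sketch (the function name is mine, and real prompts deserve hand-polish):

```python
def build_shot_prompt(camera: str, subject: str, environment: str,
                      lighting: str, texture: str, emotion: str) -> str:
    """Join the six prompt elements into a single film-direction sentence."""
    return (f"{camera}: {subject} in {environment}, {lighting}, "
            f"{texture}, conveying {emotion}.")

prompt = build_shot_prompt(
    camera="Handheld shoulder-cam drifts with subtle sway",
    subject="woman in cream sweater picks up phone and looks at camera",
    environment="a bright modern kitchen with a marble counter",
    lighting="morning window light creating soft shadows",
    texture="visible coffee steam and slight hair flyaways",
    emotion="genuine excitement and warm energy",
)
print(prompt)
```

The structure forces you to fill in all six elements for every shot, which is most of what separates strong prompts from weak ones.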

THE HOOK LAYER (This Is Where Most AI UGC Fails)

Your hook needs to happen in the first second. Not after buildup.

Most people focus on the script. But UGC hooks work in layers: the audio hook (what the creator says), the visual reaction (their face), and an optional text overlay. The best-performing UGC uses 2-3 layers simultaneously.

Here's the hook formula that actually works:

Discovery Hooks (High Performers):

"Sooo apparently I've been [doing X] WRONG"
"Been [doing X] for [time] and NOW I find this?"
"I could LITERALLY KISS the person that showed me this"

Disbelief Hooks:

"Wait... did that actually just work?"
"I'm not even kidding right now"
"Okay I need to talk about this"

Pattern Interrupt Hooks:

"Stop scrolling. This is important."
"If you [pain point], you need to see this"
"I've been gatekeeping this for too long"

The pattern: Audio hook + visual reaction + optional text overlay.

Your AI character says the hook while their face shows genuine surprise or excitement. Text reinforces with different words. Three layers, one second.

THE AUDIO TRAP (Your Brain Knows Something's Wrong Before You Do)

Here's what nobody talks about: your audience has been trained since
birth to detect fake human behavior.

Not consciously. Unconsciously.

When audio is too clean, too perfect, too "produced", people don't
think "that's an AI ad." They just feel something's off. And they scroll.

I tested this with a client last week. I generated 10 testimonial
videos. Clean audio on 5, imperfect on the other 5.

The clean-audio videos: 0.6% CTR.
The imperfect-audio videos: 2.8% CTR.

People just didn't engage with the polished versions.

Here's what authentic UGC audio actually sounds like (and why your
brain trusts it):

  • Room reverb (your brain expects enclosed spaces to have echo)

  • Faint background ambience (real life has ambient sound)

  • Casual cadence with natural pauses (nobody speaks in perfect sentences)

  • Slight volume variation (phone mics aren't studio equipment)

  • "Um"s and self-corrections (thinking while talking is human)

The algorithm doesn't judge. But your audience's unconscious pattern
recognition? That's been training for 200,000 years.

THE "STEALTH CHECK" (Does It Smell Like An Ad?)

Here's the uncomfortable truth: most AI-generated UGC fails not because the video quality is bad, but because it smells like an ad.

Your audience can't articulate what's wrong. They just feel it.

"This feels like an ad."
"Something's off."
"I don't trust this."

Before generating, I run every concept through 4 mandatory checks. They're about matching what humans unconsciously expect from authentic content.

The 4 HUMAN DETECTOR CHECKS:

  1. CAMOUFLAGE (0.5s Test): Would a human scroller identify this as "someone's real content" or "a brand's ad" in the first half-second?

    What makes humans scroll:
    • Lighting too perfect (our brains know what phone cameras look like)
    • Fonts look corporate (we've seen 10,000 ads, we know the patterns)
    • Audio is stock music (triggers "commercial" association instantly)

    The fix: Make the first frame look like a person filmed it on their
    phone. Because that's what we're claiming it is.

  2. VIBE (3s Test): Did the viewer stop scrolling but leave within 3 seconds? That's the "something's off" signal. They gave it a chance. Their unconscious
    mind rejected it.

    What triggers rejection:
    • Story is boring (no reason to keep watching)
    • Setup takes too long (real UGC gets to the point)
    • Creator sounds scripted (our brains detect rehearsed speech patterns)

    The fix: Lead with value (entertainment or education) before any
    product mention. Earn the attention.

  3. INTEGRATION (Product Test): When the product shows up, does the viewer swipe away? That's the moment humans detect "oh, this is a sales pitch."

    What makes humans bail:
    • Transition from content to product is clumsy
    • Tone shifts to "sales voice" (we've heard this voice 1,000 times)
    • The product appearance feels forced, not organic

    The fix: Make the product appearance inevitable. A natural detail in
    a genuine moment, not a scripted reveal.

  4. IMPERFECTION (Authenticity Audit): Does this feel like a real person's content or a marketing team's output? Humans know the difference, even if they can't explain it.

    What triggers the "fake" detector:
    • Too polished (real people don't have production teams)
    • Generic scripts ("This product is amazing" - nobody talks like this)
    • Perfect delivery (real humans pause, restart, say "um")

    The fix: Add deliberate imperfection. Slightly off framing. Natural
    pauses. Casual language ("Honestly?" "Okay so" "You guys").

If any test fails, humans scroll. Meta sees low engagement. The ad dies.

You're not gaming the system. You're creating content that passes the
test humans have been running unconsciously since childhood.

STEP 4: The Variant Multiplier

Now we’re going to flip the economics.

At $500 per real UGC video, 15 creative variants would run $7,500 in testing.

With this system: maybe $50 in API costs.

WHAT I'M ACTUALLY USING THIS FOR

Client UGC at Scale

We run paid social for DTC brands. UGC is always the bottleneck.

We're not replacing human UGC creators. We're filling the gap between "we need 10 creative variants" and "our budget is 2 videos."

Testing Before Investing

Here's the smart play: AI-generated UGC as creative testing before paying real creators.

Generate 10 AI testimonials with different hooks, angles, demographics. Run them at low spend. See what resonates.

The winner? Now brief a real creator to recreate that winning combo.

You're not guessing anymore. You're validating with data before writing the bigger check.

THE HONEST LIMITATIONS

Is this perfect? No. Here's what doesn't work yet:

Lip Sync Drift: Once you pass 10 seconds, the lip sync starts to drift. Ask the skill to generate in smaller chunks if you hit this.

Zero editing: You'll still want to add captions, music, and a cut or two. This isn't "press button, receive finished ad."

Generation artifacts: Sometimes you get weird stuff, object morphing (phone turns into product), extra hands appearing, face drift mid-video. It's stochastic failure. Just reroll. Expect around 4 regenerations per final video.
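If you're scripting generations, the reroll loop is worth automating. A sketch with a hypothetical review hook, shown here with a fake generator so it runs standalone; in practice `generate` would wrap your fal.ai call and `passes_review` would be a human or automated QA check:

```python
def generate_with_reroll(generate, passes_review, max_attempts=6):
    """Regenerate until a take passes review -- the 'just reroll' workflow.
    Budget roughly 4 attempts per final video, with a hard cap."""
    for attempt in range(1, max_attempts + 1):
        video = generate()
        if passes_review(video):
            return video, attempt
    raise RuntimeError(f"no usable take in {max_attempts} attempts")

# Fake generator: two artifact-ridden takes, then a clean one.
takes = iter(["morphed hands", "face drift", "clean take"])
video, attempts = generate_with_reroll(
    generate=lambda: next(takes),
    passes_review=lambda v: v == "clean take",
)
print(video, attempts)  # clean take 3
```

The hard cap matters: a stochastic model occasionally gets stuck on a prompt, and you want to rework the prompt rather than burn credits indefinitely.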

Scene transitions: If your shots are too similar (sitting → still sitting → still sitting), transitions feel awkward. Build visual variety into your storyboard. Position changes, angle changes, framing changes.

But for the core job, consistent character + natural testimonial + multiple variants, it's shockingly good.

HERE’S THE MATH

                          | Traditional UGC    | AI UGC System
Cost per video            | $300-800           | ~$5-10
Time to first video       | 3-5 days           | 5 minutes
Variants from one concept | 1-2                | 10-15
Character consistency     | Depends on creator | Perfect
Script changes            | Pay again          | Free re-run
A/B testing hooks         | Budget killer      | Default workflow

This definitely DOES NOT replace your best-performing human creators. What it replaces is the $15,000/month you're spending on "good enough" volume.

THE BIGGER PICTURE

Six months ago, AI video was a novelty: interesting demos, unusable for real work.

The gap between "cool tech demo" and "wait, this might actually be useful for my business" is closing fast.

Now that we have character persistence, product consistency, and realistic motion, the calculus changes. The power here isn't creating one-off videos. Think of this running across dozens of stealth accounts, surfacing the best hooks, angles, and strategies while getting eyeballs, then handing your human creators the best of the best. That's a content strategy.

The brands that figure this out first will have 10x the creative testing velocity at 1/10th the cost.

The UGC arbitrage window is open. It won't stay open forever.

Get the Skill (Free)

I'm giving away the entire UGC Content Factory skill. The same system I use above, automated.

You get:

  • The complete Claude Code skill (drop in and run /ugc-content-factory)

  • Full director-first pipeline: Creative Director → Cinematographer → Screenwriter → Engineer

  • Character library with archetypes and saved character templates

  • Scene settings reference

  • Kling 3.0 prompt templates and API integration

  • Just add your fal.ai API key and go

Takes about 10 minutes to set up. Then you're generating.

Want Us to Build Your UGC Army For You?

If you'd rather skip the learning curve and just get the content, we're opening a few UGC strategy sessions this month.

We'll build your first character roster, generate test videos for your product live, and map out a variant testing plan so you know exactly which hooks and creators to scale.

Reply and tell me: what product would you test first with this system? I read every response.

Let's build,

Matt

P.S. The irony isn't lost on me that I spent years building relationships with great UGC creators... and now I'm teaching you to replace some of that work with AI. The difference? The best human creators are still irreplaceable. What's replaceable is the $400 "good enough" middle tier. AI just became the new floor.

If this newsletter lit a fire under you, forward it to one person who would benefit.

🎁 Get rewarded for sharing! My team grew NEW accounts to over 50 million views in just a few months. We made an AI viral hook generator so you can follow the same hook frameworks that we do.

Invite your friends to join, and when they sign up, you’ll unlock our AI Viral Hook Generator—the ultimate tool for creating scroll-stopping content.

{{rp_personalized_text}}

Copy and paste this link: {{rp_refer_url}}
