How to Make a 90-Second AI Short Drama in 1 Hour: Complete Workflow
A complete step-by-step workflow for making a 90-second AI short drama in under an hour, with consistent characters across 30+ shots.
Most “AI short film” tutorials online produce 15-30 second clips with one character and call it done. That’s not a drama — that’s a moodboard.
A real short drama needs:
- A complete narrative arc (setup → conflict → resolution)
- Multiple shots from different angles
- A character who looks like the same person throughout
- Proper pacing
- Production-quality finish (no stray watermarks, leftover captions, or AI artifacts)
This guide walks through how to do all of that in under an hour, using current 2026 tooling.
We’ll use a real example: a 90-second short drama called 《孟婆嫌我烦》 (“Lady Mengpo is annoyed with me”), which a creator made on Juying in 60 minutes. The full piece has 30+ shots, the lead character appears identically throughout, and it went viral on Asian short-form platforms.
The workflow below is the exact one used.
Before you start: what you need
Tools:
- An AI video platform that supports character consistency across shots. We’ll use Juying for this guide; the principles transfer.
- An LLM for script generation. Claude or GPT-4 work well. Many AI video platforms include this step.
- One reference image of your main character (real photo, AI-generated portrait, or sketch).
Time budget: 60 minutes total.
Skill level: Beginner. No prior AI video experience required.
Step 1: The story idea (1 minute)
Start with one sentence. Just one.
For Mengpo, the seed sentence was:
“Lady Mengpo, the goddess who serves the soup of forgetfulness in the Chinese underworld, is annoyed with a soul who keeps chattering.”
That’s it. Don’t over-plan at this stage. The structure comes in step 2.
The constraint: pick a story that needs no more than 2-3 distinct characters and fits in 60-90 seconds. Most viral short dramas have one or two leads, a clear conflict, and a quick resolution.
If you’re stuck, three story patterns that work well for AI shorts:
- The reaction beat: something happens, character reacts strongly, twist resolution. (Mengpo follows this.)
- The misunderstanding: A thinks B is doing X, B is actually doing Y, reveal.
- The escalation: small thing keeps happening, gets worse, climaxes.
Step 2: Generate the script (5 minutes)
Feed your one-sentence idea to an LLM with this prompt:
Write a 90-second short drama script based on this idea:
[your one-sentence idea]
Requirements:
- 8-12 scenes, each scene 6-10 seconds
- Specify camera framing for each scene (close-up, medium, wide, etc.)
- Specify lighting and mood
- Include 2-3 lines of dialogue or voiceover where appropriate
- Build a clear arc: setup, conflict, resolution
- End with a memorable beat
The output is a structured script. Review it. Adjust pacing, swap dull scenes for stronger beats. Don’t be precious: short drama scripts get rewritten constantly.
For Mengpo, the LLM produced 11 scenes. The creator kept 9, dropped 2 that didn’t pull weight.
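If you plan to reuse the prompt across projects, it can help to keep it as a small template. The sketch below is one way to do that; the function name `build_script_prompt` and its parameters are illustrative, not part of any platform's API. It also checks that the scene count and scene length can actually add up to the target runtime.

```python
def build_script_prompt(idea: str, target_seconds: int = 90,
                        min_scenes: int = 8, max_scenes: int = 12,
                        min_scene_s: int = 6, max_scene_s: int = 10) -> str:
    """Fill the script prompt from this guide with a one-sentence idea."""
    idea = idea.strip()
    if not idea:
        raise ValueError("idea must be a non-empty sentence")
    # The shortest and longest possible scripts must bracket the target length.
    shortest = min_scenes * min_scene_s
    longest = max_scenes * max_scene_s
    if not shortest <= target_seconds <= longest:
        raise ValueError("scene constraints cannot add up to the target length")
    return "\n".join([
        f"Write a {target_seconds}-second short drama script based on this idea:",
        idea,
        "Requirements:",
        f"- {min_scenes}-{max_scenes} scenes, each scene {min_scene_s}-{max_scene_s} seconds",
        "- Specify camera framing for each scene (close-up, medium, wide, etc.)",
        "- Specify lighting and mood",
        "- Include 2-3 lines of dialogue or voiceover where appropriate",
        "- Build a clear arc: setup, conflict, resolution",
        "- End with a memorable beat",
    ])


if __name__ == "__main__":
    print(build_script_prompt(
        "Lady Mengpo is annoyed with a soul who keeps chattering."
    ))
```

With the defaults, 8-12 scenes of 6-10 seconds each span 48 to 120 seconds, so a 90-second target is reachable.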
Step 3: Lock the main character (5 minutes)
This is the step most workflows skip and pay for later.
Upload one good reference photo of your main character to your AI video platform. The reference should be:
- High resolution (1024×1024 minimum)
- Front-facing or 3/4 view
- Even lighting (no heavy shadows on the face)
- Single character clearly framed (no other faces in the image)
The platform processes the reference and creates a character asset — typically takes 30-90 seconds. Once the asset is created, every future generation that references this character will use the locked identity.
Why this matters: without character lock, by shot 6 you’ll be looking at a different person. With it, shot 30 still looks like the same character as shot 1.
If your platform doesn’t support persistent character assets, this is where multi-shot AI dramas fail.
For Mengpo, the reference was a single AI-generated portrait of a stern but kind older woman in red robes. Uploading, processing, and locking took five minutes in total.
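The resolution requirement is easy to check before you upload. Here is a minimal sketch that reads the width and height straight from a PNG header using only the standard library. It assumes your reference is a PNG file; the helper names are made up for this example, and other formats would need a different reader.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> tuple[int, int]:
    """Return (width, height) from the IHDR chunk at the start of a PNG file."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are two big-endian 32-bit integers after the chunk type.
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def meets_min_resolution(data: bytes, min_side: int = 1024) -> bool:
    """True if both sides are at least min_side pixels (1024 in this guide)."""
    width, height = png_size(data)
    return width >= min_side and height >= min_side
```

Typical use would be `meets_min_resolution(open("reference.png", "rb").read())`, where `reference.png` is a placeholder path. This only checks size; framing and lighting still need a human eye.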
Step 4: Auto-generate the storyboard (15 minutes)
Modern AI video platforms include a storyboard planner. Feed it your script + your locked character; it produces a shot-by-shot storyboard with:
- Shot framing (close-up, medium shot, wide)
- Camera motion (static, push-in, pan, dolly)
- Lighting setup
- Character pose / expression
- Time code (when this shot starts and ends)
If your platform doesn’t auto-generate storyboards, you can do this manually by writing prompts for each shot. Plan for 15-20 minutes if doing it manually.
A well-planned storyboard prevents the “every shot looks the same” problem that beginners hit. Vary your framings: alternate close-ups with mediums and wides; use a dolly or push-in to add motion; don’t shoot every scene at eye level.
For Mengpo, the storyboard had 30+ shots across the 9 scenes — including reaction close-ups of the soul, hand details on the soup, wide shots of the underworld setting, and subjective POV through the steam.
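If you build the storyboard by hand, a small data structure makes the "every shot looks the same" problem easy to spot. The sketch below mirrors the storyboard fields listed above; the `Shot` class and the `max_run` threshold of 2 are assumptions for illustration, not any platform's format.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    framing: str         # e.g. "close-up", "medium", "wide"
    camera_motion: str   # e.g. "static", "push-in", "pan", "dolly"
    lighting: str
    expression: str
    start_s: float       # time code: when the shot starts
    end_s: float         # time code: when the shot ends


def repeated_framing_runs(shots: list[Shot], max_run: int = 2) -> list[int]:
    """Return indices of shots that extend a same-framing streak past max_run."""
    flagged = []
    run = 1
    for i in range(1, len(shots)):
        # Extend the streak if framing repeats, otherwise start a new one.
        run = run + 1 if shots[i].framing == shots[i - 1].framing else 1
        if run > max_run:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    framings = ["medium", "medium", "medium", "close-up", "wide", "wide"]
    board = [Shot(f, "static", "low-key", "neutral", i * 3.0, (i + 1) * 3.0)
             for i, f in enumerate(framings)]
    print(repeated_framing_runs(board))  # the third medium in a row is flagged
```

Any flagged index is a candidate for a different framing or a camera move.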
Step 5: Generate the shots (30 minutes)
This is the longest step but mostly idle time — your platform generates shots in parallel.
Click generate. Walk away. Come back in 30 minutes.
What’s happening behind the scenes:
- 30+ shots queue in parallel (if your platform supports it; serial generation will take much longer)
- Each shot uses your locked character embedding
- Auto-generated negative prompts prevent common drift modes
- Post-hoc consistency checks regenerate any shot that drifts too far
If your platform lacks parallel generation or dedicated no-queue capacity, this step can take hours instead of minutes. That’s the difference between a 60-minute workflow and a one-day workflow.
For Mengpo, this step took 28 minutes — 30 shots, all in parallel, all consistent.
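To make the "parallel queue plus consistency check" idea concrete, here is a rough sketch of the control flow. It is not how Juying or any other platform is implemented. The `generate` callable is a hypothetical stand-in for a platform call that returns a drift score between 0 and 1, and the `max_drift` and `max_attempts` values are assumed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def generate_all(shot_prompts: list[str],
                 generate: Callable[[str, int], float],
                 max_drift: float = 0.3,
                 max_attempts: int = 3,
                 workers: int = 8) -> list[tuple[int, bool]]:
    """Generate every shot in parallel and retry shots that drift too far.

    Returns one (attempts_used, accepted) pair per shot, in storyboard order.
    """
    def run_one(prompt: str) -> tuple[int, bool]:
        for attempt in range(1, max_attempts + 1):
            # Lower drift means closer to the locked character.
            if generate(prompt, attempt) <= max_drift:
                return attempt, True
        # Out of retries: leave the shot for manual review during assembly.
        return max_attempts, False

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, shot_prompts))


if __name__ == "__main__":
    # Fake generator: shots tagged "drifty" fail on their first attempt only.
    def fake_generate(prompt: str, attempt: int) -> float:
        return 0.9 if "drifty" in prompt and attempt == 1 else 0.1

    print(generate_all(["shot 1", "drifty shot 2", "shot 3"], fake_generate))
```

With serial generation the same loop runs one shot at a time, which is why the step stretches from minutes to hours.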
Step 6: Assemble (3 minutes)
Most modern AI video platforms produce a rough assembly automatically — they string the shots together in storyboard order.
Review the assembly. Look for:
- Pacing problems (a shot that lingers too long, a cut that’s too quick)
- Continuity errors (lighting jumps, character pose discontinuity)
- Any shot where character drift slipped through
For real continuity issues, regenerate that single shot. For pacing, trim or extend in the platform’s editor.
Mengpo needed two shots regenerated and a 1-second trim on the closing shot. Three minutes total.
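Pacing problems can be roughed out numerically before you watch the cut. With 30 shots in 90 seconds, the average shot runs about 3 seconds. The sketch below flags shots well outside that range; the 1-second and 5-second thresholds are assumptions to tune by eye, not rules.

```python
def pacing_flags(durations_s: list[float],
                 min_s: float = 1.0,
                 max_s: float = 5.0) -> dict[int, str]:
    """Map shot index to a pacing note for shots outside [min_s, max_s]."""
    flags = {}
    for index, duration in enumerate(durations_s):
        if duration < min_s:
            flags[index] = "too quick"
        elif duration > max_s:
            flags[index] = "lingers"
    return flags


if __name__ == "__main__":
    print(pacing_flags([3.0, 0.5, 6.5, 2.0]))
```

A flag is only a prompt to look at the shot. A long hold on a reaction can be exactly right.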
Step 7: Caption removal + upscale (5 minutes)
Most generated AI video has subtle artifacts: small text glitches, watermark-like elements, occasional anomalies. Smart-removal tools clean these without degrading the underlying frame.
Then upscale. 4K outputs look more professional than 1080p, especially for short-form content played on large modern screens.
Both of these are now built into integrated platforms. If using disconnected tools, expect 15-30 minutes here instead of 5.
Step 8: Final polish (1 minute)
Add:
- Title card (1-2 seconds at start)
- End card with credit / handle (1 second at end)
- Background music if appropriate (most platforms include a music selection)
- Subtitle track for platform compatibility
Mengpo’s final touch: a single Chinese-character title card and a watermark crediting both the creator and Juying.
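If your platform does not export a subtitle track, a SubRip (.srt) file is simple to write by hand. This is a minimal sketch assuming you already have cue timings in seconds. The function names are illustrative and the cue text is placeholder dialogue, not lines from the Mengpo script.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in .srt files."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: list[tuple[float, float, str]]) -> str:
    """Build .srt text from (start_s, end_s, text) cues, numbered from 1."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([(1.0, 3.5, "Placeholder line one."),
                  (4.0, 6.0, "Placeholder line two.")]))
```

Save the result with a `.srt` extension and upload it alongside the video wherever the platform accepts a separate subtitle file.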
Total time check
| Step | Time |
|---|---|
| 1. Story idea | 1 min |
| 2. Script generation | 5 min |
| 3. Character lock | 5 min |
| 4. Storyboard | 15 min |
| 5. Generate shots | 30 min (mostly idle) |
| 6. Assembly | 3 min |
| 7. Caption removal + upscale | 5 min |
| 8. Final polish | 1 min |
| Total | ~65 min |
The total is wall-clock time. The 30 minutes in step 5 are mostly idle, so if you start generation and walk away, total active time is ~35 minutes.
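The arithmetic behind those two numbers, using the step times from the table:

```python
# Minutes per step, copied from the table above.
STEP_MINUTES = {
    "Story idea": 1,
    "Script generation": 5,
    "Character lock": 5,
    "Storyboard": 15,
    "Generate shots": 30,
    "Assembly": 3,
    "Caption removal + upscale": 5,
    "Final polish": 1,
}

total_minutes = sum(STEP_MINUTES.values())
# Treat the whole generation step as idle, as the guide does.
active_minutes = total_minutes - STEP_MINUTES["Generate shots"]

if __name__ == "__main__":
    print(f"total: {total_minutes} min, active: {active_minutes} min")
```

Swap in your own timings after a project or two to see where your hour actually goes.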
Tips for higher quality
Pick the right reference image. Bad reference = bad character lock. A blurry or oddly-lit reference will haunt every shot. Spend 5 minutes finding the right one.
Vary framing aggressively. Beginners shoot everything at eye level, medium shot. Pros use close-ups, low angles, high angles, dollies. The variety makes it feel cinematic.
Use silence. A 90-second drama doesn’t need 90 seconds of dialogue. Some of the best short dramas are 50% silent reactions.
Watch real short films before making yours. TikTok and YouTube Shorts have surprisingly cinematic shorts on the first page of any “short film” search. Steal pacing patterns.
Don’t fight the model. If your script asks for something the AI struggles with, simplify. Work with what the model does well.
Common questions
Can this workflow handle multiple characters?
Yes. Lock 2-3 characters at the start of step 3, then reference them by name in prompts. Limitation: if two characters share screen time and have similar features (same gender, age, ethnicity), expect occasional identity bleed in shared frames — about 10% of multi-character scenes need a manual cleanup pass.
Does this work for longer videos (5+ minutes)?
In theory, yes, but cost grows linearly with length, and narrative coherence beyond ~3 minutes is genuinely hard right now. We’ve seen creators stitch three 90-second arcs into 5-minute episodes. A single 5-minute piece made end-to-end is doable but takes more work than a 90-second one.
What if I can’t draw or photograph a reference image?
Generate one with an image AI (Midjourney, DALL-E, Stable Diffusion). Pick the result that best matches your character vision. Use that as your reference for the video step.
My platform doesn’t have character lock. Can I still do this?
You can, but expect to spend 3-5x the time on consistency cleanup. Workarounds:
- Use the same prompt verbatim for the character description in every shot
- Always include a reference image
- Generate 3 versions of each shot, pick the most consistent
- Plan to regenerate ~30% of shots when drift is too obvious
For narrative work, switching to a tool with native character consistency is usually worth it.
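The "generate 3 versions, pick the most consistent" workaround can be semi-automated if you have a way to turn a face into a numeric embedding. The sketch below assumes you do: the embeddings are plain lists of floats from whatever face-embedding tool you use (not specified here), and the 0.8 similarity threshold is an arbitrary starting point.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / norm


def pick_most_consistent(reference: list[float],
                         candidates: list[list[float]],
                         min_similarity: float = 0.8) -> tuple[int, bool]:
    """Return (index of the closest candidate, whether to regenerate anyway)."""
    scores = [cosine_similarity(reference, c) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    # Even the best take may have drifted too far from the reference.
    return best, scores[best] < min_similarity


if __name__ == "__main__":
    ref = [1.0, 0.0]
    takes = [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]]
    print(pick_most_consistent(ref, takes))
```

Treat the score as a filter, not a verdict. A take can match the face and still fail on costume or lighting.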
How much does this cost in credits / dollars?
Varies wildly by platform. On Juying, a 90-second project with 30 shots typically uses 200-400 credits, which is well within the free tier (500 credits/month) or trivial on Pro ($49/mo with 3000 credits).
On per-clip platforms, expect $5-30 per project depending on length and quality settings.
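To put the Juying figures quoted above in per-project terms, here is the arithmetic as a short sketch. It uses only the numbers from this section (500 free credits, $49 for 3000 Pro credits, 200-400 credits per project); check current pricing before relying on it.

```python
FREE_CREDITS = 500
PRO_PRICE_USD = 49.0
PRO_CREDITS = 3000


def projects_per_month(monthly_credits: int, credits_per_project: int) -> int:
    """Whole projects that fit in a monthly credit allowance."""
    return monthly_credits // credits_per_project


def pro_cost_per_project(credits_per_project: int) -> float:
    """Dollar cost of one project at the Pro per-credit rate, rounded to cents."""
    return round(PRO_PRICE_USD / PRO_CREDITS * credits_per_project, 2)


if __name__ == "__main__":
    for credits in (200, 400):
        print(credits,
              projects_per_month(FREE_CREDITS, credits),
              projects_per_month(PRO_CREDITS, credits),
              pro_cost_per_project(credits))
```

At these numbers the free tier covers one or two 90-second projects a month, and Pro covers 7 to 15 at roughly $3.27 to $6.53 each.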
The thing nobody tells you
The 60-minute workflow is real, but most beginners’ first attempt takes 3-4 hours. The slowdown isn’t the AI; it’s:
- Spending too long on the script (just write something, iterate later)
- Picking a bad reference image (spend the 5 minutes to find a good one)
- Skipping the storyboard step (every shot becomes “wide medium shot”; the result feels flat)
- Regenerating everything (regenerate the worst 10%, leave the rest)
After 2-3 projects, the workflow compresses to under an hour. After 5 projects, you can do it in 40 minutes.
Try the workflow
Juying supports this entire workflow end-to-end with a free tier. If you make something with this workflow, we’d love to see it.