heygen-com · vanceingalls · Apr 8, 2026
diff --git a/packages/cli/src/commands/validate.ts b/packages/cli/src/commands/validate.ts
@@ -44,7 +44,7 @@ async function validateInBrowser(
     const runtimeSource = readFileSync(runtimePath, "utf-8");
     html = html.replace(
       /<script[^>]*data-hyperframes-preview-runtime[^>]*src="[^"]*"[^>]*><\/script>/,
-      `<script data-hyperframes-preview-runtime="1">${runtimeSource}</script>`,
+      () => `<script data-hyperframes-preview-runtime="1">${runtimeSource}</script>`,
     );
   }
 

diff --git a/skills/hyperframes/SKILL.md b/skills/hyperframes/SKILL.md
@@ -102,6 +102,14 @@ Video must be `muted playsinline`. Audio is always a separate `<audio>` element:
 
 **Synchronous timeline construction:** Never build timelines inside `async`/`await`, `setTimeout`, or Promises. The capture engine reads `window.__timelines` synchronously after page load. If you need fonts loaded first, use a synchronous `document.fonts.load()` call or rely on `font-display: block` — the engine waits for the page `load` event.
 
+**Centering text:** Use `width: 100%; left: 0; text-align: center` — NOT `left: 50%; transform: translateX(-50%)`. The second approach breaks when GSAP animates `x`, because GSAP's `x` compounds with the existing translateX.
+
+**GSAP `x` and `y` are RELATIVE offsets, not screen coordinates.** An element at CSS `left: 400px` with GSAP `x: 200` renders at 600px. Never use GSAP `x` values above 300 or below -300 for content slides — those are almost certainly absolute coordinates confused for relative offsets.
+
+**Preventing overlap:** Calculate the actual bottom edge of every element before placing the next one. Bottom edge = `top` + (`fontSize` × `lineHeight` × `numberOfLines`). A 200px font with `line-height: 1` and a `<br>` (2 lines) at `top: 350px` has its bottom at 350 + 200×1×2 = 750px — the next element must start at 770px+, not 610px. Always do this math. After placing all elements in a scene, check for collisions: list every element visible at the same timestamp with its top and bottom edge, and verify no ranges overlap.
+
+**Safe zones:** Keep text inset 40px from all edges. The bottom 80px is too close to playback controls — keep baselines above `data-height - 80`.
+
 **Never do:**
 
 1. Forget `window.__timelines` registration
@@ -160,6 +168,7 @@ Video must be `muted playsinline`. Audio is always a separate `<audio>` element:
 - **[references/captions.md](references/captions.md)** — Captions, subtitles, lyrics, karaoke synced to audio. Tone-adaptive style detection, per-word styling, text overflow prevention, caption exit guarantees, word grouping. Read when adding any text synced to audio timing.
 - **[references/tts.md](references/tts.md)** — Text-to-speech with Kokoro-82M. Voice selection, speed tuning, TTS+captions workflow. Read when generating narration or voiceover.
 - **[references/audio-reactive.md](references/audio-reactive.md)** — Audio-reactive animation: map frequency bands and amplitude to GSAP properties. Read when visuals should respond to music, voice, or sound.
+- **[references/audio-choreography.md](references/audio-choreography.md)** — Beat-mapped choreography: analyze audio energy/structure, time animation events to beats, match intensity to energy phases. Read when visuals should be synced to music structure without per-frame pulsing — reveals land on beats, transitions match energy changes.
 - **[references/marker-highlight.md](references/marker-highlight.md)** — Animated text highlighting via canvas overlays: marker pen, circle, burst, scribble, sketchout. Read when adding visual emphasis to text.
 - **[house-style.md](house-style.md)** — Default motion, sizing, and color palettes when no style is specified.
 - **[patterns.md](patterns.md)** — PiP, title cards, slide show patterns.

diff --git a/skills/hyperframes/references/audio-choreography.md b/skills/hyperframes/references/audio-choreography.md
@@ -0,0 +1,234 @@
+# Beat Choreography
+
+Sync animation to music structure. The audio analysis gives you timestamps and energy data. This reference teaches you how to think about that data — not as a list of triggers, but as a story with setup, tension, and payoff.
+
+## Every Line in the Analysis Must Produce a Visual Reaction
+
+This is non-negotiable. The audio analysis is a script. Every section, every line must have a corresponding visual response in the composition:
+
+- **STRUCTURE**: Every phase boundary is a visual gear change — palette shift, scale change, density change. If structure says HIGH at 7s, the composition must look and feel measurably different at 7s than at 6s.
+- **KEY MOMENTS**: Every surge gets a build sequence before it (compress → hold → release). Every drop gets a visual exhale. No key moment should pass without the viewer SEEING it happen.
+- **SILENCES**: Every silence is a breath — strip to near-black, hold, let the viewer absorb.
+- **BUILDS**: Every detected build must have visual tension rising in sync — gradient compressing, density accelerating, scale shrinking toward the peak. The build section tells you EXACTLY where to start the visual tension and where it peaks.
+- **ACCENT PATTERNS**: Every detected accent train or roll gets a matching rapid-fire visual — stagger cascade, per-character reveal, or pulse sequence at the same rate.
+- **BEATS**: Every beat is an opportunity. You don't have to use every beat for a content reveal, but the composition should acknowledge beats through SOMETHING — a content entrance, an exit, a pulse on a decorative element, a scale bump, a flash. Beats that pass with zero visual response are wasted sync opportunities.
+
+Read the analysis top to bottom before writing any code. Plan what happens at each structural moment FIRST, then fill in the beats between them.
+
+## Analyze the Audio
+
+Run this to extract the structure. Replace `audio.mp3` with your file.
+
+```bash
+python3 << 'EOF'
+import subprocess, math, array
+
+path = 'audio.mp3'
+sr = 22050
+dur = float(subprocess.run(['ffprobe','-v','quiet','-show_entries','format=duration','-of','csv=p=0',path], capture_output=True, text=True).stdout.strip())
+
+# 7-band analysis
+band_defs = [("sub",20,60),("bass",60,250),("low_mid",250,500),("mid",500,2000),("upper_mid",2000,4000),("high",4000,8000),("air",8000,16000)]
+quarter = int(sr * 0.25)
+band_curves = {}
+for name,lo,hi in band_defs:
+    filt = f"lowpass=f={hi}" if name=="sub" else (f"highpass=f={lo}" if name=="air" else f"highpass=f={lo},lowpass=f={hi}")
+    p2 = subprocess.run(['ffmpeg','-i',path,'-af',filt,'-ac','1','-ar',str(sr),'-f','s16le','-'], capture_output=True)
+    bp = array.array('h', p2.stdout)
+    vals = [math.sqrt(sum(x*x for x in bp[s:s+quarter])/max(len(bp[s:s+quarter]),1)) if len(bp[s:s+quarter])>50 else 0 for s in range(0,len(bp)-quarter,quarter)]
+    bmx = max(vals) if max(vals)>0 else 1
+    band_curves[name] = [v/bmx for v in vals]
+
+# Broadband
+p = subprocess.run(['ffmpeg','-i',path,'-ac','1','-ar',str(sr),'-f','s16le','-'], capture_output=True)
+pcm = array.array('h', p.stdout)
+broad = [math.sqrt(sum(x*x for x in pcm[s:s+quarter])/max(len(pcm[s:s+quarter]),1)) if len(pcm[s:s+quarter])>50 else 0 for s in range(0,len(pcm)-quarter,quarter)]
+bmx = max(broad) if broad else 1
+broad_n = [v/bmx for v in broad]
+
+# 1s energy for structure
+rms_1s = [math.sqrt(sum(x*x for x in pcm[s*sr:min((s+1)*sr,len(pcm))])/max(len(pcm[s*sr:min((s+1)*sr,len(pcm))]),1)) if len(pcm[s*sr:min((s+1)*sr,len(pcm))])>100 else 0 for s in range(int(dur))]
+mx1 = max(rms_1s) if rms_1s else 1
+norms = [r/mx1 for r in rms_1s]
+
+# Phases
+phases = []; cur,cs = None,0
+for i,n in enumerate(norms):
+    l = "VOID" if n<0.2 else ("LOW" if n<0.4 else ("MEDIUM" if n<0.65 else "HIGH"))
+    if l!=cur:
+        if cur: phases.append((cs,i,cur))
+        cur,cs = l,i
+if cur: phases.append((cs,len(norms),cur))
+
+# Character at timestamp
+def char_at(t):
+    idx = min(int(t/0.25), len(broad_n)-1)
+    v = {n: (band_curves[n][idx] if idx<len(band_curves[n]) else 0) for n,_,_ in band_defs}
+    parts = []
+    if v["sub"]>0.7: parts.append("sub")
+    if v["bass"]>0.6: parts.append("bass")
+    if v["low_mid"]>0.5: parts.append("warm")
+    if v["mid"]>0.5: parts.append("vocal/melody")
+    if v["upper_mid"]>0.5: parts.append("presence")
+    if v["high"]>0.5: parts.append("bright")
+    if v["air"]>0.4: parts.append("airy")
+    if not parts: parts.append("thin" if any(x>0.2 for x in v.values()) else "silence")
+    has_bot = v["sub"]>0.5 or v["bass"]>0.5
+    has_top = v["high"]>0.5 or v["air"]>0.4
+    has_mid = v["mid"]>0.4 or v["upper_mid"]>0.4
+    feel = "full" if has_bot and has_top and has_mid else ("heavy" if has_bot and not has_top else ("bright" if has_top and not has_bot else ("intimate" if has_mid else ("scooped" if has_bot and has_top else "sparse"))))
+    return ", ".join(parts), feel
+
+# Onsets — sensitive then deduplicate
+blk = 512
+ef = [math.sqrt(sum(x*x for x in pcm[i:i+blk])/blk) for i in range(0,len(pcm)-blk,blk)]
+mu=sum(ef)/len(ef); sig=math.sqrt(sum((x-mu)**2 for x in ef)/len(ef))
+raw_ons = []
+for i,e in enumerate(ef):
+    if e > mu+1.0*sig and (not raw_ons or i-raw_ons[-1]>3): raw_ons.append(i)
+raw_beats = [(i*blk/sr, ef[i]) for i in raw_ons]
+beats = []; i = 0
+while i < len(raw_beats):
+    t,a = raw_beats[i]
+    if i+1<len(raw_beats) and raw_beats[i+1][0]-t < 0.15:
+        beats.append(raw_beats[i] if a>=raw_beats[i+1][1] else raw_beats[i+1]); i += 2
+    else: beats.append(raw_beats[i]); i += 1
+moa = max(a for _,a in beats) if beats else 1
+iois = [beats[i+1][0]-beats[i][0] for i in range(len(beats)-1)] if len(beats)>4 else []
+bpm = round(60/sorted(iois)[len(iois)//2],1) if iois else None
+
+# Silences
+sp=subprocess.run(['ffmpeg','-i',path,'-af','silencedetect=n=-35dB:d=0.5','-f','null','-'],capture_output=True,text=True)
+sils=[]
+for line in sp.stderr.split('\n'):
+    if 'silence_start' in line:
+        try: sils.append(('s',float(line.split('silence_start: ')[1].strip())))
+        except: pass
+    elif 'silence_end' in line:
+        try: sils.append(('e',float(line.split('silence_end: ')[1].split('|')[0].strip())))
+        except: pass
+
+# Output
+print(f"AUDIO ANALYSIS — {dur:.1f}s\n{'='*60}\n")
+print(f"RHYTHM — {bpm} BPM, {len(beats)} beats")
+print(f"\nSTRUCTURE")
+for s,e,l in phases:
+    bd, feel = char_at((s+e)/2)
+    print(f"  {s:3d}-{e:3d}s  {l:6s}  {feel:8s}  ({bd})")
+print(f"\nKEY MOMENTS")
+mom=[(i,norms[i]-norms[i-1]) for i in range(1,len(norms)) if abs(norms[i]-norms[i-1])>0.12]
+for s,d in sorted(mom,key=lambda x:abs(x[1]),reverse=True)[:8]:
+    print(f"  {s:3d}s  {'DROP' if d<0 else 'SURGE'}  Δ{d:+.2f}")
+if sils:
+    print(f"\nSILENCES")
+    i=0
+    while i<len(sils):
+        if sils[i][0]=='s':
+            s=sils[i][1]; e=sils[i+1][1] if i+1<len(sils) and sils[i+1][0]=='e' else s+1
+            print(f"  {s:.1f}s - {e:.1f}s"); i+=2
+        else: i+=1
+# Build detection — gradual rises ending in sharp drops
+print(f"\nBUILDS (suspense → release)")
+half = int(sr * 0.5)
+all_bands = {"broadband": broad_n}
+for name,lo,hi in band_defs:
+    filt = f"lowpass=f={hi}" if name=="sub" else (f"highpass=f={lo}" if name=="air" else f"highpass=f={lo},lowpass=f={hi}")
+    p3 = subprocess.run(['ffmpeg','-i',path,'-af',filt,'-ac','1','-ar',str(sr),'-f','s16le','-'], capture_output=True)
+    bp3 = array.array('h', p3.stdout)
+    hv = [math.sqrt(sum(x*x for x in bp3[s:s+half])/max(len(bp3[s:s+half]),1)) if len(bp3[s:s+half])>50 else 0 for s in range(0,len(bp3)-half,half)]
+    hmx = max(hv) if max(hv)>0 else 1
+    all_bands[name] = [v/hmx for v in hv]
+
+builds = []
+for bname, curve in all_bands.items():
+    i = 0
+    while i < len(curve) - 3:
+        start = i
+        while i < len(curve)-1 and curve[i+1] >= curve[i]-0.05: i += 1
+        peak_idx = i; duration = (peak_idx-start)*0.5
+        if duration >= 1.5 and peak_idx < len(curve)-1:
+            drop = curve[peak_idx] - curve[min(peak_idx+2, len(curve)-1)]
+            rise = curve[peak_idx] - curve[start]
+            score = duration * rise * drop
+            if score > 0.2:
+                builds.append((start*0.5, peak_idx*0.5, duration, bname, rise, drop, curve[peak_idx], score))
+        i += 1
+
+# Deduplicate by peak time (within 2s), keep highest score
+builds.sort(key=lambda x: x[7], reverse=True)
+seen = set()
+for b in builds:
+    peak_key = round(b[1] / 2) * 2
+    if peak_key not in seen:
+        seen.add(peak_key)
+        print(f"  {b[0]:5.1f}s → {b[1]:5.1f}s  {b[2]:.1f}s build in {b[3]}, rises {b[4]:+.2f} → {b[6]:.2f}, drops {b[5]:.2f}")
+
+# Accent pattern detection (mid-to-high range rhythmic trains)
+print(f"\nACCENT PATTERNS")
+for filt_label, filt_str in [("upper_mid+high","highpass=f=2000,lowpass=f=8000"),("high+air","highpass=f=4000")]:
+    pa = subprocess.run(['ffmpeg','-i',path,'-af',filt_str,'-ac','1','-ar',str(sr),'-f','s16le','-'], capture_output=True)
+    ap = array.array('h', pa.stdout)
+    ablk = int(sr*0.03)
+    ae = [math.sqrt(sum(x*x for x in ap[s:s+ablk])/ablk) for s in range(0,len(ap)-ablk,ablk)]
+    amx = max(ae) if ae else 1
+    an = [e/amx for e in ae]
+    aw = 15
+    at = []
+    for ai in range(aw, len(an)-aw):
+        la = sum(an[max(0,ai-aw):ai+aw]) / min(2*aw, ai+aw-max(0,ai-aw))
+        if an[ai] > la+0.15 and an[ai] > 0.25:
+            tt = round(ai*0.03, 2)
+            if not at or tt-at[-1][0] > 0.06: at.append((tt, an[ai]))
+    # Find regular sequences
+    pats = []
+    for si in range(len(at)):
+        for ln in range(4, min(15, len(at)-si+1)):
+            grp = at[si:si+ln]; ts = [t for t,_ in grp]
+            gaps = [ts[i+1]-ts[i] for i in range(len(ts)-1)]
+            ag = sum(gaps)/len(gaps)
+            if ag < 0.01: continue
+            sg = math.sqrt(sum((g-ag)**2 for g in gaps)/len(gaps))
+            reg = 1.0-min(sg/ag, 1.0)
+            if reg > 0.6 and ln >= 4:
+                pats.append((ts[0], ts[-1], ln, 60/ag, reg, filt_label, ts))
+    pats.sort(key=lambda x: x[2], reverse=True)
+    seen = set()
+    for pt in pats[:5]:
+        tk = round(pt[0])
+        if tk in seen: continue
+        seen.add(tk)
+        ptype = "roll" if pt[3]>300 else ("rapid fill" if pt[3]>150 else ("accent train" if pt[3]>80 else "rhythmic hits"))
+        out_line = f"  {pt[0]:5.2f}s - {pt[1]:5.2f}s  {pt[2]} hits at {pt[3]:.0f}/min  ({ptype}) in {pt[5]}"
+        if pt[2] <= 8: out_line += f"  [{', '.join(f'{t:.2f}' for t in pt[6])}]"
+        print(out_line)
+
+print(f"\nBEATS ({len(beats)})")
+for t, amp in beats:
+    energy = broad_n[min(int(t/0.25), len(broad_n)-1)]
+    bd, feel = char_at(t)
+    print(f"  {t:6.2f}s  energy={energy:.2f}  hit={amp/moa:.2f}  {feel:8s}  {bd}")
+EOF
+```
+
+## Build Before Every Drop
+
+Every key moment in the analysis needs a setup phase. Before you plan what happens AT the surge, plan what happens in the 4-6 seconds LEADING TO it.
+
+Scale the build to the moment's importance:
+
+- First surge (always the most important): Full 4-6s build
+- Biggest surge (Δ > 0.4): Full 4-6s build
+- Medium surge (Δ 0.2-0.4): 2-3s compression
+- Minor surge (Δ 0.12-0.2): 1s tightening
+
+## Frequency Bands Shape the Feel
+
+Sub-bass entering doesn't just mean "louder."
+
+When sub-bass drops out at 22s but highs stay, everything should feel weightless and bright.
+
+When highs peak and bass is absent — maximum visual sharpness.
+
+When bass and sub-bass dominate with no highs — heavy, warm, submerged. The composition should feel like it has gravity.
+
+The band data tells you HOW a moment should feel, not just WHEN it happens.