
The Pipeline: Nine Modules, One Hard Constraint

Before any cloud infrastructure, before any API — there was a Python pipeline with one rule: every output pixel must be a direct copy of a source pixel.


The project starts from a single hard constraint that shapes every design decision that follows: every output pixel must be a direct copy of a source pixel. No upscaling. No padding. No compositing. If you can't produce a perfect crop, you fail the job.

The practical problem: social video is made in 16:9 (landscape) and consumed in 9:16 (portrait — TikTok, Reels, Shorts) and 1:1 (Instagram grid). Manual reframing is expensive and slow. Automated reframing almost universally upscales or pads when the crop would be too tight, silently trading spatial integrity for geometric convenience. This pipeline refuses that trade.

The nine modules

| # | Module | What it does |
|---|--------|--------------|
| 1 | ingestion.py | Probes source with ffprobe. Validates landscape orientation, computes even-floor crop spec. |
| 2 | attention.py | Per-frame attention vector. Weighted sum: face centroid (MediaPipe), pose, optical flow, YOLO objects, audio energy, Whisper speech. |
| 3 | scene_detection.py | PySceneDetect-based scene boundary detection. |
| 4 | effect_classifier.py | Classifies each scene: STATIC, BREATHE, PAN_FOLLOW, FAST_CUT, INTERVIEW. |
| 5 | motion_solver.py | Per-frame crop X + zoom level. Smooths attention, solves temporal coherence, clips zoom to [1.0, ceiling]. |
| 6 | edl_generator.py | Writes Edit Decision List before rendering. Makes pipeline resumable with --from-edl. |
| 7 | renderer.py | FFmpeg filter graph with per-frame sendcmd temp file. |
| 8 | verifier.py | Non-bypassable. Tests: playability, dimensions, pixel provenance, sharpness, duration parity, audio. |
| 9 | narrative_validator.py | Post-delivery validation of temporal coherence of the crop trajectory. |

The even-floor formula

The crop width formula is: int(math.floor(h * 9 / 16) // 2 * 2). The double integer truncation (floor, then round down to even) means crop_w can be up to 2 pixels narrower than naive rounding would produce. At h = 1080 the exact width is 607.5: floor gives 607, rounding down to even gives 606, while naive rounding gives 608. The verifier had to be updated to accept this. It initially rejected every 1080p output because its aspect-ratio check expected the exact ratio, and at 1080p the formula lands 1.5 pixels below it.
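To make the arithmetic concrete, here is a minimal sketch of the even-floor formula next to a tolerance-based aspect-ratio check. The function names and the 2-pixel tolerance are assumptions for illustration; the post does not show what the updated verifier actually accepts.

```python
import math


def even_floor_crop_width(source_height: int, ar_w: int = 9, ar_h: int = 16) -> int:
    """Largest even width that does not exceed source_height * ar_w / ar_h."""
    exact = source_height * ar_w / ar_h
    # floor first, then round down to the nearest even number
    return int(math.floor(exact) // 2 * 2)


def aspect_ratio_ok(crop_w: int, crop_h: int, ar_w: int = 9, ar_h: int = 16,
                    tol_px: float = 2.0) -> bool:
    """Accept a width within tol_px of the exact target (tolerance is assumed)."""
    return abs(crop_w - crop_h * ar_w / ar_h) <= tol_px


for h in (720, 1080, 2160):
    w = even_floor_crop_width(h)
    print(f"h={h} crop_w={w} exact={h * 9 / 16} ok={aspect_ratio_ok(w, h)}")

# even_floor_crop_width(1080) -> 606
```

An exact-ratio check would reject 606 × 1080, which is the failure described above. A pixel tolerance is one way to express "close enough" without loosening the even-width requirement.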

The FFmpeg sendcmd trick

The per-frame crop position is written to a temp file as sendcmd instructions rather than encoded in the filter graph as FFmpeg expressions. A 120-second clip at 30 fps has 3,600 frames. Encoding the trajectory as an expression takes approximately one nested if node per frame, and at ~3,600 nesting levels FFmpeg's recursive expression parser hits its stack limit and overflows.

The sendcmd approach writes one line per frame to a file and uses sendcmd=filename=... so the crop trajectory never passes through the expression evaluator. The expression overflow it replaced was the first and most fundamental correctness bug in the pipeline.
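A rough sketch of what generating such a file could look like. The helper names, timestamp precision, and the shape of the filter graph are assumptions, not the renderer's actual code. The sketch relies on the crop filter accepting an x command at runtime and on sendcmd reading its commands from a file via the f (filename) option.

```python
import tempfile
from pathlib import Path


def build_sendcmd_lines(crop_xs, fps, target="crop"):
    """One sendcmd interval per frame: '<time> <filter> x <value>;'."""
    lines = []
    for n, x in enumerate(crop_xs):
        t = n / fps  # presentation time of frame n, in seconds
        lines.append(f"{t:.6f} {target} x {int(x)};")
    return lines


def write_sendcmd_file(crop_xs, fps):
    """Write the per-frame commands to a temp file and return its path."""
    tmp = tempfile.NamedTemporaryFile("w", suffix=".cmd", delete=False)
    with tmp:
        tmp.write("\n".join(build_sendcmd_lines(crop_xs, fps)) + "\n")
    return Path(tmp.name)


def filter_graph(cmd_path, crop_w, crop_h):
    """Hypothetical video filter string: sendcmd feeds x updates to crop."""
    return f"sendcmd=f={cmd_path},crop={crop_w}:{crop_h}:0:0"


# 3,600 frames become 3,600 flat lines, with no nesting at all
path = write_sendcmd_file([2 * i for i in range(3600)], fps=30)
print(filter_graph(path.as_posix(), 606, 1080))
```

The file grows linearly with frame count, so a longer clip only means a longer file. Nothing in the filter graph itself gets deeper.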

The non-bypassable verifier

Module 8 runs after every render, checks a random sample of output frames against their source positions, and fails the job if the match rate falls below threshold. It cannot be disabled by configuration or flag. The only way to skip it is to delete it from the source code.

This is an architectural choice, not a developer preference. A pipeline that can be told to skip its integrity check has no integrity guarantee.
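For illustration, a sampled provenance check along the lines described above might look like the sketch below. Everything here is an assumption: frames are plain nested lists, the comparison is exact equality, and the function names, sample size, and threshold are hypothetical. A real check on encoded video would likely need a per-pixel tolerance to absorb codec loss.

```python
import random


def is_direct_copy(source, output, x, y):
    """True if `output` equals the window of `source` with top-left corner (x, y)."""
    out_h, out_w = len(output), len(output[0])
    if x < 0 or y < 0 or y + out_h > len(source) or x + out_w > len(source[0]):
        return False  # the crop window would leave the source frame
    return all(source[y + r][x:x + out_w] == output[r] for r in range(out_h))


def verify_provenance(frames, sample_size, threshold, seed=None):
    """frames: list of (source, output, x, y) tuples.

    Checks a random sample and passes only if the match rate reaches
    `threshold`. There is deliberately no parameter that skips the check.
    """
    if not frames:
        return False  # nothing to verify counts as a failure
    rng = random.Random(seed)
    sample = rng.sample(frames, min(sample_size, len(frames)))
    matches = sum(is_direct_copy(*frame) for frame in sample)
    return matches / len(sample) >= threshold


source = [[r * 10 + c for c in range(6)] for r in range(4)]
crop = [row[2:5] for row in source[1:3]]  # a genuine 3x2 window at x=2, y=1
print(verify_provenance([(source, crop, 2, 1)] * 4, sample_size=3, threshold=1.0))
```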