
Tiered Verification: When 'Failed' Is the Wrong Answer

The all-or-nothing verifier was failing jobs that produced visually perfect output. Here's how three verdicts replaced a binary gate.


The float epsilon fix (previous post) solved the spurious scale step. But it exposed a second problem: some jobs with slightly degraded pixel match rates were being hard-failed even when the output was visually perfect. What should happen when a video is watchable but its pixel match rate is 0.73 instead of 0.90?

The old design

The old verifier returns pass or fail, and a failed job gets two retries. If the third attempt also fails, the job is marked failed and the user loses their work. This is the right design when failure means "the file is corrupted", but the wrong one when failure means "we delivered a video that's 99% spatially accurate."

Three verdicts

The new verifier produces one of three outcomes:

  • VERIFIED_SPATIAL_CROP — all tests pass. Every pixel is a provable source copy.
  • DELIVERED_WITH_QUALITY_FLAGS — blocking tests pass; quality metrics are in the warn band (between floor and warn threshold). The user gets their video, but the quality data is attached.
  • VERIFICATION_FAILED — any blocking test fails, or quality metrics are below the hard floor. Job is marked failed.
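
One way to picture how the three outcomes relate is a small sketch that collapses test results into a single verdict. The verdict names come from the list above; the `decide_verdict` function and the "pass" / "warn" / "fail" band labels are hypothetical names chosen for illustration, not the verifier's actual code.

```python
from enum import Enum
from typing import Sequence


class Verdict(Enum):
    VERIFIED_SPATIAL_CROP = "VERIFIED_SPATIAL_CROP"
    DELIVERED_WITH_QUALITY_FLAGS = "DELIVERED_WITH_QUALITY_FLAGS"
    VERIFICATION_FAILED = "VERIFICATION_FAILED"


def decide_verdict(blocking_passed: bool, quality_bands: Sequence[str]) -> Verdict:
    """Collapse test outcomes into one verdict.

    quality_bands holds one label per quality metric: "pass", "warn",
    or "fail" (hypothetical labels for this sketch).
    """
    # Any blocking failure or any metric below its hard floor fails the job.
    if not blocking_passed or "fail" in quality_bands:
        return Verdict.VERIFICATION_FAILED
    # Everything blocking passed, but at least one metric sits in the warn band.
    if "warn" in quality_bands:
        return Verdict.DELIVERED_WITH_QUALITY_FLAGS
    return Verdict.VERIFIED_SPATIAL_CROP
```

The sketch assumes a single warn-band metric is enough to downgrade the verdict, which is one reading of the description above.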

Test classification

Tests are now typed by severity:

  • Gate (short-circuit): test_0_playability — can ffprobe decode the file and is duration > 0? If not, nothing else matters.
  • Blocking: dimensions, duration parity, audio preservation. Any failure → hard fail.
  • Quality (tiered): pixel match rate, sharpness ratio. These have both a warn threshold and a hard floor.
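
The three severities could be wired together roughly as follows. This is a sketch under assumptions: `Report`, `run_checks`, and the callable-based test registry are hypothetical, and the real verifier may order or collect results differently. It illustrates the short-circuit: if the gate fails, no blocking or quality test runs at all.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Report:
    gate_passed: bool
    blocking_failures: List[str] = field(default_factory=list)
    quality_metrics: Dict[str, float] = field(default_factory=dict)

    @property
    def hard_fail(self) -> bool:
        # A failed gate or any failed blocking test is a hard fail.
        return not self.gate_passed or bool(self.blocking_failures)


def run_checks(
    gate: Callable[[], bool],
    blocking: Dict[str, Callable[[], bool]],
    quality: Dict[str, Callable[[], float]],
) -> Report:
    # Gate: short-circuit before anything else runs.
    if not gate():
        return Report(gate_passed=False)
    report = Report(gate_passed=True)
    # Blocking: run all of them and record which ones failed.
    for name, check in blocking.items():
        if not check():
            report.blocking_failures.append(name)
    # Quality: record raw measurements; grading against thresholds happens later.
    for name, measure in quality.items():
        report.quality_metrics[name] = measure()
    return report
```

In this sketch a failed gate returns an empty `quality_metrics` dict, since an undecodable file has nothing to measure. That is an assumption of the example.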

Four thresholds in config.yaml

Threshold                 Value   Meaning
min_pixel_match_rate      0.90    warn threshold; above this, VERIFIED
pixel_match_rate_floor    0.70    hard floor; below this, FAILED
min_sharpness_ratio       0.75    warn threshold
sharpness_ratio_floor     0.60    hard floor
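
To make the two-threshold idea concrete, here is a minimal sketch that grades one metric against its warn threshold and hard floor. The keys and values mirror the four config.yaml entries; the `band` function, its return labels, and the boundary handling (a value exactly on a threshold counts toward the better band) are assumptions for illustration.

```python
from typing import Dict

# Values copied from the four thresholds; in practice they would be loaded
# from config.yaml rather than hard-coded.
CONFIG: Dict[str, float] = {
    "min_pixel_match_rate": 0.90,
    "pixel_match_rate_floor": 0.70,
    "min_sharpness_ratio": 0.75,
    "sharpness_ratio_floor": 0.60,
}


def band(metric: str, value: float, config: Dict[str, float] = CONFIG) -> str:
    """Return "pass", "warn", or "fail" for a quality metric.

    Assumes the naming pattern min_<metric> / <metric>_floor seen in the config.
    """
    warn_threshold = config[f"min_{metric}"]
    floor = config[f"{metric}_floor"]
    if value < floor:
        return "fail"  # below the hard floor
    if value < warn_threshold:
        return "warn"  # deliverable, but flagged
    return "pass"
```

With these values, `band("pixel_match_rate", 0.73)` returns `"warn"`: the job from the opening question is delivered with flags instead of being failed.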

The quality flags pipeline

verify() always returns quality_metrics. The entrypoint writes a quality_flags_<ratio>.json per output ratio, reads them back, and includes them in the complete callback to the API. A new quality_flags JSON column on the jobs table stores them. GET /api/jobs/{id} exposes the data.
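
The write-then-read-back step might look something like the sketch below. Only the `quality_flags_<ratio>.json` naming comes from the pipeline description; the function names, the `9x16` / `1x1` ratio labels, and the shape of the collected payload are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict

PREFIX = "quality_flags_"


def write_quality_flags(out_dir: Path, ratio: str, metrics: Dict[str, float]) -> Path:
    """Write one quality_flags_<ratio>.json file for a single output ratio."""
    path = out_dir / f"{PREFIX}{ratio}.json"
    path.write_text(json.dumps(metrics, sort_keys=True))
    return path


def collect_quality_flags(out_dir: Path) -> Dict[str, Dict[str, float]]:
    """Read every flags file back into one dict keyed by ratio label."""
    flags: Dict[str, Dict[str, float]] = {}
    for path in sorted(out_dir.glob(f"{PREFIX}*.json")):
        ratio = path.stem[len(PREFIX):]  # strip the filename prefix
        flags[ratio] = json.loads(path.read_text())
    return flags


def demo() -> Dict[str, Dict[str, float]]:
    """Round-trip two hypothetical ratios through a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        out_dir = Path(tmp)
        write_quality_flags(out_dir, "9x16", {"pixel_match_rate": 0.73, "sharpness_ratio": 0.81})
        write_quality_flags(out_dir, "1x1", {"pixel_match_rate": 0.95, "sharpness_ratio": 0.9})
        return collect_quality_flags(out_dir)
```

The dict returned by `collect_quality_flags` is the kind of payload that could be attached to the complete callback and stored in the quality_flags column.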

Why this matters architecturally

The system is moving from gatekeeping to observability. Every delivered job now carries a quantitative quality signal. In future iterations, these metrics can drive alerting on systematic quality degradation, A/B testing of effect classifier improvements, and data-driven floor adjustments as the platform processes more diverse content.

The verifier isn't just a pass/fail gate anymore — it's a continuous measurement instrument.