Tiered Verification: When 'Failed' Is the Wrong Answer
The all-or-nothing verifier was failing jobs that produced visually perfect output. Here's how three verdicts replaced a binary gate.
The float epsilon fix (previous post) solved the spurious scale step. But it exposed a second problem: some jobs with slightly degraded pixel match rates were being hard-failed even when the output was visually perfect. What should happen when a video is watchable but its pixel match rate is 0.73 instead of 0.90?
The old design
The old verifier had two outcomes: pass or fail. A failing job got two retries, and if the third attempt also failed, the job was marked failed and the user lost their work. That is the right design when failure means "the file is corrupted", but the wrong one when failure means "we delivered a video that's 99% spatially accurate."
Three verdicts
The new verifier produces one of three outcomes:
- VERIFIED_SPATIAL_CROP — all tests pass. Every pixel is a provable source copy.
- DELIVERED_WITH_QUALITY_FLAGS — blocking tests pass; quality metrics are in the warn band (between floor and warn threshold). The user gets their video, but the quality data is attached.
- VERIFICATION_FAILED — any blocking test fails, or quality metrics are below the hard floor. Job is marked failed.
Test classification
Tests are now typed by severity:
- Gate (short-circuit): test_0_playability. Can ffprobe decode the file, and is the duration > 0? If not, nothing else matters.
- Blocking: dimensions, duration parity, audio preservation. Any failure → hard fail.
- Quality (tiered): pixel match rate, sharpness ratio. These have both a warn threshold and a hard floor.
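The severity typing above could be sketched roughly as follows. This is a minimal illustration, not the actual verifier: the `Severity` enum, the `Check` dataclass, and `run_hard_checks` are hypothetical names, and the only test name taken from the post is test_0_playability. The point is the control flow: a failed gate short-circuits everything, while blocking checks all run so that every failure can be reported.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Severity(Enum):
    GATE = "gate"          # short-circuit: a failure stops all further checks
    BLOCKING = "blocking"  # any failure is a hard fail
    QUALITY = "quality"    # tiered: judged later against warn threshold and floor


@dataclass
class Check:
    # Hypothetical container; the real verifier may structure tests differently.
    name: str
    severity: Severity
    run: Callable[[], bool]


def run_hard_checks(checks: List[Check]) -> List[str]:
    """Run gate checks, then blocking checks; return the names that failed.

    Quality checks are skipped here because they produce metrics,
    not a pass/fail answer.
    """
    for check in checks:
        if check.severity is Severity.GATE and not check.run():
            return [check.name]  # nothing else matters if the gate fails

    failures = []
    for check in checks:
        if check.severity is Severity.BLOCKING and not check.run():
            failures.append(check.name)
    return failures
```

An empty list from `run_hard_checks` means the job is eligible for quality tiering; a non-empty list maps to VERIFICATION_FAILED.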
Four thresholds in config.yaml
| Threshold | Value | Meaning |
|---|---|---|
| min_pixel_match_rate | 0.90 | warn threshold — above this, VERIFIED |
| pixel_match_rate_floor | 0.70 | hard floor — below this, FAILED |
| min_sharpness_ratio | 0.75 | warn threshold |
| sharpness_ratio_floor | 0.60 | hard floor |
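To make the banding concrete, here is a small sketch of how the four thresholds might map a job to one of the three verdicts. The threshold names and values come from the table; the function names (`tier`, `decide_verdict`) are hypothetical, and the boundary handling (a value exactly equal to a threshold counts as meeting it) is an assumption, since the post does not say which side the boundaries fall on.

```python
from typing import Dict

# Values copied from the table; in the real system they live in config.yaml.
THRESHOLDS: Dict[str, float] = {
    "min_pixel_match_rate": 0.90,
    "pixel_match_rate_floor": 0.70,
    "min_sharpness_ratio": 0.75,
    "sharpness_ratio_floor": 0.60,
}


def tier(value: float, warn: float, floor: float) -> str:
    """Classify one metric as 'pass', 'warn', or 'fail'.

    Assumption: a value equal to a threshold meets that threshold.
    """
    if value < floor:
        return "fail"
    if value < warn:
        return "warn"
    return "pass"


def decide_verdict(
    blocking_ok: bool,
    pixel_match_rate: float,
    sharpness_ratio: float,
    thresholds: Dict[str, float] = THRESHOLDS,
) -> str:
    """Combine blocking results and quality tiers into a single verdict."""
    if not blocking_ok:
        return "VERIFICATION_FAILED"

    tiers = [
        tier(pixel_match_rate,
             thresholds["min_pixel_match_rate"],
             thresholds["pixel_match_rate_floor"]),
        tier(sharpness_ratio,
             thresholds["min_sharpness_ratio"],
             thresholds["sharpness_ratio_floor"]),
    ]
    if "fail" in tiers:
        return "VERIFICATION_FAILED"
    if "warn" in tiers:
        return "DELIVERED_WITH_QUALITY_FLAGS"
    return "VERIFIED_SPATIAL_CROP"
```

Under these assumptions, the job from the introduction (pixel match rate 0.73, with a healthy sharpness ratio) lands in the warn band and is delivered with quality flags instead of being hard-failed.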
The quality flags pipeline
verify() always returns quality_metrics. The entrypoint writes a quality_flags_<ratio>.json per output ratio, reads them back, and includes them in the complete callback to the API. A new quality_flags JSON column on the jobs table stores them. GET /api/jobs/{id} exposes the data.
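The write-then-read-back step might look something like the sketch below. Only the quality_flags_<ratio>.json naming and the quality_flags field come from the post; the function names, the payload shape, and the replacement of ":" in ratio strings are assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Dict

PREFIX = "quality_flags_"


def write_quality_flags(out_dir: Path, ratio: str, metrics: Dict[str, float]) -> Path:
    """Write one quality_flags_<ratio>.json file for a single output ratio."""
    # Assumption: ratios such as "9:16" are made filename-safe by swapping ":" for "x".
    safe_ratio = ratio.replace(":", "x")
    path = out_dir / f"{PREFIX}{safe_ratio}.json"
    path.write_text(json.dumps(metrics, indent=2))
    return path


def collect_quality_flags(out_dir: Path) -> Dict[str, Dict[str, float]]:
    """Read every quality flags file back, keyed by the ratio in its filename."""
    flags = {}
    for path in sorted(out_dir.glob(f"{PREFIX}*.json")):
        ratio = path.stem[len(PREFIX):]
        flags[ratio] = json.loads(path.read_text())
    return flags


def build_complete_payload(job_id: str, out_dir: Path) -> dict:
    """Assemble a hypothetical body for the 'complete' callback to the API."""
    return {
        "job_id": job_id,
        "status": "complete",
        "quality_flags": collect_quality_flags(out_dir),
    }
```

Going through files on disk keeps each ratio's metrics inspectable after the run, at the cost of an extra serialization round trip.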
Why this matters architecturally
The system is moving from gatekeeping to observability. Every delivered job now carries a quantitative quality signal. In future iterations, these metrics can drive alerting on systematic quality degradation, A/B testing of effect classifier improvements, and data-driven floor adjustments as the platform processes more diverse content.
The verifier isn't just a pass/fail gate anymore — it's a continuous measurement instrument.