Blog

Benchmarks, architecture decisions, and honest post-mortems from the team building CRTX.

Case Study · February 2026

How We Killed Our Own Pipeline and Built the Loop

We built a multi-model pipeline, benchmarked it at 88% and $5.59/run, and rebuilt from scratch. The Loop scores 99% at $1.80/run.

Case Study · February 2026

What Happens When You Stop Trusting One Model With Your Code

We pointed CRTX at a 265,000-line production codebase. Four models collaborated. Two providers went down. The pipeline completed anyway.

Architecture · Coming soon

Why Every Output Needs a Test Runner, Not Just a Review

Model-based review is unreliable. A test runner doesn't care how convincing the code looks: the code either works or it doesn't.

Guide · Coming soon

Three-Tier Gap Closing: When the Fix Cycle Stalls

Diagnosis, minimal context, model escalation: how CRTX resolves stubborn test failures without giving up.