Blog
Benchmarks, architecture decisions, and honest post-mortems from the team building CRTX.
Case Study · February 2026
How We Killed Our Own Pipeline and Built the Loop
We built a multi-model pipeline, benchmarked it at 88% and $5.59/run, and rebuilt from scratch. The Loop scores 99% at $1.80/run.
Case Study · February 2026
What Happens When You Stop Trusting One Model With Your Code
We pointed CRTX at a 265,000-line production codebase. Four models collaborated. Two providers went down. The pipeline completed anyway.
Architecture · Coming soon
Why Every Output Needs a Test Runner, Not Just a Review
Model-based review is unreliable. A test runner doesn't care how convincing the code looks — it either works or it doesn't.
Guide · Coming soon
Three-Tier Gap Closing: When the Fix Cycle Stalls
Diagnosis, minimal context, model escalation — how CRTX resolves stubborn test failures without giving up.