Documentation

Everything you need to run multi-model pipelines with an independent referee.

Quick Start

Get your first pipeline running in under three minutes.

# Install CRTX
pip install crtx

# Set your API keys (use the providers you have)
export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...
export GOOGLE_API_KEY=AIza...
export XAI_API_KEY=xai-...

# Run your first pipeline
crtx run \
  "Build a REST API with JWT auth and rate limiting" \
  --arbiter bookend \
  --route quality_first

That's it. CRTX selects the best model for each pipeline stage, runs the Architect → Implement → Refactor → Verify sequence, and the Arbiter independently reviews the first and last stage outputs.

Installation

CRTX requires Python 3.12 or higher.

# Standard install
pip install crtx

# With the optional dashboard (real-time web visualization)
pip install "crtx[dashboard]"

# Verify installation
crtx --version
crtx --help

CRTX is a BYOK (Bring Your Own Keys) tool. You need at least one API key from a supported provider. The more providers you configure, the more options the routing engine has for model selection.

Configuration

CRTX uses TOML configuration files. Create a crtx.toml in your project root or pass --config path/to/config.toml.

# crtx.toml — Example configuration

[pipeline]
mode = "sequential"          # sequential | parallel | debate
arbiter = "bookend"          # bookend | full | final | off

[routing]
strategy = "quality_first"   # quality_first | cost_optimized | speed_first | hybrid
min_fitness = 0.6            # Minimum fitness score to consider a model

[models]
# Override default model assignments per stage
architect = "gemini-2.5-pro"
implement = "gpt-4o"
refactor = "claude-opus-4.6"
arbiter = "grok-4"

[arbiter]
max_retries = 2              # Max retries on REJECT verdict
inject_flags = true          # Inject FLAG warnings into next stage

[domain_rules]
# Custom rules the Arbiter enforces (see Domain Rules section)
rules = [
  "All database operations must use connection pooling",
  "All API endpoints must validate input with Pydantic models",
  "Error responses must use RFC 7807 problem details format",
]
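
To use a configuration file outside the project root, point the CLI at it with --config (the task text and path below are placeholders):

crtx run "Add rate limiting to the public endpoints" \
  --config configs/crtx.toml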

CLI Reference

crtx run

Execute a pipeline. This is the primary command.

crtx run [TASK] [OPTIONS]

Required:
  TASK                    Task description (positional argument)

Options:
  --mode TEXT              Pipeline mode: sequential, parallel, debate
                           [default: sequential]
  --arbiter TEXT           Arbiter mode: bookend, full, final, off
                           [default: bookend]
  --route TEXT             Routing strategy: quality_first, cost_optimized,
                           speed_first, hybrid [default: quality_first]
  --models TEXT            Comma-separated model list (for parallel/debate)
  --context-dir PATH       Project directory for context injection
  --context-budget INT     Max tokens for injected context [default: 20000]
  --apply                  Enable apply mode (write to codebase)
  --confirm                Actually write files (requires --apply)
  --branch TEXT            Create git branch before applying
  --apply-include TEXT     Glob patterns for files to write
  --apply-exclude TEXT     Glob patterns for files to skip
  --rollback-on-fail       Auto-revert if post-apply tests fail [default: true]
  --test-command TEXT      Test command to run after apply
  --no-stream              Disable streaming display
  --config PATH            Path to TOML config file
  --output PATH            Save pipeline output to file
  --verbose                Show detailed pipeline execution logs
  --dry-run                Show model assignments without running
  --help                   Show this message and exit
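
A representative invocation combining several of these options; the task text, directory, and output path are placeholders:

crtx run "Add pagination to the list endpoints" \
  --route hybrid \
  --context-dir ./src \
  --output pipeline-result.md \
  --verbose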

crtx models

List available models and their fitness scores for each stage.

crtx models [OPTIONS]

Options:
  --route TEXT             Show scores for a specific routing strategy
  --stage TEXT             Filter to a specific stage
  --help                   Show this message and exit
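
For example, to inspect fitness scores for the implement stage under the cost-optimized strategy:

crtx models --route cost_optimized --stage implement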

crtx dashboard

Start the real-time web dashboard. Requires the dashboard optional dependency.

crtx dashboard [OPTIONS]

Options:
  --port INT               Server port [default: 8420]
  --no-browser             Don't auto-open browser
  --help                   Show this message and exit
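
For example, to serve the dashboard on a different port without auto-opening a browser (the port number here is arbitrary):

crtx dashboard --port 9000 --no-browser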

crtx repl

Start an interactive REPL session. Run multiple pipelines without restarting, with session history and tab completion.

crtx repl [OPTIONS]

Options:
  --mode TEXT              Default pipeline mode
  --arbiter TEXT           Default arbiter mode
  --route TEXT             Default routing strategy
  --context-dir PATH       Project directory for context injection
  --help                   Show this message and exit

crtx setup

Interactive setup wizard. Configures API keys, tests provider connectivity, and creates your initial crtx.toml.

crtx setup

Pipeline Modes

CRTX supports five pipeline modes. Each is suited to a different class of problem.

Sequential

The default mode. Tasks flow through four stages linearly, each building on the previous output.

Architect → Implement → Refactor → Verify

The Architect designs the solution structure. The Implementer writes code against that scaffold. The Refactorer improves quality and adds tests. The Verifier validates the complete output. The Arbiter reviews at configured checkpoints.

crtx run "Build a user auth service" --mode sequential

Best for: standard development tasks, feature implementation, bug fixes, refactoring.

Parallel

Fan the same task out to multiple models simultaneously. Each model produces an independent solution, then cross-reviews the others' work.

The cross-review scoring evaluates each output on architecture (1-10), implementation quality (1-10), and correctness (1-10). The highest-scoring output wins. A synthesis step then merges the best ideas from all outputs into one.

crtx run \
  "Design a caching layer with TTL and invalidation" \
  --mode parallel \
  --models claude-opus,gpt-4o,gemini-pro

Best for: tasks where the approach matters more than speed — data pipelines, complex algorithms, system design.

Debate

Two models take opposing positions on a question. Each writes a proposal, rebuts the other's position, then makes a final argument incorporating the criticisms. A third model serves as judge.

Propose → Rebut → Final Argument → Judgment

crtx run \
  "Microservices vs monolith for our API gateway" \
  --mode debate

Best for: architectural decisions, technology selection, design tradeoffs. The structured adversarial reasoning produces insights that neither model would generate alone.

Review

Multi-model code analysis with cross-review. Multiple models independently review your code for issues — security vulnerabilities, performance problems, architectural concerns — then cross-review each other's findings. The Arbiter synthesizes all reviews into a unified report.

crtx review-code src/

# With specific focus
crtx review-code src/ --focus security,performance

Best for: code review, security audits, pre-merge quality checks, compliance verification.

Improve

Generate improvements, vote on best, synthesize. Models independently propose improvements to your code, then vote on which changes are most impactful. The best improvements are synthesized into a single changeset.

crtx improve src/

# Apply improvements directly
crtx improve src/ --apply --confirm

Best for: refactoring, optimization, code quality improvement, technical debt reduction.

The Arbiter

The Arbiter is an independent model that reviews pipeline stage outputs. It always uses a different model from the one that produced the work — no model ever grades itself. This cross-model enforcement is the core of CRTX's quality assurance.

Verdicts

The Arbiter produces one of four verdicts for each review:

APPROVE: Output meets all criteria. Pipeline continues to the next stage.
FLAG: Non-blocking warnings detected. Warnings are automatically injected as context into the next pipeline stage so the next model addresses them.
REJECT: Output fails quality checks. The stage retries with specific feedback from the Arbiter. Maximum of 2 retries before escalation.
HALT: Critical issue detected. The entire pipeline stops immediately. Requires human review.

Arbiter Modes

Control when the Arbiter reviews:

bookend: Reviews after the first stage (Architect) and the last stage (Verify). Best balance of quality and cost.
full: Reviews after every stage. Maximum quality assurance, higher cost.
final: Reviews only the final output. Lowest overhead.
off: No Arbiter reviews. Not recommended for production work.

# Bookend (default — reviews first and last)
crtx run "..." --arbiter bookend

# Full (reviews every stage)
crtx run "..." --arbiter full

# Final only
crtx run "..." --arbiter final

Smart Routing

CRTX's routing engine uses fitness scores to assign the best available model to each pipeline stage. Fitness scores are computed from each model's strengths, the stage's requirements, and the selected strategy.

Routing Strategies

quality_first: Picks the highest-scoring model for each stage regardless of cost. Best output quality.
cost_optimized: Minimizes total pipeline cost while maintaining a minimum fitness threshold.
speed_first: Selects the fastest models. Useful for iteration and prototyping.
hybrid: Balanced approach that weights quality, cost, and speed equally.

# See model fitness scores for your configured providers
crtx models --route quality_first

# Dry run to see assignments without executing
crtx run "..." --route cost_optimized --dry-run

Domain Rules

Domain rules are custom enforcement criteria the Arbiter checks against every stage output. Define them in your crtx.toml to enforce your team's standards.

[domain_rules]
rules = [
  "All database operations must use connection pooling",
  "All API endpoints must validate input with Pydantic models",
  "Error responses must use RFC 7807 problem details format",
  "All async operations must have timeout handling",
  "SQL queries must use parameterized statements, never string formatting",
]

The Arbiter evaluates each rule against the stage output and includes violations in its verdict. FLAG verdicts from domain rule violations are injected into the next stage so the model can address them.

Supported Models

CRTX currently supports models from four providers:

Anthropic: Claude Opus 4.6, Claude Sonnet 4.5, Claude Haiku 4.5
OpenAI: GPT-4o, GPT-4o-mini, o3, o3-mini
Google: Gemini 2.5 Pro, Gemini 2.5 Flash
xAI: Grok 4, Grok 3

You only need API keys for the providers you want to use. CRTX works with a single provider, but the routing engine benefits from having multiple options.

API Keys

Set your API keys as environment variables. CRTX auto-detects which providers are available.

# Set in your shell profile or .env file
export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...
export GOOGLE_API_KEY=AIza...
export XAI_API_KEY=xai-...

# Or pass via config
# crtx.toml
[keys]
anthropic = "sk-ant-..."
openai = "sk-..."
google = "AIza..."
xai = "xai-..."

Never commit API keys to version control. Use environment variables or a .env file (add it to your .gitignore).

Context Injection

CRTX can scan your project directory and inject relevant code context into the pipeline. Models see your actual codebase — imports, patterns, conventions — and generate code that fits.

# Inject project context (scans Python files by default)
crtx run "Add error handling to the API routes" \
  --context-dir ./backend

# Custom budget (tokens allocated to context)
crtx run "Generate missing tests" \
  --context-dir . \
  --context-budget 32000

# With file filters
crtx run "Refactor the auth module" \
  --context-dir ./src \
  --include "*.py" \
  --exclude "**/tests/**"

The context injector uses AST scanning to extract file signatures, class definitions, function signatures, imports, and docstrings. The top 10 most relevant files are included in full; remaining files contribute signatures only. Budget defaults to 20,000 tokens.

Apply Mode

Apply mode writes pipeline output directly to your codebase with mandatory safety checks. Instead of copying files from crtx-output/, the pipeline resolves file paths and writes them in place.

Safe Direct Write (Phase 1)

The basic apply flow: resolve paths, check git state, preview diffs, write files, run tests, rollback on failure.

# Preview what would be written (dry run)
crtx run "Add error handling" \
  --context-dir ./backend \
  --apply

# Actually write after confirmation
crtx run "Add error handling" \
  --context-dir ./backend \
  --apply --confirm

# Write to a new branch with post-apply testing
crtx run "Refactor auth" \
  --context-dir ./backend \
  --apply --confirm \
  --branch crtx/refactor-auth \
  --test-command "pytest tests/ -q" \
  --rollback-on-fail

Safety gates run before any file is written: git repo required, dirty tree warning, protected branch blocking (main/master), arbiter REJECT verdict blocking, and file conflict detection for files modified since the context scan.

Intelligent Patching (Phase 2)

For existing files, CRTX can apply surgical edits using AST-aware structured patches instead of whole-file replacement. Patch anchors use function signatures and class names rather than line numbers, so they work even if the file has changed.

Seven patch operations are supported: insert_after, insert_before, replace, delete, insert_import, insert_method, and wrap. Post-patch validation checks syntax, import preservation, and signature integrity.
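
Assuming patching engages automatically for files that already exist (as described above), an apply run like the following sketch (task text, paths, and globs are placeholders) would patch existing modules in place rather than overwrite them:

crtx run "Add timeout handling to the async helpers" \
  --context-dir ./src \
  --apply --confirm \
  --apply-include "src/**/*.py" \
  --test-command "pytest -q"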

Streaming UI

By default, CRTX renders a real-time multi-panel display: code as it is generated, arbiter reasoning as each review happens, and running cost per model.

The display has four panels: pipeline progress (stage indicators with completion percentage), live output (syntax-highlighted code or refactor diffs), activity log (timestamped events with arbiter feedback), and cost ticker (per-model token counts and costs).

# Disable streaming (use classic status table)
crtx run "Build an API" --no-stream

Streaming is enabled by default for sequential pipelines in interactive terminals (80x24 minimum). It falls back to the classic display for non-interactive environments, parallel/debate modes, or when --no-stream is passed.

REPL Mode

Start an interactive session to run multiple pipelines without restarting. Session history and tab completion are built in.

crtx repl

# With defaults
crtx repl --mode sequential --arbiter bookend --context-dir ./src

Inside the REPL, type your task directly — no quotes needed. Use /mode, /arbiter, and /route commands to change settings mid-session. /history shows past runs.
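
A sketch of what a session might look like; the crtx> prompt shown here is illustrative, and only the slash commands listed above are assumed to exist:

$ crtx repl --context-dir ./src
crtx> /mode parallel
crtx> /route cost_optimized
crtx> Add retry logic to the payment client
crtx> /history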

Session History

Every pipeline run is persisted to a local SQLite database with full metadata: task, model assignments, arbiter verdicts, costs, tokens, and timing.

# List recent sessions
crtx history

# View a specific session
crtx history show <session-id>

# Replay a session (re-run with same config)
crtx replay <session-id>

Auto-Fallback

When a model provider is unavailable (rate limits, outages, timeouts), CRTX automatically falls back to the next-best model by fitness score. Provider health is tracked per-session — once a model fails, it is skipped immediately for subsequent stages.

In testing, CRTX completed a full pipeline with both Google Gemini and Claude Opus simultaneously down, falling back to o3 and Claude Sonnet automatically. No configuration needed — fallback is always active.