pragmatiq
Reference

Acceptance gates

The eight phase gates that keep pragmatiq honest — what each asserts and how to run it.

pragmatiq is built in eight phases, each with an acceptance gate — an executable script under scripts/gates/ that must pass before the next phase begins. Gates are the reproducible contract: they run the real pipeline and assert the property that phase promises.

GateAsserts
gate_1Synthetic generator acceptance — realism + byte-identical determinism.
gate_2Key–value–time tokenizer — round-trip, [UNK] fallbacks, vocab range.
gate_3Sharding + varlen collation — packed (varlen) attention ≡ padded attention.
gate_4Model — the encoder stack and tied MLM head match the spec.
gate_5End-to-end training — the probe beats the same-classifier raw-count baseline.
gate_6AML GNN ablation — a graph over transfers recovers signal an isolated probe misses.
gate_7Inference / serving / demo.
gate_8Polish — nano end-to-end on CPU, validation, packaging + attribution.

Running them

bash scripts/gates/gate_5.sh                 # CI scale (small N, CPU, minutes)
PRAGMATIQ_GATE_FULL=1 bash scripts/gates/gate_5.sh   # full scale (GPU box)

Each gate is runnable standalone and honors PRAGMATIQ_GATE_FULL=1 for full-scale runs. The CI-scale defaults run on a CPU in minutes; the full-scale variants (50k–100k users) are intended for a GPU host — see Pretrain (and scale).

Never start phase N+1 with a red gate N

The gates are ordered. A red gate means the property it guards is broken; the fix comes before any new work on top of it.

On this page