Design decisions
Paper fidelity
Exactly what follows the PRAGMA paper, and what pragmatiq adds on top.
pragmatiq's goal is not novelty over PRAGMA — it is to make the implementation path concrete and honest. This page is the explicit ledger of what is the paper's and what is ours.
| Component | Source |
|---|---|
| Key–value–time tokenization + 8·ln(1+Δt/8) time transform | Paper |
| TimeRoPE on continuous log-seconds (profile = since milestone; history = to last event) | Paper |
| Profile / event / history encoders; tied 3d MLM head + label smoothing | Paper |
| Masking 15% token / 10% event / 10% key, 10% [UNK]-as-dropout | Paper |
| Model sizes 10M / 100M / 1B | Paper |
| Pre-training caps (event ≤24, profile ≤200, ≤6500 events/user) | Paper |
| PRAGMA+Nemotron frozen-text-embedding variant (MSE) | Paper · opt-in |
| Synthetic data generator | Our addition |
| AML transfer-graph GraphSAGE ablation | Our addition |
| Gradient-boosting downstream probe | Our default |
| nano CPU/CI model size | Our addition |
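
The time transform in the first row, 8·ln(1+Δt/8), can be sketched in a few lines. This is a minimal illustration, not pragmatiq's actual code: the function name `time_transform` and the `scale` parameter are hypothetical, and the unit of Δt is not stated on this page, so the sketch treats it as a plain non-negative number.

```python
import math


def time_transform(delta_t: float, scale: float = 8.0) -> float:
    """Compress a non-negative time gap as scale * ln(1 + delta_t / scale).

    Hypothetical helper: the name and the `scale` argument are assumptions
    made for illustration; only the formula comes from the table above.
    """
    if delta_t < 0:
        raise ValueError("delta_t must be non-negative")
    # log1p(x) computes ln(1 + x) accurately even for very small x.
    return scale * math.log1p(delta_t / scale)


if __name__ == "__main__":
    # A zero gap maps to zero; larger gaps grow only logarithmically.
    for gap in (0.0, 1.0, 8.0, 3600.0):
        print(gap, time_transform(gap))
```

For gaps much smaller than 8 the transform is close to the identity, and for much larger gaps it grows logarithmically, so short intervals stay distinguishable while long ones are compressed.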
How fidelity is kept honest
- The core representation above is implemented to match the paper, and every module is checked against a single internal spec.
- Where the paper is silent on an engineering detail, pragmatiq picks a default, exposes it in config, and records it — see the Decisions log.
- Where pragmatiq goes beyond the paper (the generator, the AML graph), it is presented standalone and labelled, never folded into a fidelity claim.
The paper uses real Revolut data; pragmatiq cannot, so the synthetic generator stands in. That is the single largest gap between this implementation and the paper's results, and it is why the headline numbers here are about signal recovery on synthetic data, not about matching the paper's absolute metrics.
pragmatiq is an independent implementation inspired by the PRAGMA paper (arXiv 2604.08649) and is not affiliated with or endorsed by Revolut.