
Paper fidelity

Exactly what follows the PRAGMA paper, and what pragmatiq adds on top.

pragmatiq's goal is not novelty over PRAGMA; it is to make the implementation path concrete and honest. This page is the explicit ledger of what comes from the paper and what is ours.

| Component | Source |
| --- | --- |
| Key–value–time tokenization + 8·ln(1+Δt/8) time transform | Paper |
| TimeRoPE on continuous log-seconds (profile = since milestone; history = to last event) | Paper |
| Profile / event / history encoders; tied 3d MLM head + label smoothing | Paper |
| Masking 15% token / 10% event / 10% key, 10% [UNK]-as-dropout | Paper |
| Model sizes 10M / 100M / 1B | Paper |
| Pre-training caps (event ≤24, profile ≤200, ≤6500 events/user) | Paper |
| PRAGMA+Nemotron frozen-text-embedding variant (MSE) | Paper · opt-in |
| Synthetic data generator | Our addition |
| AML transfer-graph GraphSAGE ablation | Our addition |
| Gradient-boosting downstream probe | Our default |
| nano CPU/CI model size | Our addition |
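
The time transform in the first row, 8·ln(1+Δt/8), can be sketched in a few lines. This is an illustrative sketch, not pragmatiq's actual code: the function name `time_transform`, the `scale` parameter, and the handling of negative gaps are assumptions, and the unit of Δt is whatever the caller passes in.

```python
import math


def time_transform(delta_t: float, scale: float = 8.0) -> float:
    """Compress a non-negative time gap as scale * ln(1 + delta_t / scale).

    Hypothetical helper: only the formula itself comes from the table above.
    """
    if delta_t < 0:
        # Assumption: negative gaps are treated as a caller error.
        raise ValueError("delta_t must be non-negative")
    # log1p stays accurate when delta_t / scale is very small.
    return scale * math.log1p(delta_t / scale)


# Small gaps pass through almost unchanged; large gaps grow only logarithmically.
small = time_transform(0.01)
large = time_transform(1_000_000.0)
```

The transform is close to the identity when Δt is much smaller than 8 and grows logarithmically beyond that, so short gaps keep their resolution while very long gaps are compressed.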

How fidelity is kept honest

  • The core representation above is implemented to match the paper, and every module is reviewed against a single internal spec.
  • Where the paper is silent on an engineering detail, pragmatiq picks a default, exposes it in config, and records it (see the Decisions log).
  • Where pragmatiq goes beyond the paper (the generator, the AML graph), it is presented standalone and labelled, never folded into a fidelity claim.
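
As a rough illustration of exposing values in config, the masking rates and pre-training caps from the table could be collected in one immutable object. The class name `PretrainConfig` and every field name below are hypothetical and not taken from pragmatiq; only the numbers come from the table, and the units of the event and profile caps are not stated on this page.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PretrainConfig:
    """Hypothetical config object; field names are illustrative only."""

    # Masking rates as listed in the table (paper values).
    token_mask_rate: float = 0.15
    event_mask_rate: float = 0.10
    key_mask_rate: float = 0.10
    unk_dropout_rate: float = 0.10
    # Pre-training caps as listed in the table; units for the first two
    # are not given on this page, so the names stay unit-neutral.
    max_event_len: int = 24
    max_profile_len: int = 200
    max_events_per_user: int = 6500

    def __post_init__(self) -> None:
        # Reject rates that are not valid probabilities.
        for name in ("token_mask_rate", "event_mask_rate",
                     "key_mask_rate", "unk_dropout_rate"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")


# Overrides produce a new object, so the defaults stay visible and untouched.
default_cfg = PretrainConfig()
ablation_cfg = replace(default_cfg, event_mask_rate=0.0)
```

Keeping the object frozen means any deviation from the listed values has to be made explicitly through `replace`, which makes departures easy to spot in review.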

The paper uses real Revolut data; pragmatiq cannot, so the synthetic generator stands in. That is the single largest gap between this implementation and the paper's results, and it is why the headline numbers here are about signal recovery on synthetic data, not about matching the paper's absolute metrics.

pragmatiq is an independent implementation inspired by the PRAGMA paper (arXiv 2604.08649) and is not affiliated with or endorsed by Revolut.
