Quickstart

Install pragmatiq and run the full pipeline (generate data, tokenize, pretrain, embed, and probe) end to end on synthetic data. The whole run takes a few minutes on CPU. By the end you will have a trained model and a credit-risk probe score measured against a raw-count baseline, which is evidence that the embedding carries signal beyond trivial counts.

Requirements

Python 3.11+. A GPU is optional — everything runs on CPU (slower but correct). The commands below use a virtual environment so the install is isolated.

Install

git clone https://github.com/dynamiq-ai/pragmatiq.git
cd pragmatiq
python3.11 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

Optional extras pull in focused tooling: .[gnn] for the AML graph, .[serve] for ONNX/Triton export, .[nemotron] for the frozen text embedder, .[demo] for the Streamlit app.

Run the whole pipeline

pragmatiq quickstart
# smaller, faster local smoke test:
pragmatiq quickstart --n-users 2000 --max-steps 80

The same smoke test from Python:

from pragmatiq import api

result = api.quickstart(n_users=2000, max_steps=80)
print(result["message"])

quickstart runs five stages in order:

1. Generate synthetic users and event histories.
2. Fit the key–value–time tokenizer.
3. Pretrain a small masked-language model.
4. Embed users.
5. Probe a gradient-boosting credit-risk classifier against a raw-count baseline.

Read the result

The run prints a one-line summary like:

credit probe AUC 0.71 vs raw-count baseline 0.54  ·  run: runs/quickstart
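
If you want to check the gap in a script, for example as a smoke test in CI, the summary line can be parsed. The sketch below is an assumption-heavy illustration: the regular expression is fitted to the sample line above only, the real message format may differ, and `parse_summary` is a hypothetical helper, not part of pragmatiq.

```python
import re

# Pattern fitted to the sample summary line only; the real format may differ.
SUMMARY_RE = re.compile(
    r"credit probe AUC (?P<probe>\d+\.\d+) vs raw-count baseline (?P<baseline>\d+\.\d+)"
    r".*run: (?P<run_dir>\S+)"
)


def parse_summary(line: str) -> dict:
    """Extract probe AUC, baseline AUC, their gap, and the run directory."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognized summary line: {line!r}")
    probe = float(match["probe"])
    baseline = float(match["baseline"])
    return {
        "probe_auc": probe,
        "baseline_auc": baseline,
        "lift": round(probe - baseline, 4),  # probe minus baseline
        "run_dir": match["run_dir"],
    }


summary = parse_summary(
    "credit probe AUC 0.71 vs raw-count baseline 0.54  ·  run: runs/quickstart"
)
print(summary)
```

A positive `lift` means the probe on embeddings beat the raw-count baseline on this run.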

The probe head is gradient boosting by default (HistGradientBoostingClassifier), and the raw-count baseline uses the same classifier — so the gap reflects the representation, not the model family. Both ROC-AUC and PR-AUC are reported (PR-AUC is the honest headline on low-prevalence risk tasks).
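
To see why PR-AUC is the more telling number when positives are rare, here is a minimal pure-Python sketch of both metrics on made-up scores. It is not pragmatiq's implementation: average precision is used as one common PR-AUC estimator (whether pragmatiq computes it this way is not stated here), the simple version assumes no tied scores, and the data is invented for illustration.

```python
def roc_auc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean precision at the rank of each positive, highest score first."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` users
    return total / hits


# Invented example: 100 users, 2 defaults (2% prevalence),
# which the model ranks 5th and 10th by predicted risk.
scores = [100 - i for i in range(100)]
labels = [0] * 100
labels[4] = 1
labels[9] = 1

roc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
print(f"ROC-AUC {roc:.3f}  PR-AUC {ap:.3f}")
```

On this toy data ROC-AUC is about 0.94 while average precision is 0.20: most of the users flagged at the top are still false positives, which ROC-AUC hides and PR-AUC exposes.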

Why this is a forecast, not a hindcast

When a label table carries an eval_ts, each user's history is truncated at that point before embedding — for both the probe and the baseline — so metrics never peek at the outcome window.
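
The truncation rule can be sketched as follows. This is an illustration under stated assumptions, not pragmatiq's code: the event and label layouts, the field names `ts`, `kind`, and `label`, and the strict `<` boundary are all hypothetical choices made for the example.

```python
from datetime import datetime


def truncate_history(events, eval_ts):
    """Keep only events strictly before eval_ts (strict boundary is an assumption)."""
    return [e for e in events if e["ts"] < eval_ts]


# Hypothetical event histories and label table.
histories = {
    "u1": [
        {"ts": datetime(2024, 1, 5), "kind": "payment"},
        {"ts": datetime(2024, 2, 9), "kind": "late_fee"},
        {"ts": datetime(2024, 3, 20), "kind": "default"},  # inside the outcome window
    ],
}
labels = {"u1": {"eval_ts": datetime(2024, 3, 1), "label": 1}}

# The same truncated history feeds both the embedding and the raw-count baseline.
truncated = {
    user: truncate_history(histories[user], row["eval_ts"])
    for user, row in labels.items()
}
raw_counts = {user: len(events) for user, events in truncated.items()}
print(raw_counts)
```

Because the `default` event falls after `eval_ts`, neither the probe nor the baseline ever sees it, so the comparison between them stays fair.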

Where to go next

How it works

Bring your own data
