Mathematical Reference

PredictionX
Pricing Engine & Strategy Agent

From raw futures prices and the Black (1976) binary option formula, through proper-scoring-rule evaluation, to the autonomous agent's OLS-calibrated half-Kelly position sizing.

Black 1976 · Log-Normal · OLS Confidence Regression · Kalshi · CFTC-Regulated · Half-Kelly Sizing

§ 01 Overview

PredictionX prices binary prediction market contracts on Kalshi. A Kalshi commodity contract is economically equivalent to a cash-or-nothing digital option: it pays $1 if a commodity price exceeds (or falls below) a strike level $K$ on a given settlement date, and $0 otherwise.

The engine computes a fair probability $\hat{p}$ for each contract through a four-stage pipeline:

  1. Data Ingestion: fetch the Kalshi market price (YES and NO independently), build a futures price strip, and derive annualised volatility.
  2. Distribution Fit → p_base: fit a log-normal distribution over the futures strip and compute P(price > K) using the d₂ formula.
  3. Time-Value Decay → p_tv = fair_value: model how the probability evolves as DTE shrinks by applying the same formula at each intermediate DTE. The point estimate at today's DTE becomes the engine's final fair value.
  4. Persist & Backtest: save the pricing run to SQLite. Once the contract settles, record the outcome and compare predicted vs actual probability using proper scoring rules.

The edge is then $\text{edge} = \hat{p} - p_{\text{market}}$, where $p_{\text{market}}$ is the independently quoted Kalshi mid-price for the relevant side.

§ 02 Kalshi Binary Contracts

A Kalshi commodity contract resolves to YES ($1) or NO ($0) at expiry based on a single condition:

Payoff — Above Contract
$$\text{Payoff} = \begin{cases} \$1 & \text{if } S_T > K \\ \$0 & \text{otherwise} \end{cases}$$
Payoff — Below Contract
$$\text{Payoff} = \begin{cases} \$1 & \text{if } S_T < K \\ \$0 & \text{otherwise} \end{cases}$$

where $S_T$ is the underlying commodity spot price at settlement date $T$, and $K$ is the strike price parsed from the ticker suffix (e.g. -T80 → $K = \$80$).

Ignoring discounting (a small effect over horizons of days to weeks), the risk-neutral fair value equals the risk-neutral probability:

Risk-Neutral Fair Value
$$\hat{p}_{\text{YES}} = \mathbb{E}^{\mathbb{Q}}\!\left[\mathbf{1}_{S_T > K}\right] = \mathbb{Q}(S_T > K)$$ $$\hat{p}_{\text{NO}} = 1 - \hat{p}_{\text{YES}}$$

YES and NO prices are independently quoted

Kalshi operates a separate orderbook for each side. YES and NO bid/ask are quoted independently by market makers:

Mid-Price Calculation
$$p_{\text{YES,mkt}} = \frac{\text{yes\_bid} + \text{yes\_ask}}{2} \qquad p_{\text{NO,mkt}} = \frac{\text{no\_bid} + \text{no\_ask}}{2}$$ $$p_{\text{YES,mkt}} + p_{\text{NO,mkt}} \neq 1 \quad \text{in general}$$

Example on a liquid contract:

| Side | Bid | Ask | Mid |
|---|---|---|---|
| YES | 42¢ | 44¢ | 43¢ |
| NO | 57¢ | 61¢ | 59¢ |
| Sum of mids | | | 102¢ ≠ 100¢ |

The 2¢ excess is not an arbitrage opportunity: buying YES at the ask (44¢) and NO at the ask (61¢) costs 105¢ for a guaranteed 100¢ payout, and selling both at the bid (42¢ + 57¢) raises only 99¢.

Arbitrage-Free Bounds
  • $\text{yes\_ask} + \text{no\_ask} \geq \$1$ — you cannot buy both sides for less than the $1 payout.
  • $\text{yes\_bid} + \text{no\_bid} \leq \$1$ — you cannot sell both sides for more than $1.

These constraints are enforced by the matching engine and do not force mid-prices to sum to $1.

Edge against the correct side's price

Edge Definitions
$$\text{edge}_{\text{YES}} = \hat{p}_{\text{YES}} - p_{\text{YES,mkt}}$$ $$\text{edge}_{\text{NO}} = (1 - \hat{p}_{\text{YES}}) - p_{\text{NO,mkt}}$$

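Because $p_{\text{YES,mkt}} + p_{\text{NO,mkt}} \neq 1$, YES and NO edges are not equal-and-opposite: their sum is $\text{edge}_{\text{YES}} + \text{edge}_{\text{NO}} = 1 - (p_{\text{YES,mkt}} + p_{\text{NO,mkt}})$. When the mids sum to less than 1 (100¢), a wide-spread contract can show positive edge on both sides simultaneously; when they sum to more than 1, as in the example above, both edges can be negative.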

Fallback When NO Prices Are Unavailable

If the API omits NO bid/ask, the engine falls back to arb-consistent complements: $\text{no\_bid} = 1 - \text{yes\_ask}$, $\text{no\_ask} = 1 - \text{yes\_bid}$, giving $p_{\text{NO,mkt}} = 1 - p_{\text{YES,mkt}}$. This is noted in the debug log and stored as no_prices_from_api = False.
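The mid-price, fallback, and per-side edge rules above can be sketched in a few lines of Python. This is a minimal illustration, not the engine's code: the `Quote` container and the function names are hypothetical, and prices are assumed to be in dollars on a 0 to 1 scale.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Quote:
    """Hypothetical container for one contract's quotes, in dollars (0-1)."""
    yes_bid: float
    yes_ask: float
    no_bid: Optional[float] = None
    no_ask: Optional[float] = None


def market_mids(q: Quote) -> tuple[float, float, bool]:
    """Return (p_yes_mkt, p_no_mkt, no_prices_from_api)."""
    p_yes = (q.yes_bid + q.yes_ask) / 2
    if q.no_bid is None or q.no_ask is None:
        # Arb-consistent complements when the API omits NO quotes.
        no_bid, no_ask = 1 - q.yes_ask, 1 - q.yes_bid
        return p_yes, (no_bid + no_ask) / 2, False
    return p_yes, (q.no_bid + q.no_ask) / 2, True


def side_edges(p_hat_yes: float, q: Quote) -> tuple[float, float]:
    """Edge on each side, measured against that side's own mid-price."""
    p_yes, p_no, _ = market_mids(q)
    return p_hat_yes - p_yes, (1 - p_hat_yes) - p_no
```

With the quotes from the table above and a hypothetical model value of 0.45, `side_edges` gives roughly +0.02 on YES and -0.04 on NO; the two sum to 1 - 1.02 = -0.02, as the edge-sum identity predicts.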

§ 03 Futures Price Strip

A futures strip is the set of quoted futures prices across all liquid delivery months for a commodity. For WTI crude (root CL):

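| Delivery Month | Ticker (yfinance) | Close Price |
|---|---|---|
| May 2026 | CLK26.NYM | $62.40 |
| Jun 2026 | CLM26.NYM | $62.10 |
| Dec 2026 | CLZ26.NYM | $60.80 |

Ticker construction follows CME naming conventions: root + month code + 2-digit year + exchange suffix.

| Code | F | G | H | J | K | M | N | Q | U | V | X | Z |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Month | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |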

Exchange suffixes: .NYM (NYMEX energy/metals), .CMX (COMEX precious metals), .CBT (CBOT grains), .CME (CME livestock/indices/crypto).
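A sketch of the ticker construction rule, assuming the caller already knows the root and exchange suffix for each commodity (the function name is hypothetical):

```python
MONTH_CODES = "FGHJKMNQUVXZ"  # CME month codes, January through December


def futures_ticker(root: str, year: int, month: int, suffix: str) -> str:
    """Build root + month code + 2-digit year + exchange suffix."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be 1-12, got {month}")
    return f"{root}{MONTH_CODES[month - 1]}{year % 100:02d}{suffix}"


# futures_ticker("CL", 2026, 5, ".NYM") -> "CLK26.NYM"
```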

§ 03b Pyth Network Spot Price: Gold & Silver

Two series — KXGOLDD and KXSILVERD — settle on a Pyth Network spot price feed, not on COMEX futures. Kalshi's specifications for these series reference Pyth's 1-minute candle close for XAU/USD and XAG/USD respectively. Using the COMEX futures price as the anchor for these contracts would be systematically wrong.

Settlement Kinds

| Series | settlement_kind | Point-Estimate Source | Vol Source |
|---|---|---|---|
| KXWTI, KXBRENTD, KXNATGASD, KXCOPPERD | futures | yfinance strip | yfinance continuous front-month |
| KXGOLDD | pyth_spot | Pyth XAU/USD (765d2b…) | yfinance GC=F historical vol |
| KXSILVERD | pyth_spot | Pyth XAG/USD (f2fb02…) | yfinance SI=F historical vol |

How the Pyth Anchor Is Applied

For pyth_spot series, the pricing flow calls data/pyth.py to retrieve the current Pyth spot price and then calls apply_spot_anchor(strip, spot_price), which replaces the strip's front-month price with the live Pyth spot and returns the adjusted strip plus computed basis:

Spot Anchor Substitution
$$F_{\text{front}}^{\text{adj}} = S_{\text{Pyth}} \qquad \text{(front-month price replaced)}$$ $$\text{basis} = F_{\text{front}} - S_{\text{Pyth}} \qquad \text{(stored in distribution\_params)}$$
Failure Is Fatal for pyth_spot Series

If data/pyth.py raises PythAPIError (HTTP failure, stale publish time, non-positive price), the pricing call raises PricingError — it does not silently fall back to the COMEX futures price. A stale or wrong underlying is worse than no price at all.

Pyth Staleness Guard

Each price returned by get_spot_price() is validated against its publish_time field. A price whose publish timestamp is more than 180 seconds behind wall-clock is rejected with PythAPIError — mirroring the yfinance frozen-data guard in data/futures.py.
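The anchor substitution, the fail-closed rule, and the staleness guard can be sketched together as below. This is an illustration under stated assumptions, not the contents of data/pyth.py: the strip is modelled as a date-sorted list of (delivery_date, price) pairs, `fetch_spot` is a hypothetical callable returning an already-validated spot price, and the exception classes are stand-ins that reuse the document's names.

```python
from datetime import date


class PythAPIError(Exception):
    """Stand-in for the error raised by data/pyth.py."""


class PricingError(Exception):
    """Stand-in for the pricing-layer error."""


MAX_PYTH_AGE_S = 180  # staleness limit from the text, in seconds


def validate_pyth_price(price: float, publish_time: float, now: float) -> float:
    """Reject non-positive or stale prices; times are Unix seconds."""
    if price <= 0:
        raise PythAPIError(f"non-positive price: {price}")
    if now - publish_time > MAX_PYTH_AGE_S:
        raise PythAPIError(f"stale price: published {now - publish_time:.0f}s ago")
    return price


def apply_spot_anchor(strip: list[tuple[date, float]], spot_price: float):
    """Replace the front-month price with the spot; return (adjusted_strip, basis)."""
    (front_date, front_price), *rest = strip
    basis = front_price - spot_price
    return [(front_date, spot_price), *rest], basis


def anchored_strip(strip, fetch_spot):
    """Fail closed: a Pyth failure becomes a PricingError, never a COMEX fallback."""
    try:
        spot = fetch_spot()
    except PythAPIError as exc:
        raise PricingError(f"Pyth spot unavailable: {exc}") from exc
    return apply_spot_anchor(strip, spot)
```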

§ 04 Forward Price Interpolation

The key input to the pricing formula is the forward price $F$ — the expected futures price at the contract's settlement month. If the exact settlement month is in the strip, it is used directly. Otherwise we interpolate in log-price space:

Log-Price Linear Interpolation
$$\ln F_T = \ln F_{t_1} + \frac{T - t_1}{t_2 - t_1} \left(\ln F_{t_2} - \ln F_{t_1}\right)$$ $$F_T = \exp\!\left(\ln F_T\right)$$

where $t_1 \le T \le t_2$ are the nearest delivery dates bracketing the settlement date, measured in calendar ordinals. Interpolating in log-price rather than price space preserves positivity and is consistent with the log-normal dynamics. When $T$ is beyond the last strip date, the last price is used (flat extrapolation).
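A minimal Python sketch of the interpolation rule. It matches on exact delivery dates rather than months and applies the same flat rule to settlement dates before the first strip date; both are simplifying assumptions, and `interpolate_forward` is a hypothetical name.

```python
import math
from bisect import bisect_left
from datetime import date


def interpolate_forward(strip: list[tuple[date, float]], settle: date) -> float:
    """Log-linear interpolation of a date-sorted futures strip to `settle`."""
    ts = [d.toordinal() for d, _ in strip]   # calendar ordinals
    ps = [p for _, p in strip]
    t = settle.toordinal()
    if t <= ts[0]:
        return ps[0]        # before the first delivery: flat (an assumption)
    if t >= ts[-1]:
        return ps[-1]       # beyond the last delivery: flat extrapolation
    i = bisect_left(ts, t)  # ts[i - 1] < t <= ts[i]
    if ts[i] == t:
        return ps[i]        # exact delivery date: use the quote directly
    w = (t - ts[i - 1]) / (ts[i] - ts[i - 1])
    log_f = math.log(ps[i - 1]) + w * (math.log(ps[i]) - math.log(ps[i - 1]))
    return math.exp(log_f)
```

With quotes of 50 and 200 ten days apart, the forward at the midpoint date is their geometric mean, 100, rather than the arithmetic mean of 125.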

§ 05 Volatility Estimation

Volatility $\sigma$ is the annualised standard deviation of log-returns. Two estimation regimes apply depending on time-to-expiry:

Long-Term Regime (DTE ≥ 7 Days)

252 calendar days of daily closing prices are fetched for the continuous front-month contract (ticker {root}=F). Log-returns are computed and annualised:

Daily Realised Volatility
$$r_i = \ln\!\left(\frac{S_i}{S_{i-1}}\right), \quad i = 1, \ldots, N$$ $$\sigma_{\text{daily}} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\!\left(r_i - \bar{r}\right)^2}$$ $$\sigma_{\text{annual}} = \sigma_{\text{daily}} \times \sqrt{252}$$

The $\sqrt{252}$ annualisation assumes 252 trading days per year. The engine clips the result to $[0.05,\ 2.00]$ to prevent extreme values from degenerate data.

Short-Term Regime (DTE < 7 Days)

For short-dated contracts, 5 days of hourly price data are used, with an asset-class-specific annualisation factor:

Hourly Realised Volatility
$$\sigma_{\text{hourly}} = \text{std}\!\left(\ln\!\left(\frac{S_t}{S_{t-1}}\right)\right) \quad \text{(over 1h intervals)}$$ $$\sigma_{\text{annual}} = \sigma_{\text{hourly}} \times \sqrt{H_{\text{year}}}$$ $$H_{\text{year}} = \begin{cases} 252 \times 17 = 4{,}284 & \text{equity indices (RTH only)} \\ 365 \times 23 = 8{,}395 & \text{commodities, crypto, metals} \end{cases}$$

Energy and commodity markets trade nearly around the clock (one maintenance hour excluded). Using asset-class-specific $H_{\text{year}}$ prevents over- or under-stating intraday vol.
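Both regimes reduce to the same computation with a different annualisation factor. A sketch in Python, assuming closing prices arrive as a plain list; the asset-class keys are hypothetical labels, and applying the [0.05, 2.00] clip in the hourly regime as well is an assumption (the text states it only for the daily regime).

```python
import math
import statistics

DAILY_PERIODS = 252                 # trading days per year
HOURLY_PERIODS = {
    "equity_index": 252 * 17,       # 4,284 hours
    "commodity": 365 * 23,          # 8,395 hours (energy, metals, ag, crypto)
}


def log_returns(prices: list[float]) -> list[float]:
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def realised_vol(prices: list[float], periods_per_year: int,
                 lo: float = 0.05, hi: float = 2.00) -> float:
    """Sample std (N - 1 denominator) of log-returns, annualised, then clipped."""
    sigma = statistics.stdev(log_returns(prices))
    return min(max(sigma * math.sqrt(periods_per_year), lo), hi)


def daily_vol(closes: list[float]) -> float:
    return realised_vol(closes, DAILY_PERIODS)


def hourly_vol(closes: list[float], asset_class: str) -> float:
    return realised_vol(closes, HOURLY_PERIODS[asset_class])
```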

§ 06 The Log-Normal Price Model

The engine models the terminal price $S_T$ as log-normally distributed under the risk-neutral measure $\mathbb{Q}$:

Risk-Neutral Log-Normal Dynamics
$$S_T = F \cdot \exp\!\left(-\tfrac{1}{2}\sigma_T^2 + \sigma_T Z\right), \quad Z \sim \mathcal{N}(0,1)$$

Equivalently, the log of the terminal price is normally distributed:

Log-Normal Terminal Distribution
$$\ln S_T \sim \mathcal{N}\!\left(\ln F - \tfrac{1}{2}\sigma_T^2,\ \sigma_T^2\right)$$

This is the Black (1976) futures pricing model applied to binary options — the industry standard for commodity options and OTC derivatives on futures.

Why Use the Futures Price as the Mean?

Under $\mathbb{Q}$, the futures price is an unbiased expectation of the future spot price (no risk premium needed). Using $F$ rather than the current spot $S_0$ correctly accounts for the term structure — e.g., oil in backwardation where $F(T) < S_0$.
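A quick Monte Carlo check makes the role of the -½σ_T² term concrete: it is exactly the drift that keeps the mean of S_T equal to F, while pushing the median below F. This simulation is illustrative only and not part of the engine; the inputs are hypothetical.

```python
import math
import random


def sample_terminal(F: float, sigma_T: float, n: int, seed: int = 0) -> list[float]:
    """Draw S_T = F * exp(-0.5 * sigma_T**2 + sigma_T * Z), Z ~ N(0, 1)."""
    rng = random.Random(seed)
    return [F * math.exp(-0.5 * sigma_T**2 + sigma_T * rng.gauss(0.0, 1.0))
            for _ in range(n)]


draws = sample_terminal(F=62.10, sigma_T=0.15, n=100_000)
mean = sum(draws) / len(draws)           # close to F: the futures price is the Q-mean
median = sorted(draws)[len(draws) // 2]  # close to F * exp(-0.5 * 0.15**2), below F
```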

§ 07 The d₂ Formula: Core Probability Calculation

Given the log-normal model, the probability that $S_T$ exceeds the strike $K$ has a closed-form solution — the digital option pricing formula, derived by integrating the log-normal density above $K$:

Binary Option Probability — Above
$$\mathbb{Q}(S_T > K) = N(d_2)$$ $$d_2 = \frac{\ln\!\left(\dfrac{F}{K}\right) - \dfrac{1}{2}\sigma_T^2}{\sigma_T}$$
Binary Option Probability — Below
$$\mathbb{Q}(S_T < K) = N(-d_2) = 1 - N(d_2)$$

where $N(\cdot)$ is the standard normal CDF $N(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-t^2/2} \, dt$.
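The closed form translates directly into Python, using the standard library's `statistics.NormalDist` for N(·). A sketch with hypothetical function names; `sigma_T` must already be the total volatility σ√T and must be positive.

```python
import math
from statistics import NormalDist

_std_normal = NormalDist()  # mu=0, sigma=1; .cdf(x) is N(x)


def d2(F: float, K: float, sigma_T: float) -> float:
    return (math.log(F / K) - 0.5 * sigma_T**2) / sigma_T


def digital_prob(F: float, K: float, sigma_T: float, above: bool = True) -> float:
    """Q(S_T > K) = N(d2) for an above contract; Q(S_T < K) = N(-d2) for below."""
    z = d2(F, K, sigma_T)
    return _std_normal.cdf(z) if above else _std_normal.cdf(-z)


# At the money with sigma_T = 0.10: d2 = -0.05, so N(d2) is about 0.480.
```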

Derivation

We want $\mathbb{Q}(S_T > K)$. Since $\ln S_T \sim \mathcal{N}(\mu_T, \sigma_T^2)$ with $\mu_T = \ln F - \frac{1}{2}\sigma_T^2$:

$$\mathbb{Q}(S_T > K) = \mathbb{Q}(\ln S_T > \ln K) = \mathbb{Q}\!\left(\frac{\ln S_T - \mu_T}{\sigma_T} > \frac{\ln K - \mu_T}{\sigma_T}\right)$$

Let $Z = (\ln S_T - \mu_T)/\sigma_T \sim \mathcal{N}(0,1)$. Then:

$$\frac{\ln K - \mu_T}{\sigma_T} = \frac{\ln K - \ln F + \tfrac{1}{2}\sigma_T^2}{\sigma_T} = -d_2$$ $$\Rightarrow\ \mathbb{Q}(S_T > K) = \mathbb{Q}(Z > -d_2) = N(d_2) \qquad \blacksquare$$

Intuitive Interpretation of d₂

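| Scenario | $d_2$ | $N(d_2)$ | Interpretation |
|---|---|---|---|
| Deep ITM: $F \gg K$ | $\to +\infty$ | $\to 1.0$ | Almost certain YES |
| At-the-money: $F = K$ | $= -\tfrac{1}{2}\sigma_T$ | $< 0.5$ ($\approx 0.47$–$0.50$ for $\sigma_T \le 0.15$) | Slight negative bias from log-normal skew: the median of $S_T$ is $F e^{-\sigma_T^2/2} < F$ |
| Deep OTM: $F \ll K$ | $\to -\infty$ | $\to 0.0$ | Almost certain NO |
| At expiry: $T \to 0$, $F \neq K$ | $\to \pm\infty$ | $\to 0$ or $1$ | Deterministic outcome |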

Relationship to Black-Scholes d₁ and d₂

In the Black (1976) model for a vanilla call on a futures contract, $d_1 = d_2 + \sigma_T$ and the undiscounted call value is $C = F \cdot N(d_1) - K \cdot N(d_2)$, where $N(d_2)$ is the risk-neutral probability of exercise and $N(d_1)$ is the call's delta with respect to $F$. For our cash-or-nothing digital there is no delivery of the underlying, so only $N(d_2)$ matters.

Key Insight

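A Kalshi binary contract is priced using only $N(d_2)$ from Black-Scholes. There is no $d_1$ term because the payoff is a fixed $\$1$, not the underlying price. This makes the formula simpler, but not less model-dependent: a digital's value equals minus the strike-derivative of the undiscounted call price, so near the money it is sensitive to the volatility input and to any volatility smile around $K$.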

Total Volatility $\sigma_T$

Long-Term (DTE in Days)
$$T = \frac{\tau}{365}, \quad \sigma_T = \sigma_{\text{annual}} \sqrt{T}$$
Short-Term (DTE in Hours, Commodities/Metals)
$$T = \frac{h}{365 \times 23}, \quad \sigma_T = \sigma_{\text{annual}} \sqrt{T}$$

Two floors prevent degenerate $\sigma_T = 0$ at expiry: long-term mode floors DTE at 1 day ($\tau_{\min} = 1$); short-term mode floors $h$ at 0.25 hours (15 minutes). One exception: the curve endpoint at $\tau = 0$ is computed before any floor, using the deterministic boundary rule $p(0) = \mathbf{1}_{F > K}$, ensuring DTE=0 shows 100%/0% exactly.
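A sketch of the time scaling with both floors (hypothetical function name). The deterministic τ = 0 endpoint is deliberately not handled here, since it is computed before any floor is applied.

```python
import math
from typing import Optional


def total_vol(sigma_annual: float, dte_days: Optional[float] = None,
              hours: Optional[float] = None) -> float:
    """sigma_T = sigma * sqrt(T), with the 1-day and 15-minute floors applied.

    Pass `hours` for short-term mode, otherwise `dte_days` for long-term mode.
    """
    if hours is not None:
        T = max(hours, 0.25) / (365 * 23)   # short-term: floor at 15 minutes
    elif dte_days is not None:
        T = max(dte_days, 1) / 365          # long-term: floor at 1 day
    else:
        raise ValueError("pass dte_days or hours")
    return sigma_annual * math.sqrt(T)
```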

§ 08 Time-Value Decay Curve

The distribution fit gives $p_{\text{base}}$ — the probability evaluated at today's DTE. As expiry approaches, the contract's probability drifts toward its final 0 or 1 value. The time-value decay curve models this path.

For each DTE $\tau$ from today down to expiry, the engine applies the same log-normal formula with the remaining time:

Decay Curve Formula
$$p(\tau) = N\!\left(d_2(\tau)\right), \quad d_2(\tau) = \frac{\ln(F/K) - \tfrac{1}{2}\sigma^2 (\tau/365)}{\sigma \sqrt{\tau/365}}$$

At $\tau = 0$, the outcome is deterministic: $p(0) = \mathbf{1}_{F > K}$. The full curve is a list of $(\tau,\ p(\tau))$ pairs.
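The long-term decay curve can be sketched as below, assuming one point per whole day of DTE (the step size is an assumption) and an above contract. For F > K the resulting probabilities rise monotonically toward 1 as τ falls, because both terms of d₂(τ) grow as τ shrinks.

```python
import math
from statistics import NormalDist


def decay_curve(F: float, K: float, sigma_annual: float, dte_today: int):
    """Return [(tau, p(tau))] from today's DTE down to 0, with F held constant."""
    N = NormalDist().cdf
    curve = []
    for tau in range(dte_today, -1, -1):
        if tau == 0:
            p = 1.0 if F > K else 0.0           # deterministic endpoint, no floor
        else:
            t = tau / 365
            d2 = (math.log(F / K) - 0.5 * sigma_annual**2 * t) / (sigma_annual * math.sqrt(t))
            p = N(d2)
        curve.append((tau, p))
    return curve
```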

Shape of the Curve

Simplifying Assumption

The forward price is held constant along the curve — the strip is not re-interpolated at each DTE. In practice, as expiry approaches, the relevant forward price shifts toward the prompt-month contract.

§ 09 Scenario Analysis (±1σ Shocks)

Three scenario probabilities are computed by shocking the forward price by ±1 standard deviation of the log-return over the remaining life:

Vol Shock on Forward Price
$$F_{\text{bull}} = F \cdot e^{+\sigma_T}, \quad F_{\text{bear}} = F \cdot e^{-\sigma_T}, \quad F_{\text{base}} = F$$
Scenario Probabilities
$$p_{\text{bull}} = N\!\left(\frac{\ln(F_{\text{bull}}/K) - \tfrac{1}{2}\sigma_T^2}{\sigma_T}\right)$$ $$p_{\text{base}} = N\!\left(\frac{\ln(F/K) - \tfrac{1}{2}\sigma_T^2}{\sigma_T}\right)$$ $$p_{\text{bear}} = N\!\left(\frac{\ln(F_{\text{bear}}/K) - \tfrac{1}{2}\sigma_T^2}{\sigma_T}\right)$$

Note that $\sigma_T$ in the denominator of $d_2$ is unchanged — we shock the expected price path while keeping residual uncertainty fixed. This gives a spread of outcomes bracketing the base case.
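Because ln(F·e^{±σ_T}/K) = ln(F/K) ± σ_T while the denominator stays σ_T, each shock moves d₂ by exactly one unit: p_bull = N(d₂ + 1) and p_bear = N(d₂ − 1). The Python sketch below (hypothetical names) applies the shocks literally; the docstring records the shortcut.

```python
import math
from statistics import NormalDist

N = NormalDist().cdf


def scenario_probs(F: float, K: float, sigma_T: float) -> dict[str, float]:
    """Bull/base/bear probabilities from +/-1 sigma_T shocks to the forward.

    Equivalent to N(d2 + 1), N(d2), N(d2 - 1) for an above contract.
    """
    def p(forward: float) -> float:
        return N((math.log(forward / K) - 0.5 * sigma_T**2) / sigma_T)

    return {
        "bull": p(F * math.exp(sigma_T)),
        "base": p(F),
        "bear": p(F * math.exp(-sigma_T)),
    }
```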

Directionality for NO Side

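Bull/bear labels describe the direction of the price shock, not the result for the position. For any side that pays when the price settles low (for example YES on a "below" contract), a bullish move hurts the position, so the labels are inverted. The terminal report applies this inversion automatically when rendering.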

§ 10 Aggregation: Final Fair Value

Aggregation Pipeline
$$p_{\text{base}} \xrightarrow{\text{time-value at today's DTE}} p_{\text{tv}} = \hat{p}$$
  1. $p_{\text{base}}$: log-normal probability from the distribution fit. Anchors the estimate.
  2. $p_{\text{tv}}$: the decay-curve value at today's DTE. Mathematically equivalent to $p_{\text{base}}$ under the same time scaling, but the time-value module also produces the full $\{\tau,\ p(\tau)\}$ curve for visualisation and forward simulation.
  3. $\hat{p}$: the final engine fair value, equal to $p_{\text{tv}}$. If the time-value module fails, the aggregator falls back to $\hat{p} = p_{\text{base}}$.

Scenarios (bull/base/bear) are computed separately using $\pm 1\sigma_T$ forward shocks — see §9. They are reported alongside $\hat{p}$ but do not participate in its definition.

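§ 11 Short-Term Mode (DTE < 7 Days)

Contracts expiring within 7 days receive special treatment: time to expiry is measured in fractional hours rather than whole days, and volatility is estimated from hourly rather than daily returns (§5).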

Short-Term Time Scaling
$$T = \frac{h}{365 \times 23} \quad \text{where } h = \text{time-to-expiry in hours (fractional)}$$ $$\sigma_T = \sigma_{\text{annual,hourly}} \times \sqrt{T}$$

$h$ is computed from the Kalshi API's close_time field (ISO 8601 UTC) minus current UTC time, giving sub-hour precision. The decay curve in short-term mode shows 8 evenly-spaced hourly points from now to expiry.
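A sketch of the short-term time inputs. Reading "8 evenly-spaced points" as including both now and expiry is an assumption, as are the function names.

```python
from datetime import datetime, timezone
from typing import Optional


def hours_to_expiry(close_time: str, now: Optional[datetime] = None) -> float:
    """Fractional hours from `now` (UTC) until an ISO 8601 UTC close_time."""
    # fromisoformat() accepts a trailing "Z" only from Python 3.11, so normalise it.
    close = datetime.fromisoformat(close_time.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return (close - now).total_seconds() / 3600


def short_term_T(h: float) -> float:
    """Year fraction on a 23-hour trading day, with the 15-minute floor."""
    return max(h, 0.25) / (365 * 23)


def curve_hours(h: float, points: int = 8) -> list[float]:
    """Evenly spaced hours-to-expiry from now (h) down to expiry (0)."""
    return [h * (points - 1 - i) / (points - 1) for i in range(points)]
```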

§ 12 Volatility Annualisation: Regime Reference

| Asset Class | DTE Regime | Data Period | Interval | Annualisation |
|---|---|---|---|---|
| All (long-term) | ≥ 7 days | 252 days | 1d | $\times\sqrt{252}$ |
| Energy, Metals, Crypto, Ag | < 7 days | 5 days | 1h | $\times\sqrt{365 \times 23}$ |
| Equity Indices | < 7 days | 5 days | 1h | $\times\sqrt{252 \times 17}$ |

The 23h/day assumption accounts for the maintenance window in electronic futures markets (typically 1 hour around 5–6 PM ET). Equity index futures use 17h because extended-hours sessions have much lower liquidity.

§ 13 Edge Score

Edge Definition
$$\text{edge} = \hat{p} - p_{\text{mkt}}$$

The market price $p_{\text{mkt}}$ is always sourced from the Kalshi API mid-price (never entered manually). Interpretation:

| Edge | Signal | Implication |
|---|---|---|
| $> +5\%$ | Strong YES edge | Market underpricing YES; consider buying YES |
| $+1\%$ to $+5\%$ | Mild YES edge | Modest mispricing on YES side |
| $\pm 1\%$ | Fairly priced | No strong signal |
| $-1\%$ to $-5\%$ | Mild NO edge | Market underpricing NO; consider buying NO |
| $< -5\%$ | Strong NO edge | Market overpricing YES; strong signal for NO |
Edge Is a Model Output, Not a Guarantee

The log-normal model is a simplification. Real commodity prices exhibit jumps, fat tails, and mean-reversion the model does not capture. Positive edge is a signal to investigate, not a mechanical trading rule.

§ 14 Backtesting and Model Evaluation

Once a contract settles, the engine evaluates the prediction using proper scoring rules — mathematical measures that reward honest probability estimates and cannot be gamed by reporting extreme probabilities.

Brier Score

Brier Score
$$\text{BS} = \frac{1}{N} \sum_{i=1}^{N} \left(\hat{p}_i - o_i\right)^2$$

where $\hat{p}_i \in [0,1]$ is the model's fair value and $o_i \in \{0, 1\}$ is the actual outcome. Reference values: $0$ is a perfect forecast, and a constant forecast of $0.5$ scores $0.25$ regardless of outcomes.

Brier Skill Score (BSS)

Brier Skill Score vs. Market
$$\text{BSS} = 1 - \frac{\text{BS}_{\text{model}}}{\text{BS}_{\text{market}}}$$

BSS > 0 means the model beats the market price as a probability estimator. BSS is the primary long-run measure of whether the engine adds value beyond using the market mid as your forecast.
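Both scores in Python, assuming the model fair values, market mids, and outcomes are aligned lists over the same resolved runs (function names hypothetical):

```python
def brier(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecasts in [0, 1] and outcomes in {0, 1}."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill(model: list[float], market: list[float], outcomes: list[int]) -> float:
    """BSS > 0 when the model beats the market mid as a forecaster."""
    return 1 - brier(model, outcomes) / brier(market, outcomes)
```

For two contracts that resolve YES then NO, a model forecasting 0.8 and 0.2 scores 0.04, against 0.16 for a market at 0.6 and 0.4, so BSS = 1 − 0.04/0.16 = 0.75.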

Calibration

A model is well-calibrated if, across all contracts where it predicted probability $p$, the event actually occurred $p$ fraction of the time. The engine bins predictions into 10pp buckets:

$$\text{event rate in bucket } b = \frac{\sum_{i \in b} o_i}{\left|b\right|}$$
| Calibration Pattern | Diagnosis |
|---|---|
| Event rate consistently above diagonal | Model underestimates probability (too bearish) |
| Event rate consistently below diagonal | Model overestimates probability (too bullish) |
| Curve flatter than the diagonal: event rate above it at low predictions, below it at high predictions | Model is over-confident: probabilities too extreme |
| Curve steeper than the diagonal: event rate below it at low predictions, above it at high predictions | Model is under-confident: probabilities too conservative |
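The 10pp binning as a sketch. Placing a forecast of exactly 1.0 in the top bucket and keying buckets by their (lower, upper) bounds are assumptions; the function name is hypothetical.

```python
from collections import defaultdict


def calibration_table(probs: list[float], outcomes: list[int], n_bins: int = 10):
    """Map (lower, upper) bucket bounds to the observed event rate in that bucket."""
    buckets = defaultdict(list)
    for p, o in zip(probs, outcomes):
        b = min(int(p * n_bins), n_bins - 1)   # p = 1.0 joins the top bucket
        buckets[b].append(o)
    return {
        (b / n_bins, (b + 1) / n_bins): sum(os) / len(os)
        for b, os in sorted(buckets.items())
    }
```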

Mean Absolute Error and Bias

MAE and Bias
$$\text{MAE} = \frac{1}{N}\sum_{i=1}^{N} \left|\hat{p}_i - o_i\right|$$ $$\text{Bias} = \frac{1}{N}\sum_{i=1}^{N} \left(\hat{p}_i - o_i\right)$$

Positive bias means the model systematically overestimates the probability of the event occurring.

Edge Direction Accuracy

Directional Accuracy
$$\text{edge\_correct}_i = \begin{cases} 1 & \text{if } \text{edge}_i > 0 \text{ and } o_i = 1 \\ 1 & \text{if } \text{edge}_i < 0 \text{ and } o_i = 0 \\ 0 & \text{otherwise} \end{cases}$$ $$\text{Edge Accuracy} = \frac{1}{N}\sum_{i=1}^N \text{edge\_correct}_i$$

PnL per Dollar Risked

Realised PnL per Dollar
$$\text{PnL} = o - p_{\text{mkt}}$$

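Averaging this over all runs in an edge bucket gives the realised expected value of acting on that edge signal. Here "per dollar" means per contract with a $\$1$ face value: $o - p_{\text{mkt}}$ is the profit on one contract, while the return per dollar actually spent is $(o - p_{\text{mkt}})/p_{\text{mkt}}$, the form used in the bankroll update of §19. Positive average PnL in the high-edge bucket validates the model's alpha.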
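The remaining metrics are one-liners. A combined sketch, again assuming aligned lists (names hypothetical); an edge of exactly zero counts as incorrect, following the case definition above.

```python
def mae(probs: list[float], outcomes: list[int]) -> float:
    return sum(abs(p - o) for p, o in zip(probs, outcomes)) / len(probs)


def bias(probs: list[float], outcomes: list[int]) -> float:
    """Positive when the model overestimates the event probability."""
    return sum(p - o for p, o in zip(probs, outcomes)) / len(probs)


def edge_accuracy(edges: list[float], outcomes: list[int]) -> float:
    hits = sum(1 for e, o in zip(edges, outcomes)
               if (e > 0 and o == 1) or (e < 0 and o == 0))
    return hits / len(edges)


def mean_pnl(market_probs: list[float], outcomes: list[int]) -> float:
    """Average o - p_mkt: realised profit per YES contract bought at the mid."""
    return sum(o - p for p, o in zip(market_probs, outcomes)) / len(outcomes)
```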

Why Proper Scoring Rules Matter

The Brier score is strictly proper: its expected value is minimised only when the forecast equals the true probability. A model that shades its probabilities toward 0 or 1 to look more decisive will increase its expected Brier score, so a BSS improvement reflects genuine forecasting skill rather than an artefact of reporting strategy.
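Expanding the expectation shows why. If the event has true probability $p$ and the model reports $q$ (notation introduced for this step):

$$\mathbb{E}\!\left[(q - o)^2\right] = p\,(1 - q)^2 + (1 - p)\,q^2 = (q - p)^2 + p\,(1 - p)$$

The second term does not depend on $q$, so the expected score is minimised exactly at $q = p$, and any shading toward 0 or 1 adds $(q - p)^2$.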

§ 15 Agent Pipeline Overview

The autonomous strategy agent runs a continuous scan-price-signal-size-record loop, evaluating all open Kalshi contracts and recording sizing decisions in shadow mode by default. Every decision is persisted to the strategy_decisions table with full provenance so the loop can be evaluated like a live trading system.

  1. Scan: list all open Kalshi contracts across every configured series via the public API.
  2. Filter: apply six lightweight pre-pricing checks (priceable series, deduplication, volume, mid-price range, TTX/DTE window, and spread quality).
  3. Price: run the full pricing engine for each contract that passes the filters, giving raw_edge = fair_value − market_price.
  4. Confidence Regression: look up the OLS coefficients (α, β) fitted on historical resolved runs for this segment, giving adjusted_edge = α + β × raw_edge.
  5. Classify: compare adjusted_edge to the threshold θ and label the signal watch (below θ), auto-buy ([θ, 2θ)), or clear-buy (≥ 2θ). No per-cycle LLM call is made.
  6. Size: compute the position size under two parallel strategies, fractional Kelly and a fixed dollar amount.
  7. Record: write the decision to strategy_decisions and update the bankroll curve. After the contract settles, auto-resolve it and compute PnL.

Shadow Mode

kelly_action and fixed_action are written to the database as if trades were placed, but no order is submitted to Kalshi. The agent simulates a live portfolio, with PnL realised when each contract settles. Live mode is enabled by --live or AGENT_LIVE_MODE=true.

§ 16 Pre-Signal Filters

Six filters are applied before any pricing or database lookup — ordered cheapest to most expensive. The first failing check short-circuits evaluation.

| # | Filter | Default Threshold | Reason |
|---|---|---|---|
| 1 | Priceable series | futures_root ≠ "" | Series must map to a futures root; unknown series are skipped entirely. |
| 2 | Deduplication | Not in v_open_positions | Avoid accumulating duplicate positions in the same contract. |
| 3 | Minimum volume | 500 contracts | Illiquid contracts have wide spreads and unreliable mid-prices. |
| 4 | Mid-price range | 5% – 95% | Deep OTM/ITM contracts are near-resolved; the edge signal is noisy. |
| 5 | TTX / DTE window | 2h – 26h (short) or 1d – 14d (medium) | Avoid contracts too close to expiry (no time to act) or too far out (vol estimates unreliable). |
| 6 | Spread quality | yes_spread / yes_mid ≤ 8% | A wide bid/ask makes the mid-price an unreliable proxy for fair value. |

TTX Regime Split

A contract is classified as short-term if dte_hours ≤ 26 (approximately one trading day). Short-term contracts use hourly volatility and TTX buckets. Medium-term contracts use daily DTE and daily vol. The agent applies separate TTX/DTE range filters for each regime.
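A short-circuiting sketch of the six filters with their default thresholds. The `Candidate` fields and the returned filter names are hypothetical, and the `open_tickers` set stands in for a lookup against v_open_positions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical pre-pricing view of one contract (prices in dollars, 0-1)."""
    ticker: str
    futures_root: str
    volume: int
    yes_bid: float
    yes_ask: float
    hours_to_expiry: float


def in_ttx_window(hours: float) -> bool:
    if hours <= 26:                          # short-term regime
        return 2 <= hours <= 26
    return 1 <= hours / 24 <= 14             # medium-term regime, in days


def first_failing_filter(c: Candidate, open_tickers: set[str]) -> Optional[str]:
    """Run the six filters cheapest-first; return the first failure, or None."""
    mid = (c.yes_bid + c.yes_ask) / 2
    checks = [
        ("priceable_series", lambda: c.futures_root != ""),
        ("duplicate", lambda: c.ticker not in open_tickers),
        ("min_volume", lambda: c.volume >= 500),
        ("mid_range", lambda: 0.05 <= mid <= 0.95),
        ("ttx_window", lambda: in_ttx_window(c.hours_to_expiry)),
        ("spread", lambda: mid > 0 and (c.yes_ask - c.yes_bid) / mid <= 0.08),
    ]
    for name, passes in checks:
        if not passes():   # lambdas keep later checks unevaluated (short-circuit)
            return name
    return None
```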

§ 17 Confidence Regression and Adjusted Edge

A raw-edge threshold alone ignores whether the pricing model's edge estimates are systematically over- or under-stated in specific market segments. The agent estimates a linear regression from historical raw_edge to realised PnL per dollar risked, fitting two coefficients that capture bias and calibration quality independently.

Training Data

YES Side
$$x_i = \hat{p}_i - p_{\text{YES,mkt},i} \quad \text{(raw YES edge)}$$ $$y_i = o_i - p_{\text{YES,mkt},i} \quad \text{(realised PnL per dollar risked)}$$
NO Side
$$x_i = (1 - \hat{p}_i) - p_{\text{NO,mkt},i} \quad \text{(raw NO edge)}$$ $$y_i = (1 - o_i) - p_{\text{NO,mkt},i} \quad \text{(realised PnL per dollar risked on NO)}$$

OLS Regression

Ordinary Least Squares
$$y = \alpha + \beta \, x + \varepsilon$$ $$\hat{\alpha},\ \hat{\beta} = \underset{\alpha, \beta}{\arg\min} \sum_{i=1}^{N} \left(y_i - \alpha - \beta\, x_i\right)^2$$

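Interpreting the coefficients: $\hat{\alpha}$ is the realised PnL the segment earns at zero raw edge, a constant bias term; $\hat{\beta}$ measures how much of each unit of raw edge is actually realised. $\hat{\alpha} = 0$ with $\hat{\beta} = 1$ means raw edges are realised one-for-one, while $\hat{\beta} < 1$ means the model overstates its edge in that segment.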

Adjusted Edge Formula

Adjusted Edge
$$\text{adjusted\_edge} = \hat{\alpha} + \hat{\beta} \cdot \text{raw\_edge}$$

Fallback Prior

When fewer than min_samples (default 50) resolved runs exist for a segment, the agent falls back to a conservative neutral prior:

$$\hat{\alpha} = 0.0,\quad \hat{\beta} = 0.5 \quad \Rightarrow \quad \text{adjusted\_edge} = 0.5 \times \text{raw\_edge}$$
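The closed-form OLS fit and the fallback prior can be sketched as follows (hypothetical names). Falling back to the prior when every raw edge is identical, so that the slope is undefined, is an assumption not stated in the text.

```python
from statistics import fmean

PRIOR = (0.0, 0.5)  # (alpha, beta) fallback from the text


def fit_confidence(xs: list[float], ys: list[float], min_samples: int = 50):
    """OLS of realised PnL (ys) on raw edge (xs); fall back to the prior."""
    if len(xs) < min_samples:
        return PRIOR
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:                      # no variation in raw edge: slope undefined
        return PRIOR
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = my - beta * mx
    return alpha, beta


def adjusted_edge(raw_edge: float, coeffs: tuple[float, float]) -> float:
    alpha, beta = coeffs
    return alpha + beta * raw_edge
```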

Confidence Hierarchy

| Level | Segment | Example Label |
|---|---|---|
| 1 (most specific) | Series + direction + TTX/DTE bucket | KXGOLDD/above/4-8h |
| 2 | Series + TTX/DTE bucket (any direction) | KXGOLDD/4-8h |
| 3 | Global + TTX/DTE bucket (all series) | global/4-8h |
| 4 (fallback) | Global average (all segments) | global_average |

TTX buckets for short-term: <2h, 2–4h, 4–8h, 8–16h, 16–26h. DTE buckets for medium-term: 1d, 2–3d, 4–7d, 8–14d.
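A sketch of bucket assignment and the hierarchical lookup. It assumes the coefficient table only contains segments that met min_samples, that bucket lower bounds are inclusive, and that labels use the formats shown in the table; `lookup_coeffs` and the dictionary layout are hypothetical.

```python
def ttx_bucket(hours: float) -> str:
    """Short-term TTX bucket label (lower bounds inclusive)."""
    if hours < 2:
        return "<2h"
    if hours < 4:
        return "2-4h"
    if hours < 8:
        return "4-8h"
    if hours < 16:
        return "8-16h"
    return "16-26h"


def dte_bucket(days: int) -> str:
    """Medium-term DTE bucket label."""
    if days <= 1:
        return "1d"
    if days <= 3:
        return "2-3d"
    if days <= 7:
        return "4-7d"
    return "8-14d"


def lookup_coeffs(table: dict, series: str, direction: str, bucket: str):
    """Walk the hierarchy from most to least specific; return (label, coeffs)."""
    for label in (f"{series}/{direction}/{bucket}", f"{series}/{bucket}", f"global/{bucket}"):
        if label in table:
            return label, table[label]
    return "global_average", table["global_average"]
```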

§ 18 Signal Classification

Two thresholds govern classification:

| Tier | Condition | Recorded Action |
|---|---|---|
| Watch | $\text{adjusted\_edge} < \theta$ | watch: zero size, DB row written for analysis |
| Auto-Buy | $\theta \le \text{adjusted\_edge} < 2\theta_0$ | buy: rule-based, proceeds immediately to sizing |
| Clear-Buy | $\text{adjusted\_edge} \ge 2\theta_0$ | buy: same execution as auto-buy; label preserved in analytics for high-confidence tracking |
Sanity Gate — Implausible Edge

Before any buy proceeds to sizing, raw_edge is checked against max_raw_edge_sanity (default 0.25). If raw_edge ≥ 0.25, the classification is forced to watch with skip reason raw_edge_implausible:<value>. Sane liquid markets do not leave 25 percentage points on the table; a reading this extreme almost certainly signals a stale forward price or input error.
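The tiering and the sanity gate as a Python sketch. Here `theta` is the segment's threshold θ and `theta0` the base threshold θ₀ that defines the clear-buy boundary; the function name and the numeric formatting of the skip reason are assumptions.

```python
from typing import Optional


def classify(raw_edge: float, adjusted_edge: float, theta: float, theta0: float,
             max_raw_edge_sanity: float = 0.25) -> tuple[str, Optional[str]]:
    """Return (tier, skip_reason) for one priced contract."""
    if adjusted_edge < theta:
        return "watch", None
    if raw_edge >= max_raw_edge_sanity:           # sanity gate runs only on buys
        return "watch", f"raw_edge_implausible:{raw_edge}"
    if adjusted_edge >= 2 * theta0:
        return "clear-buy", None
    return "auto-buy", None
```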

Why Keep the Clear-Buy Label?

Separating clear-buy from auto-buy in the DB enables the periodic review to analyse whether the model's highest-confidence signals outperform relative to the auto-buy tier — without requiring a per-cycle LLM call to generate the distinction.

LLM Observability (Startup Preflight)

| Status | Condition |
|---|---|
| ok | API key present: periodic review LLM will be available |
| error | Key missing: periodic review runs rule-based findings only, no Claude synthesis |

§ 19 Position Sizing

Kelly Criterion for Binary Contracts

A Kalshi YES contract bought at price $p_{\text{mkt}}$ has payoff structure: win (probability $\hat{p}$) profit = $1 - p_{\text{mkt}}$; lose (probability $1 - \hat{p}$) loss = $p_{\text{mkt}}$. The Kelly fraction maximises expected log-wealth growth. For net odds $b = (1 - p_{\text{mkt}}) / p_{\text{mkt}}$:

Kelly Fraction — Binary Contract
$$f^* = \frac{\hat{p} \cdot b - (1 - \hat{p})}{b} = \frac{\hat{p} - p_{\text{mkt}}}{1 - p_{\text{mkt}}} = \frac{\text{edge}}{1 - p_{\text{mkt}}}$$

A half-Kelly multiplier $\lambda = 0.5$ and a hard cap $f_{\text{max}}$ (default 10% of bankroll) reduce variance and ruin risk:

Final Kelly Fraction and Size
$$f_{\text{adj}} = \lambda \cdot f^* = \frac{\lambda \cdot \text{edge}}{1 - p_{\text{mkt}}}$$ $$f_{\text{final}} = \min\!\left(f_{\text{adj}},\ f_{\text{max}}\right)$$ $$\text{size}_{\text{USD}} = f_{\text{final}} \times W_{\text{avail}}$$

Cash-on-Hand Sizing and EV-Ranked Allocation

Available Kelly Bankroll
$$W_{\text{avail}} = \max\!\left(0,\ W_{\text{snapshot}} - \sum_{i \in \text{open}} \text{size}_{\text{USD},i}\right)$$

Within a single cycle, candidates are funded in decreasing order of $f^*$. After each buy, $W_{\text{avail}}$ is decremented before the next candidate is sized:

Within-Cycle Priority
$$\text{priority}(c) = \frac{\text{adjusted\_edge}_c}{1 - p_{\text{mkt},c}}, \quad p_{\text{mkt},c} < 1$$ $$W_{\text{avail}}^{(k+1)} = \max\!\left(0,\ W_{\text{avail}}^{(k)} - \text{size}_{\text{USD},c_k}\right)$$
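A sketch of the sizing and within-cycle allocation rules, assuming candidates arrive as hypothetical (ticker, adjusted_edge, p_mkt) tuples. It feeds adjusted_edge into the Kelly formula, matching the priority definition; whether sizing itself uses raw or adjusted edge, and returning zero size for non-positive edge, are assumptions.

```python
def kelly_fraction(edge: float, p_mkt: float, lam: float = 0.5, f_max: float = 0.10) -> float:
    """Fractional Kelly for a binary contract bought at p_mkt, capped at f_max."""
    if not 0 < p_mkt < 1 or edge <= 0:
        return 0.0                     # assumption: no position without positive edge
    return min(lam * edge / (1 - p_mkt), f_max)


def allocate(candidates, w_snapshot: float, open_sizes: list[float],
             lam: float = 0.5, f_max: float = 0.10) -> dict[str, float]:
    """Fund candidates in decreasing Kelly priority, shrinking cash after each buy."""
    w_avail = max(0.0, w_snapshot - sum(open_sizes))
    eligible = [c for c in candidates if c[2] < 1]
    ranked = sorted(eligible, key=lambda c: c[1] / (1 - c[2]), reverse=True)
    sizes = {}
    for ticker, edge, p_mkt in ranked:
        size = kelly_fraction(edge, p_mkt, lam, f_max) * w_avail
        sizes[ticker] = size
        w_avail = max(0.0, w_avail - size)
    return sizes
```

With a 1,000 USD snapshot and 200 USD already committed, 800 USD is available: candidate ("B", 0.06, 0.8) has priority 0.30 and is sized first at the 10% cap (80 USD), leaving 720 USD, after which ("A", 0.05, 0.5) at priority 0.10 receives 5% of the remainder, or 36 USD.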
Why Half-Kelly?

Full Kelly maximises long-run growth but produces extreme drawdowns and is highly sensitive to model error. Half-Kelly gives approximately 75% of the growth rate with much lower variance, and is standard practice for strategies with uncertain probability estimates.
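The 75% figure comes from the standard small-edge approximation of expected log-growth, $G(f) \approx \mu f - \tfrac{1}{2} s^2 f^2$, where $\mu$ and $s^2$ are the mean and variance of the per-dollar bet return (notation introduced for this derivation) and the Kelly optimum is $f^* = \mu / s^2$. Betting a multiple $c$ of Kelly gives

$$\frac{G(c f^*)}{G(f^*)} = \frac{c\,\mu^2/s^2 - \tfrac{1}{2}c^2\mu^2/s^2}{\tfrac{1}{2}\mu^2/s^2} = c\,(2 - c)$$

so $c = 0.5$ keeps $0.75$ of the maximum growth rate, while the standard deviation of each bet's dollar outcome halves and its variance falls to a quarter.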

Fixed Strategy

Fixed Position Size
$$\text{size}_{\text{USD}} = \begin{cases} A & \text{if } \text{adjusted\_edge} \ge \theta \\ 0 & \text{otherwise} \end{cases}$$

where $A$ is a constant dollar amount (default $50) and $\theta$ is the same segment-overridden threshold used for Kelly classification. Fixed is a benchmark: lower variance than Kelly but does not scale with edge conviction.

Fixed Never Trades Live

Fixed is a permanent shadow benchmark — it records decisions exactly as Kelly does, but no real Kalshi order is ever placed on behalf of the Fixed strategy.

Bankroll Tracking

Bankroll Update
$$W_{t+1} = W_t + \sum_{\text{resolved at }t} \text{PnL}_i$$ $$\text{PnL}_i = \text{size}_{\text{USD},i} \times \frac{o_i - p_{\text{mkt},i}}{p_{\text{mkt},i}}$$

This is the dollar profit from buying size_USD worth of contracts at $p_{\text{mkt}}$: each dollar spent buys $1/p_{\text{mkt}}$ contracts, each paying $1 or $0 at settlement. Snapshots are append-only and never amended.
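The settlement arithmetic as a sketch (hypothetical names), with each resolved position given as a (size_usd, p_mkt, outcome) tuple:

```python
def settle_pnl(size_usd: float, p_mkt: float, outcome: int) -> float:
    """Dollar PnL of spending size_usd on contracts at p_mkt that settle to 0 or 1."""
    contracts = size_usd / p_mkt           # each dollar buys 1 / p_mkt contracts
    return contracts * outcome - size_usd  # equals size_usd * (o - p) / p


def next_bankroll(w_t: float, resolved: list[tuple[float, float, int]]) -> float:
    """W_{t+1} = W_t + total PnL of the positions resolved at t."""
    return w_t + sum(settle_pnl(s, p, o) for s, p, o in resolved)
```

Spending 100 USD at 40¢ buys 250 contracts: a YES settlement pays 250 USD for a profit of 150, while a NO settlement loses the full 100.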

§ 20 Segment Overrides and Periodic Review

The agent runs a periodic review at most daily, or when review_sample_threshold (default 50) new resolutions have accumulated since the last review. The review identifies market segments where the confidence regression is systematically wrong and applies segment overrides that take effect immediately — no restart required.

Review Process

  1. Load analytics views (v_by_ttx_bucket, v_by_dte_bucket, v_by_moneyness, v_strategy_comparison, etc.)
  2. Flag segments with n ≥ review_min_segment_n (default 30) and edge accuracy below 0.45 or above 0.65.
  3. Call claude-sonnet-4-6 with tool use — the LLM can query allowed analytics views, inspect bankroll curves, and retrieve prior overrides.
  4. LLM returns override recommendations as structured JSON.
  5. Overrides are written to segment_overrides and take effect on the next agent cycle.

Override Types

| Type | Effect on Regression Coefficients |
|---|---|
| confidence_multiplier | $\hat{\beta}' = v \cdot \hat{\beta}$, $\hat{\alpha}$ unchanged. Use $v < 1$ to dampen over-trading; $v > 1$ to boost a reliably undervalued segment. |
| exclude | $\hat{\alpha}' = 0,\ \hat{\beta}' = 0\ \Rightarrow\ \text{adjusted\_edge} = 0$. The segment always produces watch/skip regardless of raw edge. Used for persistently negative realised PnL. |
| min_edge_override | Replaces min_adjusted_edge ($\theta$) for the specific segment. Raises the bar for low-quality segments or lowers it for highly reliable ones. |
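The three override types applied to one segment's inputs, as a sketch. The (type, value) tuple shape and the function name are hypothetical.

```python
from typing import Optional


def apply_override(alpha: float, beta: float, theta: float,
                   override: Optional[tuple[str, float]] = None):
    """Return (alpha, beta, theta) after applying one segment override."""
    if override is None:
        return alpha, beta, theta
    kind, value = override
    if kind == "confidence_multiplier":
        return alpha, value * beta, theta     # scale the slope only
    if kind == "exclude":
        return 0.0, 0.0, theta                # adjusted_edge is always 0
    if kind == "min_edge_override":
        return alpha, beta, value             # segment-specific threshold
    raise ValueError(f"unknown override type: {kind}")
```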
Override Preservation

The periodic review only deactivates prior permanent overrides when the LLM produces at least one valid replacement, so a silent review never wipes the active override set. This rule was added after a live-trading incident in which a review wiped a KXWTI exclude 47 seconds before 7 fresh KXWTI buys.

§ 21 Live-Mode Safety Guards

The live execution path applies a layered set of safety checks introduced after the first day of live trading exposed several failure modes. All guards are evaluated in order; the first failing guard determines the skip reason.

Data-Quality Guards (Before Pricing)

| Guard | Threshold | Failure Action |
|---|---|---|
| Stale daily data | Last bar older than 96h | Raise FuturesDataError |
| Stale hourly data | Last bar older than 6h | Raise FuturesDataError |
| Frozen futures price | Price identical ($\lvert\Delta\rvert < 10^{-4}$) for ≥5 consecutive hourly readings over 6h | Raise FuturesDataError |
| Stale Pyth feed | publish_time more than 180s behind wall-clock | Raise PythAPIError |
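A sketch of the staleness and frozen-price checks. It assumes bars arrive as parallel lists of timestamps and prices, reads "≥5 consecutive hourly readings over 6h" as the last five readings inside the most recent 6 hours, and uses a stand-in exception class; none of this is the actual data/futures.py code.

```python
from datetime import datetime, timedelta


class FuturesDataError(Exception):
    """Stand-in for the error raised in data/futures.py."""


MAX_BAR_AGE = {"1d": timedelta(hours=96), "1h": timedelta(hours=6)}
FROZEN_WINDOW = timedelta(hours=6)


def check_bars(bar_times: list[datetime], prices: list[float], interval: str,
               now: datetime, frozen_run: int = 5, tol: float = 1e-4) -> None:
    """Raise FuturesDataError on a stale last bar or a frozen hourly price."""
    age = now - bar_times[-1]
    if age > MAX_BAR_AGE[interval]:
        raise FuturesDataError(f"stale {interval} data: last bar is {age} old")
    if interval == "1h":
        recent = [p for t, p in zip(bar_times, prices) if now - t <= FROZEN_WINDOW]
        tail = recent[-frozen_run:]
        if len(tail) == frozen_run and all(abs(b - a) < tol for a, b in zip(tail, tail[1:])):
            raise FuturesDataError(f"price frozen at {tail[-1]} for {frozen_run} hourly readings")
```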

Per-Order Live Guards (Before Placing a Kalshi Order)

These guards evaluate against a per-cycle running state (_LiveCycleState) so two candidates within the same cycle cannot collectively breach a cap that each one would pass individually.

| Guard | Condition Checked | Skip Reason |
|---|---|---|
| No quoted ask | ask ≤ 0 for the evaluated side | no_ask |
| Insufficient live cash | Order cost (count × ask) exceeds running working cash | insufficient_live_cash:<avail><<cost> |
| Directional asymmetry | (series, direction) already has ≥ max_same_direction_buys_per_cycle (default 3) buys | directional_asymmetry:… |
| Per-series exposure cap | Post-trade series exposure / account value > 20% | series_exposure_cap:… |
| Portfolio exposure cap | Post-trade total open exposure / account value > 40% | portfolio_exposure_cap:… |
Account Value and Exposure Definitions
$$\text{total\_account\_value} = \text{Kalshi\_cash\_balance} + \text{open\_position\_cost}$$ $$\text{series\_frac} = \frac{\text{series\_exposure}[\text{series}] + \text{order\_cost}}{\text{total\_account\_value}}$$ $$\text{portfolio\_frac} = \frac{\text{running\_exposure} + \text{order\_cost}}{\text{total\_account\_value}}$$
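The per-cycle running state can be sketched as below. `CycleState` is a hypothetical analogue of _LiveCycleState, its field names are assumptions, and the elided suffixes on the directional and exposure skip reasons are omitted rather than guessed. Because every passing order is reserved in the state before the next candidate is checked, two orders that would each pass alone cannot jointly breach a cap.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CycleState:
    """Hypothetical analogue of _LiveCycleState, reset at the start of each cycle."""
    working_cash: float        # Kalshi cash balance still unspent this cycle
    account_value: float       # cash + open position cost (assumed constant in-cycle)
    running_exposure: float    # open exposure, including orders placed this cycle
    series_exposure: dict = field(default_factory=lambda: defaultdict(float))
    direction_buys: dict = field(default_factory=lambda: defaultdict(int))


def check_order(state: CycleState, series: str, direction: str, count: int, ask: float,
                max_same_direction: int = 3, series_cap: float = 0.20,
                portfolio_cap: float = 0.40) -> Optional[str]:
    """Return a skip reason, or None after reserving the order in `state`."""
    if ask <= 0:
        return "no_ask"
    cost = count * ask
    if cost > state.working_cash:
        return f"insufficient_live_cash:{state.working_cash:.2f}<{cost:.2f}"
    if state.direction_buys[(series, direction)] >= max_same_direction:
        return "directional_asymmetry"
    if (state.series_exposure[series] + cost) / state.account_value > series_cap:
        return "series_exposure_cap"
    if (state.running_exposure + cost) / state.account_value > portfolio_cap:
        return "portfolio_exposure_cap"
    # Every guard passed: reserve the order so later candidates see it.
    state.working_cash -= cost
    state.running_exposure += cost
    state.series_exposure[series] += cost
    state.direction_buys[(series, direction)] += 1
    return None
```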
Phantom-Exposure Guard

Open exposure is computed only for rows where order_status IN ('pending', 'filled') AND kalshi_order_id IS NOT NULL. Rows where placement crashed or the order was cancelled are excluded, preventing a failed placement from leaving phantom exposure that would artificially inflate position caps.
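The phantom-exposure filter is a plain SQL predicate. A self-contained sqlite3 sketch follows; the table name live_orders and the order_cost column are hypothetical, and only the WHERE clause mirrors the rule described above.

```python
import sqlite3

# Hypothetical table and column names; the real schema may differ.
OPEN_EXPOSURE_SQL = """
    SELECT COALESCE(SUM(order_cost), 0.0)
    FROM live_orders
    WHERE order_status IN ('pending', 'filled')
      AND kalshi_order_id IS NOT NULL
"""


def open_exposure(conn: sqlite3.Connection) -> float:
    return conn.execute(OPEN_EXPOSURE_SQL).fetchone()[0]


def demo() -> float:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE live_orders (order_cost REAL, order_status TEXT, kalshi_order_id TEXT)")
    conn.executemany(
        "INSERT INTO live_orders VALUES (?, ?, ?)",
        [
            (100.0, "filled", "ord-1"),
            (50.0, "pending", "ord-2"),
            (75.0, "pending", None),       # placement crashed: never reached Kalshi
            (40.0, "cancelled", "ord-3"),  # cancelled: no longer exposure
        ],
    )
    return open_exposure(conn)             # only the first two rows count
```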

Order Type Restriction

All live orders are limit only, price clamped 1–99¢. Market orders are not supported by design: a stale or buggy signal produces an unfilled resting order rather than sweeping the book at an adverse price.

§ 22Model Limitations and Assumptions

Log-Normal Dynamics

The model assumes continuous log-normal price paths. Real commodity prices exhibit:

Constant Forward Price

The forward price $F$ is interpolated from today's futures strip and held constant throughout the decay curve. In practice, $F$ shifts as new information arrives.
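Holding $F$ fixed means the decay curve is simply the $N(d_2)$ formula re-evaluated at each remaining DTE with identical inputs. This sketch assumes $T = \text{DTE}/365$ and a fixed $\sigma$; the function names are hypothetical. For the illustrative inputs below, only $T$ moves, so the in-the-money curve rises toward 1 and the out-of-the-money curve falls toward 0 as DTE shrinks. A shifting forward would disturb exactly this path.

```python
from math import log, sqrt
from statistics import NormalDist


def p_above(F: float, K: float, sigma: float, T: float) -> float:
    """Log-normal P(S_T > K) = N(d2) for an 'above' contract."""
    if T <= 0:
        return 1.0 if F > K else 0.0  # at expiry the outcome is deterministic
    sigma_T = sigma * sqrt(T)
    d2 = (log(F / K) - 0.5 * sigma_T ** 2) / sigma_T
    return NormalDist().cdf(d2)


def decay_curve(F: float, K: float, sigma: float, dte_today: int) -> dict[int, float]:
    """p_tv at each remaining whole DTE, with F and sigma frozen at today's values."""
    return {dte: p_above(F, K, sigma, dte / 365) for dte in range(dte_today, 0, -1)}


# Illustrative inputs: a 30-day contract with strike K = $80.
for label, F in (("ITM", 82.0), ("OTM", 78.0)):
    curve = decay_curve(F, 80.0, 0.35, 30)
    print(label, [round(curve[d], 3) for d in (30, 20, 10, 5, 1)])
```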

Constant Volatility

A single annualised $\sigma$ is used throughout the remaining contract life, whereas real markets exhibit a volatility term structure and a volatility smile. One number is adequate for ballpark fair-value estimation but not for precise pricing near the money.
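The near-the-money caveat can be made concrete. When implied volatility depends on the strike, the digital probability is the negative strike-derivative of the undiscounted Black call. By the chain rule this adds a skew term to the flat-vol answer: $P(S_T > K) = N(d_2) - \text{Vega}\cdot\partial\sigma/\partial K$. Vanilla Vega peaks near the money, so that is where a single $\sigma$ errs most. The sketch below uses a hypothetical linear skew; its test checks the formula against a finite difference of call prices.

```python
from math import exp, log, pi, sqrt
from statistics import NormalDist

_N = NormalDist().cdf


def _pdf(x: float) -> float:
    """Standard normal density n(x)."""
    return exp(-0.5 * x * x) / sqrt(2 * pi)


def _d1_d2(F: float, K: float, vol: float, T: float) -> tuple[float, float]:
    s = vol * sqrt(T)
    d1 = (log(F / K) + 0.5 * s * s) / s
    return d1, d1 - s


def black_call(F: float, K: float, vol: float, T: float) -> float:
    """Undiscounted Black (1976) call price."""
    d1, d2 = _d1_d2(F, K, vol, T)
    return F * _N(d1) - K * _N(d2)


def digital_above(F: float, K: float, vol: float, T: float, dvol_dK: float = 0.0) -> float:
    """P(S_T > K) = -dC/dK when implied vol depends on the strike.

    With dvol_dK = 0 this is the flat-vol N(d2); otherwise the chain rule
    subtracts vega * dvol/dK, with vega = F * n(d1) * sqrt(T).
    """
    d1, d2 = _d1_d2(F, K, vol, T)
    vega = F * _pdf(d1) * sqrt(T)
    return _N(d2) - vega * dvol_dK


F, T, slope = 80.0, 30 / 365, -0.002  # hypothetical skew: vol falls 0.2 points per $1 of strike


def smile(K: float) -> float:
    return 0.35 + slope * (K - F)


for K in (80.0, 100.0):
    shift = digital_above(F, K, smile(K), T, slope) - digital_above(F, K, smile(K), T)
    print(f"K={K}: skew shifts P(S_T > K) by {shift:+.4f}")
```

For these inputs the skew correction at the money is more than ten times larger than at $K = \$100$.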

Risk-Neutral vs. Real-World Measure

The engine operates under $\mathbb{Q}$, using the futures price as the expected terminal value. Risk premiums (e.g., the crude oil risk premium embedded in backwardation) are implicitly absorbed into the futures price but not separately modelled.
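Under $\mathbb{Q}$, the only drift in the log-normal model is the $-\tfrac{1}{2}\sigma_T^2$ convexity term. That term makes the futures price the expected terminal value, and it is the same term that appears in $d_2$. A quick Monte Carlo check with illustrative inputs confirms both properties. It also shows that the simulation has no separate drift parameter, so any risk premium can only enter through $F$ itself.

```python
import random
from math import exp, log
from statistics import NormalDist, fmean


def terminal_prices(F: float, sigma_T: float, n_paths: int, seed: int = 7) -> list[float]:
    """Draw S_T = F * exp(-sigma_T^2 / 2 + sigma_T * Z) under Q.

    The -sigma_T^2 / 2 drift makes E_Q[S_T] = F.
    """
    rng = random.Random(seed)
    return [F * exp(-0.5 * sigma_T ** 2 + sigma_T * rng.gauss(0.0, 1.0)) for _ in range(n_paths)]


def p_above_closed_form(F: float, K: float, sigma_T: float) -> float:
    d2 = (log(F / K) - 0.5 * sigma_T ** 2) / sigma_T
    return NormalDist().cdf(d2)


F, K, sigma_T = 80.0, 82.0, 0.20  # illustrative inputs
paths = terminal_prices(F, sigma_T, 100_000)
print(f"mean S_T = {fmean(paths):.2f} (F = {F})")
print(f"MC P(S_T > K) = {sum(s > K for s in paths) / len(paths):.4f}, "
      f"N(d2) = {p_above_closed_form(F, K, sigma_T):.4f}")
```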

Short-Term Forward Price Accuracy

For very short-dated contracts, the front-month futures price is a reasonable proxy for the eventual settlement price, but intraday basis can be significant. A more precise implementation would use the specific expiry-month contract's last traded price.

§ 23Glossary

| Term | Definition |
|---|---|
| $F$ | Forward price: futures price interpolated to the settlement month |
| $K$ | Strike price: target price level in the Kalshi contract (e.g. $K = \$80$) |
| $\sigma$ | Annualised volatility of log-returns |
| $\sigma_T$ | Total volatility over the remaining life: $\sigma\sqrt{T}$ |
| $T$ | Time to expiry in years: $\tau/365$ (long-term) or $h/(365 \times 23)$ (short-term) |
| $d_2$ | Standardised distance from the strike: $(\ln(F/K) - \tfrac{1}{2}\sigma_T^2)/\sigma_T$ |
| $N(\cdot)$ | Standard normal CDF |
| $p_{\text{base}}$ | Log-normal probability from the distribution fit |
| $p_{\text{tv}}$ | Time-value point estimate (the same formula evaluated at today's DTE) |
| $\hat{p}$ | Final YES fair value: the engine's $P(\text{target met})$, equal to $p_{\text{tv}}$ |
| $p_{\text{YES,mkt}}$ | YES mid-price from Kalshi: (yes_bid + yes_ask) / 2 |
| $p_{\text{NO,mkt}}$ | NO mid-price from Kalshi: (no_bid + no_ask) / 2; quoted independently, not computed as 1 − YES mid |
| edge | $\hat{p} - p_{\text{YES,mkt}}$ for YES; $(1-\hat{p}) - p_{\text{NO,mkt}}$ for NO |
| DTE | Days to expiry (whole calendar days) |
| TTX | Time to expiry in hours/minutes (short-term display) |
| ITM | In-the-money: $F > K$ for an "above" contract |
| OTM | Out-of-the-money: $F < K$ for an "above" contract |
| ATM | At-the-money: $F \approx K$ |
| $\mathbb{Q}$ | Risk-neutral probability measure; futures price $= \mathbb{E}^{\mathbb{Q}}[S_T]$ |
| Futures strip | Set of futures prices across all liquid delivery months |
| Log-normal | Distribution where $\ln X \sim \mathcal{N}(\mu, \sigma^2)$; guarantees $X > 0$ |
| Cash-or-nothing digital | Option paying a fixed cash amount if the underlying exceeds the strike; equivalent to a Kalshi YES payout |
| Brier Score (BS) | Mean squared error of probability forecasts vs. binary outcomes; range [0, 1], lower is better |
| Brier Skill Score (BSS) | $1 - \text{BS}_\text{model}/\text{BS}_\text{market}$; positive means the model beats the market price as a probability estimate |
| Calibration | Property that when the model says $p\%$, the event occurs $p\%$ of the time across many predictions |
| MAE | Mean absolute error: average $\lvert\hat{p} - o\rvert$ across resolved contracts |
| Bias | Mean signed error $(\hat{p} - o)$; positive means the model over-estimates probability |
| Edge accuracy | Fraction of runs where the edge direction matched the actual outcome |
| raw_edge | Unadjusted edge signal taken directly from the pricing engine |
| adjusted_edge | $\alpha + \beta \cdot \text{raw\_edge}$: raw edge rescaled by the OLS regression coefficients |
| $\alpha$ (intercept) | Expected realised PnL when raw_edge = 0; captures systematic model bias |
| $\beta$ (slope) | Additional realised PnL per unit of raw_edge; $\beta = 1$ indicates perfect calibration |
| pyth_spot | Settlement kind for KXGOLDD/KXSILVERD: these series settle on the Pyth Network XAU/USD or XAG/USD feed, not on COMEX futures |
| spot_basis | Pyth spot price minus front-month futures price, stored in distribution_params for pyth_spot series |
| $\theta_0$ (global threshold) | The global config.min_adjusted_edge before any segment override (default 0.072); the auto-buy/clear-buy boundary sits at $2\theta_0$ |
| Shadow mode | Agent mode in which decisions are logged as if trades were placed, but no order is submitted to Kalshi |
| Kelly fraction | Optimal bet size as a fraction of bankroll: $f^* = \text{edge}/(1-p_{\text{mkt}})$; half-Kelly ($\lambda=0.5$) is used in practice |
| Segment override | Per-segment adjustment to α, β, or θ written by the periodic review; takes effect from the next cycle |
| Clear-buy | Contract classified at adjusted_edge ≥ 2θ; executed the same way as an auto-buy, with the label preserved in analytics |
| TTX bucket | Time-to-expiry range bucket for short-term contracts (<26h), e.g. "4-8h" |