§ 01Overview
PredictionX prices binary prediction market contracts on Kalshi. A Kalshi commodity contract is economically equivalent to a cash-or-nothing digital option: it pays $1 if a commodity price exceeds (or falls below) a strike level $K$ on a given settlement date, and $0 otherwise.
The engine computes a fair probability $\hat{p}$ for each contract through a four-stage pipeline:
- Data Ingestion: fetch the Kalshi market price (YES and NO independently), build a futures price strip, and derive annualised volatility.
- Distribution Fit → p_base: fit a log-normal distribution over the futures strip and compute P(price > K) using the d₂ formula.
- Time-Value Decay → p_tv = fair_value: model how the probability evolves as DTE shrinks by applying the same formula at each intermediate DTE. The point estimate at today's DTE becomes the engine's final fair value.
- Persist & Backtest: save the pricing run to SQLite. Once the contract settles, record the outcome and compare predicted vs actual probability using proper scoring rules.
The edge is then $\text{edge} = \hat{p} - p_{\text{market}}$, where $p_{\text{market}}$ is the independently quoted Kalshi mid-price for the relevant side.
§ 02Kalshi Binary Contracts
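A Kalshi commodity contract resolves to YES (\$1) or NO (\$0) at expiry based on a single condition, shown here for an "above" contract (the inequality reverses for a "below" contract):
$$\text{payoff}_{\text{YES}} = \mathbf{1}_{S_T > K}$$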
where $S_T$ is the underlying commodity spot price at settlement date $T$, and $K$ is the strike price parsed from the ticker suffix (e.g. -T80 → $K = \$80$).
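The risk-neutral fair value equals the risk-neutral probability:
$$V_{\text{YES}} = \mathbb{E}^{\mathbb{Q}}\left[\mathbf{1}_{S_T > K}\right] = \mathbb{Q}(S_T > K)$$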
YES and NO prices are independently quoted
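Kalshi operates a separate orderbook for each side. YES and NO bid/ask are quoted independently by market makers, and each side's market price is its own mid:
$$p_{\text{YES,mkt}} = \frac{\text{yes\_bid} + \text{yes\_ask}}{2}, \qquad p_{\text{NO,mkt}} = \frac{\text{no\_bid} + \text{no\_ask}}{2}$$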
Example on a liquid contract:
| | Bid | Ask | Mid |
|---|---|---|---|
| YES | 42¢ | 44¢ | 43¢ |
| NO | 57¢ | 61¢ | 59¢ |
| Sum of mids | | | 102¢ ≠ 100¢ |
The 2¢ gap is the market-maker's spread collected when YES and NO are sold simultaneously. Buying YES at ask (44¢) and NO at ask (61¢) costs 105¢ > $1 — no arbitrage.
- $\text{yes\_ask} + \text{no\_ask} \geq \$1$ — you cannot buy both sides for less than the $1 payout.
- $\text{yes\_bid} + \text{no\_bid} \leq \$1$ — you cannot sell both sides for more than $1.
These constraints are enforced by the matching engine and do not force mid-prices to sum to $1.
Edge against the correct side's price
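The edge for each side is measured against that side's own mid-price:
$$\text{edge}_{\text{YES}} = \hat{p} - p_{\text{YES,mkt}}, \qquad \text{edge}_{\text{NO}} = (1 - \hat{p}) - p_{\text{NO,mkt}}$$
Because $p_{\text{YES,mkt}} + p_{\text{NO,mkt}} \neq 1$ in general, the YES and NO edges are not equal and opposite: they sum to $1 - (p_{\text{YES,mkt}} + p_{\text{NO,mkt}})$. A wide-spread contract whose mids sum to less than 1 can show positive edge on both sides simultaneously.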
If the API omits NO bid/ask, the engine falls back to arb-consistent complements: $\text{no\_bid} = 1 - \text{yes\_ask}$, $\text{no\_ask} = 1 - \text{yes\_bid}$, giving $p_{\text{NO,mkt}} = 1 - p_{\text{YES,mkt}}$. This is noted in the debug log and stored as no_prices_from_api = False.
§ 03Futures Price Strip
A futures strip is the set of quoted futures prices across all liquid delivery months for a commodity. For WTI crude (root CL):
| Delivery Month | Ticker (yfinance) | Close Price |
|---|---|---|
| May 2026 | CLK26.NYM | $62.40 |
| Jun 2026 | CLM26.NYM | $62.10 |
| Dec 2026 | CLZ26.NYM | $60.80 |
Ticker construction follows CME naming conventions: root + month code + 2-digit year + exchange suffix.
| F | G | H | J | K | M | N | Q | U | V | X | Z |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
Exchange suffixes: .NYM (NYMEX energy/metals), .CMX (COMEX precious metals), .CBT (CBOT grains), .CME (CME livestock/indices/crypto).
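The naming rule above can be sketched in a few lines. This is a minimal illustration, not the engine's own code: the function name `build_futures_ticker` and its argument layout are assumptions made for the example.

```python
# Month codes in calendar order, matching the table above (F = Jan ... Z = Dec).
MONTH_CODES = "FGHJKMNQUVXZ"


def build_futures_ticker(root: str, month: int, year: int, suffix: str) -> str:
    """Return root + month code + 2-digit year + exchange suffix.

    Hypothetical helper for illustration only.
    """
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    return f"{root}{MONTH_CODES[month - 1]}{year % 100:02d}{suffix}"


print(build_futures_ticker("CL", 5, 2026, ".NYM"))  # CLK26.NYM
```

The example output matches the May 2026 row of the WTI strip table.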
§ 03bPyth Network Spot Price — Gold & Silver
Two series — KXGOLDD and KXSILVERD — settle on a Pyth Network spot price feed, not on COMEX futures. Kalshi's specifications for these series reference Pyth's 1-minute candle close for XAU/USD and XAG/USD respectively. Using the COMEX futures price as the anchor for these contracts would be systematically wrong.
Settlement Kinds
| Series | settlement_kind | Point-Estimate Source | Vol Source |
|---|---|---|---|
| KXWTI, KXBRENTD, KXNATGASD, KXCOPPERD | futures | yfinance strip | yfinance continuous front-month |
| KXGOLDD | pyth_spot | Pyth XAU/USD (765d2b…) | yfinance GC=F historical vol |
| KXSILVERD | pyth_spot | Pyth XAG/USD (f2fb02…) | yfinance SI=F historical vol |
How the Pyth Anchor Is Applied
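For pyth_spot series, the pricing flow calls data/pyth.py to retrieve the current Pyth spot price and then calls apply_spot_anchor(strip, spot_price), which replaces the strip's front-month price with the live Pyth spot and returns the adjusted strip plus the computed basis:
$$\text{spot\_basis} = S_{\text{Pyth}} - F_{\text{front}}$$
where $S_{\text{Pyth}}$ is the Pyth spot price and $F_{\text{front}}$ is the front-month futures price it replaces.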
If data/pyth.py raises PythAPIError (HTTP failure, stale publish time, non-positive price), the pricing call raises PricingError — it does not silently fall back to the COMEX futures price. A stale or wrong underlying is worse than no price at all.
Pyth Staleness Guard
Each price returned by get_spot_price() is validated against its publish_time field. A price whose publish timestamp is more than 180 seconds behind wall-clock is rejected with PythAPIError — mirroring the yfinance frozen-data guard in data/futures.py.
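A staleness check of this kind might look like the sketch below. It is not the code in data/pyth.py: the function name `validate_pyth_price`, its signature, and the local `PythAPIError` class are stand-ins assumed for illustration, and `publish_time` is assumed to be a Unix timestamp in seconds.

```python
import time
from typing import Optional

MAX_PYTH_AGE_SECONDS = 180  # threshold quoted in the text


class PythAPIError(Exception):
    """Stand-in for the engine's own exception class."""


def validate_pyth_price(price: float, publish_time: float,
                        now: Optional[float] = None) -> float:
    """Return the price if it is positive and fresh, else raise PythAPIError.

    Hypothetical helper; publish_time and now are Unix seconds.
    """
    now = time.time() if now is None else now
    if price <= 0:
        raise PythAPIError(f"non-positive price: {price}")
    age = now - publish_time
    if age > MAX_PYTH_AGE_SECONDS:
        raise PythAPIError(f"stale price: published {age:.0f}s ago")
    return price
```

Raising rather than returning a sentinel keeps the failure loud, which matches the stated policy of never falling back silently to the COMEX price.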
§ 04Forward Price Interpolation
The key input to the pricing formula is the forward price $F$: the expected futures price at the contract's settlement month. If the exact settlement month is in the strip, it is used directly. Otherwise we interpolate in log-price space:
$$\ln F(T) = \ln F_1 + \frac{T - t_1}{t_2 - t_1}\left(\ln F_2 - \ln F_1\right)$$
where $t_1 \le T \le t_2$ are the nearest delivery dates bracketing the settlement date $T$, measured in calendar ordinals, and $F_1$, $F_2$ are the strip prices at those dates. Interpolating in log-price rather than price space preserves positivity and is consistent with the log-normal dynamics. When $T$ is beyond the last strip date, the last price is used (flat extrapolation).
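The interpolation rule can be sketched as follows. The function name `interpolate_forward` and the strip representation (date, price pairs) are assumptions for the example. Flat extrapolation before the first strip date is also an assumption, since the text only specifies the behaviour beyond the last date.

```python
import math
from datetime import date
from typing import Sequence, Tuple


def interpolate_forward(strip: Sequence[Tuple[date, float]], settle: date) -> float:
    """Log-linear interpolation of a futures strip at the settlement date.

    strip: (delivery_date, price) pairs sorted by date (hypothetical layout).
    """
    t = settle.toordinal()
    if t <= strip[0][0].toordinal():
        return strip[0][1]   # assumption: flat before the first strip date
    if t >= strip[-1][0].toordinal():
        return strip[-1][1]  # flat extrapolation beyond the last strip date
    for (d1, f1), (d2, f2) in zip(strip, strip[1:]):
        t1, t2 = d1.toordinal(), d2.toordinal()
        if t1 <= t <= t2:
            w = (t - t1) / (t2 - t1)
            # Interpolate ln(F) linearly, then map back to price space.
            return math.exp((1 - w) * math.log(f1) + w * math.log(f2))
    raise ValueError("strip must be sorted by delivery date")
```

Halfway between two strip dates this returns the geometric mean of the two prices, which is always positive.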
§ 05Volatility Estimation
Volatility $\sigma$ is the annualised standard deviation of log-returns. Two estimation regimes apply depending on time-to-expiry:
Long-Term Regime (DTE ≥ 7 Days)
252 calendar days of daily closing prices $P_t$ are fetched for the continuous front-month contract (ticker {root}=F). Log-returns are computed and annualised:
$$r_t = \ln\frac{P_t}{P_{t-1}}, \qquad \sigma = \operatorname{std}(r_t)\,\sqrt{252}$$
The $\sqrt{252}$ annualisation assumes 252 trading days per year. The engine clips the result to $[0.05,\ 2.00]$ to prevent extreme values from degenerate data.
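As a rough sketch of this calculation, the helper below computes log-returns, annualises their standard deviation, and clips the result. The function name `annualised_vol` is hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption; the text does not say which estimator the engine uses.

```python
import math
import statistics
from typing import Sequence


def annualised_vol(closes: Sequence[float], periods_per_year: float = 252.0,
                   floor: float = 0.05, cap: float = 2.00) -> float:
    """Annualised std of log-returns, clipped to [floor, cap].

    Hypothetical helper; pass periods_per_year=365 * 23 for hourly bars.
    """
    log_returns = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    if len(log_returns) < 2:
        raise ValueError("need at least three prices to estimate volatility")
    sigma = statistics.stdev(log_returns) * math.sqrt(periods_per_year)
    return min(max(sigma, floor), cap)
```

The clip means a flat price series yields 0.05 rather than zero, so the downstream $d_2$ calculation never divides by zero.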
Short-Term Regime (DTE < 7 Days)
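For short-dated contracts, 5 days of hourly price data are used, with an asset-class-specific annualisation factor $H_{\text{year}}$ (trading hours per year):
$$\sigma = \operatorname{std}(r_h)\,\sqrt{H_{\text{year}}}, \qquad H_{\text{year}} = \begin{cases} 365 \times 23 & \text{energy, metals, crypto, agriculture} \\ 252 \times 17 & \text{equity indices} \end{cases}$$
where $r_h$ are hourly log-returns.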
Energy and commodity markets trade nearly around the clock (one maintenance hour excluded). Using asset-class-specific $H_{\text{year}}$ prevents over- or under-stating intraday vol.
§ 06The Log-Normal Price Model
The engine models the terminal price $S_T$ as log-normally distributed under the risk-neutral measure $\mathbb{Q}$:
$$\ln S_T \sim \mathcal{N}\left(\ln F - \tfrac{1}{2}\sigma_T^2,\ \sigma_T^2\right)$$
where:
- $F = F(T)$ is the futures price at the settlement month — the risk-neutral expectation of $S_T$
- $\sigma_T = \sigma \sqrt{T}$ is the total volatility over the remaining time $T$ (in years)
- The $-\tfrac{1}{2}\sigma_T^2$ term is the Itô correction keeping $\mathbb{E}^{\mathbb{Q}}[S_T] = F$
This is the Black (1976) futures pricing model applied to binary options — the industry standard for commodity options and OTC derivatives on futures.
Under $\mathbb{Q}$, the futures price is an unbiased expectation of the future spot price (no risk premium needed). Using $F$ rather than the current spot $S_0$ correctly accounts for the term structure — e.g., oil in backwardation where $F(T) < S_0$.
§ 07The d₂ Formula — Core Probability Calculation
Given the log-normal model, the probability that $S_T$ exceeds the strike $K$ has a closed-form solution: the digital option pricing formula, derived by integrating the log-normal density above $K$:
$$\mathbb{Q}(S_T > K) = N(d_2), \qquad d_2 = \frac{\ln(F/K) - \tfrac{1}{2}\sigma_T^2}{\sigma_T}$$
where $N(\cdot)$ is the standard normal CDF $N(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-t^2/2} \, dt$.
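A minimal sketch of the probability calculation, using the $d_2$ definition from the glossary and the standard library's normal CDF. The function name `prob_above` is an assumption, and the degenerate branch for zero time or volatility follows the boundary rule $p(0) = \mathbf{1}_{F > K}$ described in the text.

```python
import math
from statistics import NormalDist


def prob_above(forward: float, strike: float, sigma: float, t_years: float) -> float:
    """Risk-neutral P(S_T > K) = N(d2) under the log-normal model.

    Hypothetical helper for illustration.
    """
    if t_years <= 0 or sigma <= 0:
        # No residual uncertainty: the outcome is deterministic.
        return 1.0 if forward > strike else 0.0
    sigma_t = sigma * math.sqrt(t_years)
    d2 = (math.log(forward / strike) - 0.5 * sigma_t ** 2) / sigma_t
    return NormalDist().cdf(d2)


# At the money with sigma = 30% and three months to expiry:
# sigma_T = 0.15, d2 = -0.075, so the probability is slightly below one half.
print(round(prob_above(80.0, 80.0, 0.30, 0.25), 3))  # 0.47
```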
Derivation
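We want $\mathbb{Q}(S_T > K)$. Since $\ln S_T \sim \mathcal{N}(\mu_T, \sigma_T^2)$ with $\mu_T = \ln F - \frac{1}{2}\sigma_T^2$:
$$\mathbb{Q}(S_T > K) = \mathbb{Q}(\ln S_T > \ln K)$$
Let $Z = (\ln S_T - \mu_T)/\sigma_T \sim \mathcal{N}(0,1)$. Then, using $1 - N(x) = N(-x)$:
$$\mathbb{Q}(S_T > K) = \mathbb{Q}\left(Z > \frac{\ln K - \mu_T}{\sigma_T}\right) = N\left(\frac{\mu_T - \ln K}{\sigma_T}\right) = N\left(\frac{\ln(F/K) - \tfrac{1}{2}\sigma_T^2}{\sigma_T}\right) = N(d_2)$$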
Intuitive Interpretation of d₂
| Scenario | $d_2$ | $N(d_2)$ | Interpretation |
|---|---|---|---|
| Deep ITM: $F \gg K$ | $\to +\infty$ | $\to 1.0$ | Almost certain YES |
| At-the-money: $F = K$ | $= -\tfrac{1}{2}\sigma_T$ | $\approx 0.47$–$0.50$ | Slight negative bias from log-normal skew |
| Deep OTM: $F \ll K$ | $\to -\infty$ | $\to 0.0$ | Almost certain NO |
| At expiry: $T \to 0$ | $\to \pm\infty$ | $\to 0$ or $1$ | Deterministic outcome |
Relationship to Black-Scholes d₁ and d₂
In the Black-Scholes framework for a vanilla call, written here in its Black (1976) futures form without discounting, $d_1 = d_2 + \sigma_T$ and the call price is $C = F \cdot N(d_1) - K \cdot N(d_2)$, where $N(d_2)$ is the risk-neutral probability of exercise and $N(d_1)$ is the delta. For our cash-or-nothing digital there is no delivery of the underlying, so only $N(d_2)$ matters.
A Kalshi binary contract is priced using only $N(d_2)$ from Black-Scholes. There is no $d_1$ term because the payoff is a fixed $\$1$, not the underlying price. This makes the formula simpler and less sensitive to model assumptions than vanilla option pricing.
Total Volatility $\sigma_T$
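$$\sigma_T = \sigma\sqrt{T}, \qquad T = \begin{cases} \max(\tau,\ \tau_{\min})/365 & \text{long-term mode} \\ \max(h,\ 0.25)/(365 \times 23) & \text{short-term mode} \end{cases}$$
Two floors prevent a degenerate $\sigma_T = 0$ at expiry: long-term mode floors DTE at 1 day ($\tau_{\min} = 1$); short-term mode floors $h$ at 0.25 hours (15 minutes). One exception: the curve endpoint at $\tau = 0$ is computed before any floor, using the deterministic boundary rule $p(0) = \mathbf{1}_{F > K}$, ensuring DTE=0 shows 100%/0% exactly.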
§ 08Time-Value Decay Curve
The distribution fit gives $p_{\text{base}}$ — the probability evaluated at today's DTE. As expiry approaches, the contract's probability drifts toward its final 0 or 1 value. The time-value decay curve models this path.
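For each DTE $\tau$ from today down to expiry, the engine applies the same log-normal formula with the remaining time:
$$p(\tau) = N\left(\frac{\ln(F/K) - \tfrac{1}{2}\sigma^2\,\tau/365}{\sigma\sqrt{\tau/365}}\right), \qquad \tau > 0$$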
At $\tau = 0$, the outcome is deterministic: $p(0) = \mathbf{1}_{F > K}$. The full curve is a list of $(\tau,\ p(\tau))$ pairs.
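The curve construction can be sketched as a loop over integer DTE values. The function name `decay_curve` is hypothetical; the $\tau/365$ time scaling follows the glossary definition of $T$ for long-term mode, and the forward is held constant as the text describes.

```python
import math
from statistics import NormalDist
from typing import List, Tuple


def decay_curve(forward: float, strike: float, sigma: float,
                dte_today: int) -> List[Tuple[int, float]]:
    """Return (tau, p(tau)) pairs from today's DTE down to expiry.

    Hypothetical helper; forward and sigma are held constant along the curve.
    """
    curve = []
    for tau in range(dte_today, -1, -1):
        if tau == 0:
            # Deterministic boundary rule at expiry.
            p = 1.0 if forward > strike else 0.0
        else:
            sigma_t = sigma * math.sqrt(tau / 365.0)
            d2 = (math.log(forward / strike) - 0.5 * sigma_t ** 2) / sigma_t
            p = NormalDist().cdf(d2)
        curve.append((tau, p))
    return curve
```

For an in-the-money contract the probabilities in the returned list rise toward 1.0 as tau falls, matching the first shape described below.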
Shape of the Curve
- Deep ITM ($F \gg K$): $p$ starts high and increases monotonically toward 1.0 — less time for price to fall below $K$.
- Deep OTM ($F \ll K$): $p$ starts low and decreases toward 0.0 — less time for a rally to reach $K$.
- ATM ($F \approx K$): $p \approx 0.5$ throughout with a gentle final convergence.
- Near-strike, high vol: the curve is flatter, with a sharp final convergence as $\sigma_T \to 0$.
The forward price is held constant along the curve — the strip is not re-interpolated at each DTE. In practice, as expiry approaches, the relevant forward price shifts toward the prompt-month contract.
§ 09Scenario Analysis (±1σ Shocks)
Three scenario probabilities are computed by shocking the forward price by ±1 standard deviation of the log-return over the remaining life:
Note that $\sigma_T$ in the denominator of $d_2$ is unchanged — we shock the expected price path while keeping residual uncertainty fixed. This gives a spread of outcomes bracketing the base case.
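One way to read the shock is multiplicative: the forward moves to $F e^{\pm\sigma_T}$ while $\sigma_T$ in $d_2$ stays fixed. That reading is an assumption, as the exact shock formula is not shown here, and the function name `scenario_probs` is hypothetical.

```python
import math
from statistics import NormalDist
from typing import Dict


def scenario_probs(forward: float, strike: float, sigma_t: float) -> Dict[str, float]:
    """Bear/base/bull probabilities under an assumed F * exp(+/- sigma_T) shock."""
    def p_above(f: float) -> float:
        # sigma_t is deliberately not changed by the shock.
        d2 = (math.log(f / strike) - 0.5 * sigma_t ** 2) / sigma_t
        return NormalDist().cdf(d2)

    return {
        "bear": p_above(forward * math.exp(-sigma_t)),
        "base": p_above(forward),
        "bull": p_above(forward * math.exp(sigma_t)),
    }
```

Under this assumption the shock shifts $d_2$ by exactly one unit in either direction, so the bull and bear cases are $N(d_2 + 1)$ and $N(d_2 - 1)$.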
For a "below" contract priced on the NO side, bull/bear labels are inverted: a bullish price move hurts the NO side. The terminal report applies this inversion automatically when rendering.
§ 10Aggregation — Final Fair Value
- $p_{\text{base}}$: log-normal probability from the distribution fit. Anchors the estimate.
- $p_{\text{tv}}$: the decay-curve value at today's DTE. Mathematically equivalent to $p_{\text{base}}$ under the same time scaling, but the time-value module also produces the full $\{\tau,\ p(\tau)\}$ curve for visualisation and forward simulation.
- $\hat{p}$: the final engine fair value, equal to $p_{\text{tv}}$. If the time-value module fails, the aggregator falls back to $\hat{p} = p_{\text{base}}$.
Scenarios (bull/base/bear) are computed separately using $\pm 1\sigma_T$ forward shocks — see §9. They are reported alongside $\hat{p}$ but do not participate in its definition.
§ 11Short-Term Mode (DTE < 7 Days)
Contracts expiring within 7 days receive special treatment because:
- Daily closing-price vol is too coarse — a lot can happen in hours
- Integer DTE (0 or 1) provides insufficient precision for time scaling $T$
- Expiry times matter: a contract at 4:00 PM vs 11:59 PM EST is very different at DTE=0
Here $h$ is the number of hours remaining until expiry, computed from the Kalshi API's close_time field (ISO 8601 UTC) minus the current UTC time, giving sub-hour precision. The decay curve in short-term mode shows 8 evenly spaced points, measured in hours, from now to expiry.
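Computing $h$ from an ISO 8601 timestamp might look like the sketch below. The function names are hypothetical, the 0.25-hour floor is the one described in §7, and the $365 \times 23$ divisor is the glossary's short-term definition of $T$. Treating a timestamp without an offset as UTC is an assumption.

```python
from datetime import datetime, timezone
from typing import Optional

MIN_HOURS = 0.25              # 15-minute floor from the text
HOURS_PER_YEAR = 365 * 23     # glossary definition for short-term T


def hours_to_expiry(close_time: str, now: Optional[datetime] = None) -> float:
    """Hours from now until close_time, floored at MIN_HOURS."""
    # Older Python versions do not parse a trailing "Z", so normalise it.
    close_dt = datetime.fromisoformat(close_time.replace("Z", "+00:00"))
    if close_dt.tzinfo is None:
        close_dt = close_dt.replace(tzinfo=timezone.utc)  # assumption: UTC
    now = now or datetime.now(timezone.utc)
    hours = (close_dt - now).total_seconds() / 3600.0
    return max(hours, MIN_HOURS)


def short_term_years(hours: float) -> float:
    """Convert remaining hours to the year fraction T used in sigma_T."""
    return hours / HOURS_PER_YEAR
```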
§ 12Volatility Annualisation — Regime Reference
| Asset Class | DTE Regime | Data Period | Interval | Annualisation |
|---|---|---|---|---|
| All (long-term) | ≥ 7 days | 252 days | 1d | $\times\sqrt{252}$ |
| Energy, Metals, Crypto, Ag | < 7 days | 5 days | 1h | $\times\sqrt{365 \times 23}$ |
| Equity Indices | < 7 days | 5 days | 1h | $\times\sqrt{252 \times 17}$ |
The 23h/day assumption accounts for the maintenance window in electronic futures markets (typically 1 hour around 5–6 PM ET). Equity index futures use 17h because extended-hours sessions have much lower liquidity.
§ 13Edge Score
$$\text{edge} = \hat{p} - p_{\text{mkt}}$$
The market price $p_{\text{mkt}}$ is always sourced from the Kalshi API mid-price (never entered manually). Interpretation:
| Edge | Signal | Implication |
|---|---|---|
| $> +5\%$ | Strong YES edge | Market underpricing YES; consider buying YES |
| $+1\%$ to $+5\%$ | Mild YES edge | Modest mispricing on YES side |
| $\pm 1\%$ | Fairly priced | No strong signal |
| $-1\%$ to $-5\%$ | Mild NO edge | Market underpricing NO; consider buying NO |
| $< -5\%$ | Strong NO edge | Market overpricing YES; strong signal for NO |
The log-normal model is a simplification. Real commodity prices exhibit jumps, fat tails, and mean-reversion the model does not capture. Positive edge is a signal to investigate, not a mechanical trading rule.
§ 14Backtesting and Model Evaluation
Once a contract settles, the engine evaluates the prediction using proper scoring rules — mathematical measures that reward honest probability estimates and cannot be gamed by reporting extreme probabilities.
Brier Score
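$$\text{BS} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{p}_i - o_i\right)^2$$
where $\hat{p}_i \in [0,1]$ is the model's fair value, $o_i \in \{0, 1\}$ is the actual outcome, and $n$ is the number of resolved contracts. Reference values: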
- BS = 0.00: perfect predictions
- BS = 0.25: equivalent to guessing 50% on every contract
- BS = 1.00: perfectly wrong on every prediction
Brier Skill Score (BSS)
$$\text{BSS} = 1 - \frac{\text{BS}_{\text{model}}}{\text{BS}_{\text{market}}}$$
BSS > 0 means the model beats the market price as a probability estimator. BSS is the primary long-run measure of whether the engine adds value beyond using the market mid as your forecast.
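Both scores are short enough to sketch directly from the glossary definitions (mean squared error of forecasts, and one minus the ratio of model to market Brier scores). The function names are assumptions for the example.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(model_probs: Sequence[float],
                      market_probs: Sequence[float],
                      outcomes: Sequence[int]) -> float:
    """1 - BS_model / BS_market; positive means the model beats the market."""
    bs_market = brier_score(market_probs, outcomes)
    if bs_market == 0:
        raise ValueError("market Brier score is zero; skill score undefined")
    return 1.0 - brier_score(model_probs, outcomes) / bs_market
```

For example, a model at 0.8/0.2 against a market at 0.6/0.4 on outcomes 1/0 scores 0.04 against 0.16, a skill score of 0.75.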
Calibration
A model is well-calibrated if, across all contracts where it predicted probability $p$, the event actually occurred $p$ fraction of the time. The engine bins predictions into 10pp buckets:
| Calibration Pattern | Diagnosis |
|---|---|
| Event rate consistently above diagonal | Model underestimates probability (too bearish) |
| Event rate consistently below diagonal | Model overestimates probability (too bullish) |
| S-shaped: too low at extremes, too high in middle | Model is over-confident — probabilities too extreme |
| Inverse-S: too high at extremes, too low in middle | Model is under-confident — probabilities too conservative |
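The bucketing step can be sketched as below. The function name `calibration_table` and the returned tuple layout are assumptions; the 10-percentage-point bucket width is the one stated above.

```python
from typing import List, Sequence, Tuple


def calibration_table(probs: Sequence[float], outcomes: Sequence[int],
                      n_bins: int = 10) -> List[Tuple[float, float, int, float, float]]:
    """Group forecasts into equal-width buckets.

    Returns (bin_low, bin_high, count, mean_forecast, event_rate) per
    non-empty bucket. Hypothetical helper for illustration.
    """
    counts = [0] * n_bins
    sum_p = [0.0] * n_bins
    sum_o = [0.0] * n_bins
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p = 1.0 goes in the top bucket
        counts[idx] += 1
        sum_p[idx] += p
        sum_o[idx] += o
    rows = []
    for i in range(n_bins):
        if counts[i]:
            rows.append((i / n_bins, (i + 1) / n_bins, counts[i],
                         sum_p[i] / counts[i], sum_o[i] / counts[i]))
    return rows
```

Plotting event_rate against mean_forecast for each row gives the calibration diagram whose patterns the table above diagnoses.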
Mean Absolute Error and Bias
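$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{p}_i - o_i\right|, \qquad \text{Bias} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{p}_i - o_i\right)$$
Positive bias means the model systematically overestimates the probability of the event occurring.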
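Edge Direction Accuracy
The fraction of resolved runs in which the direction of the edge matched the actual outcome: a positive YES edge followed by a YES resolution, or a negative YES edge followed by a NO resolution.
PnL per Dollar Risked
For a position bought at price $p_{\text{mkt}}$, each dollar buys $1/p_{\text{mkt}}$ contracts, so the return per dollar risked is $(1 - p_{\text{mkt}})/p_{\text{mkt}}$ if the position wins and $-1$ if it loses.
Averaging this over all runs in an edge bucket gives the realised expected value of acting on that edge signal. Positive average PnL in the high-edge bucket validates the model's alpha.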
The Brier score is strictly proper: it is minimised in expectation only when the forecast equals the true probability. A model that reports extreme probabilities to artificially improve its score will actually increase its Brier score. BSS improvement is genuinely signal, not an artefact of reporting strategy.
§ 15Agent Pipeline Overview
The autonomous strategy agent runs a continuous scan-price-signal-size-record loop, evaluating all open Kalshi contracts and recording sizing decisions in shadow mode by default. Every decision is persisted to the strategy_decisions table with full provenance so the loop can be evaluated like a live trading system.
- Scan: list all open Kalshi contracts across every configured series via the public API.
- Filter: apply six lightweight pre-pricing checks (priceable series, deduplication, volume, mid-price range, TTX/DTE window, and spread quality).
- Price: run the full pricing engine for each contract that passes the filters, giving raw_edge = fair_value − market_price.
- Confidence Regression: look up OLS coefficients (α, β) from historical resolved runs for this segment, giving adjusted_edge = α + β × raw_edge.
- Classify: compare adjusted_edge to the thresholds: watch (below θ), auto-buy (from θ up to 2θ₀), or clear-buy (≥ 2θ₀), where θ₀ is the global unoverridden threshold defined in §18. No per-cycle LLM call.
- Size: compute position size under two parallel strategies, fractional Kelly and a fixed dollar amount.
- Record: write the decision to strategy_decisions and update the bankroll curve. After contract settlement, auto-resolve and compute PnL.
In shadow mode, kelly_action and fixed_action are written to the database as if trades were placed, but no order is submitted to Kalshi. The agent simulates a live portfolio using mark-to-market PnL computed at settlement. Live mode is enabled by --live or AGENT_LIVE_MODE=true.
§ 16Pre-Signal Filters
Six filters are applied before any pricing or database lookup — ordered cheapest to most expensive. The first failing check short-circuits evaluation.
| # | Filter | Default Threshold | Reason |
|---|---|---|---|
| 1 | Priceable series | futures_root ≠ "" | Series must map to a futures root; unknown series skipped entirely. |
| 2 | Deduplication | Not in v_open_positions | Avoid accumulating duplicate positions in the same contract. |
| 3 | Minimum volume | 500 contracts | Illiquid contracts have wide spreads and unreliable mid-prices. |
| 4 | Mid-price range | 5% – 95% | Deep OTM/ITM contracts are near-resolved; edge signal is noisy. |
| 5 | TTX / DTE window | 2h – 26h (short) or 1d – 14d (medium) | Avoid contracts too close (no time to act) or too far (vol estimates unreliable). |
| 6 | Spread quality | yes_spread / yes_mid ≤ 8% | Wide bid/ask makes the mid-price an unreliable proxy for fair value. |
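The short-circuit behaviour can be sketched as an ordered list of checks. Everything named here (`Candidate`, `first_failing_filter`, the skip-reason strings) is hypothetical; only the thresholds come from the table above, and the 26-hour regime split is the one described under "TTX Regime Split".

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Candidate:  # hypothetical container for one scanned contract
    ticker: str
    futures_root: str
    volume: int
    yes_bid: float
    yes_ask: float
    dte_hours: float


def ttx_in_window(dte_hours: float) -> bool:
    """2h-26h for short-term contracts, 1d-14d for medium-term ones."""
    if dte_hours <= 26:
        return 2 <= dte_hours <= 26
    return 1 <= dte_hours / 24 <= 14


def first_failing_filter(c: Candidate, open_tickers: Set[str]) -> Optional[str]:
    """Return the first failing filter's (made-up) reason, or None if all pass."""
    yes_mid = (c.yes_bid + c.yes_ask) / 2
    checks = [
        ("unpriceable_series", lambda: c.futures_root != ""),
        ("duplicate_position", lambda: c.ticker not in open_tickers),
        ("low_volume", lambda: c.volume >= 500),
        ("mid_out_of_range", lambda: 0.05 <= yes_mid <= 0.95),
        ("ttx_out_of_window", lambda: ttx_in_window(c.dte_hours)),
        # Evaluated last, so yes_mid is already known to be at least 0.05.
        ("wide_spread", lambda: (c.yes_ask - c.yes_bid) / yes_mid <= 0.08),
    ]
    for reason, passes in checks:
        if not passes():
            return reason  # short-circuit: later checks never run
    return None
```

Wrapping each check in a lambda defers its evaluation, so a contract that fails an early check never pays for the later ones.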
TTX Regime Split
A contract is classified as short-term if dte_hours ≤ 26 (approximately one trading day). Short-term contracts use hourly volatility and TTX buckets. Medium-term contracts use daily DTE and daily vol. The agent applies separate TTX/DTE range filters for each regime.
§ 17Confidence Regression and Adjusted Edge
A raw-edge threshold alone ignores whether the pricing model's edge estimates are systematically over- or under-stated in specific market segments. The agent estimates a linear regression from historical raw_edge to realised PnL per dollar risked, fitting two coefficients that capture bias and calibration quality independently.
Training Data
OLS Regression
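$$\text{pnl}_i = \alpha + \beta \cdot \text{raw\_edge}_i + \varepsilon_i$$
where $\text{pnl}_i$ is the realised PnL per dollar risked on resolved run $i$. Interpreting the coefficients: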
- $\alpha$ (intercept — bias): expected PnL when raw_edge = 0. If $\alpha < 0$, the model loses on zero-edge trades — systematic overconfidence or adverse selection.
- $\beta$ (slope — calibration): additional realised PnL per unit of raw_edge. $\beta = 1$ means raw edge is perfectly predictive; $\beta < 1$ means edges are overstated; $\beta > 1$ means the model is conservative.
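A one-variable OLS fit has a closed form, sketched here with the standard library only. The function names `fit_ols` and `adjusted_edge` are assumptions, and the sample data in the usage note is invented for illustration.

```python
from typing import Sequence, Tuple


def fit_ols(raw_edges: Sequence[float], pnls: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of pnl = alpha + beta * raw_edge. Returns (alpha, beta)."""
    n = len(raw_edges)
    if n < 2 or n != len(pnls):
        raise ValueError("need at least two paired observations")
    mean_x = sum(raw_edges) / n
    mean_y = sum(pnls) / n
    sxx = sum((x - mean_x) ** 2 for x in raw_edges)
    if sxx == 0:
        raise ValueError("raw_edge has no variation; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(raw_edges, pnls))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


def adjusted_edge(raw_edge: float, alpha: float, beta: float) -> float:
    """Apply the fitted coefficients to a new raw edge."""
    return alpha + beta * raw_edge
```

With made-up observations (0.0, −0.01), (0.1, 0.04), (0.2, 0.09) the fit gives roughly α = −0.01 and β = 0.5, so a raw edge of 0.10 shrinks to an adjusted edge of about 0.04.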
Adjusted Edge Formula
$$\text{adjusted\_edge} = \hat{\alpha} + \hat{\beta} \cdot \text{raw\_edge}$$
Fallback Prior
When fewer than min_samples (default 50) resolved runs exist for a segment, the agent falls back to a conservative neutral prior:
Confidence Hierarchy
| Level | Segment | Example Label |
|---|---|---|
| 1 — Most specific | Series + direction + TTX/DTE bucket | KXGOLDD/above/4-8h |
| 2 | Series + TTX/DTE bucket (any direction) | KXGOLDD/4-8h |
| 3 | Global + TTX/DTE bucket (all series) | global/4-8h |
| 4 — Fallback | Global average (all segments) | global_average |
TTX buckets for short-term: <2h, 2–4h, 4–8h, 8–16h, 16–26h. DTE buckets for medium-term: 1d, 2–3d, 4–7d, 8–14d.
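The hierarchy amounts to trying segment labels from most to least specific. The sketch below assumes fitted coefficients are held in a dict keyed by label, and that the min_samples test applies at every level; both are assumptions, as is the function name. It returns None when no level qualifies, leaving the fallback prior to the caller.

```python
from typing import Dict, Optional, Tuple


def lookup_coefficients(fits: Dict[str, dict], series: str, direction: str,
                        bucket: str, min_samples: int = 50
                        ) -> Optional[Tuple[str, float, float]]:
    """Return (label, alpha, beta) from the most specific qualifying segment.

    fits maps a segment label to {"n": ..., "alpha": ..., "beta": ...}
    (hypothetical layout).
    """
    labels = [
        f"{series}/{direction}/{bucket}",  # level 1: most specific
        f"{series}/{bucket}",              # level 2: any direction
        f"global/{bucket}",                # level 3: all series
        "global_average",                  # level 4: fallback
    ]
    for label in labels:
        fit = fits.get(label)
        if fit is not None and fit["n"] >= min_samples:
            return label, fit["alpha"], fit["beta"]
    return None
```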
§ 18Signal Classification
Two thresholds govern classification:
- $\theta$: the segment-overridden minimum adjusted edge. Starts at min_adjusted_edge (default 0.072) and may be raised or lowered by a min_edge_override. Used for the watch/buy boundary.
- $\theta_0$: the global, unoverridden config.min_adjusted_edge (default 0.072). Used exclusively for the auto-buy/clear-buy boundary at $2\theta_0$. Segment overrides never move this line.
| Tier | Condition | Recorded Action |
|---|---|---|
| Watch | $\text{adjusted\_edge} < \theta$ | watch — zero size, DB row written for analysis |
| Auto-Buy | $\theta \le \text{adjusted\_edge} < 2\theta_0$ | buy — rule-based, proceeds immediately to sizing |
| Clear-Buy | $\text{adjusted\_edge} \ge 2\theta_0$ | buy — same execution as auto-buy; label preserved in analytics for high-confidence tracking |
Before any buy proceeds to sizing, raw_edge is checked against max_raw_edge_sanity (default 0.25). If raw_edge ≥ 0.25, the classification is forced to watch with skip reason raw_edge_implausible:<value>. Sane liquid markets do not leave 25 percentage points on the table; a reading this extreme almost certainly signals a stale forward price or input error.
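The tiering and the sanity cap combine into a small decision function, sketched below. The function name `classify` and the number formatting of the skip reason are assumptions; the thresholds and tier names come from the table and paragraph above.

```python
from typing import Optional, Tuple


def classify(adjusted_edge: float, raw_edge: float, theta: float,
             theta0: float = 0.072, max_raw_edge_sanity: float = 0.25
             ) -> Tuple[str, Optional[str]]:
    """Return (tier, skip_reason). Hypothetical helper.

    theta is the segment-overridden threshold; theta0 is the global one.
    """
    if adjusted_edge < theta:
        return "watch", None
    # Sanity cap: applied only to signals that would otherwise be a buy.
    if raw_edge >= max_raw_edge_sanity:
        return "watch", f"raw_edge_implausible:{raw_edge:.4f}"
    if adjusted_edge >= 2 * theta0:
        return "clear-buy", None
    return "auto-buy", None
```

Because the clear-buy line uses theta0 rather than theta, raising a segment's theta narrows the auto-buy band without moving the clear-buy boundary.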
Separating clear-buy from auto-buy in the DB enables the periodic review to analyse whether the model's highest-confidence signals outperform relative to the auto-buy tier — without requiring a per-cycle LLM call to generate the distinction.
LLM Observability (Startup Preflight)
| Status | Condition |
|---|---|
| ok | API key present; periodic review LLM will be available |
| error | Key missing; periodic review runs rule-based findings only, no Claude synthesis |
§ 19Position Sizing
Kelly Criterion for Binary Contracts
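A Kalshi YES contract bought at price $p_{\text{mkt}}$ has the following payoff structure: a win (probability $\hat{p}$) earns $1 - p_{\text{mkt}}$; a loss (probability $1 - \hat{p}$) costs $p_{\text{mkt}}$. The Kelly fraction maximises expected log-wealth growth. For net odds $b = (1 - p_{\text{mkt}}) / p_{\text{mkt}}$:
$$f^* = \frac{b\,\hat{p} - (1 - \hat{p})}{b} = \frac{\hat{p} - p_{\text{mkt}}}{1 - p_{\text{mkt}}} = \frac{\text{edge}}{1 - p_{\text{mkt}}}$$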
A half-Kelly multiplier $\lambda = 0.5$ and a hard cap $f_{\text{max}}$ (default 10% of bankroll) reduce variance and ruin risk:
$$f = \min\left(\lambda f^*,\ f_{\text{max}}\right)$$
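The sizing fraction can be sketched from the net-odds form of the Kelly formula. The function name `kelly_fraction` is hypothetical, and flooring the result at zero (no position on a negative edge) is an assumption not stated in the text.

```python
def kelly_fraction(p_hat: float, p_mkt: float,
                   kelly_multiplier: float = 0.5, f_max: float = 0.10) -> float:
    """Capped fractional-Kelly bankroll fraction for a binary contract.

    p_hat: model probability of the side being bought.
    p_mkt: price paid for that side, strictly between 0 and 1.
    """
    if not 0.0 < p_mkt < 1.0:
        raise ValueError("p_mkt must be strictly between 0 and 1")
    b = (1.0 - p_mkt) / p_mkt                   # net odds per dollar staked
    f_star = (b * p_hat - (1.0 - p_hat)) / b    # full Kelly fraction
    # Assumption: never size a negative-edge trade.
    return min(max(kelly_multiplier * f_star, 0.0), f_max)


# Fair value 0.50 against a 43-cent price: full Kelly is 0.07 / 0.57,
# so half-Kelly stakes about 6.1% of bankroll.
print(round(kelly_fraction(0.50, 0.43), 3))  # 0.061
```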
Cash-on-Hand Sizing and EV-Ranked Allocation
Within a single cycle, candidates are funded in decreasing order of $f^*$. After each buy, $W_{\text{avail}}$ is decremented before the next candidate is sized:
Full Kelly maximises long-run growth but produces extreme drawdowns and is highly sensitive to model error. Half-Kelly gives approximately 75% of the growth rate with much lower variance, and is standard practice for strategies with uncertain probability estimates.
Fixed Strategy
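$$\text{size}_{\text{fixed}} = \begin{cases} A & \text{if adjusted\_edge} \ge \theta \\ 0 & \text{otherwise} \end{cases}$$
where $A$ is a constant dollar amount (default \$50) and $\theta$ is the same segment-overridden threshold used for Kelly classification. Fixed is a benchmark: it has lower variance than Kelly but does not scale with edge conviction.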
Fixed is a permanent shadow benchmark — it records decisions exactly as Kelly does, but no real Kalshi order is ever placed on behalf of the Fixed strategy.
Bankroll Tracking
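$$\text{PnL} = \text{size\_USD} \times \frac{o - p_{\text{mkt}}}{p_{\text{mkt}}}$$
where $o \in \{0, 1\}$ is the settlement outcome of the side bought. This is the dollar profit from buying size_USD worth of contracts at $p_{\text{mkt}}$: each dollar spent buys $1/p_{\text{mkt}}$ contracts, each paying \$1 or \$0 at settlement. Snapshots are append-only and never amended.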
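The settlement arithmetic described above reduces to a couple of lines. The function name `settlement_pnl` is hypothetical, and fees are ignored, which is an assumption.

```python
def settlement_pnl(size_usd: float, p_mkt: float, outcome: int) -> float:
    """Dollar profit from spending size_usd on contracts priced at p_mkt.

    outcome is 1 if the side bought pays out, else 0. Fees are ignored.
    """
    if not 0.0 < p_mkt < 1.0:
        raise ValueError("p_mkt must be strictly between 0 and 1")
    contracts = size_usd / p_mkt           # each dollar buys 1 / p_mkt contracts
    return contracts * outcome - size_usd  # payout of $1 per contract, less cost
```

Spending \$43 at a 43-cent price buys 100 contracts, so a win returns about \$57 of profit and a loss forfeits the full \$43.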
§ 20Segment Overrides and Periodic Review
The agent runs a periodic review at most daily, or when review_sample_threshold (default 50) new resolutions have accumulated since the last review. The review identifies market segments where the confidence regression is systematically wrong and applies segment overrides that take effect immediately — no restart required.
Review Process
- Load analytics views (v_by_ttx_bucket, v_by_dte_bucket, v_by_moneyness, v_strategy_comparison, etc.).
- Flag segments with n ≥ review_min_segment_n (default 30) and edge accuracy below 0.45 or above 0.65.
- Call claude-sonnet-4-6 with tool use: the LLM can query allowed analytics views, inspect bankroll curves, and retrieve prior overrides.
- LLM returns override recommendations as structured JSON.
- Overrides are written to segment_overrides and take effect on the next agent cycle.
Override Types
| Type | Effect on Regression Coefficients |
|---|---|
| confidence_multiplier | $\hat{\beta}' = v \cdot \hat{\beta}$, $\hat{\alpha}$ unchanged. Use $v < 1$ to dampen over-trading; $v > 1$ to boost a reliably undervalued segment. |
| exclude | $\hat{\alpha}' = 0,\ \hat{\beta}' = 0\ \Rightarrow\ \text{adjusted\_edge} = 0$. Segment always produces watch/skip regardless of raw edge. Used for persistently negative realised PnL. |
| min_edge_override | Replaces min_adjusted_edge ($\theta$) for the specific segment. Raises the bar for low-quality segments or lowers it for highly reliable ones. |
The periodic review only deactivates prior permanent overrides when the LLM produces ≥1 valid replacement. A silent review never wipes the active override set — this fixed a live-trading incident where a review wiped a KXWTI exclude 47 seconds before 7 fresh KXWTI buys.
§ 21Live-Mode Safety Guards
The live execution path applies a layered set of safety checks introduced after Day-1 of live trading exposed several failure modes. All guards are evaluated in order; the first failing guard determines the skip reason.
Data-Quality Guards (Before Pricing)
| Guard | Threshold | Failure Action |
|---|---|---|
| Stale daily data | Last bar older than 96h | Raise FuturesDataError |
| Stale hourly data | Last bar older than 6h | Raise FuturesDataError |
| Frozen futures price | Price identical ($\lvert\Delta\rvert < 10^{-4}$) for ≥5 consecutive hourly readings over 6h | Raise FuturesDataError |
| Stale Pyth feed | publish_time more than 180s behind wall-clock | Raise PythAPIError |
Per-Order Live Guards (Before Placing a Kalshi Order)
These guards evaluate against a per-cycle running state (_LiveCycleState) so two candidates within the same cycle cannot collectively breach a cap that each one would pass individually.
| Guard | Condition Checked | Skip Reason |
|---|---|---|
| No quoted ask | ask ≤ 0 for the evaluated side | no_ask |
| Insufficient live cash | Order cost (count × ask) exceeds running working cash | insufficient_live_cash:<avail><<cost> |
| Directional asymmetry | (series, direction) already has ≥ max_same_direction_buys_per_cycle (default 3) buys | directional_asymmetry:… |
| Per-series exposure cap | Post-trade series exposure / account value > 20% | series_exposure_cap:… |
| Portfolio exposure cap | Post-trade total open exposure / account value > 40% | portfolio_exposure_cap:… |
Open exposure is computed only for rows where order_status IN ('pending', 'filled') AND kalshi_order_id IS NOT NULL. Rows where placement crashed or the order was cancelled are excluded, preventing a failed placement from leaving phantom exposure that would artificially inflate position caps.
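The value of a running per-cycle state is easiest to see in code. The sketch below is not the engine's _LiveCycleState: the class `LiveCycleState`, the function `check_live_order`, and the number formatting are stand-ins assumed for illustration. Skip reasons whose detail is elided in the table are returned as the bare prefix.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class LiveCycleState:  # hypothetical stand-in for _LiveCycleState
    working_cash: float
    account_value: float
    total_exposure: float = 0.0
    series_exposure: Dict[str, float] = field(default_factory=dict)
    direction_buys: Dict[Tuple[str, str], int] = field(default_factory=dict)


def check_live_order(state: LiveCycleState, series: str, direction: str,
                     count: int, ask: float, max_same_direction: int = 3,
                     series_cap: float = 0.20, portfolio_cap: float = 0.40
                     ) -> Optional[str]:
    """Return a skip reason, or None after reserving the order in the state."""
    if ask <= 0:
        return "no_ask"
    cost = count * ask
    if cost > state.working_cash:
        return f"insufficient_live_cash:{state.working_cash:.2f}<{cost:.2f}"
    key = (series, direction)
    if state.direction_buys.get(key, 0) >= max_same_direction:
        return "directional_asymmetry"
    series_after = state.series_exposure.get(series, 0.0) + cost
    if series_after / state.account_value > series_cap:
        return "series_exposure_cap"
    total_after = state.total_exposure + cost
    if total_after / state.account_value > portfolio_cap:
        return "portfolio_exposure_cap"
    # All guards passed: update the running state so the next candidate
    # in this cycle is checked against post-trade numbers.
    state.working_cash -= cost
    state.series_exposure[series] = series_after
    state.total_exposure = total_after
    state.direction_buys[key] = state.direction_buys.get(key, 0) + 1
    return None
```

Updating the state only after every guard passes is what stops two candidates from jointly breaching a cap that each would pass alone.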
All live orders are limit only, price clamped 1–99¢. Market orders are not supported by design: a stale or buggy signal produces an unfilled resting order rather than sweeping the book at an adverse price.
§ 22Model Limitations and Assumptions
Log-Normal Dynamics
The model assumes continuous log-normal price paths. Real commodity prices exhibit:
- Jump risk: sudden gap moves from OPEC announcements, weather events, geopolitical shocks
- Fat tails: extreme moves occur more frequently than the normal distribution predicts
- Mean reversion: energy prices tend to revert to long-run equilibrium cost of production
Constant Forward Price
The forward price $F$ is interpolated from today's futures strip and held constant throughout the decay curve. In practice, $F$ shifts as new information arrives.
Constant Volatility
A single annualised $\sigma$ is used throughout the remaining contract life. Real markets exhibit a volatility term structure and a volatility smile. The engine uses a single number — adequate for ballpark fair-value estimation, not for precision near-the-money pricing.
Risk-Neutral vs. Real-World Measure
The engine operates under $\mathbb{Q}$, using the futures price as the expected terminal value. Risk premiums (e.g., the crude oil risk premium embedded in backwardation) are implicitly absorbed into the futures price but not separately modelled.
Short-Term Forward Price Accuracy
For very short-dated contracts, the front-month futures price is a reasonable proxy for the eventual settlement price, but intraday basis can be significant. A more precise implementation would use the specific expiry-month contract's last traded price.
§ 23Glossary
| Term | Definition |
|---|---|
| $F$ | Forward price — futures price interpolated to the settlement month |
| $K$ | Strike price — target price level in the Kalshi contract (e.g. $80) |
| $\sigma$ | Annualised volatility of log-returns |
| $\sigma_T$ | Total volatility over remaining life: $\sigma\sqrt{T}$ |
| $T$ | Time to expiry in years: $\tau/365$ (long-term) or $h/(365 \times 23)$ (short-term) |
| $d_2$ | Standardised distance from strike: $(\ln(F/K) - \tfrac{1}{2}\sigma_T^2)/\sigma_T$ |
| $N(\cdot)$ | Standard normal CDF |
| $p_{\text{base}}$ | Log-normal probability from the distribution fit |
| $p_{\text{tv}}$ | Time-value point estimate (same formula at today's DTE) |
| $\hat{p}$ | Final YES fair value — the engine's $P(\text{target met})$, equal to $p_{\text{tv}}$ |
| $p_{\text{YES,mkt}}$ | YES mid-price from Kalshi: (yes_bid + yes_ask) / 2 |
| $p_{\text{NO,mkt}}$ | NO mid-price from Kalshi: (no_bid + no_ask) / 2 — independently quoted, not 1 − YES mid |
| edge | $\hat{p} - p_{\text{YES,mkt}}$ for YES; $(1-\hat{p}) - p_{\text{NO,mkt}}$ for NO |
| DTE | Days to expiry (whole calendar days) |
| TTX | Time to expiry in hours/minutes (short-term display) |
| ITM | In-the-money: $F > K$ for an "above" contract |
| OTM | Out-of-the-money: $F < K$ for an "above" contract |
| ATM | At-the-money: $F \approx K$ |
| $\mathbb{Q}$ | Risk-neutral probability measure; futures price = $\mathbb{E}^{\mathbb{Q}}[S_T]$ |
| Futures strip | Set of futures prices across all liquid delivery months |
| Log-normal | Distribution where $\ln X \sim \mathcal{N}(\mu, \sigma^2)$; guarantees $X > 0$ |
| Cash-or-nothing digital | Option paying a fixed cash amount if underlying exceeds strike; equivalent to a Kalshi YES payout |
| Brier Score (BS) | Mean squared error of probability forecasts vs. binary outcomes; range [0, 1], lower is better |
| Brier Skill Score (BSS) | $1 - \text{BS}_\text{model}/\text{BS}_\text{market}$; positive means model beats market price as a probability estimate |
| Calibration | When the model says $p\%$, the event occurs $p\%$ of the time across many predictions |
| MAE | Mean absolute error: average $|\hat{p} - o|$ across resolved contracts |
| Bias | Mean signed error $(\hat{p} - o)$; positive = model over-estimates probability |
| Edge accuracy | Fraction of runs where edge direction matched the actual outcome |
| raw_edge | Unadjusted edge signal directly from the pricing engine |
| adjusted_edge | $\alpha + \beta \cdot \text{raw\_edge}$ — scaled by OLS regression coefficients |
| $\alpha$ (intercept) | Expected realised PnL when raw_edge = 0; captures systematic model bias |
| $\beta$ (slope) | Additional realised PnL per unit of raw_edge; $\beta=1$ = perfectly calibrated |
| pyth_spot | Settlement kind for KXGOLDD/KXSILVERD: settles on Pyth Network XAU/USD or XAG/USD, not COMEX futures |
| spot_basis | Pyth spot − futures front-month price, stored in distribution_params for pyth_spot series |
| $\theta_0$ (global threshold) | Unoverridden config.min_adjusted_edge (default 0.072); the auto-buy/clear-buy split at $2\theta_0$ |
| Shadow mode | Agent mode where decisions are logged as if trades were placed but no order is submitted to Kalshi |
| Kelly fraction | Optimal bet size as fraction of bankroll: $f^* = \text{edge}/(1-p_{\text{mkt}})$; half-Kelly ($\lambda=0.5$) used in practice |
| Segment override | Per-segment adjustment to α, β, or θ written by periodic review; takes effect next cycle |
| Clear-buy | Contract classified at adjusted_edge ≥ $2\theta_0$; same execution as auto-buy, label preserved in analytics |
| TTX bucket | Time-to-expiry range bucket for short-term contracts (<26h): e.g. "4-8h" |