Writing

Notes on RL, optimization, numerical methods, and applied techniques.

Posts

  • Variance reduction and GAE — The same trick shows up in Monte Carlo (control variates), trading (minimum-variance hedge), quant finance (excess return), and policy-gradient RL (the advantage function). A walk through the unified formula and modern policy-gradient techniques such as GAE and GRPO.

  • Fisher Information again and again — The same matrix appears as a preconditioner in natural gradient, a constraint in TRPO, a diagonal approximation in Adam, an ingredient in NES, and, through its inverse, the lower bound in Cramér-Rao.

  • Goal programming: hyperparameters as economic coefficients — Lagrange multipliers as shadow prices: prices we measure, dials we tune, constraints we cannot violate, and some questions about RLHF penalty terms.

  • Personalization and RLHF — Parallels between RankNet/LambdaMART and DPO, both built on Bradley-Terry.

  • Robustness and clipping — A guardrail against bad inputs can go either on the input itself (robust methods such as winsorization, MAD, or MCD) or on the function output (gradient clipping, PPO ratio clipping, or per-trade PnL clipping).

  • Binary search and bisection — A one-stop C++23 template that solves integer search, bisection root-finding, bond yield-to-maturity, and inverse CDF sampling.

  • Systems under load — Little’s law, M/M/1 wait times, and Amdahl’s law, applied to software systems and to a trading desk in a sell-off.

Code