A short glossary of recurring concepts. Each entry is one or two sentences; for the full treatment, follow the cross-references in parentheses.
- Active subspace.
- The leading eigenspace of the gradient outer-product matrix $C = \mathbb{E}[\nabla f(x)\,\nabla f(x)^\top]$. It captures the linear directions in input space along which a function varies most and allows GPs to scale to high-dimensional inputs (Constantine, 2015).
- Active subspace, deep.
- Replaces the linear projection by a learned nonlinear encoder $h$, trained jointly with an MLP link $g$ so that $f(x) \approx g(h(x))$; gradient-free, with the latent dimension chosen by a validation-MSE elbow instead of a spectral gap (Section 9.5.1; Tripathy & Bilionis, 2018).
- Adam, AdamW.
- Adaptive stochastic-gradient optimizers that keep exponential moving averages of the gradient and of its square; AdamW decouples the weight-decay step from the adaptive gradient step (Kingma & Ba, 2015; Loshchilov & Hutter, 2019).
- Approximate aggregation.
- Empirical observation that in Krusell--Smith-class economies the cross-sectional distribution of wealth is nearly summarized by its mean for price-forecasting purposes (Chapter 6).
- Automatic differentiation (AD).
- Algorithmic computation of exact derivatives of composite functions via the chain rule; reverse-mode AD is the engine of every deep-learning framework (Baydin et al., 2018; Margossian, 2019).
- Bayesian active learning (BAL).
- Adaptive sample-design strategy in which the next training point is chosen to maximize an acquisition function based on predictive uncertainty; pairs naturally with GPs (Chapter 9).
- Bellman equation.
- Recursive characterization of the value function in a discrete-time dynamic program. Its continuous-time analogue is the HJB equation.
- Brock--Mirman model.
- Stochastic neoclassical growth model with log utility and full depreciation that admits the closed-form policy $k_{t+1} = \alpha\beta\, z_t k_t^{\alpha}$; the canonical DEQN benchmark in this script.
- Common random numbers (CRN).
- Variance-reduction technique in which the same shock realizations are reused across simulations of different parameter values, removing simulation noise from comparisons (Glasserman, 2004).
- Collocation point.
- A spatial location at which a PDE residual is evaluated and minimized in a PINN training loop (Chapter 7).
- Curse of dimensionality.
- Exponential blow-up of grid-based methods in the dimension of the state space; mitigated, not eliminated, by neural-network and GP approximators.
- Deep Equilibrium Net (DEQN).
- A neural-network-based solver for dynamic stochastic equilibrium models that minimizes the equilibrium-equation residuals directly via SGD (Chapter 2).
- Deep Galerkin Method (DGM).
- An LSTM-style architecture introduced by Sirignano & Spiliopoulos (2018) for solving high-dimensional PDEs; the architectural sibling of standard PINNs.
- Deep kernel learning (DKL).
- Composes a neural-network feature extractor with a GP layer in the learned feature space (Wilson et al., 2016).
- DeepONet.
- A neural architecture for operator learning: a branch net encodes the input function and a trunk net encodes the query point; the inner product of their outputs is the predicted value (Lu et al., 2021).
- EMINN.
- Economic-Model Informed Neural Network: a PINN-style approach to the master equation in continuous-time HA models (Gu et al., 2024).
- Ergodic distribution.
- The stationary distribution of a Markov process; in a Krusell--Smith economy it is the long-run distribution of wealth across agents.
- Fischer--Burmeister (FB).
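- A smooth complementarity function of the form $\phi(a, b) = \pm\bigl(a + b - \sqrt{a^2 + b^2}\bigr)$, used to encode KKT conditions in differentiable losses; it vanishes exactly when $a \ge 0$, $b \ge 0$, and $ab = 0$. Either sign has the same zero set, and the chapter and notebooks fix one sign convention throughout (Fischer, 1992).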
- Fourier Neural Operator (FNO).
- Operator-learning architecture parameterizing a kernel integral operator in Fourier space; cheap and resolution-invariant (Li et al., 2021).
- Functional derivative.
- $\frac{\delta F}{\delta \mu}(\mu)(x)$, the density / Riesz representer of the Fréchet derivative of a functional $F$ with respect to a function-valued argument $\mu$ (equivalently, the directional derivative of $F$ at $\mu$ in the direction of a Dirac perturbation $\delta_x$); appears in the master equation (Chapter 8).
- Gauss--Hermite quadrature.
- Polynomial quadrature rule for integrals against the standard normal density; backbone of the expectations step in DEQNs.
- HJB equation.
- Hamilton--Jacobi--Bellman equation; continuous-time analogue of the Bellman equation, a PDE in the value function.
- Histogram (Young 2010).
- Mass-redistribution scheme on a fixed grid that propagates a wealth distribution deterministically without Monte Carlo noise (Chapter 6).
- Hyperband.
- Successive-halving multi-fidelity hyperparameter scheduler that explores many configurations cheaply and concentrates budget on the survivors (Li et al., 2018).
- Inducing points.
- A small set of $m$ pseudo-data points used in sparse GPs to approximate the full kernel matrix over $n$ observations at cost $\mathcal{O}(n m^2)$ (Titsias, 2009).
- Itô’s lemma.
- The chain rule of stochastic calculus; for the scalar diffusion $dX_t = \mu\,dt + \sigma\,dW_t$ and a twice-differentiable $f$, the only difference from ordinary calculus is the second-order correction $\tfrac{1}{2}\sigma^2 f''(X_t)\,dt$.
- Karush--Kuhn--Tucker (KKT) conditions.
- First-order necessary conditions for constrained optimization; encoded smoothly via Fischer--Burmeister in DEQN losses.
- Kolmogorov forward equation (KFE / Fokker--Planck).
- The PDE governing the time evolution of the probability density of an Itô process.
- Marginal likelihood.
- The log-evidence in a GP; sum of a data-fit term and a complexity penalty (Chapter 9).
- Master equation.
- A single PDE that subsumes the HJB, KFE, and market-clearing conditions of a continuous-time mean-field-game equilibrium; its arguments include the cross-sectional measure.
- Mean field game (MFG).
- Equilibrium concept in which each atomistic agent best-responds to the cross-sectional distribution and the distribution evolves under those best responses; the natural framework for the HJB+KFE system (Lasry & Lions, 2007).
- Neural Tangent Kernel (NTK).
- In the infinite-width limit, gradient-descent training of a deep network is equivalent to kernel regression with the (deterministic) NTK (Jacot et al., 2018).
- Operator learning.
- Learning a map between function spaces (input field $\mapsto$ solution function) rather than a single solution; DeepONet and FNO are the leading architectures.
- Physics-Informed Neural Network (PINN).
- A neural network trained by minimizing a PDE residual at collocation points plus boundary-condition penalties (Raissi et al., 2019).
- Pseudo-state.
- Treating model parameters as additional inputs to a neural network so that the trained surrogate covers an entire parameter range without retraining (Chapter 9).
- Quasi-Monte Carlo (QMC).
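- Deterministic low-discrepancy sequences (Sobol, Halton, Niederreiter) achieving error rates close to $\mathcal{O}(N^{-1})$ in the number of points $N$ for smooth integrands, compared with $\mathcal{O}(N^{-1/2})$ for plain Monte Carlo.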
- ReLoBRaLo.
- Adaptive loss-balancing scheme that reweights multi-component losses by recent relative-decrease ratios (Bischof & Kraus, 2025).
- Simulated Method of Moments (SMM).
- Estimator that matches simulated to empirical moments; the natural extension of GMM when moments lack closed form (McFadden, 1989).
- Simulation-based inference (SBI).
- Modern likelihood-free Bayesian inference using neural conditional density estimators (Cranmer et al., 2020).
- Social cost of carbon (SCC).
- Marginal welfare cost of one additional unit of emissions, commonly reported as USD/tCO$_2$ after choosing the consumption numeraire and applying the carbon-to-CO$_2$ conversion; the headline policy number from a climate IAM.
- Sobol / Shapley indices.
- Variance-decomposition tools for global sensitivity analysis. Sobol decompositions are cleanest under independent inputs; Shapley effects allocate variance across inputs and can be defined for dependent inputs when the dependence structure is modeled explicitly.
- Universal approximation.
- A single-hidden-layer network with a non-polynomial activation and sufficiently many hidden units can approximate any continuous function on a compact set arbitrarily well (Cybenko, 1989; Hornik et al., 1989).
- Value Function Iteration (VFI).
- Classical contraction-mapping algorithm for solving the Bellman equation by iterating the Bellman operator until convergence.
- Young’s lottery.
- The unique two-point split that, when applied to off-grid policy choices, preserves the conditional mean exactly; the building block of the histogram update.
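
The lottery and the histogram update it feeds can be sketched in a few lines. This is a minimal illustration, not the script's implementation: it assumes a one-dimensional, strictly increasing grid, ignores the exogenous-shock transition that a full histogram step would also apply, and clips policy choices to the grid bounds. The names `young_lottery` and `push_forward` are hypothetical.

```python
import numpy as np

def young_lottery(grid, x):
    """Split unit mass at off-grid points x between the two bracketing grid nodes.

    Returns (lo, hi, w_lo, w_hi) with w_lo * grid[lo] + w_hi * grid[hi] == x,
    so the mean of each choice is preserved. Assumes grid is strictly increasing.
    """
    grid = np.asarray(grid, dtype=float)
    # Assumption: choices outside the grid are clipped to its end points.
    x = np.clip(np.asarray(x, dtype=float), grid[0], grid[-1])
    # Index of the left bracketing node, kept in [0, len(grid) - 2].
    lo = np.clip(np.searchsorted(grid, x, side="right") - 1, 0, len(grid) - 2)
    hi = lo + 1
    w_hi = (x - grid[lo]) / (grid[hi] - grid[lo])
    return lo, hi, 1.0 - w_hi, w_hi

def push_forward(mass, grid, policy):
    """One deterministic histogram step: move mass[i] to policy[i] via the lottery."""
    mass = np.asarray(mass, dtype=float)
    lo, hi, w_lo, w_hi = young_lottery(grid, policy)
    new_mass = np.zeros_like(mass)
    # np.add.at accumulates correctly when several nodes send mass to the same index.
    np.add.at(new_mass, lo, mass * w_lo)
    np.add.at(new_mass, hi, mass * w_hi)
    return new_mass

# Illustrative inputs (made up for this sketch).
grid = np.array([0.0, 1.0, 2.0, 4.0])
policy = np.array([0.5, 1.5, 3.0, 4.0])   # next-period choice at each grid node
mass = np.full(4, 0.25)                   # uniform initial histogram

new_mass = push_forward(mass, grid, policy)
print(new_mass, new_mass.sum(), new_mass @ grid, mass @ policy)
```

Total mass and the cross-sectional mean of the chosen values are both unchanged by the update, which is the property the entry describes.
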
- Constantine, P. G. (2015). Active Subspaces: Emerging Ideas for Dimension Reduction in Parameter Studies. SIAM.
- Tripathy, R. K., & Bilionis, I. (2018). Deep UQ: Learning Deep Neural Network Surrogate Models for High Dimensional Uncertainty Quantification. Journal of Computational Physics, 375, 565–588.
- Kingma, D. P., & Ba, J. (2015). Adam: A Method for Stochastic Optimization. Proceedings of the 3rd International Conference on Learning Representations (ICLR).
- Loshchilov, I., & Hutter, F. (2019). Decoupled Weight Decay Regularization. International Conference on Learning Representations (ICLR). https://openreview.net/forum?id=Bkg6RiCqY7
- Baydin, A. G., Pearlmutter, B. A., Radul, A. A., & Siskind, J. M. (2018). Automatic Differentiation in Machine Learning: a Survey. Journal of Machine Learning Research, 18(153), 1–43. http://jmlr.org/papers/v18/17-468.html
- Margossian, C. C. (2019). A Review of Automatic Differentiation and its Efficient Implementation. WIREs Data Mining and Knowledge Discovery, 9(4), e1305.
- Glasserman, P. (2004). Monte Carlo Methods in Financial Engineering (Vol. 53). Springer. https://doi.org/10.1007/978-0-387-21617-1
- Sirignano, J., & Spiliopoulos, K. (2018). DGM: A Deep Learning Algorithm for Solving Partial Differential Equations. Journal of Computational Physics, 375, 1339–1364.
- Wilson, A. G., Hu, Z., Salakhutdinov, R., & Xing, E. P. (2016). Deep Kernel Learning. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS), 370–378.
- Lu, L., Jin, P., Pang, G., Zhang, Z., & Karniadakis, G. E. (2021). Learning Nonlinear Operators via DeepONet Based on the Universal Approximation Theorem of Operators. Nature Machine Intelligence, 3(3), 218–229. https://doi.org/10.1038/s42256-021-00302-5
- Gu, Z., Lauriere, M., Merkel, S., & Payne, J. (2024). Global Solutions to Master Equations for Continuous Time Heterogeneous Agent Macroeconomic Models. https://arxiv.org/abs/2406.13726
- Fischer, A. (1992). A Special Newton-Type Optimization Method. Optimization, 24(3–4), 269–284.
- Li, Z., Kovachki, N., Azizzadenesheli, K., Liu, B., Bhattacharya, K., Stuart, A., & Anandkumar, A. (2021). Fourier Neural Operator for Parametric Partial Differential Equations. International Conference on Learning Representations.
- Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., & Talwalkar, A. (2018). Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization. Journal of Machine Learning Research, 18(185), 1–52.
- Titsias, M. K. (2009). Variational Learning of Inducing Variables in Sparse Gaussian Processes. Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS).