Glossary - Deep Learning for Solving and Estimating Dynamic Models in Economics and Finance

A short glossary of recurring concepts. Each entry is one or two sentences; for the full treatment, follow the cross-references in parentheses.

Active subspace.: The leading eigenspace of the gradient outer-product matrix $\E[\nabla f \nabla f^\top]$. Captures the linear directions in input space along which a function varies most; allows GPs to scale to high $d$ (Constantine, 2015).
Active subspace, deep.: Replaces the linear projection $U_m^\top \x$ by a learned nonlinear encoder $h\colon \R^D \to \R^d$, trained jointly with an MLP link $g\colon \R^d \to \R$ so that $\hat f(\xi) = g(h(\xi))$; gradient-free, with $d$ chosen by a validation-MSE elbow instead of a spectral gap (Section 9.5.1; Tripathy & Bilionis, 2018).
Adam, AdamW.: Adaptive stochastic-gradient optimizers using momentum on the gradient and on its square; AdamW separates the weight-decay step from the adaptive step (Kingma & Ba, 2015; Loshchilov & Hutter, 2019).
Approximate aggregation.: Empirical observation that in Krusell--Smith-class economies the cross-sectional distribution of wealth is nearly summarized by its mean for price forecasting purposes (Chapter 6).
Automatic differentiation (AD).: Algorithmic computation of exact derivatives of composite functions via the chain rule; reverse-mode AD is the engine of every deep-learning framework (Baydin et al., 2018; Margossian, 2019).
Bayesian active learning (BAL).: Adaptive sample-design strategy in which the next training point is chosen to maximize an acquisition function based on predictive uncertainty; pairs naturally with GPs (Chapter 9).
Bellman equation.: Recursive characterization of the value function in a discrete-time dynamic program. Continuous-time analogue is the HJB equation.
Brock--Mirman model.: Stochastic neoclassical growth model with log utility and full depreciation that admits a closed-form policy $s^\star = \alpha\beta$; the canonical DEQN benchmark in this script.
Common random numbers (CRN).: Variance-reduction technique in which the same shock realisations are reused across simulations of different parameter values, so that simulation noise largely cancels when outcomes are compared across those values (Glasserman, 2004).
Collocation point.: A spatial location at which a PDE residual is evaluated and minimized in a PINN training loop (Chapter 7).
Curse of dimensionality.: Exponential blow-up of grid-based methods in the dimension of the state space; mitigated, not eliminated, by neural-network and GP approximators.
Deep Equilibrium Net (DEQN).: A neural-network-based solver for dynamic stochastic equilibrium models that minimizes the equilibrium-equation residuals directly via SGD (Chapter 2).
Deep Galerkin Method (DGM).: An LSTM-style architecture introduced by Sirignano & Spiliopoulos (2018) for solving high-dimensional PDEs; the architectural sibling of standard PINNs.
Deep kernel learning (DKL).: Composes a neural-network feature extractor with a GP layer in the learned feature space (Wilson et al., 2016).
DeepONet.: A neural architecture for operator learning: a branch net encodes the input function and a trunk net encodes the query point; the inner product is the predicted output (Lu et al., 2021).
EMINN.: Economic-Model Informed Neural Network: a PINN-style approach to the master equation in continuous-time HA models (Gu et al., 2024).
Ergodic distribution.: The stationary distribution of a Markov process; in a Krusell--Smith economy it is the long-run distribution of wealth across agents.
Fischer--Burmeister (FB).: A complementarity function $\Phi(a,b) = a + b - \sqrt{a^2+b^2}$, smooth everywhere except at the origin, whose zeros are exactly the pairs with $a \ge 0$, $b \ge 0$, and $ab = 0$; used to encode KKT conditions in differentiable losses. The opposite sign has the same zero set, but the chapter and notebooks use this convention (Fischer, 1992).
Fourier Neural Operator (FNO).: Operator-learning architecture parameterizing a kernel integral operator in Fourier space; cheap and resolution-invariant (Li et al., 2021).
Functional derivative.: $\delta V/\delta g$, the density / Riesz representer of the Fréchet derivative of $V$ with respect to a function-valued argument $g$ (equivalently, the directional derivative of $V$ at $g$ in the direction of a Dirac perturbation $\delta_{y_0}$); appears in the master equation (Chapter 8).
Gauss--Hermite quadrature.: Polynomial quadrature rule for integrals against the standard normal density; backbone of the expectations step in DEQNs.
HJB equation.: Hamilton--Jacobi--Bellman equation; continuous-time analogue of the Bellman equation, a PDE in the value function.
Histogram (Young 2010).: Mass-redistribution scheme on a fixed grid that propagates a wealth distribution deterministically without Monte Carlo noise (Chapter 6).
Hyperband.: Successive-halving multi-fidelity hyperparameter scheduler that explores many configurations cheaply and concentrates budget on the survivors (Li et al., 2018).
Inducing points.: A small set of pseudo-data points used in sparse GPs to approximate the full kernel matrix at $\mathcal{O}(nm^2)$ cost (Titsias, 2009).
Itô’s lemma.: The chain rule of stochastic calculus; for the scalar diffusion $dX_t=\mu\,dt+\sigma\,dB_t$, the only difference from ordinary calculus is the second-order correction $\tfrac{1}{2}f''(X_t)\sigma^2\,dt$.
Karush--Kuhn--Tucker (KKT) conditions.: First-order necessary conditions for constrained optimization; encoded smoothly via Fischer--Burmeister in DEQN losses.
Kolmogorov forward equation (KFE / Fokker--Planck).: The PDE governing the time evolution of the probability density of an Itô process.
Marginal likelihood.: The log-evidence $\log p(\bm y \mid \bm\vartheta)$ in a GP; sum of a data-fit term and a complexity penalty (Chapter 9).
Master equation.: A single PDE that subsumes the HJB, KFE, and market-clearing conditions of a continuous-time mean-field-game equilibrium; argument includes the cross-sectional measure $g$.
Mean field game (MFG).: Equilibrium concept in which each atomistic agent best-responds to the cross-sectional distribution and the distribution evolves under those best responses; the natural framework for the HJB+KFE system (Lasry & Lions, 2007).
Neural Tangent Kernel (NTK).: In the infinite-width limit, gradient-descent training of a deep network is equivalent to kernel regression with the (deterministic) NTK (Jacot et al., 2018).
Operator learning.: Learning a map between function spaces (input field $\to$ solution function) rather than a single solution; DeepONet and FNO are the leading architectures.
Physics-Informed Neural Network (PINN).: A neural network trained by minimizing a PDE residual at collocation points plus boundary-condition penalties (Raissi et al., 2019).
Pseudo-state.: Treating model parameters as additional inputs to a neural network so that the trained surrogate covers an entire parameter range without retraining (Chapter 9).
Quasi-Monte Carlo (QMC).: Deterministic low-discrepancy sequences (Sobol, Halton, Niederreiter) achieving error rates close to $\mathcal{O}(1/M)$ for smooth integrands.
ReLoBRaLo.: Adaptive loss-balancing scheme that reweights multi-component losses by recent relative-decrease ratios (Bischof & Kraus, 2025).
Simulated Method of Moments (SMM).: Estimator that matches simulated to empirical moments; the natural extension of GMM when moments lack closed form (McFadden, 1989).
Simulation-based inference (SBI).: Modern likelihood-free Bayesian inference using neural conditional density estimators (Cranmer et al., 2020).
Social cost of carbon (SCC).: Marginal welfare cost of one additional unit of emissions, commonly reported as USD/tCO$_2$ after choosing the consumption numeraire and applying the carbon-to-CO$_2$ conversion; the headline policy number from a climate IAM.
Sobol / Shapley indices.: Variance-decomposition tools for global sensitivity analysis. Sobol decompositions are cleanest under independent inputs; Shapley effects allocate variance across inputs and can be defined for dependent inputs when the dependence structure is modeled explicitly.
Universal approximation.: A single-hidden-layer network with a non-polynomial activation can approximate any continuous function on a compact set arbitrarily well (Cybenko, 1989; Hornik et al., 1989).
Value Function Iteration (VFI).: Classical contraction-mapping algorithm for solving the Bellman equation by iterating the Bellman operator until convergence.
Young’s lottery.: The unique two-point split that, when applied to off-grid policy choices, preserves the conditional mean exactly; the building block of the histogram update.
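
As a small illustration of the Fischer--Burmeister entry, the sketch below evaluates $\Phi(a,b) = a + b - \sqrt{a^2+b^2}$ with NumPy and checks that it vanishes on complementary pairs and is nonzero otherwise. The function name `fischer_burmeister` is chosen here for illustration and is not taken from the chapter or its notebooks.

```python
import numpy as np

def fischer_burmeister(a, b):
    """Phi(a, b) = a + b - sqrt(a^2 + b^2), the sign convention of the glossary entry."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # np.hypot computes sqrt(a^2 + b^2) without intermediate overflow.
    return a + b - np.hypot(a, b)

# Complementary pairs (a >= 0, b >= 0, a * b = 0) give a zero residual.
res_slack = fischer_burmeister(0.0, 2.0)     # constraint slack, multiplier zero
res_binding = fischer_burmeister(3.0, 0.0)   # multiplier positive, constraint binding

# Violations give a nonzero residual that a squared loss would penalize.
res_both_pos = fischer_burmeister(1.0, 1.0)   # 2 - sqrt(2) > 0
res_negative = fischer_burmeister(-1.0, 0.0)  # -1 - 1 = -2 < 0

print(res_slack, res_binding, res_both_pos, res_negative)
```

In a residual-minimizing loss one would typically add the square of this quantity to the other equilibrium residuals, so that driving it to zero enforces the KKT complementarity condition.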
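
To make the Gauss--Hermite entry concrete, here is a minimal sketch of an expectation against the standard normal density, using NumPy's probabilists' Hermite rule. The helper name `expect_std_normal`, the node count, and the test integrand $e^{\sigma Z}$ (whose mean is $e^{\sigma^2/2}$) are assumptions made for this example only.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def expect_std_normal(g, n_nodes=10):
    """Approximate E[g(Z)] for Z ~ N(0, 1) with an n_nodes-point Gauss-Hermite rule."""
    # hermegauss uses the weight exp(-x^2 / 2), so its weights sum to sqrt(2 * pi).
    nodes, weights = hermegauss(n_nodes)
    return np.sum(weights * g(nodes)) / np.sqrt(2.0 * np.pi)

sigma = 0.2  # illustrative shock volatility
approx = expect_std_normal(lambda z: np.exp(sigma * z))
exact = np.exp(0.5 * sigma**2)  # closed-form mean of a lognormal
print(approx, exact)
```

For smooth integrands a handful of nodes is usually enough, which is why the rule is attractive when the expectation has to be evaluated at every training point.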
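
The two-point split in the Young's lottery entry can be sketched as follows. The function name `young_lottery`, the example grid, and the choice to clamp off-range choices to the grid bounds are assumptions for illustration, not details taken from the chapter.

```python
import numpy as np

def young_lottery(grid, a_prime):
    """Return the lower bracketing index j and the weight placed on grid[j].

    The remaining weight 1 - w_low goes to grid[j + 1], so that
    w_low * grid[j] + (1 - w_low) * grid[j + 1] equals a_prime.
    """
    grid = np.asarray(grid, dtype=float)
    # Assumption: choices outside the grid are clamped to its end points.
    a = np.clip(a_prime, grid[0], grid[-1])
    # Index of the grid node at or just below a, capped so that j + 1 is valid.
    j = np.clip(np.searchsorted(grid, a, side="right") - 1, 0, len(grid) - 2)
    w_low = (grid[j + 1] - a) / (grid[j + 1] - grid[j])
    return j, w_low

grid = np.array([0.0, 1.0, 2.0, 4.0])  # illustrative, non-uniform wealth grid
a_choice = 2.5                          # off-grid policy choice
j, w_low = young_lottery(grid, a_choice)
mean_after_split = w_low * grid[j] + (1.0 - w_low) * grid[j + 1]
print(j, w_low, mean_after_split)
```

Applying this split to the policy at every grid node, and adding the resulting masses at the two bracketing nodes, gives one step of the histogram update described in the Histogram entry.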

References

Constantine, P. G. (2015). Active Subspaces: Emerging Ideas for Dimension Reduction in Parameter Studies. SIAM.
Tripathy, R. K., & Bilionis, I. (2018). Deep UQ: Learning Deep Neural Network Surrogate Models for High Dimensional Uncertainty Quantification. Journal of Computational Physics, 375, 565–588.
Kingma, D. P., & Ba, J. (2015). Adam: A Method for Stochastic Optimization. Proceedings of the 3rd International Conference on Learning Representations (ICLR).
Loshchilov, I., & Hutter, F. (2019). Decoupled Weight Decay Regularization. International Conference on Learning Representations (ICLR). https://openreview.net/forum?id=Bkg6RiCqY7
Baydin, A. G., Pearlmutter, B. A., Radul, A. A., & Siskind, J. M. (2018). Automatic Differentiation in Machine Learning: a Survey. Journal of Machine Learning Research, 18(153), 1–43. http://jmlr.org/papers/v18/17-468.html
Margossian, C. C. (2019). A Review of Automatic Differentiation and its Efficient Implementation. WIREs Data Mining and Knowledge Discovery, 9(4), e1305.
Glasserman, P. (2004). Monte Carlo Methods in Financial Engineering (Vol. 53). Springer. https://doi.org/10.1007/978-0-387-21617-1
Sirignano, J., & Spiliopoulos, K. (2018). DGM: A Deep Learning Algorithm for Solving Partial Differential Equations. Journal of Computational Physics, 375, 1339–1364.
Wilson, A. G., Hu, Z., Salakhutdinov, R., & Xing, E. P. (2016). Deep Kernel Learning. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS), 370–378.
Lu, L., Jin, P., Pang, G., Zhang, Z., & Karniadakis, G. E. (2021). Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3(3), 218–229. https://doi.org/10.1038/s42256-021-00302-5
Gu, Z., Lauriere, M., Merkel, S., & Payne, J. (2024). Global Solutions to Master Equations for Continuous Time Heterogeneous Agent Macroeconomic Models. https://arxiv.org/abs/2406.13726
Fischer, A. (1992). A Special Newton-Type Optimization Method. Optimization, 24(3–4), 269–284.
Li, Z., Kovachki, N., Azizzadenesheli, K., Liu, B., Bhattacharya, K., Stuart, A., & Anandkumar, A. (2021). Fourier Neural Operator for Parametric Partial Differential Equations. International Conference on Learning Representations.
Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., & Talwalkar, A. (2018). Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization. Journal of Machine Learning Research, 18(185), 1–52.
Titsias, M. K. (2009). Variational Learning of Inducing Variables in Sparse Gaussian Processes. Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS).
