Climate Economics and Deep Uncertainty Quantification - Deep Learning for Solving and Estimating Dynamic Models in Economics and Finance

This chapter brings together the methods developed throughout this script and applies them to one of the most consequential computational challenges in economics: climate change policy. Integrated assessment models (IAMs) couple economic growth with the carbon cycle, temperature dynamics, and climate damages, creating high-dimensional nonstationary dynamic programming problems that are ideal candidates for the DEQN and surrogate methods we have developed. We present the CDICE model of Folini et al. (2025), solve it with DEQNs, and then use GP surrogates and Bayesian active learning in two ways: first for deep uncertainty quantification (Friedl et al., 2023), and then, applying the same surrogate-then-optimize machinery to a different model and a different surrogate, to search over policy parameters and derive constrained Pareto-improving carbon tax rules in an OLG--IAM with deep uncertainty (Kübler et al., 2026). This last step illustrates a general use of surrogates that goes beyond estimation and UQ: once the structural model has been mapped into a fast, differentiable surrogate, the costly outer loop of an optimal-policy search in a dynamic stochastic heterogeneous-agent economy collapses into a small optimization on the surrogate. For broader overviews of climate economics and IAMs, see Hassler et al. (2016) on environmental macroeconomics, Dietz (2024) on IAMs, Fernández-Villaverde et al. (2025) on the intersection of climate economics and deep learning, and Ploeg & Rezai (2026) on the macroeconomics of climate policy.

11.1 The Macroeconomics of Climate Change

Climate change is a global externality: the emissions of each agent affect the welfare of all agents, including future generations that have no say in current decisions. Unlike standard market failures, the climate externality operates across centuries, involves deep scientific uncertainty, and couples the macroeconomy with the earth system in both directions.

11.1.0.1 Integrated assessment models.

Integrated assessment models (IAMs) formalize this coupling. The economy produces output and emissions; emissions accumulate in the atmosphere and raise global temperature; temperature increases cause damages that reduce output. The feedback loop is closed (Figure 11.1):


Figure 11.1: The integrated-assessment feedback loop. The economy produces output and CO$_2$ emissions; emissions accumulate in the atmosphere and raise global mean temperature ($\Delta T$); higher temperatures generate damages that reduce output and consumption, which in turn shape the path of future emissions. An IAM closes this loop and uses it to quantify the welfare cost of additional emissions, summarized by the social cost of carbon (11.1).

The central output of an IAM is the social cost of carbon (SCC): the marginal welfare cost of one additional unit of CO $_2$ emissions, measured in consumption-equivalent units. When emissions are measured in GtC (gigatons of carbon), the SCC has units of consumption per GtC. Conversion to USD per tCO $_2$ requires first applying the consumption-to-USD numeraire and then converting the carbon mass unit: one tCO $_2$ contains $12/44$ tons of carbon, so a price expressed per ton of carbon is divided by $44/12$ to obtain the corresponding price per ton of CO $_2$ (and a GtC price is also divided by 10⁹). Formally,

\mathrm{SCC}_t = -\frac{\partial V_t / \partial E_t}{\partial V_t / \partial C_t},

(11.1)

where $V_t$ is the value function, $E_t$ is contemporaneous emissions, and $C_t$ is consumption. The flow form is linked to the stock-based form $\mathrm{SCC}^M_t = -(\partial V_t/\partial M^{\mathrm{AT}}_t)/(\partial V_t/\partial C_t)$, derived in Section 11.6, by the chain rule

\frac{\partial V_t}{\partial E_t} \;=\; \frac{\partial V_{t+1}}{\partial M^{\mathrm{AT}}_{t+1}}\,\frac{\partial M^{\mathrm{AT}}_{t+1}}{\partial E_t}

(11.2)

together with the carbon-to-CO$_2$ unit conversion noted above. In a first-best allocation, the optimal carbon tax equals the SCC (Golosov et al., 2014). The SCC is high when climate damages are steep, the climate response is strong, discounting is low, and tipping risks are material (Cai & Lontzek, 2019; Dietz, 2024).
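The unit conversion described above is easy to get wrong by a factor of $44/12$, so a small sketch may help. The function name `scc_usd_per_tco2`, its arguments, and the numbers in the example are illustrative assumptions, not values taken from CDICE.

```python
C_TO_CO2 = 44.0 / 12.0  # tons of CO2 per ton of carbon


def scc_usd_per_tco2(scc_per_gtc: float, usd_per_consumption_unit: float) -> float:
    """Convert an SCC in consumption units per GtC into USD per tCO2."""
    usd_per_gtc = scc_per_gtc * usd_per_consumption_unit  # apply the numeraire
    usd_per_tc = usd_per_gtc / 1e9                        # 1 GtC = 1e9 tC
    return usd_per_tc / C_TO_CO2                          # price per tC -> price per tCO2


# Hypothetical input: an SCC of 0.11 trillion USD of consumption per GtC.
scc_example = scc_usd_per_tco2(0.11, 1e12)
print(f"{scc_example:.2f} USD/tCO2")
```

The order of operations mirrors the text: numeraire first, then the mass unit. A price per ton of carbon is always larger than the equivalent price per ton of CO$_2$, which is a quick plausibility check on any reported SCC.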

11.1.0.2 From surrogates to climate IAMs.

Chapter 9 introduced surrogates and Bayesian active learning as fast approximators for repeated model evaluations. Climate IAMs are the natural application: each parameter configuration (climate sensitivity, damage curvature, discount rate) is expensive to solve, yet policy questions require evaluating thousands of configurations to map out tail risks and Pareto-improving rules. The DEQN approach of Chapters 2--3, combined with the GP and active-learning toolkit of Chapter 9, is therefore the natural workhorse for climate-policy uncertainty quantification.

11.1.0.3 Why computation matters.

Solving IAMs globally, as opposed to linearization or certainty equivalence, is computationally demanding for several reasons:

Nonstationarity: TFP, population, emissions intensity, and radiative forcing all change exogenously over time, so the policy function cannot be time-invariant.
Coupled dynamics: the economy and climate interact in both directions through emissions and damages.
Long horizon: welfare effects unfold over 100--300 years, requiring stable numerical solutions far from the steady state.
Curse of dimensionality: multiple climate state variables (carbon stocks, temperature layers), stochastic shocks, and uncertain parameters raise the dimension of the state space well beyond what standard grid-based methods can handle.

The deep learning toolkit developed in this course (DEQNs, deep surrogates, and GP-based uncertainty quantification) is therefore particularly well suited to climate economics.

11.1.0.4 The three movements of this chapter.

The remainder of this chapter has three movements. Movement 1 (Section 11.3--Section 11.5) makes precise what changes when we ask the Deep Equilibrium Network of Chapter 2 to solve a non-stationary IAM, and presents the modified training algorithm in one labeled box. Movement 2 (Paragraph--Section 11.8) puts that algorithm to work on a concrete stochastic DICE economy. Movement 3 (Section 11.9--Section 11.12) sketches the four extensions that matter for serious climate-finance research: Bayesian learning on the climate sensitivity, recursive Epstein--Zin preferences, global uncertainty quantification of the social cost of carbon, and constrained Pareto-improving carbon-tax design in a heterogeneous-agent IAM.

11.2 The DICE Model

11.2.1 The IAM Landscape

DICE is the workhorse of this chapter, but it is one of several integrated assessment models in active use. The list below summarizes the active landscape; each model trades global parsimony for regional or sectoral granularity, and computational tractability for fidelity of the climate physics.

DICE: Global aggregate; 3-box carbon cycle and 2-layer energy-balance model; one-sector Ramsey planner. The standard benchmark for SCC and integrated policy analysis (Nordhaus, 2017).
RICE: Twelve-region extension of DICE with trade (Nordhaus & Yang, 1996). Used for regional SCC and equity questions.
CDICE: A global DICE-2016 recalibration tailored to deep-learning solution methods, with Epstein--Zin preferences and OLG variants (Folini et al., 2025). The model used in Section 11.6 below.
ACE: Analytic Climate Economy: log-linear approximations to the carbon cycle, temperature dynamics, and damages yield a closed-form optimal carbon tax (Traeger, 2023). Acts as an analytic benchmark for the numerical SCC computed below.
FaIR / MAGICC: Reduced-complexity climate emulators that take emissions as input and produce temperature responses; widely used to translate IPCC scenarios to economic models.
WITCH / REMIND: Multi-region IAMs with full energy-system modules; standard for mitigation-pathway and technology-portfolio studies. Outside the scope of this script.

CDICE is the model we solve in this chapter. ACE provides a useful analytic shadow for it, in particular a closed-form SCC that decomposes transparently into structural parameters; we do not derive that closed form here, but Exercise 11.3 asks the reader to compute it from Traeger (2023) and compare against the DEQN-trained CDICE solution as an external sanity check.

The Dynamic Integrated model of Climate and the Economy (DICE), developed by Nordhaus (1994), is the most influential IAM; in this chapter we follow the variant of Nordhaus (2017) as recalibrated by Folini et al. (2025). It couples a neoclassical growth model with a reduced-form climate module in a single global framework. The remainder of this section builds the model up block by block, in increasing complexity: first the macro-economic backbone (Section 11.2.2), then the emissions and abatement technology (Section 11.2.3), then the climate physics (Section 11.2.5--Section 11.2.6), and finally the damage feedback (Section 11.2.7) that closes the loop. A consolidated calibration is given in Table 11.2.

11.2.1.1 Time-step convention.

Following Folini et al. (2025) we calibrate CDICE on an annual time step, $\Delta_t = 1$ year, so that all rates in Table 11.2 (capital depreciation $\delta$, pure rate of time preference $\rho$, the decay rates $g^{\sigma}_0, \delta^{\sigma}, g^{\mathrm{back}}, \delta^{\mathrm{Land}}$, the carbon-cycle transfer rates $b_{12}, b_{23}$, and the temperature-block coefficients $c_1, c_3, c_4$) are read directly as annual values; the original DICE-2016 calibration of Nordhaus (2017), by contrast, hard-wires a 5-year time step into its coefficients. Growth rates of TFP and population are written as annual log changes, $g^A_t := \ln(A_{t+1}/A_t)$ and $g^L_t := \ln(L_{t+1}/L_t)$, and the dynamics (11.14), (11.16)--(11.17) and the FOC residuals of Section 11.6 therefore carry no $\Delta_t$ multipliers; emissions $E_t$ entering (11.14) are the annual total. Switching to a non-annual $\Delta_t$ amounts to reinserting the multiplications $\Delta_t \cdot \{g^{\sigma}_0, \delta^{\sigma}, g^{\mathrm{back}}, \delta^{\mathrm{Land}}, b_{12}, b_{23}, c_1, c_3, c_4\}$ in the obvious places, the time-step-generic form discussed in Online Appendix D of Folini et al. (2025).

11.2.2 Production and the Ramsey--Cass--Koopmans backbone

Strip away the climate block and DICE is just a neoclassical growth model with population and TFP growth. A single representative firm produces gross output with Cobb--Douglas technology in capital and effective labor,

Y^{\mathrm{gross}}_t \;=\; K_t^{\alpha}\,(A_t L_t)^{1-\alpha},

(11.3)

where $\alpha\in(0,1)$ is the capital share, $A_t$ is total factor productivity, and $L_t$ is population. Both $A_t$ and $L_t$ follow deterministic but time-varying paths: $A_t$ trends because of exogenous productivity growth, and $L_t$ follows the calibrated demographic projection of Nordhaus (2017). The capital stock evolves under the standard accumulation law

K_{t+1} \;=\; (1-\delta) K_t + I_t,

(11.4)

with depreciation rate $\delta$ and gross investment $I_t$ . The economy’s resource constraint, written in terms of net (after-damages, after-abatement) output that we develop in Section 11.2.3--Section 11.2.7, is $C_t + I_t \le Y^{\mathrm{net}}_t$ , where $C_t$ is aggregate consumption.

A benevolent planner picks $(C_t,\, I_t,\, \mu_t)_{t\ge 0}$ to maximize a discounted CRRA-IES felicity sum,

V_0 \;=\; \sum_{t=0}^{\infty} \beta_t\, L_t\,\frac{(C_t/L_t)^{1-1/\psi}-1}{1-1/\psi}, \qquad \beta_t \;=\; \exp(-\rho\,\Delta_t \cdot t),

(11.5)

with intertemporal-elasticity-of-substitution parameter $\psi>0$ and pure rate of time preference $\rho$ . This is the time-additive aggregator of the standard Ramsey--Cass--Koopmans growth model; we replace it with the recursive Epstein--Zin form once stochastic risk enters the picture (Section 11.10). The planner controls $\mu_t$ , the emissions abatement rate, in addition to the savings--consumption split; we develop the cost of abatement next.
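To make the backbone concrete before the climate block is attached, the following minimal sketch evaluates (11.3)--(11.5) along a path with a fixed savings rate. It is not the planner's optimum: the constant savings rate, the constant $A = L = 1$, the function names, and the horizon are all assumptions made for illustration; only $\alpha$, $\delta$, $\rho$, and $\psi$ are taken from Table 11.2.

```python
import math

# Economy block of Table 11.2 (annual time step).
ALPHA, DELTA, RHO, PSI = 0.30, 0.10, 0.015, 0.69


def gross_output(K, A, L):
    """Cobb-Douglas gross output, equation (11.3)."""
    return K**ALPHA * (A * L) ** (1.0 - ALPHA)


def felicity(c):
    """Per-capita felicity with IES parameter PSI, as in (11.5)."""
    e = 1.0 - 1.0 / PSI
    return (c**e - 1.0) / e


def simulate(K0, A, L, s, horizon):
    """Fixed savings rate s, no climate block (net output = gross output)."""
    K, welfare = K0, 0.0
    for t in range(horizon):
        Y = gross_output(K, A, L)
        C = (1.0 - s) * Y
        welfare += math.exp(-RHO * t) * L * felicity(C / L)  # beta_t with Delta_t = 1
        K = (1.0 - DELTA) * K + s * Y                        # accumulation law (11.4)
    return K, welfare


# Hypothetical inputs: constant A = L = 1 and a 25% savings rate.
K_end, V0 = simulate(K0=1.0, A=1.0, L=1.0, s=0.25, horizon=500)
K_star = (0.25 / DELTA) ** (1.0 / (1.0 - ALPHA))  # fixed point of (11.4) for A = L = 1
print(K_end, K_star, V0)
```

With constant $A$ and $L$ the capital stock converges to the fixed point of (11.4), which is the stationary special case that the time-varying trends of DICE deliberately break.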

11.2.3 Industrial emissions, abatement, and the backstop technology

Industrial production is a CO$_2$-emitting activity. Let $\sigma_t$ denote the carbon intensity of gross output, expressed in CDICE’s working units of 10³ GtC of emissions per unit of gross output (a 10³ GtC normalization on the carbon stocks improves the conditioning of the climate side; see Table 11.2). Industrial emissions are then $\sigma_t Y^{\mathrm{gross}}_t$ before any mitigation effort; with abatement rate $\mu_t \in [0,1]$ the planner can scale these emissions down,

E_{\mathrm{ind},t} \;=\; (1-\mu_t)\,\sigma_t\, Y^{\mathrm{gross}}_t \;=\; (1-\mu_t)\,\sigma_t\, K_t^{\alpha}(A_t L_t)^{1-\alpha}.

(11.6)

Carbon intensity is itself an exogenous decreasing time path. DICE-2016 calibrates a closed-form decay,

\sigma_t \;=\; \sigma_0\,\exp\!\left[\frac{g^{\sigma}_0}{\log(1+\delta^{\sigma})}\bigl((1+\delta^{\sigma})^{t}-1\bigr)\right],

(11.7)

with initial intensity $\sigma_0$ , initial growth rate $g^{\sigma}_0<0$ (so emissions per dollar of output fall over time), and second-derivative parameter $\delta^{\sigma}>0$ that bends the path further down at long horizons. Equation (11.7) captures the steady decarbonization that even unabated world output undergoes through ongoing technological change; the planner’s $\mu_t$ is the additional mitigation effort on top of that baseline.

Abatement is not free. In the spirit of an aggregate marginal-abatement-cost curve, DICE assumes the abatement-cost share of gross output is a power function of $\mu_t$ ,

\Theta(\mu_t) \;=\; \theta_{1,t}\,\mu_t^{\theta_2},

(11.8)

with curvature parameter $\theta_2>1$ (a typical calibration is $\theta_2=2.6$ ). The level coefficient $\theta_{1,t}$ is not a free parameter: it is pinned down by the cost of the backstop technology, the cleanest large-scale abatement technology available at any given time (e.g. direct air capture). Let $p^{\mathrm{back}}_t$ denote the cost per unit of CO $_2$ avoided when the backstop is fully deployed, and assume an exogenous declining path,

p^{\mathrm{back}}_t \;=\; p^{\mathrm{back}}_0\,\exp(-g^{\mathrm{back}}\,t),

(11.9)

reflecting steady cost reductions in clean technologies. Setting the marginal abatement cost at $\mu_t=1$ equal to the backstop price (multiplied by carbon-to-CO $_2$ conversion $\mathrm{c2co2}$ to keep mass units consistent, and by 10³ to convert $\sigma_t$ from 10³ GtC working units back to GtC) yields the calibration identity

\theta_{1,t} \;=\; \frac{p^{\mathrm{back}}_t \cdot 10^3 \cdot \mathrm{c2co2} \cdot \sigma_t}{\theta_2}.

(11.10)

Equation (11.10) is what makes $\Theta(\mu)$ economically meaningful rather than a fitted polynomial: the abatement-cost function inherits its level from the backstop price and its curvature from the assumption $\theta_2=2.6$ . The 10³ factor matches Equation (11) of Online Appendix D of Folini et al. (2025) and the corresponding factor of 1000 in the companion implementation. As the backstop becomes cheaper ( $p^{\mathrm{back}}_t \downarrow$ ), full mitigation becomes cheaper too, which is one of the channels that makes the deterministic optimal $\mu_t$ rise toward 1 over the 21st century.
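The chain from (11.7) and (11.9) to (11.10) and (11.8) can be traced numerically with the Table 11.2 values. The sketch below is ours, with hypothetical function names; it is meant as a plausibility check of the calibration identity, not as the companion implementation.

```python
import math

# Emissions and abatement block of Table 11.2.
SIGMA0, G_SIGMA0, DELTA_SIGMA = 9.556e-5, -0.0152, 0.001
P_BACK0, G_BACK = 0.55, 0.005
THETA2, C2CO2 = 2.6, 3.666


def sigma(t):
    """Carbon intensity path, equation (11.7)."""
    exponent = G_SIGMA0 / math.log(1.0 + DELTA_SIGMA) * ((1.0 + DELTA_SIGMA) ** t - 1.0)
    return SIGMA0 * math.exp(exponent)


def p_back(t):
    """Backstop price path, equation (11.9)."""
    return P_BACK0 * math.exp(-G_BACK * t)


def theta1(t):
    """Level coefficient of the abatement-cost share, equation (11.10)."""
    return p_back(t) * 1e3 * C2CO2 * sigma(t) / THETA2


def abatement_cost_share(mu, t):
    """Abatement-cost share of gross output, equation (11.8)."""
    return theta1(t) * mu**THETA2


for year in (0, 50, 100):
    print(year, sigma(year), p_back(year), abatement_cost_share(1.0, year))
```

At $t = 0$ the cost of full abatement comes out at roughly 7% of gross output, and it falls over time because both $\sigma_t$ and $p^{\mathrm{back}}_t$ decline.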

The bound $\mu_t \in [0,1]$ deserves a comment. $\mu_t = 0$ means business-as-usual emissions; $\mu_t = 1$ means full deployment of the backstop, eliminating all industrial emissions. Values $\mu_t > 1$ would correspond to net-negative industrial emissions (e.g. aggressive direct air capture beyond the firm’s own footprint), which DICE forbids; we will impose the upper bound as a Kuhn--Tucker constraint, smoothed by a Fischer--Burmeister term, in Paragraph.

11.2.4 Land-use emissions and net output

The atmosphere does not distinguish between an industrial flow and a non-industrial flow of carbon. In DICE, total emissions therefore comprise an industrial component (11.6) and an exogenous land-use-change component,

E_{\mathrm{Land},t} \;=\; E_{\mathrm{Land},0}\,\exp(-\delta^{\mathrm{Land}}\,t),

(11.11)

which decays smoothly toward zero as deforestation slows. Total emissions feeding the atmosphere are

E_t \;=\; E_{\mathrm{ind},t} + E_{\mathrm{Land},t}.

(11.12)

Closing the production block requires accounting for two additional drains on gross output: climate damages, governed by atmospheric temperature $T^{\mathrm{AT}}_t$ via a damage fraction $\Omega(T^{\mathrm{AT}}_t)$ developed in Section 11.2.7, and abatement spending (11.8). Net output is therefore

Y^{\mathrm{net}}_t \;=\; \bigl(1 - \Omega(T^{\mathrm{AT}}_t) - \Theta(\mu_t)\bigr)\,Y^{\mathrm{gross}}_t,

(11.13)

which is what is available for consumption and investment. The additive form is the convention adopted in CDICE and used by the production-grade DEQN library port; an alternative multiplicative form $(1-\Omega^{\mathrm{ret}})(1-\Theta)$ with retained-output factor $\Omega^{\mathrm{ret}}$ appears in Nordhaus (2008).

The planner’s controls and exogenous trends are now all named. The endogenous economic state is the capital stock $K_t$ . The exogenous trends are TFP $A_t$ , population $L_t$ , carbon intensity $\sigma_t$ , land-use emissions $E_{\mathrm{Land},t}$ , and (added below) the non-CO $_2$ component of radiative forcing $F^{\mathrm{EX}}_t$ . The planner controls the consumption--investment split (equivalently, the savings rate $s_t$ ) and the abatement rate $\mu_t \in [0,1]$ . All that remains is the climate side: the carbon cycle that turns total emissions $E_t$ into atmospheric concentration, the energy balance that turns concentration into temperature, and the damage function that turns temperature back into output loss.
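Equations (11.6) and (11.11)--(11.13) combine into a few lines of bookkeeping. In the sketch below the land-use parameters and $\sigma_0$ come from Table 11.2, while the gross-output level of 105 (trillion USD) and the damage and abatement shares passed to `net_output` are hypothetical inputs chosen only to exercise the functions.

```python
import math

E_LAND0, DELTA_LAND = 7.09e-4, 0.023  # Table 11.2, in 10^3 GtC per year
SIGMA0 = 9.556e-5                     # Table 11.2, initial carbon intensity


def land_emissions(t):
    """Exogenous land-use emissions, equation (11.11)."""
    return E_LAND0 * math.exp(-DELTA_LAND * t)


def total_emissions(mu, sigma_t, y_gross, t):
    """Industrial emissions (11.6) plus land-use emissions, equation (11.12)."""
    return (1.0 - mu) * sigma_t * y_gross + land_emissions(t)


def net_output(y_gross, damage_share, abatement_share):
    """Additive net-output convention, equation (11.13)."""
    return (1.0 - damage_share - abatement_share) * y_gross


# Hypothetical gross output of 105 in the initial year, no abatement.
e0 = total_emissions(mu=0.0, sigma_t=SIGMA0, y_gross=105.0, t=0)
print(e0 * 1e3, "GtC/yr")           # working units -> GtC
print(e0 * 1e3 * 3.666, "GtCO2/yr")  # carbon mass -> CO2 mass
```

Setting $\mu_t = 1$ removes the industrial term only; the land-use flow still enters the atmosphere, which is why full abatement does not mean zero total emissions.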

11.2.5 Carbon cycle

DICE represents the global carbon cycle as a three-reservoir linear system: an atmospheric box, an upper (mixed-layer) ocean box, and a lower (deep) ocean box. Carbon flows between reservoirs at calibrated rates, and total emissions $E_t$ from (11.12) enter directly into the atmospheric reservoir. Stacking concentrations as $M_t = (M^{\mathrm{AT}}_t,\, M^{\mathrm{UO}}_t,\, M^{\mathrm{LO}}_t)^\top$ , the transition is

M_{t+1} \;=\; (I + B)\, M_t \;+\; \bm{e}_1\,E_t,

(11.14)

where $\bm{e}_1 = (1,0,0)^\top$ injects emissions into the atmosphere alone, $E_t$ is the per-period emissions total, and the transfer matrix

B \;=\; \begin{pmatrix} -b_{12} & b_{12}\,M^{\mathrm{AT}}_{\mathrm{eq}}/M^{\mathrm{UO}}_{\mathrm{eq}} & 0 \\ b_{12} & -b_{12}\,M^{\mathrm{AT}}_{\mathrm{eq}}/M^{\mathrm{UO}}_{\mathrm{eq}} - b_{23} & b_{23}\,M^{\mathrm{UO}}_{\mathrm{eq}}/M^{\mathrm{LO}}_{\mathrm{eq}} \\ 0 & b_{23} & -b_{23}\,M^{\mathrm{UO}}_{\mathrm{eq}}/M^{\mathrm{LO}}_{\mathrm{eq}} \end{pmatrix}

(11.15)

encodes the atmosphere--upper-ocean exchange (rate $b_{12}$ from the atmosphere into the upper ocean, rate $b_{12}\,M^{\mathrm{AT}}_{\mathrm{eq}}/M^{\mathrm{UO}}_{\mathrm{eq}}$ in the return direction) and the upper-ocean--lower-ocean exchange (rate $b_{23}$ downward, rate $b_{23}\,M^{\mathrm{UO}}_{\mathrm{eq}}/M^{\mathrm{LO}}_{\mathrm{eq}}$ upward). Each column of $B$ sums to zero, so carbon is only moved between reservoirs and never created or destroyed. The scaling of the return flows by the equilibrium-mass ratios $M^{\mathrm{AT}}_{\mathrm{eq}}/M^{\mathrm{UO}}_{\mathrm{eq}}$ and $M^{\mathrm{UO}}_{\mathrm{eq}}/M^{\mathrm{LO}}_{\mathrm{eq}}$ guarantees that the calibrated pre-industrial equilibrium $M_{\mathrm{eq}} = (M^{\mathrm{AT}}_{\mathrm{eq}},\, M^{\mathrm{UO}}_{\mathrm{eq}},\, M^{\mathrm{LO}}_{\mathrm{eq}})^\top$ is a rest point of the zero-emissions dynamics, $B\,M_{\mathrm{eq}} = 0$. Calibrated values for $b_{12}, b_{23}$, and $M_{\mathrm{eq}}$ in CDICE are listed in Table 11.2. The lecture slides for this chapter sometimes write the same transition with four directional rates $\phi_{12},\phi_{21},\phi_{23},\phi_{32}$ in place of $b_{12}$ and $b_{23}$; the two parameterizations are identical under $\phi_{12} = b_{12}$, $\phi_{21} = b_{12}\,M^{\mathrm{AT}}_{\mathrm{eq}}/M^{\mathrm{UO}}_{\mathrm{eq}}$, $\phi_{23} = b_{23}$, $\phi_{32} = b_{23}\,M^{\mathrm{UO}}_{\mathrm{eq}}/M^{\mathrm{LO}}_{\mathrm{eq}}$, i.e. the slide form makes the equilibrium-mass scaling absorbed into $B$ explicit at the cost of two extra symbols.
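The structure of $B$ in (11.15) can be checked in a few lines of NumPy. The sketch below builds the matrix from the Table 11.2 values, verifies the two properties that the equilibrium-mass scaling is meant to deliver, and then follows a 100 GtC pulse (0.1 in working units) for a century under zero emissions. Function and variable names are ours, and the pulse experiment only loosely mimics the calibration test of Section 11.2.8.

```python
import numpy as np

B12, B23 = 0.054, 0.0082                 # Table 11.2, per year
M_EQ = np.array([0.607, 0.489, 1.281])   # Table 11.2, in 10^3 GtC


def transfer_matrix(b12=B12, b23=B23, m_eq=M_EQ):
    """Carbon-cycle transfer matrix B of equation (11.15)."""
    r_au = m_eq[0] / m_eq[1]   # M_eq^AT / M_eq^UO
    r_ul = m_eq[1] / m_eq[2]   # M_eq^UO / M_eq^LO
    return np.array([
        [-b12, b12 * r_au, 0.0],
        [b12, -b12 * r_au - b23, b23 * r_ul],
        [0.0, b23, -b23 * r_ul],
    ])


def carbon_step(M, E, B):
    """One annual step of equation (11.14): M' = (I + B) M + e_1 E."""
    return M + B @ M + np.array([E, 0.0, 0.0])


B = transfer_matrix()
print("column sums:", B.sum(axis=0))   # mass conservation
print("B @ M_eq:   ", B @ M_EQ)        # M_eq is a rest point

pulse = 0.1                            # 100 GtC in working units
M = M_EQ + np.array([pulse, 0.0, 0.0])
for _ in range(100):
    M = carbon_step(M, 0.0, B)
airborne_fraction = (M[0] - M_EQ[0]) / pulse
print("airborne fraction after 100 years:", airborne_fraction)
```

Because the columns of $B$ sum to zero, the pulse is redistributed across the three boxes but never leaves the system, so the long-run atmospheric anomaly stays strictly positive.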

Equation (11.14) is a pulse-and-decay system: a unit pulse of emissions raises atmospheric carbon by one unit instantaneously, and that anomaly then bleeds into the upper ocean over decades and into the deep ocean over centuries. Figure 11.2 shows the implied BAU emissions trajectory under nine alternative climate-module calibrations; the spread is mostly driven by the equilibrium climate sensitivity (developed in Section 11.2.6), not by the carbon cycle, which is tightly disciplined by the pulse and step tests of Section 11.2.8.

Figure 11.2: Business-as-usual industrial emissions in CDICE (in GtCO$_2$/yr) under the nine combinations of three carbon-cycle calibrations (MMM, MESMO, LOVECLIM) and three temperature calibrations (MMM, HadGEM2-ES, GISS-E2-R); the thin CDICE curves overlap visually, confirming that the BAU emissions path is essentially insensitive to the climate-module calibration because $\sigma_t$ and $A_t$ are exogenous. The thick red and orange curves are the RCP 8.5 and RCP 6.0 scenarios, included as climate-policy reference paths. Reproduced from Folini *et al.* (2025), Figure 11(a).

11.2.6 Two-layer energy balance and radiative forcing

A two-layer energy balance model links carbon concentrations to temperature:

T^{\mathrm{AT}}_{t+1} = T^{\mathrm{AT}}_t + c_1 \bigl(F_t - \lambda\, T^{\mathrm{AT}}_t - c_3(T^{\mathrm{AT}}_t - T^{\mathrm{OC}}_t)\bigr)

(11.16)

T^{\mathrm{OC}}_{t+1} = T^{\mathrm{OC}}_t + c_4 \bigl(T^{\mathrm{AT}}_t - T^{\mathrm{OC}}_t\bigr)

(11.17)

where radiative forcing is

F_t = F_{\mathrm{2\times CO_2}} \frac{\log(M^{\mathrm{AT}}_t / M^{\mathrm{AT}}_{\mathrm{PI}})}{\log 2} + F^{\mathrm{EX}}_t.

(11.18)

Figure 11.3 summarizes the full topology of the climate side: industrial emissions enter the atmospheric carbon stock, leak into the upper and lower ocean reservoirs at calibrated rates, raise radiative forcing through the logarithmic CO$_2$ term, and warm the atmospheric and ocean temperature layers through the two-layer energy balance.


Figure 11.3: Topology of the CDICE climate side. Total emissions $E_t$ enter the atmospheric carbon box $M^{\mathrm{AT}}_t$, leak into the upper- and lower-ocean carbon boxes at exchange rates $b_{12}$ and $b_{23}$, and drive radiative forcing $F_t$ through the logarithmic CO$_2$ relation. The two-layer energy balance maps $F_t$ into the atmospheric temperature $T^{\mathrm{AT}}_t$ via $c_1$, with $c_3, c_4$ governing the heat exchange between atmosphere and ocean. The dashed arrow closes the loop through the damage function back into output (developed in Section 11.2.7). Five climate states $(M^{\mathrm{AT}}, M^{\mathrm{UO}}, M^{\mathrm{LO}}, T^{\mathrm{AT}}, T^{\mathrm{OC}})$ form the climate-side block of the DEQN state vector (11.25).

The parameter $\lambda = F_{\mathrm{2\times CO_2}} / \Delta T_{\mathrm{AT},\times 2}$ is determined by the equilibrium climate sensitivity (ECS), defined as the long-run atmospheric warming from a doubling of CO$_2$ concentration. We treat $\lambda$ as a deterministic constant in the baseline model; Section 11.9 promotes it to a learnable Gaussian parameter, with the additive feedback term $\varphi_{1C}\tilde f_{t+1} T^{\mathrm{AT}}_t$ entering the right-hand side of (11.16) and the coefficient $\varphi_{1C}$ defined in that subsection. ECS is one of the most consequential and uncertain parameters in climate science (Roe & Baker, 2007; Knutti et al., 2017). Observational and model-based estimates place ECS in a likely (66%) range of 2.5°C--4°C and a very likely (90%) range of 2°C--5°C, with a best estimate of approximately 3°C (Calvin et al., 2023); ECS uncertainty is one of the largest single sources of variance in the SCC.
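The definition of $\lambda$ can be verified directly: holding atmospheric CO$_2$ at twice its pre-industrial level and iterating (11.16)--(11.17) should drive both temperature layers to the ECS. The sketch below uses the MMM coefficients of Table 11.2; the value ECS $= 3\,^\circ$C is an illustrative assumption matching the best estimate quoted above rather than a calibrated CDICE value, and we assume that $M^{\mathrm{AT}}_{\mathrm{PI}}$ coincides with $M^{\mathrm{AT}}_{\mathrm{eq}}$.

```python
import math

C1, C3, C4, F2X = 0.137, 0.73, 0.00689, 3.45  # MMM values, Table 11.2
ECS = 3.0             # assumption: illustrative equilibrium climate sensitivity
LAM = F2X / ECS       # lambda = F_2xCO2 / ECS
M_AT_PI = 0.607       # assumption: pre-industrial level equals M_eq^AT


def forcing(m_at, f_ex=0.0):
    """Radiative forcing, equation (11.18)."""
    return F2X * math.log(m_at / M_AT_PI) / math.log(2.0) + f_ex


def temperature_step(t_at, t_oc, f):
    """One annual step of the two-layer energy balance (11.16)-(11.17)."""
    t_at_next = t_at + C1 * (f - LAM * t_at - C3 * (t_at - t_oc))
    t_oc_next = t_oc + C4 * (t_at - t_oc)
    return t_at_next, t_oc_next


# Hold CO2 at twice the pre-industrial concentration and iterate to rest.
f_doubling = forcing(2.0 * M_AT_PI)
t_at, t_oc = 0.0, 0.0
for _ in range(5000):
    t_at, t_oc = temperature_step(t_at, t_oc, f_doubling)
print(f_doubling, t_at, t_oc)
```

At the rest point the heat-exchange term vanishes ($T^{\mathrm{AT}} = T^{\mathrm{OC}}$) and the atmospheric equation reduces to $F_{\mathrm{2\times CO_2}} = \lambda\,T^{\mathrm{AT}}$, which is the defining relation of the ECS. The slow convergence reflects the small ocean coefficient $c_4$.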

11.2.7 Damage function: closing the climate--economy loop

The damage function is what turns a temperature anomaly back into an output loss, and so it is what closes the economy--climate--damages feedback loop drawn schematically in Figure 11.1. Following the convention in Folini et al. (2025), Online Appendix D, we treat $\Omega(T_{\mathrm{AT}})$ as the damage fraction of gross output (the fraction lost to climate damages, increasing in $T_{\mathrm{AT}}$), and the abatement-cost fraction $\Theta(\mu)$ from (11.8) as a separate output drain. The two enter additively in net output (11.13); an alternative multiplicative form $(1-\Omega^{\mathrm{ret}})(1-\Theta)$ with retained-output factor $\Omega^{\mathrm{ret}}$ is used by Nordhaus (2008).

The workhorse specification is the quadratic of Nordhaus (2008),

\Omega^N(T_{\mathrm{AT}}) \;=\; \pi_1\, T_{\mathrm{AT}} + \pi_2\, T_{\mathrm{AT}}^2,

(11.19)

which is relatively benign for moderate warming and is what we use in the deterministic CDICE solve below. Calibrated values $(\pi_1, \pi_2)$ are listed in Table 11.2. The damage function (11.19) is the most contested object in the IAM literature: at $T_{\mathrm{AT}}=3\,{}^\circ\mathrm{C}$ above pre-industrial, Nordhaus--quadratic damages amount to roughly $2\%$ of gross output, which a growing body of recent empirical work argues is far below realistic central estimates. We therefore treat the damage curvature $\pi_2$ as one of the two key uncertain parameters in the deep-UQ analysis of Section 11.11 (the other being the equilibrium climate sensitivity).

For the tipping-point branch of the literature, Weitzman (2012) argued that catastrophic thresholds require a steeper damage function,

\Omega^W(T_{\mathrm{AT}}) \;=\; 1 \;-\; \frac{1}{1 + \bigl(\tfrac{1}{\psi_1} T_{\mathrm{AT}}\bigr)^2 + \bigl(\tfrac{1}{2\, TP} T_{\mathrm{AT}}\bigr)^{6.754}},

(11.20)

where $\psi_1$ is the scale parameter of the quadratic term (not to be confused with the IES parameter $\psi$ of (11.5)) and $TP$ is a stochastic tipping-point threshold. We do not solve a Weitzman damage variant in the baseline CDICE-DEQN, but the OLG-IAM of Section 11.12 introduces a stylized tipping risk in the same spirit; the degree of convexity of the damage function is one of the most important determinants of the optimal carbon tax.
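The difference in convexity between (11.19) and (11.20) is easiest to appreciate numerically. The parameter values in the sketch below are assumptions for illustration only: $(\pi_1, \pi_2) = (0, 0.00236)$ is a DICE-2016-style choice consistent with the roughly 2% figure quoted above, and $\psi_1 = 20.46$ with $2\,TP = 6.081$ follows the point calibration commonly attributed to Weitzman (2012), with $TP$ held fixed rather than stochastic. Readers should substitute the Table 11.2 entries where they differ.

```python
def damage_nordhaus(t_at, pi1=0.0, pi2=0.00236):
    """Quadratic damage fraction, equation (11.19); default values are assumptions."""
    return pi1 * t_at + pi2 * t_at**2


def damage_weitzman(t_at, psi1=20.46, tp=6.081 / 2.0, exponent=6.754):
    """Tipping-style damage fraction, equation (11.20); psi1 and tp are assumptions."""
    denom = 1.0 + (t_at / psi1) ** 2 + (t_at / (2.0 * tp)) ** exponent
    return 1.0 - 1.0 / denom


for temp in (1.0, 2.0, 3.0, 4.0, 6.0):
    print(f"T_AT = {temp:.0f}C  Nordhaus = {damage_nordhaus(temp):.4f}  "
          f"Weitzman = {damage_weitzman(temp):.4f}")
```

Under these assumed values the two specifications are of similar magnitude at moderate warming and diverge sharply beyond it, which is the sense in which the high-order term acts as a tipping threshold.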

11.2.8 CDICE: recalibration of the climate module

A key contribution of Folini et al. (2025) is a systematic recalibration of the DICE climate module against benchmarks from climate science model archives (CMIP). Their CDICE framework retains the same functional forms as DICE but fits parameters to the four-test protocol summarized in Table 11.1.

Table 11.1: CDICE climate-module calibration protocol. The first two tests discipline the carbon-cycle and temperature-response blocks directly; the last two check whether the calibrated reduced-form module remains accurate on out-of-sample and historically realistic forcing paths.

Test	Target	Use
1. Carbon pulse (100 GtC)	Atmospheric retention path	Calibrate carbon cycle
2. $4\times$ CO $_2$ step	Temperature impulse response	Calibrate temperature block
3. 1% CO $_2$ /year	Transient climate response	Out-of-sample validation
4. Historical + RCP	Realistic forcing paths	End-to-end validation

This calibration ensures that the reduced-form climate module is consistent with state-of-the-art earth system models. CDICE also introduces a transparent time-step formulation, $X_{t+\Delta t} = X_t + \Delta t \cdot f(X_t, u_t; \theta)$, that allows coherent implementation at annual, 5-year, or 10-year resolution within a single generic framework. Figure 11.4 illustrates how much the carbon-cycle calibration matters even before the planner makes any decision: under business-as-usual, DICE-2016 and CDICE produce visibly different atmospheric carbon trajectories, and the gap propagates into temperature, damages, and ultimately the SCC.

Figure 11.4: Atmospheric carbon $M^{\mathrm{AT}}_t$ along the BAU path (in GtC, over 200 years from 2015) under the three CDICE carbon-cycle calibrations (CDICE = MMM, CDICE-MESMO, CDICE-LOVECLIM) and the legacy DICE-2016 carbon cycle. Only the carbon-cycle block is varied here; the temperature block is held at the CDICE MMM calibration, since the BAU carbon-stock path does not depend on the temperature calibration to first order. The DICE-2016 path lies systematically above the CMIP-disciplined paths, reflecting that the original DICE carbon cycle overstates atmospheric retention; CDICE-MESMO and CDICE-LOVECLIM bracket the CDICE baseline on the slow-removal and fast-removal sides, respectively. Reproduced from Folini *et al.* (2025), Figure 15(a).
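The time-step formulation $X_{t+\Delta t} = X_t + \Delta t \cdot f(X_t, u_t; \theta)$ can be illustrated on the carbon cycle, where $f(M, E) = B\,M + \bm{e}_1 E$ with annual rates. The sketch below compares a 100-year pulse response computed with $\Delta t = 1$ and with $\Delta t = 5$. It is a minimal illustration with names of our choosing, not the implementation of Folini et al. (2025).

```python
import numpy as np

B12, B23 = 0.054, 0.0082                 # annual rates, Table 11.2
M_EQ = np.array([0.607, 0.489, 1.281])   # 10^3 GtC, Table 11.2


def carbon_drift(M, E):
    """Annual rate of change f(M, E) = B M + e_1 E implied by (11.14)-(11.15)."""
    r_au, r_ul = M_EQ[0] / M_EQ[1], M_EQ[1] / M_EQ[2]
    B = np.array([
        [-B12, B12 * r_au, 0.0],
        [B12, -B12 * r_au - B23, B23 * r_ul],
        [0.0, B23, -B23 * r_ul],
    ])
    return B @ M + np.array([E, 0.0, 0.0])


def euler_step(X, u, f, dt):
    """Generic time step X_{t+dt} = X_t + dt * f(X_t, u_t)."""
    return X + dt * f(X, u)


def airborne_fraction(dt, years=100, pulse=0.1):
    """Share of a pulse still in the atmosphere after `years`, using step dt."""
    M = M_EQ + np.array([pulse, 0.0, 0.0])
    for _ in range(int(round(years / dt))):
        M = euler_step(M, 0.0, carbon_drift, dt)
    return (M[0] - M_EQ[0]) / pulse


af_annual = airborne_fraction(1.0)
af_5yr = airborne_fraction(5.0)
print(af_annual, af_5yr)
```

The two resolutions agree closely at the 100-year horizon because the slow ocean mode dominates there; the coarser step mainly distorts the first decades, when the fast atmosphere--upper-ocean exchange is still active.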

11.2.9 Calibration and initial conditions, in one place

The block-by-block model description above introduces a fairly large set of parameters. Table 11.2 consolidates the calibration we use throughout the rest of the chapter, taken from the Online Appendix of Folini et al. (2025). Two CMIP5 alternatives (HadGEM2-ES and GISS-E2-R) are shown alongside the multi-model mean (MMM) so that the deep-UQ analysis of Section 11.11 has a concrete distribution to draw from. We follow the CDICE convention of expressing all carbon quantities in 10³ GtC working units: equilibrium and initial carbon stocks $M_{\mathrm{eq}}$ and $M_0$, the initial carbon intensity $\sigma_0$, and the initial land-use emissions $E_{\mathrm{Land},0}$ are all on the same scale, which keeps the numerical conditioning of the carbon-cycle and emissions states under control. The factor 10³ appears explicitly in the abatement-cost calibration (11.10) to convert $\sigma_t$ back to GtC when it is multiplied by the backstop price; a reader comparing values against raw DICE-2016 numbers (e.g. $\sim 851$ GtC atmospheric carbon in 2015, or $\sim 2.6$ GtCO$_2$/yr land-use emissions, i.e. $\sim 0.71$ GtC/yr) should multiply the table entries by 10³ first, and additionally by $\mathrm{c2co2}$ when the raw figure is quoted in CO$_2$ rather than carbon mass.

Table 11.2: CDICE baseline calibration used in the deterministic CDICE-DEQN solve. Parameter values follow the Online Appendix of Folini et al. (2025) and are stated on an annual time step ( $\Delta_t = 1$ yr). All carbon quantities ( $M_{\mathrm{eq}}, M_0, \sigma_0, E_{\mathrm{Land},0}$ ) are in CDICE’s 10³ GtC working units; multiply by 10³ to recover GtC. Two alternative climate calibrations (CDICE-HadGEM2-ES, CDICE-GISS-E2-R) are listed in the temperature block, with their full free-parameter sets $\{c_1, c_3, c_4, F_{\mathrm{2\times CO_2}}, \lambda\}$ and corresponding ECS, since simply varying ECS while holding the rest of the temperature block fixed is not equivalent to using the full CMIP5 calibration (Folini et al., 2025). Initial state is for year 2015.

Block	Parameter	Value	Meaning
Economy	$\alpha$	0.30	Capital share in Cobb--Douglas output
	$\delta$	0.10/yr	Capital depreciation rate
	$\rho$	0.015/yr	Pure rate of time preference
	$\psi$	0.69	Intertemporal elasticity of substitution
Emissions & abatement	$\sigma_0$	$9.556\!\times\!10^{-5}$ (10³ GtC)/USD	Initial carbon intensity
	$g^{\sigma}_0$	-0.0152/yr	Initial decay rate of $\sigma_t$
	$\delta^{\sigma}$	0.001/yr	Curvature of $\sigma_t$ decay
	$p^{\mathrm{back}}_0$	0.55 thUSD/tCO $_2$	Initial backstop price
	$g^{\mathrm{back}}$	0.005/yr	Decay rate of backstop price
	$\theta_2$	2.6	Curvature of $\Theta(\mu)$
	$\mathrm{c2co2}$	3.666	Carbon-to-CO $_2$ mass conversion
Land use	$E_{\mathrm{Land},0}$	$7.09\!\times\!10^{-4}$ (10³ GtC)/yr	Initial land-use emissions
	$\delta^{\mathrm{Land}}$	0.023/yr	Decay rate of $E_{\mathrm{Land},t}$
Carbon cycle	$b_{12}$	0.054/yr	Atm.--upper-ocean transfer rate
	$b_{23}$	0.0082/yr	Upper-ocean--lower-ocean transfer rate
	$M_{\mathrm{eq}}$	$(0.607, 0.489, 1.281)$ (10³ GtC)	Pre-industrial equilibrium masses
Temperature	$c_1$ (MMM)	0.137/yr	Atmospheric heat-capacity inverse
	$c_3$ (MMM)	0.73/yr	Atm.--ocean coupling
	$c_4$ (MMM)	0.00689/yr	Ocean heat-capacity inverse
	$F_{\mathrm{2\times CO_2}}$ (MMM)	3.45 W/m $^2$	Forcing from CO $_2$ doubling
	$\lambda$ (MMM)	1.06 W/m $^2$ /K	Climate feedback parameter
	ECS (MMM)	$\approx 3.25\,{}^\circ$ C	Equilibrium climate sensitivity
	HadGEM2-ES	$(c_1,c_3,c_4)=(0.154,0.55,0.00671)$ /yr; $F_{\mathrm{2\times CO_2}}=2.95$ W/m $^2$ ; $\lambda=0.65$ W/m $^2$ /K; ECS $\approx 4.55\,{}^\circ$ C	High-end CMIP5 calibration
	GISS-E2-R	$(c_1,c_3,c_4)=(0.213,1.16,0.00921)$ /yr; $F_{\mathrm{2\times CO_2}}=3.65$ W/m $^2$ ; $\lambda=1.70$ W/m $^2$ /K; ECS $\approx 2.15\,{}^\circ$ C	Low-end CMIP5 calibration
Damages	$\pi_1$	0.0	Linear damage coefficient
	$\pi_2$	0.00236	Quadratic damage coefficient
Initial state	$K_0$	223 T USD	Capital, year 2015
	$M_0$	$(0.851, 0.628, 1.323)$ (10³ GtC)	Atm./upper/lower carbon, 2015
	$T_0$	$(1.10, 0.27)\,{}^\circ$ C	Atm./ocean temp. above pre-industrial, 2015
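
As a quick consistency check on Table 11.2, the sketch below recomputes the ECS of each temperature calibration as $F_{\mathrm{2\times CO_2}}/\lambda$ and converts the 10³ GtC working units back to GtC and GtCO $_2$ . The numbers are copied from the table; the dictionary layout and variable names are illustrative assumptions and are not taken from the companion code.

```python
# Values copied from Table 11.2; names and layout are illustrative only.
C2CO2 = 3.666  # carbon-to-CO2 mass conversion

calibrations = {
    # name: (F_2xCO2 in W/m^2, lambda in W/m^2/K)
    "MMM": (3.45, 1.06),
    "HadGEM2-ES": (2.95, 0.65),
    "GISS-E2-R": (3.65, 1.70),
}

# Equilibrium climate sensitivity: the warming at which the feedback
# lambda * T offsets the forcing from a CO2 doubling.
ecs = {name: f2x / lam for name, (f2x, lam) in calibrations.items()}

# Working units are 10^3 GtC; multiply by 1e3 to recover GtC.
M0_work = (0.851, 0.628, 1.323)
M0_gtc = tuple(1e3 * m for m in M0_work)

E_land0_work = 7.09e-4                  # (10^3 GtC)/yr
E_land0_gtc = 1e3 * E_land0_work        # GtC/yr
E_land0_gtco2 = C2CO2 * E_land0_gtc     # GtCO2/yr

for name, value in ecs.items():
    print(f"{name}: ECS = {value:.2f} K")
print("M0 [GtC]:", M0_gtc)
print(f"E_Land,0 = {E_land0_gtc:.3f} GtC/yr = {E_land0_gtco2:.2f} GtCO2/yr")
```

The ratios reproduce the tabulated ECS values to within rounding, which is a useful sanity check when swapping in one of the alternative calibrations.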

11.2.10The full IAM, summarized¶

Pulling the previous subsections together, CDICE is a deterministic dynamical system on a finite-dimensional state vector that the planner steers with two controls. The endogenous state at date $t$ is the sextuple

\bm{X}^{\mathrm{end}}_t \;=\; \bigl(K_t,\; M^{\mathrm{AT}}_t,\; M^{\mathrm{UO}}_t,\; M^{\mathrm{LO}}_t,\; T^{\mathrm{AT}}_t,\; T^{\mathrm{OC}}_t\bigr),

(11.21)

the exogenous-trend vector is

\bm{X}^{\mathrm{exo}}_t \;=\; \bigl(A_t,\; L_t,\; \sigma_t,\; E_{\mathrm{Land},t},\; F^{\mathrm{EX}}_t\bigr),

(11.22)

and the planner’s controls are $(C_t,\, \mu_t)$ (equivalently $(K_{t+1},\, \mu_t)$ , since investment is determined by the resource constraint $C_t + I_t = Y^{\mathrm{net}}_t$ together with (11.4)). The transitions are: capital from (11.4) with $I_t = Y^{\mathrm{net}}_t - C_t$ ; total emissions from (11.12), fed into the carbon cycle (11.14); temperature from (11.16)--(11.17) with forcing (11.18); and net output, hence the resource constraint, from (11.13). The objective is the discounted CRRA-IES felicity sum (11.5) subject to $\mu_t \in [0,1]$ .

That is the entire deterministic IAM. Every primitive named above has a closed-form expression and a calibrated parameter (Table 11.2); the only thing left is to find the optimal policy $(C_t, \mu_t)_{t\ge 0}$ . The model is intrinsically non-stationary. Section 11.3 makes that observation precise; the stationary DEQN of Chapter 2 needs to be amended before we can solve this system.

11.3Why DICE Breaks the Stationary DEQN¶

This is the technical pivot of the chapter. The stationary DEQN of Chapter 2 was designed for models whose policy function is a fixed point of a Bellman operator on an ergodic state space. IAMs satisfy neither premise. Three structural features break the stationarity assumption simultaneously, and each must be addressed before the DEQN can be trained at all.

11.3.0.1Time-varying state distributions with no ergodic limit.¶

The endogenous state of a stationary DSGE is the projection of a recurrent Markov chain onto a finite-dimensional vector; the policy function lives on its stationary distribution. In an IAM the analogous object does not exist within the planning horizon. Atmospheric carbon $M^{\mathrm{AT}}_t$ rises from a pre-industrial baseline of $\sim 600$ GtC to a peak of $\sim 1500$ GtC over a century, then decays over millennia; atmospheric temperature $T^{\mathrm{AT}}_t$ follows with a multi-decade lag and a multi-century relaxation. Within the 300 years the planner cares about, neither variable ever returns to a state it has been in before. The state visited at $t = 100$ is therefore not exchangeable with the state visited at $t = 200$ , and a time-invariant policy function $\bm p(\bm X_t)$ that depends only on the endogenous state misses the whole point of the exercise: the optimal mitigation effort at a given $(M^{\mathrm{AT}}, T^{\mathrm{AT}})$ depends on whether that state was reached on the way up or on the way down. Cf. the curse-of-dimensionality discussion in Section 2.1: it is not the size of the state space that breaks the DEQN here, it is the lack of recurrence.

11.3.0.2Deterministic drift through exogenous trends.¶

Even setting the carbon and temperature stocks aside, the IAM is drifting deterministically. Total factor productivity $A_t$ trends up at a calibrated, time-varying rate; population $L_t$ follows the demographic projection of Nordhaus (2017); carbon intensity $\sigma_t$ falls along the closed-form decay (11.7); land-use emissions $E_{\mathrm{Land},t}$ decay smoothly (11.11); the backstop price $p^{\mathrm{back}}_t$ falls (11.9); the exogenous non-CO $_2$ forcing $F^{\mathrm{EX}}_t$ follows a fitted RCP trajectory; and the abatement-cost level $\theta_{1,t}$ inherits the time dependence of $\sigma_t$ and $p^{\mathrm{back}}_t$ through (11.10). Seven exogenous trends drive the model even before a shock is introduced. A time-invariant policy can never see them, and replacing them with their long-run averages is exactly the certainty-equivalence move that defeats the purpose of solving the model globally.

11.3.0.3Finite calendar-time horizon.¶

A stationary DEQN trains under a transversality condition: as $t \to \infty$ , the discounted shadow price of capital goes to zero, and the iterative-projection loss inherits that fixed-point structure for free. An IAM is not solved on $[0, \infty)$ . The planning horizon is a finite calendar date (the notebooks of Section 11.7 run roughly three centuries from a 2015 start), so transversality is not available and the policy is solved over a finite forward sweep instead.

11.3.0.4Putting it together.¶

These features compound and explain why the time-invariant DEQN of Chapter 2 cannot be used here without modification. The next two sections operationalize the response: Section 11.4 reorganizes the network inputs to include calendar time, and Section 11.5 states the resulting training algorithm as a labeled diff against the stationary DEQN box of Section 2.3.

11.4What Changes in the DEQN Setup¶

We now translate this into one concrete design choice for the network inputs. The autodiff machinery, the squared-residual structure, and the rest of the training loop of Chapter 2 carry over unchanged; this is a refactor of what the network sees, not a new solver.

11.4.1Time and trends as states¶

Calendar time itself enters as a state. Because neural networks prefer bounded inputs, we use the monotone time rescaling $\tau_t = 1 - \exp(-\vartheta\, t) \in [0, 1)$ of Eq. (11.24). Every exogenous trend ( $A_t, L_t, \sigma_t, E_{\mathrm{Land},t}, F^{\mathrm{EX}}_t, p^{\mathrm{back}}_t, \theta_{1,t}$ ) is then a deterministic function of $\tau_t$ , so passing $\tau_t$ to the network is informationally equivalent to passing the entire trend bundle. Training trajectories begin from the calibrated 2015 state and run forward over the planner’s horizon.
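
The time rescaling and its inverse can be sketched in a few lines. The value of the compression parameter below is a hypothetical placeholder (the calibrated $\vartheta$ is not stated in this section), and the function names are illustrative.

```python
import math

THETA = 0.015  # hypothetical compression parameter, not the calibrated value


def time_to_tau(t, theta=THETA):
    """Map calendar time t >= 0 (years since the start date) into [0, 1)."""
    return 1.0 - math.exp(-theta * t)


def tau_to_time(tau, theta=THETA):
    """Inverse map, defined for tau in [0, 1)."""
    return -math.log(1.0 - tau) / theta


years = [0, 50, 100, 200, 300]
taus = [time_to_tau(t) for t in years]
recovered = [tau_to_time(tau) for tau in taus]
for t, tau in zip(years, taus):
    print(f"t = {t:3d} yr  ->  tau = {tau:.4f}")
```

Because the map is strictly increasing, any trend that is a function of $t$ can be rewritten as a function of $\tau_t$ through the inverse, which is why a single bounded input can stand in for the whole trend bundle.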

11.5The Non-Stationary DEQN Algorithm¶

The design choice of Section 11.4 translates into a single training algorithm. The body below is a literal diff against the stationary DEQN of Section 2.3: lines tagged [NEW] or [CHANGED] are additions or modifications, and untagged lines are unchanged.

Algorithm 11.1 (Non-Stationary DEQN Training)

Input: Network $\mathcal{N}_\rho$ , learning rate $\eta$ , episodes $E$ , training steps $T_{\mathrm{train}}$ ; [NEW] calibrated initial state $\bm x_0$ (e.g., the 2015 state) and a planning horizon $T_{\max}$
for episode $e = 1, \ldots, E$ :
- [CHANGED] Simulate $K$ forward trajectories from $\bm x_0$ over $[0, T_{\max}]$ under the current policy, and collect the time-stamped states $(\tau_t, \bm x_t)$ into $\mathcal D$
- for gradient step $t = 1, \ldots, T_{\mathrm{train}}$ :
  - Draw mini-batch $\mathcal B \subset \mathcal D$
  - Compute loss: $\ell_\rho = \frac{1}{|\mathcal B|}\sum_{\bm x_i \in \mathcal B}\|G(\bm x_i, \mathcal N_\rho(\bm x_i))\|^2$
  - Update: $\rho \leftarrow \rho - \eta \cdot \nabla_\rho \ell_\rho$
Output: Trained network $\mathcal{N}_{\rho^\star}$ approximating the policy function

One delta against the stationary DEQN box. The simulation step starts from a calibrated initial state $\bm x_0$ and integrates $K$ trajectories forward through calendar time, so the pool $\mathcal D$ contains time-stamped states $(\tau_t, \bm x_t)$ along finite-horizon trajectories rather than draws from an ergodic distribution. With $\tau_t$ in the input the network learns a time-dependent policy; every other line of the box is the stationary DEQN of Section 2.3 unchanged.
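
The changed simulation step can be illustrated with a minimal sketch that builds the pool $\mathcal D$ of time-stamped states and draws a uniform mini-batch from it. The scalar dynamics, the stand-in policy, and all names and parameter values are toy placeholders introduced for illustration; they are not the CDICE transitions or the companion implementation.

```python
import math
import random


def build_pool(x0, policy, step, n_traj, t_max, theta):
    """Simulate n_traj forward paths from x0 and return time-stamped states."""
    pool = []
    for _ in range(n_traj):
        x = x0
        for t in range(t_max + 1):
            tau = 1.0 - math.exp(-theta * t)   # bounded time index
            pool.append((tau, x))
            x = step(t, x, policy(tau, x))     # advance one calendar year
    return pool


# Toy placeholders, NOT the CDICE model: a scalar stock with a trend inflow.
def toy_policy(tau, x):
    return 0.5 * tau                           # stand-in for the network output


def toy_step(t, x, control):
    inflow = math.exp(-0.01 * t)               # exogenous, time-varying trend
    return 0.98 * x + (1.0 - control) * inflow


pool = build_pool(x0=1.0, policy=toy_policy, step=toy_step,
                  n_traj=4, t_max=300, theta=0.015)
random.seed(0)
batch = random.sample(pool, k=64)              # uniform mini-batch from the pool
print(len(pool), len(batch))
```

Every trajectory contributes exactly one state per calendar year, so the pool covers the whole window $[0, T_{\max}]$ evenly.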

11.5.0.1What replaces transversality.¶

Because the pool $\mathcal D$ is built from $K$ forward simulations of length $T_{\max}$ that all start at the same $\bm x_0$ , every trajectory visits the full calendar window $[0, T_{\max}]$ and a uniform mini-batch draw from $\mathcal D$ is therefore stratified across calendar time by construction. The missing transversality condition of Section 11.3 is absorbed numerically by choosing the horizon long enough that the discounted contribution of the terminal state falls below the training-noise floor: at the CDICE calibration $\rho = 0.015$ /yr and the notebooks’ default $T_{\max} = 300$ years, $\hat\beta_t^{\,T_{\max}} \approx \exp(-\rho\,T_{\max}) \approx 0.011$ , which is one to two orders of magnitude below the achievable residual root-mean-square at convergence. When the horizon must be short (e.g., the 1D toy of Exercise 11.10), one instead adds an explicit terminal residual $\lambda_T\,\|\bm x_{T_{\max}} - \bm x^{\mathrm{ref}}_{T_{\max}}\|^2$ to the loss; both options are standard in the finite-horizon DEQN literature.
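
The discount arithmetic quoted above is easy to reproduce. The sketch ignores the growth terms in $\hat\beta_t$ , exactly as the approximation $\exp(-\rho\,T_{\max})$ does; the tolerance used in the second step is a hypothetical value chosen for illustration.

```python
import math

rho = 0.015      # pure rate of time preference per year (Table 11.2)
t_max = 300      # planning horizon in years (notebook default quoted above)

# Approximate weight of the terminal period when trend growth is ignored.
terminal_weight = math.exp(-rho * t_max)
print(f"exp(-rho * T_max) = {terminal_weight:.4f}")

# Horizon needed to push the same weight below a chosen tolerance.
tol = 1e-3       # hypothetical tolerance
t_needed = math.log(1.0 / tol) / rho
print(f"horizon for weight < {tol:g}: {t_needed:.0f} years")
```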

11.6The Planner’s Lagrangian and FOCs¶

Movement 2 puts the non-stationary DEQN of Section 11.5 to work on the deterministic CDICE economy of Section 11.2. Solving this system with the algorithm of Section 11.5 amounts to writing the planner’s Lagrangian, deriving the first-order and envelope conditions, normalizing them, treating each FOC as a residual, and minimizing the sum of squared residuals on the time-stamped state pool generated by the forward simulation of Section 11.5. This section follows Friedl et al. (2023) and Online Appendix D of Folini et al. (2025).

11.6.0.1Detrending and state vector, in compact form.¶

The model-rendering choices already named in Section 11.4.1 carry over verbatim. Variables that grow with the productivity--population product $A_t L_t$ are rescaled to per-effective-capita units:

c_t \;:=\; \frac{C_t}{A_t\,L_t}, \qquad k_t \;:=\; \frac{K_t}{A_t\,L_t}.

(11.23)

Calendar time enters through the bounded rescaling (compatible with the dynamic-programming convention of Traeger (2014)),

\tau \;=\; 1 - \exp(-\vartheta\, t) \;\in\; [0,1), \qquad\text{with inverse}\quad t \;=\; -\frac{\ln(1-\tau)}{\vartheta},

(11.24)

with compression parameter $\vartheta > 0$ . The full DEQN state vector then collects the detrended endogenous CDICE states, the bounded time index, the Bayesian-belief states $(\mu_{f,t}, S_{f,t})$ used in Section 11.9, and a slot for pseudo-state parameters $\theta$ used in the UQ analysis of Section 11.11:

\bm{X}_t \;=\; \bigl[\underbrace{k_t,\, M^{\mathrm{AT}}_t,\, M^{\mathrm{UO}}_t,\, M^{\mathrm{LO}}_t,\, T^{\mathrm{AT}}_t,\, T^{\mathrm{OC}}_t,\, \mu_{f,t},\, S_{f,t},\, \tau_t}_{9\text{ endogenous, exogenous, and time states}};\; \underbrace{\theta}_{N\text{ pseudo-state parameters}}\bigr].

(11.25)

In the deterministic core developed in this and the next section, only the six endogenous-state entries plus $\tau_t$ are active, i.e. a seven-dimensional input vector; $(\mu_{f,t}, S_{f,t})$ and $\theta$ are appended only in the extensions of Movement 3.

11.6.0.2The Lagrangian.¶

We now derive the equilibrium conditions that the DEQN will be trained against. The derivation follows the standard Lagrangian approach in CRRA-IES form, working directly with the deterministic CDICE primitives of Section 11.2 (the recursive Epstein--Zin refinement is layered on in Section 11.10). Write the Lagrangian with multiplier $\lambda_t$ for the budget constraint $C_t + I_t = Y^{\mathrm{net}}_t$ , multipliers $\nu^{\mathrm{AT}}_t,\nu^{\mathrm{UO}}_t,\nu^{\mathrm{LO}}_t$ for the three carbon-reservoir transitions (11.14), multipliers $\eta^{\mathrm{AT}}_t,\eta^{\mathrm{OC}}_t$ for the temperature dynamics (11.16)--(11.17), and KKT multiplier $\lambda^\mu_t \ge 0$ for the abatement bound $\mu_t \le 1$ . The derivation produces ten equilibrium conditions: a consumption FOC, an abatement FOC, the capital Euler equation, the budget/resource constraint, three carbon-stock envelope conditions, two temperature-envelope conditions, and the abatement upper-bound complementarity. Two of these are enforced algebraically (the static consumption and abatement FOCs); the remaining eight become the DEQN residuals of Section 11.7.

11.6.0.3Qualitative overview.¶

Taking derivatives of the Lagrangian with respect to the controls yields:

w.r.t. $C_t$ : the marginal utility of consumption equals the shadow price of the budget, $\partial V^{1-1/\psi}/\partial C_t = \lambda_t$ .
w.r.t. $K_{t+1}$ : the shadow value of capital today equals the discounted expected marginal value tomorrow, $\xi_t = e^{-\rho}\,\partial \mathbb{E}_t[V_{t+1}^{1-\gamma}]^{(1-1/\psi)/(1-\gamma)} / \partial K_{t+1}$ .
w.r.t. $\mu_t$ : the marginal abatement cost equals the shadow value of reduced emissions (plus the complementarity term if $\mu_t = 1$ ).

11.6.0.4Envelope theorem.¶

Since the FOC for $K_{t+1}$ involves $\partial V/\partial K_{t+1}$ , which cannot be computed analytically, we apply the envelope theorem. It provides derivatives of the value function with respect to current states, in particular $\partial V/\partial k_t$ , $\partial V/\partial M_{\mathrm{AT},t}$ , $\partial V/\partial T^{\mathrm{AT}}_t$ , which are then shifted forward one period and substituted back into the FOCs.

11.6.0.5Capital Euler equation.¶

Combining the FOCs and envelope conditions yields:

1 = e^{-\rho}\,\mathbb{E}_t\!\left[\left(\frac{V_{t+1}}{\bigl(\mathbb{E}_t[V_{t+1}^{1-\gamma}]\bigr)^{1/(1-\gamma)}}\right)^{1/\psi - \gamma} \cdot \frac{(C_{t+1}/L_{t+1})^{-1/\psi}}{(C_t/L_t)^{-1/\psi}} \cdot R^K_{t+1}\right],

(11.26)

where $R^K_{t+1}$ is the return on capital inclusive of climate damages. The SCC also appears through the shadow price of atmospheric carbon:

\mathrm{SCC}^{M}_t = -\frac{\partial V_t / \partial M_{\mathrm{AT},t}}{\partial V_t / \partial C_t}.

(11.27)

This is a shadow value per unit of atmospheric carbon stock. The emissions-based SCC in (11.1) additionally includes the marginal loading of a unit of emissions into $M_{\mathrm{AT},t}$ and the carbon-to-CO $_2$ unit conversion. At the optimum, the marginal abatement cost equals the carbon tax equals the emissions SCC after these conversions.

11.6.0.6Normalization of multipliers.¶

Over a 300-year horizon, $A_t$ and $L_t$ can move the natural scale of marginal utilities and multipliers substantially, with the direction and magnitude depending on the IES through $A_t^{1-1/\psi}L_t$ . Such scale drift makes network outputs and gradients harder to optimize stably. Following the detrending logic of (11.23), all multipliers (the budget multiplier, the abatement-bound multiplier, and the five climate envelope multipliers alike) are divided by $A_t^{1-1/\psi}\,L_t$ . The argument for the climate multipliers tracks the budget-multiplier case via the envelope conditions and is spelled out in Online Appendix D of Folini et al. (2025); we adopt the result here:

\hat{\lambda}_t := \frac{\lambda_t}{A_t^{1-1/\psi}\,L_t},\quad \hat{\lambda}^\mu_t := \frac{\lambda^\mu_t}{A_t^{1-1/\psi}\,L_t},\quad \hat{\nu}^{\mathrm{AT}}_t := \frac{\nu^{\mathrm{AT}}_t}{A_t^{1-1/\psi}\,L_t},\quad \hat{\nu}^{\mathrm{UO}}_t := \frac{\nu^{\mathrm{UO}}_t}{A_t^{1-1/\psi}\,L_t},\quad \ldots

(11.28)

and analogously for the remaining multipliers $\hat{\nu}^{\mathrm{LO}}_t$ , $\hat{\eta}^{\mathrm{AT}}_t$ , and $\hat{\eta}^{\mathrm{OC}}_t$ . The normalization induces an effective discount factor that absorbs the trend growth in the per-effective-capita Euler equation,

\hat{\beta}_t \;:=\; \exp\!\left(-\rho + \left(1-\frac{1}{\psi}\right) g^A_t + g^L_t\right),

(11.29)

where $g^A_t := \ln(A_{t+1}/A_t)$ and $g^L_t := \ln(L_{t+1}/L_t)$ are annual log changes. Equation (11.29) mirrors Equation (38) of Online Appendix D of Folini et al. (2025): the population term enters linearly because $L_t$ enters the felicity weight $L_t (C_t/L_t)^{1-1/\psi}$ linearly, while the productivity term inherits the $1-1/\psi$ exponent from the per-effective-capita rescaling of consumption. All intertemporal equations below use $\hat{\beta}_t$ in place of $e^{-\rho}$ . For a non-annual time step, replace $\rho$ by $\rho\Delta_t$ and $g^A_t, g^L_t$ by their per-period analogues.
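
A minimal numerical sketch of (11.29) follows. The preference parameters are taken from Table 11.2; the two growth rates are hypothetical values chosen only to show the direction of each effect.

```python
import math


def beta_hat(rho, psi, g_A, g_L):
    """Effective one-year discount factor of Eq. (11.29)."""
    return math.exp(-rho + (1.0 - 1.0 / psi) * g_A + g_L)


rho, psi = 0.015, 0.69        # Table 11.2
g_A, g_L = 0.015, 0.005       # hypothetical annual log growth rates

b = beta_hat(rho, psi, g_A, g_L)
b_no_growth = beta_hat(rho, psi, 0.0, 0.0)
print(f"beta_hat = {b:.5f}, exp(-rho) = {b_no_growth:.5f}")
```

With $\psi < 1$ the exponent $1 - 1/\psi$ is negative, so productivity growth lowers $\hat\beta_t$ while population growth raises it; without growth the expression collapses to $e^{-\rho}$ .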

11.6.0.7Sign convention for the climate multipliers.¶

We adopt the value-derivative convention throughout the script: each climate multiplier $\hat{\nu}^{\mathrm{AT}}_t,\, \hat{\nu}^{\mathrm{UO}}_t,\, \hat{\nu}^{\mathrm{LO}}_t,\, \hat{\eta}^{\mathrm{AT}}_t,\, \hat{\eta}^{\mathrm{OC}}_t$ is the (normalized) partial derivative of the value function with respect to the corresponding climate state. Because extra atmospheric carbon lowers welfare, $\hat{\nu}^{\mathrm{AT}}_t$ is non-positive at the optimum, which is why the stock SCC carries a minus sign, $\mathrm{SCC}^M_t = -\hat{\nu}^{\mathrm{AT}}_t/\hat{\lambda}_t$ . The companion implementation in dice_2p_surrogate_lib.py stores the positive marginal damage $-\hat{\nu}^{\mathrm{AT}}_t$ as a network output for numerical conditioning and flips the sign explicitly inside each residual; the algebra below uses the script convention, so the reader who compares the equations to the code will see one extra sign flip per carbon-multiplier term.
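
The sign bookkeeping can be made concrete with a short sketch. The helper names and the numerical values are hypothetical and do not reproduce the interface of dice_2p_surrogate_lib.py; the only facts used are that the stored quantity is $-\hat{\nu}^{\mathrm{AT}}_t > 0$ and that $\mathrm{SCC}^M_t = -\hat{\nu}^{\mathrm{AT}}_t/\hat{\lambda}_t$ .

```python
def to_script_convention(stored_marginal_damage):
    """Convert a stored positive marginal damage (-nu_at_hat) back to nu_at_hat."""
    return -stored_marginal_damage


def scc_stock(nu_at_hat, lam_hat):
    """Stock SCC in the script convention: nu_at_hat <= 0, lam_hat > 0."""
    return -nu_at_hat / lam_hat


stored = 0.8       # hypothetical network output, positive by construction
lam_hat = 2.0      # hypothetical budget multiplier
nu_at_hat = to_script_convention(stored)
scc = scc_stock(nu_at_hat, lam_hat)
print(nu_at_hat, scc)
```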

11.6.0.8Symbol cheat-sheet for the multipliers.¶

Before writing the FOCs and the loss, Table 11.3 collects the multipliers that the DEQN learns and their role; subsequent equations use the hat-normalized form throughout.

Table 11.3: Normalized Lagrange multipliers in the CDICE--DEQN. All values are divided by $A_t^{1-1/\psi}\,L_t$ relative to the raw multipliers, so the hatted versions inherit the per-effective-capita scale that the network outputs see. The atmospheric carbon multiplier carries the SCC up to the marginal-utility denominator: $\mathrm{SCC}^M_t = -\hat\nu^{\mathrm{AT}}_t / \hat\lambda_t$ .

Symbol	Multiplier on	Sign at optimum	Network output?
$\hat{\lambda}_t$	Budget constraint $C_t + I_t = Y^{\mathrm{net}}_t$	$> 0$	yes (softplus)
$\hat{\lambda}^\mu_t$	Abatement upper bound $\mu_t \le 1$	$\ge 0$	no (implied, Eq. (11.39))
$\hat{\nu}^{\mathrm{AT}}_t$	Atmospheric carbon transition $M^{\mathrm{AT}}_{t+1}=\ldots$	$\le 0$	yes (stored as $-\hat\nu^{\mathrm{AT}}_t > 0$ via softplus)
$\hat{\nu}^{\mathrm{UO}}_t$	Upper-ocean carbon transition	$\le 0$	yes (linear)
$\hat{\nu}^{\mathrm{LO}}_t$	Lower-ocean carbon transition	$\le 0$	yes (linear)
$\hat{\eta}^{\mathrm{AT}}_t$	Atmospheric temperature transition	$\le 0$	yes (linear)
$\hat{\eta}^{\mathrm{OC}}_t$	Ocean temperature transition	$\le 0$	yes (linear)

11.6.0.9Key FOCs in normalized form.¶

After normalization, the most important first-order conditions become (see Online Appendix D of Folini et al. (2025) for the complete set of 14 equations):

\frac{\partial \mathcal{L}}{\partial c_t} = 0 \; \Leftrightarrow\; c_t^{-1/\psi}\,A_t^{1-1/\psi}\,L_t - \hat{\lambda}_t = 0

(11.30)

\frac{\partial \mathcal{L}}{\partial k_{t+1}} = 0 \; \Leftrightarrow\; \exp\!\bigl(g^A_t + g^L_t\bigr)\,\hat{\lambda}_t - \hat{\beta}_t\Bigl\{\hat{\lambda}_{t+1}\bigl[\bigl(1-\Omega(T_{\mathrm{AT},t+1}) - \Theta(\mu_{t+1})\bigr)\alpha k_{t+1}^{\alpha-1} + (1-\delta)\bigr] + \hat{\nu}^{\mathrm{AT}}_{t+1}\,\sigma_{t+1}(1-\mu_{t+1})A_{t+1}L_{t+1}\alpha k_{t+1}^{\alpha-1}\Bigr\} = 0

(11.32)

\frac{\partial \mathcal{L}}{\partial \mu_t} = 0 \; \Leftrightarrow\; \hat{\lambda}_t\,\Theta'(\mu_t)\,k_t^\alpha + \hat{\lambda}^\mu_t + \hat{\nu}^{\mathrm{AT}}_t\,\sigma_t\,A_t\,L_t\,k_t^\alpha = 0.

(11.33)

Equation (11.32) is the capital Euler equation: it equates the marginal cost of saving one additional unit today (left) to the discounted marginal benefit tomorrow (right), which now includes a term from the atmospheric carbon envelope ( $\hat{\nu}^{\mathrm{AT}}_{t+1}$ ) because higher capital increases output and hence emissions.

11.6.0.10Envelope conditions.¶

Convention reminder. As stated in Section 11.2.1, CDICE is calibrated on an annual time step and the coefficients $b_{12}, b_{23}, c_1, c_3, c_4$ in Table 11.2 are annual rates; consequently no $\Delta_t$ multipliers appear in either the dynamics (11.14), (11.16)--(11.17) or in the FOC residuals below.

Differentiating the Lagrangian with respect to state variables and shifting forward one period yields the shadow prices of the climate stocks. For example, the atmospheric carbon envelope is:

\frac{\partial \mathcal{L}}{\partial M_{\mathrm{AT},t+1}} = 0 \;\Leftrightarrow\; \hat{\nu}^{\mathrm{AT}}_t - \hat{\beta}_t\!\left[\hat{\nu}^{\mathrm{AT}}_{t+1}(1-b_{12}) + \hat{\nu}^{\mathrm{UO}}_{t+1}\,b_{12} + \hat{\eta}^{\mathrm{AT}}_{t+1}\,c_1\,F_{\mathrm{2\times CO_2}}\,\frac{1}{\ln 2\,M_{\mathrm{AT},t+1}}\right] = 0.

(11.34)

This equation says that the current shadow price of atmospheric carbon ( $\hat{\nu}^{\mathrm{AT}}_t$ ) must equal the discounted future effects through three channels: persistence in the atmosphere (the $(1-b_{12})$ term), diffusion into the upper ocean (the $\hat{\nu}^{\mathrm{UO}}_{t+1}$ term), and radiative forcing on temperature (the $\hat{\eta}^{\mathrm{AT}}_{t+1}$ term). It is the existence of these climate multipliers that distinguishes the IAM from the purely economic models of Chapters 2--4.

11.6.0.11Fischer--Burmeister complementarity for abatement.¶

The abatement rate is bounded above by 1 (full abatement), giving the KKT condition:

1 - \mu_t \;\geq\; 0 \quad\perp\quad \hat{\lambda}^\mu_t \;\geq\; 0,

(11.35)

which is non-smooth at $\mu_t = 1$ . As in the borrowing-constraint treatment of Chapter 5 (Section 5.4), we replace it with the Fischer--Burmeister smooth approximation:

\Psi^{\mathrm{FB}}\!\bigl(\hat{\lambda}^\mu_t,\; 1-\mu_t\bigr) \;=\; \hat{\lambda}^\mu_t + (1-\mu_t) - \sqrt{(\hat{\lambda}^\mu_t)^2 + (1-\mu_t)^2 + \varepsilon_{\mathrm{FB}}} \;=\; 0,

(11.36)

with the same regularization parameter $\varepsilon_{\mathrm{FB}} \geq 0$ used in Chapters 3--5. In CDICE-DEQN we take $\varepsilon_{\mathrm{FB}} = 10^{-6}$ , equivalent to the IRBC chapter’s $\varepsilon = 10^{-3}$ under its $\varepsilon^2$ convention; the trained policy is insensitive to the choice in the range $10^{-10}$ to $10^{-4}$ . At $\varepsilon_{\mathrm{FB}} = 0$ the zero set of $\Psi^{\mathrm{FB}}$ coincides with the non-negative axes in the $(\hat{\lambda}^\mu_t,\, 1-\mu_t)$ -plane, which enforces the original complementarity exactly, but the function is non-differentiable at the origin; with $\varepsilon_{\mathrm{FB}} > 0$ the function is differentiable everywhere at the cost of a slight relaxation of exact complementarity.
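
The behaviour of the Fischer--Burmeister function on and off the complementary set can be checked directly. The sketch below implements (11.36) as written; the test points are hypothetical values chosen to cover the four qualitative cases.

```python
import math


def fischer_burmeister(a, b, eps=1e-6):
    """Smoothed FB function of Eq. (11.36): a + b - sqrt(a^2 + b^2 + eps)."""
    return a + b - math.sqrt(a * a + b * b + eps)


# Exact version (eps = 0): zero iff a >= 0, b >= 0 and a * b = 0.
interior = fischer_burmeister(0.0, 0.3, eps=0.0)    # mu < 1, multiplier zero
binding = fischer_burmeister(0.5, 0.0, eps=0.0)     # mu = 1, multiplier positive
both_pos = fischer_burmeister(0.2, 0.3, eps=0.0)    # violates a * b = 0
neg_mult = fischer_burmeister(-0.1, 0.3, eps=0.0)   # violates a >= 0

# Smoothed version: a small nonzero residual on the complementary set.
smoothed = fischer_burmeister(0.0, 0.3)

print(interior, binding, both_pos, neg_mult, smoothed)
```

The smoothed residual at an exactly complementary point is of the order of $\varepsilon_{\mathrm{FB}}$ , which quantifies the relaxation traded for differentiability.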

11.7From FOCs to a Single Loss¶

The Lagrangian of Section 11.6 produces ten equilibrium conditions: the consumption FOC (11.30), the capital Euler (11.32), the abatement FOC (11.33), the budget/resource constraint $C_t + I_t = Y^{\mathrm{net}}_t$ , the three carbon-stock envelopes (one of which is the atmospheric-carbon envelope (11.34)), the two temperature-layer envelopes, and the Fischer--Burmeister abatement complementarity (11.36). In the DEQN solver two of these ten are enforced exactly by algebraic recovery rather than as squared residuals: the consumption FOC is inverted to yield $c_t$ from $\hat{\lambda}_t$ , and the abatement FOC is solved for $\hat{\lambda}^\mu_t$ and the resulting implied multiplier is fed straight into the Fischer--Burmeister condition. What remains is an eight-residual sum-of-squares loss with eight network outputs, structurally identical to the stationary DEQN of Chapters 2--3. The only substantive difference is that the network must learn the shadow prices of all five climate state variables (three carbon stocks and two temperature layers) in addition to the economic choices, so that the planner has a gradient signal for how today’s decisions propagate through the carbon cycle and the energy balance into future damages.

11.7.0.1Policy network specification.¶

The policy function approximated by the neural network outputs an eight-dimensional vector,

\mathcal{N}_\rho(\bm{x}_t) \;\in\; \mathbb{R}^{8} \;:=\; \bigl(k_{t+1},\; \mu_t,\; \hat{\lambda}_t,\; \hat{\nu}^{\mathrm{AT}}_t,\; \hat{\nu}^{\mathrm{UO}}_t,\; \hat{\nu}^{\mathrm{LO}}_t,\; \hat{\eta}^{\mathrm{AT}}_t,\; \hat{\eta}^{\mathrm{OC}}_t\bigr),

(11.37)

comprising two choice variables ( $k_{t+1}$ , $\mu_t$ ), the consumption shadow price $\hat{\lambda}_t$ , and the five normalized climate multipliers. Note that the abatement KKT multiplier $\hat{\lambda}^\mu_t$ is not a network output: it is recovered algebraically inside the loss (see below). A key difference from the stationary DEQN of Chapters 2--3 is that the network must learn the shadow prices of all climate constraints, not just the economic choices. Without the climate multipliers, the planner would have no gradient signal about how today’s decisions propagate through the carbon cycle and temperature dynamics into future damages.

11.7.0.2Bounds and positivity.¶

The output activations of $\mathcal{N}_\rho$ are chosen so that the bound and positivity constraints of the model hold for every input, eliminating the need for additional residuals. The capital level $k_{t+1}$ , the consumption shadow $\hat{\lambda}_t$ , and the abatement rate $\mu_t$ are each passed through a softplus, which guarantees $k_{t+1} > 0$ , $\hat{\lambda}_t > 0$ (so consumption recovered via (11.38) is positive), and $\mu_t \ge 0$ exactly. The upper bound $\mu_t \le 1$ is enforced jointly by the Fischer--Burmeister condition $l_8$ at the implied multiplier (11.39) and by a small quadratic upper-bound penalty $\propto \mathbb{E}[\max(\mu_t - 1, 0)^2]$ added to the training loss. The atmospheric-carbon shadow $\hat{\nu}^{\mathrm{AT}}_t$ is stored in the implementation as a positive marginal damage (see the sign-convention note in Section 11.6) and is output through a softplus; the remaining climate multipliers $\hat{\nu}^{\mathrm{UO}}_t, \hat{\nu}^{\mathrm{LO}}_t, \hat{\eta}^{\mathrm{AT}}_t, \hat{\eta}^{\mathrm{OC}}_t$ are unconstrained and use linear output activations.
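
A framework-free sketch of the output heads is given below. It applies a softplus to the four constrained outputs and the identity to the other four, in the order of (11.37) with the fourth slot holding the stored positive marginal damage, and it evaluates the quadratic upper-bound penalty. The names, the raw pre-activation values, and the plain-Python form are assumptions made for illustration; the penalty weight is left out because it is not specified here.

```python
import math


def softplus(z):
    """Numerically stable softplus, log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


OUTPUT_NAMES = ("k_next", "mu", "lam_hat", "neg_nu_at_hat",
                "nu_uo_hat", "nu_lo_hat", "eta_at_hat", "eta_oc_hat")
SOFTPLUS_HEADS = {"k_next", "mu", "lam_hat", "neg_nu_at_hat"}


def apply_output_heads(raw):
    """Map eight raw pre-activations to constrained network outputs."""
    assert len(raw) == len(OUTPUT_NAMES)
    return {name: softplus(z) if name in SOFTPLUS_HEADS else z
            for name, z in zip(OUTPUT_NAMES, raw)}


def upper_bound_penalty(mu_batch):
    """Batch mean of max(mu - 1, 0)^2, before any loss weight is applied."""
    return sum(max(mu - 1.0, 0.0) ** 2 for mu in mu_batch) / len(mu_batch)


# Hypothetical raw pre-activations, one per output.
out = apply_output_heads([-3.0, 0.2, 1.5, -40.0, -0.7, 0.1, -2.0, 0.0])
penalty = upper_bound_penalty([0.4, 1.0, 1.2])
print(out)
print(penalty)
```

Only states with $\mu_t > 1$ contribute to the penalty, so it vanishes on any batch that already respects the bound.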

11.7.0.3How is consumption $c_t$ determined?¶

The consumption FOC (11.30) is enforced exactly by inversion rather than as a residual: given the network’s prediction of $\hat{\lambda}_t$ , consumption is recovered algebraically as

c_t \;=\; \bigl(\hat{\lambda}_t \cdot A_t^{1/\psi - 1}\,L_t^{-1}\bigr)^{-\psi},

(11.38)

so $c_t$ is not itself a network output. Positivity of $c_t$ is guaranteed because the implementation passes $\hat{\lambda}_t$ through a softplus activation, so $\hat{\lambda}_t > 0$ for every input.
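
The inversion can be verified by a round trip: recover $c_t$ from $\hat{\lambda}_t$ with (11.38) and plug it back into (11.30). The trend levels and the multiplier value below are hypothetical; only $\psi$ is taken from Table 11.2.

```python
def consumption_from_lambda(lam_hat, A, L, psi):
    """Invert the consumption FOC (11.30) for c_t, Eq. (11.38)."""
    return (lam_hat * A ** (1.0 / psi - 1.0) / L) ** (-psi)


def consumption_foc_residual(c, lam_hat, A, L, psi):
    """Left-hand side of Eq. (11.30); zero when the FOC holds."""
    return c ** (-1.0 / psi) * A ** (1.0 - 1.0 / psi) * L - lam_hat


psi = 0.69                # Table 11.2
A, L = 1.3, 1.1           # hypothetical trend levels
lam_hat = 2.5             # hypothetical softplus output, > 0

c = consumption_from_lambda(lam_hat, A, L, psi)
residual = consumption_foc_residual(c, lam_hat, A, L, psi)
print(c, residual)
```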

11.7.0.4How is the abatement multiplier $\hat{\lambda}^\mu_t$ determined?¶

The same trick handles the abatement FOC (11.33): rather than have the network output $\hat{\lambda}^\mu_t$ and impose the FOC as a separate residual, we solve the FOC for $\hat{\lambda}^\mu_t$ and treat the resulting implied multiplier as a deterministic function of the other network outputs. Setting $\partial\mathcal{L}/\partial\mu_t = 0$ in (11.33) yields

\hat{\lambda}^{\mu,\mathrm{impl}}_t \;=\; -\hat{\lambda}_t\,\Theta'(\mu_t)\,k_t^{\alpha} \;-\; \hat{\nu}^{\mathrm{AT}}_t\,\sigma_t\, A_t\, L_t\,k_t^{\alpha}.

(11.39)

Plugged into the Fischer--Burmeister condition (11.36), this is the residual $l_8$ below. Two facts come for free. First, whenever $l_8 = 0$ holds and the smoothing parameter $\varepsilon_{\mathrm{FB}}$ is small, the abatement FOC also holds automatically, because $l_8$ couples the implied multiplier to the slack $1-\mu_t$ . Second, the network output dimension drops from nine to eight, which improves training stability: the network no longer has to discover that $\hat{\lambda}^\mu_t$ is exactly the right algebraic combination of $\hat{\lambda}_t,\, \mu_t,\, \hat{\nu}^{\mathrm{AT}}_t$ .
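
The algebraic recovery of the abatement multiplier and the residual $l_8$ can be sketched as follows. The sketch assumes the power-function abatement cost $\Theta(\mu) = \theta_1\,\mu^{\theta_2}$ , which is an assumption made here for illustration since the cost function is defined outside this section; all numerical inputs are hypothetical and are not a solution of the model.

```python
import math


def theta_prime(mu, theta1, theta2):
    """Derivative of the assumed abatement cost Theta(mu) = theta1 * mu**theta2."""
    return theta1 * theta2 * mu ** (theta2 - 1.0)


def implied_multiplier(lam_hat, nu_at_hat, mu, k, sigma, A, L, alpha, theta1, theta2):
    """Eq. (11.39): solve the abatement FOC (11.33) for lambda_hat^mu."""
    y = k ** alpha
    return -lam_hat * theta_prime(mu, theta1, theta2) * y - nu_at_hat * sigma * A * L * y


def l8(lam_mu_impl, mu, eps_fb=1e-6):
    """Fischer-Burmeister residual evaluated at the implied multiplier."""
    slack = 1.0 - mu
    return lam_mu_impl + slack - math.sqrt(lam_mu_impl ** 2 + slack ** 2 + eps_fb)


# Hypothetical inputs, chosen only to exercise the formulas.
args = dict(lam_hat=2.0, nu_at_hat=-0.5, mu=0.3, k=1.5, sigma=0.4,
            A=1.2, L=1.0, alpha=0.30, theta1=0.07, theta2=2.6)
lam_mu = implied_multiplier(**args)

# Plugging the implied multiplier back into (11.33) gives zero by construction.
y = args["k"] ** args["alpha"]
foc = (args["lam_hat"] * theta_prime(args["mu"], args["theta1"], args["theta2"]) * y
       + lam_mu + args["nu_at_hat"] * args["sigma"] * args["A"] * args["L"] * y)
res8 = l8(lam_mu, args["mu"])
print(lam_mu, foc, res8)
```

At this arbitrary point the implied multiplier is positive while $\mu_t < 1$ , so $l_8$ is far from zero: the residual signals that abatement should rise, which is the gradient information the training loop exploits.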

The network architecture uses two hidden layers with 1024 units each, SELU activation, and the Adam optimizer with learning rate $10^{-5}$ . Training alternates between broad sampling (Phase 1) and endogenous simulation (Phase 2), as described in Chapter 3.

11.7.0.5The 8 loss components.¶

Each remaining equilibrium condition from Section 11.6 becomes a residual $l_m = 0$ , and the network is asked to drive every $l_m$ to zero simultaneously along simulated paths. The mapping is one-for-one: $l_1$ is the capital-Euler FOC (11.32); $l_2$ is the budget constraint that closes (11.4); $l_3$ , $l_4$ , $l_5$ are the three carbon-reservoir envelope conditions, of which $l_3$ is (11.34); $l_6$ and $l_7$ are the two temperature-layer envelopes; and $l_8$ is the Fischer--Burmeister smoothing (11.36) of the KKT slack on $\mu_t \le 1$ , evaluated at the implied multiplier (11.39). The consumption FOC (11.30) and the abatement FOC (11.33) are enforced exactly via the inversions in (11.38) and (11.39), which is why the loss list contains eight entries instead of ten. Written out, the eight components are:

l_1 := \exp\!\bigl(g^A_t + g^L_t\bigr)\,\hat{\lambda}_t - \hat{\beta}_t\Bigl\{\hat{\lambda}_{t+1}\bigl[\bigl(1-\Omega(T_{\mathrm{AT},t+1}) - \Theta(\mu_{t+1})\bigr)\alpha k_{t+1}^{\alpha-1} + (1-\delta)\bigr] + \hat{\nu}^{\mathrm{AT}}_{t+1}\,\sigma_{t+1}(1-\mu_{t+1})A_{t+1}L_{t+1}\alpha k_{t+1}^{\alpha-1}\Bigr\} \tag*{\text{(capital Euler)}}

(11.41)

l_2 := \bigl(1-\Omega(T_{\mathrm{AT},t}) - \Theta(\mu_t)\bigr)\,k_t^\alpha + (1-\delta)\,k_t - c_t - \exp\!\bigl(g^A_t + g^L_t\bigr)\,k_{t+1} \tag*{\text{(budget)}}

(11.42)

l_3 := \hat{\nu}^{\mathrm{AT}}_t - \hat{\beta}_t\!\left[\hat{\nu}^{\mathrm{AT}}_{t+1}(1-b_{12}) + \hat{\nu}^{\mathrm{UO}}_{t+1}\,b_{12} + \hat{\eta}^{\mathrm{AT}}_{t+1}\,c_1\,F_{\mathrm{2\times CO_2}}\,\tfrac{1}{\ln 2\,M_{\mathrm{AT},t+1}}\right] \tag*{\text{(atm.\ carbon)}}

(11.43)

l_4 := \hat{\nu}^{\mathrm{UO}}_t - \hat{\beta}_t\!\Bigl[\hat{\nu}^{\mathrm{AT}}_{t+1}\,b_{12}\,\tfrac{M^{\mathrm{AT}}_{\mathrm{EQ}}}{M^{\mathrm{UO}}_{\mathrm{EQ}}} + \hat{\nu}^{\mathrm{UO}}_{t+1}\!\Bigl(1-b_{12}\tfrac{M^{\mathrm{AT}}_{\mathrm{EQ}}}{M^{\mathrm{UO}}_{\mathrm{EQ}}}-b_{23}\Bigr) + \hat{\nu}^{\mathrm{LO}}_{t+1}\,b_{23}\Bigr] \tag*{\text{(upper ocean C)}}

(11.44)

l_5 := \hat{\nu}^{\mathrm{LO}}_t - \hat{\beta}_t\!\Bigl[\hat{\nu}^{\mathrm{UO}}_{t+1}\,b_{23}\,\tfrac{M^{\mathrm{UO}}_{\mathrm{EQ}}}{M^{\mathrm{LO}}_{\mathrm{EQ}}} + \hat{\nu}^{\mathrm{LO}}_{t+1}\!\Bigl(1-b_{23}\,\tfrac{M^{\mathrm{UO}}_{\mathrm{EQ}}}{M^{\mathrm{LO}}_{\mathrm{EQ}}}\Bigr)\Bigr] \tag*{\text{(lower ocean C)}}

(11.45)

l_6 := \hat{\eta}^{\mathrm{AT}}_t - \hat{\beta}_t\!\Bigl[-\hat{\lambda}_{t+1}\,\Omega'(T_{\mathrm{AT},t+1})\,k_{t+1}^\alpha + \hat{\eta}^{\mathrm{AT}}_{t+1}\!\Bigl(1-c_1\,\tfrac{F_{\mathrm{2\times CO_2}}}{\Delta T_{\mathrm{AT},\times 2}}-c_1 c_3\Bigr) + \hat{\eta}^{\mathrm{OC}}_{t+1}\,c_4\Bigr] \tag*{\text{(atm.\ temp.)}}

(11.46)

l_7 := \hat{\eta}^{\mathrm{OC}}_t - \hat{\beta}_t\!\left[\hat{\eta}^{\mathrm{AT}}_{t+1}\,c_1 c_3 + \hat{\eta}^{\mathrm{OC}}_{t+1}(1-c_4)\right] \tag*{\text{(ocean temp.)}}

(11.47)

l_8 := \hat{\lambda}^{\mu,\mathrm{impl}}_t + (1-\mu_t) - \sqrt{(\hat{\lambda}^{\mu,\mathrm{impl}}_t)^2 + (1-\mu_t)^2 + \varepsilon_{\mathrm{FB}}} \tag*{\text{(Fischer--Burmeister, implied multiplier)}}

(11.48)

Loss components $l_1$ -- $l_2$ enforce intertemporal optimality and feasibility, $l_3$ -- $l_7$ are the envelope conditions that price the five climate state variables, and $l_8$ jointly enforces the abatement FOC (via the implied multiplier) and the upper-bound complementarity $\mu_t \le 1$ .
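To make the role of $l_8$ concrete, the following minimal Python sketch evaluates the smoothed Fischer--Burmeister residual (11.48) in the three cases that matter. The function names and the numerical values of the implied multiplier, $\mu_t$, and $\varepsilon_{\mathrm{FB}}$ are illustrative assumptions, not the code of the companion notebooks.

```python
import math


def fischer_burmeister(a: float, b: float, eps: float = 1e-8) -> float:
    """Smoothed Fischer-Burmeister function a + b - sqrt(a^2 + b^2 + eps)."""
    return a + b - math.sqrt(a * a + b * b + eps)


def l8_residual(lam_mu_impl: float, mu: float, eps_fb: float = 1e-8) -> float:
    """Residual (11.48): implied multiplier paired with the slack 1 - mu."""
    return fischer_burmeister(lam_mu_impl, 1.0 - mu, eps_fb)


# Interior abatement: zero multiplier, positive slack.
interior = l8_residual(lam_mu_impl=0.0, mu=0.6)
# Binding cap: positive multiplier, zero slack.
binding = l8_residual(lam_mu_impl=0.3, mu=1.0)
# Complementarity violated: multiplier and slack both strictly positive.
violation = l8_residual(lam_mu_impl=0.3, mu=0.6)

print(interior, binding, violation)
```

The residual is of order $\varepsilon_{\mathrm{FB}}$ in the two complementary cases and clearly nonzero when multiplier and slack are both positive, which is the signal the loss uses to push the network toward a KKT point.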

11.7.0.6Total loss.¶

The DEQN loss aggregates all residuals along a simulated path:

\ell_\rho \;:=\; \frac{1}{N_{\text{path}}} \sum_{\bm{x}_t\,\text{on sim.\ path}} \;\sum_{m=1}^{8}\; \bigl(l_m(\bm{x}_t,\, \mathcal{N}_\rho(\bm{x}_t))\bigr)^2.

(11.49)

This is the same sum-of-squared-residuals structure as the $N$-country IRBC model of Chapter 3, but with 8 equations per time step instead of the IRBC’s $2N+1$ ($N$ Euler equations, $N$ Fischer--Burmeister conditions, and one aggregate resource constraint).

11.7.0.7State evolution.¶

To evaluate the loss along a simulated path, the full state vector (11.25) must be propagated forward. In CDICE the next-period state is:

\bm{x}_{t+1} = \bigl(k_{t+1},\; M^{\mathrm{AT}}_{t+1},\; M^{\mathrm{UO}}_{t+1},\; M^{\mathrm{LO}}_{t+1},\; T^{\mathrm{AT}}_{t+1},\; T^{\mathrm{OC}}_{t+1},\; \mu_{f,t+1},\; S_{f,t+1},\; \tau_{t+1};\; \theta\bigr)^\top,

(11.50)

where:

$k_{t+1}$ comes from the network output (11.37) (choice variable);
$M^{\mathrm{AT}}_{t+1}$, $M^{\mathrm{UO}}_{t+1}$, $M^{\mathrm{LO}}_{t+1}$, $T^{\mathrm{AT}}_{t+1}$, $T^{\mathrm{OC}}_{t+1}$ are computed from the transition equations of Section 11.2 (carbon cycle and temperature dynamics);
the Bayesian belief states $\mu_{f,t+1}$ and $S_{f,t+1}$ are updated via the conjugate posterior (11.58)--(11.59) once the period-$t$ temperature observation is realized;
the bounded time index advances as $\tau_{t+1} = 1 - \exp\!\bigl(-\vartheta\,(t+\Delta_t)\bigr)$, the image of the calendar increment $t \mapsto t+\Delta_t$ under the time-rescaling (11.24);
the pseudo-state parameters $\theta$ are held fixed within an episode and re-sampled across episodes (Section 11.11).

All deterministic transitions are differentiable; stochastic shock draws are handled via reparameterization / common random numbers, so the simulate-then-backpropagate loop can be executed end-to-end with automatic differentiation.
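The time-index transition can be advanced without tracking calendar time explicitly, because $1 - e^{-\vartheta(t+\Delta_t)} = 1 - (1-\tau_t)\,e^{-\vartheta\Delta_t}$. The sketch below checks this identity numerically; the values of $\vartheta$ and $\Delta_t$ and the function names are assumptions chosen for illustration only.

```python
import math


def tau_from_time(t: float, vartheta: float) -> float:
    """Time rescaling tau = 1 - exp(-vartheta * t), mapping [0, inf) into [0, 1)."""
    return -math.expm1(-vartheta * t)


def tau_step(tau: float, vartheta: float, dt: float) -> float:
    """Advance tau by one calendar increment dt using only the current tau."""
    return 1.0 - (1.0 - tau) * math.exp(-vartheta * dt)


VARTHETA = 0.02  # assumed rescaling rate (illustrative)
DT = 1.0         # assumed calendar step (illustrative)

tau = tau_from_time(0.0, VARTHETA)
tau_path = [tau]
for _ in range(300):
    tau = tau_step(tau, VARTHETA, DT)
    tau_path.append(tau)

# The recursive path should coincide with the direct rescaling of t = 300.
print(tau_path[-1], tau_from_time(300.0, VARTHETA))
```

Writing the transition in terms of $\tau_t$ alone keeps every network input bounded, which is the point of the rescaling.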

Remark 11.2

Detrend variables that grow with $A_t L_t$ (Eq. (11.23)).
Map unbounded time to $[0,1)$ via $\tau = 1 - e^{-\vartheta t}$ (Eq. (11.24)).
Normalize all Lagrange multipliers by $A_t^{1-1/\psi}\,L_t$ (Eq. (11.28)).
Derive FOCs from the Lagrangian, both economic and climate constraints (Eqs. (11.30)--(11.34)).
Enforce static FOCs by inversion: invert the consumption FOC for $c_t$ (Eq. (11.38)) and solve the abatement FOC for the implied multiplier $\hat{\lambda}^{\mu,\mathrm{impl}}_t$ (Eq. (11.39)).
Smooth the upper-bound KKT complementarity via Fischer--Burmeister, evaluated at the implied multiplier (Eq. (11.36)).
Form the loss as the sum of squared residuals over the 8 remaining conditions (Eq. (11.49)).
Train as in the stationary DEQN: simulate $\to$ record loss $\to$ backprop $\to$ repeat.

This is the deterministic CDICE-DEQN solver in its entirety. Companion notebook 02_DICE_DEQN_Library_Port.ipynb trains it against the CDICE library reference solution of Folini et al. (2025); the verification gate inside that notebook is the natural stopping point for a reader who wants only the deterministic core.

11.8From CDICE to Stochastic IAMs¶

The deterministic CDICE-DEQN of Section 11.7, together with the AR(1) productivity extension developed in the remarkbox below, is the right pedagogical anchor because it contains every mechanical component of an integrated assessment model: capital accumulation, emissions, carbon diffusion, temperature dynamics, damages, abatement costs, and the SCC as a shadow price. It is not yet the object one wants for quantitative climate-policy research. Three features are still missing.

First, climate policy is an intrinsically stochastic problem. Productivity, carbon intensity, damages, climate feedbacks, and tipping thresholds are not known constants. Once they are stochastic, a carbon tax is not a path but a state-contingent policy. Second, long-run climate risk makes time-additive CRRA preferences too restrictive: the intertemporal elasticity of substitution and risk aversion should be separate parameters. Third, climate policy is distributional. The representative-agent SCC answers a marginal pricing question, but an implementable policy also asks which cohorts pay the tax and which cohorts receive the transfers. This is the point at which the chapter moves from representative-agent DICE to stochastic overlapping-generations IAMs.

The transition is smooth if one keeps the computational object fixed. In every case the neural network approximates a policy map

\begin{aligned} u_t &= \mathcal N_\rho(\tilde{\bm x}_t), \\ \tilde{\bm x}_t &= (\text{economic states},\ \text{climate states},\ \text{beliefs},\ \text{parameters},\ \text{policy-rule coefficients}), \end{aligned}

(11.51)

and the loss is still a sum of normalized equilibrium residuals. The only changes are the variables appended to $\tilde{\bm x}_t$ and the conditional expectations appearing in the residuals. Table 11.4 summarizes the sequence.

Table 11.4: The layers of the climate-economy pipeline used in the remainder of the chapter. Each layer is a small extension of the previous one; no new numerical paradigm is introduced after the deterministic CDICE-DEQN.

Layer	Economic question	Computational change
Deterministic CDICE (Section 11.7)	What is the globally optimal abatement path and SCC at the baseline calibration?	Time-stamped DEQN; eight residuals; horizon $T_{\max}$ chosen so discounting absorbs transversality.
Stochastic DICE (AR(1) + GH quadrature, see the productivity-shock remarkbox below)	How do shocks alter the SCC distribution?	Add shock states; replace future terms by Gauss--Hermite expectations.
Bayesian learning on ECS (Section 11.9)	How does learning about climate sensitivity alter the SCC distribution?	Add belief mean and belief variance as states; one signal equation; conjugate Gaussian update.
Epstein--Zin DICE (Section 11.10)	How do risk aversion and IES separately price climate tails?	Add the value level as a network output; add one recursion residual and an EZ continuation-value weight.
Deep UQ (Section 11.11)	Which uncertain parameters drive SCC variation?	Treat parameters as pseudo-states; fit a GP surrogate for the QoI; compute Sobol, Shapley, and univariate effects.
Stochastic OLG-IAM (Section 11.12)	Can carbon taxes be welfare improving and Pareto improving across cohorts?	Treat tax coefficients and transfer shares as pseudo-states; fit GP surrogates for cohort welfare; solve constrained policy design on the surrogate.

Remark 11.3

Real climate--economy interactions are shot through with stochastic shocks. The minimal stochastic extension that already lets us reproduce the qualitative SCC fan-chart structure of Cai & Lontzek (2019) on a laptop adds an AR(1) shock to log TFP:

z_{t+1} = \rho_z\, z_t + \sigma_z\, \varepsilon_{t+1}, \qquad \varepsilon_{t+1} \overset{\mathrm{i.i.d.}}{\sim} \mathcal{N}(0,1),

(11.52)

with effective TFP $A_t \exp(z_t)$ . The state vector (11.25) acquires a new entry,

\tilde{\bm{x}}_t = \bigl(k_t,\; M^{\mathrm{AT}}_t,\; M^{\mathrm{UO}}_t,\; M^{\mathrm{LO}}_t,\; T^{\mathrm{AT}}_t,\; T^{\mathrm{OC}}_t,\; \tau_t,\; z_t\bigr)^\top,

(11.53)

and each forward-looking residual (11.41)--(11.47) acquires a conditional expectation over $\varepsilon_{t+1}$ ; the capital Euler, for example, becomes

\begin{aligned} e^{g^A_t + g^L_t}\,\hat{\lambda}_t \;=\; \hat{\beta}_t\,\mathbb{E}_t\Bigl[\; &\hat{\lambda}_{t+1}\bigl(\bigl(1-\Omega(T^{\mathrm{AT}}_{t+1}) - \Theta(\mu_{t+1})\bigr)\alpha\,k_{t+1}^{\alpha-1} + (1-\delta)\bigr) \\ &\;+\; \hat{\nu}^{\mathrm{AT}}_{t+1}\,\sigma_{t+1}(1-\mu_{t+1})\,A_{t+1}L_{t+1}\,\alpha\,k_{t+1}^{\alpha-1}\,\Bigr]. \end{aligned}

(11.54)

With $\varepsilon_{t+1}$ Gaussian, the conditional expectation is evaluated with a small number of Gauss--Hermite nodes $\{(\xi_q, w_q)\}_{q=1}^Q$ ,

\mathbb{E}_t[f(\varepsilon_{t+1})] \;\approx\; \frac{1}{\sqrt{\pi}}\sum_{q=1}^{Q} w_q\, f\bigl(\sqrt{2}\,\xi_q\bigr),

(11.55)

and each residual is replaced by its stochastic counterpart

l_m^{\mathrm{stoch}}(\tilde{\bm{x}}_t,\, \mathcal{N}_\rho) \;=\; \frac{1}{\sqrt{\pi}}\sum_{q=1}^{Q} w_q\, l_m\bigl(\tilde{\bm{x}}_t,\, \mathcal{N}_\rho;\, \varepsilon_{t+1} = \sqrt{2}\,\xi_q\bigr),

(11.56)

which the total loss (11.49) then aggregates as before. In practice $Q = 5$ nodes drive the quadrature error well below the training-noise floor, and because the GH sum is a fixed weighted average of differentiable terms, the autodiff backward pass is unchanged. When several independent shock dimensions appear simultaneously (productivity shock $\varepsilon_{t+1}$, learning innovation $\tilde\epsilon_{T,t+1}$, and the EZ certainty-equivalent integrand of Section 11.10), each conditional expectation is taken under a tensor product of one-dimensional GH rules with $Q$ nodes per dimension, i.e. $Q^d$ nodes in total for $d$ shock dimensions (for example, $5^2 = 25$ nodes when the productivity and learning shocks are both active). Forward-simulating $N_{\mathrm{MC}}$ trajectories of the AR(1) shock produces a Monte-Carlo SCC fan chart whose right-tail mass is the channel of Cai & Lontzek’s headline result. Companion notebook 03_Stochastic_DICE_DEQN.ipynb trains this stochastic extension end-to-end and is the natural anchor for Exercise 11.7.
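As a sanity check on (11.55), the sketch below builds the five-point Gauss--Hermite rule in closed form and uses it to evaluate $\mathbb{E}_t[\exp(z_{t+1})]$ under the AR(1) law (11.52), for which the exact answer $\exp(\rho_z z_t + \sigma_z^2/2)$ is known. The parameter values and function names are illustrative assumptions; in practice one would typically take the nodes from a library routine such as `numpy.polynomial.hermite.hermgauss`, which uses the same $e^{-x^2}$ weight convention.

```python
import math


def gauss_hermite_5():
    """Five-point Gauss-Hermite rule for the weight exp(-x^2), in closed form.

    Nodes are the roots of H_5(x) = 8x(4x^4 - 20x^2 + 15); weights follow
    w_i = 2^(n-1) n! sqrt(pi) / (n^2 H_4(x_i)^2) with n = 5.
    """
    n = 5
    x_inner = math.sqrt((5.0 - math.sqrt(10.0)) / 2.0)
    x_outer = math.sqrt((5.0 + math.sqrt(10.0)) / 2.0)
    nodes = [-x_outer, -x_inner, 0.0, x_inner, x_outer]
    scale = 2 ** (n - 1) * math.factorial(n) * math.sqrt(math.pi) / n ** 2
    # H_4(x) = 16x^4 - 48x^2 + 12 (physicists' Hermite polynomial).
    weights = [scale / (16 * x ** 4 - 48 * x ** 2 + 12) ** 2 for x in nodes]
    return nodes, weights


def expect_std_normal(f, nodes, weights):
    """Approximate E[f(eps)] for eps ~ N(0, 1) as in (11.55)."""
    total = sum(w * f(math.sqrt(2.0) * x) for x, w in zip(nodes, weights))
    return total / math.sqrt(math.pi)


RHO_Z, SIGMA_Z = 0.9, 0.05  # illustrative AR(1) parameters, not a calibration
z_t = 0.1                   # illustrative current log-TFP shock

nodes, weights = gauss_hermite_5()


def tfp_factor_next(eps):
    """exp(z_{t+1}) as a function of the innovation eps, using (11.52)."""
    return math.exp(RHO_Z * z_t + SIGMA_Z * eps)


approx = expect_std_normal(tfp_factor_next, nodes, weights)
exact = math.exp(RHO_Z * z_t + 0.5 * SIGMA_Z ** 2)
print(approx, exact)
```

In a DEQN the same weighted sum is applied to each forward-looking residual instead of to `tfp_factor_next`, with the network evaluated once per node.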

11.9Bayesian Learning About Climate Sensitivity¶

11.9.0.1Why ECS is the natural learning state.¶

The equilibrium climate sensitivity (ECS), defined as the long-run atmospheric warming from a doubling of CO$_2$, is the single most consequential and most uncertain parameter in the climate side of an IAM. Observational, paleoclimate, and model-based estimates place ECS in a likely (66%) range of roughly 2.5--4°C and a very-likely (90%) range of 2--5°C (Sherwood et al., 2020; Knutti et al., 2017; Roe & Baker, 2007), and ECS uncertainty is the largest single contributor to SCC dispersion across model variants. Crucially, ECS is partially identified from temperature realizations conditional on emissions and forcing: a Bayesian planner who observes temperature paths can therefore update her posterior period by period, and the policy that maximizes ex-ante welfare conditions on the current posterior rather than on a fixed point estimate.

11.9.0.2How learning enters the state.¶

Promote the climate-feedback parameter $\lambda$ in (11.16) to a stochastic object by adding the feedback term $\varphi_{1C}\,\tilde f_{t+1}\,T^{\mathrm{AT}}_t$ to the right-hand side, where $\varphi_{1C}$ is a calibrated coupling coefficient (taken from Friedl et al. (2023)) and $\tilde f_{t+1} \sim \mathcal N(\mu_{f,t}, S_{f,t})$ is a per-period draw under the planner’s posterior over the unknown climate-feedback deviation. The unknown itself is time-invariant; the subscript $t{+}1$ indexes the period in which the subjective draw enters the temperature equation, and the planner’s posterior moments $(\mu_{f,t}, S_{f,t})$ shift over time as new temperature observations arrive. The planner observes the temperature-residual signal

y_{t+1} \;:=\; \varphi_{1C}\,T^{\mathrm{AT}}_t\,\tilde f_{t+1} \;+\; \tilde\epsilon_{T,t+1},\qquad \tilde\epsilon_{T,t+1} \sim \mathcal N(0, S_{\epsilon_T}),

(11.57)

and conjugate Gaussian--Gaussian updating delivers the posterior

\mu_{f,t+1} = \frac{S_{\epsilon_T}\,\mu_{f,t} + \varphi_{1C}\,T^{\mathrm{AT}}_t\,S_{f,t}\,y_{t+1}}{S_{\epsilon_T} + (\varphi_{1C}\,T^{\mathrm{AT}}_t)^2\,S_{f,t}}

(11.58)

S_{f,t+1} = \frac{S_{\epsilon_T} \cdot S_{f,t}}{S_{\epsilon_T} + (\varphi_{1C}\,T^{\mathrm{AT}}_t)^2\,S_{f,t}}

(11.59)

which the planner takes as two additional laws of motion for the belief states $(\mu_{f,t}, S_{f,t})$ . These two states occupy the slots already reserved in the augmented state vector (11.25). Equations (11.58)--(11.59) are the Kalman update for a scalar linear-Gaussian state-space model with observation gain $\varphi_{1C}\,T^{\mathrm{AT}}_t$ and noise variance $S_{\epsilon_T}$ ; cf. Bishop (2006) [§ 13.3] for the generic derivation. The DEQN algorithm of Section 11.5 is unchanged: the network simply receives two more inputs and learns a richer policy.
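The update (11.58)--(11.59) is short enough to write out directly. The sketch below applies it repeatedly to simulated signals and is meant only as an illustration: the numerical values are placeholders rather than the calibration of Friedl et al. (2023), the unknown feedback deviation is held at a fixed assumed value `F_TRUE`, and atmospheric temperature is frozen at a constant so that the observation gain does not move.

```python
import random


def update_belief(mu_f, s_f, y, t_at, phi_1c, s_eps):
    """Conjugate Gaussian update (11.58)-(11.59) for the feedback belief."""
    h = phi_1c * t_at                      # observation gain
    denom = s_eps + h * h * s_f
    mu_new = (s_eps * mu_f + h * s_f * y) / denom
    s_new = s_eps * s_f / denom
    return mu_new, s_new


# Illustrative placeholders (assumptions, not calibrated values).
PHI_1C, S_EPS, T_AT = 0.1, 0.01, 1.2
F_TRUE = 0.5

rng = random.Random(0)
mu, s = 0.0, 1.0                           # prior mean and variance
variances = [s]
for _ in range(50):
    # Signal (11.57) with the unknown fixed at F_TRUE plus Gaussian noise.
    y = PHI_1C * T_AT * F_TRUE + rng.gauss(0.0, S_EPS ** 0.5)
    mu, s = update_belief(mu, s, y, T_AT, PHI_1C, S_EPS)
    variances.append(s)

print(mu, s)
```

With a constant gain $h = \varphi_{1C} T^{\mathrm{AT}}$, the posterior precision grows linearly, $1/S_{f,n} = 1/S_{f,0} + n\,h^2/S_{\epsilon_T}$, which makes explicit why learning speeds up as temperature (and hence the gain) rises.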

11.9.0.3Where this sits in the literature.¶

Bayesian learning about climate parameters in an integrated assessment frame has a long pedigree. Kelly & Kolstad (1999) and Kelly & Tan (2015) establish the basic Kelly--Kolstad result that learning takes decades to centuries in calibrated DICE-like settings, and that the tradeoff between mitigation (which lowers temperature variance) and information (which requires informative temperature paths) is sharp. Leach (2007) and Webster et al. (2008) sharpen the slow-learning result and quantify the policy errors induced by treating uncertainty as resolved too quickly. On the dynamic-programming side, Cai & Lontzek (2019) solve a stochastic-DICE variant with tipping-point hazards and recursive preferences. The robust-control program of Anderson et al. (2014), Barnett (2023), Barnett et al. (2023), and Barnett et al. (2020) addresses a complementary question (planner ambiguity over the data-generating process), and modern deep-learning solutions are the natural computational companion because tensor-product grids over belief states are infeasible at realistic state-vector sizes.

11.9.0.4Headline result from the UQ literature.¶

Friedl et al. (2023) solve the joint stochastic-DICE--Bayesian-learning DEQN with the methodology of this chapter and find two qualitative features that survive across the calibration cloud.^[1] First, ECS uncertainty is largely resolved within roughly ten years of optimal policy: the posterior variance $S_{f,t}$ shrinks by an order of magnitude over the first decade of the planner’s horizon, even though the absolute posterior mean takes longer to settle. Second, the SCC under learning is roughly half the no-learning SCC for moderate true ECS values, and roughly the same as the no-learning SCC at the upper tail of the ECS distribution; learning is a strong substitute for precautionary mitigation in the moderate-ECS regime, and a weak substitute in the tail-ECS regime. The asymmetry is policy-relevant: the value of waiting to learn falls sharply once the planner suspects she is in the tail. The broader teaching point is that uncertainty is not automatically a reason to abate more: its policy effect depends on whether the uncertainty is static, learnable, or associated with irreversible tail risk. Figure 11.5 illustrates the two qualitative features.

Figure 11.5: Schematic of the two qualitative features reported by Friedl et al. (2023). Left: posterior variance $S_{f,t}$ relative to its prior value, on a logarithmic scale. The variance falls by roughly an order of magnitude over the first decade, mirroring the Kelly--Kolstad slow-learning result but accelerated by the higher signal-to-noise ratio of the modern climate calibration. Right: $\mathrm{SCC}_0$ as a function of the true ECS, with and without Bayesian learning. Learning approximately halves the SCC at moderate ECS values where uncertainty is the dominant driver of precautionary abatement, but converges to the no-learning curve at the upper tail where the underlying physical damage dominates. Curves are illustrative; the magnitudes are those quoted in the body text.

11.10Epstein--Zin Preferences¶

11.10.0.1Why recursive preferences for climate.¶

The time-additive CRRA-IES aggregator (11.5) ties risk aversion and intertemporal substitution together. Climate policy is exactly the environment in which this restriction is least attractive. A planner may want a high IES $\psi$ to govern intertemporal substitution across long horizons, and a separate high coefficient of relative risk aversion $\gamma_u$ to price low-probability climate disasters. Recursive Kreps--Porteus preferences, following Epstein & Zin (1989) and Weil (1989), implement this separation.^[2]

Working with the normalized per-capita value $v_t$ and per-capita consumption $c_t = C_t/L_t$ , and writing $\beta^{\mathrm{EZ}}_t := \exp(-\rho\,\Delta_t)$ for the one-period Epstein--Zin discount factor, the recursion is

v_t = \left[ (1-\beta^{\mathrm{EZ}}_t)\, c_t^{1-1/\psi} + \beta^{\mathrm{EZ}}_t \left(\mathbb E_t\!\left[v_{t+1}^{1-\gamma_u}\right]\right)^{\frac{1-1/\psi}{1-\gamma_u}} \right]^{\frac{1}{1-1/\psi}},

(11.60)

with the usual logarithmic limits when $\psi = 1$ or $\gamma_u = 1$ , subject to the same budget constraint, capital-accumulation law, and climate dynamics as before.

11.10.0.2What changes in the DEQN loss.¶

The value level $v_t$ becomes an additional network output, paired with a ninth residual that enforces the recursion (11.60):

\mathcal R^{\mathrm{EZ}}_t = v_t - \left[ (1-\beta^{\mathrm{EZ}}_t)\, c_t^{1-1/\psi} + \beta^{\mathrm{EZ}}_t \left(\mathbb E_t\!\left[v_{t+1}^{1-\gamma_u}\right]\right)^{\frac{1-1/\psi}{1-\gamma_u}} \right]^{\frac{1}{1-1/\psi}}.

(11.61)

In the deterministic CRRA-IES core of Section 11.7, $v_t$ never appears explicitly, which is why eight residuals suffice there. The Euler and costate residuals of Paragraph keep their deterministic form but receive a Bansal--Yaron certainty-equivalent weighting inside each conditional expectation. It is convenient to write the one-step recursive-pricing kernel as

\mathcal M^{\mathrm{EZ}}_{t,t+1} = \hat\beta_t \left( \frac{v_{t+1}}{\left(\mathbb E_t[v_{t+1}^{1-\gamma_u}]\right)^{1/(1-\gamma_u)}} \right)^{1/\psi - \gamma_u} \left(\frac{c_{t+1}}{c_t}\right)^{-1/\psi},

(11.62)

where $\hat\beta_t$ inherits the deterministic growth normalization of (11.29). In the code, (11.62) is just a multiplicative weight on next-period marginal-value terms; the certainty-equivalent operator inside the Kreps--Porteus aggregator becomes a second nested expectation. The DEQN loss inherits one extra Gauss--Hermite quadrature step and one extra network output, but no new algorithmic ingredient.
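To illustrate how (11.62) acts as a multiplicative weight, the sketch below computes the kernel on three next-period states with given probabilities (which one can think of as normalized quadrature weights). All numbers and function names are illustrative assumptions. Setting $\gamma_u = 1/\psi$ switches off the value-ratio term and recovers the time-additive kernel, which gives a convenient check.

```python
import math


def certainty_equivalent(v_next, probs, gamma_u):
    """(E[v^(1-gamma_u)])^(1/(1-gamma_u)) for a discrete law, gamma_u != 1."""
    moment = sum(p * v ** (1.0 - gamma_u) for p, v in zip(probs, v_next))
    return moment ** (1.0 / (1.0 - gamma_u))


def ez_kernel(beta_hat, c_t, c_next, v_next, probs, psi, gamma_u):
    """State-by-state pricing kernel (11.62)."""
    ce = certainty_equivalent(v_next, probs, gamma_u)
    return [
        beta_hat * (v / ce) ** (1.0 / psi - gamma_u) * (c / c_t) ** (-1.0 / psi)
        for v, c in zip(v_next, c_next)
    ]


# Three illustrative next-period states: bad, median, good.
probs = [0.25, 0.5, 0.25]
v_next = [0.8, 1.0, 1.2]
c_next = [0.9, 1.0, 1.1]
BETA_HAT, C_T, PSI = 0.98, 1.0, 1.5

kernel_ez = ez_kernel(BETA_HAT, C_T, c_next, v_next, probs, PSI, gamma_u=10.0)
kernel_crra = ez_kernel(BETA_HAT, C_T, c_next, v_next, probs, PSI, gamma_u=1.0 / PSI)

print(kernel_ez)
print(kernel_crra)
```

With $\gamma_u > 1/\psi$ the exponent on the value ratio is negative, so the low-value state receives more weight and the high-value state less than under the time-additive kernel. This is the disaster-insurance channel in its simplest form.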

11.10.0.3Interpretation for the SCC.¶

Crost & Traeger (2013) and Crost & Traeger (2014) establish the analytic baseline: decoupling risk aversion from the IES leaves the optimal carbon tax of a deterministic IAM unchanged and matters only once stochastic risk is present, but the change can then be quantitatively large. Jensen & Traeger (2014) and Traeger (2021, 2023) extend the result to closed-form ACE-class settings and show that for reasonable risk aversion above $1/\psi$, the SCC roughly doubles relative to CRRA; Cai & Lontzek (2019) reach the same conclusion in a fully stochastic DICE variant. Intuitively, recursive preferences change the SCC because carbon emissions affect the distribution of long-run consumption, not only its mean: if damages create low-consumption tail states, a high $\gamma_u$ raises the SCC through the disaster-insurance channel. The sign of the IES effect depends on which shock dominates: in TFP-driven economies higher $\psi$ dampens the SCC because consumption smoothing absorbs the productivity risk, whereas in temperature-driven economies higher $\psi$ amplifies the SCC because the planner cares more about late-horizon consumption losses. Bansal et al. (2016) make the asset-pricing case for the same channel: long-run temperature shifts price into expected returns through the EZ aggregator, and ignoring them understates the welfare cost of carbon emissions. This is why stochastic DICE with Epstein--Zin preferences is a better teaching object than deterministic DICE for climate-finance questions: it connects welfare, tail risk, and asset-pricing logic in a single equilibrium loss.

11.11Deep Uncertainty Quantification via Surrogates¶

Deep UQ answers a different question from solving one stochastic IAM. The object is now a scalar quantity of interest,

q(\theta) = \mathrm{SCC}_{2100}(\theta), \qquad \theta\in\Theta\subset\mathbb R^{d_\theta},

(11.63)

where $\theta$ collects uncertain structural parameters: the ECS or its prior mean $\mu_{f,0}$ , the prior variance $S_{f,0}$ , the pure rate of time preference $\rho$ , the IES $\psi$ , risk aversion $\gamma_u$ , the damage curvature $\pi_2$ , and any tipping parameters included in the experiment. Direct global sensitivity analysis would require solving the IAM thousands of times. Deep UQ replaces this infeasible outer loop by two amortizations.

11.11.0.1Amortization 1: parameters as pseudo-states.¶

The pseudo-state trick of Friedl et al. (2023) collapses the outer loop into a single DEQN training pass. Uncertain parameters $\theta$ are appended to the network’s input,

\tilde{\bm{x}}_t = \bigl(\underbrace{\bm x_t}_{\text{economic + climate states}},\; \underbrace{\theta}_{\text{uncertain parameters}}\bigr),\qquad u_t = \mathcal N_\rho(\tilde{\bm x}_t),

(11.64)

held fixed within each simulation episode and re-sampled across episodes from a design distribution $\mathcal D_\theta$. One trained network therefore approximates the policy function for every $\theta$ in the support of $\mathcal D_\theta$; evaluating any new $\theta$ requires only a forward pass. For very large pseudo-state dimensions the active-subspace methods of Section 9.5 compress $\theta$ before the next step. This is the same idea as the parameterized policy networks in Chapter 10; here the target is not an SMM criterion but an SCC distribution.
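The sampling pattern behind (11.64) can be sketched in a few lines. Everything below is a schematic assumption: the parameter names and bounds are hypothetical, the design distribution is taken to be uniform on a box, and `toy_transition` is a stand-in that has nothing to do with the CDICE dynamics. The only point is the bookkeeping: $\theta$ is drawn once per episode, held fixed along the path, and concatenated to the state at every step.

```python
import random

# Hypothetical design box for two uncertain parameters (illustrative bounds).
THETA_BOUNDS = {"ecs_prior_mean": (2.0, 5.0), "damage_curvature": (1.5, 3.0)}


def sample_theta(rng):
    """Draw one parameter vector from the (assumed uniform) design distribution."""
    return [rng.uniform(lo, hi) for lo, hi in THETA_BOUNDS.values()]


def toy_transition(x, theta, rng):
    """Placeholder state transition; NOT the model's law of motion."""
    return [0.9 * xi + 0.1 * rng.random() for xi in x]


def simulate_episode(x0, theta, n_steps, rng):
    """Return the augmented states (x_t, theta) visited along one episode."""
    x, rows = list(x0), []
    for _ in range(n_steps):
        rows.append(x + theta)             # theta is appended unchanged each step
        x = toy_transition(x, theta, rng)
    return rows


rng = random.Random(1)
batch = []
for _ in range(4):                         # four episodes, one theta draw each
    theta = sample_theta(rng)
    batch.extend(simulate_episode([1.0, 0.5, 0.2], theta, n_steps=10, rng=rng))

print(len(batch), len(batch[0]))
```

The rows of `batch` are what the network would receive as inputs; the loss is evaluated on them exactly as in the fixed-parameter case.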

11.11.0.2Amortization 2: a GP for the quantity of interest.¶

After training, the DEQN is evaluated at a design set $\{\theta_i\}_{i=1}^n$ and the corresponding QoI values $q_i = q(\theta_i)$ are computed by forward simulation. Fit a Gaussian-process surrogate

q(\theta) = m(\theta) + \varepsilon(\theta), \qquad m(\theta)\mid \{(\theta_i,q_i)\}_{i=1}^n \sim \mathcal{GP}\bigl(\mu_n(\theta),\, k_n(\theta,\theta')\bigr).

(11.65)

The GP is cheap enough to evaluate millions of times, so the expensive IAM is no longer called inside Sobol, Shapley, or univariate-effect estimators. Bayesian active learning improves the design by adding points where the GP posterior uncertainty is largest or where integrated posterior variance is most reduced, following the toolkit of Chapter 9 (see Figure 10.1 and Table 10.1).

11.11.0.3Sobol, Shapley, univariate effects.¶

Three complementary global sensitivity indices answer different questions about how $\theta$ drives the SCC. The first-order Sobol index $S_i$ of Sobol (2001) measures the share of output variance explained by $\theta_i$ alone,

S_i = \frac{\mathrm{Var}\bigl(\mathbb E[q(\theta)\mid\theta_i]\bigr)}{\mathrm{Var}(q(\theta))},

(11.66)

and the total-effect index captures both direct and interaction effects,

S_i^{\mathrm{tot}} = 1 - \frac{\mathrm{Var}\bigl(\mathbb E[q(\theta)\mid\theta_{-i}]\bigr)}{\mathrm{Var}(q(\theta))}.

(11.67)
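Both indices can be estimated from model evaluations alone with a pick-freeze design: two independent sample matrices are drawn, and column $i$ of the first is swapped for column $i$ of the second. The sketch below uses a Saltelli-type first-order estimator and a Jansen-type total-effect estimator on a toy function that stands in for the surrogate mean; the toy function, the sample size, and the function names are assumptions made for illustration. For this toy, $q = 2x_0 + 12\,x_1 x_2$ with $x_j = \theta_j - \tfrac12$ and independent uniform $\theta_j$, the exact values are $S = (0.25, 0, 0)$ and $S^{\mathrm{tot}} = (0.25, 0.75, 0.75)$.

```python
import random


def toy_qoi(theta):
    """Stand-in for a surrogate mean; inputs are independent U(0, 1)."""
    x0, x1, x2 = (t - 0.5 for t in theta)
    return 2.0 * x0 + 12.0 * x1 * x2


def sobol_pick_freeze(model, d, n, rng):
    """Monte Carlo estimates of first-order (11.66) and total-effect (11.67) indices."""
    a = [[rng.random() for _ in range(d)] for _ in range(n)]
    b = [[rng.random() for _ in range(d)] for _ in range(n)]
    f_a = [model(row) for row in a]
    f_b = [model(row) for row in b]
    pooled = f_a + f_b
    mean = sum(pooled) / len(pooled)
    var = sum((y - mean) ** 2 for y in pooled) / (len(pooled) - 1)
    first, total = [], []
    for i in range(d):
        # Rows of A with column i taken from B.
        f_ab = [model(ra[:i] + [rb[i]] + ra[i + 1:]) for ra, rb in zip(a, b)]
        v_i = sum(yb * (yab - ya) for ya, yb, yab in zip(f_a, f_b, f_ab)) / n
        vt_i = 0.5 * sum((ya - yab) ** 2 for ya, yab in zip(f_a, f_ab)) / n
        first.append(v_i / var)
        total.append(vt_i / var)
    return first, total


first, total = sobol_pick_freeze(toy_qoi, d=3, n=20000, rng=random.Random(0))
print([round(s, 2) for s in first])
print([round(s, 2) for s in total])
```

The design costs $n(d+2)$ model evaluations, which is affordable on a surrogate but not on the structural model.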

For independent inputs the first-order indices satisfy $\sum_i S_i \le 1$, while the total-effect indices satisfy $\sum_i S_i^{\mathrm{tot}} \ge 1$, because each interaction is counted once for every parameter it involves; equality $\sum_i S_i^{\mathrm{tot}} = 1$ characterizes additive models. Shapley effects, introduced into sensitivity analysis by Owen (2014) and developed further by Song et al. (2016) and Iooss & Prieur (2019), allocate $\mathrm{Var}(q)$ across parameters via cooperative-game averaging over all subsets of other parameters (Shapley, 1953), sum exactly to $\mathrm{Var}(q)$ (raw) or one (normalized), and handle correlated inputs cleanly. Univariate-effect plots show the conditional mean $\mathbb E[q(\theta)\mid\theta_i]$ as $\theta_i$ varies and capture the directional response that Sobol indices average over. Saltelli & D'Hombres (2010) and Saltelli et al. (2008) give the standard estimators and best-practice warnings.
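The cooperative-game averaging behind Shapley effects is easy to state in code once the normalized explained variance $c(u) = \mathrm{Var}(\mathbb E[q\mid\theta_u])/\mathrm{Var}(q)$ is available for every subset $u$. The sketch below enumerates all subsets exactly, which is feasible only for a handful of parameters. The toy value function is an assumption chosen so that the answer is known: one parameter explains 25% of the variance additively, and a pure interaction between the other two explains the remaining 75%.

```python
from itertools import combinations
from math import factorial


def toy_explained_variance(subset):
    """Normalized Var(E[q | theta_u]) for a toy QoI (illustrative, not the IAM)."""
    value = 0.0
    if 0 in subset:
        value += 0.25                      # additive effect of parameter 0
    if 1 in subset and 2 in subset:
        value += 0.75                      # pure interaction of parameters 1 and 2
    return value


def shapley_effects(value, d):
    """Exact normalized Shapley effects by enumeration over all subsets."""
    effects = []
    for i in range(d):
        others = [j for j in range(d) if j != i]
        sh = 0.0
        for size in range(d):
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for u in combinations(others, size):
                u = frozenset(u)
                sh += weight * (value(u | {i}) - value(u))
        effects.append(sh)
    return effects


effects = shapley_effects(toy_explained_variance, d=3)
print(effects)
```

The interaction variance is split equally between the two parameters that generate it, so the effects sum to one even though the first-order indices of those two parameters are zero. In applications, $c(u)$ would itself be estimated on the surrogate.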

Remark 11.5

Choose a parameter domain $\Theta$ and a sampling law $\mathcal D_\theta$ for the uncertain climate, damage, and preference parameters.
Train the stochastic CDICE-DEQN on $(\bm x_t, z_t, \mu_{f,t}, S_{f,t}, \theta)$ , resampling $\theta$ across episodes and holding it fixed within an episode.
Generate $n$ design evaluations $\{(\theta_i, q_i)\}_{i=1}^n$ from the trained network, where $q_i$ is typically $\mathrm{SCC}_{2100}$ or an expected welfare functional.
Fit a GP surrogate $\theta \mapsto q(\theta)$ , validate by leave-one-out cross-validation (target: LOO $R^2 \ge 0.95$ or LOO RMSE below $5\%$ of the QoI standard deviation), and enrich the design with Bayesian active learning if the threshold is not met.
Compute Sobol, Shapley, and univariate effects on the GP surrogate, not on the structural model.

The reason this pipeline is the only feasible route is computational: direct Monte Carlo on Sobol or Shapley indices requires $O(10^4)$ to $O(10^6)$ evaluations of the structural model at fresh $\theta$ draws. Even at one DEQN solve per parameter vector, that price tag is several core-decades. The DEQN-with-pseudo-states amortizes one loop, and the GP surrogate amortizes the other; the sensitivity indices are then computed on the GP rather than on the IAM.

11.11.0.4Empirical headline.¶

Friedl et al. (2023) apply the pipeline to a stochastic DICE variant with Epstein--Zin preferences and Bayesian learning, and find that two ingredients dominate the SCC variance across 2020--2100: the mean of the ECS belief (roughly 50--70% of the total-effect Sobol share) and the curvature parameter of the damage function (roughly 15--25%).^[3] Together these account for 70--90% of the SCC variance. Risk aversion contributes a few percentage points; the pure rate of time preference and the IES contribute negligibly once damage curvature is conditioned on. The policy lesson is that under deep uncertainty the SCC should be reported as a distribution, not a point estimate, and that climate-policy design should target tail insurance against the upper ECS--damage corner rather than precision over the central calibration. Figure 11.6 sketches the resulting variance decomposition.

Figure 11.6: Schematic of the total-effect Sobol shares of $\mathrm{SCC}_{2100}$ variance reported by Friedl et al. (2023). Midpoints reflect the ranges quoted in the text (ECS mean 50--70%, damage curvature 15--25%), with horizontal error bars on the two leading parameters indicating the spread across horizon dates and damage-function specifications. The shape, two parameters carrying almost the entire variance, is what motivates the tail-insurance framing in the closing paragraph.

11.12Constrained Pareto-Improving Carbon Tax in OLG-IAMs¶

The SCC analysis of Section 11.11 is still the marginal welfare cost of one extra ton of carbon to a representative agent. Climate policy, however, redistributes welfare across cohorts: today’s workers pay abatement costs while tomorrow’s households inherit a cooler planet. A Pareto-improving carbon tax must transfer enough revenue back to current cohorts that no generation is worse off than under business-as-usual. This section closes Movement 3 by walking through the constrained-Pareto OLG-IAM of Kübler et al. (2026), reusing the DEQN-with-pseudo-states machinery of Section 11.11 and the GP surrogate of Chapter 9. The Pareto-improvement criterion is closely related to the social-security reform literature (Krueger & Kubler, 2006), to recent work on intergenerational climate policy (Karp et al., 2024; Kotlikoff et al., 2021), and to the constrained-optimal-tax frontier of Douenne et al. (2024).

11.12.0.1Notation reset for this section.¶

The OLG-IAM uses different conventions than the representative-agent CDICE block of Section 11.2, following Kübler et al. (2026), and we summarize the differences here so the reader is not surprised. $\Omega_t(T_t)$ now denotes the retained-output factor, so net output is $\Omega_t \Phi K^\alpha L^{1-\alpha}$ rather than $(1-\Omega-\Theta)Y^{\mathrm{gross}}$. $p^{\mathrm{tax}}_t$ is the carbon tax (a per-tCO$_2$ price); the symbol avoids a clash with the transformed-time variable $\tau_t$ of Section 11.3 and fits the price-level interpretation. $e_t$ denotes the per-period emissions flow (in GtC), and $E_t = \sum_{s\le t} e_s$ is cumulative emissions through date $t$; this is the convention of the climate-emulator literature (Dietz & Venmans, 2019) and of the companion paper, and it is the source of the section’s frequent “cumulative-emissions tax” phrasing. Finally, the policy vector that the planner ultimately optimizes over is $\vartheta = (\vartheta_{\mathrm{tax}}, \omega)$, the joint vector of tax-rule coefficients and cohort transfer shares defined in Step 1 below; it is not to be confused with the scalar rate $\vartheta$ in the time-rescaling (11.24).

11.12.0.2From CDICE to a TCRE emulator.¶

The OLG-IAM uses a much simpler climate side than the 5-state CDICE module of Section 11.2 (three carbon stocks plus two temperature layers). Once the planner’s horizon is converted to cumulative-emissions form $E_t = \sum_{s\le t} e_s$, the linear Transient Climate Response to cumulative carbon Emissions (TCRE) approximation collapses the carbon-cycle and energy-balance machinery to a single algebraic relation $T^{\mathrm{AT}}_t \approx \sigma_{\mathrm{CCR}}\,E_t$ (Dietz & Venmans, 2019), which removes five climate states from the planner’s optimization. The simplification is essential: it is what makes the OLG state space (12 cohort assets + 5 climate / shock states + $\vartheta$ pseudo-states) tractable end-to-end on a GPU. The reader who finds the change abrupt should treat the TCRE relation as a reduced-form summary of the same physics that drove Section 11.2.5--Section 11.2.6, fitted directly to long-run paths rather than block-by-block. Figure 11.7 contrasts the two climate sides.


Figure 11.7: Climate side of CDICE versus TCRE. The 5-state CDICE module on the left, in which atmospheric carbon, two ocean carbon reservoirs, atmospheric temperature, and ocean temperature all enter the planner’s state, is collapsed in the OLG-IAM to a single algebraic relation between cumulative emissions and atmospheric temperature, $T^{\mathrm{AT}}_t \approx \sigma_{\mathrm{CCR}}\,E_t$. The simplification trades fidelity to short-run climate dynamics for tractability of the 12-cohort heterogeneous-agent state space and is what makes the bilevel policy search of Section 11.12 end-to-end feasible.
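As a minimal sketch of the TCRE relation, the snippet below accumulates a path of per-period emission flows into $E_t$ and maps it linearly into temperature. The values of `SIGMA_CCR` and `E_INITIAL` and the constant emissions path are illustrative assumptions chosen for readability; they are not the calibration of Kübler et al. (2026).

```python
import numpy as np

# Illustrative assumptions, not the paper's calibration.
SIGMA_CCR = 1.7e-3   # assumed warming per GtC of cumulative emissions (deg C / GtC)
E_INITIAL = 600.0    # assumed cumulative emissions before the first model period (GtC)


def tcre_temperature(emission_flows, e_initial=E_INITIAL, sigma_ccr=SIGMA_CCR):
    """Return (E_t, T_t) for a path of per-period emission flows e_t in GtC."""
    emission_flows = np.asarray(emission_flows, dtype=float)
    # E_t = historical stock + running sum of the flows e_s, s <= t
    cumulative = e_initial + np.cumsum(emission_flows)
    # T_t ~ sigma_CCR * E_t: one algebraic relation, no extra climate states
    temperature = sigma_ccr * cumulative
    return cumulative, temperature


# 30 five-year periods with a constant assumed flow of 50 GtC per period
flows = np.full(30, 50.0)
E_path, T_path = tcre_temperature(flows)
```

The only climate object that has to be carried forward is the scalar `E_path[t]`, which is the sense in which the emulator replaces the five CDICE climate states.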

11.12.1The OLG-IAM Model¶

The model features $A=12$ overlapping generations of selfish agents (ages 20--80 in 5-year periods), a competitive firm, and a simplified, cumulative-emissions climate module in the spirit of Dietz & Venmans (2019):

Technology: Output is $Y_t = \Omega_t(T_t)\,\Phi(\mu_t)\,K_t^\alpha L_t^{1-\alpha}$ with retained-output damage factor $\Omega_t$ and net-of-abatement-cost factor $\Phi(\mu_t) = 1 - \theta_1\mu_t^{\theta_2}$ ; emissions are $e_t = (1-\mu_t)\kappa_t Y_t$ with stochastic carbon intensity $\kappa_t$ ; the period resource constraint is $C_t + K_{t+1} = Y_t + (1-\delta)K_t$ .
Households: Each agent maximizes $\mathbb{E}_t\sum_{j=1}^{A}\beta^{j-1}\, C_{t+j-1,j}^{1-\sigma_u}/(1-\sigma_u)$ (where $\sigma_u$ is the household CRRA risk-aversion coefficient, distinct from the climate-chapter notation $\sigma_t$ for emissions intensity) subject to the budget constraint $C_{t,j} + a_{t+1,j+1} = (1+r_t)\,a_{t,j} + w_t\,l_j + \mathbb{T}_{t,j}$ , where $\mathbb{T}_{t,j}$ is the transfer from carbon tax revenue and $j$ runs over all $A=12$ cohorts alive at $t$ (newborns included).
Climate: The climate emulator imposes a near-linear relationship between cumulative emissions and atmospheric temperature, $T^{\mathrm{AT}}_t \approx \sigma_{\mathrm{CCR}}\,E_t$ where $E_t = \sum_{s\le t} e_s$ is cumulative carbon, augmented by a stochastic tipping mechanism: damages depend on $T^{\mathrm{AT}}_t$ relative to a threshold $TP_t$ via a Weitzman-type retained-output factor that becomes steeply convex once $T^{\mathrm{AT}}_t$ approaches $TP_t$ .
Stochastic shocks: Carbon intensity $\kappa_t$ follows an AR(1) with time-varying persistence; the tipping threshold $TP_t$ follows a bounded random walk that becomes absorbing once it has been crossed.
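The technology block above can be written down in a few lines. The sketch below implements the abatement factor, net output, the emissions flow, and the resource constraint exactly as stated in the list; all parameter values (`ALPHA`, `THETA_1`, `THETA_2`, `DELTA`, and the example state) are hypothetical and serve only to make the snippet runnable.

```python
# Hypothetical parameter values for illustration only.
ALPHA, THETA_1, THETA_2, DELTA = 0.3, 0.05, 2.6, 0.4


def abatement_factor(mu, theta_1=THETA_1, theta_2=THETA_2):
    """Phi(mu) = 1 - theta_1 * mu**theta_2, the net-of-abatement-cost factor."""
    return 1.0 - theta_1 * mu ** theta_2


def net_output(K, L, mu, omega_retained, alpha=ALPHA):
    """Y = Omega * Phi(mu) * K**alpha * L**(1 - alpha)."""
    return omega_retained * abatement_factor(mu) * K ** alpha * L ** (1.0 - alpha)


def emissions(Y, mu, kappa):
    """e = (1 - mu) * kappa * Y, with carbon intensity kappa."""
    return (1.0 - mu) * kappa * Y


def next_capital(Y, K, C, delta=DELTA):
    """Resource constraint C + K' = Y + (1 - delta) K, solved for K'."""
    return Y + (1.0 - delta) * K - C


# One period at an assumed state: 2% damages, 20% abatement, intensity 0.3
K0, L0, mu0, kappa0 = 1.0, 1.0, 0.2, 0.3
Y0 = net_output(K0, L0, mu0, omega_retained=0.98)
e0 = emissions(Y0, mu0, kappa0)
C0 = 0.6 * Y0                      # an arbitrary feasible consumption level
K1 = next_capital(Y0, K0, C0)
```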

The household Euler equation takes the standard form $C_{t,j}^{-\sigma_u} = \beta\,\mathbb{E}_t[(1+r_{t+1})\,C_{t+1,j+1}^{-\sigma_u}]$ for $j = 1,\ldots,A-1$, and market clearing requires that aggregate savings equal the capital stock: $\sum_j a_{t,j} = K_t$. Figure 11.8 simulates this model without policy intervention; it fixes the business-as-usual (BAU) baseline against which every Pareto-improving policy below is benchmarked, and supplies the cohort-by-cohort participation constraints for the constrained policy search.

Figure 11.8: Business-as-usual baseline for the 12-cohort stochastic OLG-IAM of Kübler et al. (2026). Without policy intervention the median warming reaches roughly $3\,{}^\circ\mathrm{C}$ over the 150-year horizon, and the upper tail of damages is substantially larger than the mean. Every Pareto-improving policy below is benchmarked against this baseline, which also supplies the participation constraints for the constrained policy search. Figure extracted from Kübler et al. (2026).
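The household Euler equation and the market-clearing condition stated above are the kind of residuals a DEQN loss is built from. The sketch below evaluates them for given consumption arrays; the unit-free normalization (implied consumption divided by actual consumption, minus one) is one common choice and is an assumption here, as are all numbers in the consistency check.

```python
import numpy as np


def euler_residuals(c_today, c_next, r_next, probs, beta, sigma_u):
    """Unit-free Euler residuals for cohorts j = 1, ..., A-1.

    c_today: (A,) consumption by age today
    c_next:  (S, A) consumption by age tomorrow at S shock nodes
    r_next:  (S,) interest rate tomorrow at each node
    probs:   (S,) node probabilities summing to one
    """
    c_today = np.asarray(c_today, dtype=float)
    c_next = np.asarray(c_next, dtype=float)
    # (1 + r') * C'_{j+1}^(-sigma_u): cohort j today is cohort j+1 tomorrow
    marginal = (1.0 + np.asarray(r_next))[:, None] * c_next[:, 1:] ** (-sigma_u)
    expected = np.asarray(probs) @ marginal
    # Consumption today implied by the Euler equation, relative to actual
    implied_c = (beta * expected) ** (-1.0 / sigma_u)
    return implied_c / c_today[:-1] - 1.0


def market_clearing_residual(assets, capital):
    """sum_j a_{t,j} - K_t, which is zero in equilibrium."""
    return float(np.sum(assets) - capital)


# Consistency check on made-up numbers: build c_next so the Euler equation holds
A, BETA, SIGMA_U, R = 12, 0.8, 2.0, 0.3
c_today = np.linspace(0.5, 1.0, A)
growth = (BETA * (1.0 + R)) ** (1.0 / SIGMA_U)
c_next = np.empty((1, A))
c_next[0, 1:] = growth * c_today[:-1]
c_next[0, 0] = c_today[0]  # tomorrow's newborn; enters no residual
residuals = euler_residuals(c_today, c_next, np.array([R]), np.array([1.0]),
                            BETA, SIGMA_U)
```

With exact policies the residual vector is zero up to floating-point error; scaling today's consumption up by 10% produces a residual of $1/1.1 - 1$ for every cohort, which is how the normalization reads as a relative consumption error.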

11.12.2The 3-Step ML Pipeline¶

Finding an optimal carbon tax rule in this OLG economy is a bilevel optimization problem: the outer level searches over tax parameters, and the inner level solves the full stochastic general equilibrium for each candidate tax. Kübler et al. (2026) decompose this into three steps, summarized in Figure 11.9:

Figure 11.9: Three-step machine-learning pipeline for constrained carbon-tax design. The DEQN amortizes equilibrium solution across tax parameters, the GP surrogate maps policy parameters to welfare and cohort utilities, and the final optimization imposes the Pareto constraints on the surrogate.

11.12.2.1Step 1: DEQN with pseudo-states.¶

The tax-rule coefficients $\vartheta_{\mathrm{tax}}$ and the $A=12$ transfer shares $\omega = (\omega_1,\ldots,\omega_{12})$ are appended to the state of the neural network as pseudo-states. The transfer shares are non-negative weights satisfying $\sum_{j=1}^{A} \omega_j = 1$, with cohort $j$’s lump-sum transfer given by $\mathbb{T}_{t,j} = \omega_j\,p^{\mathrm{tax}}_t\,e_t$ from the government’s resource constraint $\sum_j \mathbb{T}_{t,j} = p^{\mathrm{tax}}_t\,e_t$. The simplex constraint $\omega \in \Delta^{A-1}$ is enforced by sampling unconstrained logits and applying a softmax before feeding $\omega$ into the network, so the DEQN never sees an infeasible transfer profile. All cohorts alive at $t$, including the newborn cohort, receive a transfer. The number of tax parameters depends on the rule: a simple linear rule on cumulative emissions has $\vartheta_{\mathrm{tax}} = (\vartheta_0,\vartheta_E) \in \mathbb{R}^2$ (so a 14-dimensional pseudo-state vector together with the 12 transfer shares), and a richer rule that adds dependence on carbon intensity and tipping has $\vartheta_{\mathrm{tax}} \in \mathbb{R}^4$ (a 16-dimensional pseudo-state vector with the 12 shares). The DEQN learns the equilibrium for all candidate tax-and-transfer configurations at once, so that simulating any $(\vartheta_{\mathrm{tax}}, \omega)$ requires only a forward pass. The network architecture, optimizer schedule, and training-pool design follow Kübler et al. (2026) verbatim; the exact configuration is documented in the companion repository linked at the end of this section.
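The logits-then-softmax construction can be sketched as follows for the 14-dimensional linear-rule case. The sampling bounds on $(\vartheta_0, \vartheta_E)$ and the standard-normal logit distribution are assumptions made for illustration; the training-pool design actually used is the one in the companion repository.

```python
import numpy as np

A = 12  # number of cohorts, hence of transfer shares

# Hypothetical sampling box for (theta_0, theta_E); not taken from the paper.
TAX_LOWER = np.array([-0.5, 0.0])
TAX_UPPER = np.array([0.5, 0.5])


def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    expz = np.exp(z)
    return expz / expz.sum(axis=-1, keepdims=True)


def sample_pseudo_states(n, rng):
    """Draw n pseudo-state vectors (theta_tax, omega) with omega on the simplex."""
    theta_tax = rng.uniform(TAX_LOWER, TAX_UPPER, size=(n, TAX_LOWER.size))
    logits = rng.normal(size=(n, A))      # unconstrained draws
    omega = softmax(logits)               # non-negative, rows sum to one
    return np.concatenate([theta_tax, omega], axis=1)


rng = np.random.default_rng(0)
pseudo_states = sample_pseudo_states(256, rng)   # shape (256, 14)
```

Because feasibility is imposed by the transformation rather than by rejection, every sampled row is a valid transfer profile. Softmax-transformed Gaussian logits do not cover the simplex uniformly, so a Dirichlet draw would be a natural alternative if uniform coverage matters.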

11.12.2.2Step 2: GP surrogate.¶

At each design point $\vartheta = (\vartheta_{\mathrm{tax}}, \omega)$, the trained DEQN is simulated to obtain Monte-Carlo estimates of expected lifetime utility for the 40 tracked cohorts (12 alive at $t=0$ plus 28 future cohorts born during the planner’s 150-year horizon). Independent GPs are then fitted to map $\vartheta$ to expected aggregate welfare $\mathcal{W}(\vartheta)$ and to each of the 40 cohort welfares $\tilde{U}_t(\vartheta)$. The design itself uses Latin-hypercube sampling augmented with Bayesian active learning: the size scales with the dimension of $\vartheta$, with roughly 500 points sufficient for the 14-dimensional “linear-in-$E$ + transfers” specification (Section 5.3 of Kübler et al. (2026)) and roughly 800 points for the 16-dimensional “richer rule + transfers” specification (Section 5.4). Figure 11.10 shows the resulting welfare surface for the two-parameter linear-in-cumulative-emissions rule, with transfer shares held at the Pareto-optimal solution: the contour exposes the low-dimensional ridge along which intercept and slope trade off cleanly, and on which the Step-3 optimizer searches.

Figure 11.10: Gaussian-process welfare surrogate over the two-dimensional tax-parameter slice $(\vartheta_0, \vartheta_E)$ of the linear-in-cumulative-emissions rule, with transfer shares $\omega$ held at the Pareto-optimal solution. The contour exposes the low-dimensional welfare surface on which the constrained optimizer of Eq. (11.68) searches once the DEQN has amortized the equilibrium solve. Figure extracted from Kübler et al. (2026).
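To make Step 2 concrete, here is a small NumPy-only sketch of a Latin-hypercube design, a GP fit with a squared-exponential kernel, and a single active-learning step. Several pieces are stand-ins and should be read as assumptions: `toy_welfare` replaces the Monte-Carlo welfare estimate from the trained DEQN, the kernel hyperparameters are fixed rather than estimated, and the maximum-posterior-variance rule is a simplified proxy for the Bayesian active-learning criterion used in the paper.

```python
import numpy as np


def latin_hypercube(n, d, rng):
    """n points in [0, 1]^d with exactly one point per 1/n-bin in each dimension."""
    u = (rng.random((n, d)) + np.arange(n)[:, None]) / n
    for k in range(d):
        u[:, k] = rng.permutation(u[:, k])
    return u


def rbf(X1, X2, length_scale, signal_var):
    """Squared-exponential kernel matrix between two point sets."""
    sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return signal_var * np.exp(-0.5 * sq / length_scale ** 2)


def gp_fit(X, y, length_scale=0.3, signal_var=1.0, noise_var=1e-6):
    """Fit a constant-mean GP with fixed (assumed) hyperparameters."""
    y_mean = y.mean()
    K = rbf(X, X, length_scale, signal_var) + noise_var * np.eye(len(X))
    chol = np.linalg.cholesky(K)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y - y_mean))
    return dict(X=X, chol=chol, alpha=alpha, y_mean=y_mean,
                ls=length_scale, sv=signal_var)


def gp_predict(model, Xs):
    """Posterior mean and variance at the query points Xs."""
    Ks = rbf(Xs, model["X"], model["ls"], model["sv"])
    mean = model["y_mean"] + Ks @ model["alpha"]
    v = np.linalg.solve(model["chol"], Ks.T)
    var = np.maximum(model["sv"] - (v ** 2).sum(axis=0), 0.0)
    return mean, var


def toy_welfare(X):
    """Hypothetical stand-in for a DEQN-simulated welfare estimate."""
    return -(X[:, 0] - 0.4) ** 2 - (X[:, 1] - 0.6) ** 2


rng = np.random.default_rng(0)
X = latin_hypercube(40, 2, rng)           # initial space-filling design
model = gp_fit(X, toy_welfare(X))

# One active-learning step: add the candidate where the GP is least certain
candidates = rng.random((500, 2))
_, cand_var = gp_predict(model, candidates)
x_new = candidates[np.argmax(cand_var)]
X_aug = np.vstack([X, x_new])
model_aug = gp_fit(X_aug, toy_welfare(X_aug))
```

In the actual pipeline one such GP would be fitted for aggregate welfare and one for each tracked cohort, all sharing the same design points.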

11.12.2.3Step 3: Constrained optimization.¶

The planner solves

$$
\vartheta^* = \operatorname*{arg\,max}_{\vartheta = (\vartheta_{\mathrm{tax}}, \omega)}\;\mathcal{W}(\vartheta) \qquad \text{s.t.}\quad \tilde{U}_t(\vartheta) \geq U_t \;\;\forall\, t,\;\; \omega \in \Delta^{A-1}, \tag{11.68}
$$

where $U_t$ is the business-as-usual (BAU) welfare of cohort $t$ and $\Delta^{A-1}$ is the standard simplex on $A=12$ shares. The Pareto constraint ensures that no generation is worse off; whenever the welfare-maximizing $\vartheta^\ast$ lies strictly inside the feasible set (which is the case in every scenario reported below) it is also strictly Pareto-improving for at least one cohort, so the weak constraint $\tilde U_t \ge U_t$ and the textbook strict-improvement requirement coincide at the optimum. Because each evaluation of $\mathcal{W}$ and $\tilde U_t$ is a prediction from the fitted GP rather than a fresh DEQN simulation, the constrained search reduces to a sequence of small SLSQP problems (the paper uses 500 random restarts of scipy.optimize.minimize) that complete in seconds. By contrast, replacing the surrogate with brute-force re-solves of the full stochastic OLG-IAM (SOLG) at every candidate $\vartheta$ would require on the order of tens of thousands of core-hours per candidate, which is the comparison the paper draws against traditional methods.
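The structure of the Step-3 problem (11.68) can be illustrated with a toy version. In the sketch below, `welfare` and `cohort_gains` are hypothetical closed-form stand-ins for the GP predictions of $\mathcal{W}(\vartheta)$ and $\tilde U_t(\vartheta) - U_t$, the `BURDEN` vector is invented, and 20 restarts are used instead of the 500 reported in the paper. Only the calling pattern of `scipy.optimize.minimize` with `method="SLSQP"` is meant to carry over.

```python
import numpy as np
from scipy.optimize import minimize

A = 12
BURDEN = np.linspace(0.1, 0.3, A)   # hypothetical per-cohort tax burden


def welfare(x):
    """Toy surrogate for W(theta); x = (theta_0, theta_E, omega_1..omega_A)."""
    theta_0, theta_E, omega = x[0], x[1], x[2:]
    return (-(theta_0 + 0.2) ** 2 - (theta_E - 0.3) ** 2
            - 0.1 * np.sum((omega - 1.0 / A) ** 2))


def cohort_gains(x):
    """Toy surrogate for U_tilde_t(theta) - U_t; feasible when all entries >= 0."""
    return x[2:] - BURDEN * x[1]


constraints = [
    {"type": "ineq", "fun": cohort_gains},                    # Pareto constraints
    {"type": "eq", "fun": lambda x: np.sum(x[2:]) - 1.0},     # shares sum to one
]
bounds = [(-1.0, 1.0), (0.0, 1.0)] + [(0.0, 1.0)] * A         # omega_j >= 0

rng = np.random.default_rng(0)
best = None
for _ in range(20):                                           # random restarts
    x0 = np.concatenate([rng.uniform(-1.0, 1.0, 1),
                         rng.uniform(0.0, 1.0, 1),
                         rng.dirichlet(np.ones(A))])
    res = minimize(lambda x: -welfare(x), x0, method="SLSQP",
                   bounds=bounds, constraints=constraints)
    if res.success and (best is None or res.fun < best.fun):
        best = res

theta_star = best.x
```

In this toy the unconstrained optimum (slope 0.3 with equal shares) violates the gain constraint for the highest-burden cohorts, so the solver lowers the slope and tilts the shares toward them, which mirrors qualitatively the mechanism discussed in the results below.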

11.12.3Results: Why Transfers Matter¶

The unconstrained welfare-maximizing cumulative-emissions tax is the natural benchmark. With a linear rule $p^{\mathrm{tax}}_t = \vartheta_0 + \vartheta_E\,E_t$ and a fixed declining transfer scheme $\omega = \bar\omega$, the policy cuts emissions aggressively, stabilizes mean warming around $2.7\,{}^{\circ}\mathrm C$, and raises aggregate social welfare by about $1.6\%$ in consumption-equivalent terms. But it imposes losses of up to roughly $5\%$ on initial generations: it is therefore welfare-improving in the social-welfare-function sense, but not Pareto-improving. Figure 11.11 shows the failure: the welfare-gains panel records the losses for transition generations that the social-welfare-function aggregate hides.

Figure 11.11: Welfare-improving but not Pareto-improving cumulative-emissions tax with a fixed exogenous transfer scheme. The policy strongly reduces climate risk and raises aggregate welfare by about $1.6\%$ in consumption-equivalent terms, but the welfare-gains panel shows losses for transition generations. Figure extracted from Kübler et al. (2026).

Endogenizing the transfer shares changes the conclusion. With the same simple tax base and an optimized transfer simplex, Kübler et al. (2026) report the optimized coefficients^[4]

$$
p^{\mathrm{tax}}_t = \vartheta_0 + \vartheta_E\,E_t, \qquad (\vartheta_0,\, \vartheta_E) = (-0.186,\, 0.225), \tag{11.69}
$$

together with transfer shares

$$
\omega = (0.128,\, 0.051,\, 0.058,\, 0.089,\, 0.149,\, 0.090,\, 0.066,\, 0.143,\, 0.076,\, 0.048,\, 0.039,\, 0.061), \tag{11.70}
$$

which sum to one up to rounding. Figure 11.13 plots this transfer profile against cohort index; the non-monotone shape is what allows a less aggressive cumulative-emissions tax to satisfy the Pareto constraint at every age, and it is the single most informative graphical summary of the constrained-optimal-policy step. The negative intercept $\vartheta_0 = -0.186$ is not a subsidy in practice: the planner’s horizon starts well into the industrial era at a strictly positive cumulative-emissions stock $E_0 > 0$, so the effective tax $\vartheta_0 + \vartheta_E\,E_t$ is positive for every relevant $E_t$ along the optimum. The negative intercept simply registers that the linear-in-$E$ rule undershoots a constant carbon price near $E = 0$ and ramps up roughly proportionally to cumulative emissions thereafter. The combined policy makes every tracked cohort weakly better off than under BAU. The aggregate welfare gain is more modest than under the unconstrained optimum, at about $0.42\%$ in consumption-equivalent terms, but the right tail of damages is truncated: the 99th percentile of damages falls to roughly $7\%$ of output rather than about $9\%$ under BAU. Figure 11.12 reports the full result. Comparing its welfare-gains panel with that of Figure 11.11 is the section’s headline: a lower, simpler tax combined with an optimized transfer system shifts every cohort weakly into the gains region.

Figure 11.12: Pareto-improving cumulative-emissions tax with optimized intergenerational transfers, at the coefficients of (11.69)--(11.70). The tax is less aggressive than the unconstrained rule, but the optimized transfer system shields current cohorts while preserving climate-risk reduction for future cohorts. Aggregate welfare rises by about $0.42\%$. Figure extracted from Kübler et al. (2026).


Figure 11.13: Optimized transfer-share profile $\omega_j$ across the 12 cohorts alive at $t = 0$, drawn directly from (11.70). The profile is decidedly non-monotone: the largest shares go to cohorts 1 (oldest), 5, and 8, which are precisely the cohorts for which the participation constraint $\tilde U_t \ge U_t$ binds most tightly under the fixed-transfer tax of Figure 11.11. The non-monotone shape is what allows a less aggressive cumulative-emissions tax to satisfy Pareto improvement at every age.
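The reported numbers in (11.69) and (11.70) can be checked with a few lines of arithmetic. The snippet only uses the coefficients quoted above; the units in which $E_t$ enters the rule are not restated in this section, so the break-even stock below is expressed in the model's own units.

```python
import numpy as np

# Transfer shares from (11.70) and tax coefficients from (11.69)
omega = np.array([0.128, 0.051, 0.058, 0.089, 0.149, 0.090,
                  0.066, 0.143, 0.076, 0.048, 0.039, 0.061])
theta_0, theta_E = -0.186, 0.225

share_sum = omega.sum()                       # 0.998, i.e. one up to rounding
top3 = np.argsort(omega)[::-1][:3] + 1        # cohort indices 5, 8, 1
is_non_monotone = bool(np.any(np.diff(omega) > 0) and np.any(np.diff(omega) < 0))

# The linear rule is positive exactly when E_t exceeds -theta_0 / theta_E
break_even_E = -theta_0 / theta_E             # about 0.827 in model units
```

The effective tax is therefore positive whenever cumulative emissions exceed roughly 0.827 in the units of the rule, consistent with the statement that the negative intercept does not act as a subsidy along the optimum provided $E_0$ lies above that level.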

The richer rule of Section 11.12 adds carbon intensity and a tipping-state statistic,

$$
p^{\mathrm{tax}}_t = \vartheta_0 + \vartheta_E\,E_t + \vartheta_\kappa\,\kappa_t + \vartheta_{TP}(1-D_t), \tag{11.71}
$$

where $D_t$ is the climate-tipping state of the model (built from the proximity of $T^{\mathrm{AT}}_t$ to the stochastic threshold $TP_t$ and the absorbed-tipping flag). Its optimized coefficients are

$$
(\vartheta_0,\, \vartheta_E,\, \vartheta_\kappa,\, \vartheta_{TP}) = (-0.237,\, 0.203,\, 0.037,\, 0.012), \tag{11.72}
$$

with the associated aggregate welfare gain rising only from about $0.42\%$ to about $0.45\%$ . The cohort-by-cohort welfare profile (not plotted; see Kübler et al. (2026) for the figure) again keeps every cohort weakly above its BAU baseline, and the marginal welfare improvement from the extra two policy-state coefficients is small. This is the substantive headline of Kübler et al. (2026): once intergenerational transfers are optimized, the simple cumulative-emissions tax captures most of the feasible Pareto-improving welfare gain. More policy-state variables improve the fit to climate risk, but the participation constraints bind tightly enough that the marginal welfare benefit of policy complexity is small. $D_t$ is a deterministic function of variables already in the SOLG state, so it can be evaluated inside each forward pass; the exact functional form is in the paper.
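For readers who want to evaluate the two rules side by side, the following sketch codes (11.71) once and recovers the linear rule (11.69) by zeroing the last two coefficients. The coefficients are those reported above; the state values `E_example`, `kappa_example`, and `D_example` are hypothetical, and the functional form of $D_t$ is not reproduced here, so $D$ is simply passed in as a number.

```python
# Coefficients (theta_0, theta_E, theta_kappa, theta_TP) as reported in the text
LINEAR_RULE = (-0.186, 0.225, 0.0, 0.0)        # (11.69), padded with zeros
RICHER_RULE = (-0.237, 0.203, 0.037, 0.012)    # (11.72)


def carbon_tax(E, kappa, D, coeffs):
    """p_tax = theta_0 + theta_E * E + theta_kappa * kappa + theta_TP * (1 - D)."""
    theta_0, theta_E, theta_kappa, theta_TP = coeffs
    return theta_0 + theta_E * E + theta_kappa * kappa + theta_TP * (1.0 - D)


# Hypothetical state, in the model's own units
E_example, kappa_example, D_example = 2.0, 1.0, 0.5
tax_linear = carbon_tax(E_example, kappa_example, D_example, LINEAR_RULE)
tax_richer = carbon_tax(E_example, kappa_example, D_example, RICHER_RULE)
```

At this made-up state the two rules give 0.264 and 0.212 respectively; the comparison is only meant to show how the extra state variables enter, not to reproduce any figure from the paper.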

11.12.3.1Runtime in numbers.¶

On a standard laptop (Apple M1), the OLG DEQN trains in roughly four wall-clock hours; on a high-end accelerator such as an NVIDIA GH200, training drops to the order of minutes Kübler et al., 2026. Adding the GP fits over 500 (resp. 800) design points and the constrained Step-3 optimization keeps the entire pipeline within the same order of magnitude, while the comparable brute-force re-solve of the SOLG model at every candidate $\vartheta$ would dominate by orders of magnitude (the paper reports tens of thousands of core-hours for one fixed-parameter calibration, which would have to be repeated for every Step-3 candidate vector).

11.12.3.2Companion code.¶

The full production OLG-IAM solver, including the DEQN training loop with $(\vartheta_{\mathrm{tax}}, \omega)$ pseudo-states and the bilevel policy search, is hosted in the companion repository sischei/JPE_Macro_Using_ML_to_compute_constrained_optimal_carbon_tax_rules, which accompanies Kübler et al. (2026). The classroom notebook in Lecture 17 of this course exposes a reduced surrogate-only version that loads pre-trained GP surrogates and reproduces the constrained-optimization step (Step 3) interactively, but does not retrain the OLG DEQN end-to-end; readers who want the full pipeline should clone the companion repository.

11.13Discussion and Outlook¶

The combination of DEQNs, pseudo-states, and GP surrogates provides a scalable and transparent framework for climate economics that overcomes key limitations of traditional methods.

11.13.0.1Comparison with traditional IAM solutions.¶

Standard IAMs (such as the GAMS implementation of DICE) rely on shooting methods or nonlinear programming solvers that find deterministic optimal paths. These approaches struggle with stochastic extensions: Monte Carlo integration over shocks is expensive, and certainty equivalence (replacing random variables with their means) misses the welfare cost of tail risks. The DEQN approach approximates the stochastic recursive solution over the chosen training distribution and state/pseudo-state domain (with Bayesian learning and recursive Epstein--Zin utility) in a single training run.

11.13.0.2Limitations.¶

Several limitations should be noted. First, the CDICE climate module, while calibrated to CMIP benchmarks, remains a reduced-form emulator and cannot capture spatial heterogeneity or regional climate impacts. Second, the OLG-IAM treats each generation as identical within a cohort; within-cohort heterogeneity (e.g., geographic exposure to climate damages) would require further extensions along the lines of Chapter 6. Third, the linear tax rules are interpretable and implementable but may leave welfare gains on the table relative to fully nonlinear rules.

11.13.0.3Extensions.¶

Active research frontiers include: multi-region IAMs with trade and carbon leakage Nordhaus & Yang, 1996; richer damage specifications including tipping cascades; endogenous technical change in abatement technology; and embedding climate modules in continuous-time heterogeneous-agent models (Chapter 8) to study the joint dynamics of climate risk and wealth inequality. The methodological toolkit developed in this course (DEQNs for equilibrium computation, PINNs for continuous-time PDEs, deep surrogates for uncertainty quantification, and Young’s method for distribution tracking) provides the computational infrastructure for these extensions.

11.13.0.4The three movements, in one synthesis.¶

Movement 1 established that solving an IAM by DEQN requires three modifications relative to the stationary toolkit of Chapter 2: time enters as a state, the training pool is built by simulating $K$ forward trajectories from a calibrated initial state rather than by sampling an ergodic distribution, and the missing transversality is absorbed numerically by choosing the horizon $T_{\max}$ long enough that discounting suffices (or, on short horizons, by adding an explicit terminal residual). Movement 2 put that algorithm to work on a worked stochastic DICE economy, producing the eight-residual loss whose minimization delivers the deterministic policy and, with one extra Gauss--Hermite layer, the AR(1) SCC fan chart. Movement 3 layered four extensions onto the same spine: Bayesian learning over the climate sensitivity, recursive Epstein--Zin preferences, global UQ of the SCC via pseudo-states and GP surrogates, and constrained Pareto-improving carbon-tax design in a heterogeneous-agent OLG-IAM. Chapter 12 threads these into the broader synthesis with the rest of the course.

Remark 11.7

Climate-economy IAMs are the natural showcase for the methodological stack: a moderately high-dimensional non-stationary DSGE solved by DEQN, a GP+BAL surrogate for SCC sensitivity, and Sobol/Shapley decomposition for deep uncertainty quantification.
DICE (and the CDICE recalibration) provide the workhorse model; Exercise 11.3 asks the reader to use the closed-form ACE expression of Traeger (2023) as an analytic benchmark for the DEQN-trained CDICE solution.
Pareto-improving carbon-tax rules Kübler et al., 2026 demonstrate that the surrogate machinery has direct policy relevance: linear, intergenerationally fair, implementable rules can be designed via constrained optimization on the surrogate.
Deep UQ matters because the SCC distribution under climate, damage, and preference uncertainty is wider than any pointwise estimate suggests; reporting the distribution rather than a number is the responsible default.

11.14Further Reading¶

Nordhaus (2017), the canonical DICE update.
Folini et al. (2025), the CDICE recalibration used in the deep-learning solution.
Traeger (2023), the analytic ACE benchmark.
Dietz (2024), Fernández-Villaverde et al. (2025), and Ploeg & Rezai (2026), recent surveys on climate macroeconomics.
Kübler et al. (2026), Pareto-improving carbon-tax design.
Friedl et al. (2023), deep uncertainty quantification methodology.

11.15Exercises¶

Worked solutions and guidance for these exercises appear in Appendix F.

Exercise 11.8

[Advanced/project] Tipping-point regime-switching damages. This exercise temporarily switches from the additive damage-fraction convention $\Omega^N(T) = \pi_1 T + \pi_2 T^2$ used in the chapter body and in notebook 02 (Eq. (11.19)) to the multiplicative retained-output convention $\Omega^{\mathrm{ret}}(T) = 1/(1 + \pi_2 T^2)$, the form discussed in Section 11.2.7 as an alternative, so that the regime-switching modification below has a single multiplicative knob. First make this substitution in notebook 02_DICE_DEQN_Library_Port.ipynb (which ships with the additive form $\Omega^N$) and re-calibrate $\pi_2$ so that the retained-output form matches the additive baseline at $T = 2.5\,{}^\circ\mathrm{C}$, i.e. choose $\pi_2$ such that $1 - 1/(1+\pi_2 T^2) = \pi_2^{N} T^2$ at $T = 2.5\,{}^\circ\mathrm{C}$ with $\pi_2^N = 0.00236$ from Table 11.2. Then replace the smooth retained-output factor $\Omega^{\mathrm{ret}}(T) = 1/(1 + \pi_2 T^2)$ with a regime-switching specification. At each step, with hazard rate $\lambda_\mathrm{TP}(T) = \lambda_0 + \lambda_1\max(0, T - T_\mathrm{thresh})$, an irreversible tipping event occurs. If the event has occurred, multiply the damage term in the denominator by $D_\mathrm{TP}=1.5$, so retained output becomes $\Omega^{\mathrm{TP}}(T)=1/(1+D_\mathrm{TP}\pi_2T^2)$. Calibrate $\lambda_0 = 0.001$, $\lambda_1 = 0.05$, $T_\mathrm{thresh} = 2.0\,{}^\circ\mathrm{C}$. Retrain the DEQN solver and report (i) SCC at $t = 0$ under the regime-switching specification vs. the smooth baseline; (ii) the time path of optimal abatement $\mu_t$ in both cases; (iii) the unconditional probability of a tipping event by 2100. Sweep $T_\mathrm{thresh}$ over $\{1.5, 2.0, 2.5\}\,{}^\circ\mathrm{C}$ and plot the SCC against the threshold. Discuss the policy implications: a lower threshold raises the SCC by what factor, and what does this imply for near-term tax design under deep uncertainty about $T_\mathrm{thresh}$ itself?
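A possible starting point for the hazard mechanics of this exercise, independent of the DEQN, is sketched below. It treats $\lambda_\mathrm{TP}$ as a per-step probability and evaluates it along an exogenous temperature path; the path `T_path` (a linear ramp over 16 steps) and the value `PI_2_EXAMPLE` are assumptions for illustration only, whereas in the exercise the path comes from the retrained solver and $\pi_2$ from the re-calibration.

```python
import numpy as np

LAMBDA_0, LAMBDA_1, D_TP = 0.001, 0.05, 1.5   # values given in the exercise
PI_2_EXAMPLE = 0.0024                         # placeholder, not the calibrated value


def hazard(T, t_thresh):
    """Per-step tipping probability lambda_0 + lambda_1 * max(0, T - T_thresh)."""
    return np.clip(LAMBDA_0 + LAMBDA_1 * np.maximum(0.0, T - t_thresh), 0.0, 1.0)


def tipping_probability(T_path, t_thresh):
    """P(tipped by the last step) = 1 - prod_t (1 - lambda_t)."""
    return 1.0 - np.prod(1.0 - hazard(T_path, t_thresh))


def simulate_tipped(T_path, t_thresh, n_paths, rng):
    """Monte-Carlo indicator of whether each path has tipped by the last step."""
    draws = rng.random((n_paths, len(T_path)))
    return (draws < hazard(T_path, t_thresh)).any(axis=1)


def retained_output(T, tipped, pi_2=PI_2_EXAMPLE):
    """1 / (1 + s * pi_2 * T^2) with s = D_TP after tipping and 1 before."""
    scale = np.where(tipped, D_TP, 1.0)
    return 1.0 / (1.0 + scale * pi_2 * T ** 2)


# Assumed exogenous warming path: 16 steps rising from 1.2 to 3.0 deg C
T_path = np.linspace(1.2, 3.0, 16)
rng = np.random.default_rng(0)
p_exact = tipping_probability(T_path, t_thresh=2.0)
p_mc = simulate_tipped(T_path, 2.0, 200_000, rng).mean()
p_sweep = [tipping_probability(T_path, th) for th in (1.5, 2.0, 2.5)]
```

Because tipping is irreversible, the cumulative probability depends only on the product of per-step survival probabilities, which gives a cheap cross-check on any simulated estimate for part (iii).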

Exercise 11.9

[Advanced/project] Real options value of waiting. Consider a stylized two-period climate--policy decision. At $t = 0$ the planner does not know the equilibrium climate sensitivity $\mathrm{ECS}$ , with prior $\mathrm{ECS} \sim \mathcal{U}([\mathrm{ECS}_L, \mathrm{ECS}_H])$ where $\mathrm{ECS}_L = 2$ , $\mathrm{ECS}_H = 5$ $^\circ\mathrm{C}$ . Two choices: (a) act now, abate at level $\mu \in [0,1]$ with cost $\Theta(\mu) = \theta\mu^2$ ; (b) wait, abate at $t=1$ after observing a noisy signal $\widehat{\mathrm{ECS}} = \mathrm{ECS} + \varepsilon$ , $\varepsilon \sim \mathcal{N}(0, \sigma_\varepsilon^2)$ . Damages are $D(\mu, \mathrm{ECS}) = \alpha\,\mathrm{ECS}\cdot(1 - \mu)$ . The planner minimizes expected total cost. (i) Derive closed-form expressions for the optimal $\mu^\star$ and the expected total cost under each strategy. (ii) Define the value of waiting as the cost difference: $\mathrm{VoW} = \mathbb{E}[\mathrm{cost}_\mathrm{wait}] - \mathbb{E}[\mathrm{cost}_\mathrm{now}]$ . Show that as the signal becomes informative ( $\sigma_\varepsilon \to 0$ ), $\mathrm{VoW}$ becomes negative (waiting is preferred): more information allows better decisions. (iii) Now add an irreversibility wedge $\eta\,\mu_1^2$ paid only on the wait branch, penalizing deferred action (e.g., capital stock accumulates carbon faster while waiting, so the wait-period abatement is more costly than an equivalent abatement at $t = 0$ ). Show that for sufficiently large $\eta$ , $\mathrm{VoW}$ is positive (act now is preferred), even with substantial learning. Connect this trade-off to the Bayesian-learning section of the chapter and to the broader literature on real options in climate policy.

Exercise 11.10

The non-stationary DEQN algorithm on a one-dimensional non-stationary problem. The toy economy: a planner picks $u_t \in \mathbb R$ to minimise $\sum_{t=0}^{T_{\max}-1} \bigl[(x_t - x^\ast_t)^2 + r u_t^2\bigr] + \lambda_T (x_{T_{\max}} - x^\ast_{T_{\max}})^2$ subject to the linear-Gaussian law of motion $x_{t+1} = \alpha\,x_t + u_t + g_t + \sigma\,\varepsilon_{t+1}$, $\varepsilon_{t+1}\sim\mathcal N(0,1)$, with deterministic drift $g_t = g_0 + g_1 t$ and a calendar-time target path $x^\ast_t = a + b t$. Set $T_{\max} = 50$, $\alpha = 0.95$, $g_0 = 0.02$, $g_1 = 0.001$, $r = 0.1$, $\sigma = 0.05$, $\lambda_T = 5$, $a = 0$, $b = 0.04$. (i) Derive the closed-form LQ-Riccati policy and state your classical baseline before reaching for the DEQN. (ii) Train a small neural network $u_t = \mathcal N_\rho(x_t, \tau_t)$ with $\tau_t = 1 - \exp(-\vartheta\,t)$, sampling initial states from a uniform prior on $[a - 1, a + 1]$, stratifying the mini-batch over ten calendar-time bins of $[0, T_{\max}]$, and adding the terminal penalty $\lambda_T (x_{T_{\max}} - x^\ast_{T_{\max}})^2$ to the loss. (iii) Verify that the trained network reproduces the LQ-Riccati baseline within 1% in the mean-squared-action metric. (iv) Ablate each of the three modifications in turn (drop $\tau_t$ from the input; remove stratification; remove the terminal penalty) and report which ablation hurts most as a function of $T_{\max}$.
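The sampling ingredients of part (ii) can be set up before any network is trained. The sketch below implements the time transform and a mini-batch of calendar times stratified over ten equal bins; `TIME_SCALE` (the exercise's $\vartheta$ in $\tau_t$) is a hypothetical value, and the snippet deliberately stops short of the forward simulation, the loss, and the training loop.

```python
import numpy as np

T_MAX, N_BINS = 50, 10
TIME_SCALE = 0.05        # hypothetical value for the rate in tau_t = 1 - exp(-rate * t)


def transformed_time(t, time_scale=TIME_SCALE):
    """Map calendar time t >= 0 into tau in [0, 1)."""
    return 1.0 - np.exp(-time_scale * np.asarray(t, dtype=float))


def stratified_times(batch_size, rng, t_max=T_MAX, n_bins=N_BINS):
    """Integer decision dates in {0, ..., t_max - 1}, equally many per time bin."""
    if batch_size % n_bins != 0:
        raise ValueError("batch_size must be a multiple of n_bins")
    edges = np.linspace(0, t_max, n_bins + 1).astype(int)    # 0, 5, ..., 50
    per_bin = batch_size // n_bins
    draws = [rng.integers(lo, hi, size=per_bin)              # hi is exclusive
             for lo, hi in zip(edges[:-1], edges[1:])]
    return np.concatenate(draws)


def initial_states(n, rng, a=0.0):
    """Draw x_0 from the uniform prior on [a - 1, a + 1]."""
    return rng.uniform(a - 1.0, a + 1.0, size=n)


rng = np.random.default_rng(0)
t_batch = stratified_times(200, rng)
tau_batch = transformed_time(t_batch)
x0_batch = initial_states(200, rng)
```

Stratification guarantees that late dates, which a trajectory-based pool visits no more often than early ones but where the terminal penalty matters most, are represented in every batch; removing it is one of the ablations in part (iv).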

Footnotes¶

[1] The numerical claims in this paragraph quote the headline results of Friedl et al. (2023); consult that paper for the precise figures and the underlying calibration grid.
[2] Notation note. We use $\psi$ for the IES and $\gamma_u$ for risk aversion throughout this section, following Friedl et al. (2023). The IRBC chapter (Chapter 3) used $\gamma$ for the IES under the bundled CRRA-IES convention; here in the Epstein--Zin block we deliberately decouple the two parameters, so the symbol switch is intentional. The CRRA limit is recovered at $\gamma_u = 1/\psi$.
[3] Variance-share ranges quoted from Friedl et al. (2023); the spread reflects different points along the planner’s horizon and different damage-function specifications.
[4] All numerical coefficients, welfare gains, and damage-quantile values in this subsection are quoted from Kübler et al. (2026); consult the paper for the source tables and figures.

References¶

Folini, D., Friedl, A., Kübler, F., & Scheidegger, S. (2025). The Climate in Climate Economics. The Review of Economic Studies, 92(1), 299–338. https://doi.org/10.1093/restud/rdae011
Friedl, A., Kübler, F., Scheidegger, S., & Usui, T. (2023). Deep Uncertainty Quantification: With an Application to Integrated Assessment Models.
Kübler, F., Scheidegger, S., & Surbek, O. (2026). Using Machine Learning to Compute Constrained Optimal Carbon Tax Rules. Journal of Political Economy: Macroeconomics.
Hassler, J., Krusell, P., & Smith Jr, A. A. (2016). Environmental macroeconomics. In Handbook of macroeconomics (Vol. 2, pp. 1893–2008). Elsevier.
Dietz, S. (2024). Chapter 1 - Introduction to integrated assessment modeling of climate change (L. Barrage & S. Hsiang, Eds.; Vol. 1, pp. 1–51). North-Holland. https://doi.org/10.1016/bs.hesecc.2024.10.002
Fernández-Villaverde, J., Gillingham, K. T., & Scheidegger, S. (2025). Climate Change Through the Lens of Macroeconomic Modeling. Annual Review of Economics, 17, 125–150. https://doi.org/10.1146/annurev-economics-091124-045357
van der Ploeg, F., & Rezai, A. (2026). Climate Change, Climate Policy, and the Macroeconomy (CEPR Discussion Paper No. 21153). CEPR Press. https://cepr.org/publications/dp21153
Golosov, M., Hassler, J., Krusell, P., & Tsyvinski, A. (2014). Optimal taxes on fossil fuel in general equilibrium. Econometrica, 82(1), 41–88.
Cai, Y., & Lontzek, T. S. (2019). The Social Cost of Carbon with Economic and Climate Risks. Journal of Political Economy, 127(6), 2684–2734. https://doi.org/10.1086/701890
Nordhaus, W. D. (2017). Revisiting the Social Cost of Carbon. Proceedings of the National Academy of Sciences, 114(7), 1518–1523. https://doi.org/10.1073/pnas.1609244114
Nordhaus, W. D., & Yang, Z. (1996). A regional dynamic general-equilibrium model of alternative climate-change strategies. The American Economic Review, 86(4), 741–765.
Traeger, C. P. (2023). ACE — Analytic Climate Economy. American Economic Journal: Economic Policy, 15(3), 372–406. https://doi.org/10.1257/pol.20210297
Nordhaus, W. D. (1994). Managing the Global Commons: The Economics of Climate Change. MIT Press, Cambridge, MA.
Nordhaus, W. D. (2008). A Question of Balance: Weighing the Options on Global Warming Policies. Yale University Press, New Haven, CT.
Roe, G. H., & Baker, M. B. (2007). Why is climate sensitivity so unpredictable? Science, 318(5850), 629–632.