5 Overlapping Generations Models with DEQNs

University of Lausanne

In Chapters 2--3 all agents were infinitely lived. We now extend the DEQN framework to overlapping generations (OLG) models (Diamond, 1965), where $A$ finitely-lived cohorts coexist in every period. OLG models introduce lifecycle savings, intergenerational transfers, age-dependent heterogeneity, and inequality constraints on portfolio choices, phenomena that are central to fiscal policy analysis, pension reform, and demographic modeling. We proceed in two stages. We first solve a deliberately small 6-agent OLG that admits a closed-form solution (Krueger & Kubler, 2004), which gives a clean ground truth against which to validate the neural-network solver. We then scale up to the 56-agent research benchmark of Azinovic et al. (2022), where the no-short-sale-of-capital constraint $k'^h\ge 0$ binds on a non-trivial slice of the ergodic set; that constraint introduces a kink, the main new computational challenge of the benchmark, and we handle it by combining softplus output activations (for non-negativity) with squared product residuals for the orthogonality conditions in the loss. The model also carries a collateral constraint $k'^h+\kappa\,b'^h\ge 0$ that the current notebook parameterization of $\hat q^h$ keeps slack on the learned ergodic set; we develop both constraints below so that the architecture is in place when a future calibration makes the collateral side bind.

5.1 Why Overlapping Generations?

In the Brock--Mirman and IRBC models of Chapters 2--3, all agents are infinitely lived. Picture instead a photograph of the economy taken at a single instant: it contains a twenty-something just entering the workforce with no savings, a forty-something at peak earnings putting money aside, and a retiree drawing down a lifetime of accumulated wealth, all making decisions in the same period and all linked through the prices that their collective saving determines. The infinitely-lived-agent assumption collapses this picture and rules out several economically important phenomena:

An OLG economy consists of $A$ cohorts that coexist in each period: a new cohort of age 1 is born, the oldest cohort of age $A$ dies, and everyone else ages by one period. Crucially, the number of agent types is finite, so the cross-sectional distribution has only $A$ entries and the state space remains finite-dimensional, in contrast to the continuum-of-agents models treated in Chapter 6. The mechanism that ties the three phenomena above together is consumption smoothing over a hump-shaped earnings path (Figure 5.1): because labor income rises and then falls over the lifecycle while agents prefer a steady consumption stream, they accumulate assets in their high-earning years and run them down afterwards, and the equilibrium interest rate is whatever clears the resulting demand for savings against the economy’s capital stock.

Figure 5.1: Stylized lifecycle profiles in an OLG economy (schematic, not a solution of the model). Labor income (blue) is hump-shaped, peaking in mid-career, while agents prefer a roughly flat consumption path (green); so they accumulate assets out of income during their high-earning years and run them down near the end of life. The asset profile (red, dashed) is therefore a hump that starts near zero for the newborn cohort, peaks toward the end of working life, and returns to zero for the oldest cohort, which consumes everything. The 6-agent analytic model of Section 5.2 is a stripped-down version of this picture (only the youngest cohort earns labor income); the 56-agent benchmark of Section 5.5 reproduces the full hump.

We develop the OLG framework in two stages. Section 5.2 works through the 6-agent model with a closed-form solution, maps it to a DEQN (Section 5.3), and validates the trained network against the analytical savings rates; Section 5.4 then explains how binding borrowing and collateral constraints are encoded, and Section 5.5 solves the 56-agent research benchmark with exactly the same training loop.

5.2 The 6-Agent Analytic OLG Model

Krueger & Kubler (2004) proposed a deliberately simple OLG model with a closed-form solution, making it an ideal validation benchmark for the DEQN approach. We develop it here as the first of the two OLG instances of this chapter; Section 5.3 maps it to a DEQN and validates the trained network against the closed form derived below.

We instantiate the OLG environment with $A=6$ overlapping cohorts, indexed by age $h \in \{1,\ldots,6\}$. Time is discrete and infinite. The model equations below are written for general $A$ and specialized to $A=6$ in the calibration that follows.

5.2.1 Household problem.

An agent of age $h$ at time $t$ maximizes expected lifetime utility:

$$
\max_{\{c_{t+j}^{h+j},\, k_{t+j+1}^{h+j+1}\}_{j=0}^{A-h}} \;\mathbb{E}_t\!\left[\sum_{j=0}^{A-h} \beta^j\, u(c_{t+j}^{h+j})\right], \tag{5.1}
$$

subject to the period budget constraint

$$
c_t^h + k_{t+1}^{h+1} = r_t \, k_t^h + w_t \, \ell^h \equiv \mathrm{inc}_t^h, \tag{5.2}
$$

where $k_t^h$ denotes capital holdings, $r_t$ is the gross return on capital, $w_t$ is the wage, $\ell^h$ is an age-dependent labor endowment, and $\mathrm{inc}_t^h$ is total income.

5.2.2 Boundary conditions.

5.2.3 Euler equations.

The first-order conditions yield $A-1$ Euler equations (for ages $h=1,\ldots,A-1$):

$$
u'(c_t^h) = \beta\,\mathbb{E}_t\!\left[r_{t+1}\, u'(c_{t+1}^{h+1})\right]. \tag{5.3}
$$

5.2.4 Firm problem and market clearing.

A representative firm operates a Cobb--Douglas technology with value added $F_t = \eta_t K_t^\alpha L_t^{1-\alpha}$, where $\eta_t$ is a TFP shock and $L_t = \sum_{h=1}^A \ell^h$; the gross resource available to households is $Y_t = F_t + (1-\delta_t)K_t = r_t K_t + w_t L_t$ (it is $Y_t$, not $F_t$, that the notebook passes as an engineered feature). Competitive factor markets imply:

$$
r_t = \alpha\,\eta_t\, K_t^{\alpha-1}L_t^{1-\alpha} + (1-\delta_t), \qquad w_t = (1-\alpha)\,\eta_t\, K_t^\alpha L_t^{-\alpha}, \tag{5.4}
$$

where $\delta_t$ is the depreciation rate (potentially stochastic). Market clearing requires that aggregate capital at $t+1$ is the sum of holdings across cohorts:

$$
\sum_{h=2}^{A} k_{t+1}^{h} = K_{t+1}, \tag{5.5}
$$

with $k_{t+1}^{1}=0$ as a newborn boundary condition (cohort 1 enters life with no assets), and where $k_{t+1}^{h}$ for $h=2,\dots,A$ is the savings of cohort $h-1$ at date $t$ (which becomes the date-$(t+1)$ holdings of the cohort once it has aged by one period).
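As a quick numerical check of the pricing formulas, the sketch below evaluates the gross return and wage from (5.4) and confirms that factor payments exhaust the gross resource, $Y_t = r_t K_t + w_t L_t$. The function names and the sample values of $K$ and the shock are illustrative assumptions, not taken from the companion notebook.

```python
def factor_prices(eta, delta, K, L, alpha=0.3):
    """Gross return on capital and wage under Cobb-Douglas technology."""
    r = alpha * eta * K ** (alpha - 1.0) * L ** (1.0 - alpha) + (1.0 - delta)
    w = (1.0 - alpha) * eta * K ** alpha * L ** (-alpha)
    return r, w


def gross_resource(eta, delta, K, L, alpha=0.3):
    """Y = F + (1 - delta) K: output plus undepreciated capital."""
    return eta * K ** alpha * L ** (1.0 - alpha) + (1.0 - delta) * K


# Illustrative (assumed) state: one shock pair and an arbitrary capital stock.
eta, delta, K, L = 1.05, 0.5, 0.2, 1.0
r, w = factor_prices(eta, delta, K, L)
Y = gross_resource(eta, delta, K, L)

# Factor payments exhaust the gross resource (Euler's theorem for CRS technology).
exhaustion_gap = abs(Y - (r * K + w * L))
```

Because $r_t$ is defined as a gross return, the undepreciated capital $(1-\delta_t)K_t$ is part of what households receive, which is why the identity holds for $Y_t$ rather than for value added $F_t$.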

5.2.5 Calibration.

The model has $A=6$ agents with log utility ($\gamma = 1$), Cobb--Douglas production ($\alpha = 0.3$), and discount factor $\beta = 0.7$. Only agent 1 works ($\ell = (1,0,0,0,0,0)$); this stripped-down labor profile is what gives the closed form below, not a realistic lifecycle assumption, and the 56-agent benchmark of Section 5.5 restores a hump-shaped endowment. Four exogenous shock states combine TFP $\eta \in \{0.95,1.05\}$ and depreciation $\delta \in \{0.5,0.9\}$, with i.i.d. transitions ($\pi_{ss'} = 0.25$).

5.2.6 Analytical solution.

With log utility and i.i.d. shocks, the optimal savings rate has a closed form. Define the age-dependent savings rate:

$$
\beta_h = \beta \cdot \frac{1 - \beta^{A-h}}{1 - \beta^{A-h+1}}, \qquad h = 1, \ldots, A-1. \tag{5.6}
$$

The optimal policy is then $k'^h = \beta_h \cdot \mathrm{inc}^h$: each agent saves a fixed fraction of total income, regardless of the shock. Two features of the calibration drive this clean form. First, under log utility the income and substitution effects of a return shock exactly cancel, so the savings rate is invariant to $(r_t, w_t)$. Second, because the shocks are i.i.d. there is nothing about the future to forecast, so the rate does not depend on the current shock either; only the horizon matters. The fraction $\beta_h$ therefore declines with age: cohort $h$ has only $A-h$ remaining periods over which to spread its future income, so the marginal incentive to carry resources forward weakens as $h$ grows. For $A=6$, $\beta=0.7$, Table 5.1 reports the resulting savings rates.
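The closed form can be verified directly against the Euler equation (5.3). The short derivation below is a sketch under the calibration's assumptions (log utility, only cohort 1 earns labor income); the convention $\beta_A = 0$ for the terminal cohort and the primed next-period notation are ours, introduced for the check.

```latex
\begin{aligned}
c^h &= (1-\beta_h)\,\mathrm{inc}^h,
  \qquad \mathrm{inc}'^{\,h+1} = r'\,\beta_h\,\mathrm{inc}^h
  \quad (\ell^{h+1} = 0 \text{ for } h \ge 1), \\[2pt]
\frac{r'}{c'^{\,h+1}}
  &= \frac{r'}{(1-\beta_{h+1})\,r'\,\beta_h\,\mathrm{inc}^h}
   = \frac{1}{(1-\beta_{h+1})\,\beta_h\,\mathrm{inc}^h}
  \quad \text{(independent of the shock)}, \\[2pt]
\frac{1}{c^h} = \beta\,\mathbb{E}\!\left[\frac{r'}{c'^{\,h+1}}\right]
  &\iff \beta_h\,(1-\beta_{h+1}) = \beta\,(1-\beta_h), \\[2pt]
1-\beta_h = \frac{1-\beta}{1-\beta^{A-h+1}}
  &\;\Longrightarrow\;
  \beta_h\,(1-\beta_{h+1})
   = \beta\,\frac{1-\beta^{A-h}}{1-\beta^{A-h+1}} \cdot \frac{1-\beta}{1-\beta^{A-h}}
   = \beta\,(1-\beta_h).
\end{aligned}
```

The return $r'$ cancels in the second line, which is the algebraic counterpart of the statement that income and substitution effects offset under log utility.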

Table 5.1: Closed-form age-specific savings rates in the 6-agent analytic OLG with log utility and $\beta=0.7$.

| Age $h$ | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| $\beta_h$ | 0.660 | 0.639 | 0.605 | 0.543 | 0.412 |

Young agents save more (more periods ahead); old agents save less; Figure 5.2 plots the same numbers across $h$. This vector is the validation target: at convergence, the trained network’s average sigmoid output should reproduce $\beta_h$ cohort by cohort.
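The table entries follow mechanically from the closed-form expression for $\beta_h$. A minimal sketch that reproduces them (the function name is an illustrative choice, not the notebook's):

```python
def closed_form_savings_rates(beta=0.7, A=6):
    """Savings rates beta_h for cohorts h = 1, ..., A-1 under log utility."""
    return [
        beta * (1.0 - beta ** (A - h)) / (1.0 - beta ** (A - h + 1))
        for h in range(1, A)
    ]


rates = closed_form_savings_rates()
rounded = [round(b, 3) for b in rates]
for h, b in enumerate(rounded, start=1):
    print(f"h={h}: beta_h={b:.3f}")
```

The same list can serve as the comparison vector when checking a trained network's average savings-rate output cohort by cohort.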

Figure 5.2: Closed-form savings rates $\beta_h$ from Table 5.1 for the 6-agent analytic OLG ($\beta=0.7$, log utility). The monotone decline with age reflects the shrinking forward horizon: cohort $h$ has only $A-h$ remaining periods over which to consume future income, so the marginal incentive to save weakens as $h$ grows. This is the validation target the trained DEQN’s average sigmoid output should match cohort by cohort.

5.3 Mapping the Analytic OLG to a DEQN

The mapping follows the same “states $\to$ network $\to$ loss” structure as Brock--Mirman (Chapter 2). We now write each block explicitly for the 6-agent analytic model just set up; this is exactly what slides II.7--II.9 of lectures/lecture_08_olg_models_deqns/slides/lecture_08_olg_models_deqns.tex render in pictures. The 56-agent benchmark of Section 5.5 extends the same template with two extra policy blocks (multipliers, bond price) and an additional market-clearing residual; we write that version out there.

5.3.1 State $\x_t$ entering the network.

What does the network actually need to know? The informational state of the analytic model is just the pair

$$
\bigl(z_t,\,\bm{k}_t\bigr) \;\in\; \{1,\ldots,4\}\times\R^A \qquad\text{where}\qquad \bm{k}_t = (k_t^1,\ldots,k_t^A), \tag{5.7}
$$

the current shock index plus the cross-sectional capital distribution. This is the minimal vector that pins down the equilibrium, and it is what slide II.8 displays in the FREE signature. Everything else, the aggregate capital $K_t=\sum_h k_t^h$, the prices $(r_t,w_t)$, output $Y_t$, each cohort’s income, the row of next-period transition probabilities, is a deterministic function of $(z_t,\bm{k}_t)$. The network could in principle re-derive all of it from the raw pair, but there is no reason to make it: we hand the network those quantities pre-computed, which is a pure change of input coordinates that leaves the equilibrium map untouched and frees the network’s capacity for the one genuinely hard thing it has to learn, the savings policy. Concretely the notebook feeds an extended state of dimension $16+4A$,

$$
\x_t \;=\; \bigl(\,\underbrace{z_t,\,\mathbf{1}\{z_t\},\,\eta_t,\delta_t,K_t,L_t,r_t,w_t,Y_t}_{12\text{ aggregate}},\; \underbrace{k_t^{1:A},\,\mathrm{fw}_t^{1:A},\,\mathrm{linc}_t^{1:A},\,\mathrm{inc}_t^{1:A}}_{4A\text{ per-agent}},\; \underbrace{\pi(z_t,\cdot)}_{4\text{ transition probs}}\bigr) \;\in\; \R^{16+4A}, \tag{5.8}
$$

with $\mathbf{1}\{z_t\}$ the 4-state one-hot of the current shock, $K_t=\sum_h k_t^h$, $L_t=\sum_h \ell^h$, $(r_t,w_t)$ from (5.4), $Y_t$ the gross resource $\eta_t K_t^\alpha L_t^{1-\alpha} + (1-\delta_t)K_t$, and the per-agent blocks $\mathrm{fw}_t^h = r_t k_t^h$ (capital income), $\mathrm{linc}_t^h = w_t\,\ell^h$ (labor income), $\mathrm{inc}_t^h = \mathrm{fw}_t^h + \mathrm{linc}_t^h$ (total income). Since the map $(z_t,\bm{k}_t)\mapsto\x_t$ is deterministic, (5.7) and (5.8) carry exactly the same information. For $A=6$ this is $16+4\cdot 6 = 40$ inputs (the notebook constant FEATURE_DIM).
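To make the feature layout concrete, here is a minimal sketch of the map from the minimal state $(z_t,\bm{k}_t)$ to the extended state. It follows the block ordering described above, but the function name, the 0-based shock index, the ordering of the four shock pairs, and the sample capital distribution are assumptions made for illustration; the notebook's own feature builder may differ in these details.

```python
def extended_state(z, k, shocks, ell, pi_row, alpha=0.3):
    """Map the minimal state (z, k) to the 16 + 4A extended feature vector.

    z      : 0-based shock index (the text uses 1..4; 0-based is an assumption here)
    k      : list of A capital holdings, k[0] is the newborn cohort
    shocks : list of (eta, delta) pairs, one per shock state
    ell    : list of A labor endowments
    pi_row : transition probabilities out of state z
    """
    eta, delta = shocks[z]
    K, L = sum(k), sum(ell)
    r = alpha * eta * K ** (alpha - 1.0) * L ** (1.0 - alpha) + (1.0 - delta)
    w = (1.0 - alpha) * eta * K ** alpha * L ** (-alpha)
    Y = eta * K ** alpha * L ** (1.0 - alpha) + (1.0 - delta) * K

    one_hot = [1.0 if s == z else 0.0 for s in range(len(shocks))]
    fw = [r * kh for kh in k]                  # capital income per cohort
    linc = [w * lh for lh in ell]              # labor income per cohort
    inc = [f + l for f, l in zip(fw, linc)]    # total income per cohort

    aggregate = [float(z)] + one_hot + [eta, delta, K, L, r, w, Y]  # 12 entries
    return aggregate + list(k) + fw + linc + inc + list(pi_row)


# Analytic calibration; the ordering of the shock pairs is an assumed convention.
SHOCKS = [(0.95, 0.5), (1.05, 0.5), (0.95, 0.9), (1.05, 0.9)]
ELL = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
k0 = [0.0, 0.10, 0.08, 0.06, 0.04, 0.02]      # arbitrary illustrative distribution
x0 = extended_state(0, k0, SHOCKS, ELL, [0.25] * 4)
print(len(x0))
```

Every entry beyond the shock index and `k` is recomputed from them, which is the sense in which the extended state is only a change of input coordinates.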

5.3.2 Policies approximated by the network.

A single multilayer perceptron with a sigmoid savings-fraction output head approximates the equilibrium policy as a function of the state. (Throughout this OLG chapter we use $\theta$ for the network parameters rather than the $\rho$ of Chapters 2--3; both refer to the same object, and the switch follows the convention of the public OLG reference implementation.)

$$
\boxed{\;\mathcal{N}_\theta:\;\R^{16+4A} \;\longrightarrow\; \R^{A-1},\qquad \mathcal{N}_\theta(\x_t) \;=\; \bigl(\hat\beta^1(\x_t),\ldots,\hat\beta^{A-1}(\x_t)\bigr),\qquad \hat a^h(\x_t) := \hat\beta^h(\x_t)\,\mathrm{inc}_t^h\;} \tag{5.9}
$$

where the network output $\hat\beta^h(\x_t)\in(0,1)$ is cohort $h$’s savings rate and $\hat a^h(\x_t)$ its savings level (slide II.9, output column). This parameterization mirrors the closed-form solution’s structure (each cohort saves a fixed fraction of income, Eq. (5.6)). Cohort $A$ saves nothing by terminal boundary, so the network has $A-1$ outputs rather than $A$. Three by-construction guarantees follow:

5.3.3 Equilibrium residual.

Each cohort $h\in\{1,\ldots,A-1\}$ contributes one relative Euler-equation residual, built from three quantities. First, the implied current consumption, read off from the budget (5.2) as $\hat c^h(\x_t) := \mathrm{inc}_t^h - \hat a^h(\x_t)$. Second, the next-state map $\Phi$, which combines the current policy with a fresh shock $z_{t+1}$ to produce next period’s extended state $\hat\x_{t,+}=\Phi(\x_t,z_{t+1};\theta)$ (the construction of $\Phi$ is spelled out in the next paragraph). Third, the implied next-period consumption $\hat c^{h+1}(\hat\x_{t,+})$ of the cohort that has just aged from $h$ to $h+1$. The relative Euler-equation residual is then

$$
e_{\mathrm{REE}}^h(\x_t) \;:=\; \frac{(u')^{-1}\!\Bigl(\beta\,\E{\,r(\hat\x_{t,+})\,u'\!\bigl(\hat c^{h+1}(\hat\x_{t,+})\bigr)\,}\Bigr)}{\hat c^h(\x_t)} \;-\; 1, \qquad h=1,\ldots,A-1, \tag{5.10}
$$

with $u(c)=\ln c$ in the analytic model so $(u')^{-1}(y) = 1/y$. Equation (5.10) is the unit-free residual of the standard Euler equation (5.3): a value of $10^{-3}$ means cohort $h$’s implied consumption is mispriced by $0.1\%$ relative to the conditional certainty equivalent. This is the residual displayed in slide II.7.

5.3.4 Sampling the conditional expectation.

The expectation in (5.10) is over the next-period shock $z_{t+1}$. Because the analytic-model shock has only four states with i.i.d. transition $\pi_{ss'} = 1/4$, the expectation is computed exactly (no Monte Carlo) by summing over the four next-period shocks:

$$
\E{\,r(\hat\x_{t,+})\,u'(\hat c^{h+1})\,} = \tfrac14\,\sum_{s'=1}^{4} r(\Phi(\x_t,s';\theta))\,u'\!\bigl(\hat c^{h+1}(\Phi(\x_t,s';\theta))\bigr).
$$

For each candidate $s'$ the next-state map $\Phi$ ages the cross-section by one period, sets the newborn to $k^1=0$, evaluates the firm prices (5.4) at $K_{t+1}$, and produces the next-period extended state $\hat\x_{t,+}$ on which the network is evaluated again to obtain $\hat a^h(\hat\x_{t,+})$ and hence $\hat c^{h+1}(\hat\x_{t,+})$. When the shock has more states or is continuous, the same construction is replaced by a sample of $z_{t+1}\sim\pi(\cdot|z_t)$ inside the mini-batch (see Section 5.5).
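The residual and the exact four-state expectation can be sketched end to end in a few lines. In the illustration below the closed-form savings rates stand in for the network, so the residuals should vanish up to floating-point error; that makes it a useful sanity check of the aging step and the cohort indexing. All function names, the ordering of the shock pairs, and the starting capital distribution are assumptions for the sketch, not the notebook's code.

```python
ALPHA, BETA, A = 0.3, 0.7, 6
ELL = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
SHOCKS = [(0.95, 0.5), (1.05, 0.5), (0.95, 0.9), (1.05, 0.9)]  # (eta, delta); ordering assumed
PROB = 0.25                                                     # i.i.d. transition probability


def incomes(k, shock):
    """Gross return and per-cohort total income for capital distribution k."""
    eta, delta = shock
    K, L = sum(k), sum(ELL)
    r = ALPHA * eta * K ** (ALPHA - 1.0) * L ** (1.0 - ALPHA) + (1.0 - delta)
    w = (1.0 - ALPHA) * eta * K ** ALPHA * L ** (-ALPHA)
    return r, [r * kh + w * lh for kh, lh in zip(k, ELL)]


def closed_form_policy(k, shock):
    """Stand-in for the network: savings rates of cohorts 1..A-1."""
    return [BETA * (1 - BETA ** (A - h)) / (1 - BETA ** (A - h + 1)) for h in range(1, A)]


def euler_residuals(k, shock, policy):
    """Relative Euler residuals for cohorts 1..A-1 with log utility."""
    _, inc = incomes(k, shock)
    rates = policy(k, shock)
    savings = [b * i for b, i in zip(rates, inc)]    # savings of cohorts 1..A-1
    cons = [i - a for i, a in zip(inc, savings)]     # consumption of cohorts 1..A-1
    k_next = [0.0] + savings                         # age everyone; newborn holds nothing
    expect = [0.0] * (A - 1)
    for shock_next in SHOCKS:                        # exact sum, no Monte Carlo
        r_next, inc_next = incomes(k_next, shock_next)
        rates_next = policy(k_next, shock_next) + [0.0]   # cohort A saves nothing
        for h in range(A - 1):
            c_next = (1.0 - rates_next[h + 1]) * inc_next[h + 1]
            expect[h] += PROB * r_next / c_next      # r' * u'(c') with u'(c) = 1/c
    # (u')^{-1}(y) = 1/y under log utility
    return [1.0 / (BETA * e) / c - 1.0 for e, c in zip(expect, cons)]


k0 = [0.0, 0.10, 0.08, 0.06, 0.04, 0.02]             # arbitrary illustrative distribution
residuals = euler_residuals(k0, SHOCKS[0], closed_form_policy)
print(max(abs(e) for e in residuals))
```

Replacing `closed_form_policy` with a perturbed rule (for instance, scaling every rate by 0.99) produces nonzero residuals, which is the signal the loss in the next subsection penalizes.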

5.3.5 The DEQN loss for the analytic OLG.

Given a mini-batch $D_{\mathrm{train}}\subset\{\x_j\}_{j=1}^{N}$ sampled from the ergodic set of the current policy, the loss is the mean-squared relative Euler residual averaged across cohorts and states:

$$
\boxed{\;\mathcal{L}_{D_{\mathrm{train}}}(\theta) \;=\; \frac{1}{|D_{\mathrm{train}}|}\;\frac{1}{A-1}\, \sum_{\x_j\in D_{\mathrm{train}}}\;\sum_{h=1}^{A-1} \bigl(e_{\mathrm{REE}}^h(\x_j)\bigr)^2\;} \tag{5.11}
$$

(matching slide II.7). Two small barrier-style additive penalties on rescaled negative-consumption and negative-aggregate-capital hinges are summed in alongside (5.11) to keep training numerically robust away from convergence; in the notebook they carry the weight PENALTY_WEIGHT $=10$ and act on terms such as $\max(-\hat c^h,0)/(1+|\hat c^h|)$ rather than the raw squared hinge. With the sigmoid savings-fraction head described above these hinges are in fact identically zero (savings stay in $[0,\mathrm{inc}^h]$, so $\hat c^h\ge 0$ and $K_{t+1}\ge 0$ always), so the penalties are pure backstops and do not bias the solution.[1]
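A compact way to see how the pieces combine is the sketch below, which averages squared residuals over states and cohorts and adds the rescaled consumption hinge with weight 10. How the notebook aggregates the penalty terms (mean versus sum, and the treatment of the aggregate-capital hinge) is not spelled out in the text, so the averaging used here is an assumption, as are the function names.

```python
def consumption_hinge(c):
    """Rescaled hinge max(-c, 0) / (1 + |c|): zero for c >= 0, bounded below 1."""
    return max(-c, 0.0) / (1.0 + abs(c))


def deqn_loss(residuals, consumptions, penalty_weight=10.0):
    """Mean-squared relative Euler residual plus a weighted consumption penalty.

    residuals, consumptions: one row per sampled state, one entry per cohort.
    The penalty is averaged the same way as the residuals (an assumption).
    """
    n = len(residuals) * len(residuals[0])
    mse = sum(e * e for row in residuals for e in row) / n
    penalty = sum(consumption_hinge(c) for row in consumptions for c in row) / n
    return mse + penalty_weight * penalty


# Two states, five cohorts, all residuals at 1e-3 and all consumption positive.
res = [[1e-3] * 5 for _ in range(2)]
cons = [[0.2] * 5 for _ in range(2)]
loss = deqn_loss(res, cons)
print(loss)
```

With non-negative consumption the hinge contributes nothing, so the loss reduces to the mean-squared residual, consistent with the penalties acting only as backstops.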

5.3.6 DEQN architecture and training.

The network takes a 40-dimensional input (the extended state (5.8), $16 + 4 \times 6$) and outputs 5 savings rates $\hat\beta^h$ via a $40 \to 100 \to 50 \to 5$ architecture with ReLU hidden layers and a sigmoid savings-fraction output ($\approx 9{,}400$ parameters). Training uses the episode-based procedure from Chapter 2: the current network generates a capital path (episode), equilibrium residuals are computed and used for SGD updates, and a new episode is simulated periodically. The companion notebook exposes a RUN_MODE switch with three calibrated budgets: "smoke" ($\sim$25 training segments, $\sim$30 s on CPU; a code-path sanity check, well short of convergence), "teaching" ($\sim$500 segments, $\sim$5 min on CPU; savings rates match the closed form to a few parts in $10^4$ and mean relative Euler errors are already $\sim 10^{-3}$ on the simulated cloud, though larger off-trajectory), and "production" ($\sim$10,000 segments with longer trajectories, several hours on CPU; mean Euler errors $\sim 10^{-3}$ or below, matching Table 3 of Azinovic et al. (2022)). Adam is used throughout (learning rate $\sim 3\times 10^{-4}$ in the short presets, $10^{-5}$ in the production preset); the analogous decay to $10^{-6}$ used by the 56-agent benchmark (Section 5.5) is not needed at the analytic model’s scale.
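The quoted parameter count can be checked by hand. The sketch below assumes fully connected layers with one bias per output unit, which is the standard multilayer-perceptron layout; the helper name is illustrative.

```python
def mlp_parameter_count(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    )


analytic_params = mlp_parameter_count([40, 100, 50, 5])
print(analytic_params)
```

The three layers contribute 4,100, 5,050, and 255 parameters respectively, which rounds to the figure given in the text.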

5.4 Inequality Constraints and KKT Complementarity

The 6-agent calibration above is deliberately frictionless: the no-short-sale-of-capital constraint never binds on its ergodic set, so we could solve it with a plain sigmoid-savings head and no multipliers. Realistic OLG economies are not so kind. The 56-agent benchmark of the next section carries a no-short-sale-of-capital constraint that binds on a non-trivial slice of states and a collateral constraint that the current notebook parameterization keeps slack on the learned ergodic set; binding inequality constraints in general bring in Karush--Kuhn--Tucker (KKT) complementarity, with its characteristic non-smooth orthogonality condition. This section sets out how the DEQN framework encodes that complementarity; the next section puts it to work.

The no-short-sale-of-capital constraint $k'^h \geq 0$ introduces a complementarity condition via the Karush--Kuhn--Tucker (KKT) system:

$$
k'^h \geq 0, \qquad \lambda^h \geq 0, \qquad k'^h \cdot \lambda^h = 0,
$$

where $\lambda^h$ is the KKT multiplier on the constraint. In a generic non-linear program, the orthogonality condition $k'^h \cdot \lambda^h = 0$ is non-smooth at the origin and cannot be differentiated through naively.

The DEQN setup of Azinovic et al. (2022) sidesteps the kink by splitting enforcement across the architecture and the loss:

This product form is what the public reference implementation accompanying Azinovic et al. (2022) uses, and what we adopt in the 56-agent benchmark of Section 5.5 (Notebook lecture_08_10_OLG_Benchmark_DEQN_persistent.ipynb). As noted above, in the 6-agent analytic calibration of Section 5.2 the no-short-sale-of-capital constraint is non-binding everywhere on the ergodic set, so $\lambda^h \equiv 0$ and the multipliers (and the KKT residual) drop out of both the network output and the loss; that is why the mapping there was the simpler $\mathcal{N}_\theta: \mathbb{R}^{16+4A} \to \mathbb{R}^{A-1}$ of Section 5.3 above, with no multiplier outputs. The smoother Fischer--Burmeister (FB) reformulation, $\Phi(a,b) = a + b - \sqrt{a^2 + b^2}$, is an alternative used in the IRBC notebook of Chapter 3 for the investment-irreversibility constraint.

5.4.1 When to choose product form vs. Fischer--Burmeister.

The product form $(k'^h \lambda^h)^2$ is simpler, gradient-cheaper, and sufficient whenever the constraint is rarely active on the ergodic set, since the optimizer just needs to verify slackness in expectation. The Fischer--Burmeister residual $\Phi(a,b)^2$ keeps gradient information on both sides of the active set: when the constraint is frequently binding (e.g. the IRBC irreversibility constraint on a non-trivial fraction of states), product-form gradients vanish whenever the constraint is locally inactive, which can stall training; FB does not have this pathology. As a rule of thumb: product form for occasionally-binding KKT, FB for frequently-binding KKT. In the OLG benchmark of Section 5.5 the no-short-sale-of-capital constraint binds on a thin slice of the ergodic set, so the product form was sufficient; the IRBC application of the previous chapter binds more often and benefits from FB.
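The two residuals are easy to compare on a handful of points. The sketch below is illustrative only (function names and sample values are assumptions); it shows that both residuals vanish on the complementarity set, and that the product form cannot detect a negative argument paired with a zero multiplier, which is why it has to be combined with output heads that enforce non-negativity by construction.

```python
import math


def product_residual(k_next, lam):
    """Product-form complementarity residual; squared in the loss."""
    return k_next * lam


def fischer_burmeister(a, b):
    """FB function: zero exactly when a >= 0, b >= 0 and a * b = 0."""
    return a + b - math.sqrt(a * a + b * b)


cases = {
    "slack (k'>0, lambda=0)":    (0.4, 0.0),
    "binding (k'=0, lambda>0)":  (0.0, 0.3),
    "both positive (violation)": (0.2, 0.1),
    "negative k', lambda=0":     (-0.1, 0.0),
}
for label, (k_next, lam) in cases.items():
    print(f"{label:28s} product={product_residual(k_next, lam):+.4f} "
          f"FB={fischer_burmeister(k_next, lam):+.4f}")
```

In the last case the product residual is zero while the FB residual is not, so an FB loss would penalize the sign violation on its own, whereas the product form relies on the architecture to rule it out.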

5.4.2 The two OLG models we solve, side by side.

We have now built and solved the first of the two OLG instances that anchor the rest of the manuscript: the 6-agent analytic model used to validate the DEQN against a closed form (Sections 5.2--5.3). The second is the 56-agent benchmark of Azinovic et al. (2022), developed in the next section. Table 5.2 summarizes the structural and computational gap between them before we turn to it.

Table 5.2: The two OLG models solved in this chapter, side by side. The economic richness of the 56-agent benchmark adds a second asset (bonds), an effectively binding no-short-sale-of-capital constraint (the collateral constraint is kept slack by the $\hat q^h$ parameterization), persistent shocks, lifecycle labor, and adjustment costs, raising the network input dimension from 40 to 240 and the output dimension from 5 to 221. The DEQN training loop is structurally identical in both cases. Each variant additionally ships with a feedback-free exogenous-sampling companion notebook (lecture_08_07_OLG_Analytic_DEQN_exogenous.ipynb, lecture_08_09_OLG_Benchmark_DEQN_exogenous.ipynb) that exercises the same model under a non-co-evolving training cloud.

| | 6-agent analytic (Section 5.2) | 56-agent benchmark (Section 5.5) |
|---|---|---|
| Cohorts $A$ | 6 (childhood-style) | 56 (ages 25--80, one period = one year) |
| Utility | Log ($\gamma=1$) | CRRA ($\gamma=2$) |
| Shocks | i.i.d. TFP & depreciation, 4 states | Persistent Markov on $(\eta,\delta)$ |
| Labor profile | Only youngest cohort works | Hump-shaped lifecycle endowment $\ell^h$ |
| Assets | Capital only | Capital $+$ bonds |
| Constraints | None binding in calibration | No-short-sale of capital $k'^h\ge 0$ binds; collateral $k'^h+\kappa b'^h\ge 0$ kept slack by the $\hat q^h$ parameterization |
| Adjustment cost | None | Quadratic $\tfrac{\zeta}{2}(k'^h-rk^h)^2$ |
| Network input dim | 40 (extended; minimal 7) | 240 (extended; minimal 113) |
| Output dim | 5 (savings rates of cohorts 1--5) | 221 ($4(A-1)+1$: policies, multipliers, price) |
| Loss terms | 5 Euler $+$ market clearing by construction | 221: $4(A-1)$ Euler/KKT $+$ 1 bond clearing |
| Network | Input(40) $\to$ 100 $\to$ 50 $\to$ 5 | Input(240) $\to$ 128 $\to$ 128 $\to$ 221 (teaching) / Input(240) $\to$ 1000 $\to$ 1000 $\to$ 221 (production) |
| Validation target | Closed-form $\beta_h$ of Krueger & Kubler (2004) | Mean Euler residual on simulated trajectory |
| Notebook | lecture_08_08_OLG_Analytic_DEQN_persistent.ipynb | lecture_08_10_OLG_Benchmark_DEQN_persistent.ipynb |

5.5 The 56-Agent Benchmark

Table 5.2 above previewed the gap; we now develop the second model in full. The benchmark of Azinovic et al. (2022) scales the OLG framework to $A = 56$ agents (ages 25--80) with several realistic features:

5.5.1 Lifecycle labor endowments.

The labor endowment profile $e^h$ follows Brumm et al. (2017). In the implementation used here, $e^h$ is a quadratic in age that rises from 0.60 at age 25, peaks at $\approx 1.36$ around age 53, then decays linearly between ages $\sim 62$ and $\sim 70$ to a flat post-retirement floor of $\approx 0.64$. Table 5.3 lists the values produced by the notebook formula at a few representative ages.

Table 5.3: Representative points on the lifecycle labor-endowment profile in the 56-agent benchmark.

| Age | 25 | 30 | 40 | 48 | 53 | 65 | 80 |
|---|---|---|---|---|---|---|---|
| $e^h$ | 0.60 | 0.85 | 1.20 | 1.34 | 1.36 | 1.04 | 0.64 |

This hump-shaped profile ensures realistic savings heterogeneity: young agents with low labor income and no initial wealth are borrowing-constrained, mid-career agents with high earnings accumulate both capital and bonds, and older agents gradually decumulate toward the end of life.

5.5.2 Persistent aggregate shocks.

The 4-state Markov chain combines TFP $\eta$ and depreciation $\delta$ into the pairs $(\eta,\delta) \in \{(0.978, 0.08),\, (1.022, 0.08),\, (0.978, 0.11),\, (1.022, 0.11)\}$. The transition matrix is persistent (diagonal entries $\sim$0.63--0.88), in contrast to the i.i.d. shocks in the analytic model. This persistence creates richer dynamics in capital accumulation: a sequence of bad TFP draws can push young agents deep into their borrowing constraint, producing endogenous amplification that a single-period shock would not generate.

5.5.3 Budget constraint.

Each agent of age $h$ faces:

$$
c^h + k'^h + p\cdot b'^h + \Psi^h = r\cdot k^h + b^h + w\cdot e^h.
$$

The collateral constraint $k'^h + \kappa\,b'^h \geq 0$ acts as a margin requirement: it limits bond borrowing ($b'^h < 0$) relative to capital holdings. Since $\kappa = 1/(1-\delta_{\max})$, the constraint tightens when depreciation is high, precisely when agents are most likely to seek insurance through borrowing.
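A small numerical illustration of the margin requirement follows. It assumes that $\delta_{\max}$ is the larger of the two depreciation states listed above (0.11), which the text does not state explicitly; the function names are likewise illustrative.

```python
DELTA_MAX = 0.11                    # assumption: largest depreciation state of the chain
KAPPA = 1.0 / (1.0 - DELTA_MAX)     # collateral coefficient kappa


def collateral_slack(k_next, b_next, kappa=KAPPA):
    """Left-hand side of k' + kappa * b' >= 0; negative values violate the constraint."""
    return k_next + kappa * b_next


def max_bond_debt(k_next, kappa=KAPPA):
    """Largest debt (as a positive number) that capital holdings k' can support."""
    return k_next / kappa


debt_limit = max_bond_debt(1.0)
print(f"kappa = {KAPPA:.4f}, debt limit per unit of capital = {debt_limit:.2f}")
```

Under this assumption one unit of capital supports bond debt of at most $1-\delta_{\max} = 0.89$, i.e. the worst-case undepreciated value of the capital pledged.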

5.5.4 State $\x_t$ entering the network.

The informational state of the benchmark is the triple $(z_t,\,\bm{k}_t,\,\bm{b}_t) \in \{1,\ldots,4\}\times\R^A\times\R^A$, where $\bm{k}_t = (k_t^1,\ldots,k_t^A)$ and $\bm{b}_t = (b_t^1,\ldots,b_t^A)$ are the cross-sectional capital and bond distributions, so the minimal state has dimension $1+2A = 113$. As in the analytic case, the notebook feeds the network an extended state of the same $16+4A$ form -- twelve aggregate scalars (shock index and its one-hot, $\eta_t$, $\delta_t$, $K_t$, $L_t$, $r_t$, $w_t$, and the gross resource $Y_t = \eta_t K_t^\alpha L_t^{1-\alpha} + (1-\delta_t)K_t$), four per-agent blocks ($k_t^h$, financial income $r_t k_t^h + b_t^h$, labor income $w_t e^h$, and cash $r_t k_t^h + b_t^h + w_t e^h$ -- the bond holdings $b_t^h$ are recoverable from financial income and are not passed as a separate block, and the bond price $\hat p_t$ is an output, not an input), and the row of next-period transition probabilities $\pi(z_t,\cdot)$ (used by the conditional-expectation block of the loss); concretely $240 = 12 + 4\times 56 + 4$ (the notebook constant FEATURE_DIM). This is the analogue of slide III.8.

5.5.5 Policies approximated by the network.

A single network $\mathcal{N}_\theta$ with positivity-preserving output heads produces a $4(A-1)+1$-dimensional vector that is sliced into five economic blocks (slide III.9):

$$
\boxed{\;\mathcal{N}_\theta:\;\R^{240} \;\longrightarrow\; \R^{4(A-1)+1},\qquad \mathcal{N}_\theta(\x_t) \;=\; \bigl(\hat k'^{1:A-1},\;\hat \lambda_b^{1:A-1},\;\hat q^{1:A-1},\;\hat \mu^{1:A-1},\;\hat p\bigr)(\x_t)\;}
$$

where $\hat k'^h$ are capital savings, $\hat\lambda_b^h$ the no-short-sale-of-capital multipliers, $\hat q^h \equiv \hat k'^h + \kappa\,\hat b'^h$ the collateral requirement (from which bond holdings are recovered as $\hat b'^h = (\hat q^h - \hat k'^h)/\kappa$), $\hat\mu^h$ the collateral-constraint multipliers, and $\hat p$ the equilibrium bond price. Each raw output is mapped to an admissible value: softplus for the multipliers, and a bounded-exponential map around a baseline for the positive levels. Concretely, writing $z^k_h$, $z^q_h$, and $z^p$ for the raw network outputs, the heads are

$$
\hat k'^h \;=\; k^h_{\mathrm{baseline}}\,\exp(\tanh z^k_h), \quad \hat q^h \;=\; q^h_{\mathrm{baseline}}\,\exp(\tanh z^q_h), \quad \hat p \;=\; p_{\mathrm{baseline}}\,\exp(\tanh z^p),
$$

so the four non-negativity inequalities $\hat k'^h\ge 0$, $\hat\lambda_b^h\ge 0$, $\hat q^h\ge 0$, $\hat\mu^h\ge 0$ hold by construction, leaving the orthogonality conditions of the KKT systems to be enforced softly in the loss (next paragraph).[2] The production network uses $1000\times 1000$ hidden units ($\sim\!1.5$M parameters); the teaching version uses $128\times 128$.

5.5.6Equilibrium residuals.

Each cohort h{1,,A1}h\in\{1,\ldots,A-1\} contributes four residuals, one per equilibrium condition (slide III.6). To keep the displayed form compact, introduce numerator/denominator shorthands for the two Euler conditions:

Nkh(xt):=βE ⁣[rt+1Dkh+1(x^t,+)u(c^t+1h+1)]+λ^bh+μ^h,Dkh(xt):=1+ζ(k^hrtkth),Nbh(xt):=βE ⁣[u(c^t+1h+1)]+κμ^h,Dbh(xt):=p^.\begin{aligned} \mathcal{N}^h_k(\x_t) &:= \beta\,\E{r_{t+1}\,\mathcal{D}^{h+1}_k(\hat\x_{t,+})\,u'(\hat c^{h+1}_{t+1})} + \hat\lambda_b^h + \hat\mu^h, \quad & \mathcal{D}^h_k(\x_t) &:= 1 + \zeta\bigl(\hat k'^h - r_t k_t^h\bigr), \\[2pt] \mathcal{N}^h_b(\x_t) &:= \beta\,\E{u'(\hat c^{h+1}_{t+1})} + \kappa\,\hat\mu^h, & \mathcal{D}^h_b(\x_t) &:= \hat p. \end{aligned}

Here Dkh\mathcal{D}^h_k is the marginal-adjustment-cost wedge from Ψh=ζ2(khrtkh)2\Psi^h = \tfrac{\zeta}{2}(k'^h - r_t k^h)^2: the capital Euler equation in envelope form reads u(cth)Dkh=βE ⁣[rt+1Dkh+1u(ct+1h+1)]+λbh+μhu'(c_t^h)\,\mathcal{D}^h_k = \beta\,\E{r_{t+1}\,\mathcal{D}^{h+1}_k\,u'(c_{t+1}^{h+1})} + \lambda_b^h + \mu^h, so the same wedge appears next period on the marginal return to capital (this is the factor adj_factor_next in the notebook). With ζ=0\zeta = 0 it collapses to the textbook Euler equation. The bond Euler reduces to the textbook stochastic-discount-factor form p^=βE[u(c)]/u(c)\hat p = \beta\,\mathbb{E}[u'(c')]/u'(c) only when the collateral constraint is slack (μ^h=0\hat\mu^h = 0); whenever μ^h>0\hat\mu^h > 0, the bond price carries an additional shadow-value term κμ^h/u(c^h)\kappa\hat\mu^h/u'(\hat c^h) that captures the value of relaxing the collateral constraint. The four per-cohort residuals are then

$$
\begin{aligned}
e_{\mathrm{REE},k}^h(\x_t) &:= \frac{(u')^{-1}\bigl(\mathcal{N}^h_k(\x_t)\,/\,\mathcal{D}^h_k(\x_t)\bigr)}{\hat c^h(\x_t)} - 1, & \text{(Euler, } k \text{)}\\[2pt]
e_{\mathrm{REE},b}^h(\x_t) &:= \frac{(u')^{-1}\bigl(\mathcal{N}^h_b(\x_t)\,/\,\mathcal{D}^h_b(\x_t)\bigr)}{\hat c^h(\x_t)} - 1, & \text{(Euler, } b \text{)}\\[2pt]
e_{\mathrm{KKT},b}^h(\x_t) &:= \hat\lambda_b^h \cdot \hat k'^h, & \text{(borrowing complementarity)}\\
e_{\mathrm{KKT},c}^h(\x_t) &:= \hat\mu^h \cdot \bigl(\hat k'^h + \kappa\,\hat b'^h\bigr) = \hat\mu^h \cdot \hat q^h. & \text{(collateral complementarity)}
\end{aligned}
$$
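The four residuals can be evaluated as follows for a single cohort at a single state. This is a hedged sketch: it assumes CRRA utility with $u'(c) = c^{-\gamma}$ (the utility function is not restated in this section), and every function name and number is hypothetical.

```python
def inv_marginal_utility(m: float, gamma: float) -> float:
    """(u')^{-1}(m) under the assumed CRRA form u'(c) = c**(-gamma)."""
    return m ** (-1.0 / gamma)

def relative_euler_residual(numerator, denominator, c_hat, gamma):
    """e_REE = (u')^{-1}(N / D) / c_hat - 1, in units of relative consumption."""
    return inv_marginal_utility(numerator / denominator, gamma) / c_hat - 1.0

def kkt_residuals(lambda_b, mu, k_next, q):
    """Product-form complementarity residuals: (lambda_b * k', mu * q)."""
    return lambda_b * k_next, mu * q

# Hypothetical values for one cohort.
gamma, c_hat, zeta = 2.0, 0.8, 0.5
k_now, k_next, r_t = 0.25, 0.30, 1.05
D_k = 1.0 + zeta * (k_next - r_t * k_now)   # marginal adjustment-cost wedge
N_k = (c_hat ** -gamma) * D_k               # chosen so the Euler equation holds exactly
e_k = relative_euler_residual(N_k, D_k, c_hat, gamma)

# Collateral constraint binds (q = 0, mu > 0); capital constraint is slack.
e_kkt_b, e_kkt_c = kkt_residuals(lambda_b=0.0, mu=0.2, k_next=k_next, q=0.0)
```

When the numerator is too small relative to $u'(\hat c^h)\,\mathcal{D}^h_k$, the implied consumption exceeds $\hat c^h$ and the residual is positive, which is what makes the relative form directly readable as a consumption error.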

On top of these per-agent residuals the bond market must clear: bonds are in zero net supply, so the residual is the cross-sectional sum of bond holdings against the target $\bar B = 0$,

$$
e_{\mathrm{MC},b}(\x_t) \;:=\; \sum_{h=1}^{A} \hat b'^h(\x_t) \;-\; \bar B \;=\; \frac{1}{\kappa}\sum_{h=1}^{A-1}\bigl(\hat q^h - \hat k'^h\bigr), \qquad \bar B = 0.
$$

Capital-market clearing $K_{t+1} = \sum_{h=2}^{A} k_{t+1}^h$ is once again satisfied by construction and does not appear as a residual. The conditional expectation in the two Euler equations is computed exactly as in (5.10): by summing over the four next-period shocks weighted by the persistent-Markov transition probabilities $\pi(z_t,\cdot)$.
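Both the probability-weighted expectation and the bond-market residual reduce to short sums. The sketch below is illustrative only: the transition row, the marginal utilities, and the three-cohort policy vectors are made-up numbers, and the helper names are hypothetical.

```python
def expected_value(trans_row, values):
    """E[v(z') | z_t] = sum over z' of pi(z_t, z') * v(z')."""
    assert abs(sum(trans_row) - 1.0) < 1e-12, "transition row must sum to one"
    return sum(p * v for p, v in zip(trans_row, values))

def bond_market_residual(q_hat, k_next, kappa, b_bar=0.0):
    """e_MC,b = (1/kappa) * sum_h (q^h - k'^h) - B_bar, using b'^h = (q^h - k'^h) / kappa."""
    return sum(q - k for q, k in zip(q_hat, k_next)) / kappa - b_bar

# Hypothetical row pi(z_t, .) of a 4-state Markov chain and next-period
# marginal utilities u'(c_{t+1}^{h+1}) in each of the four shocks.
pi_row = [0.7, 0.1, 0.1, 0.1]
mu_next = [1.2, 1.0, 1.5, 0.9]
beta, kappa, mu_hat = 0.95, 0.5, 0.0
N_b = beta * expected_value(pi_row, mu_next) + kappa * mu_hat

# Hypothetical slack and capital policies for three saving cohorts.
q_hat = [0.10, 0.40, 0.30]
k_next = [0.20, 0.30, 0.30]
e_mc = bond_market_residual(q_hat, k_next, kappa)
```

In this toy example the first cohort borrows exactly what the second lends, so the residual is zero up to rounding.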

5.5.7 The DEQN loss for the 56-agent benchmark.

Stack the four per-cohort residuals into one squared-cohort term $R^h(\x)^2 := (e_{\mathrm{REE},k}^h)^2 + (e_{\mathrm{REE},b}^h)^2 + (e_{\mathrm{KKT},b}^h)^2 + (e_{\mathrm{KKT},c}^h)^2$, then add the bond-market-clearing residual. The mini-batch loss is

$$
\boxed{\;\mathcal{L}_{D_{\mathrm{train}}}(\theta) \;=\; \frac{1}{|D_{\mathrm{train}}|}\;\frac{1}{4(A-1)+1}\, \sum_{\x_j\in D_{\mathrm{train}}}\!\Biggl[\;\sum_{h=1}^{A-1} R^h(\x_j)^2 \;+\; \bigl(e_{\mathrm{MC},b}(\x_j)\bigr)^2\;\Biggr]\;}
$$

(matching slide III.6). With $A=56$ this is $4\times 55 + 1 = 221$ squared residuals per training state. Each residual enters with weight one: no adaptive loss balancing (cf. Chapter 4) is applied because the relative-Euler convention (5.10) already homogenizes the per-cohort Euler scales, and the product-form KKT residuals are unit-free under the softplus head; ReLoBRaLo or GradNorm would be the natural next step if a future calibration broke this homogeneity. Comparison with (5.11): the analytic case is the special instance of (5.19) in which the no-short-sale-of-capital constraint never binds (so $\lambda_b^h\equiv 0$), there are no bonds (so all $b$- and collateral-related blocks drop out), and $4(A-1)+1$ collapses to $A-1$. The two losses are the same template instantiated at different complexity. Table 5.4 unpacks the residual blocks.

Table 5.4: Residual blocks entering the 56-agent benchmark loss for one training state.

| Component | Symbol | Count |
|---|---|---|
| Euler (capital) | $e_{\mathrm{REE},k}^h$ | 55 |
| Euler (bonds) | $e_{\mathrm{REE},b}^h$ | 55 |
| KKT (borrowing) | $e_{\mathrm{KKT},b}^h = \hat\lambda_b^h\,\hat k'^h$ | 55 |
| KKT (collateral) | $e_{\mathrm{KKT},c}^h = \hat\mu^h\,\hat q^h$ | 55 |
| Market clearing (bonds) | $e_{\mathrm{MC},b} = \sum_h \hat b'^h$ | 1 |
| Total residuals | | 221 |
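The aggregation in the boxed loss, together with the residual counts in the table, can be sketched as below. This is an unvectorized illustration under assumed names: the dictionary keys and `deqn_loss` are hypothetical, and a production implementation would compute the same mean on batched tensors.

```python
RESIDUAL_BLOCKS = ("euler_k", "euler_b", "kkt_b", "kkt_c")

def deqn_loss(batch, n_cohorts):
    """Equal-weight mean of squared residuals over a mini-batch.

    Each element of `batch` is a dict holding four lists of length
    n_cohorts - 1 (one per block) and one scalar 'mc_b'.
    """
    n_res = 4 * (n_cohorts - 1) + 1          # 221 when A = 56
    total = 0.0
    for state in batch:
        for key in RESIDUAL_BLOCKS:
            assert len(state[key]) == n_cohorts - 1
            total += sum(e * e for e in state[key])
        total += state["mc_b"] ** 2
    return total / (len(batch) * n_res)

A = 56
# Hypothetical training state in which every residual equals 0.01.
state = {key: [0.01] * (A - 1) for key in RESIDUAL_BLOCKS}
state["mc_b"] = 0.01
loss = deqn_loss([state, state], A)          # mean of squares, about 1e-4
```

Because every residual enters with weight one, a uniform residual of 0.01 yields a loss equal to its square regardless of batch size or the number of cohorts.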

5.5.8 Training and results.

Production training uses 60,000 episodes at lr $= 10^{-5}$ followed by 140,000 episodes at lr $= 10^{-6}$, with a runtime of several hours on GPU. The teaching version ($\sim$200 segments, $128\times 128$ hidden units) runs in a few minutes on CPU and is meant to show the mechanics and qualitative lifecycle patterns, not final accuracy. The loss trajectory typically exhibits oscillations, caused by re-simulation of the capital path at each episode, but the overall trend is steadily downward.
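The two-stage production schedule amounts to a piecewise-constant learning rate. A minimal sketch, assuming episodes are indexed from zero; the function name is hypothetical and the notebook may configure its optimizer differently.

```python
def learning_rate(episode: int, switch_at: int = 60_000, total: int = 200_000) -> float:
    """Return 1e-5 for the first 60,000 episodes and 1e-6 for the remaining 140,000."""
    if not 0 <= episode < total:
        raise ValueError("episode outside the training horizon")
    return 1e-5 if episode < switch_at else 1e-6

# Learning rate at a few checkpoints across the 200,000-episode run.
schedule_preview = [learning_rate(e) for e in (0, 59_999, 60_000, 199_999)]
```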

5.5.9 Lifecycle diagnostics.

The trained model produces economically plausible lifecycle patterns. Capital savings $k'^h$ follow a hump shape that mirrors the labor income profile: young agents save little (borrowing constraint binds), mid-career agents accumulate rapidly, and older agents decumulate. Bond holdings $b'^h$ are initially negative (young agents borrow against future income) and increase with age as agents shift from illiquid capital to liquid bonds. Bond prices vary across shock states, with higher prices in high-TFP states reflecting stronger demand for savings. In the teaching run the Euler residuals are still large enough to treat the output as diagnostic; in production runs the mean Euler equation errors are of order $10^{-4}$--$10^{-3}$ for both capital and bond equations (matching Table 3 of Azinovic et al. (2022)), corresponding to a $\sim$0.01%--0.1% deviation in consumption. Market clearing residuals are comparably small. Convergence is also confirmed by the policy-drift check on the fixed anchor cloud: the run is treated as time-invariant once `policy_drift_rms` and `policy_drift_max` fall below their prescribed tolerances.
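One plausible form of the policy-drift check is sketched below. The section names the statistics `policy_drift_rms` and `policy_drift_max` but not their exact definitions, so the absolute-difference definitions, the tolerances, and the toy snapshots here are all assumptions.

```python
import math

def policy_drift(prev, curr):
    """RMS and max absolute change between two policy snapshots.

    `prev` and `curr` are lists of policy vectors evaluated on the same
    fixed anchor states at two different points in training.
    """
    diffs = [abs(c - p) for row_p, row_c in zip(prev, curr)
             for p, c in zip(row_p, row_c)]
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return rms, max(diffs)

def is_time_invariant(prev, curr, tol_rms=1e-4, tol_max=1e-3):
    """True once both drift statistics fall below their (assumed) tolerances."""
    policy_drift_rms, policy_drift_max = policy_drift(prev, curr)
    return policy_drift_rms < tol_rms and policy_drift_max < tol_max

# Hypothetical snapshots on two anchor states with two policy outputs each.
prev = [[0.30, 0.10], [0.25, 0.12]]
curr = [[0.30, 0.10], [0.25, 0.1205]]
rms, mx = policy_drift(prev, curr)
converged = is_time_invariant(prev, curr)
```

Evaluating both snapshots on the same anchor cloud matters: it separates genuine changes in the policy function from changes caused by the simulated state distribution moving.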

5.6 Further Reading

5.7 Exercises

Worked solutions and guidance for these exercises appear in Appendix F.

Footnotes
  1. The 56-agent benchmark of Section 5.5 adds two genuine extras to (5.11): KKT product residuals (because the borrowing constraint actually binds and the collateral constraint can bind under other calibrations) and an explicit bond-market-clearing residual (because the network outputs each agent’s bond holding independently). An orthogonal extension is to encode capital-market clearing exactly via a dedicated output layer that rescales unnormalised cohort savings so that $\sum_{h=2}^{A} k_{t+1}^{h} = K_{t+1}$ holds by construction; Azinovic-Yang & Žemlička (2024) adopt this design in an OLG economy with rare disasters.

  2. In the current notebook implementation $\hat q^h$ is parameterized relative to $\hat k'^h$, so it cannot fall to zero while $\hat k'^h>0$; the collateral-complementarity residual is then satisfied by $\hat\mu^h\to 0$, and the collateral constraint is effectively non-binding on the learned ergodic set, consistent with the chapter-opening note. Allowing it to bind exactly requires a free positive slack output (a softplus head on $\hat q^h$); the architecture above already accommodates this swap.

References
  1. Diamond, P. A. (1965). National Debt in a Neoclassical Growth Model. American Economic Review, 55(5), 1126–1150.
  2. Krueger, D., & Kubler, F. (2004). Computing equilibrium in OLG models with stochastic production. Journal of Economic Dynamics and Control, 28(7), 1411–1436.
  3. Azinovic, M., Gaegauf, L., & Scheidegger, S. (2022). Deep Equilibrium Nets. International Economic Review, 63(4), 1471–1525. https://doi.org/10.1111/iere.12575
  4. Brumm, J., Kubler, F., & Scheidegger, S. (2017). Computing Equilibria in Dynamic Stochastic Macro-Models with Heterogeneous Agents. In Advances in Economics and Econometrics: Eleventh World Congress (B. Honoré, A. Pakes, M. Piazzesi, and L. Samuelson, eds.) (Vol. 2, pp. 185–230). Cambridge University Press.
  5. Auerbach, A. J., & Kotlikoff, L. J. (1987). Dynamic Fiscal Policy. Cambridge University Press.
  6. Azinovic-Yang, M., & Žemlička, J. (2024). Intergenerational consequences of rare disasters. Available at SSRN 4386477. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4386477