3 The International Real Business Cycle Model

University of Lausanne

Having established the DEQN framework on the one-dimensional Brock--Mirman model in Chapter 2, we now scale it to the multi-country international real business cycle (IRBC) model of Backus et al. (1992). This model features $N$ countries with heterogeneous productivity, complete markets, irreversible investment, and convex capital adjustment costs. It is the standard testbed for high-dimensional solution methods in macroeconomics, and applying DEQNs to it illustrates how the framework handles high-dimensional state spaces, multiple equilibrium conditions, and complementarity constraints.

3.1 Why IRBC for Macro-Finance Research?

Beyond its computational-testbed role, the IRBC model is the workhorse framework for open-economy asset pricing and international risk sharing. Several first-order questions in macro-finance can be posed sharply within it:

The IRBC model is therefore an interesting substantive object, not merely a scaling test. Its combination of a clean complete-markets benchmark and rich, realistic frictions makes it a natural next step after the one-country Brock--Mirman benchmark of Chapter 2.

3.1.0.1 A calibration caveat for the puzzles above.

The shock decomposition of Section 3.2 below, $z^{j\prime} = \rho_z z^j + \sigma_e(\varepsilon^j + \varepsilon^{\mathrm{agg}})$, hard-wires a cross-country innovation correlation of exactly $1/2$ for any number of countries $N$. The consumption-correlation and Backus--Smith puzzles cited in the bullets above should therefore be read as statements about this specific calibration: richer correlation structures, country-specific factor loadings, or fewer aggregate factors would change the quantitative bite of the puzzles in this model. This is calibration, not theory.

3.2 Model Setup

Table 3.1: Symbol cheat-sheet for the IRBC model. Note the IES-vs-CRRA convention: here $\gamma_j$ is the intertemporal elasticity of substitution, and the implied risk aversion is $1/\gamma_j$; later chapters on continuous-time HA models and climate use $\gamma$ for CRRA and $\psi$ for IES.

| Symbol | Role | Range / sign | Calibration |
|---|---|---|---|
| $\gamma_j$ | IES of country $j$ (not CRRA) | $>0$ | $[0.25, 1.0]$, linearly spaced |
| $\tau^j$ | Pareto weight on country $j$ | $>0$ | $(A_{\mathrm{tfp}}-\delta)^{1/\gamma_j}$ |
| $\lambda_t$ | Aggregate resource-constraint multiplier | $>0$ | $\lambda_{\mathrm{ss}} = 1$ |
| $\mu_t^j$ | Irreversibility KKT multiplier on $I^j \ge 0$ | $\ge 0$ | $0$ in slack regime |
| $A_{\mathrm{tfp}}$ | TFP normalization constant | $>0$ | $\approx 0.0558$ |
| $\zeta$ | Capital share in Cobb--Douglas | $\in (0,1)$ | 0.36 |
| $\Gamma^j$ | Quadratic adjustment-cost level | $\ge 0$ | $\kappa=0.50$ |
| $\rho_z$ | TFP persistence | $\in [0,1)$ | 0.95 |
| $\sigma_e$ | Innovation s.d. per component | $>0$ | 0.01 |
| $\varepsilon^j$ | Idiosyncratic innovation | $\mathcal N(0,1)$ | i.i.d. across $j,t$ |
| $\varepsilon^{\mathrm{agg}}$ | Aggregate innovation | $\mathcal N(0,1)$ | common factor |
| $\kappa$ | Adjustment-cost intensity | $\ge 0$ | 0.50 |

The international real business cycle (IRBC) model, introduced by Backus et al. (1992), extends the single-country growth model to $N$ heterogeneous countries, each endowed with country-specific capital $k^j$ and total factor productivity $z^j$. The model features complete markets, irreversible investment, and convex capital adjustment costs, and serves as the workhorse test case for high-dimensional solution methods (Brumm & Scheidegger, 2017). Here, we apply the DEQN methodology of Azinovic et al. (2022) to this setting.

3.2.0.1 Preferences.

Each country $j$ has CRRA utility

$$
u^j(c) \;=\; \begin{cases} \dfrac{c^{1-1/\gamma_j} - 1}{1 - 1/\gamma_j}, & \gamma_j \neq 1,\\[6pt] \ln c, & \gamma_j = 1, \end{cases} \tag{3.1}
$$

where the intertemporal elasticity of substitution (IES) $\gamma_j$ is heterogeneous across countries; risk aversion under this CRRA specification equals $1/\gamma_j$. Notation warning: this chapter uses $\gamma$ for the IES, while later chapters on continuous-time HA models and climate use $\gamma$ for CRRA risk aversion and $\psi$ for the IES. The convention is stated explicitly at the start of each chapter. A social planner maximizes

$$
\max \; \sum_{t=0}^{\infty} \beta^t \, \mathbb{E}\!\left[\sum_{j=1}^{N} \tau^j \, u^j(c_t^j)\right] \tag{3.2}
$$

with Pareto weights $\tau^j > 0$.

3.2.0.2 Production.

Country $j$ produces $Y^j = A_\mathrm{tfp} \exp(z^j)(k^j)^\zeta$, where the total factor productivity constant $A_\mathrm{tfp}$ is calibrated to normalize the steady-state capital stock to unity. In steady state (where $z^j = 0$, $k^j = 1$, and $k^{j\prime} = 1$), adjustment costs vanish, investment is strictly positive so the irreversibility multiplier is zero, and the Euler equation reduces to $1 = \beta\,(1 - \delta + \zeta A_\mathrm{tfp})$, which implies:

$$
A_\mathrm{tfp} = \frac{1/\beta - 1 + \delta}{\zeta}. \tag{3.3}
$$

This normalization ensures that the deterministic steady state lies at $(k^\star, z^\star) = (1, 0)$ for all countries, which simplifies the network’s learning task and provides a natural center for the training distribution.

3.2.0.3 TFP process.

Log productivity follows an AR(1) with common and idiosyncratic shocks:

$$
z^{j\prime} = \rho_z z^j + \sigma_e(\varepsilon^j + \varepsilon^\mathrm{agg}), \qquad \varepsilon^j, \varepsilon^\mathrm{agg} \sim \mathcal{N}(0,1)\text{ i.i.d.}, \qquad |\rho_z| < 1. \tag{3.4}
$$

The persistence restriction $|\rho_z|<1$ guarantees stationarity of the TFP process, which in turn underlies the existence of an ergodic distribution on which DEQN training samples (Section 2.3). Here $\sigma_e$ is the per-component standard deviation, so the marginal innovation variance for country $j$ is $2\sigma_e^2$ and the cross-country innovation covariance is $\sigma_e^2$. These two facts imply a fixed cross-country innovation correlation of $1/2$ regardless of $N$, a direct consequence of the equal-weighted aggregate-shock decomposition $\varepsilon^j + \varepsilon^\mathrm{agg}$. Asset-pricing implications (in particular the international consumption-correlation puzzle and the cyclicality of trade balances) inherit this hard-wired common-factor structure: results below should be interpreted with that calibration choice in mind. If a desired total innovation scale $\bar\sigma$ is targeted instead, set $\sigma_e = \bar\sigma/\sqrt{2}$.
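The variance and covariance bookkeeping above can be checked in a few lines. The following is a minimal sketch, not the companion notebooks' code; the function names and the test inputs are illustrative assumptions.

```python
import math

def tfp_step(z, eps_idio, eps_agg, rho_z=0.95, sigma_e=0.01):
    """One step of z' = rho_z * z + sigma_e * (eps_j + eps_agg) for every country."""
    return [rho_z * z_j + sigma_e * (e_j + eps_agg) for z_j, e_j in zip(z, eps_idio)]

def innovation_moments(sigma_e=0.01):
    """Variance, cross-country covariance, and correlation of the innovations."""
    var = sigma_e**2 * (1.0 + 1.0)   # idiosyncratic plus aggregate component
    cov = sigma_e**2 * 1.0           # only the aggregate component is shared
    return var, cov, cov / var

def sigma_e_for_total_scale(sigma_bar):
    """Per-component s.d. that delivers a total innovation s.d. of sigma_bar."""
    return sigma_bar / math.sqrt(2.0)

var, cov, corr = innovation_moments()
# Two countries, hand-picked shock draws (illustrative values only)
z_next = tfp_step([0.0, 0.02], eps_idio=[1.0, -1.0], eps_agg=0.5)
print(corr, z_next)
```

The correlation does not depend on `sigma_e` or on the number of countries, which is the hard-wired common-factor structure discussed in the text.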

3.2.0.4 Adjustment costs and irreversibility.

Changing the capital stock incurs a quadratic adjustment cost:

$$
\Gamma^j = \frac{\kappa}{2}\, k^j \left(\frac{k^{j\prime}}{k^j} - 1\right)^{\!2}, \tag{3.5}
$$

with marginal derivatives that appear in the Euler equations:

$$
\begin{aligned} \frac{\partial \Gamma^j}{\partial k^{j\prime}} &= \kappa\left(\frac{k^{j\prime}}{k^j} - 1\right), & \frac{\partial \Gamma^j}{\partial k^j} &= \frac{\kappa}{2}\left(1 - \left(\frac{k^{j\prime}}{k^j}\right)^{\!2}\right). \end{aligned} \tag{3.6}
$$

Note that $\partial\Gamma^j/\partial k^j$ is negative whenever $k^{j\prime} > k^j$, i.e. in expanding states. Consequently the term $-\partial\Gamma^j/\partial k^j$ that appears in the marginal product of capital below (3.14) raises MPK in expansion phases; a reader who plugs in $|\partial\Gamma/\partial k|$ here will introduce a sign error. Investment is irreversible: $I^j = k^{j\prime} - (1-\delta)k^j \geq 0$.
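Because the sign of $\partial\Gamma^j/\partial k^j$ is easy to get wrong, a finite-difference check is a cheap safeguard. The sketch below is illustrative only: the helper names and the test point $k=1$, $k'=1.05$ are assumptions, not taken from the notebooks.

```python
def adj_cost(k, k_next, kappa=0.5):
    """Gamma = (kappa / 2) * k * (k'/k - 1)^2."""
    return 0.5 * kappa * k * (k_next / k - 1.0) ** 2

def d_adj_cost_d_knext(k, k_next, kappa=0.5):
    """dGamma/dk' = kappa * (k'/k - 1)."""
    return kappa * (k_next / k - 1.0)

def d_adj_cost_d_k(k, k_next, kappa=0.5):
    """dGamma/dk = (kappa / 2) * (1 - (k'/k)^2); negative when k' > k."""
    return 0.5 * kappa * (1.0 - (k_next / k) ** 2)

def central_diff(f, x, h=1e-6):
    """Two-sided finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

k, k_next = 1.0, 1.05   # an expanding state (hypothetical test point)
fd_knext = central_diff(lambda x: adj_cost(k, x), k_next)
fd_k = central_diff(lambda x: adj_cost(x, k_next), k)
print(fd_knext, d_adj_cost_d_knext(k, k_next))
print(fd_k, d_adj_cost_d_k(k, k_next))
```

In the expanding state the derivative with respect to current capital is negative, so subtracting it raises the marginal product of capital, as stated above.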

3.2.0.5 Pareto-weight calibration.

With heterogeneous IES $\gamma_j$, a symmetric deterministic steady state is most easily obtained by choosing the Pareto weights as

$$
\tau^j \;=\; \bigl(A_\mathrm{tfp} - \delta\bigr)^{1/\gamma_j}, \qquad j=1,\ldots,N. \tag{3.7}
$$

The derivation is a two-step inversion of the planner’s first-order condition. The consumption-sharing condition (3.12) (derived in the next section from the FOC for $c^j_t$) reads $\tau^j (c^j_t)^{-1/\gamma_j} = \lambda_t$, so $c^j_t = (\lambda_t / \tau^j)^{-\gamma_j}$. In the deterministic steady state with the normalizations $\lambda_\mathrm{ss} = 1$ and $k^j_\mathrm{ss} = 1$ we want every country to consume the same amount $c^j_\mathrm{ss} = A_\mathrm{tfp} - \delta$ implied by the resource constraint $c^j_\mathrm{ss} = Y^j_\mathrm{ss} - I^j_\mathrm{ss}$. Setting $(1/\tau^j)^{-\gamma_j} = A_\mathrm{tfp} - \delta$ and solving for $\tau^j$ gives Eq. (3.7). The symmetric steady state thus serves as a natural anchor for training: the network’s initial predictions need only match this point to avoid infeasible economies during the early simulated trajectories.
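The two-step inversion can be verified numerically: compute the weights from Eq. (3.7), then push them back through the consumption-sharing rule with $\lambda_\mathrm{ss}=1$. This is a minimal sketch; the choice $N=4$ and the variable names are illustrative assumptions.

```python
beta, zeta, delta = 0.99, 0.36, 0.01
N = 4                                    # number of countries (illustrative)
gamma_min, gamma_max = 0.25, 1.0

A_tfp = (1.0 / beta - 1.0 + delta) / zeta
c_ss = A_tfp - delta                     # common steady-state consumption

# Linearly spaced IES values and the implied Pareto weights, Eq. (3.7)
gammas = [gamma_min + (gamma_max - gamma_min) * j / (N - 1) for j in range(N)]
taus = [c_ss ** (1.0 / g) for g in gammas]

def consumption(lam, tau, gamma):
    """Invert tau * c^(-1/gamma) = lambda for c."""
    return (lam / tau) ** (-gamma)

# With lambda = 1 every country should consume exactly c_ss
c_check = [consumption(1.0, t, g) for t, g in zip(taus, gammas)]
print(gammas, c_check)
```

Raising `lam` above one in `consumption` lowers every country's consumption, with the proportional response governed by that country's $\gamma_j$.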

3.2.0.6 Reference calibration.

Throughout the companion notebooks lecture_04_01_IRBC_DEQN_smooth.ipynb and lecture_04_02_IRBC_DEQN_irreversible.ipynb, we use the quarterly calibration summarized in Table 3.2. The implied total factor productivity and deterministic steady-state quantities can then be computed analytically.

Table 3.2: Reference IRBC calibration used in the companion notebooks. Countries’ IES values $\gamma_j$ are linearly spaced in $[\gamma_{\min}, \gamma_{\max}]$. Pareto weights are computed from (3.7).

| Symbol | Name | Value | Description |
|---|---|---|---|
| $\beta$ | Discount factor | 0.99 | Quarterly |
| $\zeta$ | Capital share | 0.36 | Cobb--Douglas |
| $\delta$ | Depreciation | 0.01 | Low quarterly rate |
| $\rho_z$ | TFP persistence | 0.95 | Highly persistent |
| $\sigma_e$ | Shock std. dev. | 0.01 | Small innovations |
| $\kappa$ | Adjustment-cost intensity | 0.50 | Moderate frictions |
| $\gamma_{\min}$ | Min IES | 0.25 | Risk aversion $=4$ |
| $\gamma_{\max}$ | Max IES | 1.00 | Log utility |
| $k^\star$ | Steady-state capital | 1.00 | Normalization |

3.2.0.7 Worked steady state.

Equation (3.3) is most compactly written as $A_\mathrm{tfp}=(1/\beta - 1 + \delta)/\zeta$; multiplying numerator and denominator by $\beta$ gives the algebraically equivalent form $A_\mathrm{tfp}=(1-\beta(1-\delta))/(\zeta\beta)$ used below. Substituting the reference values:

$$
\begin{aligned} A_\mathrm{tfp} &= \frac{1-\beta(1-\delta)}{\zeta\,\beta} = \frac{1 - 0.99 \cdot 0.99}{0.36 \cdot 0.99} = \frac{0.0199}{0.3564} \;\approx\; 0.0558, \\ Y^\star_j &= A_\mathrm{tfp}\,(k^\star)^\zeta \approx 0.0558, \qquad I^\star_j = \delta\,k^\star = 0.01, \qquad c^\star_j = Y^\star_j - I^\star_j \approx 0.0458. \end{aligned} \tag{3.8}
$$

The aggregate resource constraint (3.17) is then satisfied country by country, $Y^\star_j - I^\star_j - c^\star_j = 0$, as a trivial check. These numbers provide a baseline against which the trained network’s predictions on an out-of-sample simulation can be compared.
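The steady-state arithmetic is simple enough to reproduce directly, which also confirms that the two algebraic forms of $A_\mathrm{tfp}$ agree. A minimal sketch with the reference calibration; the variable names are illustrative.

```python
beta, zeta, delta, k_star = 0.99, 0.36, 0.01, 1.0

A_form1 = (1.0 / beta - 1.0 + delta) / zeta              # Eq. (3.3)
A_form2 = (1.0 - beta * (1.0 - delta)) / (zeta * beta)   # equivalent form

Y_star = A_form1 * k_star ** zeta     # output at z = 0, k = 1
I_star = delta * k_star               # investment that keeps k constant
c_star = Y_star - I_star              # per-country consumption

# Resource constraint for one country; Gamma = 0 because k' = k
arc_residual = Y_star + (1.0 - delta) * k_star - k_star - 0.0 - c_star

print(f"A_tfp = {A_form1:.6f}, Y* = {Y_star:.6f}, I* = {I_star:.2f}, c* = {c_star:.6f}")
print(f"ARC residual = {arc_residual:.2e}")
```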

3.3 The Planner’s Problem and Equilibrium Conditions

3.3.0.1 The planner’s problem.

The social planner maximizes the weighted sum of utilities across all $N$ countries, subject to the aggregate resource constraint (3.17), the irreversibility constraints, the production technology, and the TFP process (3.4):

$$
\max_{\{c_t^j,\, k_{t+1}^j\}_{j,t}} \; \sum_{t=0}^{\infty} \beta^t \, \mathbb{E}\!\left[\sum_{j=1}^{N} \tau^j \, u^j(c_t^j)\right] \tag{3.9}
$$

with Pareto weights $\tau^j > 0$ and discount factor $\beta \in (0,1)$.

3.3.0.2 The Lagrangian.

Following the same approach as in Section 2.4 for the Brock--Mirman model, we form the Lagrangian by attaching discounted multipliers to each constraint. Let $\beta^t \lambda_t$ be the multiplier on the aggregate resource constraint at date $t$, and $\beta^t \mu_t^j$ the multiplier on the irreversibility constraint for country $j$ at date $t$. The Lagrangian is:

$$
\begin{split} \mathcal{L} = \mathbb{E}\Biggl[\sum_{t=0}^{\infty} \beta^t \Biggl( &\sum_{j=1}^{N} \tau^j \, \frac{(c_t^j)^{1-1/\gamma_j} - 1}{1-1/\gamma_j} + \lambda_t \sum_{j=1}^{N} \bigl(Y_t^j + (1-\delta)k_t^j - k_{t+1}^j - \Gamma_t^j - c_t^j\bigr) \\ &+ \sum_{j=1}^{N} \mu_t^j \bigl(k_{t+1}^j - (1-\delta)k_t^j\bigr) \Biggr)\Biggr]. \end{split} \tag{3.10}
$$

The planner chooses $c_t^j$ and $k_{t+1}^j$ for each country $j$ and each date $t$. The complementary slackness conditions require $\mu_t^j \geq 0$, $I_t^j \geq 0$, and $\mu_t^j \cdot I_t^j = 0$. Two notation reminders before we differentiate. First, the irreversibility multiplier is $\mu_t^j$, not the resource-constraint multiplier $\lambda_t$; the two play different roles ($\lambda_t$ shadow-prices the aggregate goods market; $\mu_t^j$ shadow-prices country $j$’s individual investment floor) and they enter the FOCs through entirely different channels. Second, $\mu_t^j \geq 0$ is the standard KKT sign: the multiplier on a $\geq$-constraint is non-negative at the optimum, and the Fischer--Burmeister residual constructed below packages this sign restriction together with the slackness condition into a single smooth squared term that is compatible with SGD.

3.3.0.3 FOC w.r.t. $c_t^j$:

Differentiating the Lagrangian with respect to $c_t^j$:

$$
\frac{\partial \mathcal{L}}{\partial c_t^j} = \beta^t \bigl[\tau^j (c_t^j)^{-1/\gamma_j} - \lambda_t\bigr] = 0 \qquad\Longrightarrow\qquad \tau^j (c_t^j)^{-1/\gamma_j} = \lambda_t. \tag{3.11}
$$

This is the consumption-sharing condition: the planner equates the Pareto-weighted marginal utility of consumption across all countries to a common shadow price $\lambda_t$. Solving (3.11) for $c_t^j$:

$$
c_t^j = \left(\frac{\lambda_t}{\tau^j}\right)^{-\gamma_j}. \tag{3.12}
$$

This shows that all $N$ consumption levels are determined by the single variable $\lambda_t$: a higher shadow price (resources are scarcer) lowers consumption in every country. Countries with a higher IES $\gamma_j$ respond more elastically to changes in $\lambda_t$.

3.3.0.4 FOC w.r.t. $k_{t+1}^j$:

The variable $k_{t+1}^j$ appears in three places in the Lagrangian: (i) the date-$t$ resource constraint with coefficient $-\lambda_t(1 + \partial\Gamma_t^j/\partial k_{t+1}^j)$, (ii) the date-$t$ irreversibility constraint with coefficient $+\mu_t^j$, and (iii) the date-$(t+1)$ terms via output $Y_{t+1}^j$, depreciated capital $(1-\delta)k_{t+1}^j$, adjustment costs $\Gamma_{t+1}^j$, and the irreversibility constraint. Differentiating and collecting terms:

$$
\begin{aligned}
\frac{\partial \mathcal{L}}{\partial k_{t+1}^j} ={}& \beta^t \!\left[-\lambda_t\!\left(1 + \frac{\partial \Gamma_t^j}{\partial k_{t+1}^j}\right) + \mu_t^j\right] \\
&+ \beta^{t+1}\,\mathbb{E}_t\!\left[\lambda_{t+1}\!\left(\frac{\partial Y_{t+1}^j}{\partial k_{t+1}^j} + (1-\delta) - \frac{\partial \Gamma_{t+1}^j}{\partial k_{t+1}^j}\right) - \mu_{t+1}^j(1-\delta)\right] = 0.
\end{aligned} \tag{3.13}
$$

Now define the marginal product of capital (inclusive of depreciation and adjustment cost effects):

$$
\mathrm{MPK}^j \;\equiv\; 1-\delta + \zeta A_\mathrm{tfp}\exp(z^j)(k^j)^{\zeta-1} - \frac{\partial \Gamma^j}{\partial k^j}, \tag{3.14}
$$

and note from (3.6) that $\partial \Gamma_t^j / \partial k_{t+1}^j = \kappa(k_{t+1}^j/k_t^j - 1)$. Dividing (3.13) by $\beta^t$ and substituting the MPK definition:

$$
\lambda_t\!\left(1 + \frac{\partial \Gamma_t^j}{\partial k_{t+1}^j}\right) - \mu_t^j = \beta\,\mathbb{E}_t\!\bigl[\lambda_{t+1}\,\mathrm{MPK}_{t+1}^j - (1-\delta)\,\mu_{t+1}^j\bigr]. \tag{3.15}
$$

This is the Euler equation for country $j$. The left-hand side is the cost of investing one more unit in country $j$’s capital: the shadow price $\lambda_t$ of the resources used (scaled by the marginal adjustment cost) minus the value $\mu_t^j$ of relaxing the irreversibility constraint. The right-hand side is the expected discounted benefit: next period’s shadow price times the marginal product of capital, minus the option-value loss from tightening next period’s irreversibility constraint.

3.3.0.5 Relative error form.

For numerical purposes, regroup (3.15) so that the cost-of-investment term $\lambda_t(1+\partial\Gamma_t^j/\partial k_{t+1}^j)$ stands alone on the left, $\lambda_t(1+\partial\Gamma_t^j/\partial k_{t+1}^j) = \beta\,\mathbb{E}_t[\lambda_{t+1}\,\mathrm{MPK}_{t+1}^j - (1-\delta)\mu_{t+1}^j] + \mu_t^j$, and divide through by it. This gives a scale-free formulation:

$$
\frac{\beta\,\mathbb{E}_t\!\left[\lambda' \cdot \mathrm{MPK}^{j\prime} - (1-\delta)\mu^{j\prime}\right] + \mu^j}{\lambda(1+\partial\Gamma^j/\partial k^{j\prime})} - 1 = 0, \qquad j=1,\ldots,N. \tag{3.16}
$$

This ensures that all $N$ Euler equations are dimensionless and residuals can be interpreted directly as percentage deviations from optimality.
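To make the relative form concrete, the sketch below evaluates the residual for one country, with the conditional expectation replaced by a weighted sum over quadrature nodes. The function signatures and the tuple layout of `next_draws` are assumptions for illustration; the notebooks may organize this differently. At the deterministic steady state the residual should vanish, which gives a simple sanity check.

```python
import math

def mpk(k, k_next, z, A_tfp, zeta=0.36, delta=0.01, kappa=0.5):
    """MPK of Eq. (3.14): 1 - delta + zeta*A*exp(z)*k^(zeta-1) - dGamma/dk."""
    dG_dk = 0.5 * kappa * (1.0 - (k_next / k) ** 2)
    return 1.0 - delta + zeta * A_tfp * math.exp(z) * k ** (zeta - 1.0) - dG_dk

def relative_euler_residual(lam, mu, k, k_next, next_draws, weights,
                            A_tfp, beta=0.99, zeta=0.36, delta=0.01, kappa=0.5):
    """Relative Euler residual, Eq. (3.16), for one country.

    next_draws: list of (lam', mu', z', k'') tuples, one per quadrature node,
    where k'' is the capital chosen next period; weights must sum to one.
    """
    expectation = sum(
        w * (lam_n * mpk(k_next, k_nn, z_n, A_tfp, zeta, delta, kappa)
             - (1.0 - delta) * mu_n)
        for w, (lam_n, mu_n, z_n, k_nn) in zip(weights, next_draws)
    )
    cost = lam * (1.0 + kappa * (k_next / k - 1.0))   # lambda * (1 + dGamma/dk')
    return (beta * expectation + mu) / cost - 1.0

beta, zeta, delta = 0.99, 0.36, 0.01
A_tfp = (1.0 / beta - 1.0 + delta) / zeta
# Deterministic steady state: a single node with lam = 1, mu = 0, z = 0, k = 1
res_ss = relative_euler_residual(1.0, 0.0, 1.0, 1.0,
                                 [(1.0, 0.0, 0.0, 1.0)], [1.0], A_tfp)
print(res_ss)
```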

3.3.0.6 Aggregate resource constraint.

All output is allocated to consumption, investment, and adjustment costs:

$$
\sum_{j=1}^{N}\bigl[Y^j + (1-\delta)k^j - k^{j\prime} - \Gamma^j - c^j\bigr] = 0. \tag{3.17}
$$

3.3.0.7 Summary of equilibrium conditions.

The complete system consists of three blocks:

  1. Consumption sharing (3.12): determines all $N$ consumption levels from $\lambda_t$.

  2. Euler equations (3.16): $N$ intertemporal optimality conditions, one per country.

  3. Aggregate resource constraint (3.17): closes the model by equating world supply and demand.

In addition, the $N$ irreversibility constraints are enforced via complementary slackness ($\mu^j \geq 0$, $I^j \geq 0$, $\mu^j I^j = 0$).

3.3.0.8 Fischer--Burmeister complementarity.

The irreversibility constraint is enforced via a smoothed Fischer--Burmeister residual:

$$
\mathrm{FB}_\varepsilon(\mu^j, I^j) = \mu^j + I^j - \sqrt{(\mu^j)^2 + (I^j)^2 + \varepsilon^2} = 0. \tag{3.18}
$$

The exact Fischer--Burmeister map is the limiting case $\mathrm{FB}_0(\mu,I)=\mu+I-\sqrt{\mu^2+I^2}$. Its zero set coincides with the two non-negative half-axes in the $(\mu, I)$-plane, ensuring $\mu^j \geq 0$, $I^j \geq 0$, and $\mu^j \cdot I^j = 0$ (Figure 3.1). The smoothed version with $\varepsilon > 0$ rounds the corner at the origin and is differentiable there, improving numerical conditioning at the cost of a slight relaxation of exact complementarity. The companion notebooks use $\varepsilon = 10^{-4}$ as the default; tighter values ($10^{-6}$ to $10^{-5}$) are sometimes preferred when complementarity must hold to higher accuracy, at the cost of stiffer gradients near the origin.

Figure 3.1: The Fischer--Burmeister complementarity function, drawn in the investment--multiplier plane: investment $I^j$ on the horizontal axis, the irreversibility multiplier $\mu^j$ on the vertical axis. The exact map $\mathrm{FB}_0(\mu,I)=\mu+I-\sqrt{\mu^2+I^2}$ packs the three Karush--Kuhn--Tucker conditions $\mu\ge 0$, $I\ge 0$, $\mu I=0$ into a single equation: $\mathrm{FB}_0=0$ holds exactly on the two heavy blue half-axes and nowhere else. The horizontal half-axis ($\mu=0$, $I>0$) is the investing regime, where the country invests a strictly positive amount, the irreversibility constraint is slack, and its shadow price $\mu$ is therefore zero. The vertical half-axis ($I=0$, $\mu>0$) is the constrained regime, where the constraint binds, investment is pinned at zero, and $\mu>0$ measures how much the planner would pay to relax it; the origin is the knife-edge where both hold with equality. The open interior of the first quadrant ($\mu>0$ and $I>0$ together) is infeasible because it violates complementarity, and there $\mathrm{FB}_0>0$ strictly (since $\mu+I>\sqrt{\mu^2+I^2}$ whenever both are positive). This is exactly what makes the function useful as a loss term: when the network’s predicted $(\mu^j,I^j)$ lands in that forbidden region, the squared residual $\mathrm{FB}_\varepsilon^2$ is positive and its negative gradient $-\nabla\mathrm{FB}_0$ (green arrow) pushes the iterate back toward the nearest feasible half-axis, so the network learns which regime applies at each state without any explicit regime switch. The exact map has a single kink, at the origin; the smoothed version $\mathrm{FB}_\varepsilon(\mu,I)=\mu+I-\sqrt{\mu^2+I^2+\varepsilon^2}$ actually used in the code rounds that corner, restoring differentiability everywhere at the price of an $\mathcal{O}(\varepsilon)$ relaxation of exact complementarity.

The complementarity conditions $\mu^j \geq 0$, $I^j \geq 0$, $\mu^j \cdot I^j = 0$ have a natural economic interpretation: when investment is strictly positive ($I^j > 0$), the irreversibility constraint is slack and the multiplier is zero ($\mu^j = 0$); conversely, when the constraint binds ($I^j = 0$), the multiplier is positive, reflecting the shadow value of the binding constraint. The FB function smoothly encodes both regimes, allowing the neural network to learn which regime applies for each state without explicit regime switching.
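The sign pattern of the Fischer--Burmeister map is easy to confirm at a few hand-picked points. The sketch below uses only the standard library, and the test points are arbitrary illustrative values.

```python
import math

def fb(mu, inv, eps=0.0):
    """Fischer-Burmeister residual; eps = 0 gives the exact map."""
    return mu + inv - math.sqrt(mu * mu + inv * inv + eps * eps)

on_investing_axis = fb(0.0, 0.3)            # mu = 0, I > 0: feasible
on_constrained_axis = fb(0.2, 0.0)          # I = 0, mu > 0: feasible
interior = fb(0.2, 0.3)                     # both positive: violates mu * I = 0
negative_mu = fb(-0.1, 0.3)                 # violates mu >= 0
smoothed_origin = fb(0.0, 0.0, eps=1e-4)    # smoothing shifts the origin by -eps

print(on_investing_axis, on_constrained_axis, interior, negative_mu, smoothed_origin)
```

The residual is zero on both half-axes, strictly positive in the interior of the first quadrant, and negative when a sign restriction fails, so its square penalizes every kind of violation.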

3.4 DEQN Formulation

3.4.0.1 From Brock--Mirman to IRBC.

It is useful to see the IRBC as the natural extension of the one-country benchmark of Chapter 2. Table 3.3 summarizes what changes.

Table 3.3: The DEQN template is the same in both cases; only the input/output dimensions, the number of loss terms, and the presence of complementarity constraints change.

| | Brock--Mirman (Ch. 2) | IRBC (this chapter) |
|---|---|---|
| Countries | 1 | $N$ |
| States | $(K, z)$ | $(k^1,\ldots,k^N, z^1,\ldots,z^N)$ |
| Policies | $C$ | $(k^{1\prime},\ldots,k^{N\prime}, \lambda, \mu^1,\ldots,\mu^N)$ |
| Loss terms | 1 Euler | $N$ Euler $+$ 1 ARC $+$ $N$ Fischer--Burmeister |
| Constraints | none | irreversibility, convex adjustment costs |
| Shocks per period | 1 | $N+1$ (one idiosyncratic per country + one aggregate) |
| Output activation | softplus or sigmoid | softplus |
| Analytical solution | yes (log utility, $\delta=1$) | no |

The full system of equations comprises $N$ Euler equations, $N$ Fischer--Burmeister conditions, and 1 aggregate resource constraint, totaling $2N+1$ equations. Table 3.4 summarizes how the problem dimensions scale with $N$.

Table 3.4: Scaling of the IRBC state, policy, equation, and quadrature dimensions with the number of countries $N$. The state, policy, and equation counts grow linearly. Tensor-product Gauss--Hermite quadrature grows as $Q^{N+1}$, while the Stroud-3 monomial rule uses only $2(N+1)$ nodes; this is why the notebook uses Gauss--Hermite only for the two-country classroom case and switches to monomial or QMC rules in larger IRBC applications.

| $N$ | States | Policies | Equations | Shock dim. | GH nodes ($Q=3$) | Stroud-3 nodes |
|---|---|---|---|---|---|---|
| 2 | 4 | 5 | 5 | 3 | 27 | 6 |
| 5 | 10 | 11 | 11 | 6 | 729 | 12 |
| 10 | 20 | 21 | 21 | 11 | $1.8\times 10^5$ | 22 |
| 50 | 100 | 101 | 101 | 51 | $\sim 2.2\times 10^{24}$ | 102 |
| 100 | 200 | 201 | 201 | 101 | $\sim 1.5\times 10^{48}$ | 202 |

Figure 3.2: Quadrature-cost crossover for the IRBC model as a function of the number of countries $N$. Tensor-product Gauss--Hermite (red) grows exponentially in $N$ and becomes infeasible by $N=10$; the Stroud-3 monomial rule (blue) grows linearly and stays well under $10^3$ nodes even at $N=100$. This is the operational reason every IRBC application beyond the classroom $N=2$ case uses monomial or QMC integration.
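The node counts in Table 3.4 follow from two one-line formulas, and the monomial nodes themselves are cheap to build. The sketch below assumes the standard degree-3 monomial rule for a standard normal vector (nodes $\pm\sqrt{d}\,e_i$ with equal weights $1/(2d)$); whether this matches the exact variant of Section 2.6.3 is an assumption, and the function names are illustrative.

```python
import math

def gauss_hermite_nodes(n_countries, q=3):
    """Tensor-product node count: q^(N+1) for N idiosyncratic + 1 aggregate shock."""
    return q ** (n_countries + 1)

def stroud3_nodes(dim):
    """Degree-3 monomial rule for a standard normal in `dim` dimensions:
    nodes +/- sqrt(dim) * e_i with equal weights 1 / (2 * dim)."""
    r = math.sqrt(dim)
    nodes = []
    for i in range(dim):
        for sign in (1.0, -1.0):
            node = [0.0] * dim
            node[i] = sign * r
            nodes.append(node)
    weights = [1.0 / (2 * dim)] * (2 * dim)
    return nodes, weights

# Reproduce the two quadrature columns of the scaling table
for n in (2, 5, 10, 50, 100):
    nodes, _ = stroud3_nodes(n + 1)
    print(n, gauss_hermite_nodes(n), len(nodes))
```

The rule reproduces the mean and variance of each shock exactly, which is what a degree-3 rule needs to do, while the node count grows only linearly in the shock dimension.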

The neural network maps the full state vector $\boldsymbol{s} = (k^1,\ldots,k^N, z^1,\ldots,z^N) \in \mathbb{R}^{2N}$ to all $2N+1$ policy variables $(k^{1\prime},\ldots,k^{N\prime}, \lambda, \mu^1,\ldots,\mu^N)$ simultaneously through the small Swish--softplus network in Figure 3.3.

Figure 3.3: Reference network architecture used for the $N$-country IRBC model. The diagram shows the irreversible companion notebook (lecture_04_02_IRBC_DEQN_irreversible.ipynb): two hidden layers of 64 Swish units mapping the $2N$-dimensional state to a $(2N+1)$-dimensional output ($N$ capital choices, the resource-constraint multiplier $\lambda$, and the $N$ irreversibility multipliers $\mu^j$); softplus on the $\lambda$ and $\mu^j$ heads enforces non-negativity, and capital choices use the bounded growth head described below. The smooth-benchmark companion (lecture_04_01_IRBC_DEQN_smooth.ipynb) drops the $\mu^j$ block, leaving an $(N+1)$-dimensional output head and no Fischer--Burmeister residual; in both notebooks the capital head is parameterized as the bounded log-growth $k_{t+1}^j = k_t^j\exp\{\bar g\,\tanh r_j(\boldsymbol{s})\}$ (smooth) or the additive form $k_{t+1}^j = (1-\delta)k_t^j + \mathrm{softplus}(r_j)$ (irreversible), both of which keep $k_{t+1}^j > 0$ by construction.

The hidden layers use the Swish activation $\mathrm{swish}(x) = x \cdot \sigma(x)$, while the output layer employs the softplus function $\ln(1+e^x)$ to keep the multipliers and capital choice positive. Two approximation caveats deserve emphasis. First, $\mathrm{softplus}(x) > 0$ for all $x$, so the multipliers $\mu^j$ are strictly positive rather than exactly zero when the constraint is slack; complementarity is enforced only approximately. Second, irreversibility requires $I^j = k^{j\prime} - (1-\delta)k^j \geq 0$; a softplus on $k^{j\prime}$ alone does not enforce this, since the network can output a positive $k^{j\prime}$ that nonetheless implies negative investment. A cleaner alternative is to output investment directly via $I^j = \mathrm{softplus}(r^j)$ and set $k^{j\prime} = (1-\delta)k^j + I^j$, which hard-enforces the constraint by construction.
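The two capital parameterizations mentioned in this chapter can be compared on arbitrary raw outputs to see that both keep capital feasible. This is a framework-free sketch in plain Python rather than the notebooks' TensorFlow code; the bound `g_bar = 0.1` is a placeholder value, not a number taken from the source.

```python
import math

def softplus(x):
    """Numerically stable ln(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def capital_head_additive(k, r, delta=0.01):
    """Irreversible form: I = softplus(r) >= 0, k' = (1 - delta) * k + I."""
    investment = softplus(r)
    return (1.0 - delta) * k + investment, investment

def capital_head_log_growth(k, r, g_bar=0.1):
    """Smooth form: k' = k * exp(g_bar * tanh(r)); g_bar = 0.1 is a placeholder."""
    return k * math.exp(g_bar * math.tanh(r))

raw_outputs = [-50.0, -1.0, 0.0, 1.0, 50.0]   # arbitrary raw network outputs
additive = [capital_head_additive(1.0, r) for r in raw_outputs]
growth = [capital_head_log_growth(1.0, r) for r in raw_outputs]
print(additive)
print(growth)
```

In the additive form investment can approach zero but never turn negative, while the log-growth form bounds gross capital growth between $\exp(-\bar g)$ and $\exp(\bar g)$ without imposing irreversibility.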

The total DEQN loss aggregates the equilibrium conditions. In the smooth benchmark (companion notebook lecture_04_01_IRBC_DEQN_smooth.ipynb) only the Euler and aggregate-resource-constraint residuals appear:

$$
\ell^{\mathrm{smooth}}_\rho = \frac{1}{N_s} \sum_{i=1}^{N_s} \left[ \sum_{j=1}^{N} \bigl(\mathrm{Euler}^j(\boldsymbol{s}_i)\bigr)^2 + \bigl(\mathrm{ARC}(\boldsymbol{s}_i)\bigr)^2 \right]. \tag{3.19}
$$

The irreversibility extension (companion notebook lecture_04_02_IRBC_DEQN_irreversible.ipynb) augments (3.19) with the Fischer--Burmeister complementarity block:

$$
\ell^{\mathrm{irrev}}_\rho = \ell^{\mathrm{smooth}}_\rho \;+\; \frac{1}{N_s}\sum_{i=1}^{N_s} \sum_{j=1}^{N} \bigl(\mathrm{FB}^j(\boldsymbol{s}_i)\bigr)^2, \tag{3.20}
$$

where $N_s$ is the number of training states. When the individual loss components differ in magnitude across countries (which is typical when countries differ in size or calibration), an adaptive loss-balancing scheme from Chapter 4 (e.g., ReLoBRaLo, SoftAdapt, GradNorm) can be applied to reweight the components during training.
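The two loss definitions differ only in whether the Fischer--Burmeister block is added. The sketch below works on plain lists of precomputed residuals; the function name and the numbers are made up for illustration and do not come from the notebooks.

```python
def deqn_loss(euler, arc, fb=None):
    """Mean over sampled states of the summed squared residuals.

    euler: list (one entry per state) of lists with N Euler residuals
    arc:   list with one aggregate-resource-constraint residual per state
    fb:    optional list of lists with N Fischer-Burmeister residuals;
           leave as None for the smooth benchmark
    """
    n_states = len(arc)
    total = 0.0
    for i in range(n_states):
        total += sum(e * e for e in euler[i]) + arc[i] ** 2
        if fb is not None:
            total += sum(f * f for f in fb[i])
    return total / n_states

# Two sampled states, two countries (made-up residuals for illustration)
euler = [[0.1, -0.2], [0.0, 0.1]]
arc = [0.05, -0.05]
fb = [[0.0, 0.01], [0.02, 0.0]]
loss_smooth = deqn_loss(euler, arc)
loss_irrev = deqn_loss(euler, arc, fb)
print(loss_smooth, loss_irrev)
```

An adaptive balancing scheme would multiply the three blocks by separate weights before summing; the unweighted version shown here corresponds to the plain definitions above.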

3.4.0.2 Representative implementation.

The architecture is a 2-hidden-layer Swish network with a softplus output head. In the smooth benchmark the head has dimension $N + 1$ (the $N$ capital choices and the resource-constraint multiplier $\lambda$); in the irreversible extension the head expands to $2N + 1$, adding the irreversibility multipliers $\mu^j \ge 0$ (softplus enforces non-negativity by construction). Only the irreversible loss carries a non-textbook line, the Fischer--Burmeister smoothing of the complementarity $0 \le \mu^j \perp I^j \ge 0$:

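```python
import tensorflow as tf

def fischer_burmeister(mu, I, eps=1e-4):
    # Smoothed FB residual: approximately zero iff mu >= 0, I >= 0, and mu * I = 0.
    return mu + I - tf.sqrt(mu**2 + I**2 + eps**2)
```

Program 1: Fischer--Burmeister smoothing of $\mu \perp I$ (irreversible companion notebook only).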

This residual is squared elementwise and averaged across the mini-batch and across the $N$ countries, in line with the squared-residual treatment of the Euler and ARC blocks; the elementwise square is what makes the gradient field push iterates toward the complementarity axes (see Figure 3.1). Inside the per-batch cost function of the irreversible notebook, it enters alongside the aggregate-resource-constraint residual and the Euler-equation residual, whose conditional expectation is handled by the Stroud-3 monomial rule of Section 2.6.3 with $2(N+1)$ nodes for the $N$ idiosyncratic shocks and the one aggregate shock. The smooth companion implements the same compute_cost pipeline with the $\mu^j$ outputs and the FB block removed.

3.5 Persistent-Simulation Training

The companion notebooks train the IRBC DEQN with a single training pipeline: a continuing ensemble of stochastic trajectories that evolves alongside the policy network. There is no Phase 1 / Phase 2 switch and no reset to the steady state between training segments.

What makes the single-pipeline approach feasible is that both companion notebooks parameterize the policy so that capital cannot leave the feasible set, even at random initialization. In the smooth notebook the network outputs a bounded log-growth term, $k_{t+1}^j = k_t^j\exp\{\bar g\,\tanh r_j(\boldsymbol{s})\}$, which keeps $k_{t+1}^j$ strictly positive and per-period capital growth bounded by $\exp\{\pm\bar g\}$. In the irreversible notebook the policy network outputs an investment fraction shaped by a sigmoid head and the law of motion $k_{t+1}^j = (1-\delta)k_t^j + I^j$ is hard-coded with $I^j \ge 0$. Either choice removes the reason historical implementations needed a uniform-sampling burn-in: the simulation cannot diverge.

A SAMPLING_MODE switch (simulation vs exogenous) is exposed for ablation studies and debugging: exogenous sampling on a wide box can be useful to confirm that a finding is not an artefact of the ergodic set. The default simulation mode, however, runs for the entire training horizon without a phase change.

A typical schedule on the two-country benchmark uses $M = 10$ trajectories of length $T = 256$ per segment, a batch size of 256, and one or a small number of optimizer passes per segment, with Adam at learning rate $\eta \sim 10^{-3}$ and a cosine decay; convergence is read off the diagnostics of the next section rather than off a phase-transition criterion. As a budgeting reference, the companion notebooks typically run on the order of 200--500 training segments before mean Euler errors drop below $10^{-3}$ on a held-out trajectory.
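A minimal sketch of the persistent-simulation loop, assuming the shock process stated in Section 3.2. The names `simulate_segment`, `toy_policy`, and `cosine_lr`, the values of `RHO_Z` and `SIGMA_E`, and the number of segments are illustrative assumptions; the optimizer step itself is left as a comment because it depends on the network and cost function.

```python
import math
import numpy as np

def toy_policy(k, z, g_bar=0.1):
    """Hypothetical stand-in for the policy network (bounded log-growth form)."""
    return k * np.exp(g_bar * np.tanh(z - np.log(k)))

def simulate_segment(k, z, policy_fn, T, rho_z, sigma_e, rng):
    """Advance an (M, N) ensemble by T periods; return visited states and the final state."""
    M, N = k.shape
    visited = np.empty((T, M, 2 * N))
    for t in range(T):
        visited[t] = np.concatenate([k, z], axis=1)
        k = policy_fn(k, z)
        eps_idio = rng.standard_normal((M, N))   # country-specific innovations
        eps_agg = rng.standard_normal((M, 1))    # one aggregate innovation per trajectory
        z = rho_z * z + sigma_e * (eps_idio + eps_agg)
    return visited.reshape(T * M, 2 * N), k, z

def cosine_lr(step, total_steps, eta0=1e-3):
    """Cosine decay from eta0 to zero."""
    return 0.5 * eta0 * (1.0 + math.cos(math.pi * step / total_steps))

M, T, N, BATCH = 10, 256, 2, 256
RHO_Z, SIGMA_E = 0.95, 0.01   # assumed values, not taken from the text
rng = np.random.default_rng(0)
k, z = np.ones((M, N)), np.zeros((M, N))

n_segments = 3
segment_starts, segment_ends = [], []
for seg in range(n_segments):
    states, k, z = simulate_segment(k, z, toy_policy, T, RHO_Z, SIGMA_E, rng)
    segment_starts.append(states[:M])                      # states at t = 0 of this segment
    segment_ends.append(np.concatenate([k, z], axis=1))    # carried into the next segment
    lr = cosine_lr(seg, n_segments)
    perm = rng.permutation(len(states))
    batches = [states[perm[i:i + BATCH]] for i in range(0, len(states), BATCH)]
    # A real run would take one optimizer pass over `batches` here at rate `lr`.
```

Each segment starts exactly where the previous one ended, so the ensemble is never reset to the steady state; with $M = 10$ and $T = 256$ a segment yields 2560 states, or ten mini-batches of 256.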

3.6Results and Scalability

The DEQN approach has been successfully applied to IRBC models with up to $N=100$ countries (200 state variables, 201 policy outputs), producing equilibrium errors below $10^{-3}$ in all Euler equations, a level comparable to the best existing solution methods at a fraction of the computational cost, while substantially mitigating curse-of-dimensionality effects in practice.

3.6.0.1Convergence diagnostics.

The quality of the DEQN solution is assessed using several complementary diagnostics:

  1. Euler equation errors: For each country $j$, compute $\max_{\bm{s} \in \mathcal{S}_\mathrm{test}} |\mathrm{Euler}^j(\bm{s})|$. Errors below $10^{-3}$ indicate that the optimality condition is violated by less than 0.1% of consumption, an acceptable tolerance for most applications.

  2. Resource constraint residual: Verify that $|\mathrm{ARC}(\bm{s})| < 10^{-4}$ on the test set.

  3. Complementarity check (irreversible companion only): Confirm that $\mathrm{FB}^j \approx 0$ and that the multiplier $\mu^j$ is positive only when investment is at its lower bound.

  4. Economic diagnostics: Verify that the ergodic distribution of capital, output, and consumption has sensible properties (e.g., positive trade balances for productive countries, capital flowing to high-productivity states).

  5. Policy-drift / time-invariance check: Evaluate the policy on a fixed anchor cloud X_anchor after each monitoring interval and report policy_drift_rms and policy_drift_max. The architecture has no calendar-time input, so any fixed weight vector is a stationary recursive policy by construction; the empirical question is whether SGD has stopped moving the policy function. The run is treated as time-invariant once both drift statistics fall below the prescribed tolerances TIME_INVARIANCE_TOL_RMS and TIME_INVARIANCE_TOL_MAX.

  6. Zero-shock stochastic steady state (SSS): Iterate the learned policy from ZERO_SHOCK_N_STARTS dispersed feasible starts with all shocks set to zero. A well-trained policy converges to a common point with $I^j \approx \delta\,k^j$ and (in the irreversible case) $\mu^j \approx 0$; the SSS is a fixed point of the learned stochastic policy that is not imposed during training.
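The policy-drift check in item 5 reduces to two norms of the change in policy outputs on the anchor cloud. The sketch below assumes that drift is measured in absolute (not relative) terms and uses placeholder tolerance values; only the names `policy_drift_rms`, `policy_drift_max`, `TIME_INVARIANCE_TOL_RMS`, and `TIME_INVARIANCE_TOL_MAX` come from the text.

```python
import numpy as np

TIME_INVARIANCE_TOL_RMS = 1e-4   # assumed placeholder value
TIME_INVARIANCE_TOL_MAX = 1e-3   # assumed placeholder value

def policy_drift(policy_prev, policy_curr):
    """RMS and max absolute change of policy outputs evaluated on the same X_anchor."""
    diff = np.asarray(policy_curr) - np.asarray(policy_prev)
    policy_drift_rms = float(np.sqrt(np.mean(diff**2)))
    policy_drift_max = float(np.max(np.abs(diff)))
    return policy_drift_rms, policy_drift_max

def is_time_invariant(drift_rms, drift_max):
    return drift_rms < TIME_INVARIANCE_TOL_RMS and drift_max < TIME_INVARIANCE_TOL_MAX

# Toy snapshots: policy outputs on 5 anchor states with 3 outputs each.
snapshot_a = np.zeros((5, 3))
snapshot_b = snapshot_a + 1e-5          # policy barely moved between checks
snapshot_c = snapshot_a.copy()
snapshot_c[0, 0] = 1e-2                 # one output still moving noticeably

stable = is_time_invariant(*policy_drift(snapshot_a, snapshot_b))
moving = is_time_invariant(*policy_drift(snapshot_a, snapshot_c))
print(stable, moving)
```

Reporting both statistics matters because a single anchor point that is still moving can hide behind a small RMS value, as the third snapshot shows.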

3.6.1Validation Protocol

To keep the manuscript self-contained, we summarize here the validation diagnostics used for the IRBC model:

  1. Held-out residual table. Evaluate mean and max absolute residuals on an out-of-sample test set for each equation block (Euler and ARC always; FB only in the irreversible companion). In the two-country benchmark, typical values are mean $\sim 10^{-4}$ and max $\sim 10^{-3}$ for Euler/ARC, with smaller FB residuals.

  2. Euler-side comparison. Compare left and right sides of the Euler equation directly on the test set (scatter around the 45-degree line). Target thresholds are mean relative error below $10^{-3}$ and max relative error below $10^{-2}$.

  3. Constraint diagnostics (irreversible companion only). Verify $I^j \ge 0$ everywhere and that $(\mu^j, I^j)$ lies close to the complementarity axes ($\mu^j \approx 0$ when $I^j > 0$).

  4. Economic sanity checks. Confirm market-wide accounting identities (e.g., trade balances summing to zero), sensible consumption-sharing behavior, and stable ergodic state distributions around economically plausible regions.

  5. Policy-drift / time-invariance check. Track policy_drift_rms and policy_drift_max on a fixed anchor cloud across training segments; flag the run as time-invariant once both drop below the prescribed tolerances. This check distinguishes “the policy has stabilized” from “the residuals are small”; both are needed for a trustworthy recursive solution.

  6. Zero-shock stochastic steady state. Simulate the learned policy with all shocks set to zero from several dispersed feasible initial states. Convergence to a single point with $I^j \approx \delta\,k^j$ (and $\mu^j \approx 0$ in the irreversible case) is a coordinate-free sanity check that complements the held-out residual table.

This protocol makes solution quality auditable and comparable across model sizes and network configurations.
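As an illustration of the zero-shock check in item 6, the sketch below iterates a policy from dispersed starts with all innovations set to zero and then measures the spread of the end points and the gap between investment and depreciation. `toy_policy` is a hypothetical stand-in for the trained network, and `DELTA`, `RHO_Z`, `K_STAR`, `G_BAR`, and the value of `ZERO_SHOCK_N_STARTS` are assumed for the example.

```python
import numpy as np

DELTA, RHO_Z, K_STAR, G_BAR = 0.1, 0.95, 1.0, 0.1   # assumed illustrative values
ZERO_SHOCK_N_STARTS = 8                              # assumed; the name is from the text

def toy_policy(k, z):
    """Hypothetical stand-in: bounded log-growth toward a productivity-dependent target."""
    target = np.log(K_STAR) + z
    return k * np.exp(G_BAR * np.tanh(5.0 * (target - np.log(k))))

def zero_shock_sss(policy_fn, k0, z0, n_steps=2000):
    """Iterate the policy with every innovation set to zero."""
    k, z = k0.copy(), z0.copy()
    for _ in range(n_steps):
        k_next = policy_fn(k, z)
        z = RHO_Z * z            # productivity decays deterministically
        k = k_next
    investment = policy_fn(k, z) - (1.0 - DELTA) * k   # implied by the law of motion
    return k, z, investment

rng = np.random.default_rng(0)
k0 = rng.uniform(0.5, 1.5, size=(ZERO_SHOCK_N_STARTS, 2))
z0 = rng.uniform(-0.05, 0.05, size=(ZERO_SHOCK_N_STARTS, 2))

k_end, z_end, inv_end = zero_shock_sss(toy_policy, k0, z0)
spread = float(np.max(k_end.max(axis=0) - k_end.min(axis=0)))   # disagreement across starts
sss_gap = float(np.max(np.abs(inv_end - DELTA * k_end)))        # distance from I = delta * k
print(spread, sss_gap)
```

For a trained network, `toy_policy` would be replaced by a forward pass; a large `spread` would indicate that different starts settle at different points, and a large `sss_gap` that the end point is not a rest point of the law of motion.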

3.6.1.1Policy function properties.

The learned policy functions exhibit the expected economic properties. Consumption sharing follows the Pareto-weight and IES structure in (3.12): holding the common shadow price $\lambda_t$ fixed, a higher Pareto weight raises country $j$’s consumption, and with heterogeneous IES the consumption ratio varies with $\lambda_t$. Productivity affects consumption only indirectly through the equilibrium shadow price and the resource constraint, not through a mechanical bilateral ratio $z^j/z^k$. This is the textbook complete-markets prediction: the cross-country consumption ratio depends on the Pareto weights $\tau^j/\tau^k$ and the IES gap, not on the productivity differential. The empirical failure of this prediction is the consumption-correlation puzzle introduced in Section 3.1; a closely related but distinct failure is the Backus--Smith puzzle, which concerns the correlation between relative consumption growth and the real exchange rate, predicted to be near one under complete markets but empirically near zero or even negative. Any model that aims to reproduce either puzzle has to break some of the assumptions used here (e.g. by restricting the asset menu, Heathcote & Perri (2002), or adding non-traded goods, Heathcote & Perri (2013)).

Investment responds procyclically to productivity shocks: a high realization of $z^j$ raises the marginal product of capital in country $j$, triggering increased investment. When the irreversibility constraint binds ($I^j = 0$), capital cannot be disinvested and the multiplier $\mu^j$ becomes positive; the network learns this regime-switching behavior smoothly through the Fischer--Burmeister loss.

Trade balances adjust to channel resources toward productive countries: positive trade balances (net exports of goods) correspond to countries whose current productivity exceeds the average, and the implied capital flows are consistent with standard international macroeconomic theory.

The key advantage of the DEQN approach is its scaling behavior. Traditional Cartesian grid-based methods (Judd, 1998) exhibit exponential growth in computation time as $N$ increases, and even adaptive sparse grid methods (Brumm & Scheidegger, 2017), which significantly mitigate the curse of dimensionality, become computationally demanding for $N > 10$. DEQN runtimes in our implementations, by contrast, are reported close to linear in $N$ over a broad range of model sizes (see Azinovic et al. (2022), Table 2 and surrounding discussion, for timings across $N \in \{2, \ldots, 100\}$). This favorable empirical scaling arises because the network’s parameter count grows roughly linearly (more input/output neurons), while each SGD step avoids state-space grids. The companion notebooks (lecture_04_01_IRBC_DEQN_smooth.ipynb and lecture_04_02_IRBC_DEQN_irreversible.ipynb) only run the $N=2$ case, so the linear-scaling claim cannot be reproduced from the in-class material; readers who wish to verify it directly should consult the published timings or replicate the larger-$N$ runs from the Azinovic et al. codebase.
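The parameter-count part of this argument can be checked with simple arithmetic. The sketch below counts the weights and biases of a fully connected network with $2N$ inputs and $2N+1$ outputs (matching the 200 states and 201 outputs quoted for $N=100$) and contrasts it with the node count of a Cartesian grid. The hidden width, the number of hidden layers, and the five grid points per dimension are assumptions for illustration; the sketch says nothing about runtime itself.

```python
def mlp_param_count(n_countries, hidden_width=128, n_hidden=2):
    """Weights plus biases of a dense network with 2N inputs and 2N + 1 outputs."""
    d_in = 2 * n_countries           # capital and productivity for each country
    d_out = 2 * n_countries + 1      # output count of the irreversible variant
    sizes = [d_in] + [hidden_width] * n_hidden + [d_out]
    return sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))

def cartesian_grid_points(n_countries, points_per_dim=5):
    """Nodes of a tensor-product grid over the 2N-dimensional state space."""
    return points_per_dim ** (2 * n_countries)

for n in (2, 10, 100):
    grid_digits = len(str(cartesian_grid_points(n)))
    print(f"N={n:3d}  network parameters={mlp_param_count(n):6d}  grid nodes ~ 10^{grid_digits - 1}")
```

With the hidden layers held fixed, each additional country adds the same number of parameters, whereas each additional country multiplies the grid size by a constant factor.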

3.6.1.2Comparison with adaptive sparse grids.

The approach of Brumm & Scheidegger (2017) handles kinks in the policy function (e.g., those induced by the irreversibility constraint) by refining the grid locally around the kink using hierarchical surplus indicators. This keeps the method accurate but the grid remains anchored to a hypercube, so computation still scales poorly once the number of active kinks or the dimensionality grows. DEQNs do not represent kinks by grid refinement; instead, they fit a smooth approximator (Swish/softplus network) to the Fischer--Burmeister-regularized problem, which produces a globally smooth policy that tracks the true piecewise structure without needing localized grid points. The two methods are therefore complementary: adaptive sparse grids give deterministic error bounds on a hypercube; DEQNs give simulation-based error bounds on the ergodic set with no grid at all. From a theoretical perspective, Montanelli & Du (2019) establish error bounds showing that deep ReLU networks can approximate functions on sparse grids without the exponential growth in parameters that afflicts classical polynomial methods, providing formal underpinning for why deep learning can mitigate (though not eliminate) the high-dimensional approximation cost. Exact runtimes depend on architectural choices, quadrature design, and hardware; the robust finding is that the DEQN formulation avoids explicit tensor-product state grids and remains computationally viable in dimensions where standard methods become prohibitively expensive.

Beyond the IRBC setting, closely related neural-equilibrium methods have been applied to other policy-relevant problems. Nuño et al. (2024) use DEQNs to compute optimal monetary policy rules under persistent supply shocks, replacing the linearization step around steady state with a globally trained policy network. Bretscher et al. (2022) apply DEQN to multi-country international real business cycles with comparative advantage. Most recently, Azinovic-Yang & Žemlička (2025) replace the endogenous cross-sectional state with a truncated history of exogenous aggregate shocks (the sequence-space representation), so that the network’s input dimension scales with the truncation horizon rather than with the number of agents, which is the heterogeneous-agent extension developed in Chapter 6.

3.7Further Reading

3.8Exercises

Worked solutions and guidance for these exercises appear in Appendix F.

References
  1. Backus, D. K., Kehoe, P. J., & Kydland, F. E. (1992). International Real Business Cycles. Journal of Political Economy, 100(4), 745–775.
  2. Heathcote, J., & Perri, F. (2002). Financial Autarky and International Business Cycles. Journal of Monetary Economics, 49(3), 601–627.
  3. Heathcote, J., & Perri, F. (2013). The International Diversification Puzzle Is Not As Bad As You Think. Journal of Political Economy, 121(6), 1108–1159.
  4. Brumm, J., & Scheidegger, S. (2017). Using Adaptive Sparse Grids to Solve High-Dimensional Dynamic Models. Econometrica, 85(5), 1575–1612. 10.3982/ECTA12216
  5. Azinovic, M., Gaegauf, L., & Scheidegger, S. (2022). Deep Equilibrium Nets. International Economic Review, 63(4), 1471–1525. 10.1111/iere.12575
  6. Judd, K. L. (1998). Numerical Methods in Economics. MIT Press.
  7. Montanelli, H., & Du, Q. (2019). New Error Bounds for Deep ReLU Networks Using Sparse Grids. SIAM Journal on Mathematics of Data Science, 1(1), 78–92.
  8. Nuño, G., Renner, P., & Scheidegger, S. (2024). Monetary policy with persistent supply shocks [Techreport]. CESifo Working Paper Series.
  9. Bretscher, L., Fernández-Villaverde, J., & Scheidegger, S. (2022). Ricardian Business Cycles [SSRN Scholarly Paper]. 10.2139/ssrn.4278274
  10. Azinovic-Yang, M., & Žemlička, J. (2025). Deep Learning in the Sequence Space. 10.48550/arXiv.2509.13623
  11. Pichler, P. (2011). Solving the multi-country real business cycle model using a monomial rule Galerkin method. Journal of Economic Dynamics and Control, 35(2), 240–251.
  12. Niederreiter, H. (1992). Random Number Generation and Quasi-Monte Carlo Methods (Vol. 63). SIAM.