Do banks need belts and braces?

Marcus Buckmann, Paula Gallego Marquez, Mariana Gimpelewicz and Sujit Kapadia

Bank failures are very costly for society. Following the 2007/2008 global financial crisis, international regulators introduced a package of new banking regulations, known as Basel III. This includes a wider range of capital and liquidity requirements to protect banks from different risks. But could the additional complexity be unnecessary or even increase risks, as some have argued? In a recent staff working paper, we assess the value of multiple regulatory requirements by examining how well different combinations of metrics, measured before the 2007/2008 crisis, would have identified the banks that subsequently failed. Our results generally support the case for a small portfolio of different regulatory metrics: having belts and braces (or suspenders) can strengthen the resilience of the banking system.

Why do we need different regulatory measures for banks?

Before the global financial crisis, regulation relied heavily on a single metric to measure a bank’s health – the risk-weighted capital ratio, which is the ratio of a bank’s capital (or capacity to absorb losses) relative to a measure of its assets that takes into account their riskiness. But there are several reasons why using multiple regulatory requirements could be a good idea. For example, the Tinbergen Rule tells us that every policy objective should be addressed using a distinct policy tool. You can think of this as packing for a ski trip – you want to pack a coat to avoid frostbite, sunscreen to prevent sunburn and a helmet to protect against injury. Basel III applies that spirit to banking regulation, targeting different vulnerabilities using different regulatory requirements, including:

  • The risk-weighted capital ratio (RWCR) was retained with some adjustments made to the methods for calculating the riskiness of different assets, alongside a more stringent calibration overall. All else equal, it asks banks to have more capital if they perform riskier activities.
  • The leverage ratio (LR) is the simple ratio of a bank’s capital to its assets. It does not take into account banks’ riskiness but rather restricts the overall size of bank balance sheets and limits excessive debt in the system.
  • The net stable funding ratio (NSFR) requires that banks have sufficient deposits and other funding that is stable over a one-year period relative to the (typically long-term) maturity of their assets. It helps ensure banks can meet their medium-term obligations.
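To make the three definitions above concrete, here is a minimal sketch with entirely hypothetical balance-sheet figures (the numbers are illustrative, not drawn from any real bank or from our dataset):

```python
# Illustrative toy bank (all figures in $bn, chosen purely for illustration).
capital = 8.0
total_assets = 200.0
risk_weighted_assets = 90.0        # assets scaled by regulatory risk weights
available_stable_funding = 150.0   # funding judged stable over one year
required_stable_funding = 140.0    # stable funding the asset mix requires

# The three Basel III metrics described above.
leverage_ratio = capital / total_assets                       # LR: 4.0%
risk_weighted_capital_ratio = capital / risk_weighted_assets  # RWCR: ~8.9%
nsfr = available_stable_funding / required_stable_funding     # NSFR: ~107%

print(f"LR: {leverage_ratio:.1%}, "
      f"RWCR: {risk_weighted_capital_ratio:.1%}, "
      f"NSFR: {nsfr:.1%}")
```

Note how the same bank can look comfortable on one metric and thin on another: here the RWCR is higher than the LR only because risk weights shrink the asset base.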

Another reason for using multiple requirements is that there is uncertainty about the world and banking regulation. For example, measuring risk for setting capital requirements relies on models, which in turn rely on good historical data to tell us what might happen in the next crisis. But in practice models are prone to errors and, even with perfect historical data, some risks may simply be unknowable and the next crisis could be unprecedented – who in 2019 would have predicted a global pandemic during 2020?

Using multiple metrics for capital and liquidity could seem redundant, but it helps insure against these uncertainties. When driving, for example, we have no way of knowing if we might suffer a road accident – or how severe this accident could be. So we use a seatbelt and an airbag, even if their objectives overlap, as neither of them provides perfect protection.

But we need to understand whether the increase in complexity is justified. Is each additional requirement adding protection (like seatbelts and airbags) or are some redundant (like belts and braces)? 

Testing how useful regulations are

In our paper, we use simple rules of thumb to compare how different regulatory metrics and combinations of metrics could have helped predict which banks subsequently failed during the global financial crisis. We apply the rules to the capital and liquidity ratios of 76 large banks globally (each with over $100 billion in assets) in 2006.

To illustrate these simple rules, let’s start with an individual metric, say the LR. We set a threshold, for example:

if the LR is less than 3%, we predict that the bank failed during the crisis.

To see how well this works, we look at how many banks had an LR below 3% before the crisis and check how many of those failed and how many survived. To illustrate this, we can think of the now all-too-familiar efficacy of Covid-19 tests. The proportion of people with Covid-19 who test positive (or, in our case, banks that had an LR below 3% and failed) is the hit rate. And the proportion of people without Covid-19 who nonetheless test positive (or, in our case, banks that had an LR below 3% but survived) is the false alarm rate.

A perfect rule has a hit rate of 100% and a false-alarm rate of 0%. In reality, our rules are not perfect. At the extreme, we could set a threshold so high it would predict all banks failed, so we would have a hit rate of 100%, but also a very high false-alarm rate. So we vary the thresholds and look at the trade-off between hits and false alarms.
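The mechanics of a single-metric rule can be sketched in a few lines. This is a minimal illustration using made-up leverage ratios and outcomes, not the paper's actual data or results:

```python
# Hypothetical 2006 leverage ratios (%) and crisis outcomes (True = failed).
lrs    = [2.1, 2.8, 3.4, 4.0, 2.5, 5.1, 3.9, 2.9]
failed = [True, True, False, False, True, False, False, False]

def rule_performance(values, outcomes, threshold):
    """Flag a bank as 'predicted to fail' if its metric is below the
    threshold, then return the rule's (hit rate, false alarm rate)."""
    flagged = [v < threshold for v in values]
    hits = sum(f and o for f, o in zip(flagged, outcomes))
    false_alarms = sum(f and not o for f, o in zip(flagged, outcomes))
    n_failed = sum(outcomes)
    n_survived = len(outcomes) - n_failed
    return hits / n_failed, false_alarms / n_survived

hit_rate, false_alarm_rate = rule_performance(lrs, failed, threshold=3.0)
print(f"hit rate: {hit_rate:.0%}, false alarm rate: {false_alarm_rate:.0%}")
# In this toy sample, a 3% threshold catches every failure (hit rate 100%)
# but also flags one survivor (false alarm rate 20%).
```

Raising the threshold moves both rates up together, which is exactly the trade-off described above.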

Let’s now examine a portfolio of three Basel III metrics: the LR, RWCR and NSFR (Basel III also introduced a liquidity coverage ratio, but we are unable to consider it due to insufficient data). We can suppose that if a bank dips below at least one threshold, it will fail. For example:

if the LR is less than 3%, or
the RWCR is less than 8.5% or
the NSFR is less than 100%,
we predict that the bank failed during the crisis.

But rather than pre-specifying particular thresholds, we flip the problem on its head: we look for the combination of thresholds that gives the fewest false alarms for a given desired hit rate, i.e. we find thresholds to achieve every hit rate between 5% and 95% with the lowest possible false alarm rate.

What can World War II radar technology tell us about regulatory metrics?

We evaluate the performance of our rules using a receiver operating characteristic (ROC) curve. The ROC curve is a type of plot that compares the hit rates (vertical axis) and false alarm rates (horizontal axis) of different rules at different target hit rates. The technique was first used during World War II to improve the detection of Japanese aircraft using radar signals in the wake of Pearl Harbor, but has since been applied in many other fields of research. A perfect rule would appear in the top-left of Chart 1, while any rule above the 45-degree line is better than a random guess.
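A ROC curve for a single metric can be traced by sweeping the threshold and recording the resulting (false alarm rate, hit rate) pair at each step. This sketch reuses the same synthetic leverage ratios as before, not our actual dataset:

```python
# Hypothetical leverage ratios (%) and crisis outcomes (True = failed).
lrs    = [2.1, 2.8, 3.4, 4.0, 2.5, 5.1, 3.9, 2.9]
failed = [True, True, False, False, True, False, False, False]

def rates(threshold):
    """(false alarm rate, hit rate) of the rule 'flag if LR < threshold'."""
    flagged = [v < threshold for v in lrs]
    hits = sum(f and o for f, o in zip(flagged, failed))
    fas = sum(f and not o for f, o in zip(flagged, failed))
    return fas / (len(failed) - sum(failed)), hits / sum(failed)

# Sweep thresholds just above each observed value (plus zero), so every
# distinct point on the curve appears once, from strictest to most lenient.
roc = sorted({rates(t) for t in [v + 0.05 for v in lrs] + [0.0]})
for fa, hit in roc:
    print(f"false alarm rate: {fa:.0%}  hit rate: {hit:.0%}")
```

Each printed pair is one point on the ROC curve; plotting them with hit rate on the vertical axis gives the kind of curve shown in Chart 1.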

Chart 1: Hit rates and false alarm rates of individual metrics and a portfolio of all three metrics

Our experiment (see Chart 1) shows that, at hit rates below 80%, the LR and a portfolio of all three metrics perform equally well, with both out-performing the RWCR and NSFR on their own. But, crucially, the portfolio performs best at higher hit rates. And higher hit rates are likely to matter most for regulators and for society, since the systemic costs of a large bank failing far exceed, for example, the costs of unnecessarily scrutinising a bank incorrectly identified as vulnerable.

We also find another advantage of using the portfolio relative to using one metric: with the portfolio we need lower regulatory thresholds on each metric to achieve the same result. For example, as shown in Table A below, to achieve a hit rate of 85%, we would need to calibrate each requirement less stringently than when using any of the three metrics on their own, which benefits the system if tougher individual regulations are particularly costly.

Table A: Optimal threshold values to achieve an 85% hit rate

                              LR       RWCR     NSFR
Portfolio (LR, RWCR, NSFR)    4.15%    5.52%    76%
Individual LR                 5.00%    –        –
Individual RWCR               –        9.04%    –
Individual NSFR               –        –        109%

These results broadly hold when we use 2005 data. In the paper we also discuss different robustness exercises, including out-of-sample tests, and consider how alternative metrics, such as loan-to-deposit and market-based ratios, could help predict bank failure.

Do measures become less useful once you regulate them?

One interesting result is that the RWCR, the one measure that was regulated before the crisis, is the worst predictor – and we are not the first to find this. In contrast, the LR is the most powerful individual predictor. This could be a result of Goodhart’s law, which tells us that ‘when a measure becomes a target, it ceases to be a good measure’. This could happen if banks attempt to arbitrage individual regulations by, for example, engaging in transactions that are not classified as ‘risky’ by the RWCR, but which increase their leverage.

To dig into this, we re-run our experiment separately in North America – where, by exception, the LR was regulated before the crisis – and the rest of the world. Interestingly, in North America, the RWCR is a better predictor of bank failure than the LR. We find the opposite result in the rest of the world. Although this is consistent with Goodhart’s law, there could be other explanations, such as differences in risks across regions. In any case, a portfolio with both metrics can predict bank failure better than the best individual metric.

Belts and Braces?

Overall, Basel III may have introduced additional complexity, but we find the new regulatory requirements have complementary value when regulating banks. So perhaps regulatory metrics are not belts and braces after all.


Marcus Buckmann works in the Bank’s Advanced Analytics Division, Paula Gallego Marquez works in the Prudential Policy Division, Mariana Gimpelewicz works in the Resolution Division and Sujit Kapadia works for the European Central Bank.

If you want to get in touch, please email us at bankunderground@bankofengland.co.uk or leave a comment below.

Comments will only appear once approved by a moderator, and are only published where a full name is supplied. Bank Underground is a blog for Bank of England staff to share views that challenge – or support – prevailing policy orthodoxies. The views expressed here are those of the authors, and are not necessarily those of the Bank of England, or its policy committees.