Description
Abstract: This article uses reinforcement learning (RL) to approximate the policy rules of banks participating in a high-value payment system (HVPS). The objective of the RL agents is to learn a policy function for the amount of liquidity to provide to the system at the beginning of the day and the rate at which to settle intraday payments. Individual choices have complex strategic effects that preclude a closed-form solution of the optimal policy, except in simple cases. We show that, in a stylized two-agent setting, RL agents learn the optimal policy that minimizes the cost of processing their individual payments, without complete knowledge of the environment. We further demonstrate that, in more complex settings, both agents learn to reduce the cost of processing their payments and to respond effectively to the liquidity-delay tradeoff. Our results show the potential of RL to solve liquidity management problems in HVPS and provide new tools to assist policymakers in their mandate of ensuring the safety and improving the efficiency of payment systems.
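To give a flavor of the setup, a minimal sketch of the two-agent liquidity-choice game with independent stateless Q-learning might look like the following. This is not the paper's actual environment or algorithm; the liquidity levels, cost parameters, payment volumes, and the simple recycling rule for incoming liquidity are all illustrative assumptions.

```python
import random

# Stylized assumptions (not from the paper): each agent chooses an initial
# liquidity level; if the counterparty posts any liquidity, one incoming
# payment recycles and covers one of our own payments.
LIQUIDITY_CHOICES = [0, 1, 2]   # possible initial liquidity levels
LIQ_COST = 1.0                  # per-unit cost of posting liquidity (assumed)
DELAY_COST = 2.0                # per-unit cost of a delayed payment (assumed)
PAYMENTS = 2                    # payments each agent must send per "day"

def day_cost(own_liq, other_liq):
    """Cost of one day: liquidity cost plus delay cost for unfunded payments."""
    recycled = 1 if other_liq > 0 else 0
    delayed = max(0, PAYMENTS - own_liq - recycled)
    return LIQ_COST * own_liq + DELAY_COST * delayed

random.seed(0)
alpha, eps = 0.1, 0.2           # learning rate and exploration rate
Q = [{a: 0.0 for a in LIQUIDITY_CHOICES} for _ in range(2)]

for _ in range(5000):
    # Epsilon-greedy action choice; agents minimize cost, so greedy = argmin.
    acts = [random.choice(LIQUIDITY_CHOICES) if random.random() < eps
            else min(Q[i], key=Q[i].get)
            for i in range(2)]
    # Each agent updates its cost estimate from its own realized day cost.
    for i in range(2):
        c = day_cost(acts[i], acts[1 - i])
        Q[i][acts[i]] += alpha * (c - Q[i][acts[i]])

greedy = [min(q, key=q.get) for q in Q]
print(greedy)
```

Under these assumed costs, posting one unit of liquidity while relying on recycled incoming payments is the cheapest choice for each agent, so both learners converge to that action without ever observing the other's payoffs.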