Policy Iteration for Learning an Exercise Policy for American Options
Yuxi Li and Dale Schuurmans
| What | Talk |
|---|---|
| When | 2008-07-03 from 15:05 to 15:30 |
Options are important financial instruments whose prices are usually determined by computational methods. Computational finance is a compelling application area for reinforcement learning research: hard sequential decision-making problems abound there and have great practical significance. In this paper, we investigate reinforcement learning methods, in particular least squares policy iteration (LSPI), for the problem of learning an exercise policy for American options. We also investigate TVR, another policy iteration method. We compare LSPI and TVR with LSM, the standard least squares Monte Carlo method from the finance community, and evaluate their performance on both real and synthetic data. The results show that the exercise policies discovered by LSPI and TVR gain larger payoffs than those discovered by LSM on both real and synthetic data. Furthermore, for LSPI, TVR, and LSM, policies learned from real data generally gain larger payoffs than policies learned from simulated samples. Our work shows that solution methods developed in reinforcement learning can advance the state of the art in an important and challenging application area, and it demonstrates that computational finance remains an under-explored area for the deployment of reinforcement learning methods.
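For readers unfamiliar with the LSM baseline named above, the following is a minimal sketch of least squares Monte Carlo for an American put, in the spirit of the standard finance-community method. It is not the authors' implementation or experimental setup. The geometric Brownian motion price model, the quadratic basis in `features`, the in-the-money threshold of 10 paths, and all function names and parameter values are assumptions made for illustration only.

```python
import math
import random


def simulate_paths(s0, r, sigma, T, steps, n_paths, seed=0):
    """Simulate risk-neutral geometric Brownian motion paths (assumed price model)."""
    rng = random.Random(seed)
    dt = T / steps
    drift = (r - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    paths = []
    for _ in range(n_paths):
        s, path = s0, [s0]
        for _ in range(steps):
            s *= math.exp(drift + vol * rng.gauss(0.0, 1.0))
            path.append(s)
        paths.append(path)
    return paths


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= f * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def features(s, strike):
    """Quadratic basis in the normalized price (an illustrative choice)."""
    x = s / strike
    return [1.0, x, x * x]


def lsm_put_price(paths, strike, r, T):
    """Estimate an American put price by backward regression on simulated paths."""
    steps = len(paths[0]) - 1
    disc = math.exp(-r * T / steps)
    # Cash flow of each path if the option is held to maturity.
    cash = [max(strike - p[-1], 0.0) for p in paths]
    for t in range(steps - 1, 0, -1):
        cash = [c * disc for c in cash]  # discount future cash flows to time t
        itm = [i for i, p in enumerate(paths) if p[t] < strike]
        if len(itm) < 10:  # too few in-the-money paths to fit a regression
            continue
        # Least squares fit of discounted future cash flow on the basis,
        # via the normal equations (A^T A) w = A^T y.
        k = 3
        ata = [[0.0] * k for _ in range(k)]
        aty = [0.0] * k
        for i in itm:
            f = features(paths[i][t], strike)
            for u in range(k):
                aty[u] += f[u] * cash[i]
                for v in range(k):
                    ata[u][v] += f[u] * f[v]
        w = solve(ata, aty)
        # Exercise wherever the immediate payoff beats the fitted continuation value.
        for i in itm:
            f = features(paths[i][t], strike)
            continuation = sum(wi * fi for wi, fi in zip(w, f))
            exercise = strike - paths[i][t]
            if exercise > continuation:
                cash[i] = exercise
    return disc * sum(cash) / len(cash)


if __name__ == "__main__":
    demo_paths = simulate_paths(36.0, 0.06, 0.2, 1.0, 50, 2000, seed=0)
    print(lsm_put_price(demo_paths, 40.0, 0.06, 1.0))
```

The exercise rule at each step is itself a policy: exercise when the immediate payoff exceeds the estimated continuation value. Policy iteration methods such as LSPI target the same kind of decision rule but improve it iteratively rather than in a single backward pass.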




