We introduce a new algorithm for reinforcement learning called Maximum a posteriori Policy Optimisation (MPO) based on coordinate ascent on a relative entropy