References

Agarwal, Alekh, Nan Jiang, Sham M Kakade, and Wen Sun. 2022. Reinforcement Learning: Theory and Algorithms. https://rltheorybook.github.io/rltheorybook_AJKS.pdf.
Allaire, J. J., Charles Teague, Carlos Scheidegger, Yihui Xie, and Christophe Dervieux. 2024. “Quarto.” https://doi.org/10.5281/zenodo.5960048.
Baydin, Atilim Gunes, Barak A. Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind. 2018. “Automatic Differentiation in Machine Learning: A Survey.” February 5, 2018. https://doi.org/10.48550/arXiv.1502.05767.
Berkelaar, Frans. 2009. Container Ship MSC Davos - Westerschelde - Zeeland. https://www.flickr.com/photos/28169156@N03/52957948820/.
Boyd, Stephen, and Lieven Vandenberghe. 2004. Convex Optimization. Cambridge University Press. https://web.stanford.edu/~boyd/cvxbook/.
Deng, Li. 2012. “The MNIST Database of Handwritten Digit Images for Machine Learning Research [Best of the Web].” IEEE Signal Processing Magazine 29 (6): 141–42. https://doi.org/10.1109/MSP.2012.2211477.
GPA Photo Archive. 2017. Robotic Arm. https://www.flickr.com/photos/iip-photo-archive/36123310136/.
Guy, Romain. 2006. Chess. https://www.flickr.com/photos/romainguy/230416692/.
Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2013. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Science & Business Media. https://books.google.com?id=yPfZBwAAQBAJ.
James, Gareth, Daniela Witten, Trevor Hastie, Robert Tibshirani, and Jonathan Taylor. 2023. An Introduction to Statistical Learning: With Applications in Python. Springer Texts in Statistics. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-031-38747-0.
Lai, T. L., and Herbert Robbins. 1985. “Asymptotically Efficient Adaptive Allocation Rules.” Advances in Applied Mathematics 6 (1): 4–22. https://doi.org/10.1016/0196-8858(85)90002-8.
Nielsen, Michael A. 2015. Neural Networks and Deep Learning. Determination Press. http://neuralnetworksanddeeplearning.com/.
Ross, Stéphane, Geoffrey J. Gordon, and J. Bagnell. 2010. “A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning.” https://www.semanticscholar.org/paper/A-Reduction-of-Imitation-Learning-and-Structured-to-Ross-Gordon/79ab3c49903ec8cb339437ccf5cf998607fc313e.
Russell, Stuart J., and Peter Norvig. 2021. Artificial Intelligence: A Modern Approach. Fourth edition. Pearson Series in Artificial Intelligence. Hoboken: Pearson.
Schrittwieser, Julian, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, et al. 2020. “Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model.” Nature 588 (7839): 604–9. https://doi.org/10.1038/s41586-020-03051-4.
Schulman, John, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. “Proximal Policy Optimization Algorithms.” August 28, 2017. https://doi.org/10.48550/arXiv.1707.06347.
Silver, David, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, et al. 2016. “Mastering the Game of Go with Deep Neural Networks and Tree Search.” Nature 529 (7587): 484–89. https://doi.org/10.1038/nature16961.
Silver, David, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, et al. 2018. “A General Reinforcement Learning Algorithm That Masters Chess, Shogi, and Go Through Self-Play.” Science 362 (6419): 1140–44. https://doi.org/10.1126/science.aar6404.
Silver, David, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, et al. 2017. “Mastering the Game of Go Without Human Knowledge.” Nature 550 (7676): 354–59. https://doi.org/10.1038/nature24270.
Sussman, Gerald Jay, Jack Wisdom, and Will Farr. 2013. Functional Differential Geometry. Cambridge, MA: The MIT Press.
Sutton, Richard S., and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. Second edition. Adaptive Computation and Machine Learning Series. Cambridge, Massachusetts: The MIT Press. http://incompleteideas.net/book/RLbook2020trimmed.pdf.
Vershynin, Roman. 2018. High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press. https://books.google.com?id=NDdqDwAAQBAJ.
Williams, Ronald J. 1992. “Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning.” Machine Learning 8 (3): 229–56. https://doi.org/10.1007/BF00992696.