References
[1] Schacter,D.,Gilbert,D.,Wegner,D.,et al.Psychology:European Edition[M].Worth Publishers,2011.
[2] Mitchell,T.M..The Discipline of Machine Learning[R].Technical Report CMU-ML-06-108,Carnegie Mellon University,2006.
[3] Murphy,K.P..Machine Learning:A Probabilistic Perspective[M].MIT Press,Cambridge,MA,2012.
[4] Bishop,C.M..Pattern Recognition and Machine Learning (Information Science and Statistics)[M].Secaucus,NJ,USA:Springer-Verlag New York,Inc.,2006.
[5] Sutton,R.S.,Barto,A.G..Reinforcement Learning:An Introduction[M].Cambridge,MA:MIT Press,1998.
[6] Kaelbling,L.P.,Littman,M.L.,and Moore,A.W..Reinforcement Learning:A Survey[J].Journal of Artificial Intelligence Research,1996,4:237-285.
[7] Poole,D.,Mackworth,A.K..Artificial Intelligence:Foundations of Computational Agents[M].Cambridge University Press,2010.
[8] Kirk,D.E..Optimal Control Theory:An Introduction[M].Dover Publications,2004.
[9] Bertsekas,D.P..Dynamic Programming and Optimal Control:2nd Edition[M].Athena Scientific,1995.
[10] Sutton,R.S.,Barto,A.G.,and Williams,R.J..Reinforcement Learning is Direct Adaptive Optimal Control[J].IEEE Control Systems Magazine,1992,12(2):19-22.
[11] Busoniu,L.,Babuška,R.,De Schutter,B.,et al.Reinforcement Learning and Dynamic Programming Using Function Approximators[M].CRC Press,Inc.,2010.
[12] Chen Chunlin.Autonomous Learning and Navigation Control of Mobile Robots Based on Reinforcement Learning[D].Hefei:University of Science and Technology of China,2006.
[13] Peters,J.,Schaal,S..Policy Gradient Methods for Robotics[C].In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems,2006:2219-2225.
[14] Tesauro,G..TD-Gammon,a Self-Teaching Backgammon Program,Achieves Master-Level Play[J].Neural Computation,1994,6(2):215-219.
[15] Abe,N.,Kowalczyk,M.,Domick,M.,et al.Optimizing Debt Collections Using Constrained Reinforcement Learning[C].16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining,2010:75.
[16] Williams,J.D.,Young,S..Partially Observable Markov Decision Processes for Spoken Dialog Systems[J].Computer Speech and Language,2007,21(2):393-422.
[17] Li Qiong,Guo Yufeng,Jiang Yanhuang.An Intelligent I/O Scheduling Algorithm Based on Reinforcement Learning[J].Computer Engineering & Science,2010(7):58-61.
[18] Zhang Shuiping.Application of On-policy Reinforcement Learning Algorithms to Optimal AGC Control of Interconnected Power Grids[D].Guangzhou:South China University of Technology,2013.
[19] Liu Zhiyong,Ma Fengwei.Online Reinforcement Learning Control of Urban Traffic Signals[C].The 26th Chinese Control Conference,2007.
[20] Zu Linan.Research on Autonomous Cooperative Control and Reinforcement Learning for Multi-robot Systems[D].Changchun:Jilin University,2007.
[21] Chen Xin,Wei Haijun,Wu Min,et al.Multi-agent Tracking Learning in Continuous Spaces Based on Gaussian Regression[J].Acta Automatica Sinica,2013,39(12):2021-2031.
[22] Lee,D.,Choi,M.,and Bang,H..Model-Free Linear Quadratic Tracking Control for Unmanned Helicopters Using Reinforcement Learning[C].5th International Conference on Automation,Robotics and Applications (ICARA),2011.
[23] Valasek,J.,Doebbler,J.,Tandale,M.D.,et al.Improved Adaptive-Reinforcement Learning Control for Morphing Unmanned Air Vehicles[J].IEEE Transactions on Systems Man and Cybernetics Part B,2008,38(4):1014-1020.
[24] Crespo,A.,Li,W.,and Timoszczuk,A.P..ATFM Computational Agent Based on Reinforcement Learning Aggregating Human Expert Experience[C].IEEE Forum on Integrated and Sustainable Transportation Systems,2011.
[25] Xie,N.,Hachiya,H.,and Sugiyama,M..Artist Agent:A Reinforcement Learning Approach to Automatic Stroke Generation in Oriental Ink Painting[J].IEICE Transactions on Information and Systems,2012,E96-D(5).
[26] Silver,D.,Huang,A.,Maddison,C.J.,et al.Mastering the Game of Go with Deep Neural Networks and Tree Search[J].Nature,2016,529 (7587):484-489.
[27] Thrun,S.,Burgard,W.,and Fox,D..Probabilistic Robotics (Intelligent Robotics and Autonomous Agents)[M].The MIT Press,2005.
[28] Kober,J.,Bagnell,J.A.,and Peters,J..Reinforcement Learning in Robotics:A Survey[J].International Journal of Robotics Research,2013.
[29] Deisenroth,M.P.,Neumann,G.,and Peters,J..A Survey on Policy Search for Robotics[J].Foundations and Trends in Robotics,2013,2(1-2):1-142.
[30] Cheng,G.,Hyon,S.H.,Morimoto,J.,et al.CB:A Humanoid Research Platform for Exploring Neuroscience[J].Advanced Robotics,2007,21(10):1097-1114.
[31] Watkins,C.,Dayan,P..Q-learning[J].Machine Learning,1992,8(3-4):279-292.
[32] Sutton,R.S..Learning to Predict by the Methods of Temporal Differences[J].Machine Learning,1988,3(1):9-44.
[33] Rummery,G.A.,Niranjan,M..On-Line Q-Learning Using Connectionist Systems[R].Technical Report CUED/F-INFENG/TR 166,Cambridge University Engineering Department,1994.
[34] Gao Yang,Chen Shifu,Lu Xin.A Survey of Reinforcement Learning Research[J].Acta Automatica Sinica,2004,30(1):86-100.
[35] Jiang Guofei,Gao Huiqi,Wu Cangpu.Convergence Analysis of Grid Discretization Methods in Q-learning[J].Control Theory & Applications,1999,16(2):194-198.
[36] Jiang Guofei,Wu Cangpu.Inverted Pendulum Control Based on Q-learning and BP Neural Networks[J].Acta Automatica Sinica,1998,24(5):662-666.
[37] Lagoudakis,M.G.,Parr,R..Least-Squares Policy Iteration[J].Journal of Machine Learning Research,2003,4(6):1107-1149.
[38] Chen Xingguo.Research on Reinforcement Learning Algorithms Based on Value Function Estimation[D].Nanjing:Nanjing University,2013.
[39] Sugiyama,M.,Hachiya,H.,Towell,C.,et al.Geodesic Gaussian Kernels for Value Function Approximation[J].Autonomous Robots,2008,25(3):287-304.
[40] Hachiya,H.,Akiyama,T.,Sugiyama,M.,et al.Adaptive Importance Sampling for Value Function Approximation in Off-policy Reinforcement Learning[J].Neural Networks,2009,22(10):1399-1410.
[41] Akiyama,T.,Hachiya,H.,Sugiyama,M..Efficient Exploration through Active Learning for Value Function Approximation in Reinforcement Learning[J].Neural Networks,2010,23(5):639-648.
[42] Sugiyama,M.,Hachiya,H.,Kashima,H.,et al.Least Absolute Policy Iteration--A Robust Approach to Value Function Approximation[J].IEICE Transactions on Information and Systems,2010,E93-D(9):2555-2565.
[43] Schaal,S.,Peters,J.,Nakanishi,J.,et al.Learning Movement Primitives[J].Springer Tracts in Advanced Robotics.Siena,Italy:Springer,2004.
[44] Bagnell,J.A.,Schneider,J.G..Autonomous Helicopter Control using Reinforcement Learning Policy Search Methods[C].IEEE International Conference on Robotics and Automation,2001.
[45] Kober,J.,Peters,J..Policy Search for Motor Primitives in Robotics[J].Machine Learning,2011,84(1):171-203.
[46] Ng,A.Y.,Kim,H.J.,Jordan,M.I.,et al.Autonomous Helicopter Flight Via Reinforcement Learning[J].Advances in Neural Information Processing Systems,2004,16.
[47] Ng,A.Y.,Jordan,M.I..PEGASUS:A Policy Search Method for Large MDPs and POMDPs[C].In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence,2000:406-415.
[48] Sehnke,F.,Osendorfer,C.,Rückstieß,T.,et al.Parameter-exploring Policy Gradients[J].Neural Networks,2010,23(4):551-559.
[49] Williams,R.J..Simple Statistical Gradient-following Algorithms for Connectionist Reinforcement Learning[J].Machine Learning,1992,8(3-4):229-256.
[50] Kakade,S..A Natural Policy Gradient[J].Advances in Neural Information Processing Systems(NIPS),2002.
[51] Dayan,P.,Hinton,G.E..Using Expectation-maximization for Reinforcement Learning[J].Neural Computation,1997,9(2):271-278.
[52] Peters,J.,Schaal,S..Natural Actor-Critic[J].Neurocomputing,2008,71(7-9):1180-1190.
[53] Barto,A.G.,Mahadevan,S..Recent Advances in Hierarchical Reinforcement Learning[J].Discrete Event Dynamic Systems,2003,13(1-2):341-379.
[54] Zhou Wenji,Yu Yang.A Survey of Hierarchical Reinforcement Learning[J].CAAI Transactions on Intelligent Systems,2017,12(5):590-594.
[55] Du Wei,Ding Shifei.A Survey of Multi-agent Reinforcement Learning[J].Computer Science,2019,46(8):1-8.
[56] Liu Quan,Zhai Jianwei,Zhang Zongchang,et al.A Survey of Deep Reinforcement Learning[J].Chinese Journal of Computers,2018,41(1):1-27.
[57] Zhao Dongbin,Shao Kun,Zhu Yuanheng,et al.A Survey of Deep Reinforcement Learning:Discussions on the Development of Computer Go[J].Control Theory & Applications,2016,33(6):701-717.
[58] Osa,T.,Pajarinen,J.,Neumann,G.,et al.An Algorithmic Perspective on Imitation Learning[J].Foundations and Trends in Robotics,2018,7(1-2):1-179.
[59] Sermanet,P.,Xu,K.,and Levine,S..Unsupervised Perceptual Rewards for Imitation Learning[J].arXiv preprint arXiv:1612.06699,2016.
[60] Maeda,G.J.,Neumann,G.,Ewerton,M.,et al.Probabilistic Movement Primitives for Coordination of Multiple Human-robot Collaborative Tasks[J].Autonomous Robots,2017,41(3):593-612.
[61] Zhang Kaifeng,Yu Yang.A Survey of Learning-from-Demonstration Methods Based on Inverse Reinforcement Learning[J].Journal of Computer Research and Development,2019,56(2):254-261.
[62] Li Shuailong,Zhang Huiwen,Zhou Weijia.A Survey of Imitation Learning Methods and Their Applications in Robotics[J].Computer Engineering and Applications,2019,55(4):22-35.
[63] Pan,S.J.,Yang,Q..A Survey on Transfer Learning[J].IEEE Transactions on Knowledge and Data Engineering,2009,22(10):1345-1359.
[64] Wang Hao,Gao Yang,Chen Xingguo.Transfer in Reinforcement Learning:Methods and Progress[J].Acta Electronica Sinica,2008,36(S1):39-43.
[65] Finn,C.,Abbeel,P.,and Levine,S..Model-agnostic Meta-learning for Fast Adaptation of Deep Networks[C].In Proceedings of the 34th International Conference on Machine Learning,2017:1126-1135.
[66] Todorov,E.,Erez,T.,and Tassa,Y..MuJoCo:A Physics Engine for Model-based Control[C].2012 IEEE/RSJ International Conference on Intelligent Robots and Systems,2012:5026-5033.