
Scaling reinforcement learning algorithms by learning variable temporal resolution models

2020-07-15 Source: 欧得旅游网
[Figures: agent-environment diagrams labeled Agent (Controller), Environment (System), Disturbances, State, Action, Payoff, Reward, Cost, and T1 ... Tn; abstract-model diagram with states s0 ... sk = x, actions a0 ... ak-1, costs C(si, ai), C2(s0, X), and a sum of R(i) for i = 1 to k.]
