Scaling reinforcement learning algorithms by learning variable temporal resolution models
2020-07-15
Source: 欧得旅游网
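[Figures: (1) Agent-environment loop: the Agent (Controller) sends an Action to the Environment (System), which is subject to Disturbances and returns State and Payoff. (2) The same loop with Reward/Cost as the payoff, with rewards received at times T1, T2, T3, ..., Tn. (3) Abstract model: primitive actions a0, a1, ..., a_{k-1} take the state from s0 through s1, s2, ..., s_{k-1} to s_k = x, with per-step costs C(s0, a0), C(s1, a1), ..., C(s_{k-1}, a_{k-1}) and accumulated reward Σ_{i=1}^{k} R(i); the abstract step from s0 to X has cost C2(s0, X).]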