A bounded actor-critic algorithm for reinforcement learning