Lab - Q-learning: exploitation vs. exploration and discounted reward