Q-learning and exploration