Understanding the limitations of deep Q-learning