Learning in a deterministic environment with policy iteration