
help3001

This forum is provided to promote discussion amongst students enrolled in CITS3001 Algorithms, Agents and Artificial Intelligence.

Please consider offering answers and suggestions to help other students! And if you fix a problem by following a suggestion here, it would be great if other interested students could see a short "Great, fixed it!"  followup message.

How do I ask a good question?




10:04am Wed 13th Sep, Joshua N.

ANONYMOUS wrote:
> With policy iteration, my understanding is that:
> - We compute the utilities of a policy
> - Then we compute a new policy according to the utilities we just calculated
> - We repeat this until the policy converges
>
> What I don't understand is why the policy improves. If we start with policy P and determine the set of utilities U for each state, then using U wouldn't we just get back the same policy P?
Hi Anon,

With policy iteration:
- We use value determination to compute the utilities the agent obtains if it follows the current policy.
- We then use action determination to find the best action in each state, given those utilities.
- We switch to this new policy and repeat the process until the policy converges.

The reason the policy improves is that the initial policy is chosen arbitrarily, without any knowledge of utilities. Once we compute the utilities of actually following that policy, action determination picks the best action in each state with respect to those utilities, so the updated policy is at least as good as the old one (and strictly better in some state unless the policy is already optimal). We then repeat the process to see whether the policy can be improved further.

I hope that helps.
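The loop above can be sketched in code. This is a minimal illustration, not course-supplied code: the two-state MDP (its transitions and rewards), the discount factor, and all names here are invented for the example. Value determination is done iteratively rather than by solving the linear system.

```python
# Toy MDP for illustration: P[s][a] = list of (prob, next_state, reward).
# States 0 and 1, actions "stay" and "go"; all values are made up.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 1.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}
GAMMA = 0.9  # discount factor

def value_determination(policy, n_iters=500):
    """Utilities of following a fixed policy (iterative evaluation)."""
    U = {s: 0.0 for s in P}
    for _ in range(n_iters):
        U = {s: sum(p * (r + GAMMA * U[s2])
                    for p, s2, r in P[s][policy[s]])
             for s in P}
    return U

def action_determination(U):
    """Greedy policy with respect to the utilities U."""
    return {s: max(P[s],
                   key=lambda a: sum(p * (r + GAMMA * U[s2])
                                     for p, s2, r in P[s][a]))
            for s in P}

def policy_iteration():
    policy = {s: "stay" for s in P}  # arbitrary initial policy
    while True:
        U = value_determination(policy)      # step 1: evaluate policy
        new_policy = action_determination(U) # step 2: improve policy
        if new_policy == policy:             # converged: no improvement
            return policy, U
        policy = new_policy
```

Running `policy_iteration()` on this toy MDP starts from the all-"stay" policy, discovers after one evaluation that "go" is better from state 0, and converges on the next pass, which is exactly the improvement behaviour the question asks about.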

The University of Western Australia

Computer Science and Software Engineering

Written by [email protected]
Last modified  8:08AM Aug 25 2024