help3001

This forum is provided to promote discussion amongst students enrolled in CITS3001 Algorithms, Agents and Artificial Intelligence.

Please consider offering answers and suggestions to help other students! And if you fix a problem by following a suggestion here, it would be great if other interested students could see a short "Great, fixed it!"  followup message.

10:02am Wed 13th Sep, ANONYMOUS

My understanding is that even if we start from the utilities of the original policy, our policy choices will still change as more information becomes available.

For example, suppose you start with a 2x2 grid with a step cost of 1 and a bottom-right terminal of +10, and the original policy is to always move down. We would start with a policy of:

    D   D
    D   -

and utilities of:

    -1    9
     0  +10

Then in the second iteration, the policy changes based on the new utilities we have for each state. At each iterative step, you check each state to see whether there is a possible move that would now result in a higher utility, so the policy becomes:

    ->  D
    ->  -

and utilities of:

     8    9
     9  +10

From here, since this is the best move that each state can make, this is the best policy and the iteration is complete. Even though these are the correct final utilities in this case, they don't have to be for policy iteration to be complete; it stops at the step where none of the actions in the policy change.
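To make the steps above concrete, here is a minimal Python sketch of the same 2x2 example. It is only an illustration under my own assumptions: moves are deterministic, there is no discounting, walking into a wall leaves you where you are, and policy evaluation is cut off after a fixed number of sweeps. The names `SWEEPS`, `step`, `evaluate`, `improve` and `policy_iteration` are made up for this sketch and are not from the unit materials.

```python
# States are (row, col) on a 2x2 grid; (1, 1) is the +10 terminal.
STATES = [(0, 0), (0, 1), (1, 0)]   # non-terminal states
TERMINAL = (1, 1)
TERMINAL_UTILITY = 10.0
STEP_COST = 1.0
SWEEPS = 20                          # assumed cap on evaluation sweeps
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}


def step(state, action):
    """Deterministic move; walking into a wall leaves the state unchanged."""
    r, c = state
    dr, dc = MOVES[action]
    nr, nc = r + dr, c + dc
    return (nr, nc) if 0 <= nr < 2 and 0 <= nc < 2 else state


def evaluate(policy, utilities):
    """Truncated evaluation: SWEEPS synchronous backups under the fixed policy."""
    for _ in range(SWEEPS):
        new = dict(utilities)
        for s in STATES:
            new[s] = -STEP_COST + utilities[step(s, policy[s])]
        utilities = new
    return utilities


def improve(policy, utilities):
    """Change a state's action only if another move is strictly better.

    The step cost is the same for every action, so comparing the
    utility of the successor state is enough.
    """
    new_policy = dict(policy)
    for s in STATES:
        best = policy[s]
        best_value = utilities[step(s, best)]
        for a in MOVES:
            value = utilities[step(s, a)]
            if value > best_value:
                best, best_value = a, value
        new_policy[s] = best
    return new_policy


def policy_iteration():
    """Start from 'always move down' and stop when no action changes."""
    policy = {s: "D" for s in STATES}
    utilities = {s: 0.0 for s in STATES}
    utilities[TERMINAL] = TERMINAL_UTILITY
    while True:
        utilities = evaluate(policy, utilities)
        new_policy = improve(policy, utilities)
        if new_policy == policy:
            return policy, utilities
        policy = new_policy
```

Calling `policy_iteration()` on this setup ends with the policy right, down, right and utilities 8, 9, 9, the same final grid as above. The intermediate utilities for the always-down policy depend on `SWEEPS`, because the bottom-left state never reaches the terminal under that policy, so they will not match the -1 and 0 quoted above.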
This is how I understand policy iteration; hopefully someone can confirm whether this is correct.

The University of Western Australia

Computer Science and Software Engineering

Last modified  8:08AM Aug 25 2024