help3011

This forum is provided to promote discussion amongst students enrolled in CITS3011 Intelligent Agents.

12:19pm Thu 31st Oct, Andrew G.

"Mahit Gupta" <23*9*2*5@s*u*e*t*u*a*e*u*a*> wrote:
> I can't seem to understand the difference between q-learning and utility learning clearly, I know that q learning means learning the utility of actions available to a state rather than learning the utility of each state(util learning).
>
> But when it comes to its applications, I don't get which one is to be used for any case.
> Maybe if explained with an example, that might help

As was discussed in the lectures:

The upside of Q-learning is that it folds the transition model into the (state, action) utilities it learns, whereas utility learning requires us to either know the transition model or learn it separately. The primary downside is that there are always more (state, action) pairs than there are states (whenever a state has more than one action), so Q-learning has to learn many more values to build an effective model.

If the transition model is consistent or predictable between states, it can be much more efficient to learn the transition model and the state utilities separately. If the transition model can be arbitrarily different between states, then learning it is as hard as learning the (state, action) utilities anyway.

Being "model-free" (not needing a model or concept of the system ahead of time) means Q-learning can adapt to a previously unknown system that we don't have a model for. But for any system where we do have a model, ignoring that model just makes learning harder. Ignoring the model basically means treating every state you encounter as having unknown rules that could be arbitrarily different from anything you have seen before. It would be like playing Go while assuming that any time you place a stone the whole board could become almost anything. Under that assumption there is no way to learn other than to learn the (state, action) utilities, but making the assumption when you don't have to just makes it harder to learn anything.

So in general we much prefer to have a model for a system, but if we don't have one, and can't assume anything about the system, then we don't really have a lot of choice.
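
Since an example was asked for, here is a minimal sketch of the contrast on a made-up problem. Everything in it is an assumption for illustration and not from the lectures: the 4-state corridor, the names `step`, `q_learning`, `utility_learning`, `greedy_from_q` and `greedy_from_u`, and the values of `GAMMA` and `ALPHA`. The model-based side uses plain value iteration with a known model as a stand-in for utility learning; it is not meant to be the exact algorithm from the unit.

```python
import random

# Hypothetical corridor: states 0..3, reaching state 3 ends the episode with reward +1.
STATES = [0, 1, 2, 3]
ACTIONS = ["left", "right"]
TERMINAL = 3
GAMMA = 0.9   # discount factor (assumed value)
ALPHA = 0.5   # learning rate (assumed value)


def step(state, action):
    """The environment's rules, i.e. a deterministic transition model."""
    nxt = min(state + 1, TERMINAL) if action == "right" else max(state - 1, 0)
    reward = 1.0 if nxt == TERMINAL else 0.0
    return nxt, reward


def q_learning(episodes, rng):
    """Model-free: only samples step() as a black box, one table entry per (state, action)."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        s = rng.choice(STATES[:-1])
        while s != TERMINAL:
            a = rng.choice(ACTIONS)  # random exploration; the update below is off-policy
            s2, r = step(s, a)
            best_next = 0.0 if s2 == TERMINAL else max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += ALPHA * (r + GAMMA * best_next - q[(s, a)])
            s = s2
    return q


def utility_learning(sweeps):
    """Model-based: value iteration over state utilities, treating step() as a known model."""
    u = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        for s in STATES[:-1]:
            u[s] = max(r + GAMMA * u[s2] for s2, r in (step(s, a) for a in ACTIONS))
    return u


def greedy_from_q(q, s):
    """Choosing an action from Q-values needs no model: compare the stored values."""
    return max(ACTIONS, key=lambda a: q[(s, a)])


def greedy_from_u(u, s):
    """Choosing an action from state utilities needs the model for a one-step lookahead."""
    def lookahead(a):
        s2, r = step(s, a)
        return r + GAMMA * u[s2]
    return max(ACTIONS, key=lookahead)


if __name__ == "__main__":
    q = q_learning(500, random.Random(0))
    u = utility_learning(10)
    print("Q table entries:", len(q), "| utility table entries:", len(u))
    for s in STATES[:-1]:
        print(s, greedy_from_q(q, s), greedy_from_u(u, s))
```

The Q table has one entry per (state, action) pair while the utility table has one per state, and `greedy_from_u` has to call `step` to pick an action where `greedy_from_q` does not. That is the trade-off described above: more values to learn in exchange for not needing the model.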

The University of Western Australia

Computer Science and Software Engineering
