"Mahit Gupta" <23*9*2*
5@s*u*e*t*u*a*e*u*a*> wrote:
> I can't seem to clearly understand the difference between Q-learning and utility learning. I know that Q-learning means learning the utility of each action available in a state, rather than learning the utility of each state (utility learning).
>
> But when it comes to applications, I don't get which one should be used in a given case.
> Maybe an explanation with an example would help.
As was discussed in the lectures:
The upside of Q-learning is that it folds the effect of the transition model into the (state, action) utilities, so it never needs an explicit model (whereas learning state utilities requires us to either know the transition model or learn it separately before we can choose actions).
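To make that concrete, here is a minimal tabular sketch, not the lecture's code. The names (`q_update`, `greedy_from_q`, `greedy_from_u`, `model`, `reward_fn`) and the values of `alpha` and `gamma` are my own assumptions for illustration. Notice that choosing an action from Q-values needs nothing else, while choosing one from state utilities needs a transition model passed in.

```python
from collections import defaultdict

def q_update(Q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step. Uses only the observed transition
    (s, a, reward, s_next); no transition model is consulted."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])

def greedy_from_q(Q, s, actions):
    """Pick the best action directly from the Q-table."""
    return max(actions, key=lambda a: Q[(s, a)])

def greedy_from_u(U, s, actions, model, reward_fn, gamma=0.9):
    """Pick the best action from state utilities U.
    Assumes model[(s, a)] is a dict {next_state: probability} and
    reward_fn(s, a, next_state) returns the reward (both hypothetical)."""
    def expected_value(a):
        return sum(p * (reward_fn(s, a, s2) + gamma * U[s2])
                   for s2, p in model[(s, a)].items())
    return max(actions, key=expected_value)

# Q starts at zero for every (state, action) pair.
Q = defaultdict(float)
```

With state utilities alone you cannot even rank the actions until `model` is known or learned; with Q-values the ranking comes straight out of the table.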
The primary downside is that there are more (state, action) pairs than states whenever a state offers more than one action, so Q-learning has to learn many more values before it behaves well.
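As a rough illustration of the size difference, assuming every state offers the same number of actions (a simplification of mine, as is the helper name `table_sizes`):

```python
def table_sizes(num_states, num_actions):
    """Number of values each approach has to learn, assuming every
    state has the same num_actions available."""
    return {
        "state_utilities": num_states,
        "q_values": num_states * num_actions,
    }

# Example: a 10x10 grid world with 4 moves per cell.
sizes = table_sizes(10 * 10, 4)
```

Here `sizes` holds 100 state utilities against 400 Q-values, and the gap grows with the number of actions.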
If the transition model is consistent or predictable between states, it can be much more efficient to separately learn the transition model and the state utilities.
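A small sketch of why a consistent model is cheap to learn. Assume a hypothetical grid world where the agent slips with the same probability in every cell. Then every observed move, whatever state it happened in, is evidence for one shared number. The function name and the data format are my own assumptions.

```python
from collections import Counter

def estimate_shared_slip(transitions):
    """Estimate a single slip probability pooled over all states.

    transitions: iterable of (intended_move, actual_move) pairs. The state
    is deliberately ignored, because the rule is assumed to be identical
    everywhere. Returns None if there are no observations."""
    counts = Counter(intended == actual for intended, actual in transitions)
    total = counts[True] + counts[False]
    return counts[False] / total if total else None

# Four observed moves from anywhere on the grid; one of them slipped.
observed = [("N", "N"), ("N", "E"), ("S", "S"), ("E", "E")]
slip = estimate_shared_slip(observed)
```

One parameter learned from all the data, plus one utility per state, can be far fewer values than a full Q-table. That advantage disappears if each state needs its own separately learned transition rule.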
If the transition model can be arbitrarily different between states, then learning it is as hard as learning the (state, action) utilities anyway.
Being "model-free" (not needing a model or concept of the system ahead of time) means Q-learning can adapt to a previously unknown system. But for any system where we do have a model, ignoring that model just makes learning harder.
Ignoring the model basically means treating every state you encounter as having unknown rules that may be arbitrarily different from anything you have seen before.
It would be like playing Go, but assuming that any time you place a stone the whole board can become almost anything.
Under that assumption there is no way to learn other than to learn the (state, action) utilities one by one, but making the assumption when it isn't true just makes it harder to learn anything.
So in general we much prefer to have a model for a system, but if we don't, and can't assume anything about the system, then we don't really have a lot of choice.