It's UWAweek 51

help3011

This forum is provided to promote discussion amongst students enrolled in CITS3011 Intelligent Agents.

Please consider offering answers and suggestions to help other students! And if you fix a problem by following a suggestion here, it would be great if other interested students could see a short "Great, fixed it!" follow-up message.

How do I ask a good question?
Displaying the 4 articles in this topic


 UWA week 44 (2nd semester, 1st exam week) ↓
12:56am Thu 31st Oct, Mahit G.

I can't seem to understand the difference between Q-learning and utility learning clearly. I know that Q-learning means learning the utility of the actions available in a state, rather than learning the utility of each state (utility learning). But when it comes to applications, I don't get which one should be used in which case. Maybe an example would help.


12:19pm Thu 31st Oct, Andrew G.

"Mahit Gupta" <23*9*2*5@s*u*e*t*u*a*e*u*a*> wrote:
> I can't seem to understand the difference between q-learning and utility learning clearly, I know that q learning means learning the utility of actions available to a state rather than learning the utility of each state(util learning).
> But when it comes to its applications, I don't get which one is to be used for any case.
> Maybe if explained with an example, that might help
As was discussed in the lectures:

The upside of Q-learning is that it learns the transition model as part of the (state, action) utilities (where regular utility learning requires us to either know the transition model or learn it separately). The primary downside is that there will always be more (state, action) pairs than there are states, so Q-learning has to learn many more things to build an effective model.

If the transition model is consistent or predictable between states, it can be much more efficient to separately learn the transition model and the state utilities. If the transition model can be arbitrarily different between states, then learning it is as hard as learning the (state, action) utilities anyway.

Being "model-free" (not having to have a model or concept of the system ahead of time) means Q-learning is able to adapt to a previously unknown system that we don't have a model for, but for any system where we do have a model, ignoring that model is just going to make it harder to learn. Ignoring the model basically means treating every state you encounter as having unknown, potentially arbitrarily-different rules to anything you have ever seen before. It would be like playing Go, but assuming that any time you place a stone the whole board can become almost anything. There is no way to learn other than to learn the (state, action) utilities, but by making this assumption you are just making it harder to learn anything.

So in general we much prefer to have a model for a system, but if we don't, and can't assume anything about the system, then we don't really have a lot of choice.
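To make the "learns (state, action) utilities directly" point concrete, here is a minimal tabular Q-learning sketch on a hypothetical 4-state chain world. The environment, constants, and names are illustrative assumptions, not from the unit material; the key point is that the learner never sees the `step` function, it only updates Q(s, a) from sampled transitions.

```python
import random

# Hypothetical 4-state chain world: states 0..3, action 0 moves left,
# action 1 moves right; reaching state 3 gives reward 1 and ends the episode.
N_STATES, ACTIONS = 4, (0, 1)
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

def step(state, action):
    """Environment dynamics -- hidden from the learner."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1  # (next state, reward, done)

random.seed(0)
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for _ in range(500):  # episodes
    state, done = 0, False
    while not done:
        # Epsilon-greedy: explore sometimes, otherwise pick a best-valued action.
        best = max(Q[(state, a)] for a in ACTIONS)
        greedy = random.choice([a for a in ACTIONS if Q[(state, a)] == best])
        action = random.choice(ACTIONS) if random.random() < EPSILON else greedy
        nxt, reward, done = step(state, action)
        # TD target bootstraps from the best next (state, action) utility.
        target = reward if done else reward + GAMMA * max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        state = nxt

# Q(2, right) converges towards 1.0 and Q(1, right) towards GAMMA * 1.0 = 0.9.
```

Note that the table has N_STATES × len(ACTIONS) entries, which is the "always more (state, action) pairs than states" cost mentioned above.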


12:55pm Fri 1st Nov, ANONYMOUS

So in an unknown environment, which one should be used? Q-learning can just learn the (state, action) utilities directly, while utility learning can learn the transition model and then use it to calculate state utilities. I'm having difficulty finding the pros and cons of using one over the other (other than computation).


6:35pm Fri 1st Nov, Andrew G.

ANONYMOUS wrote:
> so in an unknown environment, which one should be used?
> Q-learning can just learn state-action pairs, but utility learning can just learn the transition model then use it to calculate state utilities. I'm having difficulty finding pros and cons of using one over the other (other than computation).
There is not a clear answer since, as I say, in a system with no known model, where we are not able to assume anything about the structure of the system, they become basically equivalent. This information is covered in the lectures. In general you are expected to have a sufficient understanding of the logic and properties of these techniques that you can assess a novel situation and determine for yourself the merits or disadvantages of each approach (or, indeed, when they are basically equivalent).
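For contrast with the Q-learning loop above, the model-based alternative discussed in this thread can be sketched on the same hypothetical 4-state chain world (again an illustrative assumption, not unit material): estimate the transition model T(s' | s, a) from observed counts, then run value iteration over that learned model to get state utilities U(s).

```python
import random
from collections import defaultdict

# Same hypothetical chain world: states 0..3, action 0 = left, 1 = right;
# entering state 3 yields reward 1 and is terminal.
N_STATES, ACTIONS, GAMMA = 4, (0, 1), 0.9
TERMINAL = N_STATES - 1

def step(state, action):
    """Environment dynamics -- hidden from the learner."""
    nxt = max(0, state - 1) if action == 0 else min(TERMINAL, state + 1)
    return nxt, (1.0 if nxt == TERMINAL else 0.0)

# 1. Gather experience and count observed transitions and rewards.
random.seed(0)
counts = defaultdict(lambda: defaultdict(int))
reward_of = {}
for _ in range(2000):
    s, a = random.randrange(TERMINAL), random.choice(ACTIONS)
    nxt, r = step(s, a)
    counts[(s, a)][nxt] += 1
    reward_of[(s, a, nxt)] = r

# 2. Normalise counts into an estimated transition model T(s' | s, a).
T = {sa: {nxt: n / sum(seen.values()) for nxt, n in seen.items()}
     for sa, seen in counts.items()}

# 3. Value iteration over the learned model yields state utilities U(s).
U = [0.0] * N_STATES
for _ in range(100):
    for s in range(TERMINAL):
        U[s] = max(sum(p * (reward_of[(s, a, nxt)]
                            + (0.0 if nxt == TERMINAL else GAMMA * U[nxt]))
                       for nxt, p in T[(s, a)].items())
                   for a in ACTIONS)

# U converges to [0.81, 0.9, 1.0, 0.0] for this deterministic chain.
```

Here the transition model is shared structure: the learner only has to pin down one U value per state, which is why this approach wins when the dynamics are consistent between states, and why it degenerates to the Q-learning case when they are not.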

The University of Western Australia

Computer Science and Software Engineering

CRICOS Code: 00126G
Written by [email protected]
Last modified  8:08AM Aug 25 2024