
help3011

This forum is provided to promote discussion amongst students enrolled in CITS3011 Intelligent Agents.

Please consider offering answers and suggestions to help other students! And if you fix a problem by following a suggestion here, it would be great if other interested students could see a short "Great, fixed it!" follow-up message.



UWA week 40 (2nd semester, week 10)

ANONYMOUS wrote:
> Hi,
>
> I was wondering what is meant by "consistently outperforms" when marking?
>
> For example, my current (not final) agent beats the satisfactory agent consistently every time over the long term when I run tournaments of size 10,000 or 100,000, but in the default tournament of 1000 games it sometimes loses by a narrow margin.
>
> Would this be considered consistently outperforming the other agent, and is there some acceptable amount of times it can lose to the other agent by chance?
>
> Thanks

We do intend to run more samples when assessing your submissions, but 1000 games is still a lot. If your agent is not consistently winning over 1000 games, the margin is probably close enough that it's not "consistent". Submitting this agent would probably be gambling that, on the samples we take, it happens to outperform the benchmark.

You should definitely be aiming higher, especially since SatisfactoryAgent is not a high benchmark. The intent is that, to earn the marks, you should be consistently and easily outperforming it (after all, there is another hidden benchmark tier above it that you should be trying to beat as well!).

The purpose of taking a thousand or more samples is to counteract the effects of "bad luck" and losing "by chance". Your agent should be good enough that a thousand games is sufficient to show it is better than another agent. If after a thousand games your win rate is basically the same, you cannot claim to be consistently outperforming the benchmark agent.

Gozz
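
To put rough numbers on "losing by chance", here is a minimal sketch of how much a win rate can wobble over a tournament of a given size. It is not the marking procedure. It assumes each game is an independent win/loss trial and uses a normal approximation; the function names `win_rate_margin` and `clearly_better` are made up for illustration, and results from agents playing in the same games are not strictly independent, so treat the output as a guide only.

```python
import math


def win_rate_margin(wins: int, games: int, z: float = 1.96) -> tuple[float, float]:
    """Return (win rate, half-width of an approximate 95% interval).

    Assumption: games are independent win/loss trials (normal approximation).
    """
    p = wins / games
    standard_error = math.sqrt(p * (1 - p) / games)
    return p, z * standard_error


def clearly_better(wins_a: int, wins_b: int, games: int, z: float = 1.96) -> bool:
    """Hypothetical check: is agent A's win rate above agent B's by more
    than sampling noise would explain? Uses a pooled two-proportion z-test,
    assuming both agents were sampled over the same number of games.
    """
    p_a = wins_a / games
    p_b = wins_b / games
    pooled = (wins_a + wins_b) / (2 * games)
    standard_error = math.sqrt(2 * pooled * (1 - pooled) / games)
    if standard_error == 0:
        return False  # identical, degenerate results: nothing to separate
    return (p_a - p_b) / standard_error > z


if __name__ == "__main__":
    rate, margin = win_rate_margin(500, 1000)
    print(f"win rate {rate:.3f} +/- {margin:.3f}")
    print(clearly_better(520, 500, 1000))  # narrow lead over 1000 games
    print(clearly_better(600, 500, 1000))  # wide lead over 1000 games
```

Under these assumptions, a win rate near 50% over 1000 games carries a margin of about 3 percentage points either way, so a lead of a point or two is within the noise, while a lead of ten points is not. That matches the advice above: aim for a gap large enough that 1000 games settles it.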

The University of Western Australia

Computer Science and Software Engineering

CRICOS Code: 00126G
Last modified  8:08AM Aug 25 2024