Remember that the testing code provided is just a simple tournament that boils each agent's performance down to a single empirically measured number.
In this case, what seems to have happened is that the SatisfactoryAgent lucked out and played an anomalously high proportion of its games as spy, while your agent's proportion was anomalously low. This is the nature of empirical statistical measures: any single measurement is noisy, but with enough samples it should converge on the mean. If your agent is good, the probability of it losing a tournament should be very low.
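To make the convergence point concrete, here is a minimal simulation sketch. It is not the provided tournament code: the function names, the 40% chance of being spy, and the per-role win rates are all hypothetical values chosen for illustration, and each game is treated as an independent draw.

```python
import random
import statistics

def play_tournament(games, spy_chance, win_as_spy, win_as_resistance, rng):
    """Simulate one tournament; return (win rate, share of games played as spy).

    All probabilities are hypothetical inputs, not values from the real game.
    """
    wins = 0
    spy_games = 0
    for _ in range(games):
        is_spy = rng.random() < spy_chance
        spy_games += is_spy
        win_chance = win_as_spy if is_spy else win_as_resistance
        wins += rng.random() < win_chance
    return wins / games, spy_games / games

def win_rate_spread(games, tournaments, seed=0):
    """Standard deviation of the measured win rate across repeated tournaments."""
    rng = random.Random(seed)
    rates = [
        play_tournament(games, 0.4, 0.6, 0.4, rng)[0]  # assumed rates
        for _ in range(tournaments)
    ]
    return statistics.stdev(rates)

if __name__ == "__main__":
    # The same agent, measured with more and more games per tournament.
    for games in (50, 500, 5000):
        print(games, round(win_rate_spread(games, tournaments=100), 4))
```

The spread of the measured win rate shrinks roughly with the square root of the number of games, so a result that looks like bad luck in a short tournament should mostly wash out in a longer one.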
The tournament code is simpler than the tests we plan to run for assessment. Recall that part of the assignment is to assess the effectiveness of your agent yourself and report on it.
We will do our best to accurately assess your agent's performance compared to the benchmarks, and if your agent is able to consistently outperform the benchmarks to any statistically significant degree, this should be reflected in the result. As with any statistical system, there is a chance of a false positive or false negative. I expect false negatives in the marking system to be extremely unlikely, unless your agent is so close in capability to the benchmark that it is not really meaningful to say it outperforms it. In that case the result would not be a false negative but a true negative.
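If you want a rough check of "statistically significant" for your own report, one common option is a one-sided two-proportion z-test on win counts. The sketch below is only an illustration under stated assumptions: it is not the marking procedure, the function name is made up, and it treats every game as independent, which games sharing a table are not.

```python
import math

def two_proportion_z(wins_a, games_a, wins_b, games_b):
    """One-sided z-test that agent A's true win rate exceeds agent B's.

    Returns (z, p_value). Assumes independent games and enough samples for
    the normal approximation; fails if the pooled rate is exactly 0 or 1.
    """
    rate_a = wins_a / games_a
    rate_b = wins_b / games_b
    pooled = (wins_a + wins_b) / (games_a + games_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / games_a + 1 / games_b))
    z = (rate_a - rate_b) / std_err
    # Upper-tail probability of a standard normal: P(Z >= z).
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

if __name__ == "__main__":
    # Hypothetical counts: 550 wins vs 450 wins, 1000 games each.
    z, p = two_proportion_z(550, 1000, 450, 1000)
    print(f"z = {z:.2f}, one-sided p = {p:.2g}")
```

A small p-value means the observed gap would be unlikely if the two agents were really equally strong. A large one means the data cannot distinguish them, which is the "true negative" case described above.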
For example: even with this random variation, I assume you have never seen the RandomAgent outperform your agent? It is technically possible, but statistically unlikely enough that we can assume we will never see it. You should be aiming to make the best agent you can, and it is definitely possible to be far enough ahead of the SatisfactoryAgent that we would expect to see the same effect. The SatisfactoryAgent is deliberately a fairly low bar, so that clearing it by a statistically significant margin is easy.
So all that is to say: I expect that if you are losing any appreciable fraction of tournaments due to bad luck, then your agent probably can't be considered to consistently outperform the benchmark. It should be extremely unlikely for a good agent to lose a tournament that way. This is demonstrated by the fact that we have agents, made as part of preparing this project, that we have never seen lose a tournament against any of the benchmarks.
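The link between margin and "losing to bad luck" can be sketched with a normal approximation. This is a back-of-the-envelope model, not the assessment code: the function name and the win rates are hypothetical, and it assumes each agent's games are independent with a fixed true win rate.

```python
import math

def upset_probability(rate_strong, rate_weak, games):
    """Approximate chance the weaker agent posts the higher measured win rate.

    Models the difference in measured rates as normal with mean
    (rate_strong - rate_weak) and variance equal to the sum of the two
    binomial variances divided by the number of games each agent plays.
    """
    variance = (
        rate_strong * (1 - rate_strong) + rate_weak * (1 - rate_weak)
    ) / games
    z = (rate_strong - rate_weak) / math.sqrt(variance)
    # Probability that the measured difference falls below zero.
    return 0.5 * math.erfc(z / math.sqrt(2))

if __name__ == "__main__":
    # Hypothetical true win rates against a 50% benchmark, 1000 games each.
    for strong in (0.51, 0.52, 0.55, 0.60):
        print(strong, f"{upset_probability(strong, 0.50, 1000):.2g}")
```

Under this model, an agent only a point or two better than the benchmark still loses a noticeable share of tournaments, while one with a ten-point edge almost never does.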
Hope that helps.
Cheers,
Gozz