Update from Cary (2021-06-15) (Long and not urgent, so you don't have to read it) #91

Hi everyone! I'm still running as many full 1477x1477 matrix runs per night as possible. Here are some quick facts:

Although I did implement multiprocessing, running 10 full-matrix runs still ties up the CPU for 24 hours 14 minutes, which means I can't use my computer for anything else. So lately, I've been limiting it to 3 full-matrix runs a night. With that, I've finished 25 full-matrix runs in total. I told people I'd do 100 runs and average them, but recently I've been thinking 30 should be enough. Here's why:

I'm aware people have proposed other methods to speed up this process, such as caching repetitive games, cloud computing, and similar-strategy detection. I tried implementing some of these, but none of them worked as well as I'd hoped. I could discuss why they didn't work if people want! But TL;DR: if it takes 2 days to implement but only saves you 1 day of runtime, it's not worth it lol

TL;DR: Are you guys cool with me averaging the results of 30 runs and posting them ~2 days from now, versus averaging 100 runs about 25 days from now?
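For anyone curious what that kind of parallel full-matrix run can look like, here is a minimal sketch (not the repo's actual code) using Python's multiprocessing.Pool; the play_match helper that scores a single pairing is a hypothetical stub.

```python
# Minimal sketch of parallelizing one full-matrix run (hypothetical, not the repo's code).
from itertools import combinations
from multiprocessing import Pool

def play_match(pair):
    """Play one iterated-PD match between two strategies; return (i, j, score_i, score_j).
    Stubbed out here -- a real version would call into the tournament engine."""
    i, j = pair
    return i, j, 0.0, 0.0

def full_matrix_run(num_strategies, workers=8):
    pairs = list(combinations(range(num_strategies), 2))  # every distinct pairing
    with Pool(workers) as pool:
        # chunksize batches pairings per worker to cut inter-process overhead
        return pool.map(play_match, pairs, 256)

if __name__ == "__main__":
    results = full_matrix_run(1477)
```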
Comments
Yeah, I think 30 is enough; I don't think the results at 100 runs will be groundbreaking.
yup!
Thanks for the update, and for all the effort you've put into this. I imagine 30 runs is enough to smooth out randomness and provide useful results, especially with such a large competitor pool. If you're concerned about crowning the 2nd and 3rd place winners correctly, maybe you could make the "quick-n-easy" video after 30 runs, and then run some more before the polished video comes out and the prize money is distributed?
Yeah, 30 runs sounds about right. No need to rush, though, if you need more time.
30 runs sounds good! You could maybe publish the min, max, and median scores as well, so people could see how close it actually is.
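For what it's worth, if the per-run totals are saved, those summary stats are a couple of lines with Python's statistics module (the scores list here is made up):

```python
import statistics

scores_per_run = [2.31, 2.28, 2.35, 2.30]  # hypothetical per-run averages for one strategy
print(min(scores_per_run), max(scores_per_run), statistics.median(scores_per_run))
```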
30 runs will be fine.
If you're concerned about randomness determining the top 3 spots: you could take, say, the top 15 that are currently in the running to win and run only them against all other entries. This would allow you to do 50 times more runs than the full 1468x1468 matrix.
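As a rough sanity check on that "50 times" figure, using the numbers from this thread:

```python
# Cost comparison: full round-robin vs. 15 finalists against the whole field.
n, k = 1468, 15
full = n * (n - 1) // 2                    # every distinct pairing in the full matrix
reduced = k * (n - k) + k * (k - 1) // 2   # finalists vs. the rest, plus finalists vs. each other
print(full, reduced, full / reduced)       # ratio comes out to roughly 49x
```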
I don't really have any interest in having more rounds, since my strategy wouldn't win even if the effect of randomness were out of the picture. So I'm fine with 30. (P.S. I just realized how selfish that sounded, like I only care about my own strategy. But I didn't mean it that way. Sorry to anyone who felt offended by this.)
Thanks for the feedback, everybody! I'll go with 30 rounds then, so the calculations should be done some time tomorrow (2021-06-17). Also, someone notified me of a Discord screenshot claiming that the results have already been revealed, and I just want to clarify that it is fake. As of right now (2021-06-16 12:14 PM PST), no results have been revealed yet!
Also, CathGorm, that's a smart idea! In fact, there are probably only 10 or so contenders with a possibility of being in the top 3 anyway, so I could run them many extra times at a much faster speed.
Wait, so the results are almost out??? *Excited noises*
I bet you that if you take the top 10 contenders and run them against each other, it'd be a 10-way tie, with every match being always-cooperate on both sides. Somewhat counter-intuitively, the bad strategies are what determine the winner in this kind of competition: chances are very high that the top several strategies (and with the pool size here, the number might be in the hundreds, but it's hard to speculate) won't defect first, which means that putting those "nice" strategies against other "nice" strategies always gives that same mutual-cooperation scenario. The winner is determined by who does the best on average against all the not-nice strategies, which might be a small part of the actual pool. In a pool of only nice strategies and random strategies, Grim Trigger would win, since Grim Trigger is the optimal nice strategy against random. In a pool of only nice strategies and Joss, Always Cooperate would win. In a mix, it just comes down to the strategy that's best optimized for that particular mix. By removing the bad strategies, you remove the thing that determines the winner.
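To make that concrete, here's a toy simulation assuming the classic payoff values (3/3 for mutual cooperation, 1/1 for mutual defection, 5/0 for defecting against a cooperator), which may differ from the tournament's actual settings:

```python
import random

PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def always_cooperate(my_hist, their_hist):
    return "C"

def grim_trigger(my_hist, their_hist):
    # Cooperate until the opponent defects once, then defect forever.
    return "D" if "D" in their_hist else "C"

def rand50(my_hist, their_hist):
    return random.choice("CD")

def match(a, b, rounds=200):
    """Average per-round scores for strategies a and b over one iterated match."""
    ha, hb, sa, sb = [], [], 0, 0
    for _ in range(rounds):
        ma, mb = a(ha, hb), b(hb, ha)
        pa, pb = PAYOFF[(ma, mb)]
        sa, sb = sa + pa, sb + pb
        ha.append(ma)
        hb.append(mb)
    return sa / rounds, sb / rounds

# Two nice strategies always land on mutual cooperation (3.0 each);
# the random player is where their average scores come apart.
print(match(grim_trigger, always_cooperate))  # (3.0, 3.0)
print(match(grim_trigger, rand50))            # Grim does much better here...
print(match(always_cooperate, rand50))        # ...than Always Cooperate does
```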
I think the plan is to run the top strategies against all other strategies, but have the non-top strategies not compete against each other. You'll end up with a 10x1400 matrix instead of a 1400x1400 matrix.
That would work; I misread it at first, which made me concerned. My bad.
Hey guys, I've finished calculating all the results! So I'll make the quick-n-easy results reveal in the next 6 hours or so. Also, WillemPaternotte is correct: I needed to fine-tune the results of the top performers, because 2nd and 3rd scored close enough that they could flip. So I had the 1400x1400 matrix run 30 times, and the 3x1400 matrix (the top 3 vs. everyone else) run 200 times, which took a while. It turns out 4th place was already >20 sigma below 3rd place, so only the top 3 needed the fine-tuning, and now that's done too!
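For anyone curious what ">20 sigma" means here: one common way to express it is the gap between two strategies' mean scores divided by the standard error of that gap, estimated from the per-run scores. A sketch with made-up numbers:

```python
import statistics

def sigma_gap(scores_a, scores_b):
    """How many standard errors separate two strategies' mean scores,
    given their per-run score samples."""
    mean_a, mean_b = statistics.mean(scores_a), statistics.mean(scores_b)
    se_a = statistics.stdev(scores_a) / len(scores_a) ** 0.5
    se_b = statistics.stdev(scores_b) / len(scores_b) ** 0.5
    return abs(mean_a - mean_b) / (se_a ** 2 + se_b ** 2) ** 0.5

third = [2.412, 2.409, 2.415, 2.411, 2.413]   # hypothetical per-run scores
fourth = [2.301, 2.299, 2.304, 2.300, 2.302]
print(sigma_gap(third, fourth))
```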
Can't wait! :D
Could you post here a link to the video when it's up?
Update: The video has been recorded now.
The video's out now! Unfortunately, the results can't be finalized just yet, though: https://www.youtube.com/watch?v=yz7Db_yugfc
So, when will the definitive results be announced?
@carykh Hey Cary! Have you considered making a webapp of this project? I know it was part of a course, so you might not want to dedicate more time to this, but I was thinking it would be interesting. Of course, this would need some kind of donation of computing resources from a cloud service, and some maintainers, but you never know, someone might be interested in helping with that. Just a thought.
I think that would be a lot of effort for a simple one-time tournament, but Enjoyer's repo hosts its results on GitHub Pages with a GUI to help compare strategies. It's definitely not on quite that scale, but it might be closer to what you're looking for.