
Update from Cary (2021-06-15) (Long and not urgent, so you don't have to read it) #91

Open
carykh opened this issue Jun 16, 2021 · 21 comments


@carykh
Owner

carykh commented Jun 16, 2021

Hi everyone! I'm still running as many full 1477×1477 matrix runs per night as possible. Here are some quick facts:

  • The original plan was to do 1 full-matrix run, but people were concerned about the effect of randomness, so we agreed I should do 100 runs and then average the results.
  • 1,468 of the 1,615 entries passed all sanity checks; add the 9 default strategies, and that gets us to the competitor count of 1,477.
  • Each game is ~250 moves on average, and I'm not double-counting games (A vs. B is the same game as B vs. A). So that's roughly 1477 × 1477 × 250 × 2 / 2 ≈ 545 million function calls per full-matrix run (quick sanity check below).
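As a back-of-the-envelope check, here's that arithmetic as a tiny Python snippet (nothing here beyond the numbers quoted above):

```python
# Rough per-run workload, using the numbers from the post above.
competitors = 1477       # 1,468 passing entries + 9 defaults
avg_moves = 250          # average moves per game
calls_per_move = 2       # both strategy functions get called each move

# The // 2 avoids double-counting: A vs. B is the same game as B vs. A.
calls = competitors * competitors * avg_moves * calls_per_move // 2
print(f"{calls:,}")      # 545,382,250 -> roughly 545 million per full-matrix run
```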

Although I did implement multiprocessing, 10 full-matrix runs tie up the CPU for 24 hrs 14 min, which means I can't use my computer for anything else. So lately, I've been limiting it to 3 full-matrix runs a night! With that, I've finished 25 full-matrix runs total. I told people I'd do 100 runs and average them, but recently I've been thinking 30 should be enough. Here's why:

  • The effect of randomness is proportional to the inverse square root of N, so going from 30 to 100 runs doesn't even halve it (see the sketch after this list).
  • I don't want to make you guys wait for another month.
  • First place is clearly separated from the pack, but 2nd and 3rd are still close enough to swap, unfortunately.
  • People generally say the Central Limit Theorem kicks in around n = 30, which makes margins of error easier to reason about.
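To make the inverse-square-root point concrete, here's a minimal sketch (nothing tournament-specific, just the standard-error scaling):

```python
import math

# Noise in an average of N runs scales like 1/sqrt(N).
for n in (30, 100):
    print(n, 1 / math.sqrt(n))   # 30 -> 0.183, 100 -> 0.100

# Residual noise going from 30 runs to 100 runs: sqrt(30/100) ~ 0.55,
# i.e. more than tripling the runtime doesn't even cut the noise in half.
print(math.sqrt(30 / 100))
```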

I'm aware people have proposed other methods to speed up this process, such as caching repetitive games, cloud computing, and similar-strategy detection. I tried implementing some of these, but none of them worked as well as I'd hoped. I can discuss why they didn't work if people want! But TL;DR: if it takes 2 days to implement but only saves 1 day of runtime, it's not worth it lol
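For the curious, here's a minimal sketch of what the game-caching idea could look like (all names here are hypothetical; `run_match` is a stand-in for the real simulation). The catch is that reuse is only valid for deterministic strategy pairs, which is one reason this kind of shortcut can fail to pay for itself:

```python
from functools import lru_cache

def run_match(a_id: int, b_id: int) -> tuple[int, int]:
    # Stand-in for the real (expensive) ~250-move game simulation.
    return (a_id * 3, b_id * 3)

# Caching is only sound when both strategies are deterministic:
# the same pairing then always produces the exact same game.
@lru_cache(maxsize=None)
def cached_match(a_id: int, b_id: int) -> tuple[int, int]:
    return run_match(a_id, b_id)

cached_match(1, 2)
cached_match(1, 2)                     # second call is a cache hit
print(cached_match.cache_info().hits)  # -> 1
```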

TL;DR: Are you guys cool with me averaging the results of 30 runs and posting them ~2 days from now, versus averaging 100 runs about 25 days from now?

@nekiwo

nekiwo commented Jun 16, 2021

Yeah, I think 30 is enough; I don't think the results at 100 runs would be groundbreaking.

@aaaa-trsh

yup!

@Xapakas

Xapakas commented Jun 16, 2021

Thanks for the update, and for all the effort you've put into this. I imagine 30 runs is enough to smooth out randomness and provide useful results, especially with such a large competitor pool. If you're concerned about crowning the 2nd and 3rd place winners correctly, maybe you could make the "quick-n-easy" video after 30 runs, and then run some more before the polished video comes out and the prize money is distributed?

@arielsysdev

Yeah 30 runs sounds about right. No need to rush though if you need more time

@Kinkelin

30 runs sounds good! You could maybe publish the min, max and median scores as well, so people could see how close it actually is.

@donno2048

30 runs will be fine

@CathGorm

If you're concerned about randomness determining the top 3 spots: you could take, say, the top 15 that are currently in the running to win and run only them against all other entries. This would allow you to do 50 times more runs than the full 1468*1468 matrix.
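A back-of-the-envelope check of that "50 times" figure, assuming runtime scales with the number of pairings played:

```python
full_matrix = 1468 * 1468 / 2    # every unordered pair in the full matrix
top_vs_all = 15 * 1468           # only the top 15 vs. everyone else
print(full_matrix / top_vs_all)  # ~48.9, i.e. roughly 50x more runs per unit time
```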

@Barigamb738

I don't really have any interest in more runs, since my strategy wouldn't win even if the effect of randomness were out of the picture. So I'm fine with 30.

(P.S. I just realized how selfish that sounded, like I only care about my own strategy. I didn't mean it that way. Sorry to anyone who felt offended by this.)

@carykh
Owner Author

carykh commented Jun 16, 2021

Thanks for the feedback, everybody! I'll go with 30 runs, then, so the calculations should be done some time tomorrow (2021-06-17).

Also, someone notified me of a Discord screenshot claiming that the results have already been revealed, and I just want to clarify that it is fake. As of right now (2021-06-16 12:14 PM PST), no results have been revealed yet!

> If you're concerned about randomness determining the top 3 spots: you could take, say, the top 15 that are currently in the running to win and run only them against all other entries. […]

Also, CathGorm, that's a smart idea! In fact, there are probably only 10 or so contenders with any real shot at the top 3 anyway, so I could run them many extra times at a much faster speed.

@redtachyon2098

Wait, so the results are almost out??? Excited noises

@EFHIII

EFHIII commented Jun 17, 2021

> Also, CathGorm, that's a smart idea! In fact, there are probably only 10 or so contenders with any real shot at the top 3 anyway, so I could run them many extra times at a much faster speed.

I bet you that if you take the top 10 contenders and run them against each other, it'd be a 10-way tie, with every match being always-cooperate on both sides.

Somewhat counter-intuitively, the bad strategies are what determine the winner in this kind of competition. Chances are very high that the top several strategies (and with the pool size here, that number might be in the hundreds, though it's hard to speculate) won't defect first, which means that putting those "nice" strategies against other "nice" strategies always produces that same all-cooperate scenario.

The winner is determined by who does best on average against all the not-nice strategies, which might be a small part of the actual pool. In a pool of only nice strategies and random strategies, Grim Trigger would win, since Grim Trigger is the optimal nice strategy against random. In a pool of only nice strategies and Joss, Always Cooperate would win. In a mix, it comes down to whichever strategy is best optimized for that particular mix.

By removing the bad strategies, you remove the thing that determines the winner.
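Here's a minimal sketch of that all-cooperate lock-in, using two classic "nice" strategies; the payoff matrix (T=5, R=3, P=1, S=0) is an assumption, and the tournament's exact payoffs may differ:

```python
# Standard prisoner's dilemma payoffs: (my score, their score) per move pair.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(opp_history):      # nice: copies the opponent's last move
    return opp_history[-1] if opp_history else "C"

def grim_trigger(opp_history):     # nice: defects forever after one defection
    return "D" if "D" in opp_history else "C"

def play(strat_a, strat_b, rounds=250):
    moves_a, moves_b = [], []      # each strategy sees the *opponent's* moves
    score_a = score_b = 0
    for _ in range(rounds):
        a, b = strat_a(moves_b), strat_b(moves_a)
        pa, pb = PAYOFF[(a, b)]
        score_a, score_b = score_a + pa, score_b + pb
        moves_a.append(a)
        moves_b.append(b)
    return score_a, score_b

# Neither strategy ever defects first, so they cooperate forever and tie.
print(play(tit_for_tat, grim_trigger))  # -> (750, 750)
```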

@WillemPaternotte

WillemPaternotte commented Jun 17, 2021

> By removing the bad strategies, you remove the thing that determines the winner.

I think the plan is to run the top strategies against all the other strategies, while the non-top strategies don't compete against each other. You'd end up with a 10×1400 matrix instead of a 1400×1400 matrix.

@EFHIII

EFHIII commented Jun 17, 2021

> You'd end up with a 10×1400 matrix instead of a 1400×1400 matrix.

That would work; I misread it at first, which is what concerned me. My bad.

@carykh
Owner Author

carykh commented Jun 18, 2021

Hey guys, I've finished calculating all the results! So I'll make the quick-n-easy results reveal in the next 6 hours or so.

Also, WillemPaternotte is correct. I needed to fine-tune the results of the top performers, because 2nd and 3rd scored close enough that they could flip.

So, I ran the 1400×1400 matrix 30 times, and the 3×1400 matrix (the top 3 vs. everyone else) 200 times. It turns out 4th place was already >20 sigma below 3rd place, so only the top 3 needed the extra fine-tuning; it took a while to run, but now that's done too!
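For anyone wondering what a ">20 sigma" gap means in practice, here's a hypothetical illustration with toy numbers (not the tournament's), assuming both strategies' per-run scores have roughly equal standard deviation:

```python
import math

# How many standard errors apart two averaged scores are: the mean difference
# divided by sigma * sqrt(2/N), the standard error of a difference of means.
def separation_in_sigmas(mean_a, mean_b, sigma, n_runs):
    return (mean_a - mean_b) / (sigma * math.sqrt(2 / n_runs))

# Toy numbers: a 2-point score gap, per-run sigma of 0.35, 30 runs.
print(separation_in_sigmas(102.0, 100.0, 0.35, 30))  # ~22 sigma: no realistic chance of a flip
```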

@arielsysdev

Can't wait! :D

@donno2048

Could you post here a link to the video when it's up?

@duckboycool

Update: The video has been recorded now.

@carykh
Owner Author

carykh commented Jun 18, 2021

The video's out now! Unfortunately, the results can't be finalized just yet, though: https://www.youtube.com/watch?v=yz7Db_yugfc

@redtachyon2098

redtachyon2098 commented Jun 18, 2021 via email

@karapapas

karapapas commented Jun 19, 2021

@carykh Hey Cary! Have you considered making a webapp out of this project? I know it was part of a course, so you might not want to dedicate more time to it, but I was thinking it would be interesting.

  • People could improve their strategies and test them against the others,
  • Everyone could have an account with a single strategy or multiple strategies,
  • With options such as whether or not to reveal their code,
  • With automated cheat-detection methods for submitted strategies that would improve over time,
  • Dashboards with statistics that would also be enriched over time.

Of course, this would need some kind of donation of computing resources from a cloud service, plus some maintainers, but you never know, someone might be interested in helping with that.

Just a thought.

@duckboycool

> @carykh Hey Cary! Have you considered making a webapp out of this project? […]

I think that would be a lot of effort for a simple one-time tournament, but Enjoyer's repo hosts its results on GitHub Pages with a GUI to help compare strats. It's definitely not on quite that scale, but it might be closer to what you're looking for.
