Strategy discussion #36
Comments
Interesting thread, I’ll have to watch and see what people post |
I'm personally interested in how well any sort of machine learning algorithm works. I have to assume that it would just work as a worse detective, because it would take longer to have any accuracy. It would also probably be beaten by all the other "nice" algorithms since it has the potential to lose massive points against any algorithms with a grim trigger if it defects at any point during training. A text I saw online regarding this showed that it basically just learnt the same thing as TFT, which makes sense. But regardless, I'm still interested in if anyone has managed one that's remotely successful. |
It acts like tft, but beats it by 0.1 - 0.2 points |
From my experience so far, there isn't just one best strategy; it all depends on the environment.
In short, try to be nice and do your best to optimize for the different rude strategies you can encounter. Don't try to win against them (they have very small chances of winning overall); instead, try to get them to cooperate as much as you can. |
On the subject of machine learning, here’s a very interesting paper on evolved algorithms for iterated prisoner’s dilemma: https://www.bc.edu/content/dam/files/schools/cas_sites/cs/pdf/academics/honors/06DanielScali.pdf |
I think you’re responding incorrectly to tit for tat. You seem to be playing a detective, so if you add a case for spotting tit for tat during your detection rounds, you could switch to a strategy that earns you points against it. |
thanks |
Also, my strat is very different from the detective; I just named it that because it inspects the opponent. |
@nobody5050 I checked the rounds and there was nothing in there that would be bad. |
TL;DR: "Nice" strategies have an advantage unless "mean" strategies are good at inferring their opponent's patterns and detecting opponents against whom defection has positive long-term net expected value (like
Counterpoint: unprovoked defection is heavily penalized and not worth it, unless you can farm it for intelligence. In the base meta (with the 9 defaults), let's discuss the optimal unprovoked defector's behavior and how it would play out, ignoring any attempts to predict when the game is close to ending:
grimTrigger: the optimal unprovoked defector would defect (D/C) for +2, then find itself in a D/D loop for the rest of the game, with an opportunity cost of 2 points per turn relative to a C/C loop. Net gain: 2 points - 2 points per remaining move. Initial defection sequence: D->D...
alwaysCooperate: the optimal unprovoked defector would defect (D/C) for +2, then continue defecting to get 5 points per move instead of the 3 it would've gotten by cooperating. Net gain: 2 points + 2 points per remaining move. Initial defection sequence: D->D...
alwaysDefect: the optimal unprovoked defector would at best defect on the first turn, saving the one point it would lose from a C/D as opposed to the optimal chain of D/Ds against this opponent. Net gain: 1 point. Initial defection sequence: D->D...
titForTat: the optimal unprovoked defector would defect (D/C +5), then cooperate (C/D +0) and return to the cooperation loop. Net gain: -1 point. Initial defection sequence: D->C
joss: similar to
detective: this is also secretly a
ftft: our optimal defector defects (+5, +2 better than C/C), cooperates next turn and notices it gets no response, then detects it's playing against
simpleton: our optimal unprovoked defector sets off the simpleton with a D/C (+5) then defects again (D/D for +1), then returns to the cooperation loop. Net gain: 0 points. Initial defection sequence: D->D
random: our optimal unprovoked defector defects (expected value increases by 1.5), then immediately detects somehow it's playing against
One thing you've probably noticed is that these optimal defectors above are not the same (the initial defection sequences) so you aren't able to get that juicy combined net gain of 11 + 2.5/remaining move. Instead, for just the 9 above, the combined optimal strategy (assuming you're really really good at detecting your opponent) will do significantly worse, although obviously the points per remaining move will have more weight than the constants. I'll leave putting all this together as an exercise for the reader (you just have to weigh the pros/cons of a single initial defection sequence for all these cases + the unknown mix of actual opponents in the meta).
Anyhow, going back to my point: if you just look at
I initially experimented with de-escalating tit-for-tat variants (that break out of the D/D loop but don't get abused by
This is all non-trivial. But "mean" strategies- unprovoked defectors like |
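The net-gain figures in this comment can be checked with a small simulation. This is only a sketch: the payoffs (5 for D/C, 3 for C/C, 1 for D/D, 0 for C/D) are the ones quoted in the thread, while the function names, the `"C"`/`"D"` move encoding, and the four opponent implementations are illustrative assumptions, not the tournament's actual code.

```python
# Payoff to "us" for (our move, their move), using the values quoted in the thread.
PAYOFF = {("D", "C"): 5, ("C", "C"): 3, ("D", "D"): 1, ("C", "D"): 0}

# Hypothetical re-implementations of four of the default opponents.
# Each takes (its own past moves, the other player's past moves).
def grim_trigger(own, other):
    return "D" if "D" in other else "C"

def tit_for_tat(own, other):
    return other[-1] if other else "C"

def always_cooperate(own, other):
    return "C"

def always_defect(own, other):
    return "D"

def score(script, opponent, rounds):
    """Points for a player who follows `script`, then repeats its last move."""
    ours, theirs, total = [], [], 0
    for turn in range(rounds):
        our_move = script[min(turn, len(script) - 1)]
        their_move = opponent(theirs, ours)
        total += PAYOFF[(our_move, their_move)]
        ours.append(our_move)
        theirs.append(their_move)
    return total

def net_gain(script, baseline, opponent, rounds):
    """Gain of the defecting `script` over a non-defecting-first `baseline`."""
    return score(script, opponent, rounds) - score(baseline, opponent, rounds)
```

With 10 rounds (so 9 remaining moves after the opener), `net_gain("D", "C", grim_trigger, 10)` gives -16 and `net_gain("D", "C", always_cooperate, 10)` gives 20, matching 2 - 2 per remaining move and 2 + 2 per remaining move. Against `always_defect` the fair baseline is "cooperate once, then defect" (`"CD"`), which gives a gain of 1, and `net_gain("DC", "C", tit_for_tat, 10)` gives -1.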
@Barigamb738 if you want to investigate your problem, try pulling in #31 onto your local branch, generating that head-to-head CSV (maybe for runs where you vary the meta), importing it into a spreadsheet application, and then you can see exactly where that performance difference between your |
I agree. Though the biggest problem here is predicting your enemy's move. My guess is that the most common strategy will probably be some kind of a titForTat variant. Crafting a detective or something like that that's able to respond to all of these would be very hard. I think maybe some fancy machine learning could pull it off. |
You could have a very conservative Plus for what it's worth, |
I personally think that mean strategies and detective types aren’t going to hold up as well as nice strategies in this tournament. The benefit of betraying an alwaysCooperate and the cost of entering a defect loop with a grim trigger seem to cancel each other out, but no one is going to be submitting more versions of alwaysCooperate. I think it’s far more likely that there are going to be variants of grim trigger. But again, the winner of this tournament will definitely be decided by the pool of strategies. Maybe there’ll be a ton of algorithms for a mean or detective type to take advantage of. Either that or Random will miraculously win by playing every matchup perfectly. That’ll certainly be something to see. |
@nobody5050 just ran a big mosh pit of strats visible in people's repos and a detective variant by @Lasermancer did quite well in there.
|
I’ll have to look into that mosh pit and see what sort of things other people are doing.
In regards to this, I meant more that it’s more likely that there’ll be more grim trigger variants than alwaysCooperate in the pool, so defecting first could very well end up in a net loss in points instead of cancelling out. Filling up the pool with grim triggers destroys any detective’s score because nice strategies get ahead (not that that’s going to happen). But like you said, for detectives it’s going to be up to exploiting variations of tit for tat and random-likes. Likewise, for “nice” strategies, I reckon it might be up to how they can deal with any detectives. |
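To put a rough number on the "cancelling out" argument: using the per-opponent figures quoted earlier in the thread (2 - 2 per remaining move against a grim trigger, 2 + 2 per remaining move against alwaysCooperate), the expected gain from defecting first depends on how much of the pool is grim trigger. The sketch below assumes a pool containing only those two types, which is a simplification, and `expected_net_gain` is a made-up helper name.

```python
def expected_net_gain(grim_fraction, remaining_moves):
    """Expected gain from an unprovoked opening defection.

    Assumes the pool holds only grim triggers (share `grim_fraction`)
    and alwaysCooperate (the rest), with the net gains quoted in the thread.
    """
    vs_grim = 2 - 2 * remaining_moves   # one-off +2, then stuck in D/D
    vs_coop = 2 + 2 * remaining_moves   # one-off +2, then 5 instead of 3 each move
    return grim_fraction * vs_grim + (1 - grim_fraction) * vs_coop
```

With an even split and 100 remaining moves the result is 2.0, so the two cases do nearly cancel. At 75% grim triggers it drops to -98.0, which is the net loss described above.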
Why though? Punishing your opponent for defection traps you in a D/D loop in most cases and is suicide for any strategy that wants to win. Yeah, the problem with the "nice" strat is that you're capped by forgiveness and your average-case game is a 3-point match. D/D is very inefficient so you have to throw olive branches every once in a while to an opponent who wrongs you. This means that your worst-case games average less than 1 point per match. Meanwhile, every single full-exploit a detective is able to pull off gets them a whole point per match. Even if they lose some of their 3-point matches or dip a bit, they can make massive gains from exploitative counterplay. In theory, a detective can do considerably better than any "nice" strat... but it just needs to be almost magically good at adapting to opponents. |
Haha, I know there’s at least two or three grim trigger variants already so I’m just going off what I’ve seen from this tournament so far. |
Thanks @l4vr0v, I will try it |
I don't think I'm going to win this but it's nice to share experiences with people. I'll go over the different issues I encountered when developing my algorithm.
Loops
Don't fear the grudger
Reactive strats won't give much bang for your buck
Cash in on unreactive or slow to react strats |
Is it better to try to take down others, or raise your own score? |
I DID IT!!!!!!!!! |
Congratulations! |
Nice! I couldn't do that, but I'm happy you did. Only two days left! I'm getting hyped. So this is what a deadline feels like if you are working on something you are truly passionate about! |
sorry for my bad english |
Also since then it won against them |
Only one day left!!!!!!!!!!! |
The thing I submitted only has 11 lines of code. I hope it does well.... I couldn't improve upon it. |
Hello.
I wanted to create this issue for people to share their ideas with each other.
For example, in one of my earlier attempts I wanted to avoid some of the flaws of ftft by reacting to 2 defections in total rather than 2 in a row, and it worked. But I am going with a different strategy now, so I don't need this.
I know it's not the best strategy to help your enemies, but I did it anyway.
If you have any strategy ideas that you don't need anymore but that someone else could use, post them here.
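For anyone who wants to try the "2 in total, not 2 in a row" idea, here is one possible reading of it as a minimal sketch. The post doesn't say what happens after the retaliation, so resetting the counter is an assumption, and the function names and `"C"`/`"D"` encoding are made up for illustration rather than taken from the tournament code.

```python
def ftft_in_a_row(opponent_moves):
    """Plain forgiving tit for tat: defect only after two consecutive defections."""
    return "D" if opponent_moves[-2:] == ["D", "D"] else "C"

def ftft_in_total(opponent_moves, strikes=0):
    """Variant: defect once the opponent has defected twice in total.

    `strikes` counts defections since our last retaliation (assumed reset rule).
    Returns (move, updated strikes).
    """
    if opponent_moves and opponent_moves[-1] == "D":
        strikes += 1
    if strikes >= 2:
        return "D", 0   # retaliate once, then start counting again
    return "C", strikes

def replay(opponent_moves):
    """Our moves against a fixed opponent sequence, one per turn plus the next."""
    strikes, moves = 0, []
    for turn in range(len(opponent_moves) + 1):
        move, strikes = ftft_in_total(opponent_moves[:turn], strikes)
        moves.append(move)
    return moves
```

Against an opponent that alternates D, C, D, C, the plain version never retaliates because the defections are never consecutive, while `replay(list("DCDC"))` defects on the fourth turn.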