Whenever I play and the USSR captures Berlin, I always play this song on YouTube. The reaction is great.
Predicting Victory or Defeat - How do you know you are ahead or behind?
-
WHY do you want to do a scientific analysis?
I noticed my question hasn’t been answered.
Could this be part of the Secret Liberal Conspiracy?
-
WHY do you want to do a scientific analysis?
I noticed my question hasn’t been answered.
Could this be part of the Secret Liberal Conspiracy?
Of course!
Once they have reduced the odds of winning down to mere numbers in a table, they will analyze the Axis of Evil and decide we should surrender immediately.
-
Is anyone an expert on different types of regressions? I’m wondering how much of a problem using a linear regression is for a variable that only has a 1 or 0 outcome?
The problem is that the difference between winning by a slim margin and totally devastating someone can be big. For instance, you can win a narrow victory with the Axis and Allies unit IPCs being equal, or have a big victory with a 200+ IPC difference. Ideally you’d have a win that was a “1” and a larger win that was a “1.5” or “2”. Any idea of how to measure this based on an Axis and Allies board? You could use victory cities, but I tend to think that they are a joke.
Is there any way to parse a map file? I’d like to convert it into an array of number of units per country, so I could write a computer program to generate a data file for analysis.
With the latest model, 1) AXIS IPC territory held (J+G territory) and 2) total unit IPC value difference are the two significant factors (p=0.001).
-
Instead of looking for a discrete outcome (0 or 1), why not look for a probability of an Axis win (from 0% to 100%)? Then you could say Game X is a 40% probability and Game Y is an 85% probability.
Not sure how to set up the math, but if you could it would probably be more useful.
-
To do a regression you need an outcome – or a “Y”.
The basic model is a rather simple
y=mx+b (standard linear equation)
Except that it is more like
y=m1x1+m2x2+…+ b
(a linear equation with several factors)
So my data needs a Y, which is currently 1 if the Axis wins, and 0 if the Allies win. You cannot use a “probability of winning” unless there is a way of scientifically measuring it.
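A minimal Python sketch of the several-factor equation above, assuming NumPy is available. The six game records and the factors x1 and x2 are made up purely for illustration; they are not taken from the data set discussed in this thread. It fits y = m1*x1 + m2*x2 + b by ordinary least squares, with Y coded as 1 for an Axis win and 0 for an Allied win.

```python
import numpy as np

# Hypothetical records (illustration only): x1, x2, and Y (1 = Axis win, 0 = Allied win).
x1 = np.array([-40.0, -15.0, -5.0, 10.0, 25.0, 60.0])
x2 = np.array([-10.0, -4.0, 2.0, -2.0, 6.0, 12.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Design matrix: one column per factor plus a column of ones for the intercept b.
X = np.column_stack([x1, x2, np.ones_like(x1)])

# Ordinary least squares: choose m1, m2, b to minimise the squared error.
coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
m1, m2, b = coef

# The "predicted Y" for each record; nothing forces these to stay within 0..1.
predicted = X @ coef
print("m1, m2, b:", coef)
print("predicted:", predicted)
```

The observed Y is only ever 0 or 1, but the fitted values can land anywhere on the line, which is part of why a straight-line fit is awkward for a win/lose outcome.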
-
But does Y have to be discrete? That is, does it have to be an integer (only ‘0’ or ‘1’) or could it be any decimal between 0 and 1? ie, the equation is evaluated for Game A and Y=0.4. For Game B Y=0.85.
See what I mean?
-
Let’s say you discover 3 factors that you have identified as having a linear relationship with winning (say IPC value, Victory Cities, and Naval strength). Assign a weight (m1,m2,m3) to each factor based on their relative importance. The sum of the weights should equal 1. eg. if Victory Cities is found to be the most important it gets a m2=0.5 while IPC gets m1=0.3 and Navy gets m3=0.2
This defines your model as:
Y = 0.3*x1 + 0.5*x2 + 0.2*x3
Now for each factor, based on your data, you identify the range of values from the games you analyzed. If there was never an Allied win after the Axis had 97 IPC or more, then 97 IPC is assigned a value of 1. If there was never an Axis win when the Allies had 99 IPC or more (Axis had 67 IPC), then assign that a value of 0. The range in between (from 67 to 97 Axis IPC) gets assigned values between 0 and 1 depending on what percentage of games were won by each side. It should be the best-fit line. Do the same thing for the other 2 factors (Victory Cities and Naval Strength).
Now for any given game, just plug in the data from that game (ie. Axis IPC is 73, so x1 would equal 0.2, let’s say). Just for argument’s sake, let’s say x2 was 0.25 and x3 was 0.8. Plug it into the formula and you get:
Y = (0.3)*(0.2) + (0.5)*(0.25) + (0.2)*(0.8)
= 0.06 + 0.125 + 0.16
= 0.345
Therefore there would be a 34.5% chance of the Axis winning this particular game.
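The arithmetic above can be sketched in a few lines of Python. The function names rescale and weighted_score are hypothetical, and the straight-line rescaling between 67 and 97 IPC is an assumption: the post says the in-between values should come from a best fit to the win percentages, which this sketch does not attempt.

```python
def rescale(value, low, high):
    """Map value linearly onto 0..1 (assumed straight line), clipping outside [low, high]."""
    fraction = (value - low) / (high - low)
    return min(1.0, max(0.0, fraction))


def weighted_score(weights, values):
    """Weighted average of factor values; weights must sum to 1, values lie in 0..1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if any(v < 0.0 or v > 1.0 for v in values):
        raise ValueError("each factor value must be between 0 and 1")
    return sum(w * v for w, v in zip(weights, values))


# Axis IPC of 73 on the 67..97 scale from the post.
x1 = rescale(73, 67, 97)

# Weights and the other two factor values are the ones used in the post.
score = weighted_score([0.3, 0.5, 0.2], [x1, 0.25, 0.8])
print(round(x1, 3), round(score, 3))  # 0.2 0.345
```

Because the weights sum to 1 and every factor value lies between 0 and 1, the score itself always stays between 0 and 1.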
-
So, as you can probably guess, I pretty much know by round 5 who will win by how large a margin. Strategy, by that point, plays no role at all in the game, it’s now a game of chance. (Because everyone seems to use the same strategy with the only variations being related directly to what they have left after the last round.)
http://www.axisandallies.org/forums/index.php?topic=9006.msg178572#msg178572
Let’s hear your prediction.
Axis submission in round… 9
I can give a thought process, but I would rather wait until the game is over.
-
Rclayton - interesting idea, but wouldn’t you run into a problem because the value that you’d have for Y would have to be created by the exact same factors that you had as x1, x2, and x3?
For instance, you wouldn’t want to use the IPC income as Y (because my regression analysis has shown that is a good measure of the “outcome”, but that there are other factors that affect it as well - such as total unit value).
If you are regressing IPC income on IPC income, of course your model will be very good at predicting because they are the same thing! A similar problem exists if instead of IPC income you define victory as a continuum from 0 to 1 based on multiple factors. Your dependent variable (your ‘y’) needs to be different from your independent variables - and theoretically caused by them.
…
My latest finding is that the model gets better at predicting if I remove the early rounds. Thus the R^2 (percent of variance predicted) can increase from 45% to as much as 85% (or possibly even more, though I don’t have enough last round data), if I only look at the later rounds. This makes a lot of sense, because in the early part of the game you don’t know who is going to win (for that matter, the 45% of variance predicted was probably coming from the later round records and very little to none of it from the first couple rounds).
To increase prediction power in the earlier rounds where the game is nearly even, you’d have to be able to predict luck (impossible) or measure skill (possibly using league rankings).
-
I think you misunderstand me. Y is not IPC. Y is probability of Axis victory.
m1, m2, m3… is the weight of each independent variable (ie factor), and they all add up to 1.0 (this is important to make sure that your Y value is a scale of 0 to 1). So if you found that IPC (m1) was a better predictor than any other variable, your m1 value for IPC would be higher than your other m values (ie. m1>m2; m1>m3).
x1, x2, x3…is the actual value each independent variable takes on for the specific game you are analyzing, and they must all be between 0 and 1 (again to make sure your Y is between 0 and 1). So again for your IPC factor, if the current game has the axis doing very well in IPC (compared to your data set of games you previously analyzed) then your x1 value would be close to 1.0 and if the allies are doing well, the x1 value would be close to 0.0
What I’m describing is basically a weighted average based on linear relationships that you will determine using your data set.
Hope that helps.
-
Hmm, I think what you are describing is covered by linear regression process. I’m using OLS (ordinary least squares) regression and SPSS (software which does most of the work for me). Are you familiar with OLS?
I’m fuzzy on some of the exact details as to how OLS works because it’s been 5+ years since I was doing major statistics work.
Here is the wikipedia entry:
http://en.wikipedia.org/wiki/Least_squares
-
I guess I’m suggesting something like weighted least squares
http://en.wikipedia.org/wiki/Weighted_least_squares
My university math is pretty fuzzy, and as I recall we only touched on regression. I didn’t take too many stats courses.
But yes, what I am describing is covered by linear regression. What I was attempting to do was demonstrate that Y need not be a dependent variable with only possible outcomes of 0 or 1, in response to:
Is anyone an expert on different types of regressions? I’m wondering how much of a problem using a linear regression is for a variable that only has a 1 or 0 outcome?
The problem is that the difference between winning by a slim margin and totally devastating someone can be big. For instance, you can win a narrow victory with the Axis and Allies unit IPCs being equal, or have a big victory with a 200+ IPC difference. Ideally you’d have a win that was a “1” and a larger win that was a “1.5” or “2”. Any idea of how to measure this based on an Axis and Allies board? You could use victory cities, but I tend to think that they are a joke.
Is there any way to parse a map file? I’d like to convert it into an array of number of units per country, so I could write a computer program to generate a data file for analysis.
With the latest model, 1) AXIS IPC territory held (J+G territory) and 2) total unit IPC value difference are the two significant factors (p=0.001).
If you allow Y to be a real number between 0 and 1, then I think it makes your linear regression model work better with it, and also solves your concern of a narrow victory versus a landslide win.
Also, parsing a map file would depend on the format used. TripleA map files would be pretty simple to parse, since the project is open source you should be able to look at the code and determine the format of the file. Mapview is not open source, but Motdc is actively developing for it and he might be open to helping parse a mapview map file. ABattlemap would be near impossible as far as I can tell, since I don’t believe anyone is actively developing this application anymore. Unfortunately I’d say 80% of the map files on this board are ABattlemap, so that may put a big kink in those plans.
-
Hopefully we won’t bore everyone else (people I still need data files - send me your aBattlemaps!!!) on the thread.
Maybe it would be helpful to clarify that there are two Ys: the observed outcome and the predicted outcome. The predicted outcome will vary a lot (mostly in the 0 to 1 range, but it could go as far as -1 or +2). The observed outcome is currently 0 or 1.
Weighting records might be a good idea. I suspect excluding them might work even better. Eg. if I can collect enough data for the last 1-3 rounds of a game, that would be the best.
Weighting - I tried it out, using the Round as the weight, and it boosted R^2 from 0.35 to 0.55. However, I can get better results by excluding the early rounds.
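A minimal sketch of weighting records by round, assuming NumPy and assuming the weighting amounts to weighted least squares with the round number as each record's weight (what SPSS did with the weight here is not stated in the thread). The six records are invented for illustration only.

```python
import numpy as np

# Hypothetical records (illustration only): round number, unit difference, outcome.
rounds = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
unit_dif = np.array([-5.0, 10.0, -20.0, 15.0, -40.0, 55.0])
axis_won = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])

# Design matrix: one factor plus a column of ones for the intercept.
X = np.column_stack([unit_dif, np.ones_like(unit_dif)])

# Weighted least squares: scaling each row by sqrt(weight) and running plain
# OLS minimises sum(weight * residual**2), so later rounds count for more.
w = rounds
sqrt_w = np.sqrt(w)
coef, _, _, _ = np.linalg.lstsq(X * sqrt_w[:, None], axis_won * sqrt_w, rcond=None)

slope, intercept = coef
print("slope, intercept:", slope, intercept)
```

Excluding the early rounds entirely is the limiting case of this idea, where those records get a weight of zero.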
–
Hmm, logistic regression is meant to deal with 1/0 outcomes. However I don’t see how to do it with my SPSS version (11), so I’m going to try and get a new version.
BTW - do you have any aBattlemaps you can send my way? (Ideally with bid data).
-
Sorry, I don’t have any maps available. I am actually in the midst of playing my first ever revised game as we speak.
I would think that you might want to try to avoid excluding earlier rounds. Ideally you want to be able to take any game, plug in the critical dimensions into the formula, and spit out some sort of expected outcome. Just because a game was in the early rounds, doesn’t mean you shouldn’t try to take a crack at predicting the outcome, does it?
I wonder if you could also try to calculate a confidence level? Eg. Game 1 was in round 30, and based on the independent variables the axis should win with a confidence level of 90%. Game 2 was only in round 6, and it was calculated that the axis should win but with a confidence level only of 55%.
Or something along those lines.
Not sure what the calculations would look like though…
-
I wonder if you could also try to calculate a confidence level? Eg. Game 1 was in round 30, and based on the independent variables the axis should win with a confidence level of 90%. Game 2 was only in round 6, and it was calculated that the axis should win but with a confidence level only of 55%.
The model will give you a predicted outcome and a standard deviation for that (for instance, it might give you a 0.9 with a 0.2 standard deviation). So you could get a confidence level from that. Maybe a logit model will do a better job of this (as it will tell you the chance of getting exactly 0 or 1, whereas the linear regression says you can get 0.9, which is an outcome (a near win) that doesn’t exist, as it represents an uncompleted game). I’ll see if I can get SPSS to upgrade.
-
So using logistic regression and more data, my model is predicting 87% of game outcomes (using all the data from round 1 to the end of rounds), and 100% of games starting on round 5 (eg once the game has progressed for a couple rounds - this model is very accurately predicting the winner!).
Now my problem is I’m not quite sure how to give a simple explanation of how logistic regression works. In fact I’m somewhat confused myself.
Variables
Both measured at the end of the Russian turn:
UnitDif: AXIS IPC Units - Ally IPC units
IPCDif: AXIS IPC Territory - Ally IPC territory
The Model
116 rounds of data - roughly 15 games

                          Predicted
Observed        Allied Win   Axis Win   Percent Correct
Allied Win          28          10           73.7
Axis Win             5          73           93.6
Overall Percent Correct - 87.1
(It isn’t predicting Allied wins as well, because 2/3 of my data was axis wins)

             B       SE      Sig     Exp(B)
UnitDif     .038    .011    .000      1.039
IPCDif      .110    .035    .002      1.116
Constant   6.644   1.501    .000    768.146
Cox and Snell R Squared: .520
Nagelkerke R Squared: .724

I think this means that if UnitDif changes by 1, your chances of winning change by 3.9%. If IPCDif changes by 1, your chances of winning change by 11.6%. But both of those values seem kind of high. So is that right?
Also if the IPCDif is zero then it means for the game to be even the UnitDif should be 177 (6.644/.038) - is that right?
The logistic model is complex because it has something to do with a ratio of two exponents (e to the power of something).
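A hedged sketch of how the numbers in that output fit together, using only Python's standard library. In a logistic model the B values add up to a log-odds, and Exp(B) is an odds ratio: a one-point change in a variable multiplies the odds by Exp(B), so it is the odds, not the probability itself, that change by about 3.9% or 11.6%. The function names are hypothetical, the coefficients are the rounded ones from the table, and which outcome SPSS coded as 1 is not stated in the output, so the sign of the break-even value is an assumption.

```python
import math

# Coefficients copied from the SPSS output above (rounded to three decimals).
B_UNIT = 0.038
B_IPC = 0.110
CONSTANT = 6.644


def log_odds(unit_dif, ipc_dif):
    """Linear part of the model: the log of the odds."""
    return CONSTANT + B_UNIT * unit_dif + B_IPC * ipc_dif


def probability(unit_dif, ipc_dif):
    """Logistic function: probability of whichever outcome was coded as 1."""
    return 1.0 / (1.0 + math.exp(-log_odds(unit_dif, ipc_dif)))


# Exp(B): the factor by which the odds are multiplied per one-point change.
odds_ratio_unit = math.exp(B_UNIT)
odds_ratio_ipc = math.exp(B_IPC)

# With IPCDif at zero, the game is even (probability 0.5) where the log-odds is zero.
break_even_unit_dif = -CONSTANT / B_UNIT

print(round(odds_ratio_unit, 3), round(odds_ratio_ipc, 3))
print(round(break_even_unit_dif, 1))
print(round(probability(break_even_unit_dif, 0), 3))
```

With these rounded coefficients the break-even works out to roughly 175 in magnitude rather than 177; presumably the difference comes from rounding in the table.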
-
“I think this means that if UnitDIF changes by 1, your chances of winning change by 3.9%. If IPCDif changes by 1, your chances of winning change by 11.6%. But both of those values seem kind of high. So is that right?
Also if the IPCDif is zero then it means for the game to be even the UnitDif should be 177 (6.644/.038) - is that right?”
Yay, numbers.
Give me more things to throw at my opponents to confuse them.
More numbers!
-
So I did a prediction for my current game with DJensen:
My (very tentative) prediction model, assisted by my math skills (also tentative in this area), predicts that (as of the end of R9) you have a 1 in 250 chance of winning (though the 95% confidence interval is that your odds of winning range from 1 in 12.5 to 1 in 5000).
Hmm, the 95% confidence interval might even be bigger than that.

             B       SE      Sig     Exp(B)
UnitDif     .038    .011    .000      1.039
IPCDif      .110    .035    .002      1.116
Constant   6.644   1.501    .000    768.146

At R9 our situation:
IPCDif: -6
UnitDif: -12
y = -6*0.111 - 12*.038 + 6.644
y = 5.522
e^5.522 = 250.13
However, since it could be plus or minus 2 standard errors (which moves y by about 3 either way), your chance of winning could be as high as
e^2.522 = 12.45
or as low as
e^8.522 = 5000 or so
-
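The R9 calculation above can be reproduced with a short Python sketch (standard library only). It uses 0.111 for IPCDif because that is the figure in the post's arithmetic, even though the table shows .110, and it takes the plus-or-minus 3 on y directly from the post. Reading e^y as "odds of that many to 1 against" is an assumption about how the outcome was coded.

```python
import math

CONSTANT = 6.644
B_IPC = 0.111   # value used in the post's arithmetic (the table shows .110)
B_UNIT = 0.038

ipc_dif = -6
unit_dif = -12

# Log-odds for the R9 position.
y = ipc_dif * B_IPC + unit_dif * B_UNIT + CONSTANT

# The post shifts y by 3 either way to get its rough interval.
margin = 3.0
odds = math.exp(y)
odds_low = math.exp(y - margin)
odds_high = math.exp(y + margin)

# If the odds are "odds to 1 against", the matching probability is 1 / (1 + odds).
chance = 1.0 / (1.0 + odds)

print(round(y, 3), round(odds, 2))
print(round(odds_low, 2), round(odds_high))
print(round(chance, 5))
```

Strictly, odds of 250 to 1 against correspond to a chance of 1 in 251, which is close enough to the 1 in 250 quoted.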
I should note that the confidence interval is inflated because UnitDif and IPCDif are correlated (0.560). If I only use UnitDif the confidence interval is only half the size.
I’m not sure if this means that the confidence interval measurement is inflated due to the correlation and that the true interval is smaller, or if the actual interval increases too. Tentatively I’d suspect the first is true.
-
This math is seriously fracking insane.
The way I predict Victory or Defeat is that I look at the board and use my intellectual knowledge of A&A to determine if I’m doing good or not!