Predicting Victory or Defeat - How do you know you are ahead or behind?

  • 2007 AAR League

    To do a regression you need an outcome – or a “Y”.

    The basic model is a rather simple
    y=mx+b (standard linear equation)

    Except that it is more like
    y=m1x1+m2x2+…+ b
    (a linear equation with several factors)

    So my data needs a Y.  Which is currently 1 if the Axis wins, and 0 if the Allies win.  You cannot use a “probability of winning” unless there is a way of scientifically measuring it.

  • 2007 AAR League

    But does Y have to be discrete?  That is, does it have to be an integer (only ‘0’ or ‘1’) or could it be any decimal between 0 and 1?  For example, the equation is evaluated for Game A and gives Y=0.4; for Game B it gives Y=0.85.

    See what I mean?

  • 2007 AAR League

    Let’s say you discover 3 factors that you have identified as having a linear relationship with winning (say IPC value, Victory Cities, and Naval strength).  Assign a weight (m1,m2,m3) to each factor based on their relative importance.  The sum of the weights should equal 1.  eg. if Victory Cities is found to be the most important it gets a m2=0.5 while IPC gets m1=0.3 and Navy gets m3=0.2

    This defines your model as:
    Y = 0.3*x1 + 0.5*x2 + 0.2*x3

    Now for each factor, based on your data, you identify the range of values from the games you analyzed.  If there was never an allied win after the axis had 97 IPC or more, then 97 IPC is assigned a value of 1.  If there was never an axis win when the allies had 99 IPC or more (axis had 67 IPC) then assign that a value of 0.  The range in between (from 67 to 97 axis IPC) gets assigned values between 0 and 1 depending on what percentage of games were won by each side.  Should be the best fit linear.  Do the same thing for the other 2 factors (Victory Cities and Naval Strength).

    Now for any given game, just plug in the data from that game (i.e. axis IPC is 73, so x1 would equal 0.2, let’s say).  Just for argument’s sake, let’s say x2 was 0.25 and x3 was 0.8.  Plug it into the formula and you get:

    Y = (0.3)(0.2) + (0.5)(0.25) + (0.2)(0.8)
      = 0.06 + 0.125 + 0.16
      = 0.345

    Therefore there would be a 34.5% chance of the axis winning this particular game.
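
The weighted-average calculation above can be sketched in Python. This is only an illustration: the straight-line scaling between the 67 and 97 IPC endpoints is an assumption (the post only says "best fit linear"), and the function names `scale` and `axis_win_score` are made up for this example. With that assumed scaling, an Axis IPC of 73 happens to map to the 0.2 used in the worked numbers.

```python
def scale(value, low, high):
    """Map value linearly onto 0..1, clamping anything outside [low, high]."""
    fraction = (value - low) / (high - low)
    return max(0.0, min(1.0, fraction))


def axis_win_score(weights, factors):
    """Weighted average of factor scores; the weights are expected to sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1")
    return sum(w * x for w, x in zip(weights, factors))


weights = [0.3, 0.5, 0.2]      # IPC, Victory Cities, Navy (from the example)
x1 = scale(73, 67, 97)         # Axis IPC of 73 on the assumed 67..97 range
x2, x3 = 0.25, 0.8             # values picked for argument's sake in the post
score = axis_win_score(weights, [x1, x2, x3])
print(round(score, 3))
```

Keeping the weights summing to 1 and every factor inside 0..1 is what keeps the score inside 0..1.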

  • '19 Moderator

    @Baghdaddy:

    @Jennifer:

    So, as you can probably guess, I pretty much know by round 5 who will win by how large a margin.  Strategy, by that point, plays no role at all in the game, it’s now a game of chance.  (Because everyone seems to use the same strategy with the only variations being related directly to what they have left after the last round.)

    http://www.axisandallies.org/forums/index.php?topic=9006.msg178572#msg178572

    Let’s hear your prediction.

    Axis submission in round… 9
    I can give a thought process, but I would rather wait until the game is over.

  • 2007 AAR League

    Rclayton - interesting idea, but wouldn’t you run into a problem because the value that you’d have for Y would have to be created by the exact same factors that you had as x1, x2, and x3?

    For instance, you wouldn’t want to use the IPC income as Y (because my regression analysis has shown that is a good measure of the “outcome”, but that there are other factors that affect it as well - such as total unit value).

    If you are regressing IPC income on IPC income, of course your model will be very good at predicting because they are the same thing!  A similar problem exists if instead of IPC income you define victory as a continuum from 0 to 1 based on multiple factors.  Your dependent variable (your ‘y’) needs to be different from your independent variables - and theoretically caused by them.

    My latest finding is that the model gets better at predicting if I remove the early rounds.  Thus the R^2 (percent of variance predicted) can increase from 45% to as much as 85% (or possibly even more, though I don’t have enough last round data), if I only look at the latter rounds.  This makes a lot of sense, because in the early part of the game you don’t know who is going to win (for that matter, the 45% of variance predicted was probably coming from the latter round records and very little to none of it from the first couple rounds).

    To increase prediction power in the earlier rounds where the game is nearly even, you’d have to be able to predict luck (impossible) or measure skill (possibly using league rankings).

  • 2007 AAR League

    I think you misunderstand me.  Y is not IPC.  Y is probability of Axis victory.

    m1, m2, m3… is the weight of each independent variable (ie factor), and they all add up to 1.0 (this is important to make sure that your Y value is a scale of 0 to 1).  So if you found that IPC (m1) was a better predictor than any other variable, your m1 value for IPC would be higher than your other m values (ie. m1>m2; m1>m3).

    x1, x2, x3…is the actual value each independent variable takes on for the specific game you are analyzing, and they must all be between 0 and 1 (again to make sure your Y is between 0 and 1).  So again for your IPC factor, if the current game has the axis doing very well in IPC (compared to your data set of games you previously analyzed) then your x1 value would be close to 1.0 and if the allies are doing well, the x1 value would be close to 0.0

    What I’m describing is basically a weighted average based on linear relationships that you will determine using your data set.

    Hope that helps.

  • 2007 AAR League

    Hmm, I think what you are describing is covered by the linear regression process.  I’m using OLS (ordinary least squares) regression and SPSS (software which does most of the work for me).  Are you familiar with OLS?

    I’m fuzzy on some of the exact details as to how OLS works because it’s been 5+ years since I was doing major statistics work.

    Here is the wikipedia entry:
    http://en.wikipedia.org/wiki/Least_squares

  • 2007 AAR League

    I guess I’m suggesting something like weighted least squares

    http://en.wikipedia.org/wiki/Weighted_least_squares

    My university math is pretty fuzzy, and as I recall we only touched on regression.  I didn’t take too many stats courses.

    But yes, what I am describing is covered by linear regression.  What I was attempting to do was demonstrate that Y need not be a variable with only two possible outcomes, 0 or 1, in response to:

    @akreider2:

    Is anyone an expert on different types of regressions?  I’m wondering how much a problem using a linear regression is for a variable that only has a 1 or 0 outcome?

    The problem is that the difference between winning by a slim margin, and totally devastating someone can be big.  For instance, you can win a narrow victory with the Axis and Allies unit IPCs being equal, or have a big victory with a 200+ IPC difference.  Ideally you’d have a win that was a “1” and a larger win that was a “1.5” or “2”.  Any idea of how to measure this based on an Axis and Allies board?  You could use victory cities, but I tend to think that they are a joke.

    Is there any way to parse a map file? I’d like to convert it into an array of number of units per country, so I could write a computer program to generate a data file for analysis.

    With the latest model, 1) AXIS IPC territory held (J+G territory) and 2) total unit IPC value difference are the two significant factors (p=0.001).

    If you allow Y to be a real number between 0 and 1, then I think it makes your linear regression model work better with it, and also solves your concern of a narrow victory versus a landslide win.

    Also, parsing a map file would depend on the format used.  TripleA map files would be pretty simple to parse, since the project is open source you should be able to look at the code and determine the format of the file.  Mapview is not open source, but Motdc is actively developing for it and he might be open to helping parse a mapview map file.  ABattlemap would be near impossible as far as I can tell, since I don’t believe anyone is actively developing this application anymore.  Unfortunately I’d say 80% of the map files on this board are ABattlemap, so that may put a big kink in those plans.

  • 2007 AAR League

    Hopefully we won’t bore everyone else (people I still need data files - send me your aBattlemaps!!!) on the thread.

    Maybe it would be helpful to clarify that there are two Ys: the observed outcome and the predicted outcome.  The predicted outcome will vary a lot (in the 0 to 1 range, but it could go as far as -1 or +2).  The observed outcome is currently 0 or 1.
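
To illustrate the two Ys, here is a minimal sketch of a one-variable least-squares fit on a 0/1 outcome. The data points are invented for the example (they are not from the league games) and `fit_line` is a hypothetical helper, not anything SPSS exposes. It shows how the predicted Y can land below 0 or above 1 even though every observed Y is exactly 0 or 1.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + b with a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


# Invented records: unit-value difference (Axis minus Allies) and the result.
unit_dif = [-60, -40, -20, -10, 10, 20, 40, 60]
axis_won = [0, 0, 0, 1, 0, 1, 1, 1]          # observed Y: always 0 or 1

m, b = fit_line(unit_dif, axis_won)
predicted = [m * x + b for x in unit_dif]    # predicted Y: any real number
print([round(p, 2) for p in predicted])
```

With these made-up numbers the most lopsided positions get predictions slightly outside the 0 to 1 range, which is the behaviour described above.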

    Weighting records might be a good idea.  I suspect excluding them might work even better.  Eg. if I can collect enough data for the last 1-3 rounds of a game that would be the best.

    Weighting - I tried it out, using the Round as the weight, and it boosted R^2 from 0.35 to 0.55.  However I can get better results by excluding the early rounds.
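
A rough sketch of the two options, weighting by round versus dropping early rounds, for a single predictor. Everything here is an assumption made for illustration: the records are invented, `fit_weighted_line` is a hypothetical helper, and the round 5 cutoff is arbitrary. The point is only that excluding early rounds is the same as giving those records a weight of zero.

```python
def fit_weighted_line(xs, ys, ws):
    """Weighted least squares for y = m*x + b; larger weights count for more."""
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    m = sxy / sxx
    return m, mean_y - m * mean_x


# Invented records: (round, unit-value difference, 1 if the Axis won else 0).
records = [(1, 5, 0), (1, -5, 1), (2, 10, 1), (2, -15, 0),
           (5, 30, 1), (5, -25, 0), (8, 60, 1), (8, -50, 0)]
rounds = [r for r, _, _ in records]
xs = [x for _, x, _ in records]
ys = [y for _, _, y in records]

# Option 1: weight each record by its round number.
by_round = fit_weighted_line(xs, ys, rounds)
# Option 2: exclude early rounds, i.e. weight 0 before the assumed cutoff.
late_only = fit_weighted_line(xs, ys, [1 if r >= 5 else 0 for r in rounds])
print(by_round, late_only)
```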


    Hmm, logistic regression is meant to deal with 1/0 outcomes. However I don’t see how to do it with my SPSS version (11), so I’m going to try and get a new version.

    BTW - do you have any aBattlemaps you can send my way?  (Ideally with bid data).

  • 2007 AAR League

    Sorry, I don’t have any maps available.  I am actually in the midst of playing my first ever revised game as we speak.

    I would think that you might want to try to avoid excluding earlier rounds.  Ideally you want to be able to take any game, plug in the critical dimensions into the formula, and spit out some sort of expected outcome.  Just because a game was in the early rounds, doesn’t mean you shouldn’t try to take a crack at predicting the outcome, does it?

    I wonder if you could also try to calculate a confidence level?  Eg. Game 1 was in round 30, and based on the independent variables the axis should win with a confidence level of 90%.  Game 2 was only in round 6, and it was calculated that the axis should win but with a confidence level only of 55%.

    Or something along those lines.

    Not sure what the calculations would look like though…

  • 2007 AAR League

    “I wonder if you could also try to calculate a confidence level?  Eg. Game 1 was in round 30, and based on the independent variables the axis should win with a confidence level of 90%.  Game 2 was only in round 6, and it was calculated that the axis should win but with a confidence level only of 55%.”

    The model will give you a predicted outcome and a standard deviation for it (for instance, it might give you a 0.9 with a 0.2 standard deviation).  So you could get a confidence level from that.  Maybe a logit model will do a better job of this (as it will tell you the chance of getting exactly 0 or 1, whereas the linear regression says you can get 0.9, which is an outcome (a near win) that doesn’t exist, as it represents an uncompleted game).  I’ll see if I can get SPSS to upgrade.

  • 2007 AAR League

    So using logistic regression and more data, my model is predicting 87% of game outcomes (using all the data from round 1 to the final round), and 100% of games from round 5 onward (i.e. once the game has progressed for a couple of rounds, this model is very accurately predicting the winner!).

    Now my problem is I’m not quite sure how to give a simple explanation of how logistic regression works.  In fact I’m somewhat confused myself.

    Variables
    Both measured at the end of Russian turn

    UnitDif: AXIS IPC Units - Ally IPC units

    IPCDif: AXIS IPC Territory - Ally IPC territory

    The Model
    116 rounds of data - roughly 15 games

                              Predicted
                              Allied Win    Axis Win    Percent Correct
    Observed    Allied Win    28            10          73.7
                Axis Win      5             73          93.6

    Overall Percent Correct - 87.1
    (It isn’t predicting Allied wins as well, because 2/3 of my data was axis wins)
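
The percentages in the classification table can be reproduced from the four counts. A small sketch; the dictionary layout and the `percent_correct` helper are just one way to write it, and the "always guess Axis" baseline is added here only as a point of comparison.

```python
# Counts from the classification table: outer key = observed, inner key = predicted.
table = {
    "Allied Win": {"Allied Win": 28, "Axis Win": 10},
    "Axis Win": {"Allied Win": 5, "Axis Win": 73},
}


def percent_correct(observed):
    """Share of records with this observed outcome that were predicted correctly."""
    row = table[observed]
    return 100.0 * row[observed] / sum(row.values())


total = sum(sum(row.values()) for row in table.values())
correct = sum(table[outcome][outcome] for outcome in table)
overall = 100.0 * correct / total

# Baseline for comparison: always guessing the more common outcome (Axis win).
baseline = 100.0 * sum(table["Axis Win"].values()) / total

print(round(percent_correct("Allied Win"), 1))   # 73.7
print(round(percent_correct("Axis Win"), 1))     # 93.6
print(round(overall, 1))                         # 87.1
print(round(baseline, 1))                        # 67.2
```

The baseline is worth keeping in mind because roughly 2/3 of the records are Axis wins, so a model has to beat about 67% to be adding anything.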

                B        SE       Sig     Exp(B)
    UnitDif     .038     .011     .000    1.039
    IPCDif      .110     .035     .002    1.116
    Constant    6.644    1.501    .000    768.146

    Cox and Snell R Squared: .520
    Nagelkerke R Squared:    .724

    I think this means that if UnitDIF changes by 1, your chances of winning change by 3.9%.  If IPCDif changes by 1, your chances of winning change by 11.6%.  But both of those values seem kind of high.  So is that right?
    Also if the IPCDif is zero then it means for the game to be even the UnitDif should be 177 (6.644/.038) - is that right?

    The logistic model is complex because it has something to do with a ratio of two exponents (e to the power of something).
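
For what it's worth, here is a sketch of how a logistic model of this kind is normally read, using the rounded B values from the table. In a logistic regression, Exp(B) is e^B, the factor by which the odds (not the probability) are multiplied for a one-unit increase in that variable. So the 3.9% and 11.6% figures would be changes in the odds; the change in the probability depends on where you start. The function names are illustrative, and which side counts as outcome "1" depends on how the data was coded, which isn't stated here.

```python
import math

B_UNIT, B_IPC, B_CONST = 0.038, 0.110, 6.644   # rounded B values from the table


def log_odds(unit_dif, ipc_dif):
    """Linear part of the model: constant plus weighted variables."""
    return B_CONST + B_UNIT * unit_dif + B_IPC * ipc_dif


def probability(unit_dif, ipc_dif):
    """Logistic function: p = 1 / (1 + e^(-log_odds))."""
    return 1.0 / (1.0 + math.exp(-log_odds(unit_dif, ipc_dif)))


# Exp(B) is the multiplier on the odds per one-unit change in a variable.
print(round(math.exp(B_UNIT), 3))   # 1.039
print(round(math.exp(B_IPC), 3))    # 1.116

# The same one-unit step in UnitDif moves the probability by different
# amounts depending on the starting point.
for start in (-200, -175, -100):
    step = probability(start + 1, 0) - probability(start, 0)
    print(start, round(probability(start, 0), 3), round(step, 4))
```

The "ratio of two exponents" is the logistic function itself: p = e^y / (1 + e^y), which is the same thing as 1 / (1 + e^(-y)).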


  • “I think this means that if UnitDIF changes by 1, your chances of winning change by 3.9%.  If IPCDif changes by 1, your chances of winning change by 11.6%.  But both of those values seem kind of high.  So is that right?
    Also if the IPCDif is zero then it means for the game to be even the UnitDif should be 177 (6.644/.038) - is that right?”

    Yay, numbers.

    Give me more things to throw at my opponents to confuse them.

    More numbers!

  • 2007 AAR League

    So I did a prediction for my current game with DJensen

    My (very tentative) prediction model, assisted by my math skills (also tentative in this area), predicts that (as of the end of R9) you have a 1 in 250 chance of winning (though the 95% confidence interval is that your odds of winning range from 1 in 12.5 to 1 in 5000).

    Hmm, the 95% confidence interval might even be bigger than that.

                B        SE       Sig     Exp(B)
    UnitDif     .038     .011     .000    1.039
    IPCDif      .110     .035     .002    1.116
    Constant    6.644    1.501    .000    768.146

    At R9 our situation:
    IPCDif: -6
    UnitDif: -12

    y = -6*0.110 - 12*0.038 + 6.644
    y = 5.528

    e^5.528 = 251.6

    However since it could be plus or minus 2 standard errors of the constant (about 3 in total), your chance of winning could be as high as 1 in
    e^2.528 = 12.53
    or as low as 1 in
    e^8.528 = 5000 or so
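
The same arithmetic as a short Python sketch, using the rounded coefficients from the table (IPCDif .110, UnitDif .038, Constant 6.644). Two things here are assumptions rather than anything SPSS reported: the band is built from the standard error of the constant only (a proper interval would also need the errors and correlation of the other two coefficients), and the exponentiated value is treated as odds against the trailing side.

```python
import math

B_UNIT, B_IPC, B_CONST = 0.038, 0.110, 6.644   # rounded B values from the table
SE_CONST = 1.501                               # standard error of the constant

ipc_dif, unit_dif = -6, -12                    # situation at the end of R9

y = B_CONST + B_IPC * ipc_dif + B_UNIT * unit_dif    # log-odds
low, high = y - 2 * SE_CONST, y + 2 * SE_CONST       # rough band (assumption)


def odds_to_probability(odds):
    """Convert odds of X to 1 into a probability."""
    return odds / (1.0 + odds)


for label, value in (("low", low), ("middle", y), ("high", high)):
    odds = math.exp(value)
    # Chance for the side the odds are against: 1 / (1 + odds).
    underdog = 1.0 - odds_to_probability(odds)
    print(label, round(value, 3), round(odds, 1), round(underdog, 5))
```

Strictly speaking, odds of 250 to 1 correspond to a chance of 1 in 251 rather than 1 in 250, but at these sizes the difference is negligible.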

  • 2007 AAR League

    I should note that the confidence interval is inflated because UnitDif and IPCDif are correlated (0.560).  If I only use UnitDif the confidence interval is only half the size.

    I’m not sure if this means that the confidence interval measurement is inflated due to the correlation and that the true interval is smaller, or if the actual interval increases too.  Tentatively I’d suspect the first is true.

  • 2007 AAR League

    This math is seriously fracking insane.

    The way I predict Victory or Defeat is that I look at the board and use my intellectual knowledge of A&A to determine if I’m doing good or not!


  • Here is a good rule of thumb…

    If you are playing against Ankmcfly, you are behind.

    /WOOT!
    //Thank you, thank you very much.
    :wink:

  • 2007 AAR League

    Some additional findings…

    1)  The balance of power (Axis Naval IPC - Allied Naval IPC) for navy has the same impact as it does for land (Axis Land IPC - Allied Land IPC).  The coefficients are currently 0.041 and 0.045 (no statistical difference).

    2)  The amount of territory occupied by Japan or Germany has the same impact.  Eg. it is just as useful for Japan to go up several IPCs of territory as it is for Germany.  On the Allies side, UK and Russian territory is equally valuable (US territory is less valuable, possibly not valuable at all but that might be due to collinearity).

    3)  A small bribe, ideally by paypal, to the other player can increase your chances of winning by 73.2% normally, but only by 37.9% in tournament games.  Ok, late April fools =)

  • 2007 AAR League

    Oops.  Battlemap doesn’t include ICs in your land unit count (or anywhere else).  This is particularly bad for Japan, but it also matters if the UK or US were to build an IC.  Excluding IC values downgrades the effectiveness of the model, and it’s especially bad because a 15 IPC difference in the model is like a 60% difference in your odds.

    I’m not sure what will happen if you included other ICs - like say SEU IC was a battleground (eg nobody could produce in it because they’re fighting over it - then its effective value is more like 0).

    So I think I might work on including ICs…

    Weird that BattleMap counts AA guns, but not ICs.

  • Moderator

    AA’s are movable units, IC’s are not.

    IC’s help supply lines and production capability, but aren’t really units and I think it is probably better that they aren’t included.

    If you want to take production capability into account that might be a good idea, but the IPC value itself shouldn’t count in the units.

    Just as an extreme example to show a possible skew, you could have Japan with 5 IC’s worth 75 IPC and Russia with 5 armor worth 25 IPC.  Even though Russia is at -50 IPC, they are still in a much better position to win.
