The new ELO-based ranking system


  • I love this idea. My personal feedback:

    • No need to have ELO ratings decay, but perhaps put an asterisk after the number if fewer than 3 games were played in the last 12 months, or something similar.
      * The game versions are similar enough that we don’t need separate ratings for each version. You can already see that the best OOB players are also top in BM.
      * I would like to see playoffs use this for bracketing, assuming a player has met the minimum number of games for the season.

  • @Arthur-Bomber-Harris At first I agreed with the sentiment, but after a minute’s thought I definitely do not.

    Most players here prefer BM and have honed their skills for it.
    I think more of the better players are playing BM.

    If the data is accurate, and it may not be (see jkeller issue below)
    Anecdotal evidence:
    Pejon is #4 overall, 20-7

    He is 6-0 PtV, possibly a weaker field
    13-7 BM
    Maybe he plays tougher competition in BM; I don’t know offhand. Those records add up to 19-7, not 20-7, so there may be some data entry errors. We probably need someone to double-check. I don’t have time these days.

    Myygames is #1 in OOB with 7-0
    Is #10 overall when put together with everyone else.

    Again, could be data entry errors, could be strength of schedule differences.
    But many players, and probably Myygames, would like to know they’re #1 in OOB and not just #10 overall, especially when that’s the version they’re into.

    We also split the versions 3 years ago because we had the issue of what version to play in the playoffs.

    I don’t have time - I just slapped this response together but I hope it helps and stimulates your brain. Keep those ideas coming, and let me know what you think about my response if you want


  • @gamerman01 strength of schedule is way weaker in OOB which is why I can make the playoff finals in this division but would get crushed in BM with more top players.

    ELO gives people an incentive to cross over to other versions when some players have inappropriately low or high ratings. Knock down a few of the people whose ratings are most out of line with reality, and the reduced ratings cascade through the rest of the group as the consequences reverberate. It might not be absolutely perfect, but I doubt people will end up too far from where they should be.


  • All good points, but there is a reason why there are separate ratings… If I only play OOB and my opponent insists on BM, it is all starting over again. I think the separate ratings should stay, with playoffs for all versions.

    I think the elo could work and I am not against that. Nice initiative and well explained


  • I mean in the play offs

  • @oysteilo said in Proposal for a new, ELO-based, ranking system:

    Interesting ideas… From the new ELO spreadsheet, my overall rating is 1673, my OOB is 1546 and my BM is 1552.

    How can my overall rating be higher than either of the two individual game version ratings?

    Here are your results.
    First your wins:

    14000360-baff-4504-8bc1-ade7b6012e67-image.png

    And here your losses

    d0c0a0cb-c53f-4197-98fd-f4189be7f1d3-image.png

    Your 3 BM4-wins have netted you 98+55+42 = 195 overall-rating and only 165 BM4-specific Rating.
    Your 4 OOB-wins have netted you 80+53+4+6 = 143 overall-rating but 210 OOB-specific rating

    Your single BM4-loss has cost you 64 overall-rating and 57 BM4-specific rating.
    Your 3 OOB-losses have cost you 101 overall-rating and 164 OOB-specific rating.
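
To make the arithmetic above explicit, here is a small sketch that simply sums the numbers quoted in this post. The dictionary and function names are made up for illustration and are not taken from the spreadsheet.

```python
# Signed rating changes quoted in the post above (gains positive, losses negative).
overall_changes = {
    "BM4 wins": 98 + 55 + 42,
    "OOB wins": 80 + 53 + 4 + 6,
    "BM4 loss": -64,
    "OOB losses": -101,
}
bm4_changes = {"BM4 wins": 165, "BM4 loss": -57}
oob_changes = {"OOB wins": 210, "OOB losses": -164}


def net_change(changes):
    """Sum the signed rating changes for one rating list."""
    return sum(changes.values())


overall_net = net_change(overall_changes)
bm4_net = net_change(bm4_changes)
oob_net = net_change(oob_changes)
print(overall_net, bm4_net, oob_net)
```

The overall list counts every game, while each version-specific list only counts its own games, so the overall net gain can end up larger than either version-specific net gain.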

    Gamerman already gave the explanation: You defeated people in a version that they are weaker in.
    AetV had 1654 overall rating before, but only 1515 BM4-rating.
    ArthurBomberHarris had 1631 overall rating before, but 1570 OOB-Rating before

    and so on and so on.

    But: I noticed something else. Mr_stucifer has entered the 2022 data for me and he abbreviated Aequitas et veritas as AetV. This is a problem obviously since my sheet thinks those are 2 different players.

    I will have to look over the data myself to check for similar errors (spelling for example).

    The data for jkeller was all there and correct, just not visible. I simply forgot to list his name in the BM4-Sheet. So the data was calculated and everything was correct, but you couldn’t see it. Fixed that.


  • @Arthur-Bomber-Harris said in Proposal for a new, ELO-based, ranking system:

    No need to have ELO ratings decay, but perhaps put an asterisk after the number if fewer than 3 games were played in the last 12 months, or something similar.

    This is already implemented. Grey background indicates “inactive” status, which is currently set at 1 year since last result.

    White background (and italic) means less than 2 completed games.
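
As a rough sketch of the two display rules just described (not the actual spreadsheet logic), the status could be derived like this. The function name, the "provisional" label, and the order in which the two checks are applied are assumptions made for illustration.

```python
from datetime import date, timedelta

INACTIVE_AFTER = timedelta(days=365)  # "1 year since last result", per the post
MIN_COMPLETED_GAMES = 2               # fewer than this: white background, italic


def display_status(last_result: date, completed_games: int, today: date) -> str:
    """Return a hypothetical status label for one player's rating row."""
    # Assumption: the low-game-count check takes precedence over inactivity.
    if completed_games < MIN_COMPLETED_GAMES:
        return "provisional"
    if today - last_result > INACTIVE_AFTER:
        return "inactive"
    return "active"


today = date(2024, 6, 1)
print(display_status(date(2023, 1, 1), 5, today))
print(display_status(date(2024, 5, 1), 1, today))
print(display_status(date(2024, 5, 1), 10, today))
```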

  • @gamerman01 said in Proposal for a new, ELO-based, ranking system:

    Pejon is #4 overall, 20-7

    He is 6-0 PtV, possibly a weaker field
    13-7 BM

    Please note:
    While Pejon is “only” #4 in Overall and #1 in PtV, his Rating overall is significantly higher than his PtV rating.

    ae28f844-4ece-432d-bd45-fed23be2c35f-image.png

    Now there are of course multiple reasons for that, but one jumps out at me immediately:
    There was a full year gap between his last two PtV results and in this time he increased his overall rating from 1641 to 1886, that’s a lot!

    He seems to be a great example for someone who improved over time!
    e64d1b04-fbff-41d8-a824-c300e3bed46c-image.png

    Notice how 4 of his 7 losses happened early in 2022. And the only 2 losses he had this year were against very very high rated players.

    So his 20-7 overall might not seem so impressive at first glance, but this simple win/loss ratio is hindered by a weaker early phase in the data. It doesn’t tell a story of improvement.
    His ELO however properly reflects that.

    One sidenote though:
    These early losses were against other people who are strong but had lower rating back then because this is where the data starts. As soon as I input earlier results, the ELO will become more and more accurate.

    This is a reminder that everyone can help me with this task.


  • Now jkeller’s BM4 results are in there…


  • @Arthur-Bomber-Harris said in Proposal for a new, ELO-based, ranking system:

    You already can see the best OOB players are also top in BM.

    The top 3 OOB players have literally played ZERO BM4 games.
    And #4 OOB Booper is only #18 in BM4
    #5 is you and you have a single reported BM4 game, would be ranked #15 with that.

    #6 OOB Farmboy is the first who really is also top in BM4


  • @MrRoboto said in Proposal for a new, ELO-based, ranking system:

    @Arthur-Bomber-Harris said in Proposal for a new, ELO-based, ranking system:

    You already can see the best OOB players are also top in BM.

    The top 3 OOB players have literally played ZERO BM4 games.
    And #4 OOB Booper is only #18 in BM4
    #5 is you and you have a single reported BM4 game, would be ranked #15 with that.

    #6 OOB Farmboy is the first who really is also top in BM4

    Maybe the overall rating serves little to no purpose then??? Should we kill it?

    I think the ranking also reflects personal preference. AAB has one BM game, whereas Booper has 5 BM games and 4 wins. So it appears to matter hugely who you play in the different versions. Why should this impact the overall rating? We don’t use it, do we?


  • You’re right, over-all rating has been an additional curiosity ranking. It is not used at all for playoffs or permanent league records. The purpose has been to help compare players who play a lot of one version to players who play a lot of the other. For this reason the overall is still nice. For example, Myygames whipping everybody in OOB can be put in perspective when stacked up against everyone playing all versions.

    And that is why overall doesn’t have any actual bearing on anything: it’s a mish-mash.


  • @gamerman01 said in Proposal for a new, ELO-based, ranking system:

    You’re right, over-all rating has been an additional curiosity ranking. It is not used at all for playoffs or permanent league records. The purpose has been to help compare players who play a lot of one version to players who play a lot of the other. For this reason the overall is still nice. For example, Myygames whipping everybody in OOB can be put in perspective when stacked up against everyone playing all versions.

    And that is why overall doesn’t have any actual bearing on anything: it’s a mish-mash.

    I agree, let’s keep the overall, but focus on version-specific ratings for playoffs!

  • By AAB, do you mean ABH, ArthurBomberHarris?

    We use the overall ranking right now and I would like to keep it as well in the future.

    We wouldn’t need it if EVERYONE is like Myygames and plays one version only.

    However, some people like Pejon or GeneralDisarray are playing 2 or even more versions. I find it interesting to see if someone is a specialist or a generalist.

    But, as you two have already agreed: Playoffs are never based on overall rankings, but rather on type specifics.

  • I went through 14 more pages and collected data on 226 more games to input.

    link url

  • That’s incredible, thanks!

  • @MrRoboto yeah, overall ratings are a “nice to have” and nothing more.

    I do not know if there is any sport with overall ratings - you see individual ratings for variants:

    Track & field: 100m, 200m, etc.
    Tennis: singles, doubles
    Swimming: breaststroke, freestyle, etc.

  • And yet there is Decathlon!

    And in swimming, there usually are specialists for some of the disciplines, and it is additionally interesting to see who the fastest swimmer is overall, across all the different versions.


  • @MrRoboto @gamerman01

    Since I prefer relative over absolute references, I stumbled over my dislike of arbitrarily chosen parameters, and I might have found a flaw in the currently proposed system!

    What we probably want to avoid, besides circular references, is a possible division by zero. Wouldn’t this occur, however, if RAold were 50 higher than RB? Going further, my first proposed fix is

    EA = 0.5 * (1 + (RAold - RBold) / (1 + Rmax - Rmin))

    with Rmax (Rmin) being the highest (lowest) league rating.

    The intention is that EA for equally strong players comes out as 1/2 and EA for the strongest versus the weakest player is near 1 …
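
A quick way to check that intention is to code the proposed expression directly. This is only a sketch of the formula as written in this post; the function name and the example league ratings (1900 and 1300) are hypothetical.

```python
def linear_expected_score(r_a: float, r_b: float, r_max: float, r_min: float) -> float:
    """EA = 0.5 * (1 + (RA - RB) / (1 + Rmax - Rmin)), as proposed in the post."""
    # The denominator is at least 1 whenever r_max >= r_min, so it cannot be zero.
    return 0.5 * (1 + (r_a - r_b) / (1 + r_max - r_min))


# Hypothetical league extremes, chosen only for illustration.
r_max, r_min = 1900.0, 1300.0

even = linear_expected_score(1500, 1500, r_max, r_min)
top_vs_bottom = linear_expected_score(r_max, r_min, r_max, r_min)
bottom_vs_top = linear_expected_score(r_min, r_max, r_max, r_min)
print(even, top_vs_bottom, bottom_vs_top)
```

With these example numbers, equal ratings give exactly 1/2, the strongest player gets just under 1 against the weakest, and the two players' expected scores always sum to 1.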

    I am not sure yet whether a smart starting rating is zero, 1k, 1.5k, or something else. But it appears to me that a lot more elaboration is needed to create a flawless system!! Hey, I would be in; it looks to be fun!

    Meanwhile we should consider as
    @Martin said in League General Discussion Thread:

    @MrRoboto thank you for this analysis, and well presented! The ELO system was suggested a few times during the last years, and I also strongly support it. And as I stated before, there are plenty of management systems available which would ease the job of the score keeper / league manager.

  • @pacifiersboard said in Proposal for a new, ELO-based, ranking system:

    What we probably want to avoid, besides circular references, is a possible division by zero. Wouldn’t this occur, however, if RAold were 50 higher than RB?

    I’m not sure where in the formula @MrRoboto posted it would be possible to divide by zero. R(b)-R(a) is in the denominator but is not the complete denominator; with equal ratings you would have a denominator of 1. So in the case of exactly even ratings, E(a) = 1.

    However I could see this being an issue as that would make the winner get 0 points if I’m understanding correctly:

    R(a)+K*(S-E(a))

    E(a) = 1, S=1 for a win, so S-E(a) = 0.
    K*(0) = 0
    R(a)new = R(a)old + 0
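
For comparison, here is a sketch of the textbook Elo formulas. Whether they match what @MrRoboto posted is an assumption, since his formula isn't quoted in this post, and the scale of 400 and K = 32 are common defaults rather than the league's settings.

```python
def expected_score(r_a: float, r_b: float, scale: float = 400.0) -> float:
    """Textbook Elo expected score: 1 / (1 + 10^((Rb - Ra) / scale))."""
    # 10 ** x is always positive, so the denominator is always greater than 1.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))


def updated_rating(r_a: float, r_b: float, score: float, k: float = 32.0) -> float:
    """R(a)new = R(a)old + K * (S - E(a)), with S = 1 for a win and 0 for a loss."""
    return r_a + k * (score - expected_score(r_a, r_b))


even_expectation = expected_score(1500, 1500)
after_win = updated_rating(1500, 1500, 1.0)
print(even_expectation, after_win)
```

Under these textbook formulas, two evenly rated players each have an expected score of 1/2, so the winner gains K/2 points rather than 0.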
