The new ELO-based ranking system


  • @farmboy said in Proposal for a new, ELO-based, ranking system:

    If I’m right on that, instead of it being a factor in one’s first games, can it be more sensitive in one’s most recent games? That might allow new players to move up more quickly without penalizing players that have been around for a while.

    Or both?? Sensitive at beginning and also recent?
    Or a K factor that adjusts to the total number of games, where the adjustment is high in the first few games, but if that player goes on to play 15 or 50 or 500, it automatically drops the K factor for those first few games and applies the later-game one? MrRoboto, we think you can do anything now. :D

    I suppose after 50 or 500 whatever games, the k factor being applied to the first few would be practically nil, so maybe could be applied to both beginning and more recent.

    Look at those formulas. Look at that automation. We’ll figure it out.

  • @farmboy and @gamerman01
    This is not possible, since it would mean retroactively adjusting the ELO change of past games whenever you play more recent ones.
    That contradicts the whole idea and is not even possible to implement, since it creates circular references again, the biggest problem the old system had.

    It’s also not necessary at all. As you, gamerman, already stated: At a certain point the first few games are completely irrelevant. That point is FAR earlier than 50 games.

    Just an extreme example:
    Dawgoneit is currently 5-45, with an ELO of 1059
    With only 4 wins against some of the current top 5 players, he can increase his rating to ~1600 even though he would still be only 9-45 at that point.
    The system accurately shows the current strength.
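For readers unfamiliar with the mechanics, here is a minimal sketch of the textbook Elo update. It is not necessarily what the spreadsheet does: the K value of 32, the 400-point scale, and the opponent rating of 1900 are assumptions for illustration only. It shows two things from the post above: each update uses only the ratings held before the game, so nothing in the past needs recalculating, and a low-rated player gains far more from beating a strong opponent than they lose by being beaten by one.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Textbook Elo expectation on a 400-point logistic scale."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def update(rating: float, opponent: float, score: float, k: float = 32.0) -> float:
    """Return the new rating after one game (score: 1.0 = win, 0.0 = loss).

    k=32 is an assumed value; the spreadsheet's real K factor is not stated here.
    """
    return rating + k * (score - expected_score(rating, opponent))


# Hypothetical matchup: a 1059-rated player against a 1900-rated opponent.
low, high = 1059.0, 1900.0
gain_from_win = update(low, high, 1.0) - low
loss_from_defeat = low - update(low, high, 0.0)
print(f"win: +{gain_from_win:.1f}, loss: -{loss_from_defeat:.1f}")
```

With a larger K the same asymmetry holds; only the size of each step changes.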

    Thanks to Mr_Stucifer we now have the data of 2022 too. I think the system already looks extremely solid.

    Everybody can create a copy for themselves with
    File -> Make a Copy.

    You can then play around and add some results as you like to see how the system behaves.


  • Um, guys?

    The Hubble telescope just got the corrected lens. With another year or 15 months whatever of data (2022, some of 2021) and whatever adjustment Roboto just made,

    I am super excited and happy to see this standings board. Like I said, with my experience of entering every game result and, indeed, reading comments and what all is involved with moderating the league for years,

    I can tell you THIS outcome is excellent.

    Screenshot 2023-10-31 10.52.24.png

    Screenshot 2023-10-31 10.57.37.png


  • Now that is what it would look like if there was a “lifetime” rating starting in late 2021. This is not what 2023 would look like. And of course our past rankings spreadsheet will fill out 2023 so that playoffs are unaffected and comparability across years will be there.

    But something like this, which will get better after conversations and tweaking, is coming to a computer near you on 1/1/24.

    More than half the credit goes to Roboto for enthusiasm, computer ability, and pushing for improvement. I was hard to get through, but today with this spreadsheet pictured below, I am a believer and this is the future.

  • Interesting ideas… from the new ELO spreadsheet my overall rating is 1673, my OOB is 1546 and BM is 1552.

    How can my overall rating be higher than either of the two individual game version ratings?


  • @oysteilo said in Proposal for a new, ELO-based, ranking system:

    Interesting ideas… from the new ELO spreadsheet my overall rating is 1673, my OOB is 1546 and BM is 1552.

    How can my overall rating be higher than either of the two individual game version ratings?

    One quick explanation/example is if you defeated someone in BM who normally plays PtV or OOB and is more successful there.


  • Also look at Jkeller: he is number one overall, but I only find his name in the OOB bracket, where he is 7th.

    Maybe I am missing something?


  • @oysteilo this also might be because results are still being put in. I see him having an overall ranking based on 11 games and an OOB ranking based on 4 games. And 0 games with BM or PTV. So I suspect there is either a clerical error or 7 games (all wins so that would push up his ELO) that were counted in overall that have yet to be counted elsewhere.


  • Right, only 3-1 in OOB but is 10-1 overall

    jkeller was 5-0 in BM in 2022 but the new spreadsheet has him with 0 BM games. Results tab has BM games completed by him but none in the BM standings tab, so something isn’t working there.

    MrRoboto will weigh in.
    Good point-out, oysteilo
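A quick way to spot this kind of gap is to compare a player's overall record with the sum of their per-version records. The sketch below is only an illustration: the function name and data layout are hypothetical, and the numbers are the jkeller figures quoted in this thread (10-1 overall, 3-1 OOB, nothing listed for BM or PtV).

```python
from typing import Dict, Tuple

Record = Tuple[int, int]  # (wins, losses)


def missing_games(overall: Record, per_version: Dict[str, Record]) -> Record:
    """Return (wins, losses) counted overall but absent from the version tabs."""
    wins = sum(w for w, _ in per_version.values())
    losses = sum(l for _, l in per_version.values())
    return overall[0] - wins, overall[1] - losses


# Figures as quoted in the thread for jkeller.
jkeller_gap = missing_games((10, 1), {"OOB": (3, 1), "BM": (0, 0), "PtV": (0, 0)})
print(jkeller_gap)  # (7, 0): seven wins appear overall but in no version tab
```

Any player whose gap is not (0, 0) is worth a manual look.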


  • I love this idea. My personal feedback:

    • No need to have ELO ratings decay, but perhaps put an asterisk after the number if fewer than 3 games were played in the last 12 months, or a similar concept.
    • The versions of the game are similar enough that we don’t need separate ratings for each version. You already can see the best OOB players are also top in BM.
    • I would like to see playoffs use this for bracketing, assuming the player has met the minimum number of games for the season.

  • @Arthur-Bomber-Harris At first I agreed with the sentiment, but after a minute I definitely do not.

    Most players here prefer BM and have honed their skills for it.
    I think more of the better players are playing BM.

    If the data is accurate, and it may not be (see jkeller issue below)
    Anecdotal evidence:
    Pejon is #4 overall, 20-7

    He is 6-0 PtV, possibly a weaker field
    13-7 BM
    Maybe he plays higher competition in BM, I don’t know offhand. Those records add up to 19-7, may be some data entry errors - we probably need someone to double check. I don’t have time these days.

    Myygames is #1 in OOB with 7-0
    Is #10 overall when put together with everyone else.

    Again, could be data entry errors, could be strength of schedule differences.
    But many players, and probably Myygames, would like to know they’re #1 in OOB and not just #10 overall, especially when that’s the version they’re into.

    We also split the versions 3 years ago because we had the issue of what version to play in the playoffs.

    I don’t have time - I just slapped this response together but I hope it helps and stimulates your brain. Keep those ideas coming, and let me know what you think about my response if you want


  • @gamerman01 strength of schedule is way weaker in OOB which is why I can make the playoff finals in this division but would get crushed in BM with more top players.

    With ELO it gives people incentive to cross over to other versions if people have inappropriately low or high ratings. Knock down a few people who are most out of line with reality and then the reduced ratings cascade through the rest of the group as the consequences reverberate. It might not be absolutely perfect but I doubt people will end up too far away from where they should be.


  • All good points, but THERE is a reason why there are separate ratings… If I only play OOB and my opponent insists on BM, it is all starting over again. I think the separate ratings should stay, with playoffs for all versions.

    I think the elo could work and I am not against that. Nice initiative and well explained


  • I mean in the play offs

  • @oysteilo said in Proposal for a new, ELO-based, ranking system:

    Interesting ideas… from the new ELO spreadsheet my overall rating is 1673, my OOB is 1546 and BM is 1552.

    How can my overall rating be higher than either of the two individual game version ratings?

    Here are your results.
    First your wins:

    14000360-baff-4504-8bc1-ade7b6012e67-image.png

    And here your losses

    d0c0a0cb-c53f-4197-98fd-f4189be7f1d3-image.png

    Your 3 BM4-wins have netted you 98+55+42 = 195 overall-rating and only 165 BM4-specific Rating.
    Your 4 OOB-wins have netted you 80+53+4+6 = 143 overall-rating but 210 OOB-specific rating

    Your single BM4-loss has cost you 64 overall-rating and 57 BM4-specific rating.
    Your 3 OOB-losses have cost you 101 overall-rating and 164 OOB-specific rating.
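The arithmetic above can be tallied in a few lines. This is only a check of the numbers quoted in this post; the starting rating of 1500 is an assumption (it is not stated in the thread), chosen because it reproduces the quoted 1673 overall and 1546 OOB ratings.

```python
# Point changes as quoted in the post above.
overall_gains = [98, 55, 42, 80, 53, 4, 6]  # 3 BM4 wins, then 4 OOB wins
overall_losses = [64, 101]                  # 1 BM4 loss, 3 OOB losses combined

overall_net = sum(overall_gains) - sum(overall_losses)  # 338 - 165
oob_net = 210 - 164                                     # OOB-specific gain minus loss
bm4_net = 165 - 57                                      # BM4-specific gain minus loss

ASSUMED_START = 1500  # hypothetical starting rating, not confirmed by the thread

print(ASSUMED_START + overall_net)  # 1673, matches the quoted overall rating
print(ASSUMED_START + oob_net)      # 1546, matches the quoted OOB rating
print(ASSUMED_START + bm4_net)      # 1608, does not match the quoted 1552 BM rating
```

The overall and OOB totals line up under that assumption. The BM4 total does not, so either the assumed start is off for that tab or the BM4 figures quoted here are not the complete picture.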

    Gamerman already gave the explanation: You defeated people in a version that they are weaker in.
    AetV had 1654 overall rating before, but only 1515 BM4-rating.
    ArthurBomberHarris had 1631 overall rating before, but 1570 OOB-Rating before

    and so on and so on.

    But: I noticed something else. Mr_stucifer has entered the 2022 data for me and he abbreviated Aequitas et veritas as AetV. This is a problem obviously since my sheet thinks those are 2 different players.

    I will have to look over the data myself to check for similar errors (spelling for example).
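One common way to guard against this kind of duplicate is to map every known spelling to a single canonical name before results are counted. The sketch below is an illustration, not how the spreadsheet works: the alias table and function name are hypothetical, and only the AetV abbreviation mentioned above is included.

```python
# Hypothetical alias table: keys are case-folded variants, values are canonical names.
ALIASES = {
    "aetv": "Aequitas et veritas",
}


def canonical_name(raw: str) -> str:
    """Collapse stray whitespace, then resolve known abbreviations."""
    cleaned = " ".join(raw.split())
    return ALIASES.get(cleaned.casefold(), cleaned)


print(canonical_name("AetV"))                     # Aequitas et veritas
print(canonical_name("  Aequitas  et veritas "))  # Aequitas et veritas
```

Names without an alias entry pass through unchanged, so spelling variants still need to be added to the table by hand.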

    The data for jkeller was all there and correct, just not visible. I simply forgot to list his name in the BM4-Sheet. So the data was calculated and everything was correct, but you couldn’t see it. Fixed that.


  • @Arthur-Bomber-Harris said in Proposal for a new, ELO-based, ranking system:

    No need to have ELO ratings decay, but perhaps put an asterisk after the number if fewer than 3 games were played in the last 12 months or similar concept

    This is already implemented. Grey background indicates “inactive” status, which is currently set at 1 year since last result.

    White background (and italic) means less than 2 completed games.
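The two display rules can be written as a small classifier. This is a sketch under assumptions: the status labels, the function name, and the choice to check the game count before the inactivity window are all hypothetical. Only the thresholds (1 year since the last result, fewer than 2 completed games) come from the post.

```python
from datetime import date, timedelta

INACTIVE_AFTER = timedelta(days=365)  # "1 year since last result"
MIN_GAMES = 2                         # fewer completed games than this is flagged


def display_status(games_completed: int, last_result: date, today: date) -> str:
    """Return a hypothetical label for how a player's row would be shown."""
    if games_completed < MIN_GAMES:
        return "provisional"  # white background and italic in the sheet
    if today - last_result > INACTIVE_AFTER:
        return "inactive"     # grey background in the sheet
    return "active"


today = date(2023, 10, 31)
print(display_status(1, date(2023, 10, 1), today))   # provisional
print(display_status(10, date(2022, 1, 15), today))  # inactive
print(display_status(10, date(2023, 10, 1), today))  # active
```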

  • @gamerman01 said in Proposal for a new, ELO-based, ranking system:

    Pejon is #4 overall, 20-7

    He is 6-0 PtV, possibly a weaker field
    13-7 BM

    Please note:
    While Pejon is “only” #4 in Overall and #1 in PtV, his Rating overall is significantly higher than his PtV rating.

    ae28f844-4ece-432d-bd45-fed23be2c35f-image.png

    Now there are of course multiple reasons for that, but one jumps out at me immediately:
    There was a full year gap between his last two PtV results, and in that time he increased his overall rating from 1641 to 1886. That’s a lot!

    He seems to be a great example of someone who improved over time!
    e64d1b04-fbff-41d8-a824-c300e3bed46c-image.png

    Notice how 4 of his 7 losses happened early in 2022. And the only 2 losses he had this year were against very very high rated players.

    So his 20-7 overall might not seem so impressive at first glance, but this simple win/loss ratio is hindered by a weaker early phase in the data. It doesn’t tell a story of improvement.
    His ELO however properly reflects that.

    One sidenote though:
    These early losses were against other people who are strong but had lower rating back then because this is where the data starts. As soon as I input earlier results, the ELO will become more and more accurate.

    This is a reminder that everyone can help me with this task.


  • Now jkeller’s BM4 results are in there…


  • @Arthur-Bomber-Harris said in Proposal for a new, ELO-based, ranking system:

    You already can see the best OOB players are also top in BM.

    The top 3 OOB players have literally played ZERO BM4 games.
    And #4 OOB Booper is only #18 in BM4
    #5 is you and you have a single reported BM4 game, would be ranked #15 with that.

    #6 OOB Farmboy is the first who really is also top in BM4


  • @MrRoboto said in Proposal for a new, ELO-based, ranking system:

    @Arthur-Bomber-Harris said in Proposal for a new, ELO-based, ranking system:

    You already can see the best OOB players are also top in BM.

    The top 3 OOB players have literally played ZERO BM4 games.
    And #4 OOB Booper is only #18 in BM4
    #5 is you and you have a single reported BM4 game, would be ranked #15 with that.

    #6 OOB Farmboy is the first who really is also top in BM4

    Maybe the overall rating serves little to no purpose then??? Should we kill it?

    I think the ranking also reflects personal preference. AAB has one BM game whereas Booper has 5 BM games and 4 wins. So it appears it hugely matters who you play in the different versions; why should this impact the overall rating? We don’t use it, do we?
