Don't ya just love DB/TP "math"?

Comments

  • Prime LorcaPrime Lorca ✭✭✭✭✭
    @Cpt_insano_2k1 , do you owe the OP an apology for deviating from the original intent? Or do you change the subject when your arguments are broken as a debate tactic?
    Farewell 🖖
  • Cpt_insano_2k1Cpt_insano_2k1 ✭✭✭✭✭
    The point made by my argument is still a valid one.

    However, out of respect for the OP’s intent of this post, it is not a point that is directly relevant. Apology enough? Can we get back on track now? @Prime Lorca [10FH]
  • Prime LorcaPrime Lorca ✭✭✭✭✭
    That's up to you. You seem intent to be the arbiter of what is related, what is relevant, when someone should or shouldn't address a certain point. So please, don't let me stop you from moderating the thread.
    Farewell 🖖
  • Cpt_insano_2k1Cpt_insano_2k1 ✭✭✭✭✭
    I suppose my thinking was that this forum would be one that engineers were more likely to monitor. And to that line of thought, the displayed shuttle success probability vs observed outcome seems askew. The gauntlet perceived probability of success vs observed outcome seems askew. Simple as that.

    Just a “please fix this” thread to the engineers, not a “let's debate whether or not you are ok with this” thread to the players. But I will certainly back off if the intent of the engineering forum is solely to have the white knights come in and bash people for making valid observations and asking for some simple fixes.
  • Drago MusevniDrago Musevni ✭✭✭✭
    65%er vs 5%er. I have better roll stats in both categories. I crit more times and still lose. Tell me again how the system is working as intended?

    lwf6skogwi9t.png
    01000010 01101001 01110100 01100101 00100000 01101101 01111001 00100000 01110011 01101000 01101001 01101110 01111001 00100000 01101101 01100101 01110100 01100001 01101100 00100000 01100001 01110011 01110011 00100001
  • Prime LorcaPrime Lorca ✭✭✭✭✭
    Testing the shuttles seems straightforward. Chart the percentage, then chart whether they succeed or fail. Gauntlet seems much more difficult. I don't know how much you actually need. I made a list below, which seems pretty daunting. So overwhelming that I probably would not participate because of what it would require. But here's what I came up with.

    Player input:
    - Your min (skill 1)
    - Your max (skill 1)
    - Your min (skill 2)
    - Your max (skill 2)
    - Your crit percentage
    - Your total score
    - Opponent min (skill 1)
    - Opponent max (skill 1)
    - Opponent min (skill 2)
    - Opponent max (skill 2)
    - Opponent crit percentage
    - Opponent total score
    - Win streak before match

    Calculations/simulations:
    - (Spreadsheet formula to document skill spreads and/or averages is optional)
    - Calculation or simulation to determine odds of success
    - Formula to document winner

    That's a start. I don't know how much can be removed or what may need to be added. But that list alone is more effort than I care to put into gauntlet. Hopefully this is helpful in getting the thread back on track. :)
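
    A rough sketch of what the "calculation or simulation to determine odds of success" could look like. Everything about the roll mechanics here is an assumption made for illustration (three uniform rolls per skill between min and max, a crit doubling a roll, ties counted as half a win), and the names `Side`, `roll_total` and `estimate_win_odds` are made up for this example, not anything from the game.

```python
import random
from dataclasses import dataclass


@dataclass
class Side:
    # Inputs from the list above; field names are hypothetical.
    min1: int
    max1: int
    min2: int
    max2: int
    crit_pct: float  # displayed crit chance, e.g. 25 for 25%


def roll_total(side, rng, rolls_per_skill=3, crit_mult=2):
    # ASSUMPTION: each roll is uniform between min and max, and a crit
    # multiplies that single roll by crit_mult. The real rules may differ.
    total = 0
    for low, high in ((side.min1, side.max1), (side.min2, side.max2)):
        for _ in range(rolls_per_skill):
            value = rng.randint(low, high)
            if rng.random() * 100 < side.crit_pct:
                value *= crit_mult
            total += value
    return total


def estimate_win_odds(you, opponent, trials=20000, seed=0):
    # Monte Carlo estimate: fraction of simulated matches you win.
    rng = random.Random(seed)
    wins = 0.0
    for _ in range(trials):
        yours = roll_total(you, rng)
        theirs = roll_total(opponent, rng)
        if yours > theirs:
            wins += 1.0
        elif yours == theirs:
            wins += 0.5  # ASSUMPTION: a tie counts as half a win
    return wins / trials
```

    Comparing the estimate against the logged winner over many matches would be the "formula to document winner" step.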
    Farewell 🖖
  • Cpt_insano_2k1Cpt_insano_2k1 ✭✭✭✭✭
    You'd also need to include the number of trophies you stand to win from the battle. (This is the key to the subtle cues that were mentioned before.)

    And I agree, tracking all of these metrics would be challenging. If this were done as a “community pool”, data entry errors would be highly probable, since there is no way to record this data automatically and it would depend entirely on individuals QC'ing their own data. I would be hesitant to trust this analysis as a community dataset.

    Shuttles would be more straightforward, and the upcoming event would be the time to do it.

    You'd just need to wait to collect data until the 4,000 EP threshold, then track the mission name, shuttle success probability, any boosts used, and mission outcome. Individuals who participate for the whole event will have sufficient sample sizes on their own without the need to aggregate the data. To make it more statistically sound, with less individual variance or noise, the missions should be crewed and boosted identically for the duration of the event, i.e. same crew, same boost (skill or time), to control for other factors and isolate the displayed success probability as the tested feature. Doing it this way would also simplify data recording, since you would only need to record “success/fail”; all other variables would be the same for each observation.
  • Prime LorcaPrime Lorca ✭✭✭✭✭
    g6stj2q11ibb.jpg

    Seriously, though... I think you're right about the gauntlet, but wrong about shuttles. I just couldn't help myself with the meme. If you just want to test the percentages as they are, then the community can handle that. If you want to isolate certain test parameters to get nuanced information, then you would want to have a small group and consider using PMs to coordinate and test isolated variables.
    Farewell 🖖
  • Cpt_insano_2k1Cpt_insano_2k1 ✭✭✭✭✭
    Bylo Band wrote: »
    If you don't trust us but you expect us to trust you, then what are we even doing here?

    Please stop this. There was nothing in my reply that said “I don't trust you”. I made it as clear as I could: expecting everyone to record such a robust set of variables is undoubtedly going to result in data entry issues, and therefore create problems for a statistically sound analysis.
  • Cpt_insano_2k1Cpt_insano_2k1 ✭✭✭✭✭
    edited May 2020
    As far as testing shuttles, my proposed test is as follows:

    Once the 4,000 event point threshold is reached, take note of the missions you are running and of the crew choices you made to maximize the success percentage. Take note of any boosts used (time or skill). If you re-crew each mission as it returns with the same crew and boosts you initially used, then you won't need to note the crew or percentage again, because they will be identical to the ones prior. This is absolutely necessary in order to reduce variance and noise.

    The end result is that your “data sheet” would have: mission name, success probability, mission outcome. You could then do a simple calculation to find out your observed success % and you could compare that to the “expected” success percentage.

    If done in this way, each shuttle sent out will be a repetition for the pool of all shuttles sent on that particular mission. Repetitions are an important part of statistical testing and will add strength to the analysis.

    Example data sheet:

    Mission A, expected success%, number of repetitions, success/fail (1/0), observed success ( sum of success/fail column divided by reps)
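
    For anyone who'd rather tally this outside a spreadsheet, a minimal sketch of the same data sheet follows. The row layout `(mission_name, expected_pct, outcome)` and the function name `summarize` are assumptions for illustration only.

```python
from collections import defaultdict


def summarize(rows):
    """Tally rows of (mission_name, expected_pct, outcome), outcome being 1 or 0."""
    tally = defaultdict(lambda: {"expected": None, "reps": 0, "successes": 0})
    for mission, expected_pct, outcome in rows:
        entry = tally[mission]
        entry["expected"] = expected_pct  # same value on every row of a mission
        entry["reps"] += 1
        entry["successes"] += outcome
    # Observed success = sum of the success/fail column divided by reps.
    return {
        mission: {
            "expected_pct": e["expected"],
            "reps": e["reps"],
            "observed_pct": 100.0 * e["successes"] / e["reps"],
        }
        for mission, e in tally.items()
    }
```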

    If we went with a 95% CI, then if expected success was 90%, an observed success rate range of 85%-95% would be evidence that there were no problems. A result lower than 85% could indicate a problem, and an observed result higher than 95% could indicate TP is very kind.
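
    One caveat on the 85%-95% band: how wide the "no problem" range is depends on the number of repetitions, so a fixed plus or minus 5 points is only a rule of thumb. A sketch of one way to get the range from the binomial distribution is below; `plausible_range` is a made-up helper, and trimming equal tails is an assumption about how you'd want to define the band.

```python
from math import comb


def binom_pmf(k, n, p):
    # Probability of exactly k successes in n independent runs at rate p.
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def plausible_range(n, p, coverage=0.95):
    """Success counts left after trimming (1 - coverage) / 2 from each tail."""
    tail = (1 - coverage) / 2
    low, high = 0, n
    cumulative = 0.0
    for k in range(n + 1):  # walk up from 0 successes
        cumulative += binom_pmf(k, n, p)
        if cumulative > tail:
            low = k
            break
    cumulative = 0.0
    for k in range(n, -1, -1):  # walk down from n successes
        cumulative += binom_pmf(k, n, p)
        if cumulative > tail:
            high = k
            break
    return low, high
```

    With only a few dozen repetitions per mission the range comes out noticeably wider than 85%-95%; it narrows as repetitions are added.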
  • Prime LorcaPrime Lorca ✭✭✭✭✭
    Please let me know if this covers what you want. Imo, the crew slots are more for reference than mandatory, so that's why I put them to the right. I can modify the spreadsheet, or you can if you want. I went with successes and failures because it's easy to just add one to whichever column is appropriate. It can be changed if you want.

    https://docs.google.com/spreadsheets/d/1LY1cYAqmMGqRup1-7eQeUA83YDGZOz-r42X9BjbZiro/edit?usp=drivesdk
    Farewell 🖖
  • Cpt_insano_2k1Cpt_insano_2k1 ✭✭✭✭✭
    Thank you for putting that together! Looks pretty good. We could probably tweak it slightly as we gather feedback from folks and think through the process a bit.

    Yes, the crew is just for reference, to help make sure that each attempt at a mission is identical for the individual who sends the shuttle.
  • Prime LorcaPrime Lorca ✭✭✭✭✭
    It just occurred to me that adding the captain name to the left and moving the mission name to the right seems helpful.
    (Thanks for not shutting the thread down, Shan.)
    Farewell 🖖
  • MirrorVerse JcMirrorVerse Jc ✭✭✭
    edited May 2020
    This all just happened within a single refreshed gauntlet. What I find most commonly is that I am more likely to lose the majority or all of an entire set. If I lose the first round, it continues until all crew are disabled.
    And yes, I have 2 Armus.

    v42eumz46g1p.jpeg
    94e9px0g77z2.jpeg
    vl3ua97lfg0y.png
    l8rgvc66dusa.png


  • MirrorVerse JcMirrorVerse Jc ✭✭✭
    edited May 2020
    This happened for another full round.
    jx6inr2mn4mg.jpeg
    0mslft3u53d4.jpeg
    tq1l4zqzo9kg.png
    mfwogca8a95j.png
    zfi7dzgcbzon.png
    tl7iam29up3i.png
    2phzsjk7fimr.png
  • Fi®3wallFi®3wall ✭✭✭
    lol
  • What, if anything, Peachtree's analysis can say is that players who have access to displayed percentages wind up having match outcome results similar to the displayed percentages in aggregate.

    The coverage bias is beyond obvious. But I guess I should just point and laugh at people's ignorance since that's the cool thing for very helpful forum denizens round these parts to do.

    It is a real shame that there are so many players who don't understand prob & stats, but the bigger shame lies with those who mock them and undermine the good faith reporting of potential issues. Wielding a flawed aggregate as a cudgel when discussing a specific result doesn't make you part of the solution. Errors can and do hide in aggregate results.
  • Prime LorcaPrime Lorca ✭✭✭✭✭
    For the record, I ran almost all of my shuttles at 97% this weekend. My spreadsheet earlier in the thread has my documentation. I'll be the first to admit that it literally looks unbelievable. I kept checking my VP through the event to see if that hundreds digit changed. It did not. The last three digits were 898 the whole time after I hit the 4k shuttles. So, hitting 100% success in 108 shuttles is within the 5% margin of error suggested earlier in the thread. It's a shame that no one else used the spreadsheet. I think 108 shuttles is barely acceptable in terms of sample size. And given the outcome, I am curious to see if it could be duplicated, even within one or two failures.
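
    As a quick sanity check on how unusual a clean sweep is, here is a sketch that assumes all 108 runs were independent and every one was displayed at exactly 97% (the post says "almost all", so this is a simplification). The function name is made up for the example.

```python
def prob_all_succeed(runs, success_rate):
    # Chance that every one of `runs` independent shuttles succeeds.
    return success_rate ** runs


# ASSUMPTION: all 108 shuttles at exactly 97%, each run independent.
clean_sweep = prob_all_succeed(108, 0.97)
print(f"Chance of 108/108 at 97%: {clean_sweep:.1%}")
```

    Under those assumptions the chance comes out at a few percent, so a perfect run is unlikely but not impossible if the displayed 97% is accurate.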

    I don't have the patience to track gauntlet data, but in the spirit of cooperation, I did my best with my shuttles. It will be interesting to test other percentages when I go for a higher rank in the future. My squadron did not track their data, but they expressed some frustrations with shuttles rated below 95%. One guy claimed two rounds where three out of four failed at 85-92%. Maybe those were outliers and the other shuttles balanced that out. I don't know. Maybe the experiment can continue in a couple weeks if anyone is interested.

    Personally, I think there's something magical that happens at 96% and above. I don't have any evidence beyond this weekend, so I am curious to test it.
    Farewell 🖖
  • Cpt_insano_2k1Cpt_insano_2k1 ✭✭✭✭✭
    edited May 2020
    Like Lorca, I also tracked success % this event. I sent 6 shuttles at a time, all with 3* time reducers (no skill boosts), and sent them with the same crew throughout the event. In total, I got 30 repetitions for each shuttle mission, with expected/observed success listed below:

    96% / 96.67%
    95% / 83.33%**
    90% / 86.67%
    90% / 86.67%
    86% / 90%**
    83% / 80%

    The outliers are denoted with “**”. The most surprising result was the 95% shuttle mission, which resulted in 83% success. The fails came consistently throughout the event for this mission; there wasn’t really a “bad streak of luck”, it simply failed more often than the others. There were 25 successes and 5 failures for this mission. For it to recover and perform “as expected”, it would have needed 70 consecutive successes by the end of the event.
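
    The 70 comes from solving (25 + x) / (30 + x) = 0.95 for x. A small sketch of that arithmetic, with a made-up helper name and exact fractions to avoid rounding surprises:

```python
from fractions import Fraction
from math import ceil


def extra_successes_needed(successes, failures, target_pct):
    # Solve (successes + x) / (successes + failures + x) >= target for x.
    target = Fraction(target_pct, 100)
    total = successes + failures
    needed = (target * total - successes) / (1 - target)
    return max(0, ceil(needed))


print(extra_successes_needed(25, 5, 95))
```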

    If we pool all of my shuttle runs into one group, the expected success would be 90%, and the observed success was 87.22%. Pretty reasonable, in my opinion, but I think it's important to point out that the observed and expected success would have been nearly spot on if it wasn't for the one pool of repetitions from the 95% success mission.
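
    A sketch of the pooled figures, plus a rough check on how surprising 25 of 30 is at a displayed 95%. The success counts are back-calculated from the percentages listed above (30 repetitions each), and the check assumes independent runs with the displayed rate as the true rate.

```python
from math import comb

# (expected %, successes, repetitions); counts derived from the listed percentages.
runs = [(96, 29, 30), (95, 25, 30), (90, 26, 30), (90, 26, 30), (86, 27, 30), (83, 24, 30)]

pooled_expected = sum(expected for expected, _, _ in runs) / len(runs)
pooled_observed = 100.0 * sum(s for _, s, _ in runs) / sum(n for _, _, n in runs)


def prob_at_most(successes, n, p):
    # Binomial lower tail: chance of `successes` or fewer in n runs at rate p.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(successes + 1))


tail = prob_at_most(25, 30, 0.95)
print(f"pooled: expected {pooled_expected:.2f}%, observed {pooled_observed:.2f}%")
print(f"P(25 or fewer successes in 30 at 95%) = {tail:.3f}")
```

    By this calculation the tail probability is under 2%, though with six missions tracked at once, one odd-looking mission is less surprising than that single number suggests.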

    Lorca and I are discussing design and strategy to test this 95% success threshold, as well as the potential to test the “AND” skill seats in shuttles.

    In general, I experienced far fewer failures this event than I am used to seeing, and tracking results and sending out identical shuttle missions was relatively simple. I am definitely interested in further investigating shuttle success, though, and I would like to invite people to participate in a group effort next full faction event.
  • @Cpt_insano_2k1 Do you know if you used shared crew regularly on those 95% shuttles?

    I've heard anecdotal evidence that squad shares can produce unexpected fails.
  • Prime LorcaPrime Lorca ✭✭✭✭✭
    I know you weren't asking me, but I used the squad share in one of my shuttles and did not have any failures. That was the shuttle that I time boosted for the full event, so it had the most runs at 34.
    Farewell 🖖
  • I appreciate the response all the same. :)

    In a perfect world I'd want multiple users running the exact same line up to test. I can't imagine the logistics of arranging even one same shuttle/line up test, let alone multiple shuttle/crew variations. It would be "fun" to test this in TP's demo environment, but even then demo doesn't always match production... 🤷

    That said, your report here matches with my personal experience. There could be something more exotic going on, but likely not a general feature failure like the AND bug. Always keen to consider other reports too.