
Further proof that the displayed shuttle success % is wrong


Comments

  • [SSR] GTMET ✭✭✭✭✭
    I have developed serious doubts about the reliability of the event metrics during the last event.

    These were my results vs. the displayed win percentage:

    [image: results vs. displayed win percentage]

    p-value (binomial distribution) = 0.00000001518530

    That means there is only roughly a 1 in 70 million chance of results this poor if the displayed percentages were accurate. In other words, with almost 100 samples, there is no way the server was calculating my shuttles with the displayed percentages.

  • WaldoMag ✭✭✭✭✭
    edited April 2018
    GTMET, your data seems to imply that the boost does nothing.
    It would have been interesting if you had also recorded the displayed success rate without the boost for those same shuttles.

    However, do not fool yourself into thinking 88 is a large enough sample size. I do commend you for recording your data. As you add to it, your observed rate should get closer to the actual success rate. The question is whether that is the displayed success rate.

    For my part, I would like DB to post the results regularly, along with the standard deviation.

    Edit: I stand corrected. Your time boost results show it has nothing to do with the boosts.
  • Peachtree Rex ✭✭✭✭✭
    [SSR] GTMET wrote: »
    I have developed serious doubt in the reliability of their event metrics during the last event

    The fact that you saw similar probability shifts with and without stat boosts (i.e., with only time boosts) would seem to indicate that the problem isn't with the boosts.

    I'm not saying there isn't a problem (there clearly is), just that stat boosts don't appear to be the only (or even primary) source.
  • Peachtree Rex ✭✭✭✭✭
    WaldoMag wrote: »
    GTMET your data seems to be implying the boost does nothing.
    It would have been interesting if you would have recorded the displayed success rate for no boost of those same shuttles,

    However, do not fool yourself into thinking 88 is a large enough sample size. I do commend you for recording your data. As you add to this data, you should see it start to get closer to the actual success rate. The question is, is it the displayed success rate.

    For me I would like DB to regularly post the results, and add to it the standard deviation.

    48 (looking at just the 3* boosts) is actually a pretty decent sample size for something that is supposed to succeed 93% of the time. He should have observed about 3 shuttle failures with 3* boosts and instead saw 11. That is more than three times the expected number of failures, a HUGE disparity.

    Considering that similar results have been replicated across most of the community, it's pretty clear there is some issue.
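A quick way to check the failure-count arithmetic above is a binomial upper tail. The Python sketch below (standard library only) assumes the 48 boosted shuttles are independent trials at a flat 93% displayed rate, which is a simplification since displayed rates vary from shuttle to shuttle. Under that assumption, 48 × 7% is about 3.4 expected failures, and the chance of 11 or more comes out on the order of 1 in 2,500.

```python
from math import comb

def prob_at_least(k: int, n: int, q: float) -> float:
    """P(F >= k) where F ~ Binomial(n, q): the chance of k or more failures."""
    return sum(comb(n, i) * q**i * (1 - q) ** (n - i) for i in range(k, n + 1))

# Assumed inputs, taken from the post: 48 shuttles with 3* boosts at 93% displayed.
n_shuttles = 48
displayed_success = 0.93
observed_failures = 11

failure_rate = 1 - displayed_success
expected_failures = n_shuttles * failure_rate
ratio = observed_failures / expected_failures
tail = prob_at_least(observed_failures, n_shuttles, failure_rate)

print(f"expected failures: {expected_failures:.2f}")   # 3.36
print(f"observed / expected: {ratio:.2f}x")
print(f"P(11 or more failures) = {tail:.1e}")
```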
  • Webberoni ✭✭✭✭✭
    edited April 2018
    I have tracked the last 4 or 5 faction events, and my actual results have consistently been 10-15% lower than the posted success rates on the shuttle mission screen.

    DB customer service's explanation for this significant variance is that it is due to a combination of personal and universal RNG. Whether the universal impact is legitimate or a convenient way to explain away a coding flaw, I have no doubt that powers beyond my crew/boosts are affecting the success rates of my shuttle missions during faction events.

    I just wish that the posted success rate, whether it's 0%, >95%, or anywhere in between, were more accurate. The frustration for players comes when actual results are consistently below what the game itself sets as the expectation via the posted mission-by-mission success rates. I wouldn't mind being 80% successful if my average posted success rate were much closer to 80%, instead of the 92-95% average I've had during the past few faction events.
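One way to turn "my average posted rate was 92-95% but my actual results were lower" into a test: when displayed rates vary from shuttle to shuttle, the expected number of successes is the sum of the displayed probabilities, and its variance is the sum of p(1 − p). A minimal Python sketch, assuming independent shuttles; the rates and the success count are invented for illustration, and the z-score is only a normal approximation.

```python
from math import sqrt
from typing import Sequence, Tuple

def compare_to_displayed(displayed: Sequence[float], successes: int) -> Tuple[float, float, float]:
    """Compare an observed success count with per-shuttle displayed success rates.

    Each shuttle is treated as an independent trial with its own displayed
    probability (a Poisson-binomial model). Returns (expected, std_dev, z).
    """
    expected = sum(displayed)                          # mean of the success count
    variance = sum(p * (1 - p) for p in displayed)     # variances add for independent trials
    std_dev = sqrt(variance)
    z = (successes - expected) / std_dev               # normal approximation
    return expected, std_dev, z

# Hypothetical log: 50 shuttles displayed at 92-95%, of which 40 succeeded.
rates = [0.92, 0.93, 0.94, 0.95] * 12 + [0.93, 0.94]
expected, std_dev, z = compare_to_displayed(rates, successes=40)
print(f"expected {expected:.1f} +/- {std_dev:.1f} successes, observed 40, z = {z:.1f}")
```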
  • This Sisko1 ✭✭✭✭✭
    It's been 2 months since I stopped tracking the data. DB is aware of the issue but won't do anything about it, so it's not worth the effort for me to send them proof anymore. I was concerned when they last changed the formula and all of my shuttles were in the 90s. I would rather have accuracy, even if all of my shuttles were in the 70s. When someone tells you that you have a 95 percent chance of success and you're failing all the time, you get self-conscious.

    [image]

    https://forum.disruptorbeam.com/stt/discussion/1713/shuttle-mission-success-chances-during-events-post-your-data-here#latest
  • [SSR] GTMET ✭✭✭✭✭
    WaldoMag wrote: »
    GTMET your data seems to be implying the boost does nothing.
    It would have been interesting if you would have recorded the displayed success rate for no boost of those same shuttles,

    However, do not fool yourself into thinking 88 is a large enough sample size. I do commend you for recording your data. As you add to this data, you should see it start to get closer to the actual success rate. The question is, is it the displayed success rate.

    Waldo, the 87.6% is the success rate without boosts; these were time boosts that added (at best) 1%.

    Regarding the sample size, I am not fooling myself; I am trying to apply basic statistics. Please review my logic and correct it if I am making an error.

    Null hypothesis: the displayed 92% success rate is accurate.
    Assumption: each shuttle is an independent trial with that success probability, so the number of wins follows a binomial distribution.

    I then performed a binomial test for the probability of at most 63 wins, given an expected 91.9% success rate and 88 trials.

    According to Excel, that gives a p-value of 0.000000019, allowing us to reject the null hypothesis that the displayed success rate is accurate with near certainty.

    I may be making a mistake, but if I am not, 88 trials is more than enough given the very high displayed chance of success.
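The test described above can be reproduced with the exact binomial formula. A minimal Python sketch (standard library only), assuming every shuttle is an independent trial at the same displayed rate and using this post's figures (63 wins, 88 trials, 91.9%). By my arithmetic, the cumulative tail P(X ≤ 63) lands near the 0.000000019 quoted here, while the single-point probability P(X = 63) lands near the 0.0000000152 in the opening post, which may be why the two figures in the thread differ slightly (Excel's `BINOM.DIST` returns one or the other depending on its cumulative argument).

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k: int, n: int, p: float) -> float:
    """Probability of at most k successes: the one-sided (lower-tail) p-value."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

n_trials, wins, displayed = 88, 63, 0.919   # figures from the post

point = binom_pmf(wins, n_trials, displayed)   # P(X = 63)
tail = binom_cdf(wins, n_trials, displayed)    # P(X <= 63)

print(f"P(X = {wins})  = {point:.3e}")
print(f"P(X <= {wins}) = {tail:.3e}")
print(f"roughly 1 in {1 / tail:,.0f}")
```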
  • WaldoMag ✭✭✭✭✭
    edited April 2018
    The problem is that when DB posts the results, their numbers are in the hundreds of thousands and their actual results are off by only 0.1%.

    I have to add that it is hard to make up the difference after a bad result. Just figure out how many perfect shuttle runs you would need for your data to match the displayed rate. And keep in mind that, if the displayed rate is right, you will still only get that percentage on average. So you would have to be on the success side for quite a while to make up the difference.

    Let us assume that DB essentially uses a 100-sided die, with 1 the lowest and 100 the highest. You would have to collect 100 squared data points to get close to the actual result. Therefore, to have an argument, you need to have collected 10,000 results. I would also add that if you collect 10,000 results, your data should fairly accurately show what the actual success rate is.

    Let me reiterate: DB needs to post the results for every faction event. They need to include the deviation above as well as below the mean.
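The "100-sided die" model can be simulated directly to see how far an observed rate strays from the true one at different sample sizes. A rough Python sketch, assuming a shuttle succeeds when a uniform roll from 1 to 100 is at or below the displayed percentage (the post's hypothetical, not a documented server rule) and that the true rate really is 92%. The printed ranges narrow roughly in proportion to 1/√n as the batch size grows.

```python
import random

def observed_rate(displayed_pct: int, n: int, rng: random.Random) -> float:
    """Simulate n shuttles under a d100 model: success if a 1-100 roll is <= displayed_pct."""
    wins = sum(1 for _ in range(n) if rng.randint(1, 100) <= displayed_pct)
    return wins / n

rng = random.Random(2018)   # fixed seed so the run is reproducible
spreads = {}
for n in (88, 1_000, 10_000):
    # Run 100 independent batches of n shuttles at a true 92% and record the range.
    rates = [observed_rate(92, n, rng) for _ in range(100)]
    spreads[n] = (min(rates), max(rates))
    print(f"n = {n:>6}: observed rates from {spreads[n][0]:.3f} to {spreads[n][1]:.3f}")
```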
  • Peachtree Rex ✭✭✭✭✭
    WaldoMag wrote: »
    The problem is when DB. Post the results and their numbers are in the hundreds of thousands and their actual results are off by 0.1%


    I have to add, it is hard to make up the difference with a bad result. Reason just figure how many perfect shuttle runs you need to make your data match with displayed result. And then keep in mind if displayed is right, you still are going to get that percentage on the average. So you would have to be on the success side for quite a while to make up the result.

    Let us assume that DB. Uses in essence a 100 sided die with 1 the lowest and 100 the highest. You would have to collect a 100 squared worth of data to get close to the actual result. Therefore, to have an argument you need to have collected 10,000 results. I would also add if you collect 10,000 results I would say your data should fairly accurately show what the actual success rate is.


    Let me reiterate DB needs to post the result for every faction event. They need to include the plus deviation as well as the minus deviation from the mean.

    You might be correct if we were trying to prove the difference between 91% and 92%, but people are posting observed results that are 2-3 times the expected failure rates. When the disparity is that big, I would, instead, question the validity of DB's purported "hundreds of thousands" of results.
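How many shuttles it takes depends heavily on the size of the gap, which can be made concrete with the standard normal-approximation sample-size formula for a one-sided test of a single proportion. A Python sketch, assuming a 5% significance level and 80% power (conventional defaults, not figures from the thread): separating 92% from 91% takes thousands of shuttles, while a gap of 10 or 20 percentage points takes well under 100.

```python
from math import ceil, sqrt
from statistics import NormalDist

def shuttles_needed(displayed: float, actual: float,
                    alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate trials needed to detect that the true rate is `actual`, not `displayed`.

    One-sided test of a single proportion using the normal approximation to the
    binomial; only a rough guide when rates are close to 0% or 100%.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # about 1.645
    z_beta = NormalDist().inv_cdf(power)        # about 0.842
    spread = (z_alpha * sqrt(displayed * (1 - displayed))
              + z_beta * sqrt(actual * (1 - actual)))
    return ceil((spread / (displayed - actual)) ** 2)

for actual in (0.91, 0.82, 0.72):
    print(f"displayed 92% vs actual {actual:.0%}: about {shuttles_needed(0.92, actual)} shuttles")
```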
  • WaldoMag ✭✭✭✭✭
    edited April 2018
    Let me state one reason why I do not care.

    This behavior is the same for everyone.

    Nobody has an advantage unless someone can figure out what is going on. Only then is there a problem, because if people can figure out what is wrong, they can crew their shuttles appropriately and get better results than everyone else.

    In answer to the post above mine, I question the standard deviation. This is something I have always questioned. DB's RNG is just too streaky.

    Edit: I have two accounts, and I send out my voyages at almost the same time. I am starting to notice that I get the same dilemmas on both voyages. This also makes me wonder what is going on with the seed for the RNG. I have commented in the past that the seed needs to use the date as well as the time. By time, I mean a value measured in fractions of a second, which is what is usually used, not in minutes.
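The seeding concern can be illustrated with Python's `random.Random`. This is a sketch of the general failure mode only, assuming (hypothetically) that a pseudo-random generator is seeded from the launch time; nothing in the thread establishes how the game actually seeds its RNG, and the timestamps and helper names below are made up. Two launches 0.4 s apart get the same seed, and therefore the same rolls, when the clock is truncated to whole minutes, but different seeds at millisecond resolution.

```python
import random
from typing import List

def seed_from_clock(timestamp_ms: int, resolution_ms: int) -> int:
    """Hypothetical seeding scheme: round a millisecond timestamp down to a resolution."""
    return timestamp_ms // resolution_ms

def dilemma_rolls(seed: int, count: int = 5) -> List[int]:
    """Draw a few hypothetical dilemma indices from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.randrange(100) for _ in range(count)]

launch_a = 1_523_000_000_250   # first account, ms since epoch (made-up value)
launch_b = launch_a + 400      # second account, launched 0.4 s later

# Minute resolution: both launches fall in the same minute, so the seeds collide.
coarse = [dilemma_rolls(seed_from_clock(t, 60_000)) for t in (launch_a, launch_b)]
# Millisecond resolution: the seeds differ, so the roll sequences differ.
fine = [dilemma_rolls(seed_from_clock(t, 1)) for t in (launch_a, launch_b)]

print("minute seeds give identical rolls:", coarse[0] == coarse[1])
print("millisecond seeds give identical rolls:", fine[0] == fine[1])
```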
  • Dralix ✭✭✭✭✭
    Peachtree Rex wrote: »
    When the disparity is that big, I would, instead, question the validity of DB's purported "hundreds of thousands" of results.

    If DB's analysis is based on the success rate the server actually used, it would be useless for the question of whether the displayed success rate is correct, though still valid in that narrower sense.
    WaldoMag wrote: »
    Nobody has an advantage. Unless, someone can figure what is going on. Then only then is there a problem. Because if people can figure out what is wrong they can crew their shuttles appropriately and get better results than everyone else.

    This is empirically false, because not everyone observes these variances. It's likely that there are specific criteria that cause the difference.
  • WaldoMag ✭✭✭✭✭
    Dralix wrote: »
    WaldoMag wrote: »
    Nobody has an advantage. Unless, someone can figure what is going on. Then only then is there a problem. Because if people can figure out what is wrong they can crew their shuttles appropriately and get better results than everyone else.

    This is empirically false, because not everyone observes these variances. It's likely that there are specific criteria that cause the difference.

    So you are telling me that if I can figure out the criteria, I do not have an advantage?
  • Peachtree Rex ✭✭✭✭✭
    WaldoMag wrote: »
    So you are telling me if I can figure out the criteria I do not have an advantage?

    No, I think he's saying that some people ALREADY have an advantage, even if they don't know what it is they are doing differently.
  • Dralix ✭✭✭✭✭
    Peachtree Rex wrote: »
    No, I think he's saying that some people ALREADY have an advantage, even if they don't know what it is they are doing differently.

    Correct. I'm saying that "nobody has an advantage" or "This behavior is the same for all" (which I didn't quote), is false.
  • WaldoMag ✭✭✭✭✭
    edited April 2018
    Peachtree Rex wrote: »
    You might be correct if we were trying to prove the difference between 91% and 92%, but people are posting observed results that are 2-3 times the expected failure rates. When the disparity is that big, I would, instead, question the validity of DB's purported "hundreds of thousands" of results.

    If we believe the displayed rate is wrong, we are trying to use the observed results to work out the actual success rate. We do need 10,000 data points to have faith that the rate we are seeing is very close to the actual rate.

    Edit: my comment is meaningless if DB is seeding the RNG improperly, because if they are doing it wrong, more data gathering will not improve the result. We could all get the same result you had over 88 shuttles. By the way, I did not gather data, but it did feel like I was getting 75% success when my displayed rate was 91.5%.

    But as Dralix would say, you need to have fact, not feeling. I would gather data if I felt it was useful. I just do not think anyone has an advantage.

    I must repeat myself: DB needs to show the results at the end of every faction event. That would not prove the displayed success rate is right, but it would show whether the server's expected success rate matches the results, which to me would mean everyone is treated equally. And the deviation from the mean could tell whether there is a seeding problem.
  • Peachtree Rex ✭✭✭✭✭
    WaldoMag wrote: »
    If we believe the displayed result is wrong, we are trying to use the results seen to come up with the actual success rate. We do need 10000 data results, to give us faith that the rate we are seeing is very close to what the actual rate is.


    I think this discussion might be helpful to your sample size concerns:
    https://www.quora.com/How-many-coin-flips-are-required-to-figure-out-if-the-coin-is-biased
    no amount of coin flips is sufficient [to determine if it is biased] unless the coin is actually biased, in which case it would depend on the bias and how sure you want to be.

    This is where the binomial distribution comes in. It calculates the probability of getting an observed result (or a more extreme one) if the expected distribution is correct. The bigger the deviations, the fewer trials are needed to show that the result is unlikely to be due to "normal randomness".

    More writing on binomial distributions:
    http://www.statisticshowto.com/binomial-distribution-formula/
  • Dralix ✭✭✭✭✭
    WaldoMag wrote: »
    But as Dralix would say you need to have fact not feeling.

    Would I say that?
  • WaldoMag ✭✭✭✭✭
    Dralix wrote: »
    WaldoMag wrote: »
    But as Dralix would say you need to have fact not feeling.

    Would I say that?

    No, you would add that my data set is too small to be meaningful.

  • Dralix ✭✭✭✭✭
    WaldoMag wrote: »
    No you would add that my data set is too small to be meaningful.

    I could be wrong (I'm old and have forgotten a lot of things), but I don't think I've ever said that about any serious data collection or analysis, partly because what I've forgotten includes a lot of my math.

    I have made flippant comments when someone makes definitive conclusions based on anecdotal evidence.
  • WaldoMag ✭✭✭✭✭
    Dralix wrote: »
    Correct. I'm saying that "nobody has an advantage" or "This behavior is the same for all" (which I didn't quote), is false.


    So you are saying the code is different for everyone?
    If I figure out what is going on, I have an advantage. If it only happens on my device, I have an advantage. It certainly is not going to be coded for me personally.
  • Peachtree Rex ✭✭✭✭✭
    WaldoMag wrote: »
    So you are saying the code is different for everyone?
    If I figure out what is going on I have an advantage. If it is only for my device. I have an advantage. It certainly is not going to be coded for me personally.

    Many different things could be causing the error: boost usage, how crew are allocated to different slots, etc. If someone is (by dumb luck) using a slightly different strategy that happens to sidestep the display error, they would enjoy an advantage even without knowing it.
  • Dralix ✭✭✭✭✭
    WaldoMag wrote: »
    So you are saying the code is different for everyone?

    I'm not saying that at all. I'm saying that not everyone is experiencing the same wide variance between displayed and actual success rate.

    My theory is that there's a bug in the displayed percentage that only triggers under certain conditions, which not everyone trips. One candidate is AND slots where your crew member doesn't match both skills. If that were the case, someone who always matches both skills would never trip it.
  • Raijinmeister ✭✭✭
    edited April 2018
    What really bothers me is that a thread like this, which could easily be resolved by making the system more transparent in-game or by a reasonable explanation straight from the devs, goes on and on forever based only on players' observations.

    What about being a bit more transparent about your 'RNG', DB? Like the real odds of winning or losing in the Gauntlet? Or do we need a Reddit upheaval to make you do the right thing?
  • WaldoMag ✭✭✭✭✭
    edited April 2018
    Dralix wrote: »
    I'm not saying that at all. I'm saying that not everyone is experiencing the same wide variance between displayed and actual success rate.

    My theory is that there's a bug in the displayed percentage that triggers under certain conditions, that not everyone trips. One theory is with AND slots when your crew doesn't match both. If that were the case, then someone who always matches both skills would never trip it.

    I agree with what you are saying. But if we could figure out what the difference is, we would know the actual success rate and could crew our shuttles for the best actual success rate, which would give us an advantage. I do not know whether anyone could actually figure out what the bug is, though.

    Because we are dealing with an RNG, we can be fooled by our results into thinking we have figured it out, only to get the same error the next time we try our solution.
  • Paladin 27 ✭✭✭✭✭
    edited April 2018
    My theory is as follows. I've staffed shuttles by doing the average-skill calculations myself in Excel and, on AND slots, using crew whose primary skill is the slot's first listed skill, and my results have been in line with or slightly above the displayed percentages over the last few events.

    Observation:

    The displayed percentage calculates the average skill for AND missions as the HIGH skill plus 1/4 of the LOW skill (e.g., for a dip and sci slot, a character with 1000 dip and 400 sci and a character with 1000 sci and 400 dip will both have an 1100 average skill).

    Theory:

    The actual (server-side) percentage calculates it as the FIRST skill plus 1/4 of the SECOND skill (e.g., for a dip and sci slot, a character with 1000 dip and 400 sci would have an 1100 average skill, but a character with 400 dip and 1000 sci would have a 650 average skill).
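The observation and the theory differ only in which skill gets full weight, so they can be written side by side. A minimal Python sketch; the function names are made up, the "server" formula is the post's unconfirmed theory, and the two crew members are the examples from the post for a slot listed as dip AND sci.

```python
def displayed_skill(first: float, second: float) -> float:
    """Displayed AND-slot skill per the observation: HIGH skill + 1/4 of the LOW skill."""
    return max(first, second) + min(first, second) / 4

def theorized_server_skill(first: float, second: float) -> float:
    """Theorized server-side AND-slot skill: FIRST listed skill + 1/4 of the SECOND."""
    return first + second / 4

# Slot listed as dip AND sci, so dip is the "first" skill; crew values from the post.
for label, dip, sci in (("1000 dip / 400 sci", 1000, 400), ("400 dip / 1000 sci", 400, 1000)):
    print(f"{label}: displayed {displayed_skill(dip, sci):.0f}, "
          f"theorized server {theorized_server_skill(dip, sci):.0f}")
```

Under this theory the two crew members look identical on the mission screen, while the second one would actually be rolling with a much lower skill.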
  • [SSR] GTMET ✭✭✭✭✭
    Peachtree Rex wrote: »
    This is where the binomial distribution comes in. It calculates the probability that an observed distribution could be due to chance relative to the expected distribution. The bigger the deviations, the fewer trials are necessary to show that the probability is due to "normal randomness".

    More writing on binomial distributions:
    http://www.statisticshowto.com/binomial-distribution-formula/

    Exactly, Peachtree! And the p-value measures how likely results at least this extreme would be if the coin were fair, which is what lets us judge whether it is biased. In this case, 88 trials is WAY more than enough when the results are off by ~20 percentage points.

    This is not to say that the actual rate was 20 points off, but if the server had been calculating from an 82% success rate instead of 92%, my p-value would change to something like 0.02. Still terribly bad luck, but much more plausible.
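The effect of the assumed server rate on the p-value can be checked with the exact binomial lower tail. A Python sketch, assuming 63 wins in 88 independent shuttles at a single flat rate: the one-sided tail is on the order of 1e-8 at 92% but roughly 0.01 at 82% (about double that if a two-sided test is used), consistent with the "like 0.02" estimate above.

```python
from math import comb

def lower_tail(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p): the one-sided p-value for a shortfall."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

n_trials, wins = 88, 63   # figures from earlier in the thread
p_values = {rate: lower_tail(wins, n_trials, rate) for rate in (0.92, 0.82)}
for rate, p in p_values.items():
    print(f"assumed server rate {rate:.0%}: one-sided p = {p:.2g}")
```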
  • WaldoMag ✭✭✭✭✭
    edited April 2018
    @Paladin 27
    That actually has some similarity to how OR slots handle boosts.
    If the slot takes skill A OR B and a crew member's skill A is higher than B, a boost to skill A only affects A. However, if the crew member's skill B is higher, a boost to A or to B affects that crew member's skill B. But this applies to the displayed success rate; I'm not sure whether the actual success rate works like this. (I also just found out about this, so I have yet to check event shuttles.)

    Repeating to clarify, since I do not like my wording above:

    If a crew member's skill A is higher than B, the A skill can only be boosted by an A boost; a B boost will boost the B skill.
    But a crew member with a higher B skill than A will get the B skill boosted by both A and B boosts.
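The rule described above is easier to follow as a small function. A Python sketch of the displayed OR-slot behavior as reported in this post only: it is unverified, the multiplicative `factor` is an assumption (the post does not say how big a boost is or how it is applied), and ties between the two skills are treated as "A is higher".

```python
def or_slot_skill(skill_a: float, skill_b: float, boost: str, factor: float) -> float:
    """Displayed skill for an 'A OR B' slot under the reported boost behavior.

    boost is "A" or "B"; factor is a hypothetical multiplier such as 1.1.
    """
    if boost not in ("A", "B"):
        raise ValueError("boost must be 'A' or 'B'")
    a, b = skill_a, skill_b
    if a >= b:
        # A is the crew member's higher skill: each boost lands on its own skill.
        if boost == "A":
            a *= factor
        else:
            b *= factor
    else:
        # B is higher: an A boost and a B boost both land on skill B.
        b *= factor
    return max(a, b)   # the slot uses whichever skill ends up higher

print(or_slot_skill(1000, 400, "B", 1.1))   # B boost lands on the lower skill, no gain
print(or_slot_skill(400, 1000, "A", 1.1))   # A boost still raises skill B
```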
  • Data1001 ✭✭✭✭✭
    Raijinmeister wrote: »
    What really bothers me is a thread like this that would be simply solved making the system more transparent in-game or with a reasonable explanation straight from the devs go on and on forever based only on play observations.

    What about being a bit more transparent about your 'RNG', DB? Like the real odds of win/lose on the gauntlet?

    Such things will never, ever happen unless DB is legally forced to do so, or required to do so in order to maintain its current distribution channels. No company is suddenly going to decide to show the man behind the curtain just because its customer base wants it to.



    Could you please continue the petty bickering? I find it most intriguing.
    ~ Data, ST:TNG "Haven"
  • edited April 2018
    Assuming the reports here about CS responses came from well-informed staff and are therefore not bogus, one starts to wonder: what's the frigging point of messing with the global success/failure rate at all? If EVERYONE is affected, that rate shouldn't matter at all, and it will only aggravate people, since they apparently force a consistently and artificially lower rate than any of the displayed rates.
    The only possible reason that springs to mind is that they are trying to make people invest more by artificially putting the brakes on the average VP returned (at least during events), to crank up the competition. That would once again expose DB as a revenue-obsessed company that cares about nothing else (so no surprise there :-/ ).
    However, even this argument is extremely weak, if not nonsensical: if EVERYONE is equally affected (i.e., the rate is lowered by the same amount for everybody), adjusting it only changes the ABSOLUTE amount of VP, not the relative amount. The only place where the absolute amount matters is the threshold rewards, and the total VP required there is so low that most people reaching the last 4* threshold reward (usually the top one) will not even have to spend any DIL (let alone money).
    For ranked rewards, nothing will change even if the success rate is artificially halved (or doubled, for that matter), assuming every player plays continuously throughout the event.
  • [SSR] GTMET ✭✭✭✭✭
    @shan, it would be great if someone from DB would chime in on this speculation and on our interpretation of the CS feedback:

    1) In events, is each player's shuttle outcome based only on their crew and the difficulty, or is there some global effect that reduces the actual success rate?

    2) If the displayed success rates are intended to be accurate, would someone from DB commit to working with the small number of players who have recorded data showing that event shuttles are almost certainly not achieving the displayed success rate?

    A reply and a good-faith effort would go a long way toward restoring faith in the fairness and transparency of the event system.