
Experiment & Findings: Is the AND function bugged in events?

[SSR] GTMET ✭✭✭✭✭
edited April 2018 in The Bridge
Summary: Through experimental design and tracking, there is strong evidence to suggest an error occurring with how AND nodes are calculated vs how they are displayed.

Background: Last event I tracked my results and found that they were extremely unlikely to be occurring due to chance. Below are my results and calculations, with an astronomically low p-value of 0.000000019. Many people theorized that this is due to AND shuttles displaying as (Highest Skill + 25% * Lowest Skill), but calculating on the back-end as (1st Skill + 25% * 2nd Skill). This could lead to significant concordance or discordance depending on how crew were placed in those slots.
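The two formulas can be compared side by side. This is a minimal sketch, assuming the back-end really does use slot order as theorized; the function names and the sample skill values are made up for illustration and are not taken from the game.

```python
def displayed_and_skill(first: float, second: float) -> float:
    """AND-slot value as the display is believed to show it:
    highest skill + 25% of the lowest skill, regardless of order."""
    return max(first, second) + 0.25 * min(first, second)


def hypothesized_backend_and_skill(first: float, second: float) -> float:
    """AND-slot value under the theorized bug (an assumption, not confirmed):
    1st listed skill + 25% of the 2nd listed skill."""
    return first + 0.25 * second


# Hypothetical crew for an "ENG AND CMD" slot, written as (ENG, CMD).
crew_examples = {
    "strong in 2nd skill only": (0, 1000),
    "strong in 1st skill only": (1000, 0),
    "has both skills": (900, 600),
}

for label, (eng, cmd) in crew_examples.items():
    shown = displayed_and_skill(eng, cmd)
    used = hypothesized_backend_and_skill(eng, cmd)
    print(f"{label}: displayed={shown:.0f}, hypothesized back-end={used:.0f}")
```

Under this assumption the two values only diverge when the crew is stronger in the 2nd listed skill than in the 1st, which is the placement the experiment below tries to avoid in its second phase.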

Experimental Design: In order to test this, I planned to run the first 2 days of this event with crew configured optimally as if the displayed % of success was correct. I would then switch at mid-day on Saturday to optimizing crew as if AND slots were calculated as 1st + 25% * 2nd Skill. I logged the % chance of success that was displayed and also utilized the formula from the wiki for shuttle success, with difficulty for 4000 VP shuttles set at 2000 (determined through goal seek).

l8wms6pw5q2b.png

For the sake of the experiment, I tracked only 4000 VP shuttles, as I don't have clear difficulty values for lower shuttles and the true rate for early shuttles is often well above 99.9% even though it is not displayed as such.

Results: In the first phase of the experiment, across 28 shuttles, the actual results showed significant discordance with the displayed chance of success, 79% vs 92% (p=0.02). However, when compared to the calculated AND-bug success rate, they compared favorably, 79% vs 80% (p=0.52). This data likely rejects the null hypothesis that the shuttles are performing according to their displayed rates.

In the second phase, across 46 shuttles, the displayed and calculated chances of success were nearly identical at 93% and 92%, respectively. Actual shuttle results fared comparably at 89% of success (p=.22 & p=.31). As such we could not reject the null and these results appear to match the displayed chances of success.
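The p-values in both phases look consistent with a one-sided exact binomial test, so here is a sketch of that check. Assumptions: the success counts (22 of 28 and 41 of 46) are back-calculated from the rounded percentages, and each phase is treated as if every shuttle had the same average chance of success, which is a simplification of the per-shuttle data in the tables.

```python
from math import comb


def binom_lower_tail(successes: int, n: int, p: float) -> float:
    """P(X <= successes) for X ~ Binomial(n, p): the chance of doing this
    badly or worse if the true per-shuttle success rate were p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(successes + 1))


# Counts are inferred from the rounded percentages in the post (assumption).
phases = {
    "phase 1": {"n": 28, "successes": 22, "displayed": 0.92, "calculated": 0.80},
    "phase 2": {"n": 46, "successes": 41, "displayed": 0.93, "calculated": 0.92},
}

p_values = {}
for name, d in phases.items():
    p_displayed = binom_lower_tail(d["successes"], d["n"], d["displayed"])
    p_calculated = binom_lower_tail(d["successes"], d["n"], d["calculated"])
    p_values[name] = (p_displayed, p_calculated)
    print(f"{name}: vs displayed p={p_displayed:.3f}, vs calculated p={p_calculated:.3f}")
```

Under these assumptions the values come out near 0.02 and 0.5 for the first phase and near 0.22 and 0.31 for the second, close to the figures quoted above. Small differences are expected because the exact counts and per-shuttle rates are not reproduced here.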

Conclusion: There is statistically significant data to suggest that there is an error in how the AND nodes are being calculated relative to how they are intended to function and are being displayed.

Table 1. Results from prior event, observational only:
co6slr32ene4.png

Table 2. Experimental results through use of AND slots:
xzor31sqe9lo.png


Additional Notes: I probably should have transitioned to the 2nd phase of the experiment later to balance the shuttle runs more; I forgot to take into account the ramp-up period cutting into the runs. That said, the p-values suggest that there was still significant data to reject the null.

Although no data is provided, 3 fleet mates saw similar impact. They were running multiple Engineering AND Other slots, where they were primarily filling the "Other" trait and saw poor results despite 93-95% displayed rates. Upon switching to the 1st + 25% 2nd approach, they found their success rates return to the low to mid 90's.

Comments

  • [SSR] GTMET ✭✭✭✭✭
    @Shan , @Black Pebble, it would be nice to get an official response to what we are experiencing and documenting, even if it is just to let us know that you are having someone investigate the potential bug.
  • This Sisko1 ✭✭✭✭✭
    Wow I'm going to review this some more in detail but at first glance, it's a heck of a job. Hope it leads to a booming thread because it's amazing work!
  • Dralix ✭✭✭✭✭
    Excellent work!
    @Shan , @Black Pebble, it would be nice to get an official response to what we are experiencing and documenting, even if it is just to let us know that you are having someone investigate the potential bug.

    Shan has said before that tagging her doesn't work. I imagine they disabled it for DB staff to cut down on spam.

    I suggest you send this by PM.
  • Peachtree Rex ✭✭✭✭✭
    If that's your hypothesis, there are some faction missions that are 1-slot AND missions that you should be able to test out at a lot lower cost than using up your event time.

    https://stt.wiki/wiki/Faction_Missions#Augments
    Federation:
    Investigate Corruption
    Rescue Citizen Captives

    There are a bunch of 2 slot missions that have one AND slot, but I figured the single slots should be suuuper obvious if you slot in someone w/o DIP in DIP/CMD.
  • This Sisko1 ✭✭✭✭✭
    edited April 2018
    I'm willing to run on federation missions until the next event. Will try and track the data. Maybe I can team up with someone so they are running a similar percentage but using the opposite of me when one has both skills and the other does not.

    I should say I've tracked my missions off and on and it's usually 5-10 points below. I would try to match the skills but I don't always care. They haven't responded to our concerns other than to make fails give points (which made it harder to compete and pull away) but it would be nice to finally figure out what is going on and how to avoid it.
  • Webberoni ✭✭✭✭✭
    edited April 2018
    Although no data is provided, 3 fleet mates saw similar impact. They were running multiple Engineering AND Other slots, where they were primarily filling the "Other" trait and saw poor results despite 93-95% displayed rates. Upon switching to the 1st + 25% 2nd approach, they found their success rates return to the low to mid 90's.

    I can confirm the same for your last point. I was running the 3-seat "Emitters" mission with crew that were high in the non-ENG skill (and had no ENG skill at all), getting posted success rates of >95% (with 3* ENG boost).

    I failed this particular mission during 5 straight waves of missions (repeated 2250 points 4 times, then once at 2750 points). Once I stopped using that mission, my actual success rates improved and were much more closely aligned with the posted success rates (still about 5-7% lower than posted).
  •
    No data to provide, just observation, but I used this hypothesis throughout the event. I had lots of opportunity to practice, because there were so many dang engineering spots, and poor Vorik was an Army of One. I was filling spaces with non-event crew that matched the AND skills with a focus on the first (say, Prof Scott for Eng AND COM rather than using a strong bonus crew with just COM). This lowered the displayed percentages (the bonus crew with COM would put the shuttle in the 90s, but I was instead running shuttles displaying 80-90%). My results appeared better, and more consistent with the % display than in past events. Obviously, just anecdotal evidence, but solid enough for my purposes. Thanks for the data!
  •
    Shan wrote: »
    I have seen this thread and I created a bug report linking to it.

    Hopefully, this should yield a quick answer. One would assume that it should be easy for a programmer to look on the server side and see how shuttles are calculated (vs what we see as a displayed % on the client side).

    Member of Rise of the Phoenix.
  • Peachtree Rex ✭✭✭✭✭
    Shan wrote: »
    I have seen this thread and I created a bug report linking to it.

    Hopefully we can have a resolution of sorts by Thursday :)
  • [SSR] GTMET ✭✭✭✭✭
    If that's your hypothesis, there are some faction missions that are 1-slot AND missions that you should be able to test out at a lot lower cost than using up your event time.

    https://stt.wiki/wiki/Faction_Missions#Augments
    Federation:
    Investigate Corruption
    Rescue Citizen Captives

    There are a bunch of 2 slot missions that have one AND slot, but I figured the single slots should be suuuper obvious if you slot in someone w/o DIP in DIP/CMD.

    That might be worth trying, but it also assumes that events are using the exact same code and logic as non-event shuttles.
  • Shan ✭✭✭✭✭
    I do not have an ETA for this but I do not expect to have an answer this week.
  • This Sisko1 ✭✭✭✭✭
    edited April 2018
    Alright, it took me a while but I finally figured it all out and will test the success percent for those 2 fed missions using crew that only have the second skill. So tkvma shows a success of 72 percent in Investigate Corruption but he doesn't have diplomacy. I would expect the actual success rate to be in the 30s. I'm having others do the opposite and fill the diplomacy part with a crew that doesn't have cmd. Any and all spectators are welcome to run similar tests to increase the sample size.

    I am running the other ones just to have more test data.

    2zadeggl9j4r.png
    e15lrlkxzm4l.png

    I should say this is probably a waste of time because the point was shown and a bug report has been sent. But it will be fun to show it again and rest my mind that I wasn't crazy; I shouldn't have failed so much. Plus it adds a factor into all my shuttle run calculations that makes those a waste now.
  • Odo Marmarosa ✭✭✭✭✭
    Nice work. There has been a lot of speculation, and it's good that someone ran an actual experiment.
  •
    Alright, it took me a while but I finally figured it all out and will test the success percent for those 2 fed missions using crew that only have the second skill. So tkvma shows a success of 72 percent in Investigate Corruption but he doesn't have diplomacy. I would expect the actual success rate to be in the 30s. I'm having others do the opposite and fill the diplomacy part with a crew that doesn't have cmd. Any and all spectators are welcome to run similar tests to increase the sample size.

    I am running the other ones just to have more test data.



    Good Luck and God Speed!
    Member of Rise of the Phoenix.
  •
    So that's why almost all my shuttles with full event crew and 91% displayed success failed 90% of the time: the first skill was ENG, and none of the crew had ENG skill except what the boosts provided.
    That was the 3 slot mission.
  • Jhamel ✭✭✭✭✭
    edited April 2018
    So what if this is all correct ... of course everyone has been affected by the same bug, but ... it means we have all been shown false success chances this whole time (even if not meant maliciously, but still). Even if it was a bug - which I hope it was - it will require a lot of tweaking to the actual difficulty of shuttles.

    Non-event max difficulty shuttles should always lead to a high 90% if not 99% success chance if you put in an immortal high-base legendary card with a blue skill boost.

    Event max difficulty shuttles (4000 VP) should at least lead to around 85-90% without any 2x bonus crew if you use high-base (1250+) legendary cards.

    Also, I think even if it was a bug, we should all get some serious compensation (a decent amount of highest quality 10x shuttle boost packs and maybe a pack of 20 transmissions for all factions), because we lost a ton of skill boosts to failed shuttles (and only got the trainers and 20% of the VP).

    Also, all the potential faction loot that has gone due to these not supposed to happen fails should be looked at. I bet I've lost dozens if not hundreds of Holoprograms, Science Experiments, Case Files, Medical Experiments, Ancient Films, IDICs, Romulan Star Empire Icons, etc. etc. to these miscalculations, and I'm not the only one. Of course, this happened to all of us, but we all lost a lot if these calculations were really so far off.

    There's one good thing: All these "complaint threads" about hilariously high failure rates compared to displayed success chances haven't been just "bad luck with RNG" ... we were all actually right ...
    "Everything about the Jem'Hadar is lethal!" - Eris (ST-DS9 Episode 2x26 "The Jem'Hadar")
  • Peachtree Rex ✭✭✭✭✭
    So what if this is all correct ... of course everyone has been affected by the same bug, but ... it means we have all been shown false success chances this whole time (even if not meant maliciously, but still). Even if it was a bug - which I hope it was - it will require a lot of tweaking to the actual difficulty of shuttles.

    Non-event max difficulty shuttles should always lead to a high 90% if not 99% success chance if you put in an immortal high-base legendary card with a blue skill boost.

    Event max difficulty shuttles (4000 VP) should at least lead to around 85-90% without any 2x bonus crew if you use high-base (1250+) legendary cards.

    Also, I think even if it was a bug, we should all get some serious compensation (a decent amount of highest quality 10x shuttle boost packs and maybe a pack of 20 transmissions for all factions), because we lost a ton of skill boosts to failed shuttles (and only got the trainers and 20% of the VP).

    Also, all the potential faction loot that has gone due to these not supposed to happen fails should be looked at. I bet I've lost dozens if not hundreds of Holoprograms, Science Experiments, Case Files, Medical Experiments, Ancient Films, IDICs, Romulan Star Empire Icons, etc. etc. to these miscalculations, and I'm not the only one. Of course, this happened to all of us, but we all lost a lot if these calculations were really so far off.

    There's one good thing: All these "complaint threads" about hilariously high failure rates compared to displayed success chances haven't been just "bad luck with RNG" ... we were all actually right ...

    Once people started doing actual statistical analyses, it was unarguably clear something was wrong. It's why I always recommend people collect data if they feel something is amiss. It's very easy to dismiss thoughts and feelings, but data and facts are a lot stickier subjects.
  • Zombie Squirrel ✭✭✭✭✭
    It should be in DB's interest as well to get it fixed, because too many failures for shuttles with high success percentages can lead to less revenue.

    Why buy extra boosts or speed up shuttles with dilithium when most fail anyway? Why buy event packs to get the crew with the highest bonus early when 99% shuttles fail anyway? Frustrated players = less revenue. It's easy, but I wonder if or when DB finally gets it. ;)
    •SSR Delta Flyers•
  • SSR Barkley ✭✭✭✭✭
    don't you know? we're all beta testers.
    /SSR/ Barkley - semi retired
    Second Star to the Right - Join Today!
  • AviTrek ✭✭✭✭✭
    Pallidyne wrote: »
    Frankly the fact that it took this level of analysis from non-DB employees says something about not only the QC process, but the "everything is working perfectly" mantra that can not be taken at face value for anything.

    100%. This is the most obvious bug and would have been the first place I'd have looked as a DB employee. That DB employees continued to dismiss months of player issues until it was forced down their throats is problem #1 at DB.

    I remind you that when the portals were broken, Shan reported that they had been tested live in production by the Senior QA engineer; an hour later DB admitted that the portal had been broken for months. So how were they tested and proven working when there was a bug preventing them from working?
  •
    The effort that has gone into proving this 'now-probable' bug is impressive. Hats off Sirs!!!
    Ten Forward Loungers - Give Your Best, Get Our Best!
    Check out our website to find out more:
    https://wiki.tenforwardloungers.com/
  • Peachtree Rex ✭✭✭✭✭
    Pallidyne wrote: »
    And just as a disclaimer or clarification, I really think Shan is sincere. I think some folks in DB think it's OK to lie to her or simply lazily declare things to her without really checking.

    I think it's more likely that the client and server were checked independently and the results of the two were not compared.

    Client passed the client tests. Server passed the server tests. No integration test to verify that the client and server success algorithms were equivalent.
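The missing check described here can be sketched as a tiny consistency test. This is only an illustration under stated assumptions: both functions are stand-ins written for this example (the real client and server code is not public), and the "server" version deliberately encodes the theorized slot-order bug.

```python
import itertools
import math


def client_displayed_value(first: float, second: float) -> float:
    """Stand-in for the client-side display formula (assumed)."""
    return max(first, second) + 0.25 * min(first, second)


def server_value_hypothesized(first: float, second: float) -> float:
    """Stand-in for the server-side formula under the theorized bug (assumed)."""
    return first + 0.25 * second


def find_mismatches(client_fn, server_fn, skill_levels):
    """Feed the same inputs to both sides and return every (first, second)
    pair where the two implementations disagree."""
    return [
        (a, b)
        for a, b in itertools.product(skill_levels, repeat=2)
        if not math.isclose(client_fn(a, b), server_fn(a, b))
    ]


levels = [0, 500, 1000]
mismatches = find_mismatches(client_displayed_value, server_value_hypothesized, levels)
print(mismatches)  # every pair where the 2nd skill is higher than the 1st
```

Each function on its own would pass a test written against its own formula; only a comparison across the two sides flags the inputs where they differ.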
  • DavideBooks ✭✭✭✭✭
    edited April 2018
    Don't worry. They can always send out a Mirror Picard as recompense.
    😁
  • Pallidyne ✭✭✭✭✭
    edited April 2018
    Pallidyne wrote: »
    And just as a disclaimer or clarification, I really think Shan is sincere. I think some folks in DB think it's OK to lie to her or simply lazily declare things to her without really checking.

    I think it's more likely that the client and server were checked independently and the results of the two were not compared.

    Client passed the client tests. Server passed the server tests. No integration test to verify that the client and server success algorithms were equivalent.

    So I am not saying this as Joe Schmoe off the street.

    That's still called being lazy and declaring things without really checking. Or ignorant. Either one works. You got a client/server app, you test it end to end. Not just this end and that end.

    I state this as someone who is coming up on 30 years in the tech industry from several perspectives and having worn different hats.

    I've also been in meetings where folks have gotten eaten alive by managers, up to C-level, who stated: WTF, if you test an app, you test it end to end. And I've been with companies that have lost contracts for this very same behavior.
  • Peachtree Rex ✭✭✭✭✭
    Pallidyne wrote: »
    Pallidyne wrote: »
    And just as a disclaimer or clarification, I really think Shan is sincere. I think some folks in DB think it's OK to lie to her or simply lazily declare things to her without really checking.

    I think it's more likely that the client and server were checked independently and the results of the two were not compared.

    Client passed the client tests. Server passed the server tests. No integration test to verify that the client and server success algorithms were equivalent.

    So I am not saying this as Joe Schmoe off the street.

    That's still called being lazy and declaring things without really checking. Or ignorant. Either one works. You got a client/server app, you test it end to end. Not just this end and that end.

    I state this as someone who is coming up on 30 years in the tech industry from several perspectives and having worn different hats.

    I've also been in meetings where folks have gotten eaten alive by managers, up to C-level, who stated: WTF, if you test an app, you test it end to end. And I've been with companies that have lost contracts for this very same behavior.

    I guess I keyed in on the "lie to her" part of your quote and not the "lazy" part. Insufficient testing isn't necessarily "lazy"; it can also be done out of ignorance or their team being unwilling to dedicate enough time/resources to properly investigate.

    There are lots of potential reasons and we can speculate all day long, but I've found that malice is rarely the issue and apathy or ignorance are usually the culprit.
  •
    Shan wrote: »
    I have seen this thread and I created a bug report linking to it.

    Good, better late than never.