Experiment & Findings: Is the AND function bugged in events?
[SSR] GTMET
✭✭✭✭✭
Summary: Through experimental design and tracking, I found strong evidence of a mismatch between how AND nodes are calculated and how they are displayed.
Background: Last event I tracked my results and found that they were extremely unlikely to be due to chance. Below are my results and calculations, with an astronomically low p-value of 0.000000019. Many people theorized that this is due to AND shuttles displaying as (Highest Skill + 25% * Lowest Skill), but calculating on the back end as (1st Skill + 25% * 2nd Skill). This could lead to significant concordance or discordance depending on how crew were placed in those slots.
Experimental Design: In order to test this, I planned to run the first 2 days of this event with crew configured optimally as if the displayed % of success was correct. I would then switch at mid-day on Saturday to optimizing crew as if AND slots were calculated as 1st + 25% * 2nd Skill. I logged the % chance of success that was displayed and also used the formula from the wiki for shuttle success, with difficulty for 4000 VP shuttles set at 2000 (determined through goal seek).
For the sake of the experiment, I tracked only 4000 VP shuttles, as I don't have clear difficulty values for lower shuttles and the actual rate for early shuttles is often well above 99.9% even though it is not displayed as such.
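To make the theorized difference concrete, here is a minimal sketch of the two AND formulas as described above. The function names and the sample skill values are made up for illustration; only the two formulas come from the post.

```python
def displayed_and_skill(first_skill, second_skill):
    """Theorized displayed value: highest skill + 25% of the lowest skill."""
    return max(first_skill, second_skill) + 0.25 * min(first_skill, second_skill)


def suspected_backend_and_skill(first_skill, second_skill):
    """Suspected back-end value: 1st listed skill + 25% of the 2nd listed skill."""
    return first_skill + 0.25 * second_skill


# Hypothetical crew: weak in the 1st listed skill, strong in the 2nd.
weak_first = (400, 1200)
# Hypothetical crew: strong in the 1st listed skill, weak in the 2nd.
strong_first = (1200, 400)

shown_weak = displayed_and_skill(*weak_first)            # 1300.0
real_weak = suspected_backend_and_skill(*weak_first)     # 700.0
shown_strong = displayed_and_skill(*strong_first)        # 1300.0
real_strong = suspected_backend_and_skill(*strong_first) # 1300.0
```

Under this theory, the two values agree only when the crew's stronger skill is the one listed first in the slot, which is the concordance/discordance described above.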
Results: In the first phase of the experiment, across 28 shuttles, the actual results showed significant discordance with the displayed chance of success, 79% vs 92% (p=0.02). However, when compared to the calculated AND bug success rate, they compared favorably, 79% vs 80% (p=0.52). This data likely rejects the null that the shuttles are performing according to their displayed rates.
In the second phase, across 46 shuttles, the displayed and calculated chances of success were nearly identical at 93% and 92%, respectively. Actual shuttle results fared comparably at 89% of success (p=.22 & p=.31). As such we could not reject the null and these results appear to match the displayed chances of success.
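The post does not say which test produced these p-values. As a rough check, a one-sided exact binomial test (chance of seeing this few successes or fewer if the stated rate were true) lands close to the reported numbers. This is a sketch under assumptions: the success counts are inferred from the rounded percentages (79% of 28 taken as 22, 89% of 46 taken as 41), and every shuttle is treated as having the same average rate, which the real shuttles did not.

```python
from math import comb


def lower_tail_p(successes, runs, rate):
    """P(X <= successes) for X ~ Binomial(runs, rate)."""
    return sum(
        comb(runs, k) * rate**k * (1 - rate) ** (runs - k)
        for k in range(successes + 1)
    )


# Assumed counts, inferred from the rounded percentages in the post.
phase1_vs_displayed = lower_tail_p(22, 28, 0.92)
phase1_vs_and_bug = lower_tail_p(22, 28, 0.80)
phase2_vs_displayed = lower_tail_p(41, 46, 0.93)
phase2_vs_calculated = lower_tail_p(41, 46, 0.92)

for label, p in [
    ("phase 1 vs displayed 92%", phase1_vs_displayed),
    ("phase 1 vs AND-bug 80%", phase1_vs_and_bug),
    ("phase 2 vs displayed 93%", phase2_vs_displayed),
    ("phase 2 vs calculated 92%", phase2_vs_calculated),
]:
    print(f"{label}: p = {p:.2f}")
```

With these assumed counts the four values come out at roughly 0.02, 0.50, 0.22 and 0.31, against the reported 0.02, 0.52, 0.22 and 0.31. The small gap on the second value is plausibly down to rounding of the 80% average.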
Conclusion: There is statistically significant data to suggest that there is an error in how the AND nodes are being calculated relative to how they are intended to function and are being displayed.
Table 1. Results from prior event, observational only:
Table 2. Experimental results through use of AND slots:
Additional Notes: I probably should have transitioned to the 2nd phase of the experiment later to balance the shuttle runs more; I forgot to take into account the ramp-up period cutting into the runs. That said, the p-values suggest that there was still enough data to reject the null.
Although no data is provided, 3 fleet mates saw a similar impact. They were running multiple Engineering AND Other slots, where they were primarily filling the "Other" trait, and saw poor results despite 93-95% displayed rates. Upon switching to the 1st + 25% * 2nd approach, they found their success rates returned to the low to mid 90s.
17
Comments
Shan has said before that tagging her doesn't work. I imagine they disabled it for DB staff to cut down on spam.
I suggest you send this by PM.
https://stt.wiki/wiki/Faction_Missions#Augments
Federation:
Investigate Corruption
Rescue Citizen Captives
There are a bunch of 2 slot missions that have one AND slot, but I figured the single slots should be suuuper obvious if you slot in someone w/o DIP in DIP/CMD.
I should say I've tracked my missions off and on and it's usually 5-10 points below. I would try to match the skills but I don't always care. They haven't responded to our concerns other than to make fails give points (which made it harder to compete and pull away) but it would be nice to finally figure out what is going on and how to avoid it.
I can confirm the same for your last point. I was running the 3-seat "Emitters" mission with crew that were high in the non-ENG skill (and had no ENG skill at all), getting posted success rates of >95% (with 3* ENG boost).
I failed this particular mission during 5 straight waves of missions (repeated 2250 points 4 times, then once at 2750 points). Once I stopped using that mission, my actual success rates improved and were much more closely aligned with the posted success rates (still about 5-7% lower than posted).
Hopefully, this should yield a quick answer. One would assume that it should be easy for a programmer to look on the server side and see how shuttles are calculated (vs what we see as a displayed % on the client side).
Hopefully we can have a resolution of sorts by Thursday
That might be worth trying, but it also assumes that events are using the exact same code and logic set as non-event shuttles.
I am running the other ones just to have more test data.
I should say this is probably a waste of time because the point was shown and a bug fix is being sent. But it will be fun to show it again and put my mind at rest that I wasn't crazy and shouldn't have failed so much. Plus it adds a factor into all my shuttle run calculations that makes those a waste now.
Good Luck and God Speed!
That was the 3 slot mission.
Non-event max difficulty shuttles should always lead to a high 90% if not 99% success chance if you put in an immortal high-base legendary card with a blue skill boost.
Event max difficulty shuttles (4000 VP) should at least lead to around 85-90% without any 2x bonus crew if you use high-base (1250+) legendary cards.
Also, I think even if it was a bug, we should all get some serious compensation (a decent amount of highest quality 10x shuttle boost packs and maybe a pack of 20 transmissions for all factions), because we lost a ton of skill boosts to failed shuttles (and only got the trainers and 20% of the VP).
Also, all the potential faction loot that has gone due to these not supposed to happen fails should be looked at. I bet I've lost dozens if not hundreds of Holoprograms, Science Experiments, Case Files, Medical Experiments, Ancient Films, IDICs, Romulan Star Empire Icons, etc. etc. to these miscalculations, and I'm not the only one. Of course, this happened to all of us, but we all lost a lot if these calculations were really so far off.
There's one good thing: All these "complaint threads" about hilariously high failure rates compared to displayed success chances haven't been just "bad luck with RNG" ... we were all actually right ...
Once people started doing actual statistical analyses, it was unarguably clear something was wrong. It's why I always recommend people collect data if they feel something is amiss. It's very easy to dismiss thoughts and feelings, but data and facts are a lot stickier subjects.
Why buy extra boosts or speed up shuttles with dilithium when most fail anyway? Why buy event packs to get the crew with the highest bonus early when 99% shuttles fail anyway? Frustrated players = less revenue. It's easy, but I wonder if or when DB finally gets it.
100%. This is the most obvious bug and would have been the first place I'd have looked as a DB employee. That DB employees continued to dismiss months of player issues until it was forced down their throats is problem #1 at DB.
I remind you that when the portals were broken, Shan reported that they had been tested live in production by the Senior QA engineer. An hour later DB admitted that the portal had been broken for months. So how were they tested and proven working when there was a bug preventing them from working?
I think it's more likely that the client and server were checked independently and the results of the two were not compared.
Client passed the client tests. Server passed the server tests. No integration test to verify that the client and server success algorithms were equivalent.
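For what it's worth, the kind of check that would catch this is small. Below is a minimal sketch of a parity test, assuming the client and server each expose their AND calculation as a function. Both functions here are stand-ins built from the formulas theorized in the original post, not actual DB code.

```python
import itertools


def client_and_skill(first, second):
    """Stand-in for the displayed value: highest + 25% of lowest."""
    return max(first, second) + 0.25 * min(first, second)


def server_and_skill(first, second):
    """Stand-in for the suspected back-end value: 1st + 25% of 2nd."""
    return first + 0.25 * second


def find_mismatches(skill_values):
    """Return every (first, second) pair where the two calculations disagree."""
    return [
        (a, b)
        for a, b in itertools.product(skill_values, repeat=2)
        if client_and_skill(a, b) != server_and_skill(a, b)
    ]


# Hypothetical skill values; any spread including a weak first skill works.
mismatches = find_mismatches([0, 400, 800, 1200])
print(mismatches)
```

Each side passes its own tests in isolation, but running both over the same inputs flags every pair where the first listed skill is the weaker one.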
😁
So I am not saying this as Joe Schmoe off the street.
That's still called being lazy and declaring things without really checking. Or ignorant. Either one works. You've got a client/server app, you test it end to end. Not just this end and that end.
I state this as someone who is coming up on 30 years in the tech industry from several perspectives and having worn different hats.
I've also been in meetings where folks have gotten eaten alive by everyone from a manager up to C-level saying, WTF, if you test an app, you test it end to end, and I've been with companies that have lost contracts for this very same behavior.
I guess I keyed in on the "lie to her" part of your quote and not the "lazy" part. Insufficient testing isn't necessarily "lazy"; it can also be done out of ignorance or their team being unwilling to dedicate enough time/resources to properly investigate.
There are lots of potential reasons and we can speculate all day long, but I've found that malice is rarely the issue and apathy or ignorance are usually the culprit.
Good, better late than never.
Perhaps we should all get a Vreenak.