@Cpt_insano_2k1, do you owe the OP an apology for deviating from the original intent? Or do you change the subject as a debate tactic when your arguments are broken?
The point made by my argument is still a valid one.
However, out of respect for the OP’s intent of this post, it is not a point that is directly relevant. Apology enough? Can we get back on track now? @Prime Lorca [10FH]
That's up to you. You seem intent to be the arbiter of what is related, what is relevant, when someone should or shouldn't address a certain point. So please, don't let me stop you from moderating the thread.
I suppose my thinking was that this forum would be one that engineers were more likely to monitor. And to that line of thought, the displayed shuttle success probability vs observed outcome seems askew. The gauntlet perceived probability of success vs observed outcome seems askew. Simple as that.
Just a “please fix this” thread to the engineers, not a “let's debate whether or not you are OK with this” thread to the players. But I will certainly back off if the intent of the engineering forum is solely to have the white knights come in and bash people for making valid observations and asking for some simple fixes.
It seems to me that most of the "bashing" is being done by the 'dark knights' (to put a twist on your phrase), who can't stand that anyone has a totally different experience (even those who may have played a lot more than them), and that their opinion is the only valid one.
Could you please continue the petty bickering? I find it most intriguing. ~ Data, ST:TNG "Haven"
I'm pretty sure the gauntlet isn't rigged. It just has some totally wild RNG. Calling people "white knights" doesn't change that. Still not arguing about the shuttle percentage thing.
The thing about the forum... Yes, some rooms are monitored more closely than others. But the forums are pretty public. People on the forum try to be helpful when they think they can help. You may not know this yet, but it's the next best thing to hearing something from the devs. There's a very good chance that they do not respond to this thread or say that these things have been addressed and are not "bugs," but features of the game. I hope they respond. It's always nice to hear from them.
That part about "lets debate whether or not you are ok with this” is like saying you get to say whose opinion is valid. (Shan, I'm sorry for all the snipping you'll do on Monday.) I would say that it's like only the OP should have a voice, but you have no problem voicing your opinion too. So... Please. Let others post. And maybe you should stop posting if you're truly waiting for the devs. You can end this. Stop posting insults and nonsense. I'll have no need to keep going. I'll hop away, two squares back and one to the right.
@Cpt_insano_2k1, the way you talk to people, the tone of voice you choose to employ, and your autocratic need to be the ultimate authority in every thread you "moderate" are totally out of line. You employ hearsay as evidence, you attack everyone who asks you to defend your claims, you insult people who do not agree with you (e.g. frequent use of the derogatory slur "white knight"), you superficially dismiss actual evidence that disputes your position, and then you play the victim when people confront you. Your antics make discussion and debate, the entire purpose of a forum, impossible. You are a troll. Try contributing to a discussion instead of bulldozing over everyone, and earn some respect, or prepare yourself for a lot more resistance.
+++++++++++++++++++++++++++++++
To support a claim like "The Gauntlet is rigged", there needs to be evidence to back this up. Clearly define what it is you are trying to examine, design a way to isolate that, organize a community spreadsheet, have players contribute data publicly, and then let some of the supremely talented and intelligent members of our community crunch the data and see what if any conclusions can be drawn. So far there is nothing in this thread that would warrant any response from the game engineers.
The same goes for shuttle percentages. Set something up, let's test it.
@Drago Musevni, I do not want to put words in your mouth about what your intent was for this thread. If it was your desire to try and make some changes to the game, I think you will find that a lot of people on these forums will be more than happy to assist you, but the first step in that process is to provide evidence that there is an actual problem in need of attention. I took your first few posts in here, combined with your thread title ("Don't ya just love DB/TP "math"?"), to be less serious and more in the realm of generalized complaining, and if that is your intention, so be it. But if your goal was to try and effect change, seize this opportunity.
One thing I have found with these gauntlet complaints is a huge one-way bias. I have done gauntlets with winning streaks in the 300’s. In the course of them I have won just as many matches I had very little chance to win as I have lost matches I had very little chance to lose. For those who complain and only show screenshots of tough losses, try going for matches you most likely will not win. You will see the correct balance is there. And I agree that this recent trend of a few new people coming here thinking they know more than us is insulting, and their words say so. We have been playing for years and we are not STT morons. We also have minds of our own; for example, @Nomad (FADM:Patterns of Force) does not need someone telling her how to think and what to say or what her opinion should be. That was talking down to her and she doesn’t deserve that. And if you think the game is predatory, then don’t allow yourself to be the prey.
Testing the shuttles seems straightforward. Chart the percentage, then chart whether they succeed or fail. Gauntlet seems much more difficult. I don't know how much you actually need. I made a list below, which seems pretty daunting. So overwhelming that I probably would not participate because of what it would require. But here's what I came up with.
Player input:
- Your min (skill 1)
- Your max (skill 1)
- Your min (skill 2)
- Your max (skill 2)
- Your crit percentage
- Your total score
- Opponent min (skill 1)
- Opponent max (skill 1)
- Opponent min (skill 2)
- Opponent max (skill 2)
- Opponent crit percentage
- Opponent total score
- Win streak before match
Calculations/simulations:
- (Spreadsheet formula to document skill spreads and/or averages is optional)
- Calculation or simulation to determine odds of success
- Formula to document winner
That's a start. I don't know how much can be removed or what may need to be added. But that list alone is more effort than I care to put into gauntlet. Hopefully this is helpful in getting the thread back on track.
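For the "calculation or simulation to determine odds of success" item, here is a minimal Python sketch of how the listed inputs could feed a simulation. The roll model is an assumption made purely for illustration and is not confirmed anywhere in this thread: three uniform rolls per skill between min and max, a crit doubling that roll, higher total wins. `Side`, `roll_total` and `estimate_win_odds` are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Side:
    """One combatant's numbers as read off the gauntlet screen."""
    skill1: Tuple[int, int]  # (min, max) for skill 1
    skill2: Tuple[int, int]  # (min, max) for skill 2
    crit_pct: float          # displayed crit chance, 0-100


def roll_total(side: Side, rng: random.Random, rolls_per_skill: int = 3) -> int:
    """ASSUMED model: uniform rolls per skill; a crit doubles that roll."""
    total = 0
    for low, high in (side.skill1, side.skill2):
        for _ in range(rolls_per_skill):
            value = rng.randint(low, high)
            if rng.random() < side.crit_pct / 100:
                value *= 2
            total += value
    return total


def estimate_win_odds(you: Side, opponent: Side,
                      trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of your win chance; ties count as losses."""
    rng = random.Random(seed)
    wins = sum(roll_total(you, rng) > roll_total(opponent, rng)
               for _ in range(trials))
    return wins / trials
```

If the real mechanics differ, only `roll_total` would need to change. The estimate could then sit in the spreadsheet next to the recorded winner for each match.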
Also need to include the # of trophies you stand to win from the battle. (This is the key to the subtle cues that was mentioned before)
And I agree, tracking all of these metrics would be challenging. If this were done as a “community pool”, data entry errors would be highly probable, since there is no way to record this data automatically and it would depend entirely on individuals QC'ing their own data. I would be hesitant to trust this analysis as a community dataset.
Shuttles would be more straightforward, and the upcoming event would be the time to do it.
Just wait to collect data until the 4,000 EP threshold, then track the mission name, shuttle success probability, any boosts used, and mission outcome. Individuals who participate for the whole event will have sufficient sample sizes on their own without the need to aggregate the data. To make it more statistically sound, with less individual variance or noise, the missions should be crewed and boosted identically for the duration of the event, i.e. same crew, same boost (skill or time), to control for other factors and isolate the displayed success probability as the tested feature. Doing it this way would also simplify data recording, as you would only need to record “success/fail”; all other variables would be the same for each observation.
Seriously, though... I think you're right about the gauntlet, but wrong about shuttles. I just couldn't help myself with the meme. If you just want to test the percentages as they are, then the community can handle that. If you want to isolate certain test parameters to get nuanced information, then you would want to have a small group and consider using PMs to coordinate and test isolated variables.
If you don't trust us but you expect us to trust you, then what are we even doing here?
Please stop this. There was nothing in my reply that said “I don't trust you”. I made it as clear as I could: expecting everyone to record such a robust set of variables is undoubtedly going to result in data entry issues, and therefore create problems for a statistically sound analysis.
As far as testing shuttles, my proposed test is as follows:
Once the 4000 event point threshold is reached, take note of the missions you are running and the crew choices you made to maximize the success percentage. Take note of any boosts used (time or skill). If you re-crew each mission as it returns with the same crew and boosts you initially used, then you won't need to note the crew or percentage again, because they will be identical to the ones prior. This is absolutely necessary in order to reduce variance and noise.
The end result is that your “data sheet” would have: mission name, success probability, mission outcome. You could then do a simple calculation to find out your observed success % and you could compare that to the “expected” success percentage.
If done in this way, each shuttle sent out will be a repetition for the pool of all shuttles sent on that particular mission. Repetitions are an important part of statistical testing and will add strength to the analysis.
Example data sheet:
Mission A, expected success %, number of repetitions, success/fail (1/0), observed success (sum of success/fail column divided by reps)
If we went with a 95% CI, then if expected success was 90%, an observed success rate range of 85%-95% would be evidence that there were no problems. A result lower than 85% could indicate a problem, and an observed result higher than 95% could indicate TP is very kind.
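One caveat on the 85%-95% band: how wide a 95% range is depends on how many repetitions were run, so ±5 points only holds at one particular sample size. The sketch below tallies an observed rate from the 1/0 column and estimates the range using the normal approximation to the binomial, which is my assumption here (an exact binomial calculation would differ somewhat for small samples). The function names are hypothetical.

```python
import math


def observed_rate(outcomes):
    """Share of successes in a list of 1 (success) / 0 (fail) entries."""
    return sum(outcomes) / len(outcomes)


def expected_range(p, reps, z=1.96):
    """Approximate 95% band for the observed rate at displayed chance p.

    Uses the normal approximation; the band is clipped to [0, 1].
    """
    half_width = z * math.sqrt(p * (1 - p) / reps)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def reps_for_margin(p, margin, z=1.96):
    """Repetitions needed before the band narrows to p +/- margin."""
    return math.ceil(z * z * p * (1 - p) / margin ** 2)


low, high = expected_range(0.90, 30)
print(f"30 reps at 90%: {low:.1%} to {high:.1%}")
print("reps needed for +/-5 points:", reps_for_margin(0.90, 0.05))
```

Under this approximation, 30 repetitions at a displayed 90% give a band far wider than 85%-95%, and it takes about 139 repetitions before the band tightens to ±5 points.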
Please let me know if this covers what you want. Imo, the crew slots are more for reference than mandatory, so that's why I put them to the right. I can modify the spreadsheet, or you can if you want. I went with successes and failures because it's easy to just add one to whichever column is appropriate. It can be changed if you want.
Thank you for putting that together! Looks pretty good. We could probably tweak it slightly as we gather feedback from folks and think through the process a bit.
Yes, the crew is just for reference, to help make sure that each attempt at a mission is identical for the individual who sends the shuttle.
It just occurred to me that adding the captain name to the left and moving the mission name to the right seems helpful.
(Thanks for not shutting the thread down, Shan.)
65%er vs 5%er. I have better roll stats in both categories. I crit more times and still lose. Tell me again how the system is working as intended?
You had a 96% chance to win this battle. The 4% beat you out this time.
Also to the OP, you may not want to hear this, but there were 4000+ gauntlet rounds recorded in a spreadsheet, directly out of the log files from IAmPicard. These numbers were collected by multiple people and combined into a single spreadsheet that PeachTree Rex recorded, ran numerous statistical filters through and concluded ... that displayed percentages seem to match up with what DB shows, over a significantly large sample size.
If you want to reinvent the wheel or even deny these findings, that's on you. But this is the most exhaustive analysis that was ever done on the Gauntlet. Peachtree was in my fleet and I know he was a reliable source of statistical analysis.
This all just happened within a single refreshed gauntlet. What I find most commonly is that I am more likely to lose the majority or all of an entire set. If I lose the first round, it continues until all crew are disabled.
And yes. I have 2 Armus
What, if anything, Peachtree's analysis can say is that players who have access to displayed percentages wind up having match outcome results similar to the displayed percentages in aggregate.
The coverage bias is beyond obvious. But I guess I should just point and laugh at people's ignorance since that's the cool thing for very helpful forum denizens round these parts to do.
It is a real shame that there are so many players who don't understand prob & stats, but the bigger shame lies with those who mock them and undermine the good faith reporting of potential issues. Wielding a flawed aggregate as a cudgel when discussing a specific result doesn't make you part of the solution. Errors can and do hide in aggregate results.
For the record, I ran almost all of my shuttles at 97% this weekend. My spreadsheet earlier in the thread has my documentation. I'll be the first to admit that it literally looks unbelievable. I kept checking my VP through the event to see if that hundreds digit changed. It did not. The last three digits were 898 the whole time after I hit the 4k shuttles. So, hitting 100% success in 108 shuttles is within the 5% margin of error suggested earlier in the thread. It's a shame that no one else used the spreadsheet. I think 108 shuttles is barely acceptable in terms of sample size. And given the outcome, I am curious to see if it could be duplicated, even within one or two failures.
I don't have the patience to track gauntlet data, but in the spirit of cooperation, I did my best with my shuttles. It will be interesting to test other percentages when I go for a higher rank in the future. My squadron did not track their data, but they expressed some frustrations with shuttles rated below 95%. One guy claimed two rounds where three out of four failed at 85-92%. Maybe those were outliers and the other shuttles balanced that out. I don't know. Maybe the experiment can continue in a couple weeks if anyone is interested.
Personally, I think there's something magical that happens at 96% and above. I don't have any evidence beyond this weekend, so I am curious to test it.
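As a rough check on how surprising a perfect run is, here is a small Python sketch. It assumes every one of the 108 shuttles showed exactly 97% and that outcomes are independent, which simplifies the "almost all at 97%" described above. `chance_of_no_failures` is a hypothetical helper name.

```python
def chance_of_no_failures(displayed_pct, runs):
    """Probability that every run succeeds, assuming independent runs
    that each succeed with the displayed percentage."""
    return (displayed_pct / 100) ** runs


prob = chance_of_no_failures(97, 108)
print(f"Chance of 108 straight successes at 97%: {prob:.1%}")
```

Under those assumptions a spotless 108-shuttle run comes out at roughly 3.7%: unusual, but not impossible, so repeating the test over another event would help tell luck from a real effect at 96% and above.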
Like Lorca, I also tracked success % this event. I sent 6 shuttles at a time, all with 3* time reducers (no skill boosts), and sent them with the same crew throughout the event. In total, I got 30 repetitions for each shuttle mission, with expected/observed success listed below:
The outliers are denoted with “**”. The most surprising result was the 95% shuttle mission, which resulted in 83% success. The fails came consistently throughout the event for this mission. There wasn’t really a “bad streak of luck”; it simply failed more often than the others. There were 25 successes and 5 failures for this mission. For it to recover and perform “as expected”, it would have needed 70 consecutive successes by the end of the event.
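The 70-successes figure comes from solving (25 + x) / (30 + x) = 0.95 for x. A short Python sketch of that arithmetic follows, along with the chance of seeing 5 or more failures in 30 runs if the displayed 95% were accurate. The independence of runs is an assumption, and both function names are hypothetical.

```python
from math import comb


def successes_needed(successes, failures, target_pct):
    """Consecutive successes required to lift the observed rate to
    target_pct (an integer percentage below 100)."""
    # Failures may be at most (100 - target_pct)% of all runs.
    total_needed = -(-failures * 100 // (100 - target_pct))  # ceiling division
    return max(0, total_needed - (successes + failures))


def chance_of_at_least_failures(runs, min_failures, displayed_pct):
    """Binomial tail: probability of min_failures or more failures in
    `runs` independent attempts at the displayed success percentage."""
    fail = 1 - displayed_pct / 100
    return sum(comb(runs, k) * fail ** k * (1 - fail) ** (runs - k)
               for k in range(min_failures, runs + 1))


print(successes_needed(25, 5, 95))
print(f"{chance_of_at_least_failures(30, 5, 95):.1%}")
```

The first call reproduces the 70 quoted above. The second comes out at roughly 1.6%, which is low for a single mission, though with six missions tracked at once one unlucky pool is less surprising than it looks in isolation.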
If we pool all of my shuttle runs into one group, the expected success would be 90%, and observed success was 87.22%. Pretty reasonable, in my opinion, but I think it's important to point out that the observed/expected success would have been spot on if it weren't for the one pool of repetitions from the 95% success mission.
Lorca and I are discussing design and strategy to attempt to test this 95% success threshold, as well as the potential to test the “AND” skill seats in shuttles.
In general, I experienced far fewer failures this event than I am typically used to seeing, and tracking results and sending out identical shuttle missions was relatively simple. I am definitely interested in further investigating shuttle success, though, and I would like to invite people to participate in a group effort next full faction event.
@Cpt_insano_2k1 Do you know if you used shared crew regularly on those 95% shuttles?
I've heard anecdotal evidence that squad shares can produce unexpected fails.
I know you weren't asking me, but I used the squad share in one of my shuttles and did not have any failures. That was the shuttle that I time boosted for the full event, so it had the most runs at 34.
I appreciate the response all the same.
In a perfect world I'd want multiple users running the exact same line up to test. I can't imagine the logistics of arranging even one same shuttle/line up test, let alone multiple shuttle/crew variations. It would be "fun" to test this in TP's demo environment, but even then demo doesn't always match production... 🤷
That said, your report here matches with my personal experience. There could be something more exotic going on, but likely not a general feature failure like the AND bug. Always keen to consider other reports too.
And I would like to also remind everyone of the importance of making relevant threads, comments - and that includes titles.
Player input:
- Your min (skill 1)
- Your max (skill 1)
- Your min (skill 2)
- Your max (skill 2)
- Your crit percentage
- Your total score
- Opponent min (skill 1)
- Opponent max (skill 1)
- Opponent min (skill 2)
- Opponent max (skill 2)
- Opponent crit percentage
- Opponent total score
- Win streak before match
Calculations/simulations:
- (Spreadsheet formula to document skill spreads and/or averages is optional)
- Calculation or simulation to determine odds of success
- Formula to document winner
That's a start. I don't know how much can be removed or what may need to be added. But that list alone is more effort than I care to put into gauntlet. Hopefully this is helpful in getting the thread back on track.
Also need to include the # of trophies you stand to win from the battle. (This is the key to the subtle cues that were mentioned before.)
And I agree, tracking all of these metrics would be challenging. If this were done as a “community pool”, data entry errors would be highly likely, since there is no way to record this data automatically and it would depend entirely on individuals QC'ing their own data. I would be hesitant to trust this analysis as a community dataset.
Shuttles would be more straightforward, and the upcoming event would be the time to do it.
Just wait to collect data until the 4,000 event point threshold, then track the mission name, shuttle success probability, any boosts used, and mission outcome. Individuals who participate for the whole event will have sufficient sample sizes on their own without the need to aggregate the data. To make it more statistically sound, with less individual variance or noise, the missions should be crewed and boosted identically for the duration of the event (i.e. same crew, same boost, whether skill or time) to control for other factors and isolate the displayed success probability as the tested feature. Doing it this way would also simplify data recording, as you would only need to record “success/fail”; all other variables would be the same for each observation.
Seriously, though... I think you're right about the gauntlet, but wrong about shuttles. I just couldn't help myself with the meme. If you just want to test the percentages as they are, then the community can handle that. If you want to isolate certain test parameters to get nuanced information, then you would want to have a small group and consider using PMs to coordinate and test isolated variables.
Please stop this. There was nothing in my reply that said “I don't trust you”. I made it as clear as I could: expecting everyone to record such a robust set of variables is undoubtedly going to result in data entry issues, and therefore create problems for a statistically sound analysis.
Once the 4000 event point threshold is reached, take note of the missions you are running and of the crew choices you made to maximize the success percentage. Take note of any boosts used (time or skill). If you re-crew each mission as it returns with the same crew and boosts you initially used, then you won't need to note the crew or percentage again because they will be identical to the ones prior. This is absolutely necessary in order to reduce the variance and noise.
The end result is that your “data sheet” would have: mission name, success probability, mission outcome. You could then do a simple calculation to find out your observed success % and you could compare that to the “expected” success percentage.
If done in this way, each shuttle sent out will be a repetition for the pool of all shuttles sent on that particular mission. Repetitions are an important part of statistical testing and will add strength to the analysis.
Example data sheet:
Mission A, expected success %, number of repetitions, success/fail (1/0), observed success (sum of success/fail column divided by reps)
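A data sheet like this could be tallied with a few lines of Python. This is only a sketch: the rows in `runs` are made-up placeholder data, and the `summarize` helper is a hypothetical name, not part of any existing tool.

```python
from collections import defaultdict

# Hypothetical log, one row per returned shuttle:
# (mission name, displayed success probability, outcome: 1 = success, 0 = fail)
runs = [
    ("Mission A", 0.90, 1),
    ("Mission A", 0.90, 1),
    ("Mission A", 0.90, 0),
    ("Mission B", 0.96, 1),
    ("Mission B", 0.96, 1),
]

def summarize(rows):
    """Return {mission: (expected, reps, successes, observed_rate)}."""
    tallies = defaultdict(lambda: {"expected": None, "reps": 0, "wins": 0})
    for mission, expected, outcome in rows:
        t = tallies[mission]
        t["expected"] = expected   # identical on every row if crew and boosts never change
        t["reps"] += 1
        t["wins"] += outcome
    return {
        m: (t["expected"], t["reps"], t["wins"], t["wins"] / t["reps"])
        for m, t in tallies.items()
    }

summary = summarize(runs)
for mission, (expected, reps, wins, observed) in summary.items():
    print(f"{mission}: expected {expected:.0%}, reps {reps}, observed {observed:.1%}")
```

Because each mission is re-crewed identically, the expected value is constant per mission and only the 1/0 outcome column changes between rows.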
If we went with a 95% CI and the expected success was 90%, then an observed success rate inside the interval around 90% (for example 85%-95%; the exact width depends on the number of repetitions) would be evidence that there were no problems. A result below the lower bound could indicate a problem, and a result above the upper bound could indicate TP is very kind.
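How wide that "no problem" band should be depends heavily on how many shuttles were sent. The sketch below assumes each shuttle is an independent trial at the displayed probability (a plain binomial model, which is an assumption about how the game works) and finds the central range of success counts covering at least 95% of outcomes. The function names are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials at probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def central_range(n, p, level=0.95):
    """Success counts (lo, hi) left after trimming at most (1 - level) / 2
    probability from each tail of Binomial(n, p)."""
    alpha = (1 - level) / 2
    lo, cum = 0, 0.0
    for k in range(n + 1):              # walk up from 0 successes
        cum += binom_pmf(k, n, p)
        if cum > alpha:
            lo = k
            break
    hi, cum = n, 0.0
    for k in range(n, -1, -1):          # walk down from n successes
        cum += binom_pmf(k, n, p)
        if cum > alpha:
            hi = k
            break
    return lo, hi

for n in (30, 100, 500):
    lo, hi = central_range(n, 0.90)
    print(f"n={n}: observed rates from {lo / n:.1%} to {hi / n:.1%} are unremarkable")
```

With only a few dozen runs per mission the band comes out noticeably wider than 85%-95%, and it narrows as the number of repetitions grows.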
Please let me know if this covers what you want. Imo, the crew slots are more for reference than mandatory, so that's why I put them to the right. I can modify the spreadsheet, or you can if you want. I went with successes and failures because it's easy to just add one to whichever column is appropriate. It can be changed if you want.
https://docs.google.com/spreadsheets/d/1LY1cYAqmMGqRup1-7eQeUA83YDGZOz-r42X9BjbZiro/edit?usp=drivesdk
Thank you for putting that together! Looks pretty good, we could probably tweak it slightly as we gather feedback from folks and think through the process a bit.
Yes, the crew is just for reference, to help make sure that each attempt at a mission is identical for the individual who sends the shuttle.
It just occurred to me that adding the captain name to the left and moving the mission name to the right seems helpful.
(Thanks for not shutting the thread down, Shan.)
You had a 96% chance to win this battle. The 4% beat you out this time.
Also to the OP, you may not want to hear this, but there were 4000+ gauntlet rounds recorded in a spreadsheet, directly out of the log files from IAmPicard. These numbers were collected by multiple people and combined into a single spreadsheet that PeachTree Rex recorded, ran numerous statistical filters through and concluded ... that observed results seem to match up with the percentages DB displays, over a significantly large sample size.
If you want to reinvent the wheel or even deny these findings, that's on you. But this is the most exhaustive analysis that was ever done on the Gauntlet. Peachtree was in my fleet and I know he was a reliable source of statistical analysis.
https://forum.disruptorbeam.com/stt/discussion/12744/4000-gauntlet-rounds
The coverage bias is beyond obvious. But I guess I should just point and laugh at people's ignorance since that's the cool thing for very helpful forum denizens round these parts to do.
It is a real shame that there are so many players who don't understand prob & stats, but the bigger shame lies with those who mock them and undermine the good faith reporting of potential issues. Wielding a flawed aggregate as a cudgel when discussing a specific result doesn't make you part of the solution. Errors can and do hide in aggregate results.
I don't have the patience to track gauntlet data, but in the spirit of cooperation, I did my best with my shuttles. It will be interesting to test other percentages when I go for a higher rank in the future. My squadron did not track their data, but they expressed some frustrations with shuttles rated below 95%. One guy claimed two rounds where three out of four failed at 85-92%. Maybe those were outliers and the other shuttles balanced that out. I don't know. Maybe the experiment can continue in a couple weeks if anyone is interested.
Personally, I think there's something magical that happens at 96% and above. I don't have any evidence beyond this weekend, so I am curious to test it.
Displayed success % / observed success %:
96% / 96.67%
95% / 83.33%**
90% / 86.67%
90% / 86.67%
86% / 90%**
83% / 80%
The outliers are denoted with “**”. The most surprising result was the 95% shuttle mission, which resulted in 83% success. The fails came consistently throughout the event for this mission; there wasn't really a “bad streak of luck”, it simply failed more often than the others. There were 25 successes and 5 failures for this mission. For it to recover and perform “as expected” (95 out of 100), it would have needed 70 consecutive successes by the end of the event.
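The 70-run figure, and how unusual 25 out of 30 is at a displayed 95%, can be checked with a short sketch. It assumes each shuttle is an independent trial at exactly the displayed probability, which is the very thing under test; the function names are hypothetical.

```python
from math import comb

def successes_needed(wins, fails, target):
    """Consecutive successes required before the running rate reaches target."""
    extra = 0
    while (wins + extra) / (wins + fails + extra) < target:
        extra += 1
    return extra

def prob_at_most(wins, n, p):
    """P(X <= wins) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(wins + 1))

needed = successes_needed(25, 5, 0.95)   # (25 + x) / (30 + x) >= 0.95
tail = prob_at_most(25, 30, 0.95)        # chance of 25 or fewer successes in 30

print(f"consecutive successes needed: {needed}")
print(f"P(25 or fewer successes in 30 at 95%): {tail:.4f}")
```

Under that model the tail probability works out to roughly 1.6%. That is low for a single mission, but with six missions tracked at once, seeing one result this unlucky somewhere in the set is less surprising than the single-mission figure suggests.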
If we pool all of my shuttle runs into one group, the expected success would be 90%, and observed success was 87.22%. Pretty reasonable, in my opinion, but I think it's important to point out that the observed success would have been much closer to expected if it weren't for the one pool of repetitions from the 95% success mission.
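The pooled figures can be reproduced as follows. Only the 95% mission's counts (25 successes, 5 failures) are stated outright; the other counts are an assumption, inferred from each reported percentage being a whole number of successes out of 30 runs.

```python
# (displayed %, successes, runs)
# Assumption: 30 runs per mission, inferred from the reported percentages.
missions = [
    (96, 29, 30),   # 96.67% observed
    (95, 25, 30),   # 83.33% observed (counts stated in the post)
    (90, 26, 30),   # 86.67% observed
    (90, 26, 30),   # 86.67% observed
    (86, 27, 30),   # 90% observed
    (83, 24, 30),   # 80% observed
]

total_runs = sum(runs for _, _, runs in missions)
total_wins = sum(wins for _, wins, _ in missions)

# Expected rate is the run-weighted average of the displayed percentages.
expected = sum(pct * runs for pct, _, runs in missions) / total_runs
observed = 100 * total_wins / total_runs

print(f"runs: {total_runs}, expected {expected:.2f}%, observed {observed:.2f}%")
```

If the missions had unequal run counts, the weighting by `runs` is what keeps the pooled expected value honest; a plain average of the displayed percentages would only be correct when every mission was flown the same number of times.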
Lorca and I are discussing design and strategy to attempt to test this 95% success threshold, as well as the potential to test the “AND” skill seats in shuttles.
In general, I experienced far fewer failures this event than I am used to seeing, and tracking results and sending out identical shuttle missions was relatively simple. I am definitely interested in further investigating shuttle success, though, and I would like to invite people to participate in a group effort next full faction event.
I've heard anecdotal evidence that squad shares can produce unexpected fails.
I know you weren't asking me, but I used the squad share in one of my shuttles and did not have any failures. That was the shuttle that I time boosted for the full event, so it had the most runs at 34.
I appreciate the response all the same.
In a perfect world I'd want multiple users running the exact same lineup to test. I can't imagine the logistics of arranging even one same shuttle/lineup test, let alone multiple shuttle/crew variations. It would be "fun" to test this in TP's demo environment, but even then demo doesn't always match production... 🤷
That said, your report here matches with my personal experience. There could be something more exotic going on, but likely not a general feature failure like the AND bug. Always keen to consider other reports too.