Data Science and Computing with Python for Pilots and Flight Test Engineers
Solutions to NTPS Statistics Problem Sets
Introduction
In this lesson we demonstrate how to use our statistics codes developed in the previous lessons to solve statistics problems quickly and elegantly. Specifically, we demonstrate the use of our codes on a selection of the statistics problem sets posed by the National Test Pilot School (NTPS) in Chapter 6 of their textbook Math & Physics for Flight Testers, Professional Course, Volume I, National Test Pilot School, Mojave, California, October 2021. The numbering of the problems below and the page references refer to this work.
NTPS uses these problem sets during the statistics lessons in the second week of their course T&E 4001 Fundamentals of Flight Test. The course covers much more than just statistics, but the final exam of the course is about statistics and solving problems like these (it is open book and you are allowed to use a computer). At the time of writing, the above textbook is freely available online as part of NTPS’s effort to have student come well prepared for the Professional Course and Master’s Degree studies.
Some of the codes in the Statistics Part of our online course were in fact developed by us during our attendance of NTPS. It is therefore not surprising that solving problems with them is very easy and fast. All you need to do is instantiate the corresponding class, call whichever one of its methods you desire to use with the data of the problem as its arguments, and out comes the result. Thus, many of these statistics problems can be solved essentially with one single function call. Of course, you do need to know the statistics theory as well as have and know our code, in order to know, which method to call for a given problem, which is why we have created the preceding lessons.
Solved Sample Problems from NTPS Statistics Tutorials
Prerequisite Code
import math
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as st
from scipy.optimize import curve_fit
Then add additional cells with all the classes we have defined in the individual lessons of the Statistics Part of this online course. The problem solving codes below create instances of these classes to solve the problems. Then simply copy and run the code cells below to solve individual problems. Make sure you understand, what they do and why they are being used, such that you develop the ability to apply the code to new problems of your own.
# All statistics classes defined in all the statistics lessons of this course
# go into this cell (or distributed over multiple code cells).
# List of statistics classes:
# DataStatistics, ProbabilityTables, NormalDistribution,
# Student_t_Distribution, Chi2_Distribution,
# NonparametricTests, TestOfNormality, SampleSize, CEP, Chauvenet_Criterion
Sample Mean, Sample Median, and Sample Standard Deviation
This section computes simple, basic statistical quantities with a method from the DataStatistics class. The following problem can be found in the NTPS Professional Course Textbook, Math & Physics for Flight Testers, Volume 1, October 2021, on Page 6.43 (for subsequent problems analogously, and only Problem Set, Problem Number, and Page will be indicated).
Problem Set 1, Problem 3 (Page 6.43)
Test points: 3.5, 4.0, 3.8, 4.2, 3.7. Find sample mean, median, and standard deviation.
data = np.array([3.5, 4.0, 3.8, 4.2, 3.7])
ds = DataStatistics()
mean = ds.mean(data)
print(mean)
median = ds.median(data)
print(median)
standard_deviation = ds.standard_deviation(data)
print(standard_deviation)
# Note that the built-in Numpy standard deviation function returns the population standard deviation instead:
population_standard_deviation = np.std(data)
print(population_standard_deviation)
3.84
3.8
0.27018512172212594
0.24166091947189147
Probabilities from Distributions
The problems in this section show how you obtain information commonly found in statistics tables by using our code instead.
Normal (Gaussian) Distribution
Used to study statistical properties of individual samples and the mean, if the true population mean and variance are known, or if the variance is unknown and the sample size is large (\(n\ge30\)).
Problem Set 2, Problem 1
A very large test program was accomplished to determine the stall speed of the F-l9. The standardized data were normally distributed with a mean of 125 knots and a standard deviation of 2 knots. What is the probability of a single stall happening at 130 knots or greater?
dist = NormalDistribution()
x = 130
mean = 125
standard_deviation = 2
result = dist.probability_of_x_or_greater(x=x, mean=mean, standard_deviation=standard_deviation)
print(result)
0.006209665325776159
Student’s \(t\)-Distribution
Used to study statistical properties of individual samples and the mean, using a small sample size ($n<30$).
Problem Set 2, Problem 2
Assume four random pilots stall the F-19 in Problem 1. Ninety-nine percent of the time, the sample average stall speed will be less than what value? Assume \(s = \sigma\).
dist = Student_t_Distribution()
mean = 125 # mean from Problem 1.
sigma = 2 # standard deviation from Problem 1.
result = dist.sample_mean_less_than(mu=mean, s=sigma, n=4, probability=0.99)
print(result)
129.5407028586984
\(\chi^2\)-Distribution
Used to study the statistical properties of our standard deviation estimate.
Problem Set 2, Problem 3
Ninety-nine percent of the time, the sample standard deviation of the stall speed for the four pilots in problem two will be less than what value?
dist = Chi2_Distribution()
mean = 125
sigma = 2
result = dist.sample_standard_deviation_less_than(sigma=sigma, n=4, probability=0.99)
print(result)
3.8892787215016913
Confidence Intervals
Problem Set 3, Problem 1
You test 50 F-120 engines and find that the mean is 27,900 with a standard deviation of 350 lbs. What are the 90% confidence limits for the actual thrust? (hint: because n > 30, use z and assume $\sigma = s$)
dist = NormalDistribution()
n = 50
mean = 27900
standard_deviation = 350
probability = 0.9 # two-sided
result = dist.twosided_confidence_interval(mean, standard_deviation, n, probability)
print(result)
[27818.583899242632, 27981.416100757368]
Problem Set 3, Problem 2
Ten MIL power takeoff rolls were measured by your test group. The standardized data have a mean of 2700 ft and a standard deviation of 200 ft. What are the 95% confidence limits for the actual value?
dist = Student_t_Distribution()
n = 10
mean = 2700
standard_deviation = 200
probability = 0.95 # two-sided
result = dist.twosided_confidence_interval(mean, standard_deviation, n, probability)
print(result)
# verify two digits after period
[2556.9286188094857, 2843.0713811905143]
Problem Set 3, Problem 3
A sample of 9 rocket motors which were stored for 5 years had an average burn time of 3.1 sec and a standard deviation of 0.1 sec. What are the 95% confidence limits for the standard deviation?
dist = Chi2_Distribution()
n = 9
# mean = 3.1 # not needed.
s = 0.1 # sample standard deviation
probability = 0.95 # two-sided
result = dist.twosided_confidence_interval(s, n, probability)
print(result)
[0.06754570344012019, 0.1915770882942761]
Hypothesis Testing
Using the \(z\)-Distribution (i.e. Standard Gaussian (Normal) Distribution with \(\mu_0=0\) and \(\sigma_0=1\))
Example from Lecture using \(z\)-Distribution (Slide 4-23):
During prior testing, it was determined that the cross range error were normally distributed with a mean error of 20 ft and a standard deviation of 3 feet. After a modification, it was found that the sample mean cross range error for nine bombs was 22 feet. Has the mean changed at the 0.05 level of significance?
dist = NormalDistribution()
n = 9
sample_mean = 22
standard_deviation = 3
probability = 0.95
hypothesis_mean = 20
z_test, z_prob_interval = dist.twosided_hypothesis_test(sample_mean, hypothesis_mean, standard_deviation, n, probability)
print(z_test, z_prob_interval)
z_prob (based on significance): [-1.959963984540054, 1.959963984540054] (use negative if applicable)
z_test (based on sample mean and hypothesis mu): 2.0
Null Hypothesis is rejected.
2.0 [-1.959963984540054, 1.959963984540054]
Problem Set 4, Problem 1
You test 50 F-120 engines and find that the mean thrust is 27,900 lbs with a standard deviation of 350 lbs. The contractual guarantee was 28,000 lbs. Did the contractor meet the guarantee at 90% confidence. (hint: because n > 30, use \(z\) and assume \(\sigma=s\))
# Notes:
# "Use z" refers to using standard Gaussian (normal) distribution.
# "Assume sigma = s" refers to using the sample standard deviation s
# to estimate the actual standard deviation of the population sigma.
#####
# Correct way to solve this problem:
dist = NormalDistribution()
n = 50
sample_mean = 27900
standard_deviation = 350
probability = 0.9
hypothesis_mean = 28000
z_test, z_prob = dist.onesided_hypothesis_test(sample_mean, hypothesis_mean, standard_deviation, n, probability)
print(z_test, z_prob)
# Null hypothesis is rejected, because z_test = -2.0203 is smaller than z_prob = -1.2815
z_prob (based on significance): 1.2815515655446004 (use negative if applicable)
z_test (based on sample mean and hypothesis mu): -2.0203050891044216
Null Hypothesis is rejected.
-2.0203050891044216 1.2815515655446004
# WARNING: The code below is not how to solve this problem (incorrect solution). Use the code above instead.
dist = NormalDistribution()
n = 50
sample_mean = 27900
standard_deviation = 350
probability = 0.9
hypothesis_mean = 28000
result = dist.onesided_confidence_intervals(sample_mean, standard_deviation, n, probability)
#result = dist.twosided_confidence_interval(mean, standard_deviation, n, probability)
print(result) # compare to hypothesis mean
z: 1.2815515655446004
[27836.566433829423, 27963.433566170577]
Using Student’s \(t\)-Distribution
Problem Set 4, Problem 2
If there were only 11 test engines in problem 1 above, would the contractor have met the guarantee?
dist = Student_t_Distribution()
n = 11
sample_mean = 27900
standard_deviation = 350
probability = 0.9
hypothesis_mean = 28000
t_test, t_prob = dist.onesided_hypothesis_test(sample_mean, hypothesis_mean, standard_deviation, n, probability)
print(t_test, t_prob)
# Note: contractor seems to have met guarantee, since t_test = -0.9476 is larger than t_prob = -1.3722.
t_prob (based on significance): 1.3721836411102863 (use negative if applicable)
t_test (based on sample mean, xbar, and null hypothesis mu):
-0.9476070829586857 Null Hypothesis is accepted.
-0.9476070829586857 1.3721836411102863
Example from Lecture using One-Sided \(t\)-Distribution (Slide 4-25):
The contract says that climb fuel must be less than 1500 lbs. During nine climbs we get an average of 1600 lbs with a sample standard deviation of 200lbs. Do we penalize the contractor?
dist = Student_t_Distribution()
n = 9
sample_mean = 1600
standard_deviation = 200
probability = 0.95
hypothesis_mean = 1500
t_test, t_prob = dist.onesided_hypothesis_test(sample_mean, hypothesis_mean, standard_deviation, n, probability)
print(t_test, t_prob)
t_prob (based on significance): 1.8595480375228424 (use negative if applicable)
t_test (based on sample mean and hypothesis mu): 1.5
Null Hypothesis is accepted.
1.5 1.8595480375228424
Using the \(\chi^2\)-Distribution
Note: This studies the acceptability of the obtained standard deviation (not of the mean).
Problem Set 4, Problem 3
Rocket motors have a burn time standard deviation of 0.1 sec ($\sigma_0$) when produced. A sample of 9 rocket motors which were stored for 5 years have a sample standard deviation of 0.15 sec. At the 95% confidence level, has the standard deviation increased?
dist = Chi2_Distribution()
n = 9
s = 0.15 # this is sample_standard_deviation.
sigma = 0.10 # this is hypothesis standard deviation (sigma_0 in problem).
probability = 0.95 # confidence level for null hypothesis (1-alpha)
chi2_test, chi2_prob = dist.onesided_hypothesis_test(s, sigma, n, probability)
print(chi2_test, chi2_prob)
chi2_prob (from table, based on significance, alpha, and number of samples, n): 15.507313055865453 (upper bound implemented only!)
chi2_test (based on sample s (sample standard deviation), null hypothesis sigma, and number of samples, n: 17.999999999999996
Null Hypothesis rejected.
17.999999999999996 15.507313055865453
Example Problem from Lecture using One-sided \(\chi^2\)-Distribution (Slide 4-27):
dist = Chi2_Distribution()
n = 10
s = 12 # this is sample_standard_deviation.
sigma = 10 # this is hypothesis standard deviation.
probability = 0.90 # confidence level for null hypothesis (1-alpha)
chi2_test, chi2_prob = dist.onesided_hypothesis_test(s, sigma, n, probability)
print(chi2_test, chi2_prob)
chi2_prob (from table, based on significance, alpha, and number of samples, n): 14.683656573259837 (upper bound implemented only!)
chi2_test (based on sample s (sample standard deviation), null hypothesis sigma, and number of samples, n: 12.96
Null Hypothesis accepted.
12.96 14.683656573259837
Nonparametric Tests of Not-Normally Distributed Data
Nonparametric Tests (e.g. Sign Test for Non-Normally Distributed Data) (Slide 6-8)
npt = NonparametricTests()
p = 0.5
alpha = 0.1
N = 10 # number of pilots
x = 3 # number of pilots preferring the old system or having no opinion.
# Null hypothesis is that the new system is no better than the old.
probability = npt.SignTest(x, N, p, alpha)
print(probability)
Probability obtained is: 0.171875
Above probability must be smaller than alpha= 0.1 for us to reject the null hypothesis.
Probability is larger than alpha, thus accept null hypothesis.
0.171875
Problem Set 5, Problem 1
We want to know if the logic in a new RADAR tape has increased detection range. We need to be 95% certain before we give the program office the green light. Given the following data, decide. Do not assume normally distributed data.
Data: Before: 3, 5, 5, 6, 7, 8, 12, 12, 17; After: 8, 10, 12, 14, 15, 16, 19
# DO NOT DO THIS FOR TEST. REQUIRES MATERIALS NOT IN LECTURE.
Problem Set 5, Problem 2
The YF-l9 has vertical tape instruments. Before going into production, the Program Director polls the test pilots and finds that 8 prefer round dials, 2 have no preference, and 2 want to keep the tapes. You want to be 80% sure before approving a change. What should you do?
If pilots with no preference are included in the people who do not want a change:
npt = NonparametricTests()
p = 0.5
alpha = 0.2
N = 12 # number of pilots
x = 4 # number of pilots preferring the old system or having no opinion.
# Null hypothesis is that the new system is no better than the old.
probability = npt.SignTest(x, N, p, alpha)
print(probability)
Probability obtained is: 0.19384765625
Above probability must be smaller than alpha= 0.2 for us to reject the null hypothesis.
Probability is smaller than alpha; thus reject null hypothesis.
0.19384765625
If throw away the two no-preference pilots:
npt = NonparametricTests()
p = 0.5
alpha = 0.2
N = 10 # number of pilots
x = 2 # number of pilots preferring the old system or having no opinion.
# Null hypothesis is that the new system is no better than the old.
probability = npt.SignTest(x, N, p, alpha)
print(probability)
Probability obtained is: 0.0546875
Above probability must be smaller than alpha= 0.2 for us to reject the null hypothesis.
Probability is smaller than alpha; thus reject null hypothesis.
0.0546875
Tests of Normality
Problem Set 6, Problem 1
Construct a probability plot for the following data using the normal distribution as the assumed distribution.
0, 20, 40, 30, 0, 20, -20, -30, -50, -20
data = [0, 20, 40, 30, 0, 20, -20, -30, -50, -20]
ton = TestsOfNormality()
xdata, ydata = ton.perform_normality_test(data)
Original Data: [0, 20, 40, 30, 0, 20, -20, -30, -50, -20]
Sorted Data: [-50, -30, -20, -20, 0, 0, 20, 20, 30, 40]
Percentile: [5.0, 15.0, 25.0, 35.0, 45.0, 55.0, 65.0, 75.0, 85.0, 95.0]
xdata: [-50, -30, -20, -20, 0, 0, 20, 20, 30, 40]
ydata: [-1.6448536269514729, -1.0364333894937898, -0.6744897501960817, -0.38532046640756773, -0.12566134685507402, 0.12566134685507416, 0.38532046640756773, 0.6744897501960817, 1.0364333894937898, 1.6448536269514722]
The code also produces a plot of (xdata, ydata) with a line fit through the data, which is not shown here.
Number of Samples Required for a Desired Accuracy
Example Problem from Lecture (Slide 6-13):
Note that the problem has multiple typos on the original updated slide; refer to even more updated slide. The problem is supposed to have $\alpha=0.1$ (10%) and the result is $n=10.8$.
mu = 1.0 # expected mean.
# Can simply set mu above to 1, without loss of generality in this problem, because the value of the mean cancels out.
error = 0.1 * mu
sigma = 0.2 * mu
alpha = 0.1 # alpha = 1 - confidence level (or probability)
samsi = SampleSize()
n = samsi.get_sample_size(error, sigma, alpha)
print(n)
10.822173816381655
Problem Set 7, Problem 1
Assume that a test will produce normally distributed data with a standard deviation of 5% of the mean ($\sigma= 0.05\mu$). How many samples (n) do we need to determine the mean at the 95% confidence level if we want the error to be:
a. less than 0.10 $\mu$?
b. less than 0.05 $\mu$?
c. less than 0.01 $\mu$?
mu = 1.0 # expected mean.
# Can simply set mu above to 1, without loss of generality in this problem, because the value of the mean cancels out.
sigma = 0.05 * mu
alpha = 0.05 # alpha = 1 - confidence level
samsi = SampleSize()
error = 0.1 * mu
n = samsi.get_sample_size(error, sigma, alpha)
print(n)
error = 0.05 * mu
n = samsi.get_sample_size(error, sigma, alpha)
print(n)
error = 0.01 * mu
n = samsi.get_sample_size(error, sigma, alpha)
print(n)
0.9603647051735313
3.8414588206941254
96.03647051735311
CEP, MIP, CEP about Target, and CEA
Example Problem from Lecture about CEP and MIP (Slides 7ff in Part 7)
xdata = [-100, -45, -10, 0, 40, 95, 100, -130]
ydata = [0, 20, 40, 30, 0, 20, 10, 15]
xdata = [-100, -45, -10, 0, 40, 95, 100, -130]
ydata = [0, 20, 40, 30, 0, 20, 10, 15]
# CEP and MIP:
cep = CEP()
cep_value, mip_value, sx, sy = cep.compute_CEP_and_MIP(xdata, ydata)
print("CEP (about MIP):", cep_value)
print("MIP:", mip_value)
print("sx, sy:", sx, sy)
print("")
# CEP about Target:
cep = CEP()
cep_value, sx, sy = cep.compute_CEP_about_target(xdata, ydata)
print("CEP about target:", cep_value)
print("sx, sy:", sx, sy)
print("")
# CEA about Target (this is an inferior method):
cep = CEP()
cea = cep.compute_CEA(xdata, ydata)
print("CEA about target:", cea)
CEP (about MIP): 55.74048981609133
MIP: (-6.25, 16.875)
sx, sy: 83.95364367145886 13.871218918527466
CEP about target: 61.37188024607714
sx, sy: 84.21910200695055 22.75647474581999
CEA about target: 73.614882994266
Problem Set 8, Problem 1
Find the CEP about the MIP for the following data.
xdata = [-100, -45, -10, 0, 40, 95, 100, 0, 0, -40, 210, -190]
ydata = [0, 20, 40, 30, 0, 20, -20, -30, -50, -20, -20, -75]
xdata = [-100, -45, -10, 0, 40, 95, 100, 0, 0, -40, 210, -190]
ydata = [0, 20, 40, 30, 0, 20, -20, -30, -50, -20, -20, -75]
# CEP and MIP:
cep = CEP()
cep_value, mip_value, sx, sy = cep.compute_CEP_and_MIP(xdata, ydata)
print("CEP:", cep_value)
print("MIP:", mip_value)
print("sx, sy:", sx, sy)
CEP: 78.16266767841464
MIP: (5.0, -8.75)
sx, sy: 101.91351056834239 33.85295743761137
Problem Set 8, Problem 2
Find the CEP about the target (\(x = y = 0\)).
# CEP about Target
cep = CEP()
cep_value, sx, sy = cep.compute_CEP_about_target(xdata, ydata)
print("CEP about target (target at x = y = 0):", cep_value)
print("sx, sy:", sx, sy)
CEP about target (target at x = y = 0): 78.985568930666
sx, sy: 102.04722614376328 35.06487493982339
Chauvenet’s Criterion
Example Problem from Lecture about Chauvenet’s Criterion (Slide 8-12)
n = 6
mu = 3256
sigma = 175
x = 3954
chcrit = Chauvenet_Criterion()
z, z_chauvenet = chcrit.apply_criterion(x, mu, sigma, n)
P Chauvenet: 0.08333333333333333
z from Normal Distribution: 3.9885714285714284
z from Chauvenet (P=1/2n): [-1.7316643961222453, 1.7316643961222453]
Chauvenet's criterion met. Disregard outlier.