Probability, Statistics and Other Frightening Stuff (Volume II of the Working Guides to Estimating & Forecasting series) considers many of the commonly used Descriptive Statistics in the world of estimating and forecasting. It considers values that are representative of the middle ground (Measures of Central Tendency), and the degree of data scatter (Measures of Dispersion and Shape) around the middle ground values.
A number of Probability Distributions and where they might be used are discussed, along with some fascinating and useful rules of thumb or short-cut properties that estimators and forecasters can exploit in plying their trade. With the help of a Correlation Chicken, the concept of partial correlation is explained, including how the estimator or forecaster can exploit this in reflecting varying levels of independence and imperfect dependence between an output or predicted value (such as cost) and an input or predictor variable (such as size).
Under the guise of Tails of the Unexpected, the book concludes with two chapters: one devoted to Hypothesis Testing (knowing when to accept or reject the validity of an assumed estimating relationship), and one covering a number of statistically based tests that help the estimator decide whether to include or exclude a data point as an outlier, that is, one that appears not to be representative of that which the estimator is tasked to produce. This is a valuable resource for estimators, engineers, accountants and project risk specialists, as well as students of cost engineering.
This practical guide to probability and statistics explains the concepts that underpin all professional estimating. Alan Jones considers the Measures of Central Tendency (Means, Modes and Medians), describing the differences, relevance and uses of each in the context of estimating and forecasting.
Volume II Table of Contents, 1 Introduction and Objectives, 1.1 Why
write this book? Who might find it useful? Why Five Volumes? 1.1.1 Why write
this series? Who might find it useful? 1.1.2 Why Five Volumes? 1.2 Features
you'll find in this book and others in this series, 1.2.1 Chapter Context,
1.2.2 The Lighter Side (humour), 1.2.3 Quotations, 1.2.4 Definitions, 1.2.5
Discussions and Explanations with a Mathematical Slant for Formula-philes,
1.2.6 Discussions and Explanations without a Mathematical Slant for
Formula-phobes, 1.2.7 Caveat Augur, 1.2.8 Worked Examples, 1.2.9 Useful Microsoft
Excel Functions and Facilities, 1.2.10 References to Authoritative Sources,
1.2.11 Chapter Reviews, 1.3 Overview of Chapters in this Volume, 1.4
Elsewhere in the 'Working Guide to Estimating & Forecasting' Series, 1.4.1
Volume I: Principles, Process and Practice of Professional Number Juggling,
1.4.2 Volume II: Probability, Statistics and other Frightening Stuff, 1.4.3
Volume III: Best Fit Lines & Curves, and some Mathe-Magical Transformations,
1.4.4 Volume IV: Learning, Unlearning and Re-Learning Curves, 1.4.5 Volume V:
Risk, Opportunity, Uncertainty and Other Random Models, 1.5 Final Thoughts
and Musings on this Volume and Series, References, 2 Measures of Central
Tendency: Means, Modes, Medians, 2.1 'S' is for Shivers, Statistics and Spin,
2.1.1 Cutting through the Mumbo-Jumbo: What is or are Statistics? 2.1.2 Are
there any types of Statistics that are not 'Descriptive'? 2.1.3 Samples,
Populations and the Dreaded Statistical Bias, 2.2 Measures of Central
Tendency, 2.2.1 What do we mean by Mean? 2.2.2 Can we take the Average of
an Average? 2.3 Arithmetic Mean - The Simple Average, 2.3.1 Properties of
Arithmetic Means: A Potentially Unachievable Value! 2.3.2 Properties of
Arithmetic Means: An Unbiased Representative Value of the Whole, 2.3.3 Why
would we not want to use the Arithmetic Mean? 2.3.4 Is an Arithmetic Mean
useful where there is an upward or downward trend? 2.3.5 Average of Averages:
Can we take the Arithmetic Mean of an Arithmetic Mean? 2.4 Geometric Mean,
2.4.1 Basic Rules and Properties of a Geometric Mean, 2.4.2 When might we
want to use a Geometric Mean? 2.4.3 Finding a steady state rate of growth or
decay with a Geometric Mean, 2.4.4 Using a Geometric Mean as a Cross-Driver
Comparator, 2.4.5 Using a Geometric Mean with Certain Non-Linear Regressions,
2.4.6 Average of Averages: Can we take the Geometric Mean of a Geometric
Mean? 2.5 Harmonic Mean, 2.5.1 Surely Estimators would never use the Harmonic
Mean? 2.5.2 Cases where the Harmonic Mean and the Arithmetic Mean are both
inappropriate, 2.5.3 Average of Averages: Can we take the Harmonic Mean of a
Harmonic Mean? 2.6 Quadratic Mean: Root Mean Square, 2.6.1 When would we
ever use a Quadratic Mean? 2.7 Comparison of Arithmetic, Geometric, Harmonic
and Quadratic Means, 2.8 Mode, 2.8.1 When would we use the Mode instead of
the Arithmetic Mean? 2.8.2 What does it mean if we observe more than one
Mode? 2.8.3 What if we have two modes that occur at adjacent values? 2.8.4
Approximating the Theoretical Mode when there is no Real Observable Mode! 2.9
Median, 2.9.1 Primary Use of the Median, 2.9.2 Finding the Median, 2.10
Choosing a Representative Value: The 5-Ms, 2.10.1 Some Properties of the
5-Ms, 2.11 Chapter Review, References, 3 Measures of Dispersion and Shape,
3.1 Measures of Dispersion or Scatter around a Central Value, 3.2 Minimum,
Maximum and Range, 3.3 Absolute Deviations, 3.3.1 Mean or Average Absolute
Deviation (AAD), 3.3.2 Median Absolute Deviation (MAD), 3.3.3 Is there a Mode
Absolute Deviation? 3.3.4 When would we use an Absolute Deviation? 3.4
Variance and Standard Deviation, 3.4.1 Variance and Standard Deviation -
Compensating for Small Samples, 3.4.2 Coefficient of Variation, 3.4.3 The
Range Rule - Is it Myth or Magic? 3.5 Comparison of Deviation-Based Measures
of Dispersion, 3.6 Confidence Levels, Limits and Intervals, 3.6.1 Open and
Closed Confidence Level Ranges, 3.7 Quantiles: Quartiles, Quintiles, Deciles
and Percentiles, 3.7.1 A few more words about Quartiles, 3.7.2 A few thoughts
about Quintiles, 3.7.3 And a few words about Deciles, 3.7.4 Finally, a few
words about Percentiles, 3.8 Other Measures of Shape: Skewness and
Peakedness, 3.8.1 Measures of Skewness, 3.8.2 Measures of Peakedness or
Flatness - Kurtosis, 3.9 Chapter Review, References, 4 Probability
Distributions, 4.1 Probability, 4.1.1 Discrete Distributions, 4.1.2
Continuous Distributions, 4.1.3 Bounding Distributions, 4.2 Normal
Distributions, 4.2.1 What is a Normal Distribution? 4.2.2 Key Properties of a
Normal Distribution, 4.2.3 Where is the Normal Distribution observed? When
can, or should, it be used? 4.2.4 Probability Density Function and Cumulative
Distribution Function, 4.2.5 Key Stats and Facts about the Normal
Distribution, 4.3 Uniform Distributions, 4.3.1 Discrete Uniform
Distributions, 4.3.2 Continuous Uniform Distributions, 4.3.3 Key Properties
of a Uniform Distribution, 4.3.4 Where is the Uniform Distribution observed?
When can, or should, it be used? 4.3.5 Key Stats and Facts about the Uniform
Distribution, 4.4 Binomial and Bernoulli Distributions, 4.4.1 What is a
Binomial Distribution? 4.4.2 What is a Bernoulli Distribution? 4.4.3
Probability Mass Function and Cumulative Distribution Function, 4.4.4 Key
Properties of a Binomial Distribution, 4.4.5 Where is the Binomial
Distribution observed? When can, or should, it be used? 4.4.6 Key Stats and
Facts about the Binomial Distribution, 4.5 Beta Distributions, 4.5.1 What is
a Beta Distribution? 4.5.2 Probability Density Function and Cumulative
Distribution Function, 4.5.3 Key Properties of a Beta Distribution, 4.5.4
PERT-Beta or Project Beta Distributions, 4.5.5 Where is the Beta Distribution
observed? When can, or should, it be used? 4.5.6 Key Stats and Facts about
the Beta Distribution, 4.6 Triangular Distributions, 4.6.1 What is a
Triangular Distribution? 4.6.2 Probability Density Function and Cumulative
Distribution Function, 4.6.3 Key Properties of a Triangular Distribution,
4.6.4 Where is the Triangular Distribution observed? When can, or should, it
be used? 4.6.5 Key Stats and Facts about the Triangular Distribution, 4.7
Lognormal Distributions, 4.7.1 What is a Lognormal Distribution? 4.7.2
Probability Density Function and Cumulative Distribution Function, 4.7.3 Key
Properties of a Lognormal Distribution, 4.7.4 Where is the Lognormal
Distribution observed? When can, or should, it be used? 4.7.5 Key Stats and
Facts about the Lognormal Distribution, 4.8 Weibull Distributions, 4.8.1 What
is a Weibull Distribution? 4.8.2 Probability Density Function and Cumulative
Distribution Function, 4.8.3 Key Properties of a Weibull Distribution, 4.8.4
Where is the Weibull Distribution observed? When can, or should, it be used?
4.8.5 Key Stats and Facts about the Weibull Distribution, 4.9 Poisson
Distributions, 4.9.1 What is a Poisson Distribution? 4.9.2 Probability Mass
Function and Cumulative Distribution Function, 4.9.3 Key Properties of a
Poisson Distribution, 4.9.4 Where is the Poisson Distribution observed? When
can, or should, it be used? 4.9.5 Key Stats and Facts about the Poisson
Distribution, 4.10 Gamma and Chi-Squared Distributions, 4.10.1 What is a
Gamma Distribution? 4.10.2 What is a Chi-Squared Distribution? 4.10.3
Probability Density Function and Cumulative Distribution Function, 4.10.4 Key
Properties of Gamma and Chi-Squared Distributions, 4.10.5 Where are the Gamma
and Chi-Squared Distributions used? 4.10.6 Key Stats and Facts about the
Gamma and Chi-Squared Distributions, 4.11 Exponential Distributions, 4.11.1
What is an Exponential Distribution? 4.11.2 Probability Density Function and
Cumulative Distribution Function, 4.11.3 Key Properties of an Exponential
Distribution, 4.11.4 Where is the Exponential Distribution observed? When
can, or should, it be used? 4.11.5 Key Stats and Facts about the Exponential
Distribution, 4.12 Pareto Distributions, 4.12.1 What is a Pareto
Distribution? 4.12.2 Probability Density Function and Cumulative Distribution
Function, 4.12.3 The Pareto Principle: How does it fit in with the Pareto
Distribution? 4.12.4 Key Properties of a Pareto Distribution, 4.12.5 Where is
the Pareto Distribution observed? When can, or should, it be used? 4.12.6 Key
Stats and Facts about the Pareto Distribution, 4.13 Choosing an Appropriate
Distribution, 4.14 Chapter Review, References, 5 Measures of Linearity,
Dependence and Correlation, 5.1 Covariance, 5.2 Linear Correlation or
Measures of Linear Dependence, 5.2.1 Pearson's Correlation Coefficient, 5.2.2
Pearson's Correlation Coefficient - Key Properties and Limitations, 5.2.3
Correlation is not Causation, 5.2.4 Partial Correlation: Time for some
Correlation Chicken, 5.2.5 Coefficient of Determination, 5.3 Rank
Correlation, 5.3.1 Spearman's Rank Correlation Coefficient, 5.3.2 If
Spearman's Rank Correlation is so much trouble, why bother? 5.3.3
Interpreting Spearman's Rank Correlation Coefficient, 5.3.4 Kendall's Tau
Rank Correlation Coefficient, 5.3.5 If Kendall's Tau Rank Correlation is so
much trouble, why bother? 5.4 Correlation: What if you want to 'Push' it not
'Pull' it? 5.4.1 The Pushy Pythagorean Technique or Restricting the Scatter
around a Straight Line, 5.4.2 Controlling Partner Technique, 5.4.3
Equivalence of the Pushy Pythagorean and Controlling Partner Techniques,
5.4.4 Equal Partners Technique, 5.4.5 Copulas, 5.5 Chapter Review,
References, 6 Tails of the Unexpected (1): Hypothesis Testing, 6.1 Hypothesis
Testing, 6.1.1 Tails of the Unexpected, 6.2 Z-Scores and Z-Tests, 6.2.1
Standard Error, 6.2.2 Example: Z-Testing the Mean Value of a Normal
Distribution, 6.2.3 Example: Z-Testing the Median Value of a Beta
Distribution, 6.3 Student's t-Distribution and t-Tests, 6.3.1 Student's
t-Distribution, 6.3.2 t-Tests, 6.3.3 Performing a t-Test in Microsoft Excel
on a Single Sample, 6.3.4 Performing a t-Test in Microsoft Excel to Compare
Two Samples, 6.4 Mann-Whitney U-Tests, 6.5 Chi-Squared Tests or χ²-Tests,
6.5.1 Chi-Squared Distribution Revisited, 6.5.2 Chi-Squared Test, 6.6
F-Distribution and F-Tests, 6.6.1 F-Distribution, 6.6.2 F-Test, 6.6.3 Primary
Use of the F-Distribution, 6.7 Checking for Normality, 6.7.1 Q-Q Plots, 6.7.2
Using a Chi-Square Test for Normality, 6.7.3 Using the Jarque-Bera Test for
Normality, 6.8 Chapter Review, References, 7 Tails of the Unexpected (2):
Outing the Outliers, 7.1 Outing the Outliers: Detecting and Dealing with
Outliers, 7.1.1 Mitigation of Type I and Type II Outlier Errors, 7.2 Tukey
Fences, 7.2.1 Tukey Slimline Fences - For Larger Samples and Less Tolerance
of Outliers? 7.3 Chauvenet's Criterion, 7.3.1 Variation on Chauvenet's
Criterion for Small Sample Sizes (SSS), 7.3.2 Taking a Q-Q Perspective on
Chauvenet's Criterion for Small Sample Sizes (SSS), 7.4 Peirce's Criterion,
7.5 Iglewicz and Hoaglin's MAD Technique, 7.6 Grubbs' Test, 7.7 Generalised
Extreme Studentised Deviate (GESD), 7.8 Dixon's Q-Test, 7.9 Doing the JB
Swing - Using Skewness and Excess Kurtosis to identify Outliers, 7.10 Outlier
Tests - A Comparison, 7.11 Chapter Review, References, Glossary of Estimating Terms
Alan R. Jones is Principal Consultant at Estimata Limited, an estimating consultancy service. He is a Certified Cost Estimator/Analyst (US) and Certified Cost Engineer (CCE) (UK). Prior to setting up his own business, he enjoyed a 40-year career in the UK aerospace and defence industry as an estimator, culminating in the role of Chief Estimator at BAE Systems. Alan is a Fellow of the Association of Cost Engineers and a Member of the International Cost Estimating and Analysis Association. Some four decades ago, Alan graduated in Mathematics from Imperial College of Science and Technology in London, and he was an MBA Prize-winner at the Henley Management College (... that was slightly more recent, being only two decades ago). Oh, how time flies when you are enjoying yourself.