Once more I have to apologise in advance to any statistics practitioners. In the previous article we explored basic statistics terminology and dealt with the expected value or mean of a set of measured values. Here, we extend this to the median, mode, variance and the very useful standard deviation. Hopefully, this article will give readers a firmer grasp of these terms.
Median:
In the earlier article we looked at the 'expected value' or 'mean' of a set of measured values. Those were:
Values: 3, 5, 5, 6, 7, 9, 10, 11, 12, 12, 15, 16
Generally, measured values would arrive in no particular order, but the values above have been sorted from low to high.
The median is the middle value of the ordered set: half of the values lie above it and half lie below it.
In the case above there is no single 'midpoint' value because we have twelve values, an even number. So we take the two central values, 9 and 10, and average them to obtain 9.5. That is the median value.
If we had the values: 3, 5, 5, 6, 7, 9, 9, 10, 11, 12, 12, 15, 16, the median would be the middle (7th) value of the 13, which is 9.
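The two cases above can be sketched in Python. The helper name median_by_hand is hypothetical and only there to show the steps; the standard library's statistics.median is used as a cross-check.

```python
from statistics import median

def median_by_hand(values):
    """Return the middle value of the sorted data (illustrative helper)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[mid]
    # Even count: average the two central values.
    return (ordered[mid - 1] + ordered[mid]) / 2

twelve_values = [3, 5, 5, 6, 7, 9, 10, 11, 12, 12, 15, 16]
thirteen_values = [3, 5, 5, 6, 7, 9, 9, 10, 11, 12, 12, 15, 16]

median_even = median_by_hand(twelve_values)
median_odd = median_by_hand(thirteen_values)
print(median_even, median_odd)

# Cross-check against the standard library.
print(median(twelve_values), median(thirteen_values))
```

Sorting inside the helper means the input does not have to be ordered first, which matches how measured values usually arrive.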
When values fluctuate a lot, the median can be a useful tool for smoothing the data, because it is much less affected by a few extreme values than the mean is. Recording the median over time helps to track the trend in the data. Each data value can then be viewed as a difference from the median, which gives an idea of whether it is drifting away from that trend.
If the median is the same as the mean or expected value, the values are spread evenly on either side. When the mean is greater than the median, the distribution is typically skewed to the right, with a longer tail of high values; when the mean is less than the median, it is typically skewed to the left.
Mode:
As basic statistics terms go, this one is about as simple as they get. If we take the measured values above again and change one of them from 15 to 12, we have:
Values: 3, 5, 5, 6, 7, 9, 10, 11, 12, 12, 12, 16
The mode is the value that occurs most often. In the case above it is 12, which occurs 3 times.
A set of values can have 2 or more modes. The original, unadjusted values, for example, have two: 5 and 12, each of which occurs twice.
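As a minimal sketch, Python's statistics.multimode (available from Python 3.8) returns every value that ties for the highest count, so it covers both the single-mode and the multiple-mode case. The variable names are only illustrative.

```python
from statistics import multimode

adjusted_values = [3, 5, 5, 6, 7, 9, 10, 11, 12, 12, 12, 16]
original_values = [3, 5, 5, 6, 7, 9, 10, 11, 12, 12, 15, 16]

# multimode returns a list of the most frequent values,
# in the order they first appear in the data.
modes_adjusted = multimode(adjusted_values)
modes_original = multimode(original_values)

print(modes_adjusted)
print(modes_original)
```

The adjusted set has a single mode, while the original set, where 5 and 12 each occur twice, has two.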
Variance:
As you may recall, the mean is also known as the 'expected value'. Each measured data value differs from this mean or expected value by a certain amount. The variance gives an idea of how spread out the data values are around the mean or expected value.
The overall variance is the average of the individual squared deviations.
For each value, the squared deviation is the square of the difference between that value and the mean or expected value. For example:
If we take 6 (the 4th value), whose deviation from the mean of 9.25 is -3.25, the squared deviation is:
Squared deviation = (6 - 9.25) x (6 - 9.25) = (-3.25) x (-3.25) = 10.56
We could calculate this for every value, add the results together and then divide by the number of values, 12, to get the overall variance. Here the squared deviations add up to 188.25, so the overall variance is 188.25 / 12 = 15.69.
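The calculation described above can be sketched as follows. It divides by the number of values, as the text does, which is what statisticians call the population variance; statistics.pvariance is used only as a cross-check. Dividing by one less than the number of values would instead give the sample variance, which is a different quantity.

```python
from statistics import mean, pvariance

values = [3, 5, 5, 6, 7, 9, 10, 11, 12, 12, 15, 16]

mu = mean(values)  # the mean or expected value of the measured data

# Squared deviation of each value from the mean.
squared_deviations = [(v - mu) ** 2 for v in values]

# Average of the squared deviations: divide by the number of values.
overall_variance = sum(squared_deviations) / len(values)

print(mu)
print(squared_deviations[3])  # the 4th value, 6
print(overall_variance)
print(pvariance(values))      # standard-library cross-check
```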
We can apply the same principle to the simple project activity delay from the last article:
Delay (weeks)     Probability     Contribution
6                 0.3             6 x 0.3 = 1.8
16                0.5             16 x 0.5 = 8.0
20                0.2             20 x 0.2 = 4.0
The expected value = 1.8 + 8.0 + 4.0 = 13.8 weeks
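The table above amounts to a probability-weighted sum, which can be sketched in a few lines of Python. The variable names are illustrative, and the result is rounded for display because decimal fractions such as 0.3 are not stored exactly in floating point.

```python
delays = [6, 16, 20]             # possible delays in weeks
probabilities = [0.3, 0.5, 0.2]  # estimated probability of each delay

# Each contribution is the delay multiplied by its probability.
contributions = [d * p for d, p in zip(delays, probabilities)]

# The expected value is the sum of the contributions.
expected_value = sum(contributions)

print([round(c, 2) for c in contributions])
print(round(expected_value, 2))
```

The probabilities should add up to 1; if they do not, the weighted sum is no longer an expected value.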
The overall variance is the sum of the separate squared deviations, each weighted by its probability. Because the probabilities already add up to 1, the weighting does the averaging, so there is no further division by the number of values.
Overall variance = (6 - 13.8) x (6 - 13.8) x 0.3 + (16 - 13.8) x (16 - 13.8) x 0.5 + (20 - 13.8) x (20 - 13.8) x 0.2
= (-7.8) x (-7.8) x 0.3 + (2.2) x (2.2) x 0.5 + (6.2) x (6.2) x 0.2
= (60.84 x 0.3) + (4.84 x 0.5) + (38.44 x 0.2)
= 18.25 + 2.42 + 7.69
= 28.36
This gives an idea of how widely the values are spread around the 'expected value'.
Notice that in this example the 'values' came from an expert's assessment based on assumptions, so they are not measured data values. To obtain measured values we would have had to carry out the activity 3 times in exactly the same way and found that, on those 3 occasions, the delays were 6, 16 and 20 weeks. This would not happen in practice.
Standard deviation:
This is the square root of the variance. For the earlier example we get:
Standard deviation = √28.36 = 5.33 weeks
It is a really useful value.
When measured values are approximately normally distributed (the familiar bell-shaped curve), about 68 percent of them fall within 1 standard deviation of the expected value or mean.
For the example above:
Expected value or mean = 13.8
Variance = 28.36
Standard deviation = 5.33
If the delay is treated as roughly normally distributed, which is only an approximation for a three-point estimate, about 68 percent of values will fall between (13.8 - 5.33) and (13.8 + 5.33) = 8.47 to 19.13
In a similar way, about 95 percent of the values will fall within 2 standard deviations and about 99.7 percent within 3 standard deviations. So, we would have:
2 standard deviations = 10.65
3 standard deviations = 15.98
About 95 percent of values will fall between (13.8 - 10.65) and (13.8 + 10.65) = 3.15 to 24.45
About 99.7 percent of values will fall between (13.8 - 15.98) and (13.8 + 15.98) = -2.18 to 29.78 (the negative lower limit is a reminder that the bell curve is only an approximation here)
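As a rough sketch of the same idea on measured data, the code below takes the twelve measured values from the median section, computes their standard deviation, and reports the band of 1, 2 and 3 standard deviations around the mean together with the share of values that actually falls inside each band. The helper names band and share_within are hypothetical, and the 68, 95 and 99.7 percent figures are a property of the normal (bell-shaped) distribution, so a small data set will only match them loosely.

```python
from statistics import mean, pstdev

values = [3, 5, 5, 6, 7, 9, 10, 11, 12, 12, 15, 16]

mu = mean(values)       # mean or expected value
sigma = pstdev(values)  # square root of the variance (dividing by n)

def band(k):
    """Return (low, high) for k standard deviations either side of the mean."""
    return mu - k * sigma, mu + k * sigma

def share_within(k):
    """Return the fraction of the measured values that lies inside the band."""
    low, high = band(k)
    return sum(low <= v <= high for v in values) / len(values)

for k in (1, 2, 3):
    low, high = band(k)
    print(f"{k} SD: {low:.2f} to {high:.2f}, share inside = {share_within(k):.0%}")
```

For these twelve values, 7 of the 12 lie within 1 standard deviation and all of them lie within 2, which shows how loosely the rule of thumb applies to a small set.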
Hopefully, this article has given some modest insight into a few statistical terms.