Normal Data – DMAICTools.com

A number of statistical tools require that the underlying data be normally distributed. Keep in mind that no real-world data set is perfectly normal, but when a given statistical tool requires normality, the data should be checked to confirm that it is reasonably normal. Note: The 3.4 defects per million (DPM) level associated with Six Sigma processes assumes normal data.
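The 3.4 DPM figure is commonly explained as the upper tail of a normal distribution beyond 4.5 standard deviations: a process with 6 standard deviations between its mean and the nearest specification limit, whose mean is assumed to drift 1.5 standard deviations toward that limit over the long term. That 1.5-sigma shift is the conventional Six Sigma assumption and is not stated above. The minimal Python sketch below (standard library only; the function names are hypothetical) shows that the number comes entirely from normal-tail arithmetic, which is why it only holds for normal data.

```python
from math import erfc, sqrt

def upper_tail_probability(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * erfc(z / sqrt(2))

def defects_per_million(sigma_level, shift=1.5):
    """One-sided defects per million for a normal process whose nearest spec limit
    is sigma_level standard deviations from the mean, after the mean is assumed to
    drift `shift` standard deviations toward that limit (conventional 1.5-sigma shift).
    The opposite tail is ignored; it is negligible at these sigma levels."""
    return 1e6 * upper_tail_probability(sigma_level - shift)

for level in (3, 4, 5, 6):
    print(f"{level} sigma: {defects_per_million(level):,.1f} DPM")
```

For a sigma level of 6 this gives about 3.4 DPM. If the data are not normal, the real tail area beyond the same limit can be very different, so the prediction no longer applies.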

First Option – Plot a Histogram

In the DMAIC world, plotting a histogram and looking at its shape is usually sufficient for checking normality. The only exception is when sample sizes are very small, because a histogram of only a few points does not show a reliable shape; in that case a normal probability plot (below) may be the best approach. Normally distributed data will form a bell-shaped histogram, with the tallest bars in the middle and progressively shorter bars toward the edges, as shown in the following data (randomly generated using MINITAB):

[Figure: histogram of normally distributed data]
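The figure above was made with MINITAB. As a rough stand-in, the Python sketch below bins simulated normal data into equal-width bins and prints a text histogram, so the bell shape (tallest bars in the middle, shorter bars toward the edges) can be seen without plotting software. The helper names, the simulated data from `random.gauss`, and the choice of 12 bins over -3 to +3 are assumptions for illustration only; substitute your own measurements and range.

```python
import random

def histogram_counts(data, lo, hi, n_bins):
    """Count values in n_bins equal-width bins covering [lo, hi); values outside are ignored."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        if lo <= x < hi:
            # min() guards against floating-point rounding pushing x past the last bin
            counts[min(int((x - lo) // width), n_bins - 1)] += 1
    return counts

def print_text_histogram(counts, lo, hi, max_bar=50):
    """Print one row per bin: left edge, a '#' bar scaled to the tallest bin, and the count."""
    width = (hi - lo) / len(counts)
    peak = max(counts)
    for i, count in enumerate(counts):
        bar = "#" * round(max_bar * count / peak) if peak else ""
        print(f"{lo + i * width:6.2f} | {bar} {count}")

random.seed(1)
data = [random.gauss(0, 1) for _ in range(10_000)]  # simulated standard-normal data
counts = histogram_counts(data, -3.0, 3.0, 12)
print_text_histogram(counts, -3.0, 3.0)
```

With this many points the two middle bins are the tallest and the bars shrink steadily toward both edges; with only a handful of points the same code produces a ragged shape, which is the small-sample problem mentioned above.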

Second Option – Normal Probability Plot

Normal probability plots can take different forms, but all have one thing in common: the closer the data points are to the theoretical-normal line, the more likely it is that the data is normal.

The normal probability plots below show data values on the x-axis versus the cumulative percentage of data points on the y-axis. The blue line on each chart represents a perfectly normal distribution:

[Figure: normal probability plot of data points against the theoretical-normal line]
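The Python sketch below computes the three things a normal probability plot of this kind displays: each sorted data value (x-axis), its observed cumulative percentage (y-axis), and the percentage a fitted normal distribution would assign to that value (the line). It is an illustration under stated assumptions, not how MINITAB computes its plot: the `(i - 0.5) / n` plotting position is one common convention among several, the line is fitted from the sample mean and standard deviation with `statistics.NormalDist.from_samples`, and the function name and simulated sample are hypothetical. On an actual plot the y-axis uses a normal-probability scale, which is what makes the fitted line straight.

```python
import random
from statistics import NormalDist

def probability_plot_points(data):
    """Return (x, observed %, fitted %) rows for a normal probability plot with
    data values on the x-axis and cumulative percentage on the y-axis."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist.from_samples(xs)  # normal curve using the sample mean and std. dev.
    rows = []
    for i, x in enumerate(xs, start=1):
        observed_pct = 100 * (i - 0.5) / n  # plotting position: one common convention
        fitted_pct = 100 * fitted.cdf(x)    # where a perfectly normal line puts this x
        rows.append((x, observed_pct, fitted_pct))
    return rows

random.seed(3)
sample = [random.gauss(10.0, 0.2) for _ in range(25)]  # simulated measurements
rows = probability_plot_points(sample)
worst_gap = max(abs(obs - fit) for _, obs, fit in rows)
print(f"largest vertical gap from the normal line: {worst_gap:.1f} percentage points")
```

The largest vertical gap is only an informal way of asking how close the points sit to the line; a small gap corresponds to points hugging the line.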

Here are some examples of normal and non-normal data, shown as histograms alongside their corresponding probability plots (generated with MINITAB software).

[Figures: example histograms of normal and non-normal data with their corresponding normal probability plots]

Note that the histograms are as indicative of normality (or non-normality) as the probability plots in these cases.

Defect-Rate Predictions and Non-Normal Data

Statistical techniques are available for dealing with non-normal data, but we’d like to bring some real-world perspective into the discussion from a Six Sigma practitioner’s viewpoint: Six Sigma practitioners get paid to reduce variation, not to model it. It is far better for a team to put its energy into learning the underlying causes of variation than to get wrapped up in finding the correct distribution or transformation method for making defect-rate predictions.

Once the underlying causes are understood, process redesign and process control offer much greater assurance of zero defects over the long run than the fact that a sample taken from the population happened to be normal and capable at one point in time.

So the message here is that there are very few cases where non-normal data should stop a project from moving forward.