Type III Error: Can I use 2.66 MR for Control Limits if my data is not Normal?

Sunday, September 11, 2005


Since the factor 2.66 for MR-bar depends on the Normal Distribution, what do I do if my data does not follow the Normal Distribution?

When the sample size is n = 1, it is common to calculate Control Limits as
CL ± 2.66 × MR-Bar
where
CL (Center Line) is usually estimated by an average of the data,

MR-Bar is the average of the ranges of successive adjacent pairs of points over the same data (the average moving range of size 2), and

2.66 is a factor based on the Normal Distribution.
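
As a minimal sketch of the calculation just described, the following Python computes the center line, MR-Bar, and the resulting limits for a series of individual values. The function name and the sample data are made up for illustration.

```python
from statistics import mean

def individuals_limits(values, factor=2.66):
    """Return (lcl, center, ucl) for an individuals chart.

    factor defaults to 2.66, the Normal Distribution based value of 3/d2.
    """
    if len(values) < 2:
        raise ValueError("need at least two values to form a moving range")
    center = mean(values)
    # Moving ranges of size 2: absolute difference of each adjacent pair.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = mean(moving_ranges)
    return center - factor * mr_bar, center, center + factor * mr_bar

# Hypothetical measurements, for illustration only.
data = [10.2, 9.8, 10.5, 10.1, 9.6, 10.4, 10.0, 9.9]
lcl, cl, ucl = individuals_limits(data)
print(f"LCL={lcl:.3f}  CL={cl:.3f}  UCL={ucl:.3f}")
```

The factor is left as a parameter so that a value other than 2.66 can be tried without changing the rest of the calculation.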

What is commonly overlooked is where the factor 2.66 comes from. This factor is 3/d2, where d2 is the mean ratio of the range of two independent random values from a Normal Distribution to the standard deviation of that distribution. The value of d2 is a factor that can be used to estimate the standard deviation from the sample range: dividing the average range by d2 gives an estimate of the standard deviation, so 3 × (MR-Bar / d2) = 2.66 × MR-Bar is an estimate of 3 standard deviations. There are tabulated values of d2 for different sample sizes. In this case the sample size of 2 is the one of interest, because each range in an Individual Moving Range Chart is taken over two adjacent values in the data series.
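
For the Normal Distribution, the size-2 case can be worked out directly. The difference of two independent Normal values with standard deviation σ is itself Normal with mean 0 and standard deviation σ√2, and the mean absolute value of a zero-mean Normal variable is its standard deviation times √(2/π). So the mean range of two values is 2σ/√π, and d2 = 2/√π. A quick check in Python (the variable names are illustrative):

```python
import math

# d2 for ranges of size 2 under the Normal Distribution: E|X1 - X2| / sigma.
d2_normal = 2 / math.sqrt(math.pi)
factor = 3 / d2_normal

print(round(d2_normal, 3))  # 1.128
print(round(factor, 2))     # 2.66
```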

The value of d2 depends not only on the sample size for the range, but also on the distribution. The usual values of d2 and 3/d2 are based on the Normal Distribution. Whether these values apply to other distributions, or how good an approximation they are, is rarely questioned by Quality Control practitioners.
The value of 3/d2 can be either lower or higher than the usual Normal Distribution based value of 2.66, depending on the distribution. In the simulations below, the value came out higher for t-Distributions with low degrees of freedom (symmetric, heavy-tailed distributions) and lower for Chi Square distributions with low degrees of freedom (skewed distributions). This does not appear to be a general pattern for symmetric versus skewed distributions, however: at least one Gamma Distribution (skewed) has a higher value of 3/d2 than the Normal Distribution, and the Uniform Distribution (symmetric) appears to have a slightly lower one. Incidentally, Chi Square distributions are Gamma Distributions. Since t-Distributions and Chi Square Distributions with high degrees of freedom are approximately Normal, their values of 3/d2 approach the Normal Distribution value of 2.66.

The value of d2 can no doubt be computed analytically for the Normal Distribution and various other distributions. However, I am not that good a mathematician, and it is easier and quicker to estimate d2 for different distributions by simulating several tens of thousands of cases (more cases would give more precise values).
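
The simulations reported here were run in SAS, and that code is not shown. As a rough stand-in, the sketch below uses only the Python standard library to estimate d2 by drawing many independent pairs, averaging their ranges, and dividing by the known standard deviation. The function name, the number of pairs, and the seed are illustrative choices, not the original setup.

```python
import math
import random

def estimate_d2(draw, sigma, pairs=200_000, seed=1):
    """Estimate d2 = (mean range of two independent draws) / sigma.

    draw  : function taking a random.Random and returning one value
    sigma : the known standard deviation of the distribution
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        total += abs(draw(rng) - draw(rng))
    return total / pairs / sigma

# Standard Normal (sigma = 1) and Uniform on [0, 1) (sigma = 1/sqrt(12)).
d2_normal = estimate_d2(lambda r: r.gauss(0.0, 1.0), sigma=1.0)
d2_uniform = estimate_d2(lambda r: r.random(), sigma=1 / math.sqrt(12))

for name, d2 in [("Normal", d2_normal), ("Uniform", d2_uniform)]:
    print(f"{name:8s} d2={d2:.3f}  3/d2={3 / d2:.2f}")
```

Any other distribution can be tried by passing a different draw function together with its theoretical standard deviation.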

The following estimated values of d2 = R2 / σ (the mean range of two values divided by the standard deviation) were calculated by simulations with SAS.

Distribution               d2      3/d2

Normal Distribution        1.128   2.66

t-Distribution df=1        Not Possible (no Standard Deviation)
t-Distribution df=2        Not Possible
t-Distribution df=3        0.850   3.53
t-Distribution df=4        0.979   3.06
t-Distribution df=5        1.032   2.91
t-Distribution df=6        1.058   2.84
t-Distribution df=9        1.093   2.75
…
t-Distribution df=29       1.122   2.67

Uniform Distribution       1.154   2.60

Chi Square df=1            1.415   2.12
Chi Square df=2            1.274   2.36
Chi Square df=3            1.224   2.45
Chi Square df=4            1.200   2.50
Chi Square df=5            1.187   2.53
Chi Square df=6            1.177   2.55
Chi Square df=7            1.169   2.57
Chi Square df=10           1.157   2.59
Chi Square df=30           1.138   2.64

Gamma Alpha=2 Beta=1       1.060   2.83

That these values are more or less correct can be judged from the fact that the simulation for the Normal Distribution comes out very close to the standard published values, and from the fact that the t-Distributions and Chi Square Distributions with high degrees of freedom approach the Normal Distribution values, as would be expected.

So, one might ask, should we use one of these factors instead of 2.66? And what if our distribution is neither Normal nor one of these? What should we use then?

I am not recommending that these factors be used in place of 2.66. I present these factors to show that 2.66 is not a reliable value if the distribution is not Normal.

Douglas Montgomery, in his fine book Introduction to Statistical Quality Control, says “if the process shows even moderate departure from normality, the control limits given here [±2.66 MR-Bar] may be entirely inappropriate.” He suggests using control limits “based on the percentiles of the correct distribution.” That is fine if you know what that distribution is.

The advantage of the X-Bar Control Chart is that, since it charts a mean, the Central Limit Theorem comes into play. Means are approximately Normally distributed, so with an X-Bar Chart we more or less do know what the distribution is: it is approximately Normal. We can therefore rely on the factor 3 for the standard deviation to mean much the same thing, regardless of the raw data distribution.
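
One way to see the Central Limit Theorem at work is to simulate a strongly skewed distribution and count how often individual values, and means of small subgroups, fall outside 3 standard deviations of the true mean. For a Normal Distribution that fraction would be about 0.27%. The sketch below is only an illustration: the Exponential distribution, the subgroup size of 5, and the seed are assumptions, and the true mean and standard deviation are used instead of estimates.

```python
import random

def rate_outside(values, mu, sigma):
    """Fraction of values falling outside mu ± 3*sigma."""
    lo, hi = mu - 3 * sigma, mu + 3 * sigma
    return sum(1 for v in values if v < lo or v > hi) / len(values)

rng = random.Random(1)
n = 5              # subgroup size (illustrative choice)
groups = 100_000   # number of simulated individuals and of subgroups

# Exponential distribution with rate 1: mean 1, standard deviation 1.
individuals = [rng.expovariate(1.0) for _ in range(groups)]
means = [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(groups)]

ind_rate = rate_outside(individuals, mu=1.0, sigma=1.0)
mean_rate = rate_outside(means, mu=1.0, sigma=1.0 / n ** 0.5)

print(f"individual values outside 3 sigma: {ind_rate:.4f}")
print(f"means of {n} outside 3 sigma:       {mean_rate:.4f}")
```

In this setup the means fall outside the limits less often than the individual values do, although with subgroups this small the rate is still above the Normal value. The approximation improves as the subgroup size grows.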

My suggestion is to use X-Bar (mean) charts, not Individual Value charts. If you must use individual values, then use an EWMA or Cusum chart.
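
For readers who have not met the EWMA chart, here is a minimal sketch of its usual textbook form. Each plotted point is a weighted average of the current and all earlier observations, so it behaves more like a mean than like an individual value. The function name, the smoothing weight of 0.2, the limit width of 3, the data, and the assumption that the in-control mean and standard deviation are known are all illustrative.

```python
import math

def ewma_chart(values, mu, sigma, lam=0.2, width=3.0):
    """Return a list of (ewma, lcl, ucl) tuples, one per observation.

    mu, sigma : in-control mean and standard deviation (assumed known here)
    lam       : smoothing weight, 0 < lam <= 1
    width     : limit width in multiples of the EWMA standard deviation
    """
    rows = []
    z = mu  # the EWMA starts at the in-control mean
    for t, x in enumerate(values, start=1):
        z = lam * x + (1 - lam) * z
        # Half-width of the limits, using the exact EWMA variance after t points.
        spread = width * sigma * math.sqrt(
            lam / (2 - lam) * (1 - (1 - lam) ** (2 * t))
        )
        rows.append((z, mu - spread, mu + spread))
    return rows

# Hypothetical data: the last few points drift upward.
data = [10.1, 9.9, 10.0, 10.2, 9.8, 10.4, 10.5, 10.6, 10.7, 10.8]
for t, (z, lcl, ucl) in enumerate(ewma_chart(data, mu=10.0, sigma=0.2), start=1):
    flag = "signal" if (z < lcl or z > ucl) else ""
    print(f"{t:2d}  ewma={z:.3f}  limits=({lcl:.3f}, {ucl:.3f})  {flag}")
```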
