Type III Error: Why do I use 2.66 when Control Limits are 3 Times Sigma?

Sunday, August 21, 2005

Why do I use 2.66 when Control Limits are 3 Times Sigma?

I am frequently asked something like this, which happened again recently:

I have seen Control Limits based on Moving Range (MR) multiplied by 2.66. What does this have to do with Three-Sigma Limits? I thought Control Limits were three standard deviations out. Can you explain this type of control limit?

Here was my reply:

Yes, this is a very confusing point to a lot of people.

It isn’t pretty, but you asked for it. It is not difficult to understand, but probably not what you expected. If you understand this, then you are ahead of 99+% of the people who use it.

The formula you are referring to is
Mean +/- 2.66 x MR-Bar

Where
Mean is the Control Chart Center Line, the apparent Process Mean
MR-Bar is the Average Pair-wise Moving Range
2.66 is a magic number multiplier
+/- means “plus or minus”, meaning “plus” for the upper control limit and “minus” for the lower control limit.

The Pair-wise Moving Range is the absolute difference between adjacent pairs of control chart points.

That is, if the values (points) are

Values               12   15   13   14   12
Differences           -    3   -2    1   -2
Absolute Diff (MR)    -    3    2    1    2

Then the average MR (Average Moving Range or MR-Bar) is
2 = 8/4 = (3+2+1+2)/4

The apparent process mean or average (CL Centerline) is
13.2 = 66/5 = (12+15+13+14+12)/5

So the Upper Control Limit is calculated as

UCL = CL + 2.66 x MR-Bar
UCL = 13.2 + (2.66) x (2)
UCL = 13.2 + 5.32
UCL = 18.52

and the Lower Control Limit is calculated as

LCL = CL - 2.66 x MR-Bar
LCL = 13.2 - (2.66) x (2)
LCL = 13.2 - 5.32
LCL = 7.88
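For anyone who would rather check the arithmetic in code, here is a minimal Python sketch of the same calculation. The function name `individuals_limits` is just a label made up for this illustration, and the five values are the ones from the example above.

```python
# Individuals-chart limits from the average moving range (illustrative sketch).
values = [12, 15, 13, 14, 12]  # the five example points from the text


def individuals_limits(points, multiplier=2.66):
    """Return (center_line, mr_bar, lcl, ucl) for single-value subgroups."""
    # Moving ranges: absolute differences between adjacent points.
    moving_ranges = [abs(b - a) for a, b in zip(points, points[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    center_line = sum(points) / len(points)
    half_width = multiplier * mr_bar
    return center_line, mr_bar, center_line - half_width, center_line + half_width


cl, mr_bar, lcl, ucl = individuals_limits(values)
print(f"CL={cl:.2f}  MR-Bar={mr_bar:.2f}  LCL={lcl:.2f}  UCL={ucl:.2f}")
# prints: CL=13.20  MR-Bar=2.00  LCL=7.88  UCL=18.52
```

The lower limit comes out at 7.88 and the upper limit at 18.52, matching the hand calculation.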

So much for the arithmetic. Except in reality, you would need to use many more data values, and you should cull out MR cases involving shifts or outrageous spikes up or down.

What is really confusing about all of this is that in other contexts, it is explained that Control Limits are

Mean +/- 3 x Sigma

Where Sigma is the Process Standard Deviation.

Actually, this is a simplification. Really the Control Limits are

Mean +/- 3 x Sigma / SquareRoot(n)

Where “n” is the sample size of the plotted point.

That is, the points being plotted are themselves averages of several values collected at the same time from the process, like all from the same lot or the same run or the same shift or whatever the plotting unit is.

Sigma is the Standard Deviation characteristic of such values collected at about the same time. That is, the Sigma in the formulas above is the standard deviation within these subgroups, where each subgroup is averaged and it is the average that is plotted.

If there are n items in each subgroup, then subgroup averages of n values are what is plotted.

This is called an X-Bar Chart.

This is a proper Control Chart.
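To see what dividing by SquareRoot(n) does to the limits, here is a small Python sketch. The process mean of 50 and Sigma of 2 are made-up numbers, and `xbar_limits` is a hypothetical helper name used only for this illustration.

```python
import math


def xbar_limits(center, sigma, n):
    """Three-sigma limits for averages of n values: center +/- 3*sigma/sqrt(n)."""
    half_width = 3 * sigma / math.sqrt(n)
    return center - half_width, center + half_width


# Hypothetical process: mean 50, within-subgroup Sigma 2 (assumed values).
for n in (1, 4, 9):
    lcl, ucl = xbar_limits(50.0, 2.0, n)
    print(f"n={n}: LCL={lcl:.2f}, UCL={ucl:.2f}")
# prints:
# n=1: LCL=44.00, UCL=56.00
# n=4: LCL=47.00, UCL=53.00
# n=9: LCL=48.00, UCL=52.00
```

The limits tighten as the subgroup size grows, because averages of more values vary less than individual values do.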

Now, what happens is that people want to cut costs and do as little as possible. So there is immense pressure to choose n to be 1. That is, there is only one value measured in the subgroup. An average of one value is just the value.

So, the formula for the Control Limits reduces to

Mean +/- 3 x Sigma

With no dividing by the Square root since the Square Root of one is one.

The problem with all of this is that it is impossible to estimate the Standard Deviation (Sigma) from within a group of one point.

Conceivably, we could collect several values for each subgroup (plotted point) for a while in order to estimate sigma (standard deviation), then drop back to one value per subgroup for routine running. But no one wants to do that.

Now if you cannot calculate variation within a subgroup, the next best thing is variation between subgroups close to each other. If there is no intrinsic subgroup-to-subgroup variation (usually called Assignable Cause or Special Cause variation), then this variation would be the same as Sigma within a Subgroup. If this were true, then the variation of all the subgroup values taken together would be the same as Sigma, too. However, no one believes that (nevertheless, misguided people do compute it this way for Control Charts).

However, if we suppose that while there is perhaps some intrinsic subgroup-to-subgroup variation, it is minimal or near zero between adjacent subgroups, then we could estimate Sigma pretty closely from differences between adjacent subgroup individual values.

Now there is a direct way to calculate the pair-wise standard deviation, but hardly anyone uses it (too scary).
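The post does not spell out the direct calculation, so the following is an assumption about what it could look like, not a statement of the intended method. One well-known candidate is the mean square successive difference: for independent values with a common variance Sigma squared, the squared difference between adjacent values averages out to 2 x Sigma squared, so Sigma can be estimated as the square root of half the average squared difference. A minimal Python sketch using the five example values:

```python
import math

values = [12, 15, 13, 14, 12]  # the five example points from the text

# Signed differences between adjacent points.
diffs = [b - a for a, b in zip(values, values[1:])]

# Assumption: estimate Sigma from squared successive differences,
# using E[(x2 - x1)^2] = 2 * Sigma^2 for independent values.
sigma_direct = math.sqrt(sum(d * d for d in diffs) / (2 * len(diffs)))
print(f"Sigma from squared successive differences: {sigma_direct:.3f}")
# prints: Sigma from squared successive differences: 1.500
```

With so few points the number means little; it only shows the mechanics.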

Now it turns out that for a Normal Distribution there is, on the average, a relationship between the Range of n values and the Standard Deviation. Basically, it is this

Average Range = d2 x Sigma

Where d2 is a tabulated constant. If you look at standard Quality Control tables you will find values of d2. I say “values” because the d2 value is different depending on the number n of values included in the Range. Range is just the Largest Value minus the Smallest Value. These tables were originally compiled for computing the Within-Subgroup Standard Deviation. That is, if subgroups of 5 values are being averaged and plotted, the Average Range would be divided by the d2 value for n=5 to get the estimated Standard Deviation Sigma to be used in the Formula

Mean +/- 3 x Sigma / SquareRoot(n)

That is

Sigma = (Average Range) / d2

where "Range" is the "Within-Subgroup Range."
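As an illustration of the within-subgroup route, here is a Python sketch that estimates Sigma from the Average Range and builds X-Bar limits. The four subgroups are invented data, and the d2 values are the ones commonly listed in Quality Control tables for n = 2 through 5.

```python
import math

# d2 constants for subgroup sizes 2..5, as listed in standard SPC tables.
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}

# Made-up measurements: four subgroups of five values each.
subgroups = [
    [12, 15, 13, 14, 12],
    [13, 14, 12, 15, 13],
    [14, 12, 13, 13, 15],
    [12, 13, 15, 14, 13],
]
n = len(subgroups[0])

ranges = [max(g) - min(g) for g in subgroups]  # within-subgroup ranges
r_bar = sum(ranges) / len(ranges)              # Average Range
sigma_hat = r_bar / D2[n]                      # Sigma = (Average Range) / d2

means = [sum(g) / n for g in subgroups]        # the points that get plotted
grand_mean = sum(means) / len(means)           # center line
half_width = 3 * sigma_hat / math.sqrt(n)      # 3 x Sigma / SquareRoot(n)
lcl, ucl = grand_mean - half_width, grand_mean + half_width

print(f"R-Bar={r_bar:.2f}  Sigma={sigma_hat:.3f}")
print(f"CL={grand_mean:.2f}  LCL={lcl:.2f}  UCL={ucl:.2f}")
```

Note that Sigma here comes only from the spread inside each subgroup, never from the spread between subgroup averages.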

Now in the case of the Moving Range considered above, the ranges are of two adjacent values, n=2. They are values between subgroups rather than within subgroups. However, as noted above, if there is no intrinsic subgroup-to-subgroup variation, then it is the same as within-subgroup variation.

So we could use this same formula

Sigma = (Average Range) / d2

Or rather

Sigma = (Average Moving Range) / d2

Or using the notation above

Sigma = MR-Bar / d2

To estimate Sigma and use it in the formula for n=1 subgroup sample size Control Limits as above

Mean +/- 3 x Sigma

Or

Mean +/- 3 x (MR-Bar / d2)

Or

Mean +/- (3 / d2) x MR-Bar

Now the question is, what is the value of d2 for converting Ranges to Standard Deviation for samples of size n=2? Look in any typical Quality Control table and you will find that for n=2 then d2=1.128. This value is a characteristic of Normal Distributions.

So we have

Mean +/- (3 / 1.128) x MR-Bar

But 3 / 1.128 is 2.6596, or about 2.66.

So we have

Mean +/- (2.66) x MR-Bar

Which is the magic formula.
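The constant is easy to reproduce. For two independent values from a Normal Distribution, the expected absolute difference is 2 x Sigma / SquareRoot(pi), which is where the tabulated d2 of 1.128 for n=2 comes from. A short Python check of both the table value and the closed form:

```python
import math

d2_table = 1.128                   # tabulated d2 for n = 2
d2_exact = 2 / math.sqrt(math.pi)  # E|X1 - X2| / Sigma for a Normal Distribution

print(f"d2 (table) = {d2_table}, d2 (closed form) = {d2_exact:.5f}")
print(f"3 / d2 (table)       = {3 / d2_table:.4f}")
print(f"3 / d2 (closed form) = {3 / d2_exact:.4f}")
# prints:
# d2 (table) = 1.128, d2 (closed form) = 1.12838
# 3 / d2 (table)       = 2.6596
# 3 / d2 (closed form) = 2.6587
```

Either way the multiplier rounds to 2.66.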

Notice, we had to go through a lot of gyrations to get here, but, it reduces to three things, namely,

Assume that there is no intrinsic subgroup-to-subgroup variation.
Assume that the data is distributed according to a Normal Distribution.
Estimate the Standard Deviation from the Range instead of directly.

This method, as mentioned above, is applied mainly to single valued subgroups.

A proper Control Chart uses subgroup averages rather than individual values. While individual values are occasionally Normally distributed, Average Values are always nearly Normally distributed. This is the importance of the Normal Distribution. It is not that typical data is distributed Normally (it is not). It is that averages of pretty much any data will be distributed approximately normally.
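A rough simulation makes the point about averages. The sketch below draws strongly skewed values (an exponential distribution, chosen here purely as an example of non-Normal data) and compares the skewness of the individual values with the skewness of averages of five. The `skewness` helper is written out by hand for this illustration; a skewness of zero corresponds to a symmetric distribution.

```python
import random
import statistics


def skewness(xs):
    """Sample skewness: third central moment over the cubed standard deviation."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)


rng = random.Random(0)  # fixed seed so the run is repeatable
individuals = [rng.expovariate(1.0) for _ in range(20000)]

# Averages of consecutive subgroups of five values.
n = 5
averages = [
    statistics.fmean(individuals[i:i + n])
    for i in range(0, len(individuals), n)
]

skew_individuals = skewness(individuals)
skew_averages = skewness(averages)
print(f"skewness of individual values: {skew_individuals:.2f}")
print(f"skewness of subgroup averages: {skew_averages:.2f}")
```

With subgroups of only five the averages are still somewhat skewed, but noticeably less so than the raw values, and the skew keeps shrinking as the subgroup size grows.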

So, Individual Value or X-Charts (the ones using the magic number 2.66) are very vulnerable since the data distribution is rarely Normal, only sometimes even symmetric. And presuming that there is no intrinsic subgroup-to-subgroup variation is a stretch. You can cull out the obvious cases but if every case is contaminated with across-subgroup variation, it is a futile task. That is, we are doing a Control Chart because we have intrinsic subgroup-to-subgroup variation. So we base it on the assumption that there is none (!?).
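Here is a small Python sketch of how a shift contaminates the estimate, and what culling does. The ten data points are invented, and the cutoff of 5 used to drop the moving range that straddles the shift is an arbitrary choice for this example, not a recommended rule.

```python
import statistics

D2_N2 = 1.128  # d2 for ranges of two values

# Made-up data: stable around 10-11, then a sustained shift to 20-21.
points = [10, 11, 10, 11, 10, 20, 21, 20, 21, 20]
moving_ranges = [abs(b - a) for a, b in zip(points, points[1:])]

# Standard deviation of all points lumped together (the misguided way).
sigma_overall = statistics.stdev(points)

# Moving-range estimate using every adjacent pair.
sigma_mr = statistics.mean(moving_ranges) / D2_N2

# Moving-range estimate after culling the pair that spans the shift
# (the cutoff of 5 is an arbitrary, hypothetical threshold).
culled = [mr for mr in moving_ranges if mr < 5]
sigma_mr_culled = statistics.mean(culled) / D2_N2

print(f"overall: {sigma_overall:.2f}")
print(f"moving range, all pairs: {sigma_mr:.2f}")
print(f"moving range, culled: {sigma_mr_culled:.2f}")
# prints:
# overall: 5.30
# moving range, all pairs: 1.77
# moving range, culled: 0.89
```

One obvious shift is easy to cull. If every adjacent pair carries some across-subgroup variation, there is nothing clean left to keep, which is the futility described above.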

Generally, Control charts of this type are not recommended. However, they are very popular.

Okay, so there you have it.

6 Comments:

At 4:56 AM, Robert Jones said...: Very helpful thanks. Still remarkably relevant considering the date it was written and how high it came up the search list.

As a newcomer to the topic, what types of control chart do you recommend, please?

At 12:06 PM, Unknown said...: Hi There,

I am wondering what your thoughts are on using same calculation for UCL/LCL if using the median rather than average to calculate the subgroup values

At 7:48 PM, Verge Marie said...: This has been very helpful. Thanks!

At 12:11 PM, Unknown said...: Now i know,
Thank you very much. 😘

At 5:44 AM, Tim said...: Excellent! Thank-you!

At 7:37 AM, Praveen said...: Super!!
