Validation Principle
OIV-MA-AS1-05  Principle of validation of routine methods with respect to reference methods
The OIV acknowledges that methods of analysis of wines exist in addition to those described in the Summary of International Methods of Analysis of Wines and Musts: routine methods, most often automated. These methods are economically and commercially important because they maintain a complete and efficient analytical framework around the production and marketing of wine. Moreover, they allow the use of modern means of analysis and the development and adaptation of analytical techniques.
In order to allow laboratories to use these methods and to ensure their linkage to the methods described in the Summary, the OIV has decided to establish a plan for the evaluation and validation, by a laboratory, of an alternative routine method, mechanized or not, with respect to a reference method described in the Summary of International Methods of Analysis of Wines and Musts.
This principle, which will be adapted to the particular situation of the analysis of wines and musts, will take its inspiration from international standards in current use and allow the laboratory to assess and validate its alternative method in two ways:
Collaborative Study
OIV-MA-AS1-07  Collaborative study
The purpose of the collaborative study is to give a quantified indication of the precision of a method of analysis, expressed as its repeatability r and reproducibility R.
Repeatability: the value below which the absolute difference between two single test results, obtained using the same method on identical test material under the same conditions (same operator, same apparatus, same laboratory, and a short interval of time), may be expected to lie with a specified probability.
Reproducibility: the value below which the absolute difference between two single test results, obtained using the same method on identical test material under different conditions (different operators, different apparatus and/or different laboratories and/or different times), may be expected to lie with a specified probability.
The term "individual result" means the value obtained when the standardized test method is applied, once and in full, to a single sample. Unless otherwise stated, the probability is 95%.
General Principles
- The method subjected to trial must be standardized, that is, chosen from the existing methods as the method best suited for subsequent general use.
- The protocol must be clear and precise.
- At least ten laboratories must participate.
- The samples used in the trials must be taken from homogeneous batches of material.
- The levels of the analyte to be determined must cover the concentrations generally encountered.
- The participants must have good experience of the technique employed.
- For each participant, all analyses must be conducted within the same laboratory by the same analyst.
- The method must be followed as strictly as possible; any departure from the method described must be documented.
- The experimental values must be determined under strictly identical conditions (same type of apparatus, etc.).
- They must be determined independently of each other and immediately after each other.
- All laboratories must express the results in the same units, to the same number of decimal places.
- Five replicate experimental values must be determined, free from outliers. If an experimental value is an outlier according to the Grubbs test, three additional measurements must be taken.
Statistical Model
The statistical methods set out in this document are given for one level (concentration, sample). If there are several levels, the statistical evaluation must be made separately for each. If a linear relationship (y = bx or y = a + bx) is found between the repeatability r (or reproducibility R) and the concentration, a regression of r (or R) may be run as a function of the concentration.
The statistical methods given below suppose normally‑distributed random values.
The steps to be followed are as follows:
A/ Elimination of outliers within a single laboratory by Grubbs test. Outliers are values which depart so far from the other experimental values that these deviations cannot be regarded as random, assuming the causes of such deviations are not known.
B/ Examine whether all laboratories are working to the same precision, by comparing variances by the Bartlett test and Cochran test. Eliminate those laboratories for which statistically deviant values are obtained.
C/ Track down systematic errors in the remaining laboratories by an analysis of variance, and identify extreme outlier mean values by a Dixon test. Eliminate those laboratories for which the outlier values are significant.
D/ From the remaining figures, calculate the standard deviation of repeatability s_r and the repeatability r, and the standard deviation of reproducibility s_R and the reproducibility R.
Notation:
The following designations have been chosen:
m = number of laboratories
i (i = 1, 2, ..., m) = index (number) of the laboratory
n_i = number of individual values from the i-th laboratory
N = Σ n_i = total number of individual values
x_ij (j = 1, 2, ..., n_i) = individual values of the i-th laboratory
x̄_i = mean value of the i-th laboratory
x̄ = total mean value
s_i = standard deviation of the i-th laboratory
A/ Verification of outlier values within one laboratory
After determining the five individual values x_i1, ..., x_i5, each laboratory performs a Grubbs test to identify outlier values.
Test the null hypothesis that the experimental value x* with the greatest absolute deviation from the mean is not an outlier observation.
Calculate:
PG = |x* − x̄_i| / s_i
where x* is the suspect value.
Compare PG with the corresponding value shown in Table 1 for P = 95%.
If PG < the tabulated value, x* is not an outlier and s_i can be calculated.
If PG > the tabulated value, x* is probably an outlier; therefore make a further three determinations.
Repeat the Grubbs test with the eight determinations.
If PG > the corresponding value for P = 99%, regard x* as a deviant value and calculate s_i without x*.
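As an illustrative sketch (not part of the official method text), the Grubbs statistic can be computed as follows; the data are the first five individual values of laboratory 3 in Table 6, and the critical value 1.715 (n = 5, P = 95%) is taken from Table 1:

```python
import statistics

def grubbs_stat(values):
    """PG = |x* - mean| / s for the value x* farthest from the mean."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    suspect = max(values, key=lambda x: abs(x - mean))
    return abs(suspect - mean) / s, suspect

# first five individual values of laboratory 3 in Table 6
pg, suspect = grubbs_stat([567, 558, 563, 532, 560])
# Table 1 critical value for n = 5, P = 95% is 1.715
print(suspect, round(pg, 3), pg > 1.715)   # 532 1.734 True
```

Since PG exceeds the 95% critical value, the value 532 is a probable outlier and three further determinations would be made, as in the worked example of Table 6 where 532 is marked as eliminated.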
B/ Comparison of variances among laboratories
Bartlett Test
The Bartlett test compares all the laboratory variances simultaneously, large and small. It serves to test the null hypothesis of the equality of variances in all laboratories against the alternative hypothesis that the variances of some laboratories are not equal.
At least five individual values are required per laboratory.
Calculate the test statistic:
s² = Σ (n_i − 1) s_i² / (N − m)   (pooled variance)
C = 1 + [Σ 1/(n_i − 1) − 1/(N − m)] / [3(m − 1)]
PB = [(N − m) ln s² − Σ (n_i − 1) ln s_i²] / C
Compare PB with the value indicated in Table 2 at m − 1 degrees of freedom.
If PB > the value in the table, there are differences among the variances.
The Cochran test is used to confirm whether the variance from one laboratory is greater than that from the other laboratories.
Calculate the test statistic:
PC = s_max² / Σ s_i²
Compare PC with the value shown in Table 3 for m and n_i at P = 99%.
If PC > the table value, that variance is significantly greater than the others.
If there is a significant result from the Bartlett or Cochran tests, eliminate the outlier variance and calculate the statistical test again.
In the absence of a statistical method appropriate to a simultaneous test of several outlier values, the repeated application of the tests is permitted, but should be used with caution.
If the laboratories produce variances that differ sharply from each other, an investigation must be made to find the causes and to decide whether the experimental values found by those laboratories are to be eliminated or not. If they are, the coordinator will have to consider how representative the remaining laboratories are.
If statistical analysis shows that there are differing variances, this shows that the laboratories have operated the methods at varying precisions. This may be due to inadequate practice or to lack of clarity or inadequate description in the method.
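As an illustrative sketch (not part of the official method text), the Cochran statistic can be computed from the laboratory variances; here the s_i² values of Table 6 are used, compared with the Table 3 critical value 0.393 (m = 10, n_i = 5, P = 99%). Note that the Cochran table strictly assumes equal n_i, which is not exactly the case in Table 6, so the comparison is only indicative:

```python
def cochran_stat(variances):
    """PC = s_max^2 / sum of all s_i^2."""
    return max(variances) / sum(variances)

# laboratory variances s_i^2 from Table 6
pc = cochran_stat([41.8, 14.7, 12.3, 17.3, 34.7, 222.6, 31.7, 39.5, 31.7, 30.2])
print(round(pc, 3), pc > 0.393)   # 0.467 True: laboratory 6 has an outlier variance
```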
C/ Systematic errors
Systematic errors made by laboratories are identified using either Fisher's analysis of variance or Dixon's test.
R. A. Fisher analysis of variance
This test is applied to the experimental values remaining from the laboratories with identical variance.
The test identifies whether the spread of the laboratory means is very much greater than the spread of the individual values, expressed by the variance between the laboratories (s_b²) and the variance within the laboratories (s_w²).
Calculate the test statistic:
s_b² = Σ n_i (x̄_i − x̄)² / (m − 1)
s_w² = Σ_i Σ_j (x_ij − x̄_i)² / (N − m)
PF = s_b² / s_w²
Compare PF with the corresponding value shown in Table 4 (distribution of F) at f1 = m − 1 and f2 = N − m degrees of freedom.
If PF > the table value, it can be concluded that there are differences among the means, that is, there are systematic errors.
Dixon test
This test enables us to confirm that the mean from one laboratory is greater or smaller than that from the other laboratories.
Take the data series Z(h), h = 1, 2, ..., H, arranged in increasing order.
Calculate the test statistic Q:
For H = 3 to 7:
Q = [Z(2) − Z(1)] / [Z(H) − Z(1)]  or  Q = [Z(H) − Z(H−1)] / [Z(H) − Z(1)]
For H = 8 to 12:
Q = [Z(2) − Z(1)] / [Z(H−1) − Z(1)]  or  Q = [Z(H) − Z(H−1)] / [Z(H) − Z(2)]
For H = 13 and more:
Q = [Z(3) − Z(1)] / [Z(H−2) − Z(1)]  or  Q = [Z(H) − Z(H−2)] / [Z(H) − Z(3)]
In each case take the greater of the two values.
Compare the greatest value of Q with the critical values shown in Table 5.
If the test statistic is > the table value at P = 95%, the mean in question can be regarded as an outlier.
If there is a significant result in the R. A. Fisher analysis of variance or the Dixon test, eliminate one of the extreme values and calculate the test statistics again with the remaining values. As regards repeated application of the tests, see the explanations in paragraph (B).
If systematic errors are found, the corresponding experimental values must not be included in subsequent computations; the cause of the systematic error must be investigated.
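As an illustrative sketch (not part of the official method text), the Dixon statistic for a series of 8 to 12 values can be computed as follows; the data are the ten laboratory means of Table 6, compared with the Table 5 critical value 0.530 (m = 10, 95%):

```python
def dixon_stat(values):
    """Dixon Q for a series of 8 to 12 values, ranked in increasing order."""
    z = sorted(values)
    h = len(z)
    q_low = (z[1] - z[0]) / (z[h - 2] - z[0])           # [Z(2)-Z(1)] / [Z(H-1)-Z(1)]
    q_high = (z[h - 1] - z[h - 2]) / (z[h - 1] - z[1])  # [Z(H)-Z(H-1)] / [Z(H)-Z(2)]
    return max(q_low, q_high)

# laboratory means from Table 6 (H = 10)
q = dixon_stat([551, 302, 563, 555, 568, 563, 555, 550, 556, 553])
print(round(q, 2), q > 0.530)   # 0.95 True: the mean of laboratory 2 (302) is an outlier
```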
D/ Calculating repeatability (r) and reproducibility (R)
From the results remaining after elimination of outliers, calculate the standard deviation of repeatability s_r and the repeatability r, and the standard deviation of reproducibility s_R and the reproducibility R, which are given as characteristic values of the method of analysis:
s_r² = Σ (n_i − 1) s_i² / (N − m)
r = 2.8 · s_r
s_L² = (s_b² − s_r²) / n̄   with n̄ = [N − Σ n_i²/N] / (m − 1)
(s_b² being the between-laboratory mean square from the analysis of variance)
s_R² = s_r² + s_L²
R = 2.8 · s_R
If there is no difference between the laboratory means, there is no difference between s_r and s_R, or between r and R. But if differences among the laboratory means are found, even though these may be tolerated for practical reasons, s_r and s_R, and r and R, have to be shown separately.
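As an illustrative sketch (not part of the official method text), these pooled precision figures can be computed from the per-laboratory replicate groups; the toy data below are invented for the example, not taken from the study:

```python
import statistics

def precision(groups):
    """Pooled repeatability and reproducibility figures from per-laboratory replicates."""
    m = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    # within-laboratory (repeatability) variance, pooled over laboratories
    sr2 = sum((len(g) - 1) * statistics.variance(g) for g in groups) / (n_total - m)
    # between-laboratory mean square and variance component
    ms_b = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups) / (m - 1)
    n_bar = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (m - 1)
    sl2 = max(0.0, (ms_b - sr2) / n_bar)
    sr = sr2 ** 0.5
    sR = (sr2 + sl2) ** 0.5
    return sr, 2.8 * sr, sR, 2.8 * sR   # s_r, r, s_R, R

sr, r, sR, R = precision([[10, 12, 11], [14, 16, 15]])   # toy data, two laboratories
print(round(sr, 2), round(r, 2), round(sR, 2), round(R, 2))
```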
Bibliography
- AFNOR, norme NF X 06-041, Fidélité des méthodes d'essai. Détermination de la répétabilité et de la reproductibilité par essais interlaboratoires.
- DAVIES O. L., GOLDSMITH P. L., Statistical Methods in Research and Production, Oliver and Boyd, Edinburgh, 1972.
- GOETSCH F. H., KRÖNERT W., OLSCHIMKE D., OTTO U., VIERKÖTTER S., Meth. An., 1978, No 667.
- GOTTSCHALK G., KAISER K. E., Einführung in die Varianzanalyse und Ringversuche, B.I. Hochschultaschenbücher, Band 775, 1976.
- GRAF, HENNING, WILRICH, Statistische Methoden bei textilen Untersuchungen, Springer Verlag, Berlin, Heidelberg, New York, 1974.
- GRUBBS F. E., Sample Criteria for Testing Outlying Observations, The Annals of Mathematical Statistics, 1950, vol. 21, p. 27-58.
- GRUBBS F. E., Procedures for Detecting Outlying Observations in Samples, Technometrics, 1969, vol. 11, No 1, p. 1-21.
- GRUBBS F. E., BECK G., Extension of Sample Sizes and Percentage Points for Significance Tests of Outlying Observations, Technometrics, 1972, vol. 14, No 4, p. 847-854.
- ISO, norme 5725.
- KAISER R., GOTTSCHALK G., Elementare Tests zur Beurteilung von Messdaten, B.I. Hochschultaschenbücher, Band 774, 1972.
- LIENERT G. A., Verteilungsfreie Verfahren in der Biostatistik, Band I, Verlag Anton Hain, Meisenheim am Glan, 1973.
- NALIMOV V. V., The Application of Mathematical Statistics to Chemical Analysis, Pergamon Press, Oxford, London, Paris, Frankfurt, 1963.
- SACHS L., Statistische Auswertungsmethoden, Springer Verlag, Berlin, Heidelberg, New York, 1968.
Table 1 – Critical values for the Grubbs test

n     P = 95%   P = 99%
3     1,155     1,155
4     1,481     1,496
5     1,715     1,764
6     1,887     1,973
7     2,020     2,139
8     2,126     2,274
9     2,215     2,387
10    2,290     2,482
11    2,355     2,564
12    2,412     2,636
Table 2 – Critical values for the Bartlett test (P = 95%)

f (m − 1)   X²       f (m − 1)   X²
1           3,84     21          32,7
2           5,99     22          33,9
3           7,81     23          35,2
4           9,49     24          36,4
5           11,07    25          37,7
6           12,59    26          38,9
7           14,07    27          40,1
8           15,51    28          41,3
9           16,92    29          42,6
10          18,31    30          43,8
11          19,68    35          49,8
12          21,03    40          55,8
13          22,36    50          67,5
14          23,69    60          79,1
15          25,00    70          90,5
16          26,30    80          101,9
17          27,59    90          113,1
18          28,87    100         124,3
19          30,14
20          31,41

Table 3 – Critical values for the Cochran test

        n_i = 2        n_i = 3        n_i = 4        n_i = 5        n_i = 6
m       99%    95%     99%    95%     99%    95%     99%    95%     99%    95%
2       –      –       0.995  0.975   0.979  0.939   0.959  0.906   0.937  0.877
3       0.993  0.967   0.942  0.871   0.883  0.798   0.834  0.746   0.793  0.707
4       0.968  0.906   0.864  0.768   0.781  0.684   0.721  0.629   0.676  0.590
5       0.928  0.841   0.788  0.684   0.696  0.598   0.633  0.544   0.588  0.506
6       0.883  0.781   0.722  0.616   0.626  0.532   0.564  0.480   0.520  0.445
7       0.838  0.727   0.664  0.561   0.568  0.480   0.508  0.431   0.466  0.397
8       0.794  0.680   0.615  0.516   0.521  0.438   0.463  0.391   0.423  0.360
9       0.754  0.638   0.573  0.478   0.481  0.403   0.425  0.358   0.387  0.329
10      0.718  0.602   0.536  0.445   0.447  0.373   0.393  0.331   0.357  0.303
11      0.684  0.570   0.504  0.417   0.418  0.348   0.366  0.308   0.332  0.281
12      0.653  0.541   0.475  0.392   0.392  0.326   0.343  0.288   0.310  0.262
13      0.624  0.515   0.450  0.371   0.369  0.307   0.322  0.271   0.291  0.246
14      0.599  0.492   0.427  0.352   0.349  0.291   0.304  0.255   0.274  0.232
15      0.575  0.471   0.407  0.335   0.332  0.276   0.288  0.242   0.259  0.220
16      0.553  0.452   0.388  0.319   0.316  0.262   0.274  0.230   0.246  0.208
17      0.532  0.434   0.372  0.305   0.301  0.250   0.261  0.219   0.234  0.198
18      0.514  0.418   0.356  0.293   0.288  0.240   0.249  0.209   0.223  0.189
19      0.496  0.403   0.343  0.281   0.276  0.230   0.238  0.200   0.214  0.181
20      0.480  0.389   0.330  0.270   0.265  0.220   0.229  0.192   0.205  0.174
21      0.465  0.377   0.318  0.261   0.255  0.212   0.220  0.185   0.197  0.167
22      0.450  0.365   0.307  0.252   0.246  0.204   0.212  0.178   0.189  0.160
23      0.437  0.354   0.297  0.243   0.238  0.197   0.204  0.172   0.182  0.155
24      0.425  0.343   0.287  0.235   0.230  0.191   0.197  0.166   0.176  0.149
25      0.413  0.334   0.278  0.228   0.222  0.185   0.190  0.160   0.170  0.144
26      0.402  0.325   0.270  0.221   0.215  0.179   0.184  0.155   0.164  0.140
27      0.391  0.316   0.262  0.215   0.209  0.173   0.179  0.150   0.159  0.135
28      0.382  0.308   0.255  0.209   0.202  0.168   0.173  0.146   0.154  0.131
29      0.372  0.300   0.248  0.203   0.196  0.164   0.168  0.142   0.150  0.127
30      0.363  0.293   0.241  0.198   0.191  0.159   0.164  0.138   0.145  0.124
31      0.355  0.286   0.235  0.193   0.186  0.155   0.159  0.134   0.141  0.120
32      0.347  0.280   0.229  0.188   0.181  0.151   0.155  0.131   0.138  0.117
33      0.339  0.273   0.224  0.184   0.177  0.147   0.151  0.127   0.134  0.114
34      0.332  0.267   0.218  0.179   0.172  0.144   0.147  0.124   0.131  0.111
35      0.325  0.262   0.213  0.175   0.168  0.140   0.144  0.121   0.127  0.108
36      0.318  0.256   0.208  0.172   0.165  0.137   0.140  0.119   0.124  0.106
37      0.312  0.251   0.204  0.168   0.161  0.134   0.137  0.116   0.121  0.103
38      0.306  0.246   0.200  0.164   0.157  0.131   0.134  0.113   0.119  0.101
39      0.300  0.242   0.196  0.161   0.154  0.129   0.131  0.111   0.116  0.099
40      0.294  0.237   0.192  0.158   0.151  0.126   0.128  0.108   0.114  0.097

Table 4 – Critical values for the F-test (P = 99%)

f2\f1   1      2      3      4      5      6      7      8      9      10     11     12     13     14     15
1       4052   4999   5403   5625   5764   5859   5928   5981   6023   6056   6083   6106   6126   6143   6157
2       98.5   99.0   99.2   99.3   99.3   99.3   99.4   99.4   99.4   99.4   99.4   99.4   99.4   99.4   99.4
3       34.1   30.8   29.4   28.7   28.2   27.9   27.7   27.5   27.3   27.2   27.1   27.1   27.0   26.9   26.9
4       21.2   18.0   16.7   16.0   15.5   15.2   15.0   14.8   14.7   14.5   14.5   14.4   14.3   14.2   14.2
5       16.3   13.3   12.1   11.4   11.0   10.7   10.5   10.3   10.2   10.1   9.96   9.89   9.82   9.77   9.72
6       13.7   10.9   9.78   9.15   8.75   8.47   8.26   8.10   7.98   7.87   7.79   7.72   7.66   7.60   7.56
7       12.2   9.55   8.45   7.85   7.46   7.19   6.99   6.84   6.72   6.62   6.54   6.47   6.41   6.36   6.31
8       11.3   8.65   7.59   7.01   6.63   6.37   6.18   6.03   5.91   5.81   5.73   5.67   5.61   5.56   5.52
9       10.6   8.02   6.99   6.42   6.06   5.80   5.61   5.47   5.35   5.26   5.18   5.11   5.05   5.01   4.96
10      10.0   7.56   6.55   5.99   5.64   5.39   5.20   5.06   4.94   4.85   4.77   4.71   4.65   4.60   4.56
11      9.64   7.20   6.21   5.67   5.31   5.07   4.88   4.74   4.63   4.54   4.46   4.39   4.34   4.29   4.25
12      9.33   6.93   5.95   5.41   5.06   4.82   4.64   4.50   4.39   4.30   4.22   4.16   4.10   4.05   4.01
13      9.07   6.70   5.74   5.21   4.86   4.62   4.44   4.30   4.19   4.10   4.02   3.96   3.90   3.86   3.82
14      8.86   6.51   5.56   5.04   4.69   4.46   4.28   4.14   4.03   3.94   3.86   3.80   3.75   3.70   3.66
15      8.68   6.36   5.42   4.89   4.56   4.32   4.14   4.00   3.89   3.80   3.73   3.67   3.61   3.56   3.52
16      8.53   6.23   5.29   4.77   4.44   4.20   4.03   3.89   3.78   3.69   3.62   3.55   3.50   3.45   3.41
17      8.40   6.11   5.18   4.67   4.34   4.10   3.93   3.79   3.68   3.59   3.52   3.46   3.40   3.35   3.31
18      8.29   6.01   5.09   4.58   4.25   4.01   3.84   3.71   3.60   3.51   3.43   3.37   3.32   3.27   3.23
19      8.18   5.93   5.01   4.50   4.17   3.94   3.77   3.63   3.52   3.43   3.36   3.30   3.24   3.19   3.15
20      8.10   5.85   4.94   4.43   4.10   3.87   3.70   3.56   3.46   3.37   3.29   3.23   3.18   3.13   3.09
21      8.02   5.78   4.87   4.37   4.04   3.81   3.64   3.51   3.40   3.31   3.24   3.17   3.12   3.07   3.03
22      7.95   5.72   4.82   4.31   3.99   3.76   3.59   3.45   3.35   3.26   3.18   3.12   3.07   3.02   2.98
23      7.88   5.66   4.76   4.26   3.94   3.71   3.54   3.41   3.30   3.21   3.14   3.07   3.02   2.97   2.93
24      7.82   5.61   4.72   4.22   3.90   3.67   3.50   3.36   3.26   3.17   3.09   3.03   2.98   2.93   2.89
25      7.77   5.57   4.68   4.18   3.85   3.63   3.46   3.32   3.22   3.13   3.06   2.99   2.94   2.89   2.85
26      7.72   5.53   4.64   4.14   3.82   3.59   3.42   3.29   3.18   3.09   3.02   2.96   2.90   2.86   2.81
27      7.68   5.49   4.60   4.11   3.78   3.56   3.39   3.26   3.15   3.06   2.99   2.93   2.87   2.82   2.78
28      7.64   5.45   4.57   4.07   3.75   3.53   3.36   3.23   3.12   3.03   2.96   2.90   2.84   2.79   2.75
29      7.60   5.42   4.54   4.04   3.73   3.50   3.33   3.20   3.09   3.00   2.93   2.87   2.81   2.77   2.73
30      7.56   5.39   4.51   4.02   3.70   3.47   3.30   3.17   3.07   2.98   2.91   2.84   2.79   2.74   2.70
40      7.31   5.18   4.31   3.83   3.51   3.29   3.12   2.99   2.89   2.80   2.73   2.66   2.61   2.56   2.52
50      7.17   5.06   4.20   3.72   3.41   3.19   3.02   2.89   2.78   2.70   2.62   2.56   2.51   2.46   2.42
60      7.07   4.98   4.13   3.65   3.34   3.12   2.95   2.82   2.72   2.63   2.56   2.50   2.44   2.39   2.35
70      7.01   4.92   4.07   3.60   3.29   3.07   2.91   2.78   2.67   2.59   2.51   2.45   2.40   2.35   2.31
80      6.96   4.88   4.04   3.56   3.25   3.04   2.87   2.74   2.64   2.55   2.48   2.42   2.36   2.31   2.27
90      6.92   4.85   4.01   3.53   3.23   3.01   2.84   2.72   2.61   2.52   2.45   2.39   2.33   2.29   2.24
100     6.89   4.82   3.98   3.51   3.21   2.99   2.82   2.69   2.59   2.50   2.43   2.37   2.31   2.27   2.22
200     6.75   4.71   3.88   3.41   3.11   2.89   2.73   2.60   2.50   2.41   2.34   2.27   2.22   2.17   2.13
500     6.69   4.65   3.82   3.36   3.05   2.84   2.68   2.55   2.44   2.36   2.29   2.22   2.17   2.12   2.07
∞       6.63   4.61   3.78   3.32   3.02   2.80   2.64   2.51   2.41   2.32   2.25   2.18   2.13   2.08   2.04

Table 4 – Critical values for the F-test (P = 99%) [Continued]

f2\f1   16     17     18     19     20     30     40     50     60     70     80     100    200    500    ∞
1       6169   6182   6192   6201   6209   6261   6287   6303   6313   6320   6326   6335   6350   6361   6366
2       99.4   99.4   99.4   99.4   99.5   99.5   99.5   99.5   99.5   99.5   99.5   99.5   99.3   99.5   99.5
3       26.8   26.8   26.8   26.7   26.7   26.5   26.4   26.4   26.3   26.3   26.3   26.2   26.2   26.1   26.1
4       14.2   14.1   14.1   14.0   14.0   13.8   13.7   13.7   13.7   13.6   13.6   13.6   13.5   13.5   13.5
5       9.68   9.64   9.61   9.58   9.55   9.38   9.29   9.24   9.20   9.18   9.16   9.13   9.08   9.04   9.02
6       7.52   7.48   7.45   7.42   7.40   7.23   7.14   7.09   7.06   7.03   7.01   6.99   6.93   6.90   6.88
7       6.28   6.24   6.21   6.18   6.16   5.99   5.91   5.86   5.82   5.80   5.78   5.75   5.70   5.67   5.65
8       5.48   5.44   5.41   5.38   5.36   5.20   5.12   5.07   5.03   5.01   4.99   4.96   4.91   4.88   4.86
9       4.92   4.89   4.86   4.83   4.81   4.65   4.57   4.52   4.48   4.46   4.44   4.41   4.36   4.33   4.31
10      4.52   4.49   4.46   4.43   4.41   4.25   4.17   4.12   4.08   4.06   4.04   4.01   3.96   3.93   3.91
11      4.21   4.18   4.15   4.12   4.10   3.94   3.86   3.81   3.77   3.75   3.73   3.70   3.65   3.62   3.60
12      3.97   3.94   3.91   3.88   3.86   3.70   3.62   3.57   3.54   3.51   3.49   3.47   3.41   3.38   3.36
13      3.78   3.74   3.72   3.69   3.66   3.51   3.42   3.37   3.34   3.32   3.30   3.27   3.22   3.19   3.17
14      3.62   3.59   3.56   3.53   3.51   3.35   3.27   3.22   3.18   3.16   3.14   3.11   3.06   3.03   3.00
15      3.49   3.45   3.42   3.40   3.37   3.21   3.13   3.08   3.05   3.02   3.00   2.98   2.92   2.89   2.87
16      3.37   3.34   3.31   3.28   3.26   3.10   3.02   2.97   2.93   2.91   2.89   2.86   2.81   2.78   2.75
17      3.27   3.24   3.21   3.19   3.16   3.00   2.92   2.87   2.83   2.81   2.79   2.76   2.71   2.68   2.65
18      3.19   3.16   3.13   3.10   3.08   2.92   2.84   2.78   2.75   2.72   2.70   2.68   2.62   2.59   2.57
19      3.12   3.08   3.05   3.03   3.00   2.84   2.76   2.71   2.67   2.65   2.63   2.60   2.55   2.51   2.49
20      3.05   3.02   2.99   2.96   2.94   2.78   2.69   2.64   2.61   2.58   2.56   2.54   2.48   2.44   2.42
21      2.99   2.96   2.93   2.90   2.88   2.72   2.64   2.58   2.55   2.52   2.50   2.48   2.42   2.38   2.36
22      2.94   2.91   2.88   2.85   2.83   2.67   2.58   2.53   2.50   2.47   2.45   2.42   2.36   2.33   2.31
23      2.89   2.86   2.83   2.80   2.78   2.62   2.54   2.48   2.45   2.42   2.40   2.37   2.32   2.28   2.26
24      2.85   2.82   2.79   2.76   2.74   2.58   2.49   2.44   2.40   2.38   2.36   2.33   2.27   2.24   2.21
25      2.81   2.78   2.75   2.72   2.70   2.54   2.45   2.40   2.36   2.34   2.32   2.29   2.23   2.19   2.17
26      2.78   2.75   2.72   2.69   2.66   2.50   2.42   2.36   2.33   2.30   2.28   2.25   2.19   2.16   2.13
27      2.75   2.71   2.68   2.66   2.63   2.47   2.38   2.33   2.29   2.27   2.25   2.22   2.16   2.12   2.10
28      2.72   2.68   2.65   2.63   2.60   2.44   2.35   2.30   2.26   2.24   2.22   2.19   2.13   2.09   2.06
29      2.69   2.66   2.63   2.60   2.57   2.41   2.33   2.27   2.23   2.21   2.19   2.16   2.10   2.06   2.03
30      2.66   2.63   2.60   2.57   2.55   2.39   2.30   2.25   2.21   2.18   2.16   2.13   2.07   2.03   2.01
40      2.48   2.45   2.42   2.39   2.37   2.20   2.11   2.06   2.02   1.99   1.97   1.94   1.87   1.85   1.80
50      2.38   2.35   2.32   2.29   2.27   2.10   2.01   1.95   1.91   1.88   1.86   1.82   1.76   1.71   1.68
60      2.31   2.28   2.25   2.22   2.20   2.03   1.94   1.88   1.84   1.81   1.78   1.75   1.68   1.63   1.60
70      2.27   2.23   2.20   2.18   2.15   1.98   1.89   1.83   1.78   1.75   1.73   1.70   1.62   1.57   1.54
80      2.23   2.20   2.17   2.14   2.12   1.94   1.85   1.79   1.75   1.71   1.69   1.65   1.58   1.53   1.49
90      2.21   2.17   2.14   2.11   2.09   1.92   1.82   1.76   1.72   1.68   1.66   1.62   1.55   1.50   1.46
100     2.19   2.15   2.12   2.09   2.07   1.89   1.80   1.74   1.69   1.66   1.63   1.60   1.52   1.47   1.43
200     2.09   2.06   2.03   2.00   1.97   1.79   1.69   1.63   1.58   1.55   1.52   1.48   1.39   1.33   1.28
500     2.04   2.00   1.97   1.94   1.92   1.74   1.63   1.56   1.52   1.48   1.45   1.41   1.31   1.23   1.16
∞       2.00   1.97   1.93   1.90   1.88   1.70   1.59   1.52   1.47   1.43   1.40   1.36   1.25   1.15   1.00

Table 5 – Critical values for the Dixon test

m       95%     99%     Test criterion (in each case the greater of the two values)
3       0,970   0,994   Q = [Z(2) − Z(1)] / [Z(H) − Z(1)]
4       0,829   0,926   or
5       0,710   0,821   Q = [Z(H) − Z(H−1)] / [Z(H) − Z(1)]
6       0,628   0,740
7       0,569   0,680
8       0,608   0,717   Q = [Z(2) − Z(1)] / [Z(H−1) − Z(1)]
9       0,564   0,672   or
10      0,530   0,635   Q = [Z(H) − Z(H−1)] / [Z(H) − Z(2)]
11      0,502   0,605
12      0,479   0,579
13      0,611   0,697   Q = [Z(3) − Z(1)] / [Z(H−2) − Z(1)]
14      0,586   0,670   or
15      0,565   0,647   Q = [Z(H) − Z(H−2)] / [Z(H) − Z(3)]
16      0,546   0,627
17      0,529   0,610
18      0,514   0,594
19      0,501   0,580
20      0,489   0,567
21      0,478   0,555
22      0,468   0,544
23      0,459   0,535
24      0,451   0,526
25      0,443   0,517
26      0,436   0,510
27      0,429   0,502
28      0,423   0,495
29      0,417   0,489
30      0,412   0,483
31      0,407   0,477
32      0,402   0,472
33      0,397   0,467
34      0,393   0,462
35      0,388   0,458
36      0,384   0,454
37      0,381   0,450
38      0,377   0,446
39      0,374   0,442
40      0,371   0,438

Table 6 – Results of the collaborative study

Lab   Individual values x_ij                  n_i   x̄_i   s_i     s_i²
1     548 556 558 553 542                      5    551   6,47    41,8
2     300 299 304 308 300                      5    302   3,83    14,7
3     567 558 563 532* 560 560 563 567         7    563   3,51    12,3
4     557 550 555 560 551                      5    555   4,16    17,3
5     569 575 565 560 572                      5    568   5,89    34,7
6     550 546 549 557 588 570 576 568          8    563   14,92   222,6
7     557 560 560 552 547                      5    555   5,63    31,7
8     548 543 560 551 548                      5    550   6,28    39,5
9     558 563 551 555 560                      5    556   5,63    31,7
10    554 559 551 545 557                      5    553   5,5     30,2

* value eliminated after the Grubbs test; it is not counted in n_i.

Statistical figures:
Bartlett test: PB = 3.16 < 15.51 (95%; f = 8)
Analysis of variance:
  within laboratories:  standard deviation 5.37  (f2 = 34)
  between laboratories: standard deviation 13.97 (f1 = 7)
  PF = 6.76 > 3.21 (99%; f1 = 7; f2 = 34)
s_r = 5.37, r = 15; s_R = 7.78, R = 22
Reliability of methods
OIV-MA-AS1-08  Reliability of analytical results
Data concerning the reliability of analytical methods, as determined by collaborative studies, are applicable in the following cases:
- verifying the results obtained by a laboratory with a reference method
- evaluating analytical results which indicate a legal limit has been exceeded
- comparing results obtained by two or more laboratories, and comparing those results with a reference value
- evaluating results obtained from a non-validated method
- verifying the acceptability of results obtained with a reference method
The validity of analytical results depends on the following:
- the laboratory should perform all analyses within the framework of an appropriate quality control system, which includes the organization, responsibilities, procedures, etc.;
- as part of the quality control system, the laboratory should operate according to an internal Quality Control Procedure;
- results should be obtained in accordance with the acceptability criteria described in the internal Quality Control Procedure.
Internal quality control shall be established in accordance with internationally recognized standards, such as those of the IUPAC document titled "Harmonized Guidelines for Internal Quality Control in Analytical Laboratories."
Internal Quality Control implies the analysis of reference material.
Reference samples should be of the same type as the samples to be analyzed and should contain an appropriate, known concentration of the analyte, similar to that found in the samples.
To the extent possible, reference material shall be certified by an internationally recognized organization.
However, for many types of analysis there are no certified reference materials. In this case one can use, for example, material analyzed by several laboratories in a proficiency test, taking the average of the results as the value assigned to the analyte.
Reference material can also be prepared by formulation (a model solution with known components) or by adding a known quantity of the analyte to a sample which does not contain it (or does not yet contain it), by means of a recovery test (standard addition) on one of the samples to be analyzed.
Quality Control is assured by adding reference material to each series of samples and analyzing the pairs (test samples and reference material). This verifies correct implementation of the method; it should be independent of the analytical calibration and protocol, since its goal is precisely to verify them.
A series means a number of samples analyzed under repeatability conditions. Internal controls serve to ensure that the appropriate level of uncertainty is not exceeded.
If the analytical results are considered to be part of a normal population whose mean is m and standard deviation is s, only around 0.3% of the results will be outside the limits m ± 3s. When aberrant results are obtained (outside these limits), the system is considered to be outside statistical control (unreliable data).
The control is graphically represented using Shewhart Control Graphs. To produce these graphs, the measured values obtained from the reference material are placed on the vertical axis, while the series numbers are placed on the horizontal axis. The graph also includes horizontal lines representing the mean m, m ± 2s (warning limits) and m ± 3s (action limits) (Figure 1).
To estimate the standard deviation, a control should be analyzed, in pairs, in at least 12 trials. Each analytical pair shall be analyzed under repeatability conditions and inserted at random in a sample series. Analyses shall be duplicated on different days to reflect reasonable changes from one series to another. Variations can have several causes: modification of the composition of the reagents, instrument recalibration, or even different operators. After eliminating aberrant data using the Grubbs test, calculate the standard deviation used to construct the Shewhart graphs. This standard deviation is compared with that of the reference method. If the published precision level of the reference method is not obtained, the causes should be investigated.
The precision limits of the laboratory should be periodically revised by repeating the indicated procedure.
Once the Quality Control graph is constructed, graph the results obtained from each series for the control material.
A series is considered outside statistical control if:
(I) a value is outside the action limits;
(II) the current and the previous value both lie outside the warning limits, even if within the action limits;
(III) nine successive values lie on the same side of the mean.
The laboratory response to an "outside control" condition is to reject the results for the series, perform tests to determine the cause, and take action to remedy the situation.
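The three "outside statistical control" rules above can be sketched as follows (an illustrative helper, not part of the official text; `history` is the sequence of control results for successive series):

```python
def out_of_control(history, mean, s):
    """Apply the three 'outside statistical control' rules to a series of control results."""
    last = history[-1]
    if abs(last - mean) > 3 * s:                                   # rule (I): action limit exceeded
        return True
    if len(history) >= 2 and all(abs(v - mean) > 2 * s for v in history[-2:]):
        return True                                                # rule (II): two successive values beyond warning limits
    if len(history) >= 9:
        tail = history[-9:]
        if all(v > mean for v in tail) or all(v < mean for v in tail):
            return True                                            # rule (III): nine values on one side of the mean
    return False

print(out_of_control([10.5], 10, 0.1))       # action limit exceeded
print(out_of_control([10.1] * 9, 10, 1.0))   # nine successive values above the mean
```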
A Shewhart Control Graph can also be produced for the differences between analytical pairs of the same sample, especially when no reference material exists. In this case the absolute difference between two analyses of the same sample is graphed. The graph's lower line is 0, the warning limit is 1.128·s_w and the action limit is 3.686·s_w, where s_w is the standard deviation within a series.
This type of graph only accounts for repeatability, which should be no greater than the published repeatability limit of the method.
In the absence of control material, it sometimes becomes necessary to verify that the reproducibility limit of the reference method is not exceeded, by comparing the results obtained with those obtained by an experienced laboratory on the same sample.
Each laboratory performs two tests, and the following formula is used:
|ȳ1 − ȳ2| ≤ CD95 = √(R² − r²/2)
CD95 = critical difference (P = 0.95)
ȳ1 = mean of the 2 results obtained by laboratory 1
ȳ2 = mean of the 2 results obtained by laboratory 2
R = reproducibility of the reference method
r = repeatability of the reference method
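As an illustrative sketch (not part of the official text), this critical difference for duplicate results from two laboratories can be computed directly; the figures r = 15 and R = 22 are borrowed from the collaborative-study worked example purely for illustration:

```python
def critical_difference_two_labs(R, r):
    """CD95 for comparing the means of duplicate results from two laboratories."""
    return (R ** 2 - r ** 2 / 2) ** 0.5

# worked-example precision figures r = 15, R = 22 (illustrative)
print(round(critical_difference_two_labs(22, 15), 1))   # 19.3
```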
If the critical difference has been exceeded, the underlying reason is to be found and the test is to be repeated within one month.
 Evaluation of analytic results indicating that a legal limit has been exceeded
When analytical results indicated that a legal limit has been exceeded, the following procedure should be followed:
In the case of an individual result, conduct a second test under repeatable conditions. If it is not possible to conduct a second test under repeatable conditions, conduct a double analysis under repeatable conditions and use these data to evaluate the critical difference.
Determine the absolute value of the difference between the mean of the results obtained under repeatable conditions and the legal limit. An absolute value of the difference which is greater than the critical distance indicates that the sample does not fit the specifications.

The critical difference is calculated by the formula:

CD(0.95) = (1/√2) · √(R² - r²·(n - 1)/n)

ȳ = mean of the results obtained
L = legal limit
n = number of analyses
R = reproducibility of the reference method
r = repeatability of the reference method
In other words, if the limit is a maximum, the average of the results obtained should not be greater than:

L + (1/√2) · √(R² - r²·(n - 1)/n)

If the limit is a minimum, the average of the results obtained should not be less than:

L - (1/√2) · √(R² - r²·(n - 1)/n)
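The decision against a legal limit can be sketched as follows (a sketch assuming the ISO 5725-6 form of the critical difference for comparing the mean of n results with a fixed value; function names are illustrative):

```python
import math

def critical_difference_vs_limit(R, r, n):
    """Critical difference (P = 0.95) for comparing the mean of n
    results with a fixed value such as a legal limit:
    CD = (1 / sqrt(2)) * sqrt(R^2 - r^2 * (n - 1) / n)."""
    return math.sqrt(R ** 2 - r ** 2 * (n - 1) / n) / math.sqrt(2)

def out_of_specification(mean, limit, R, r, n, upper=True):
    """For an upper limit, the sample is judged out of specification
    only when the mean exceeds limit + CD; for a lower limit, only
    when the mean falls below limit - CD."""
    cd = critical_difference_vs_limit(R, r, n)
    return mean > limit + cd if upper else mean < limit - cd
```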
Comparing results obtained by two or more laboratories and comparing these results to a reference value

To determine whether or not data originating in two laboratories are in agreement, calculate the absolute difference between the two results and compare it to the critical difference:

CD(0.95) = √(R² - r²·(1 - 1/(2n1) - 1/(2n2)))

ȳ1 = mean of the n1 results obtained by laboratory 1
ȳ2 = mean of the n2 results obtained by laboratory 2
n1 = number of analyses performed by laboratory 1
n2 = number of analyses performed by laboratory 2
R = reproducibility of the reference method
r = repeatability of the reference method
If each result is the average of two tests (n1 = n2 = 2), the equation simplifies to:

CD(0.95) = √(R² - r²/2)

If the data are individual results, the critical difference is R.
If the critical difference is not exceeded, the conclusion is that the results of the two laboratories are in agreement.
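The general two-laboratory comparison can be sketched as follows (a sketch assuming the ISO 5725-6 form of the critical difference; the function name is illustrative):

```python
import math

def critical_difference_between_labs(R, r, n1, n2):
    """Critical difference (P = 0.95) between the means of n1 and n2
    results obtained in two laboratories:
    CD = sqrt(R^2 - r^2 * (1 - 1/(2*n1) - 1/(2*n2)))."""
    return math.sqrt(R ** 2 - r ** 2 * (1 - 1 / (2 * n1) - 1 / (2 * n2)))
```

The two limiting cases quoted in the text fall out directly: with single results (n1 = n2 = 1) the expression reduces to R, and with duplicate means (n1 = n2 = 2) to √(R² - r²/2).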
Comparing results obtained by several laboratories with a reference value:
Suppose p laboratories have each made n determinations, where the mean for laboratory i is ȳi; the overall mean is:

ȳ = (ȳ1 + ȳ2 + … + ȳp)/p

The mean of all laboratories is compared with the reference value. If the absolute difference exceeds the critical difference, as calculated using the following formula, we conclude the results are not in agreement with the reference value:

CDp = CD/√p

CD = critical difference, calculated as indicated in point 2, for the reference method.
For example, the reference value can be the value assigned to a reference material or the
value obtained by the same laboratory or by a different laboratory with a different method.
Evaluating analytical results obtained using nonvalidated methods
A provisional reproducibility value R1 can be assigned to a nonvalidated method by comparing its results with those of a second laboratory:

R1 = √((ȳ1 - ȳ2)² + r²/2)

ȳ1 = mean of the 2 results obtained by laboratory 1
ȳ2 = mean of the 2 results obtained by laboratory 2
r = repeatability of the reference method
Provisional reproducibility can be used to calculate critical difference.
If provisional reproducibility is less than twice the value of repeatability, it should be set to 2r.
A reproducibility value greater than three times repeatability or twice the value calculated using the Horwitz equation is not acceptable.
Horwitz equation:

RSD_R(%) = 2^(1 - 0.5·log10 C)

RSD_R(%) = relative standard deviation for reproducibility (expressed as a percentage of the mean)
C = concentration, expressed as a decimal fraction (for example, 10 g/100 g = 0.1)
This equation was obtained empirically from more than 3000 collaborative studies covering a diverse range of analyzed substances, matrices and measurement techniques. In the absence of other information, RSD_R values lower than or equal to the RSD_R values calculated using the Horwitz equation can be considered acceptable.
RSD_R values calculated by the Horwitz equation:

Concentration C    RSD_R (%)
10^-9              45
10^-8              32
10^-7              23
10^-6              16
10^-5              11
10^-4              8
10^-3              5.6
10^-2              4
10^-1              2.8
1                  2
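The Horwitz prediction and the tabulated values can be reproduced with a few lines (a sketch; the function name is illustrative):

```python
import math

def horwitz_rsd_percent(c):
    """Predicted reproducibility RSD (%) from the Horwitz equation,
    RSD_R(%) = 2^(1 - 0.5 * log10(c)), for a concentration c expressed
    as a decimal fraction (10 g/100 g -> 0.1)."""
    return 2 ** (1 - 0.5 * math.log10(c))
```

For example, horwitz_rsd_percent(1e-6) gives 16, matching the table row for 10^-6.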
If the result obtained using a nonvalidated method is close to the limit specified by legislation, the decision shall be made as follows (for upper limits):

and, for lower limits,

S = decision limit
L = legal limit
R1 = provisional reproducibility of the nonvalidated method
R = reproducibility of the reference method
CD = critical difference, calculated as indicated in point 2, for the reference method
The result which exceeds the decision limit should be replaced with a final result obtained using the reference method.
Critical differences for probability levels other than 95%
This difference can be determined by multiplying the critical differences at the 95% level by the coefficients shown in Table 1.
Table 1 – Multiplicative coefficients allowing the calculation of critical differences for probability levels other than 95%

Probability level P (%)    Multiplicative coefficient
90                         0.82
95                         1.00
98                         1.16
99                         1.29
99.5                       1.40
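Applying Table 1 is a simple rescaling (a sketch; names are illustrative):

```python
# Multiplicative coefficients of Table 1 (probability level -> factor).
TABLE_1 = {90: 0.82, 95: 1.00, 98: 1.16, 99: 1.29, 99.5: 1.40}

def rescale_critical_difference(cd_95, level):
    """Convert a critical difference computed at P = 95 % to another
    probability level by applying the Table 1 coefficient."""
    return cd_95 * TABLE_1[level]
```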
Shewhart control graph [figure not reproduced]
Bibliography
- "Harmonized Guidelines for Internal Quality Control in Analytical Chemistry Laboratories". IUPAC. Pure and Appl. Chem., Vol. 67, nº 4, 649-666, 1995.
- "Shewhart Control Charts". ISO 8258, 1991.
- "Precision of test methods – Determination of repeatability and reproducibility for a standard test method by interlaboratory tests". ISO 5725, 1994.
- "Draft Commission Regulation establishing rules for the application of reference and routine methods for the analysis and quality evaluation of milk and milk products". Commission of the European Communities, 1995.
- "Harmonized protocols for the adoption of standardized analytical methods and for the presentation of their performance characteristics". IUPAC. Pure and Appl. Chem., Vol. 62, nº 1, 149-162, 1990.
Protocol for the design, conduct and interpretation of collaborative studies
OIVMAAS109 Protocol for the design, conduct and interpretation of collaborative studies
Introduction
After a number of meetings and workshops, a group of representatives from 27 organizations adopted by consensus a "Protocol for the design, conduct and interpretation of collaborative studies", which was published in Pure & Appl. Chem. 60, 855-864, 1988. A number of organizations have accepted and used this protocol. As a result of their experience and the recommendations of the Codex Committee on Methods of Analysis and Sampling (Joint FAO/WHO Food Standards Programme, Report of the Eighteenth Session, 9-13 November 1992; FAO, Rome, Italy, ALINORM 93/23, Sections 34-39), three minor revisions were recommended for incorporation into the original protocol. These are: (1) delete the double split-level design, because the interaction term it generates depends upon the choice of levels and, if it is statistically significant, the interaction cannot be physically interpreted; (2) amplify the definition of "material"; (3) change the outlier removal criterion from 1% to 2.5%.
The revised protocol incorporating the changes is reproduced below. Some minor editorial revisions to improve readability have also been made. The vocabulary and definitions of the document "Nomenclature of Interlaboratory Studies (Recommendations 1994)" [published in Pure Appl. Chem., 66, 1903-1911 (1994)] have been incorporated into this revision, utilizing as far as possible the appropriate terms of the International Organization for Standardization (ISO), modified to be applicable to analytical chemistry.
Protocol
1. Preliminary work
Method-performance (collaborative) studies require considerable effort and should be conducted only on methods that have received adequate prior testing. Such within-laboratory testing should include, as applicable, information on the following:
1.1. Preliminary estimates of precision
Estimates of the total within-laboratory standard deviation of the analytical results over the concentration range of interest, as a minimum at the upper and lower limits of the concentration range, with particular emphasis on any standard or specification value.
Note 1: The total within-laboratory standard deviation is a more inclusive measure of imprecision than the ISO repeatability standard deviation, §3.3 below. This standard deviation is the largest of the within-laboratory type precision variables to be expected from the performance of a method; it includes at least variability from different days and preferably from different calibration curves. It includes between-run (between-batch) as well as within-run (within-batch) variations. In this respect it can be considered a measure of within-laboratory reproducibility. Unless this value is well within acceptable limits, it cannot be expected that the between-laboratory standard deviation (reproducibility standard deviation) will be any better. This precision term is not estimated from the minimum study described in this protocol.
NOTE 2: The total within-laboratory standard deviation may also be estimated from ruggedness trials that indicate how tightly controlled the experimental factors must be and what their permissible ranges are. These experimentally determined ranges should be incorporated into the description of the method.
1.2. Systematic error (bias)
Estimates of the systematic error of the analytical results over the concentration range and in the substances of interest, as a minimum at the upper and lower limits of the concentration range, with particular emphasis on any standard or specification value.
The results obtained by applying the method to relevant reference materials should be noted.
1.3. Recoveries
The recoveries of "spikes" added to real materials and to extracts, digests, or other treated solutions thereof.
1.4. Applicability
The ability of the method to identify and measure the physical and chemical forms of the analyte likely to be present in the materials, with due regard to matrix effects.
1.5. Interference
The effect of other constituents that are likely to be present at appreciable concentrations in matrices of interest and which may interfere in the determination.
1.6. Method comparison
The results of comparison of the application of the method with existing tested methods intended for similar purposes.
1.7. Calibration Procedures
The procedures specified for calibration and for blank correction must not introduce important bias into the results.
1.8. Method description
The method must be clearly and unambiguously written.
1.9. Significant figures
The initiating laboratory should indicate the number of significant figures to be reported, based on the output of the measuring instrument.
Note: In making statistical calculations from the reported data, the full power of the calculator or computer is to be used with no rounding or truncating until the final reported means and standard deviations are achieved. At this point the standard deviations are rounded to 2 significant figures and the means and related standard deviations are rounded to accommodate the significant figures of the standard deviation. For example, if s_R = 0.012, c is reported as 0.147, not as 0.1473 or 0.15, and RSD_R is reported as 8.2%. (Symbols are defined in Appendix 1.) If standard deviation calculations must be conducted manually in steps, with the transfer of intermediate results, the number of significant figures to be retained for squared numbers should be at least 2 times the number of figures in the data plus 1.
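The reporting rule in the note above can be sketched as follows (a sketch; the helper name is illustrative, and Python's round-half-even behaviour is assumed acceptable):

```python
import math

def round_for_report(mean, sd):
    """Apply the protocol's reporting rule: round the standard
    deviation to 2 significant figures, then round the mean to the
    same number of decimal places."""
    # decimal position of the 2nd significant figure of sd
    decimals = 1 - int(math.floor(math.log10(abs(sd))))
    return round(mean, decimals), round(sd, decimals)
```

For example, round_for_report(0.1473, 0.012) returns (0.147, 0.012), matching the example in the note.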
2. Design of the method-performance study
2.1. Number of materials
For a single type of substance, at least 5 materials (test samples) must be used; only when a single-level specification is involved for a single matrix may this minimum number of materials be reduced to 3. For this design parameter, the two portions of a split level and the two individual portions of blind replicates per laboratory are considered as a single material.
Note 1: A material is an 'analyte/matrix/concentration' combination to which the method-performance parameters apply. This parameter determines the applicability of a method. For application to a number of different substances, a sufficient number of matrices and levels should be chosen to include potential interferences and the concentrations of typical use.
Note 2: The 2 or more test samples of blind or open replicates are, statistically, a single material (they are not independent).
NOTE 3: A single split level (Youden pair) statistically analyzed as a pair is a single material; if analyzed statistically and reported as single test samples, they are 2 materials. In addition, the pair can be used to calculate the within-laboratory standard deviation, as

s_r = √(Σd_i²/(2n))  (for duplicates, blind or open)

s_r = √(Σ(d_i - d̄)²/(2(n - 1)))  (for split levels)

where d_i = the difference between the 2 individual values from the split level for laboratory i and n is the number of laboratories. In this special case, s_R, the among-laboratories standard deviation, is merely the average of the two values calculated from the individual components of the split level, and it is used only as a check of the calculations.
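The two estimators discussed in Note 3 can be sketched as follows (d_i is the signed difference between the paired values reported by laboratory i; function names are illustrative):

```python
import math

def s_r_from_duplicates(diffs):
    """Repeatability SD from blind or open duplicates:
    s_r = sqrt(sum(d_i**2) / (2 * n))."""
    return math.sqrt(sum(d * d for d in diffs) / (2 * len(diffs)))

def s_r_from_split_level(diffs):
    """Repeatability SD from a split level (Youden pair):
    s_r = sqrt(sum((d_i - dbar)**2) / (2 * (n - 1)))."""
    n = len(diffs)
    dbar = sum(diffs) / n
    return math.sqrt(sum((d - dbar) ** 2 for d in diffs) / (2 * (n - 1)))
```

The split-level form subtracts the mean difference because the two members of a Youden pair deliberately differ slightly in concentration.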
Note 4: The blank or negative control may be a material or not depending on the usual purpose of the analysis. For example, in trace analysis, where very low levels (near the limit of quantitation) are often sought, the blanks are considered as materials and are necessary to determine certain 'limits of measurement.' However, if the blank is merely a procedural control in macro analysis (e.g., fat in cheese), it would not be considered a material.
2.2. Number of laboratories
At least 8 laboratories must report results for each material; only when it is impossible to obtain this number (e.g., very expensive instrumentation or specialized laboratories required) may the study be conducted with fewer, but with an absolute minimum of 5 laboratories. If the study is intended for international use, laboratories from different countries should participate. In the case of methods requiring the use of specialized instruments, the study might include the entire population of available laboratories. In such cases, "n" is used in the denominator for calculating the standard deviation instead of "(n - 1)". Subsequent entrants to the field should demonstrate the ability to perform as well as the original participants.
2.3. Number of Replicates
The repeatability precision parameters must be estimated by using one of the following sets of designs (listed in approximate order of desirability):
2.3.1. Split Level
For each level that is split and which constitutes only a single material for purposes of design and statistical analysis, use 2 nearly identical test samples that differ only slightly in analyte concentration (e.g., <15%). Each laboratory must analyse each test sample once and only once.
Note: The statistical criterion that must be met for a pair of test samples to constitute a split level is that the reproducibility standard deviation of the two parts of the single split level must be equal.
2.3.2. Combination blind replicates and split level
Use split levels for some materials and blind replicates for other materials in the same study (single values from each submitted test sample).
2.3.3. Blind replicates
For each material, use blind identical replicates; when data censoring is impossible (e.g., automatic input, calculation, and printout), nonblind identical replicates may be used.
2.3.4. Known replicates
For each material, use known replicates (2 or more analyses of test portions from the same test sample), but only when it is not practical to use one of the preceding designs.
2.3.5. Independent analyses
Use only a single test portion from each material (i.e., do not perform multiple analyses) in the study, but rectify the inability to calculate repeatability parameters by quality control parameters or other within-laboratory data obtained independently of the method-performance study.
3. Statistical analysis (see flowchart, A.4.1)
For the statistical analysis of the data, the required statistical procedures listed below must be performed and the results reported. Supplemental procedures are not precluded.
3.1. Valid data
Only valid data should be reported and subjected to statistical treatment. Valid data are those data that would be reported as resulting from the normal performance of laboratory analyses; they are not marred by method deviations, instrument malfunctions, unexpected occurrences during performance, or by clerical, typographical and arithmetical errors.
3.2. One-way analysis of variance
One-way analysis of variance and outlier treatments must be applied separately to each material (test sample) to estimate the components of variance and the repeatability and reproducibility parameters.
3.3. Initial estimation
Calculate the mean, c (= the average of laboratory averages), the repeatability relative standard deviation, RSD_r, and the reproducibility relative standard deviation, RSD_R, with no outliers removed, but using only valid data.
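For a balanced design (L laboratories, each reporting k replicates) these estimates follow from a one-way ANOVA, which can be sketched as below (a sketch, not the full ISO 5725 computation; it also applies the convention of §4.3.2, setting a negative between-laboratory variance component to zero; relative standard deviations are then s_r and s_R divided by the mean):

```python
import math

def precision_estimates(data):
    """One-way ANOVA estimates for a balanced design.
    data: list of L laboratories, each a list of k replicate results.
    Returns (mean of laboratory averages, s_r, s_R); a negative
    between-laboratory variance component is set to zero."""
    L, k = len(data), len(data[0])
    lab_means = [sum(lab) / k for lab in data]
    grand = sum(lab_means) / L
    # within-laboratory (repeatability) variance
    s_r2 = sum((x - m) ** 2 for lab, m in zip(data, lab_means)
               for x in lab) / (L * (k - 1))
    # variance of the laboratory means
    var_means = sum((m - grand) ** 2 for m in lab_means) / (L - 1)
    s_L2 = max(var_means - s_r2 / k, 0.0)
    return grand, math.sqrt(s_r2), math.sqrt(s_L2 + s_r2)
```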
3.4. Outlier treatment
The estimated precision parameters that must also be reported are based on the initial valid data purged of all outliers flagged by the harmonized 1994 outlier removal procedure. This procedure essentially consists of the sequential application of the Cochran and Grubbs tests (at the 2.5% probability (P) level, 1-tail for Cochran and 2-tail for Grubbs) until no further outliers are flagged or until a drop of more than 22.2% (= 2/9) in the original number of laboratories providing valid data would occur.
Note: Prompt consultation with a laboratory reporting suspect values may result in correction of mistakes or discovery of conditions that lead to invalid data, §3.1. Recognizing mistakes and invalid data per se is much preferred to relying upon statistical tests to remove deviant values.
3.4.1. Cochran test
First apply the Cochran outlier test (1-tail test at P = 2.5%) and remove any laboratory whose critical value exceeds the tabular value given in the table, Appendix A.3.1, for the number of laboratories and replicates involved.
3.4.2. Grubbs tests
Apply the single-value Grubbs test (2-tail) and remove any outlying laboratory. If no laboratory is flagged, then apply the pair-value tests (2-tail) – 2 values at the same end and 1 value at each end, P = 2.5% overall. Remove any laboratory(ies) flagged by these tests whose critical value exceeds the tabular value given in the appropriate column of the table, Appendix A.3.3. Stop removal when the next application of the tests would flag as outliers more than 22.2% (2 of 9) of the laboratories.
Note: The Grubbs tests are to be applied one material at a time to the set of replicate means from all laboratories, and not to the individual values from replicated designs, because the distribution of all the values taken together is multimodal, not Gaussian, i.e., their differences from the overall mean for that material are not independent.
3.4.3. Final estimation
Recalculate the parameters as in §3.3 after the laboratories flagged by the preceding procedure have been removed. If no outliers were removed by the Cochran-Grubbs sequence, terminate testing. Otherwise, reapply the Cochran-Grubbs sequence to the data purged of the flagged outliers until no further outliers are flagged or until more than a total of 22.2% (2 of 9 laboratories) would be removed in the next cycle. See flowchart A.4.1.
4. Final report
The final report should be published and should include all valid data. Other information and parameters should be reported in a format similar (with respect to the reported items) to the following, as applicable:
[x] Method-performance tests carried out at the international level in [year(s)] by [organisation], in which [y and z] laboratories participated, each performing [k] replicates, gave the following statistical results:
Table of method-performance parameters
Analyte; Results expressed in [units]
Material [Description and listed in columns across top of table in increasing order of magnitude of means]
Number of laboratories retained after eliminating outliers
Number of outlying laboratories
Code (or designation) of outlying laboratories
Number of accepted results
Mean
True or accepted value, if known
Repeatability standard deviation (S_{r})
Repeatability relative standard deviation (RSD_r)
Repeatability limit, r (2.8 x S_{r})
Reproducibility standard deviation (S_{R})
Reproducibility relative standard deviation (RSD_{R})
Reproducibility limit, R (2.8 X S_{R})
4.1. Symbols
A set of symbols for use in reports and publications is attached as Appendix 1 (A.1.).
4.2. Definitions
A set of definitions for use in study reports and publications is attached as Appendix 2 (A.2.).
4.3. Miscellaneous
4.3.1. Recovery
Recovery of added analyte as a control on method or laboratory bias should be calculated as follows:
[Marginal] Recovery, % = (total analyte found - analyte originally present) × 100 / (analyte added)
Although the analyte may be expressed as either concentration or amount, the units must be the same throughout. When the quantity of analyte is determined by analysis, it must be determined in the same way throughout.
Analytical results should be reported uncorrected for recovery. Report recoveries separately.
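The marginal recovery calculation is a one-liner (a sketch; the function name is illustrative):

```python
def marginal_recovery_percent(total_found, originally_present, added):
    """Marginal recovery, % = (total analyte found - analyte originally
    present) * 100 / analyte added; all quantities in the same units."""
    return (total_found - originally_present) * 100.0 / added
```

As the text notes, results themselves should still be reported uncorrected, with recoveries stated separately.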
4.3.2. When s_L² is negative
By definition, s_R is greater than or equal to s_r in method-performance studies; occasionally the estimate of s_r is greater than the estimate of s_R (the variability of the replicates is greater than that of the laboratory averages) and the calculated s_L² is then negative. When this occurs, set s_L = 0 and s_R = s_r.
References
- Horwitz, W. (1988). Protocol for the design, conduct, and interpretation of method-performance studies. Pure & Appl. Chem. 60, 855-864.
- Pocklington, W.D. (1990). Harmonized protocol for the adoption of standardized analytical methods and for the presentation of their performance characteristics. Pure and Appl. Chem. 62, 149-162.
- International Organization for Standardization. International Standard 5725-1986. Under revision in 6 parts; individual parts may be available from national standards member bodies.
Appendices
Appendix 1.  Symbols
Use the following set of symbols and terms for designating parameters developed by a method-performance study.
Mean (of laboratory averages): x
Standard deviations (estimates): s
- Repeatability: s_r
- 'Pure' between-laboratory: s_L
- Reproducibility: s_R
Variances: s² (with subscripts r, L, and R)
Relative standard deviations: RSD (with subscripts r, L, and R)
Maximum tolerable differences (as defined by ISO 5725-1986; see A.2.4 and A.2.5):
- Repeatability limit: r (= 2.8 × s_r)
- Reproducibility limit: R (= 2.8 × s_R)
Number of replicates per laboratory: k (general)
Average number of replicates per laboratory: k̄ (for a balanced design)
Number of laboratories: L
Number of materials (test samples): m
Total number of values in a given assay: n (= kL for a balanced design)
Total number of values in a given study: N (= kLm for an overall balanced design)
____________________
If other symbols are used, their relationship to the recommended symbols should be explained fully.
Appendix 2.  Definitions
Use the following definitions. The first three definitions utilize the IUPAC document "Nomenclature of Interlaboratory Studies" (approved for publication 1994). The next two definitions are assembled from components given in ISO 3534-1:1993. All test results are assumed to be independent, i.e., 'obtained in a manner not influenced by any previous result on the same or similar test object. Quantitative measures of precision depend critically on the stipulated conditions. Repeatability and reproducibility conditions are particular sets of extreme stipulated conditions.'
A.2.1 Method-performance studies
An interlaboratory study in which all laboratories follow the same written protocol and use the same test method to measure a quantity in sets of identical test items [test samples, materials]. The reported results are used to estimate the performance characteristics of the method. Usually these characteristics are withinlaboratory and amonglaboratories precision, and when necessary and possible, other pertinent characteristics such as systematic error, recovery, internal quality control parameters, sensitivity, limit of determination, and applicability.
A.2.2 Laboratory-performance study
An interlaboratory study that consists of one or more analyses or measurements by a group of laboratories on one or more homogeneous, stable test items, by the method selected or used by each laboratory. The reported results are compared with those of other laboratories or with the known or assigned reference value, usually with the objective of evaluating or improving laboratory performance.
A.2.3 Material-certification study
An interlaboratory study that assigns a reference value ('true value') to a quantity (concentration or property) in the test item, usually with a stated uncertainty.
A.2.4 Repeatability limit (r)
When the mean of the values obtained from two single determinations with the same method on identical test items in the same laboratory by the same operator using the same equipment within short intervals of time lies within the range of the mean values cited in the Final Report, 4.0, the absolute difference between the two test results obtained should be less than or equal to the repeatability limit r (= 2.8 × s_r) that can generally be inferred by linear interpolation of r from the Report.
Note: This definition, and the corresponding definition for the reproducibility limit, has been assembled from five cascading terms and expanded to permit application by interpolation to a test item whose mean is not the same as that used to establish the original parameters, which is the usual case in applying these definitions. The term 'repeatability [and reproducibility] limit' is applied specifically to a probability of 95% and is taken as 2.8 × s_r [or s_R]. The general term for this statistical concept applied to any measure of location (e.g., median) and with other probabilities (e.g., 99%) is 'repeatability [and reproducibility] critical difference'.
 A.2.5 Reproducibility limit (R)
When the mean of the values obtained from two single determinations with the same method on identical test items in different laboratories with different operators using different equipment lies within the range of the mean values cited in the Final Report, 4.0, the absolute difference between the two test results obtained should be less than or equal to the reproducibility limit R (= 2.8 × s_R) that can generally be inferred by linear interpolation of R from the Report.
Note 1: When the results of the interlaboratory test make it possible, the value of r and R can be indicated as a relative value (e.g., as a percentage of the determined mean value) as an alternative to the absolute value.
Note 2: When the final reported result in the study is an average derived from more than a single value, i.e., k is greater than 1, the value for R must be adjusted according to the following formula before using R to compare the results of single routine analyses between two laboratories:

R(adjusted) = √(R² - r²·(1 - 1/k))

Similar adjustments must be made for replicate results constituting the final values for r and R, if these will be the reported parameters used for quality control purposes.
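Assuming the adjustment takes the same ISO 5725-style form as the critical difference between two laboratories' averages of k results each (so that k = 1 recovers R), a sketch:

```python
import math

def adjusted_R(R, r, k):
    """Reproducibility limit adjusted for final results that are
    averages of k replicates: sqrt(R^2 - r^2 * (1 - 1/k))
    (an assumed ISO 5725-style form; k = 1 leaves R unchanged)."""
    return math.sqrt(R ** 2 - r ** 2 * (1 - 1 / k))
```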
Note 3: The repeatability limit, r, may be interpreted as the amount within which two determinations should agree with each other within a laboratory 95% of the time. The reproducibility limit, R, may be interpreted as the amount within which two separate determinations conducted in different laboratories should agree with each other 95% of the time.
Note 4: Estimates of s_R can be obtained only from a planned, organized method-performance study; estimates of s_r can be obtained from routine work within a laboratory by use of control charts. For occasional analyses, in the absence of control charts, within-laboratory precision may be approximated as one-half s_R (Pure and Appl. Chem., 62, 149-162 (1990), Sec. 1.3, Note).
A.2.6 One-way analysis of variance
One-way analysis of variance is the statistical procedure for obtaining the estimates of within-laboratory and between-laboratory variability on a material-by-material basis. Examples of the calculations for the single-level and single-split-level designs can be found in ISO 5725-1986.
Appendix 3. – Critical values
A.3.1 Critical values for the Cochran maximum variance ratio at the 2.5% (1-tail) rejection level, expressed as the percentage the highest variance is of the total variance; r = number of replicates.
No. of labs   r=2     r=3     r=4     r=5     r=6
4             94.3    81.0    72.5    65.4    62.5
5             88.6    72.6    64.6    58.1    53.9
6             83.2    65.8    58.3    52.2    47.3
7             78.2    60.2    52.2    47.3    42.3
8             73.6    55.6    47.4    43.0    38.5
9             69.3    51.8    43.3    39.3    35.3
10            65.5    48.6    39.9    36.2    32.6
11            62.2    45.8    37.2    33.6    30.3
12            59.2    43.1    35.0    31.3    28.3
13            56.4    40.5    33.2    29.2    26.5
14            53.8    38.3    31.5    27.3    25.0
15            51.5    36.4    29.9    25.7    23.7
16            49.5    34.7    28.4    24.4    22.0
17            47.8    33.2    27.1    23.3    21.2
18            46.0    31.8    25.9    22.4    20.4
19            44.3    30.5    24.8    21.5    19.5
20            42.8    29.3    23.8    20.7    18.7
21            41.5    28.2    22.9    19.9    18.0
22            40.3    27.2    22.0    19.2    17.3
23            39.1    26.3    21.2    18.5    16.6
24            37.9    25.5    20.5    17.8    16.0
25            36.7    24.8    19.9    17.2    15.5
26            35.5    24.1    19.3    16.6    15.0
27            34.5    23.4    18.7    16.1    14.5
28            33.7    22.7    18.1    15.7    14.1
29            33.1    22.1    17.5    15.3    13.7
30            32.5    21.6    16.9    14.9    13.3
35            29.3    19.5    15.3    12.9    11.6
40            26.0    17.0    13.5    11.6    10.2
50            21.6    14.3    11.4    9.7     8.6
Tables A.3.1 and A.3.3 were calculated by R. Albert (October, 1993) by computer simulation involving several runs of approximately 7000 cycles each for each value, and then smoothed. Although Table A.3.1 is strictly applicable only to a balanced design (same number of replicates from all laboratories), it can be applied to an unbalanced design without too much error, if there are only a few deviations.
 A.3.2 Calculation of Cochran maximum variance outlier ratio
Compute the within-laboratory variance for each laboratory, divide the largest of these variances by the sum of all the laboratories' variances, and multiply by 100. The resulting quotient is the Cochran statistic, which indicates the presence of a removable outlier if it exceeds the critical value listed in the Cochran table (A.3.1) for the number of replicates and laboratories specified.
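The statistic itself is a one-liner (a sketch; the function name is illustrative):

```python
def cochran_statistic(variances):
    """Cochran maximum variance ratio: the largest within-laboratory
    variance expressed as a percentage of the sum of all variances."""
    return 100.0 * max(variances) / sum(variances)
```

The result is then compared against table A.3.1; for example, with 4 laboratories of duplicates (r = 2) a value above 94.3 flags an outlying laboratory.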
A.3.3 Critical values for the Grubbs extreme deviation outlier tests at the 2.5% (2-tail), 1.25% (1-tail) rejection level, expressed as the percent reduction in standard deviation caused by the removal of the suspect value(s).
No. of labs   One highest or lowest   Two highest or two lowest   One highest and one lowest
4             86.1                    98.9                        99.1
5             73.5                    90.9                        92.7
6             64.0                    81.3                        84.0
7             57.0                    73.1                        76.2
8             51.4                    66.5                        69.6
9             46.8                    61.0                        64.1
10            42.8                    56.4                        59.5
11            39.3                    52.5                        55.5
12            36.3                    49.1                        52.1
13            33.8                    46.1                        49.1
14            31.7                    43.5                        46.5
15            29.9                    41.2                        44.1
16            28.3                    39.2                        42.0
17            26.9                    37.4                        40.1
18            25.7                    35.9                        38.4
19            24.6                    34.5                        36.9
20            23.6                    33.2                        35.4
21            22.7                    31.9                        34.0
22            21.9                    30.7                        32.8
23            21.2                    29.7                        31.8
24            20.5                    28.8                        30.8
25            19.8                    28.0                        29.8
26            19.1                    27.1                        28.9
27            18.4                    26.2                        28.1
28            17.8                    25.4                        27.3
29            17.4                    24.7                        26.6
30            17.1                    24.1                        26.0
40            13.3                    19.1                        20.5
50            11.1                    16.2                        17.3
 A.3.4 Calculation of the Grubbs test values
To calculate the single Grubbs test statistic, compute the average for each laboratory and then calculate the standard deviation of these L averages (designated the original s). Calculate the SD of the set of averages with the highest average removed (s_H); calculate the SD of the set of averages with the lowest average removed (s_L). Then calculate the percentage decrease in SD for both as follows:

100 × [1 - (s_L/s)] and 100 × [1 - (s_H/s)]

The higher of these two percentage decreases is the single Grubbs test statistic, which signals the presence of an outlier to be omitted at the P = 2.5% level, 2-tail, if it exceeds the critical value listed in the single-value column, Column 2, of Table A.3.3, for the number of laboratory averages used to calculate the original s.
To calculate the paired Grubbs test statistics, calculate the percentage decrease in standard deviation obtained by dropping the two highest averages and also by dropping the two lowest averages, as above. Compare the higher of the percentage changes in standard deviation with the tabular values in column 3 and proceed with (1) or (2): (1) If the tabular value is exceeded, remove the responsible pair. Repeat the cycle again, starting at the beginning with the Cochran extreme variance test again, the Grubbs extreme value test, and the paired Grubbs extreme value test. (2) If no further values are removed, then calculate the percentage change in standard deviation obtained by dropping both the highest extreme value and the lowest extreme value together, and compare with the tabular values in the last column of A.3.3. If the tabular value is exceeded, remove the highlow pair of averages, and start the cycle again with the Cochran test until no further values are removed. In all cases, stop outlier testing when more than 22.2% (2/9) of the averages are removed.
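The single-value statistic described above can be sketched as follows (the paired tests follow the same pattern on the two extreme values; helper names are illustrative):

```python
import math

def _sd(values):
    """Sample standard deviation (n - 1 denominator)."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def grubbs_single(averages):
    """Single Grubbs statistic: the larger percentage decrease in the
    SD of the laboratory averages obtained by removing either the
    highest or the lowest average."""
    s = _sd(averages)
    ordered = sorted(averages)
    drop_low = 100.0 * (1 - _sd(ordered[1:]) / s)    # lowest removed
    drop_high = 100.0 * (1 - _sd(ordered[:-1]) / s)  # highest removed
    return max(drop_low, drop_high)
```

For 5 laboratories the table A.3.3 critical value is 73.5, so a set of averages with one grossly high value is flagged while an evenly spread set is not.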
Appendix 4
A.4.1 Flowchart for outlier removal [flowchart not reproduced]
