eISSN: 2378-315X

Biometrics & Biostatistics International Journal

Research Article Volume 7 Issue 4

Some combinatorial structures in experimental design: overview, statistical models and applications

Petya Valcheva,1 Teresa A Oliveira2

1Department of Probability, Sofia University, Bulgaria
2Departmento de Ci

Correspondence: Petya Valcheva, Department of Probability, Operations research and Statistics, Sofia University “St. Kliment Ohridski”, Student’s Town building 55, entrance V, Bulgaria

Received: July 02, 2018 | Published: August 10, 2018

Citation: Valcheva P, Oliveira TA. Some combinatorial structures in experimental design: overview, statistical models and applications. Biom Biostat Int J. 2018;7(4):346-351. DOI: 10.15406/bbij.2018.07.00228


Abstract

Background: Design and analysis of experiments is expected to become increasingly prevalent in scientific, academic and applied work over the next few years. Combinatorial designs are regarded as among the most important structures in this field because of their desirable statistical properties.1,2 Such designs are widely applied in areas such as biostatistics, biometry, medicine and information technology. Usually, the experimenter's main objective is to maximize the profit while minimizing the expenses and the duration of the experiment. This need underlines the importance of efficient mathematical and statistical methods for improving the quality of the analysis.

We review combinatorial structures,3 in particular balanced incomplete block designs (BIBD)4–6 and Latin square designs (LSD),7–9 whose use in experimental design was pioneered by R.A. Fisher and co-workers from 1925 onward; they developed the basic statistical theory of such designs.

We propose a general framework, based on the mathematical structures of experimental design, to demonstrate combinatorial designs that can often be constructed easily with computer tools.10 Applications in biostatistics and biometry are illustrated: one example compares pharmacological substances in terms of reaction time in a biostatistical experiment, and another compares the clinical effects of a new medical product. Simulations and statistical analyses are carried out in RStudio using a variety of R packages for design of experiments.11,12

Keywords: balanced incomplete block design, design of experiments, Latin square, R statistics, biostatistics, biometry

Introduction

Design of experiments (DOE) is an important branch of applied statistics that deals with planning and conducting experiments and with analyzing and interpreting their results. It combines mathematical and statistical tools aimed at constructing optimal designs to be tested. Owing to its wide application in recent decades, DOE is now used in many areas, such as optimization, process quality control and product performance prediction.

Historically, some of the most remarkable and progressive contributions of statistics in the twentieth century were made in experimental design. The British statistician and geneticist Sir R.A. Fisher laid the foundations of this area between 1918 and 1940, motivated by applications in agricultural experiments. Much of his early work emphasized that sound conclusions could be drawn efficiently in the presence of fluctuations in nuisance variables such as fertilizers, temperature and other natural conditions. Similar methods have since been applied successfully in a variety of areas to investigate the effects of many factors by varying them simultaneously instead of one factor at a time.

The next significant period, known as "The First Industrial Era", arose from the application of experimental designs in the chemical industry. It extended from the 1950s to the late 1970s and was shaped by the extensive work of G.E.P. Box and K.B. Wilson on Response Surface Methodology (RSM), which explores the relationships between several explanatory variables and one or more response variables. In recent years the use of similar experimental techniques in optimization and industry has increased enormously, owing largely to the growing emphasis on quality improvement and to the essential role played by the statistical methods of DOE. "The Second Industrial Era" began in the late 1970s with the work of the Japanese quality consultant Genichi Taguchi. His robust design method (RDM) became a leading approach to quality improvement: it considers both the mean response and its variation, choosing the levels of the controllable factors so that the response is insensitive to noise factors and both variability and bias are made small simultaneously.

Experimental design techniques are effective and powerful methods that are also becoming popular in computer-aided design and engineering based on computer simulation models. Their basic aim of maximizing the information obtained while minimizing the amount of data collected has had a revolutionary impact on scientists. This marks the beginning of the "Modern Era", from about 1990, when design techniques also became popular in other sectors of the economy.

We describe two combinatorial structures, the balanced incomplete block design and the Latin square design, and demonstrate their application in statistical analysis. Practical methods for analyzing the resulting experimental data are provided for each design. We focus on planning experiments efficiently and on carrying out the statistical analysis with R packages for experimental design.

Generating balanced incomplete block designs

Block designs are fundamental tools in experiments in which many varieties (treatments) have to be tested. They provide information efficiently when treatments have to be grouped into blocks, for instance because the experiment is expensive or the testing time should be minimized. However, the blocks or the experiment's budget may not be large enough for all treatments of interest to be applied in every block. An incomplete block design is one in which each block contains fewer than the full complement of treatments. The most intensely studied are the balanced incomplete block designs (BIBDs, or 2-designs), in which every pair of treatments occurs together in the same number of blocks; as a result, all differences between treatment effects are estimated with the same precision. The statistical analysis of such designs is considerably more complicated than for complete block designs, although, like them, they control only one source of nuisance variation.

A balanced incomplete block design with parameters (v,b,r,k,λ) is an ordered pair (V,B), where V is a finite v-element set of treatments (or varieties) and B is a family of b k-element subsets of V, called blocks, satisfying the following conditions:
(i) each block contains exactly k members, where k < v;
(ii) every treatment is contained in exactly r blocks (is replicated r times);
(iii) every 2-subset of V (pair of treatments) is contained in exactly λ blocks.

Thus a BIBD(v,b,r,k,λ) is an arrangement of b subsets of size k from a set of v treatments such that (i), (ii) and (iii) hold. All parameters, including λ, must be positive integers. Necessary, but not sufficient, conditions for the existence of a BIBD are

$$vr = bk, \qquad \lambda(v-1) = r(k-1), \qquad b \ge v.$$

The first condition follows from counting the treatment-block incidences in two ways, the second from counting the pairs that a fixed treatment forms with the other treatments inside its r blocks, and the third is Fisher's inequality.
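The three conditions are easy to screen mechanically. The Python sketch below (the function name `bibd_necessary_conditions` is hypothetical and not part of any R package used in this paper) checks a parameter set against them; a `True` result only means that a design is not ruled out.

```python
def bibd_necessary_conditions(v, b, r, k, lam):
    """Check the necessary (but not sufficient) conditions for a BIBD(v, b, r, k, lam)."""
    # All parameters must be positive integers and blocks must be incomplete (k < v).
    if min(v, b, r, k, lam) < 1 or k >= v:
        return False
    counts_agree = v * r == b * k                # treatment-block incidences counted twice
    pairs_agree = lam * (v - 1) == r * (k - 1)   # pairs containing a fixed treatment
    fisher = b >= v                              # Fisher's inequality
    return counts_agree and pairs_agree and fisher


print(bibd_necessary_conditions(7, 7, 3, 3, 1))    # True: the design used below
print(bibd_necessary_conditions(7, 7, 3, 3, 2))    # False: lam(v - 1) = 12, r(k - 1) = 6
print(bibd_necessary_conditions(22, 22, 7, 7, 2))  # True, yet no such design exists
```

The last call shows why the conditions are not sufficient: a symmetric design with v = b = 22, r = k = 7 and λ = 2 passes all three checks, but it is excluded by the Bruck-Ryser-Chowla theorem, which requires k − λ (here 5) to be a perfect square when a symmetric design has even v.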

If v = b, the BIBD is said to be symmetric.
There are several R packages for creating and analyzing experimental designs for research purposes. The package “crossdes” generates cross-over designs of various types, including Latin squares and BIBDs. Its function “find.BIB” generates a block design with the desired parameters as a matrix in which the rows correspond to the blocks and the columns to the treatments within each block. The R output is shown in Figure 1.

Figure 1 Fano plane.

Whether the resulting design is balanced can be checked with the function “isGYD”, which gives the following conclusion:

```r
library(crossdes)
# trt = 7 treatments, b = 7 blocks, k = 3 treatments per block
isGYD(find.BIB(7, 7, 3))
# The design is a balanced incomplete block design w.r.t. rows.
```

Other R packages that can be used to generate block designs include “ibd”, “AlgDesign” and “dae”.

The above design, BIBD(7,7,3,3,1), which is symmetric (v = b = 7), corresponds to the Steiner triple system of order 7, STS(7), which consists of a set V of 7 points and a collection B of subsets of V, called triples, such that each triple contains exactly 3 points and any two points lie together in exactly one triple. This system has a cyclic representation: let V = {0,1,...,6} be the integers mod 7; the triples are the set {1,2,4} of nonzero quadratic residues mod 7 together with its cyclic shifts {1+s, 2+s, 4+s} mod 7, s = 1,...,6. The system is also known as the projective plane of order 2, or the Fano plane, which is the projective plane with the smallest possible number of points and lines (7 of each), with 3 points on every line and 3 lines through every point.
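The cyclic construction can be verified directly. The following Python sketch is a stand-in for “find.BIB”, not a re-implementation of it; the helper names `develop_cyclically` and `bibd_parameters` are hypothetical. It develops the base block {1, 2, 4} modulo 7 and checks conditions (i) to (iii) by counting block sizes, replications and pair concurrences.

```python
from collections import Counter
from itertools import combinations


def develop_cyclically(base_block, v):
    """Return the v cyclic shifts {x + s mod v : x in base_block}, s = 0, ..., v - 1."""
    return [sorted((x + s) % v for x in base_block) for s in range(v)]


def bibd_parameters(blocks, v):
    """Return (r, k, lam) if the blocks on points 0..v-1 satisfy (i)-(iii), else None."""
    sizes = {len(block) for block in blocks}
    replication = Counter(x for block in blocks for x in block)
    concurrence = Counter(pair for block in blocks
                          for pair in combinations(sorted(block), 2))
    # (i) and (ii): a single block size and every point replicated equally often
    if len(sizes) != 1 or len(replication) != v or len(set(replication.values())) != 1:
        return None
    # (iii): every one of the v(v-1)/2 pairs occurs, and equally often
    if len(concurrence) != v * (v - 1) // 2 or len(set(concurrence.values())) != 1:
        return None
    return (next(iter(replication.values())), sizes.pop(),
            next(iter(concurrence.values())))


fano = develop_cyclically({1, 2, 4}, 7)   # {1, 2, 4} = nonzero quadratic residues mod 7
print(fano)
# [[1, 2, 4], [2, 3, 5], [3, 4, 6], [0, 4, 5], [1, 5, 6], [0, 2, 6], [0, 1, 3]]
print(bibd_parameters(fano, 7))                              # (3, 3, 1)
print(bibd_parameters(develop_cyclically({0, 1, 2}, 7), 7))  # None
```

The same check rejects the base block {0, 1, 2}: its cyclic shifts put some pairs together twice and others never, so condition (iii) fails.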

Statistical analysis of numerical example

Balanced incomplete block designs are typically used when all treatment comparisons are equally important but the researcher cannot run all treatments in every block. In such cases the treatments used in each block should be selected in a balanced manner, i.e. every pair of treatments occurs together the same number of times as any other pair.6
Consider a BIBD(v,b,r,k,λ) that satisfies conditions (i), (ii) and (iii).
The statistical model of the design is

$$y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}, \qquad i = 1, \ldots, v, \quad j = 1, \ldots, b$$

where

$y_{ij}$ is the observation on treatment $i$ in block $j$ (defined only when treatment $i$ occurs in block $j$),

$\mu$ is the general mean effect,

$\tau_i$ is the effect of the $i$th treatment,

$\beta_j$ is the effect of the $j$th block,

$\varepsilon_{ij}$ is the random error component, with $\varepsilon_{ij} \sim \mathrm{NID}(0, \sigma^2)$.

In the following illustrative example we use the already generated design BIBD(7,7,3,3,1). Suppose an experiment is run to compare v = 7 compositions of pharmacological substances in terms of reaction time in a biostatistical experiment. Further, assume that only 3 observations can be taken per day and that the experiment must be completed within 7 days, so each day forms a block. The incidence matrix in Table 1 has a 1 in cell (i, j) if treatment i is contained in block j and 0 otherwise; the observed data of the example are given in parentheses (Table 1).

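| Treatment | Block 1 | Block 2 | Block 3 | Block 4 | Block 5 | Block 6 | Block 7 |
|---|---|---|---|---|---|---|---|
| 1 | 1 (=73) | 0 | 0 | 0 | 1 (=64) | 0 | 1 (=66) |
| 2 | 1 (=71) | 1 (=68) | 0 | 0 | 0 | 1 (=65) | 0 |
| 3 | 0 | 1 (=67) | 1 (=72) | 0 | 0 | 0 | 1 (=72) |
| 4 | 1 (=75) | 0 | 1 (=74) | 1 (=73) | 0 | 0 | 0 |
| 5 | 0 | 1 (=71) | 0 | 1 (=69) | 1 (=70) | 0 | 0 |
| 6 | 0 | 0 | 1 (=68) | 0 | 1 (=67) | 1 (=71) | 0 |
| 7 | 0 | 0 | 0 | 1 (=71) | 0 | 1 (=75) | 1 (=74) |

Table 1 Incidence matrix for BIBD (7,7,3,3,1)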
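A compact way to confirm that the layout in Table 1 is balanced uses the incidence matrix N: for any BIBD, N Nᵀ = (r − λ)I + λJ, where I is the identity and J the all-ones matrix, so the diagonal holds r and every off-diagonal entry equals λ. The Python sketch below transcribes only the block structure of Table 1 (treatments 1 to 7, one list per day, ignoring the response values) and checks this identity.

```python
# Blocks (days) of Table 1, each listing the treatments it contains.
blocks = [[1, 2, 4], [2, 3, 5], [3, 4, 6], [4, 5, 7], [1, 5, 6], [2, 6, 7], [1, 3, 7]]
v, b = 7, len(blocks)

# Incidence matrix: N[i][j] = 1 if treatment i + 1 occurs in block j + 1, else 0.
N = [[1 if i + 1 in blocks[j] else 0 for j in range(b)] for i in range(v)]

# N N^T: diagonal entries count replications (r); off-diagonal entries count
# how many blocks two treatments share (lambda).
NNt = [[sum(N[i][j] * N[h][j] for j in range(b)) for h in range(v)] for i in range(v)]

for row in N:
    print(row)
r, lam = NNt[0][0], NNt[0][1]
is_balanced = all(NNt[i][h] == (r if i == h else lam) for i in range(v) for h in range(v))
print("r =", r, "lambda =", lam, "balanced:", is_balanced)
```

The printed rows of N reproduce the 0/1 pattern of Table 1, and the check confirms r = 3 and λ = 1.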

For this experiment we apply the intra-block analysis of variance, in which the treatment effects are estimated after eliminating the block effects from the normal equations. When blocks are incomplete, there are two sources of information about treatment effects, comparisons within blocks and comparisons between block totals, but most of the information usually comes from the within-block (intra-block) comparisons used here. Table 2 gives the intra-block analysis of variance table for testing the significance of the treatment effects.
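The intra-block analysis rests on the adjusted treatment totals. Writing $T_i$ for the total of treatment $i$, $B_j$ for the total of block $j$ and $n_{ij}$ for the entries of the incidence matrix, the standard intra-block formulas for a BIBD (see, e.g., Montgomery5) are:

```latex
\begin{aligned}
Q_i &= T_i - \frac{1}{k}\sum_{j=1}^{b} n_{ij} B_j, \qquad i = 1, \ldots, v,\\
\hat{\tau}_i &= \frac{k\,Q_i}{\lambda v},\\
SS_{Tr(adj)} &= \frac{k}{\lambda v}\sum_{i=1}^{v} Q_i^2 .
\end{aligned}
```

For example, treatment 1 in Table 1 has $T_1 = 73 + 64 + 66 = 203$ and appears in blocks 1, 5 and 7, whose totals are 219, 201 and 212, so $Q_1 = 203 - 632/3 = -23/3 \approx -7.67$ and $\hat{\tau}_1 = 3(-23/3)/7 = -23/7 \approx -3.29$.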

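| Source of variation | Sum of squares | Degrees of freedom | Mean square | F0 |
|---|---|---|---|---|
| Treatments (adjusted) | SSTr(adj) | v-1 | MSTr = SSTr(adj)/(v-1) | F0 = MSTr/MSE |
| Blocks (unadjusted) | SSBlocks(unadj) | b-1 | | |
| Intra-block error | SSE (by subtraction) | N-v-b+1 | MSE = SSE/(N-v-b+1) | |
| Total | SSTotal = ∑∑ y_ij² - G²/N | N-1 | | |

Table 2 Intra-block analysis of variance table for BIBD

Here G denotes the grand total of all observations and N = vr = bk the total number of observations.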
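The entries of Table 2 can be computed directly from the data. The Python sketch below (standard library only; `intra_block_anova` is a hypothetical helper, not the R code behind Table 3) transcribes Table 1 and evaluates the Table 2 formulas: the unadjusted block sum of squares, the adjusted treatment sum of squares $k\sum Q_i^2/(\lambda v)$, and the error sum of squares by subtraction. It does not compute p-values, which need an F-distribution routine.

```python
# Table 1 transcribed: one dict per block (day), mapping treatment -> reaction time.
data = [
    {1: 73, 2: 71, 4: 75}, {2: 68, 3: 67, 5: 71}, {3: 72, 4: 74, 6: 68},
    {4: 73, 5: 69, 7: 71}, {1: 64, 5: 70, 6: 67}, {2: 65, 6: 71, 7: 75},
    {1: 66, 3: 72, 7: 74},
]


def intra_block_anova(data, v, k, lam):
    """Intra-block ANOVA for a BIBD, following the layout of Table 2."""
    b = len(data)
    n_obs = b * k
    block_totals = [sum(block.values()) for block in data]
    correction = sum(block_totals) ** 2 / n_obs                       # G^2 / N
    ss_total = sum(y ** 2 for block in data for y in block.values()) - correction
    ss_blocks = sum(B ** 2 for B in block_totals) / k - correction    # unadjusted
    # Adjusted totals Q_i = T_i - (1/k) * (sum of totals of blocks containing i)
    q = []
    for i in range(1, v + 1):
        t_i = sum(block[i] for block in data if i in block)
        b_sum = sum(B for block, B in zip(data, block_totals) if i in block)
        q.append(t_i - b_sum / k)
    ss_treat = k * sum(qi ** 2 for qi in q) / (lam * v)               # adjusted
    ss_error = ss_total - ss_blocks - ss_treat                        # by subtraction
    df_treat, df_blocks, df_error = v - 1, b - 1, n_obs - v - b + 1
    ms_treat, ms_error = ss_treat / df_treat, ss_error / df_error
    return {
        "Treatments (adj)": (df_treat, ss_treat, ms_treat, ms_treat / ms_error),
        "Blocks (unadj)": (df_blocks, ss_blocks),
        "Error": (df_error, ss_error, ms_error),
        "Total": (n_obs - 1, ss_total),
    }


for source, row in intra_block_anova(data, v=7, k=3, lam=1).items():
    print(source, [round(x, 3) for x in row])
```

Run on the Table 1 values as transcribed here, the sketch gives a total sum of squares of about 214.29, equal to the sum of the three sums of squares in Table 3, but it divides this total differently among blocks, treatments and residuals, so Table 3 should be compared with this decomposition with some care. Note also that R's anova() reports sequential sums of squares, so the order of the terms in the lm() formula determines which factor is adjusted for the other.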

The form of the ANOVA used to analyze BIBD data depends on the type of analysis. The resulting F-test determines whether the researcher retains or rejects the null hypothesis of equal treatment effects. The null hypothesis of interest is

$$H_0: \tau_1 = \tau_2 = \cdots = \tau_7$$

and the alternative hypothesis is

$$H_1: \tau_i \neq \tau_j \text{ for at least one pair } (i, j)$$

Table 3 presents the results obtained with basic R functions: the linear model function lm() fits the model and anova() produces the traditional analysis of variance table (Table 3).

 

| | Df | Sum of squares | Mean square | F value | Pr(>F) |
|---|---|---|---|---|---|
| Blocks | 6 | 85.619 | 14.2698 | 1.1515 | 0.4146 |
| Treatments | 6 | 29.524 | 4.9206 | 0.3971 | 0.8617 |
| Residuals | 8 | 99.143 | 12.3929 | | |

Table 3 Analysis of variance table

The test of the null hypothesis $H_0$ is based on the rule that $H_0$ is rejected at significance level $\alpha$ if $F_0 > F_{\alpha,\, v-1,\, N-v-b+1}$. In Table 3 the p-value for treatments (0.8617) is well above 0.05, so $H_0$ is not rejected at the 5% significance level, and the difference between blocks is not significant either (p-value 0.4146 > 0.05).

Regression analysis is a powerful tool for understanding the relationship between one or more predictor variables and the response variable. The model assumes that the errors have mean zero and constant variance; if this is not the case, the model may not be valid. To verify these assumptions we check the adequacy of the model, which includes verifying the independence and normality of the residuals. The plots for the numerical example are shown in Figure 2. The first panel plots the residuals against the fitted values from the standard regression model for the BIBD: the residuals are scattered randomly around 0, which is consistent with constant error variance. If the residuals increased or decreased systematically with the fitted values, the errors might not have constant variance. The third panel, a normal Q-Q (quantile-quantile) plot, checks the normality of the residuals $e_{ij} = y_{ij} - \hat{y}_{ij}$. The second panel shows the residuals when the model is refitted to a logarithmically transformed response, for which the variance is more nearly constant (Figure 2).

Figure 2 Checking assumptions about residuals in regression analysis.
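Residual checks like those in Figure 2 start from the fitted values of the additive model. The Python sketch below assumes a least-squares fit of $y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}$ to the Table 1 data; it does not reproduce the R plots or the log-transformed model. It computes $\hat{\tau}_i = kQ_i/(\lambda v)$, the fitted values $\hat{y}_{ij}$ and the residuals $e_{ij} = y_{ij} - \hat{y}_{ij}$, which are the quantities one would plot against the fitted values or on a normal Q-Q plot.

```python
# Table 1 transcribed: one dict per block (day), mapping treatment -> reaction time.
data = [
    {1: 73, 2: 71, 4: 75}, {2: 68, 3: 67, 5: 71}, {3: 72, 4: 74, 6: 68},
    {4: 73, 5: 69, 7: 71}, {1: 64, 5: 70, 6: 67}, {2: 65, 6: 71, 7: 75},
    {1: 66, 3: 72, 7: 74},
]
v, k, lam = 7, 3, 1
block_totals = [sum(block.values()) for block in data]

# Intra-block estimates tau_hat_i = k * Q_i / (lam * v); they sum to zero.
tau_hat = {}
for i in range(1, v + 1):
    t_i = sum(block[i] for block in data if i in block)
    b_sum = sum(B for block, B in zip(data, block_totals) if i in block)
    tau_hat[i] = k * (t_i - b_sum / k) / (lam * v)

fitted, residuals = [], []
for block, B in zip(data, block_totals):
    # Least-squares mu_hat + beta_hat_j = (B_j - sum of tau_hat in block j) / k
    level = (B - sum(tau_hat[i] for i in block)) / k
    for i, y in block.items():
        fitted.append(level + tau_hat[i])
        residuals.append(y - (level + tau_hat[i]))

print("residual sum of squares:", round(sum(e ** 2 for e in residuals), 3))
for y_hat, e in zip(fitted, residuals):
    print(f"fitted {y_hat:7.3f}   residual {e:7.3f}")
```

As a consistency check, the residual sum of squares equals the error sum of squares obtained by subtraction in the intra-block analysis, and the residuals sum to zero within every block and within every treatment.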

Statistical computing for Latin square design

In this section we give a brief history of Latin square designs (LSDs), the basic statistical model and analysis of variance table, and finally a numerical example, estimated using R and appropriate packages. In 1782 the Swiss mathematician Leonhard Euler introduced Latin squares in connection with his famous Thirty-six Officers problem: given 6 regiments, each with officers of 6 distinct ranks, is it possible to arrange the 36 officers in a 6 × 6 grid so that each row and each column contains exactly one officer from each regiment and exactly one officer of each rank? Euler conjectured that no such arrangement exists, and G. Tarry confirmed this in 1900 by exhaustive enumeration. The problem is widely regarded as the starting point of the systematic study of Latin squares.10,12
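In modern terms, the Thirty-six Officers problem asks for two Latin squares of order 6, one for regiments and one for ranks, that are orthogonal: superimposing them yields every ordered pair of symbols exactly once. The Python sketch below (helper names are hypothetical) builds squares $L_a(i, j) = (a i + j) \bmod n$ and tests orthogonality. For prime n, any two such squares with distinct nonzero a are orthogonal, whereas for n = 6 this simple construction fails.

```python
def cyclic_latin_square(n, a):
    """L[i][j] = (a*i + j) mod n; a Latin square whenever gcd(a, n) = 1."""
    return [[(a * i + j) % n for j in range(n)] for i in range(n)]


def is_latin(square):
    """Each of the symbols 0..n-1 occurs exactly once in every row and column."""
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == symbols for j in range(n))
    return rows_ok and cols_ok


def are_orthogonal(A, B):
    """Superimposing A and B gives every ordered pair of symbols exactly once."""
    n = len(A)
    return len({(A[i][j], B[i][j]) for i in range(n) for j in range(n)}) == n * n


regiments, ranks = cyclic_latin_square(5, 1), cyclic_latin_square(5, 2)
print(is_latin(regiments), is_latin(ranks), are_orthogonal(regiments, ranks))  # True True True
print(are_orthogonal(cyclic_latin_square(6, 1), cyclic_latin_square(6, 5)))    # False
```

The order-5 pair solves the analogue of the officers problem with 25 officers; for order 6 the failure of this particular construction is only an illustration, not a proof of non-existence.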

A Latin square of order n is an n × n array of n distinct symbols from a set N of cardinality n, such that each symbol appears exactly once in each row and exactly once in each column. Latin squares are widely used in experimental design, in particular in agricultural, biological and medical experiments. The LSD is highly effective for controlling two sources of nuisance variation, and the principle can be extended to control more than two sources. The design is useful for studying the effects of a single treatment factor together with two blocking variables, each with the same number of levels as the treatment factor. The statistical model for a Latin square of order p is

$$y_{ijk} = \mu + \alpha_i + \tau_j + \beta_k + \varepsilon_{ijk}, \qquad i, j, k = 1, \ldots, p$$

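where:
$y_{ijk}$ is the observation in the $i$th row and $k$th column for the $j$th treatment,
$\mu$ is the overall mean,
$\alpha_i$ is the $i$th row effect,
$\tau_j$ is the $j$th treatment effect,
$\beta_k$ is the $k$th column effect,
$\varepsilon_{ijk}$ is the random error component.
Only $p^2$ of the $p^3$ index combinations occur, one for each cell of the square, because the row and column determine the treatment.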
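In practice the square used in an experiment is chosen at random. The Python sketch below assumes that randomly permuting the rows, columns and symbols of a cyclic square is adequate (this does not sample uniformly from all Latin squares of order 4); it allocates the four dose levels L, M, H and C to volunteers (rows) and positions (columns). The function name and the seed are arbitrary, and the resulting layout need not match the one used in the experiment below.

```python
import random


def randomized_latin_square(treatments, seed=None):
    """Randomly permute the rows, columns and symbols of a cyclic Latin square."""
    rng = random.Random(seed)
    p = len(treatments)
    base = [[(i + j) % p for j in range(p)] for i in range(p)]   # cyclic square
    rows, cols, symbols = list(range(p)), list(range(p)), list(treatments)
    rng.shuffle(rows)
    rng.shuffle(cols)
    rng.shuffle(symbols)
    # Permuting rows, columns and symbols preserves the Latin property.
    return [[symbols[base[r][c]] for c in cols] for r in rows]


layout = randomized_latin_square(["L", "M", "H", "C"], seed=2018)   # arbitrary seed
for volunteer, row in enumerate(layout, start=1):
    print(f"Volunteer {volunteer}:", "  ".join(row))
```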

Consider an experiment conducted to investigate the clinical effect of a new medical product. Four volunteers each received four different doses of the medicine, labelled by level: L = "Low", M = "Medium", H = "High" and C = "Critical". Table 5 shows the order (position) in which the treatments were given and the clinical result (change in heart rate) for each volunteer and treatment, and Table 4 gives the general analysis of variance layout for a Latin square design. The analysis of an experiment may involve several types of tests, so before running the experiment the researcher should prepare an overall plan that includes the tests to be used in the data analysis (Tables 4 and 5).

| Source of variation | Sum of squares | Degrees of freedom | Mean square | F0 |
|---|---|---|---|---|
| Treatments | SSTreatments | p-1 | MSTreatments | F0 = MSTreatments/MSE |
| Rows | SSRows | p-1 | MSRows | |
| Columns | SSColumns | p-1 | MSColumns | |
| Error | SSE | (p-2)(p-1) | MSE | |
| Total | SSTotal | p²-1 | | |

Table 4 Analysis of variance table for the Latin square design

 

| | Position 1 | Position 2 | Position 3 | Position 4 |
|---|---|---|---|---|
| Volunteer 1 | H = 26.7 | C = 19.7 | M = 29 | L = 29.8 |
| Volunteer 2 | L = 23.1 | M = 21.7 | C = 24.9 | H = 29 |
| Volunteer 3 | M = 29.3 | L = 20.1 | H = 29 | C = 27.3 |
| Volunteer 4 | C = 25.1 | H = 17.4 | L = 28.7 | M = 35.1 |

Table 5 Data for the clinical effect

In some circumstances, a preliminary analysis may reveal interesting results that the preplanned tests cannot address.

Before proceeding with the ANOVA of the LSD, we draw box-and-whisker plots, a standardized way of displaying the distribution of data based on the five-number summary: minimum, first quartile, median, third quartile and maximum (Figure 3).

Figure 3 Box plots for Latin square design.
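The box plots in Figure 3 are built from five-number summaries. The Python sketch below computes Tukey's summary (minimum, lower hinge, median, upper hinge, maximum) with the same depth rule as R's fivenum(), which boxplot() uses through boxplot.stats(); the hinges can differ slightly from quartiles computed by other rules. It is applied to the four positions of Table 5; volunteers and treatments can be grouped the same way.

```python
import math


def fivenum(values):
    """Tukey's five-number summary, using the same depth rule as R's fivenum()."""
    x = sorted(values)
    n = len(x)
    n4 = ((n + 3) // 2) / 2                          # floor((n + 3) / 2) / 2
    depths = [1, n4, (n + 1) / 2, n + 1 - n4, n]     # 1-based positions in x
    return [0.5 * (x[math.floor(d) - 1] + x[math.ceil(d) - 1]) for d in depths]


# Responses from Table 5, grouped by position (columns).
positions = {
    "Position 1": [26.7, 23.1, 29.3, 25.1],
    "Position 2": [19.7, 21.7, 20.1, 17.4],
    "Position 3": [29.0, 24.9, 29.0, 28.7],
    "Position 4": [29.8, 29.0, 27.3, 35.1],
}
for name, values in positions.items():
    print(name, [round(s, 2) for s in fivenum(values)])
```

With four observations per group, the hinges are simply the means of the two lowest and of the two highest values.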

Figure 3 suggests that the differences between volunteers are small, those between treatments moderate and those between positions large. We now confirm these graphical observations with the analysis of variance table. ANOVA is a set of statistical methods used mainly to compare the means of two or more samples. It can be treated as a special case of the general linear regression model in which the predictor variables are factors; each value that a factor can take is referred to as a level. The built-in R function aov() fits such a model and partitions the variability of the response among the factors. The results for the numerical example are listed in Table 6.

Significance of the F-test for the null hypothesis $H_0: \tau_i = 0$ for all $i$ versus $H_a: \tau_i \neq 0$ for at least one $i$:

  1. The difference between volunteers is not significant (p-value = 0.50 > 0.1).
  2. The difference between positions is highly significant (p-value = 0.0011 < 0.01).
  3. Among the treatments, there is no evidence of significant differences at the 1% and 5% significance levels (p-value = 0.063), although the difference is significant at the 10% level (Table 6).

 

| | Df | Sum of squares | Mean square | F value | Pr(>F) |
|---|---|---|---|---|---|
| Volunteer | 3 | 9.427 | 3.142 | 0.8821 | 0.501548 |
| Positions | 3 | 245.912 | 81.971 | 23.0106 | 0.001084 ** |
| Treatments | 3 | 45.277 | 15.092 | 4.2367 | 0.062818 . |
| Residuals | 6 | 21.374 | 3.562 | | |

Table 6 Analysis of variance table for the clinical data

Note: *** significant at 0.1%, ** at 1%, * at 5%, . at 10%
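The Table 4 formulas can be checked against Table 6 by hand. The Python sketch below (standard library only; `latin_square_anova` is a hypothetical helper, not the aov() call used for Table 6) transcribes Table 5, computes the volunteer (row), position (column) and treatment sums of squares from their totals, and obtains the error sum of squares by subtraction with (p − 2)(p − 1) degrees of freedom.

```python
# Table 5 transcribed: rows = volunteers, columns = positions, cells = (dose, response).
table5 = [
    [("H", 26.7), ("C", 19.7), ("M", 29.0), ("L", 29.8)],
    [("L", 23.1), ("M", 21.7), ("C", 24.9), ("H", 29.0)],
    [("M", 29.3), ("L", 20.1), ("H", 29.0), ("C", 27.3)],
    [("C", 25.1), ("H", 17.4), ("L", 28.7), ("M", 35.1)],
]


def latin_square_anova(square):
    """ANOVA for a p x p Latin square following the layout of Table 4."""
    p = len(square)
    ys = [y for row in square for _, y in row]
    cf = sum(ys) ** 2 / p ** 2                                 # G^2 / N
    ss_total = sum(y ** 2 for y in ys) - cf
    row_totals = [sum(y for _, y in row) for row in square]
    col_totals = [sum(square[i][j][1] for i in range(p)) for j in range(p)]
    trt_totals = {}
    for row in square:
        for dose, y in row:
            trt_totals[dose] = trt_totals.get(dose, 0.0) + y
    ss = {
        "Volunteer": sum(t ** 2 for t in row_totals) / p - cf,
        "Positions": sum(t ** 2 for t in col_totals) / p - cf,
        "Treatments": sum(t ** 2 for t in trt_totals.values()) / p - cf,
    }
    ss_error = ss_total - sum(ss.values())                     # by subtraction
    df, df_error = p - 1, (p - 2) * (p - 1)
    ms_error = ss_error / df_error
    table = {name: (df, s, s / df, (s / df) / ms_error) for name, s in ss.items()}
    table["Residuals"] = (df_error, ss_error, ms_error)
    return table


for source, row in latin_square_anova(table5).items():
    print(source, [round(x, 4) for x in row])
```

Its sums of squares and F values agree with Table 6 up to rounding (for example, 245.912 for positions, with F ≈ 23.01). The p-values would require an F-distribution routine, for instance scipy.stats.f.sf(F, 3, 6).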

Considerations and remarks

This paper explores the application of BIBDs and LSDs in the statistical design of experiments. We review the simplest combinatorial designs in order to summarize the basic idea of their use. The main reason for choosing these designs is the opportunity to compare structures with one and with two sources of variation. Block designs control error in only one direction (block variation), whereas the Latin square design eliminates two sources of nuisance variation, namely rows and columns, from the comparison of treatment effects.

As an extension of this work, we plan to consider particular cases of these combinatorial designs applied to other statistical models, and to explore and improve their computational features in R.

Acknowledgements

Teresa A. Oliveira was partially sponsored by national funds through the Fundação Nacional para a Ciência e Tecnologia, Portugal - FCT under the project UID/MAT/00006/2013.

Conflict of interest

The authors declare that there is no conflict of interest.

References

  1. Colbourn C, Dinitz J. CRC Handbook of Combinatorial Designs. Boca Raton: CRC Press; 2007.
  2. Beth T, Jungnickel D, Lenz H. Design Theory. Cambridge: Cambridge University Press; 1999.
  3. Hedayat AS, Sloane NJA, Stufken J. Orthogonal arrays: theory and applications. Springer; 1999.
  4. John PWM. Incomplete block designs. Marcel Dekker; 1980.
  5. Montgomery DC. Design and analysis of experiments. 5th ed. John Wiley and Sons; 2001.
  6. Oliveira TA. BIB designs with repeated blocks: review and perspectives. Proceedings of the Tenth Islamic Countries Conference on Statistical Sciences (ICCS-X), Volume I. The Islamic Countries Society of Statistical Sciences, Lahore: Pakistan. 2010;82–96.
  7. Brouwer AE. The number of mutually orthogonal Latin squares: a table up to order 10000. Report ZW 123. Amsterdam: Math Center; 1978.
  8. Johnson DM, Dulmage AL, Mendelsohn NS. Orthomorphisms of groups and orthogonal Latin squares. Canadian J Math. 1961;13:356–372.
  9. Schellenberg PJ, Van Rees GM, Vanstone SA. Four pairwise orthogonal Latin squares of order 15. Ars Combinatoria. 1978;6:141–150.
  10. Davim JP. Statistical and Computational Techniques in Manufacturing. Springer; 2012.
  11. Todorov DT. Four mutually orthogonal Latin squares of order 14. Journal of Combinatorial Designs. 2012;20:1–5.
  12. Abel RJR, Todorov DT. Four mutually orthogonal Latin squares of orders 20, 30, 38 and 44. J Comb Theory Ser A. 1993;64:144–148.
©2018 Valcheva, et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and building upon the work non-commercially.