Association scheme

The theory of association schemes arose in statistics, in the theory of experimental design for the analysis of variance.^[1]^[2]^[3] In mathematics, association schemes belong to both algebra and combinatorics. In algebraic combinatorics, association schemes provide a unified approach to many topics, for example combinatorial designs and the theory of error-correcting codes.^[4]^[5] In algebra, association schemes generalize groups, and the theory of association schemes generalizes the character theory of linear representations of groups.^[6]^[7]^[8]

Definition

An n-class association scheme consists of a set X together with a partition S of X × X into n + 1 binary relations, R₀, R₁, ..., R_n which satisfy:

$R_{0}=\{(x,x):x\in X\}$ ; it is called the identity relation.
For each relation $R$ in $S$, the converse relation $R^{*}:=\{(x,y):(y,x)\in R\}$ is also in $S$.
If $(x,y)\in R_{k}$ , the number of $z\in X$ such that $(x,z)\in R_{i}$ and $(z,y)\in R_{j}$ is a constant $p_{ij}^{k}$ depending on $i$ , $j$ , $k$ but not on the particular choice of $x$ and $y$ .

An association scheme is commutative if $p_{ij}^{k}=p_{ji}^{k}$ for all $i$ , $j$ and $k$ . Most authors assume this property. Note, however, that while the notion of an association scheme generalizes the notion of a group, the notion of a commutative association scheme only generalizes the notion of a commutative group.

A symmetric association scheme is one in which each $R_{i}$ is a symmetric relation. That is:

if (x, y) ∈ R_i, then (y, x) ∈ R_i. (Or equivalently, R* = R.)

Every symmetric association scheme is commutative.

Two points x and y are called ith associates if $(x,y)\in R_{i}$ . In a symmetric scheme, if x and y are ith associates then so are y and x. Every pair of points are ith associates for exactly one $i$ . Each point is its own zeroth associate, while distinct points are never zeroth associates. If x and y are kth associates, then the number of points $z$ which are both ith associates of $x$ and jth associates of $y$ is a constant $p_{ij}^{k}$ .
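The three axioms can be checked mechanically on a small example. The following is a minimal sketch, not a reference implementation: it assumes the scheme is given as a label matrix `label[x][y] = i` meaning $(x,y)\in R_{i}$, and the function name `intersection_numbers` and the 4-cycle example are illustrative choices that do not come from the text above.

```python
from itertools import product

def intersection_numbers(label):
    """Return {(i, j, k): p_ij^k} if `label` describes an association scheme.

    `label[x][y]` is the index i of the relation R_i containing (x, y).
    Raises ValueError as soon as one of the three axioms fails.
    """
    v = len(label)
    classes = sorted({label[x][y] for x in range(v) for y in range(v)})
    # Axiom 1: R_0 is exactly the identity relation.
    for x, y in product(range(v), repeat=2):
        if (label[x][y] == 0) != (x == y):
            raise ValueError("R_0 is not the identity relation")
    # Axiom 2: the converse of every relation lies inside a single relation.
    converse = {}
    for x, y in product(range(v), repeat=2):
        if converse.setdefault(label[x][y], label[y][x]) != label[y][x]:
            raise ValueError("the converse of a relation is not a relation")
    # Axiom 3: the number of z depends only on (i, j, k), not on (x, y).
    p = {}
    for x, y in product(range(v), repeat=2):
        k = label[x][y]
        counts = {(i, j): 0 for i in classes for j in classes}
        for z in range(v):
            counts[(label[x][z], label[z][y])] += 1
        for (i, j), c in counts.items():
            if p.setdefault((i, j, k), c) != c:
                raise ValueError(f"p_{i}{j}^{k} is not constant")
    return p

# Illustrative example (an assumption): the 4-cycle 0-1-2-3-0,
# with each pair of vertices labelled by its graph distance.
C4 = [[0, 1, 2, 1],
      [1, 0, 1, 2],
      [2, 1, 0, 1],
      [1, 2, 1, 0]]
p = intersection_numbers(C4)
is_commutative = all(p[i, j, k] == p[j, i, k] for (i, j, k) in p)
```

Here `p[1, 1, 0]` is the valency of $R_{1}$, and `is_commutative` tests the condition $p_{ij}^{k}=p_{ji}^{k}$ directly.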

Graph interpretation and adjacency matrices

A symmetric association scheme can be visualized as a complete graph with labeled edges. The graph has $v$ vertices, one for each point of $X$ , and the edge joining vertices $x$ and $y$ is labeled $i$ if $x$ and $y$ are $i$ th associates. Each edge has a unique label, and the number of triangles with a fixed base labeled $k$ having the other edges labeled $i$ and $j$ is a constant $p_{ij}^{k}$ , depending on $i,j,k$ but not on the choice of the base. In particular, each vertex is incident with exactly $p_{ii}^{0}=v_{i}$ edges labeled $i$ ; $v_{i}$ is the valency of the relation $R_{i}$ . There are also loops labeled $0$ at each vertex $x$ , corresponding to $R_{0}$ .

The relations are described by their adjacency matrices. $A_{i}$ is the adjacency matrix of $R_{i}$ for $i=0,\ldots ,n$ and is a v × v matrix with rows and columns labeled by the points of $X$ .

$$\left(A_{i}\right)_{x,y}={\begin{cases}1,&{\mbox{if }}(x,y)\in R_{i},\\0,&{\mbox{otherwise.}}\end{cases}}\qquad (1)$$

The definition of a symmetric association scheme is equivalent to saying that the $A_{i}$ are v × v (0,1)-matrices which satisfy

I. $A_{i}$ is symmetric,
II. $\sum _{i=0}^{n}A_{i}=J$ (the all-ones matrix),
III. $A_{0}=I$,
IV. $A_{i}A_{j}=\sum _{k=0}^{n}p_{ij}^{k}A_{k}=A_{j}A_{i}$ for $i,j=0,\ldots ,n$.

The (x, y)-th entry of the left side of (IV) is the number of paths of length two from x to y in the graph whose first edge is labeled i and whose second edge is labeled j. Each row and each column of $A_{i}$ contains exactly $v_{i}$ ones:

$$A_{i}J=JA_{i}=v_{i}J.\qquad (2)$$
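Conditions I to IV and equation (2) can be verified numerically for a small scheme. The sketch below uses plain Python lists rather than a matrix library. The helper names (`adjacency_matrices`, `expand_product`) and the 4-cycle example are assumptions made for illustration only.

```python
def adjacency_matrices(label):
    """Split a label matrix into the (0,1)-matrices A_0, ..., A_n of equation (1)."""
    v = len(label)
    n = max(max(row) for row in label)
    return [[[1 if label[x][y] == i else 0 for y in range(v)]
             for x in range(v)] for i in range(n + 1)]

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Plain integer matrix product."""
    return [[sum(a[x][z] * b[z][y] for z in range(len(b)))
             for y in range(len(b[0]))] for x in range(len(a))]

def expand_product(A, i, j):
    """Return [p_ij^0, ..., p_ij^n] with A_i A_j = sum_k p_ij^k A_k,
    or None if the product is not constant on the support of some A_k."""
    prod = matmul(A[i], A[j])
    v = len(prod)
    coeffs = []
    for Ak in A:
        values = {prod[x][y] for x in range(v) for y in range(v) if Ak[x][y]}
        if len(values) != 1:
            return None
        coeffs.append(values.pop())
    return coeffs

# Illustrative example (an assumption): the 4-cycle labelled by graph distance.
C4 = [[0, 1, 2, 1],
      [1, 0, 1, 2],
      [2, 1, 0, 1],
      [1, 2, 1, 0]]
A = adjacency_matrices(C4)
v = len(C4)
J = [[1] * v for _ in range(v)]

cond_I = all(Ai == transpose(Ai) for Ai in A)
cond_II = [[sum(Ai[x][y] for Ai in A) for y in range(v)] for x in range(v)] == J
cond_III = A[0] == [[int(x == y) for y in range(v)] for x in range(v)]
cond_IV = all(expand_product(A, i, j) is not None
              and matmul(A[i], A[j]) == matmul(A[j], A[i])
              for i in range(len(A)) for j in range(len(A)))
# Equation (2): every row of A_i J is constant, equal to the valency v_i.
valencies = [matmul(Ai, J)[0][0] for Ai in A]
```

Because the $A_{k}$ have disjoint supports that cover all entries, a product $A_{i}A_{j}$ is a linear combination of the $A_{k}$ exactly when it is constant on each support, which is what `expand_product` tests.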

Terminology

The numbers $p_{ij}^{k}$ are called the parameters of the scheme. They are also referred to as the structural constants.

History

The term association scheme is due to (Bose & Shimamoto 1952) but the concept is already inherent in (Bose & Nair 1939).^[9] These authors were studying what statisticians have called partially balanced incomplete block designs (PBIBDs). The subject became an object of algebraic interest with the publication of (Bose & Mesner 1959) and the introduction of the Bose–Mesner algebra. The most important contribution to the theory was the thesis of P. Delsarte (Delsarte 1973) who recognized and fully used the connections with coding theory and design theory.^[10] Generalizations have been studied by D. G. Higman (coherent configurations) and B. Weisfeiler (cellular algebras).

Basic facts

$p_{00}^{0}=1$ , i.e., if $(x,y)\in R_{0}$ then $x=y$ and the only $z$ such that $(x,z)\in R_{0}$ is $z=x$ .
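$\sum _{i=0}^{n}p_{ii}^{0}=|X|$ in a symmetric scheme, where $p_{ii}^{0}=v_{i}$ is the valency of $R_{i}$ ; this is because the $R_{i}$ partition $X\times X$ , so for each fixed $x$ every pair $(x,z)$ lies in exactly one $R_{i}$ .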

The Bose–Mesner algebra

The adjacency matrices $A_{i}$ of the graphs $\left(X,R_{i}\right)$ generate a commutative and associative algebra ${\mathcal {A}}$ (over the real or complex numbers) both for the matrix product and the pointwise product. This associative, commutative algebra is called the Bose–Mesner algebra of the association scheme.

Since the matrices in ${\mathcal {A}}$ are symmetric and commute with each other, they can be diagonalized simultaneously. Therefore, ${\mathcal {A}}$ is semi-simple and has a unique basis of primitive idempotents $J_{0},\ldots ,J_{n}$ .

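There is another algebra, consisting of $(n+1)\times (n+1)$ matrices, which is isomorphic to ${\mathcal {A}}$ and is often easier to work with. It is spanned by the intersection matrices $L_{0},\ldots ,L_{n}$ , where the $(k,j)$ entry of $L_{i}$ is $p_{ij}^{k}$ .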
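One common way to realize the smaller algebra is through intersection matrices $L_{i}$, with $(L_{i})_{kj}=p_{ij}^{k}$; this indexing convention is an assumption here, and some authors use the transpose. The sketch below builds the $L_{i}$ for an illustrative scheme (the 6-cycle labelled by graph distance, also an assumption) and checks that they multiply with the same parameters as the $A_{i}$, that is, $L_{i}L_{j}=\sum _{k}p_{ij}^{k}L_{k}$.

```python
def intersection_numbers(label):
    """p[i][j][k] = p_ij^k, read off from one representative pair per class.
    Assumes `label` really is the label matrix of an association scheme."""
    v = len(label)
    n = max(max(row) for row in label)
    p = [[[0] * (n + 1) for _ in range(n + 1)] for _ in range(n + 1)]
    for k in range(n + 1):
        x, y = next((x, y) for x in range(v) for y in range(v)
                    if label[x][y] == k)
        for z in range(v):
            p[label[x][z]][label[z][y]][k] += 1
    return p

def intersection_matrix(p, i):
    """L_i with (L_i)[k][j] = p_ij^k (convention assumed for this sketch)."""
    size = len(p)
    return [[p[i][j][k] for j in range(size)] for k in range(size)]

def matmul(a, b):
    return [[sum(a[r][t] * b[t][c] for t in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

def same_structure_constants(p):
    """Check L_i L_j == sum_k p_ij^k L_k for all i, j."""
    size = len(p)
    L = [intersection_matrix(p, i) for i in range(size)]
    for i in range(size):
        for j in range(size):
            rhs = [[sum(p[i][j][k] * L[k][r][c] for k in range(size))
                    for c in range(size)] for r in range(size)]
            if matmul(L[i], L[j]) != rhs:
                return False
    return True

# Illustrative 3-class scheme (an assumption): the 6-cycle by graph distance.
C6 = [[min((x - y) % 6, (y - x) % 6) for y in range(6)] for x in range(6)]
p = intersection_numbers(C6)
L1 = intersection_matrix(p, 1)
ok = same_structure_constants(p)
```

For the 6-cycle the matrices $L_{i}$ are only $4\times 4$, whereas the $A_{i}$ are $6\times 6$; the gap grows quickly for larger schemes, which is one reason the smaller algebra is convenient.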

Examples

The Johnson scheme, denoted by J(v, k), is defined as follows. Let S be a set with v elements. The points of the scheme J(v, k) are the ${v \choose k}$ subsets of S with k elements. Two k-element subsets A, B of S are ith associates when their intersection has size k − i.
The Hamming scheme, denoted by H(n, q), is defined as follows. The points of H(n, q) are the qⁿ ordered n-tuples over a set of size q. Two n-tuples x, y are said to be ith associates if they disagree in exactly i coordinates. E.g., if x = (1,0,1,1), y = (1,1,1,1), z = (0,0,1,1), then x and y are 1st associates, x and z are 1st associates and y and z are 2nd associates in H(4,2).
A distance-regular graph, G, forms an association scheme by defining two vertices to be ith associates if their distance is i.
A finite group G yields an association scheme on $X=G$ , with a class R_g for each group element, as follows: for each $g\in G$ let $R_{g}=\{(x,y)\mid x=g*y\}$ where $*$ is the group operation. The class of the group identity is R₀. This association scheme is commutative if and only if G is abelian.
A specific 3-class association scheme:^[11]

Let A(3) be the following association scheme with three associate classes on the set X = {1,2,3,4,5,6}. The (i, j) entry is s if elements i and j are in relation R_s.

	1	2	3	4	5	6
1	0	1	1	2	3	3
2	1	0	1	3	2	3
3	1	1	0	3	3	2
4	2	3	3	0	1	1
5	3	2	3	1	0	1
6	3	3	2	1	1	0
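The examples above can be built and tested in a few lines. The following sketch constructs label matrices for H(n, q), J(v, k) and the group scheme of a cyclic group, copies the A(3) table, and runs a generic axiom check on each. All function names are hypothetical, and the cyclic group $\mathbb{Z}_{m}$ (written additively) is chosen only as a convenient instance of the group construction.

```python
from itertools import combinations, product

def hamming_distance(x, y):
    return sum(a != b for a, b in zip(x, y))

def hamming_labels(n, q):
    """H(n, q): n-tuples over {0, ..., q-1}, labelled by Hamming distance."""
    points = list(product(range(q), repeat=n))
    return [[hamming_distance(x, y) for y in points] for x in points]

def johnson_labels(v, k):
    """J(v, k): k-subsets of {0, ..., v-1}; label i means |A & B| = k - i."""
    points = [frozenset(c) for c in combinations(range(v), k)]
    return [[k - len(a & b) for b in points] for a in points]

def cyclic_group_labels(m):
    """Group scheme of Z_m: (x, y) is in R_g when x = g + y (mod m)."""
    return [[(x - y) % m for y in range(m)] for x in range(m)]

# The table of A(3), with points 1..6 stored at indices 0..5.
A3 = [[0, 1, 1, 2, 3, 3],
      [1, 0, 1, 3, 2, 3],
      [1, 1, 0, 3, 3, 2],
      [2, 3, 3, 0, 1, 1],
      [3, 2, 3, 1, 0, 1],
      [3, 3, 2, 1, 1, 0]]

def is_association_scheme(label):
    """Check the three axioms of the definition on a label matrix."""
    v = len(label)
    pairs = list(product(range(v), repeat=2))
    # R_0 must be the identity relation.
    if any((label[x][y] == 0) != (x == y) for x, y in pairs):
        return False
    converse, profile = {}, {}
    for x, y in pairs:
        k = label[x][y]
        # The converse of R_k must lie in a single class.
        if converse.setdefault(k, label[y][x]) != label[y][x]:
            return False
        # The multiset of (i, j) over all z must depend only on k.
        counts = {}
        for z in range(v):
            key = (label[x][z], label[z][y])
            counts[key] = counts.get(key, 0) + 1
        if profile.setdefault(k, counts) != counts:
            return False
    return True

# Valencies of A(3): how often each label occurs in the first row.
valencies_A3 = [A3[0].count(i) for i in range(4)]
```

The check is cubic in the number of points, which is fine for the small instances used here but not intended for large schemes.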

Coding theory

The Hamming scheme and the Johnson scheme are of major significance in classical coding theory.

In coding theory, association scheme theory is mainly concerned with the distance of a code. The linear programming method produces upper bounds for the size of a code with given minimum distance, and lower bounds for the size of a design with a given strength. The most specific results are obtained in the case where the underlying association scheme satisfies certain polynomial properties; this leads one into the realm of orthogonal polynomials. In particular, some universal bounds are derived for codes and designs in polynomial-type association schemes.

In classical coding theory, dealing with codes in a Hamming scheme, the MacWilliams transform involves a family of orthogonal polynomials known as the Krawtchouk polynomials. These polynomials give the eigenvalues of the distance relation matrices of the Hamming scheme.
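The eigenvalue statement can be tested directly in the binary case. The sketch below assumes the standard explicit formula $K_{k}(x)=\sum _{j}(-1)^{j}(q-1)^{k-j}{\binom {x}{j}}{\binom {n-x}{k-j}}$, which is not stated in the text above, and uses the characters $\chi _{u}(x)=(-1)^{u\cdot x}$ as candidate eigenvectors for H(n, 2). The function names are illustrative.

```python
from itertools import product
from math import comb

def krawtchouk(n, q, k, x):
    """K_k(x) = sum_j (-1)^j (q-1)^(k-j) C(x, j) C(n-x, k-j)."""
    return sum((-1) ** j * (q - 1) ** (k - j) * comb(x, j) * comb(n - x, k - j)
               for j in range(k + 1))

def check_binary_eigenvalues(n):
    """For H(n, 2), check that A_k chi_u = K_k(wt(u)) chi_u for all u and k,
    where chi_u(x) = (-1)^(u . x) and wt(u) is the number of ones in u."""
    points = list(product(range(2), repeat=n))
    for u in points:
        weight = sum(u)
        chi = {x: (-1) ** sum(a * b for a, b in zip(u, x)) for x in points}
        for k in range(n + 1):
            eigenvalue = krawtchouk(n, 2, k, weight)
            for x in points:
                # (A_k chi)(x): sum of chi over the k-th associates of x.
                total = sum(chi[y] for y in points
                            if sum(a != b for a, b in zip(x, y)) == k)
                if total != eigenvalue * chi[x]:
                    return False
    return True

# Eigenvalues of the distance-1 relation of H(3, 2), i.e. of the cube graph.
cube_eigenvalues = [krawtchouk(3, 2, 1, x) for x in range(4)]
verified = check_binary_eigenvalues(3)
```

For $k=1$ the formula reduces to $K_{1}(x)=n(q-1)-qx$, so `cube_eigenvalues` lists the familiar spectrum of the 3-dimensional cube graph.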

Notes

^ Bailey 2004, pg. 387
^ Bose & Mesner 1959
^ Bose & Nair 1939
^ Bannai & Ito 1984
^ Godsil 1993
^ Bailey 2004, pg. 387
^ Zieschang 2005b
^ Zieschang 2005a
^ Dembowski 1968, pg. 281, footnote 1
^ Bannai & Ito 1984, pg. vii
^ Street & Street 1987, pg. 238

References

Bailey, Rosemary A. (2004), Association Schemes: Designed Experiments, Algebra and Combinatorics, Cambridge University Press, ISBN 978-0-521-82446-0, MR 2047311. (Chapters from preliminary draft are available on-line.)
Bannai, Eiichi; Ito, Tatsuro (1984), Algebraic combinatorics I: Association schemes, Menlo Park, CA: Benjamin/Cummings, ISBN 0-8053-0490-8, MR 0882540
Bose, R. C.; Mesner, D. M. (1959), "On linear associative algebras corresponding to association schemes of partially balanced designs", Annals of Mathematical Statistics, 30 (1): 21–38, doi:10.1214/aoms/1177706356, JSTOR 2237117, MR 0102157
Bose, R. C.; Nair, K. R. (1939), "Partially balanced incomplete block designs", Sankhyā, 4 (3): 337–372, JSTOR 40383923
Bose, R. C.; Shimamoto, T. (1952), "Classification and analysis of partially balanced incomplete block designs with two associate classes", Journal of the American Statistical Association, 47 (258): 151–184, doi:10.1080/01621459.1952.10501161
Camion, P. (1998), "18. Codes and Association Schemes: Basic Properties of Association Schemes Relevant to Coding", in Pless, V.S.; Huffman, W.C.; Brualdi, R.A. (eds.), Handbook of Coding Theory, vol. 1, Elsevier, pp. 1441–, ISBN 978-0-444-50088-5
Delsarte, P. (1973), "An Algebraic Approach to the Association Schemes of Coding Theory", Philips Research Reports (Supplement No. 10), OCLC 641852316
Delsarte, P.; Levenshtein, V. I. (1998). "Association schemes and coding theory". IEEE Transactions on Information Theory. 44 (6): 2477–2504. doi:10.1109/18.720545.
Dembowski, P. (1968), Finite Geometries, Springer, ISBN 978-3-540-61786-0
Godsil, C. D. (1993), Algebraic Combinatorics, New York: Chapman and Hall, ISBN 0-412-04131-6, MR 1220704
MacWilliams, F.J.; Sloane, N.J.A. (1977), The Theory of Error Correcting Codes, North-Holland Mathematical Library, vol. 16, Elsevier, ISBN 978-0-444-85010-2
Street, Anne Penfold; Street, Deborah J. (1987), Combinatorics of Experimental Design, Oxford U. P. [Clarendon], ISBN 0-19-853256-3

van Lint, J.H.; Wilson, R.M. (1992), A Course in Combinatorics, Cambridge University Press, ISBN 0-521-00601-5
Zieschang, Paul-Hermann (2005a), "Association Schemes: Designed Experiments, Algebra and Combinatorics by Rosemary A. Bailey, Review" (PDF), Bulletin of the American Mathematical Society, 43 (2): 249–253, doi:10.1090/S0273-0979-05-01077-3
Zieschang, Paul-Hermann (2005b), Theory of association schemes, Springer, ISBN 3-540-26136-2
Zieschang, Paul-Hermann (2006), "The exchange condition for association schemes", Israel Journal of Mathematics, 151 (3): 357–380, doi:10.1007/BF02777367, MR 2214129, S2CID 120009352
