Shapiro–Francia test

The Shapiro–Francia test is a statistical test for the normality of a population, based on sample data. It was introduced by S. S. Shapiro and R. S. Francia in 1972 as a simplification of the Shapiro–Wilk test.[1]

Theory

Let $x_{(i)}$ be the $i$-th ordered value from our size-$n$ sample. For example, if the sample consists of the values $\{5.6, -1.2, 7.8, 3.4\}$, then $x_{(2)} = 3.4$, because that is the second-lowest value. Let $m_{i:n}$ be the mean of the $i$-th order statistic when making $n$ independent draws from a normal distribution. For example, $m_{2:4} \approx -0.297$, meaning that the second-lowest value in a sample of four draws from a normal distribution is typically about 0.297 standard deviations below the mean.[2] Form the Pearson correlation coefficient between the $x$ and the $m$:

$$ W' = \frac{\operatorname{cov}(x, m)}{\sigma_x \, \sigma_m} = \frac{\sum_{i=1}^{n} (x_{(i)} - \bar{x})(m_i - \bar{m})}{\sqrt{\left(\sum_{i=1}^{n} (x_{(i)} - \bar{x})^2\right)\left(\sum_{i=1}^{n} (m_i - \bar{m})^2\right)}} $$
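As a concrete illustration, the following Python sketch (not from the original authors; the function names are illustrative only) computes the $m_{i:n}$ by numerically integrating the density of the $i$-th order statistic of $n$ standard normal draws, and then forms $W'$ exactly as defined above:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad
from scipy.special import comb

def normal_order_statistic_mean(i, n):
    """Mean m_{i:n} of the i-th order statistic (1-indexed) among n
    independent standard normal draws, by numerical integration."""
    coef = n * comb(n - 1, i - 1)
    integrand = lambda x: (x * norm.cdf(x) ** (i - 1)
                           * norm.sf(x) ** (n - i) * norm.pdf(x))
    return coef * quad(integrand, -np.inf, np.inf)[0]

def shapiro_francia_w(sample):
    """W': Pearson correlation between the ordered sample and the m_{i:n}."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    m = np.array([normal_order_statistic_mean(i, n) for i in range(1, n + 1)])
    return np.corrcoef(x, m)[0, 1]

print(normal_order_statistic_mean(2, 4))         # about -0.297, as in the example above
print(shapiro_francia_w([5.6, -1.2, 7.8, 3.4]))  # close to 1 for this small sample
```

Note that some software implementations report $W'$ as the squared correlation rather than the correlation itself.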

Under the null hypothesis that the data are drawn from a normal distribution, this correlation will be strong, so $W'$ values will cluster just under 1, with the peak becoming narrower and closer to 1 as $n$ increases. If the data deviate strongly from a normal distribution, $W'$ will be smaller.[1]
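The concentration of $W'$ near 1 under the null hypothesis can be illustrated with a small Monte Carlo sketch (an illustration only, which substitutes the Blom-type approximation to $m_{i:n}$, discussed under Practice below, for the exact values):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def w_prime(sample):
    """W' using Blom scores Phi^{-1}((i - 3/8) / (n + 1/4)) in place of m_{i:n}."""
    x = np.sort(sample)
    n = len(x)
    i = np.arange(1, n + 1)
    m = norm.ppf((i - 0.375) / (n + 0.25))
    return np.corrcoef(x, m)[0, 1]

n, reps = 50, 2000
print(np.mean([w_prime(rng.normal(size=n)) for _ in range(reps)]))       # just under 1
print(np.mean([w_prime(rng.exponential(size=n)) for _ in range(reps)]))  # noticeably smaller
```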

This test is a formalization of the older practice of forming a Q–Q plot to compare two distributions, with the $x$ playing the role of the quantile points of the sample distribution and the $m$ playing the role of the corresponding quantile points of a normal distribution.

Compared to the Shapiro–Wilk test statistic $W$, the Shapiro–Francia test statistic $W'$ is easier to compute, because it does not require that we form and invert the matrix of covariances between order statistics.

Practice

There is no known closed-form analytic expression for the values of $m_{i:n}$ required by the test. There are, however, several approximations that are adequate for most practical purposes.[2]
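One widely used approximation (assumed here for illustration) is Blom's formula $m_{i:n} \approx \Phi^{-1}\!\left(\frac{i - 3/8}{n + 1/4}\right)$, where $\Phi^{-1}$ is the standard normal quantile function. The short Python sketch below compares it against values obtained by numerical integration:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad
from scipy.special import comb

def exact_m(i, n):
    """m_{i:n} by numerically integrating the order-statistic density."""
    coef = n * comb(n - 1, i - 1)
    f = lambda x: x * norm.cdf(x) ** (i - 1) * norm.sf(x) ** (n - i) * norm.pdf(x)
    return coef * quad(f, -np.inf, np.inf)[0]

def blom_m(i, n):
    """Blom's approximation Phi^{-1}((i - 3/8) / (n + 1/4))."""
    return norm.ppf((i - 0.375) / (n + 0.25))

n = 10
for i in range(1, n + 1):
    print(i, round(exact_m(i, n), 4), round(blom_m(i, n), 4))  # agree to roughly 2 decimal places
```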

The exact form of the null distribution of $W'$ is known only for $n = 3$.[1] Monte Carlo simulations have shown that the transformed statistic $\ln(1 - W')$ is nearly normally distributed, with values of the mean and standard deviation that vary slowly with $n$ in an easily parameterized form.[3]
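A sketch of how this is used in practice follows the parameterization given by Royston (1993), as implemented in, for example, the R package nortest (sf.test). In that parameterization $W'$ is taken as the squared correlation between the ordered sample and Blom normal scores, and the numerical constants below are quoted from memory of that implementation, so treat them as an assumption to be checked against the source; they apply roughly for $5 \le n \le 5000$:

```python
import numpy as np
from scipy.stats import norm

def shapiro_francia_test(sample):
    """Shapiro-Francia statistic and approximate p-value via Royston's
    normal approximation to ln(1 - W'); roughly valid for 5 <= n <= 5000."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    m = norm.ppf((i - 0.375) / (n + 0.25))       # Blom scores
    w = np.corrcoef(x, m)[0, 1] ** 2             # W' as squared correlation
    u, v = np.log(n), np.log(np.log(n))
    mu = -1.2725 + 1.0521 * (v - u)              # approximate mean of ln(1 - W')
    sigma = 1.0308 - 0.26758 * (v + 2.0 / u)     # approximate sd of ln(1 - W')
    z = (np.log(1.0 - w) - mu) / sigma
    return w, norm.sf(z)                         # upper-tail p-value

rng = np.random.default_rng(1)
print(shapiro_francia_test(rng.normal(size=100)))       # large p-value expected
print(shapiro_francia_test(rng.exponential(size=100)))  # small p-value expected
```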

Power

Comparison studies have concluded that order statistic correlation tests such as Shapiro–Francia and Shapiro–Wilk are among the most powerful of the established statistical tests for normality.[4] One might assume that the covariance-adjusted weighting of different order statistics used by the Shapiro–Wilk test should make it slightly more powerful, but in practice the Shapiro–Wilk and Shapiro–Francia variants are about equally good. In fact, the Shapiro–Francia variant actually exhibits more power to distinguish some alternative hypotheses.[5]

References

  1. ^ a b c Shapiro, S. S.; Francia, R. S. (1972-03-01). "An Approximate Analysis of Variance Test for Normality". Journal of the American Statistical Association. 67 (337). American Statistical Association: 215–216. doi:10.2307/2284728. ISSN 1537-274X. JSTOR 2284728. OCLC 1480864.
  2. ^ a b Arnold, Barry C.; Balakrishnan, Narayanaswamy; Nagaraja, Haikady N. (2008) [1992]. A First Course in Order Statistics. Classics in Applied Mathematics. Vol. 54. Philadelphia, PA: Society for Industrial and Applied Mathematics. ISBN 978-0-89871-648-1. LCCN 2008061100.
  3. ^ Royston, Patrick (1993). "A Toolkit for Testing for Non-Normality in Complete and Censored Samples". The Statistician. 42 (1). Royal Statistical Society: 37–43. doi:10.2307/2348109. JSTOR 2348109.
  4. ^ Razali, Nornadiah Mohd; Wah, Yap Bee (2011). "Power comparisons of Shapiro–Wilk, Kolmogorov–Smirnov, Lilliefors, and Anderson–Darling Tests". Journal of Statistical Modeling and Analytics. 2 (1). Kuala Lumpur: Institut Statistik Malaysia: 21–33. ISBN 978-967-363-157-5.
  5. ^ Ahmad, Fiaz; Khan, Rehan Ahmad (2015). "A power comparison of various normality tests". Pakistan Journal of Statistics and Operation Research. 11 (3). Lahore, Pakistan: College of Statistical and Actuarial Sciences, University of the Punjab: 331–345. doi:10.18187/pjsor.v11i3.845. ISSN 2220-5810.