overview of tests in gbutils

Intro

Many tests are implemented in the gbtest utility. In the examples below we will use the utility gbrand to generate random samples for the tests and the utility gbfun to manipulate the data when necessary. Before starting, let us review the command's options, whose list can be obtained by passing -h on the command line or with man gbtest

Options:
-p  compute associated significance
-s  input data are already sorted in ascending order
-F  specify the input fields separators (default " \t")    
-v  verbosity: 0 none, 1 headings, 2+ description (default 0) 
-h  this help

In general, the option -p prints a two-sided p-score, while the one-sided scores are provided as further information at higher verbosity levels (-v 2; see the examples below). Some statistics require the data to be sorted, and the program performs the sorting internally. If the data are already sorted, using the option -s will save some computation time.
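
For instance, a large sample that is already in ascending order (here pre-sorted with GNU sort -g) can be tested without the internal sorting step. A minimal sketch, assuming uniform data so that no transformation is required:

>gbrand flat 0 1 -c 1 -r 10000 | sort -g | gbtest -s -p D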

There are three types of tests implemented: those based on a single sample, those based on paired observations, and those applicable to multiple samples. This is the list returned by option -h:

name       type     description
D+,D-,D,V  1 samp   Kolmogorov-Smirnov tests on cumulated data
W2,A2,U2   1 samp   Cramer-von Mises tests on cumulated data
CHI2-1     1 samp   Chi-Sqrd, 1 samp. 2nd column: th. prob.
WILCO      1 samp   Wilcoxon (mode=0)
TS         1 samp   Student's T (mean=0)
TR-TP      1 samp   Test of randomness: turning points
TR-DS      1 samp   Test of randomness: difference sign
TR-RT      1 samp   Test of randomness: rank test
R          pairs    Pearson's correlation coefficient
RHO        pairs    Spearman's Rho rank correlation
TAU        pairs    Kendall's Tau correlation
CHI2-2     pairs    Chi-Sqrd, 2 samples
TP         pairs    Student's T with paired samples
KS         2 samp   Kolmogorov-Smirnov test
T          2 samp   Student's T with same variances
TH         2 samp   Student's T with different variances
F          2 samp   F-Test for different variances
WMW        2 samp   Wilcoxon-Mann-Whitney U
FP         2 samp   Fligner-Policello standardized U^
LEV-MEAN   2+ samp  Levene equality of variances using means
LEV-MED    2+ samp  Levene equality of variances using medians
KW         3+ samp  Kruskal-Wallis test on 3+ samples
CHI2-N     3+ samp  Multi-columns contingency table analysis

The tests based on a single sample (1 samp) expect a single column of data. If more columns are provided, the statistic is computed on each column separately.
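
For instance, the following sketch runs the one-sample Student's T test (TS) separately on each of three Gaussian columns, producing one statistic (and, with -p, one p-score) per column:

>gbrand gaussian 1 -c 3 -r 50 | gbtest -p TS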

The tests based on paired observations (pairs) or on two samples (2 samp) require exactly two columns of data. If more columns are provided, in general the test is performed separately for each pair of columns. The output is a matrix that contains the value of the statistic in the lower triangle and, if the option -p is specified, the corresponding p-scores in the upper triangle. In this case, data are passed column first: the value read in the \(i\) th row and \(j\) th column of the lower triangle is obtained considering the \(j\) th column of data as the first sample and the \(i\) th column as the second sample.
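
For instance, the following sketch computes Pearson's R on three columns at once; the result is a \(3\times 3\) matrix with the statistic for each pair of columns in the lower triangle and the corresponding p-scores in the upper triangle:

>gbrand gaussian 1 -c 3 -r 100 | gbtest -p R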

The tests based on more than two samples (2+ samp or 3+ samp) compute a single statistic using all the provided columns.

In the case of paired samples, the number of observations in each column must be the same. In all other cases this is not required, and "NAN" values in the input are automatically ignored by the program.
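
As a minimal sketch of the NAN handling, the following replaces the fifth observation with "NAN" before testing; the statistic is then computed on the remaining 99 values:

>gbrand gaussian 1 -c 1 -r 100 | sed '5s/.*/NAN/' | gbtest -p TS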

Kolmogorov-Smirnov type tests

These tests assess the hypothesis that a set of data \((x_1,\ldots,x_n)\) is independently drawn from a uniform distribution. If you want to test whether a set of data \((z_1,\ldots,z_n)\) is independently drawn from a given distribution \(F(x)\), you have to supply the tests with the transformed data \((F(z_1),\ldots,F(z_n))\). Before computing the statistics, the data are sorted in increasing order. If you supply already sorted data, consider using the option -s to save some time. In what follows, the function \([x]\) is the integer part of \(x\) and \([x]^+\) is equal to \(x\) if \(x>0\) and zero otherwise.

\(D^+\) statistics (D+) and \(D^-\) statistics (D-)

Consider the maximum positive and maximum negative deviations observed in the sample from the uniform distribution

\[ D^+=\max_{i=1,\ldots,n} [x_i - i/n]^+, \quad D^-=\max_{i=1,\ldots,n} [i/n - x_i]^+. \]

Under the null, both statistics are exactly distributed according to

\[ G(x) = 1-x \sum_{j=0}^{[n (1-x)]} \binom{n}{j} \left( 1-x-j/n \right)^{n-j} \left(x+j/n \right)^{j-1}. \]

The asymptotic distribution of \(\sqrt{n} D^+\) and \(\sqrt{n} D^-\) for large \(n\) is

\[ H(x) = 1-e^{-2x^2}. \]

>gbrand exponential 1 -c 1 -r 20 | gbfun '1-exp(-x)' | gbtest -p -v 2  D-
# loaded 20x1 data table
# D- Kolmogorov-Smirnov test
#  Observed D-                         0.0599563
#  Sample size                         20
#  Standardized score                  0.268133
#  Exact inference:
#   One-sided P-value: Pr{stat>D-}      0.823889
#   One-sided P-value: Pr{stat<D-}      0.176111
#  Asymptotic inference:
#   One-sided P-value: Pr{stat>D-}      0.866069
#   One-sided P-value: Pr{stat<D-}      0.133931
#statistic	p-score
 5.995630e-02  8.238891e-01

In this case we generate \(20\) observations from the exponential distribution and transform them using the correct distribution function, \(F(x)=1-e^{-x}\). As expected, the test does not reject the null. Note that the asymptotic p-score is already quite good, despite the relatively small sample.
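
As a check, the asymptotic p-score reported above is just the tail of \(H\) evaluated at the standardized score,

\[ \Pr\{\text{stat}>D^-\} = 1 - H(\sqrt{n}\, D^-) = e^{-2 \times 0.268133^2} \simeq 0.866069. \]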

>gbrand flat 0 1 -c 1 -r 20 | gbfun '1-exp(-x)' | gbtest -p -v 2  D-  
# loaded 20x1 data table
# D- Kolmogorov-Smirnov test
#  Observed D-                         0.367974
#  Sample size                         20
#  Standardized score                  1.64563
#  Exact inference:
#   One-sided P-value: Pr{stat>D-}      0.003084
#   One-sided P-value: Pr{stat<D-}      0.996916
#  Asymptotic inference:
#   One-sided P-value: Pr{stat>D-}      0.00444396
#   One-sided P-value: Pr{stat<D-}      0.995556
#statistic	p-score
 3.679745e-01  3.084003e-03

In this case, data are from a uniform distribution. The test rejects the hypothesis that they are exponentially distributed with a p-score of \(3\, 10^{-3}\).

\(D\) statistics (D)

This is a two-sided version of the previous statistics, as both positive and negative deviations are considered,

\[ D=\max_{i=1,\ldots,n} |x_i - i/n|. \]

The test provides both a large-\(n\) asymptotic inference and a small-sample corrected version.
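
The corrected score appears to follow Stephens' classic rescaling \(D\,(\sqrt{n}+0.12+0.11/\sqrt{n})\), which reproduces the values reported in the examples below.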

>gbrand cauchy 2 -c 1 -r 20 | gbfun 'atan(x1/2)/pi+0.5' | gbtest -p -v 2 D
# loaded 20x1 data table
# D Kolmogorov-Smirnov test
#  Observed D                          0.16291
#  Sample size                         20
#  Asymptotic standardized score       0.728555
#  Small-sample corrected score        0.752111
#  Small-sample corrected inference:
#   One-sided P-value: Pr{stat>D}      0.623614
#   One-sided P-value: Pr{stat<D}      0.376386
#  Asymptotic inference:
#   One-sided P-value: Pr{stat>D}      0.663323
#   One-sided P-value: Pr{stat<D}      0.336677
#statistic	p-score
 1.629098e-01  6.236143e-01

We generate \(20\) observations independently from a Cauchy distribution with parameter \(2\) and transform them with the correct cumulative distribution function \(F(x)=\arctan(x/2)/\pi + 1/2\). As expected, the \(D\) statistic does not reject the correct hypothesis.

>gbrand gaussian 1 -c 1 -r 20 | gbfun 'atan(x1/2)/pi+0.5' | gbtest -p -v 2 D
# loaded 20x1 data table
# D Kolmogorov-Smirnov test
#  Observed D                          0.302732
#  Sample size                         20
#  Asymptotic standardized score       1.35386
#  Small-sample corrected score        1.39763
#  Small-sample corrected inference:
#   One-sided P-value: Pr{stat>D}      0.0402113
#   One-sided P-value: Pr{stat<D}      0.959789
#  Asymptotic inference:
#   One-sided P-value: Pr{stat>D}      0.0511635
#   One-sided P-value: Pr{stat<D}      0.948836
#statistic	p-score
 3.027317e-01  4.021128e-02

If the data are generated from a Normal distribution and then transformed with the Cauchy distribution function, the test detects it and rejects the hypothesis with a p-score of \(4\, 10^{-2}\).

Two-sample Kolmogorov-Smirnov Test (KS)

This is a two-sample version of the previous test. It checks whether two samples \((x_1,\ldots,x_n)\) and \((y_1,\ldots,y_m)\) are generated by independent draws from the same distribution. Note that the two sets of observations need not have the same size. Starting from the empirical distribution functions \(\hat{F}_x\) and \(\hat{F}_y\), the relevant statistic is defined as

\[ D = \sup_z | \hat{F}_x(z)- \hat{F}_y(z)|. \]

The test reports a one-sided asymptotic p-score, applying the asymptotic inference of the previous test with an effective sample size equal to \(nm/(n+m)\).
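
For instance, in the first example below both samples have \(20\) observations, so the effective size is \(20 \times 20/(20+20)=10\); in the second, with samples of \(20\) and \(30\) observations, it is \(20 \times 30/(20+30)=12\).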

>gbrand gaussian 1 -c 2 -r 20 | gbtest -p -v 2 KS
# loaded 20x2 data table
# Kolmogorov-Smirnov Test; 2-samples
#  Observed KS                          0.3
#  Effective Size                       10
#  Asymptotic inference:
#   One-sided P-value: Pr{stat>KS}      0.275269
#statistic	p-score
 3.000000e-01  2.752689e-01

The two columns of data come from the same Normal distribution. The test does not reject the hypothesis.

>paste <(gbrand -R 1 gaussian 1 -c 1 -r 20) <(gbrand -R 2 cauchy 1 -c 1 -r 30) | gbtest KS -p -v 2
# loaded 30x2 data table
# Kolmogorov-Smirnov Test; 2-samples
#  Observed KS                          0.466667
#  Effective Size                       12
#  Asymptotic inference:
#   One-sided P-value: Pr{stat>KS}      0.00672794
#statistic	p-score
 4.666667e-01  6.727939e-03

By comparing a sample from a Normal distribution with a sample from a Cauchy distribution, the hypothesis is rejected with a p-score of \(6.7\, 10^{-3}\).

Tests of Randomness

These tests are performed over a sequence of observations \((x_1,\ldots,x_n)\) to detect deviations from the null hypothesis that the observations are independent and identically distributed. In what follows, the function \(\theta(x)\) is equal to one if \(x>0\) and zero otherwise.

The turning point test (TR-TP)

The following statistic counts the number of turning points in the sequence

\[ T=\sum_{i=2}^{n-1} \theta(x_i -x_{i-1})\, \theta(x_i -x_{i+1}) + \theta(x_{i-1} - x_i)\, \theta(x_{i+1} -x_i). \]

Under the null, the statistic is asymptotically normally distributed with mean and variance given by

\[ \bar{T}=2 (n-2)/3, \quad \sigma^2_T = (16 n -29)/90. \]

A large positive standardized score signals that the sequence oscillates too much to be iid, while a large negative score reveals the presence of anomalous local serial correlations. The two-sided test is based on the absolute value of the standardized score and signals the presence of either serial correlation or anti-correlation.
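
As a check, with \(n=100\) observations as in the examples below, \(\bar{T}=2 \times 98/3 \simeq 65.333\) and \(\sigma_T=\sqrt{1571/90} \simeq 4.178\), matching the reported values.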

>gbrand gaussian 1 -c 1 -r 100 | gbtest -p -v 2 TR-TP 
# loaded 100x1 data table
# Turning Points Test of Randomness (1 sample)
#  Observed Turning Points             63
#  Expected mean                       65.3333
#  Expected std. dev.                  4.17798
#  Standardized score                  -0.558483
#  Asymptotic inference:
#   One-sided P-value: Pr{stat<obs.}      0.288257
#   One-sided P-value: Pr{stat>obs.}      0.711743
#   Two-sided P-value: Pr{|stat|>|obs.|}  0.576515
#statistic	p-score
 6.300000e+01  5.765146e-01

In this case the test is unable to reject the hypothesis of iid observations. This is expected as the sample was made of \(100\) independent observations drawn from a Gaussian distribution.

>gbrand gaussian 1 -c 1 -r 100 | gbfun 'x1+sin(x0)' | gbtest -p -v 2 TR-TP
# loaded 100x1 data table
# Turning Points Test of Randomness (1 sample)
#  Observed Turning Points             57
#  Expected mean                       65.3333
#  Expected std. dev.                  4.17798
#  Standardized score                  -1.99458
#  Asymptotic inference:
#   One-sided P-value: Pr{stat<obs.}      0.0230442
#   One-sided P-value: Pr{stat>obs.}      0.976956
#   Two-sided P-value: Pr{|stat|>|obs.|}  0.0460885
#statistic	p-score
 5.700000e+01  4.608848e-02

In this case a non-linear drift was added to the observations and the test rejects the iid hypothesis with a p-score of \(4.6\, 10^{-2}\).

The difference-sign test (TR-DS)

This test is specifically designed to detect trends in the data. Consider the statistic

\[ S=\sum_{i=2}^{n} \theta(x_i -x_{i-1}). \]

Under the null, this statistic is asymptotically normally distributed with mean and variance given by

\[ \bar{S}=(n-1)/2, \quad \sigma^2_S = (n+1)/12. \]

A large positive standardized score signals the presence of a positive trend in the data, while a large negative score is associated with a negative trend. The two-sided test is based on the absolute value of the standardized score and detects a trend of either sign.
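
Again with \(n=100\), \(\bar{S}=99/2=49.5\) and \(\sigma_S=\sqrt{101/12} \simeq 2.901\), as reported in the examples below.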

>gbrand gaussian 1 -c 1 -r 100 | gbtest -p -v 2 TR-DS 
# loaded 100x1 data table
# Difference Sign Test of Randomness (1 sample)
#  Observed Positive Differences       51
#  Expected mean                       49.5
#  Expected std. dev.                  2.90115
#  Standardized score                  0.517036
#  Asymptotic inference:
#   One-sided P-value: Pr{stat<obs.}      0.697435
#   One-sided P-value: Pr{stat>obs.}      0.302565
#   Two-sided P-value: Pr{|stat|>|obs.|}  0.605131
#statistic	p-score
 5.100000e+01  6.051307e-01

In this case the test is unable to reject the hypothesis of iid observations. This is expected as the sample was made of \(100\) independent observations drawn from a Gaussian distribution.

>gbrand gaussian 1 -c 1 -r 100 | gbfun 'x1+.2*x0' | gbtest -p -v 2 TR-DS 
# loaded 100x1 data table
# Difference Sign Test of Randomness (1 sample)
#  Observed Positive Differences       57
#  Expected mean                       49.5
#  Expected std. dev.                  2.90115
#  Standardized score                  2.58518
#  Asymptotic inference:
#   One-sided P-value: Pr{stat<obs.}      0.995134
#   One-sided P-value: Pr{stat>obs.}      0.00486637
#   Two-sided P-value: Pr{|stat|>|obs.|}  0.00973275
#statistic	p-score
 5.700000e+01  9.732748e-03

By adding a positive linear trend to the data, the iid hypothesis is rejected by the test with a p-score of \(9.7\, 10^{-3}\).

The rank test (TR-RT)

This test is particularly suited for detecting a linear trend in the data. The following statistic counts the number of ordered pairs,

\[ P=\sum_{i=1}^{n-1} \sum_{j=i+1}^n \theta(x_j -x_i), \]

If the observations are independent and identically distributed, \(P\) is asymptotically normally distributed with mean and variance given by (see Kendall and Stuart, "The Advanced Theory of Statistics", vol.3, section 45.24)

\[ \bar{P}=n (n-1)/4, \quad \sigma^2_P=n (n-1) (2n+5)/72. \]

A large positive standardized score signals the presence of an increasing trend, while a large negative score signals a decreasing one. The two-sided test is based on the absolute value of the standardized score and detects the presence of a trend of either sign.

>gbrand gaussian 1 -c 1 -r 100 | gbtest -p -v 2 TR-RT
# loaded 100x1 data table
# Rank Test of Randomness (1 sample)
#  Observed increasing couples         2613
#  Expected mean                       2475
#  Expected std. dev.                  167.891
#  Standardized score                  0.82196
#  Asymptotic inference:
#   One-sided P-value: Pr{stat<obs.}      0.79445
#   One-sided P-value: Pr{stat>obs.}      0.20555
#   Two-sided P-value: Pr{|stat|>|obs.|}  0.4111
#statistic	p-score
 2.613000e+03  4.110995e-01

In this case the test does not reject the hypothesis of iid observations, as expected.

>gbrand gaussian 1 -c 1 -r 100 | gbfun 'x1-0.02*x0' | gbtest -p -v 2 TR-RT
# loaded 100x1 data table
# Rank Test of Randomness (1 sample)
#  Observed increasing couples         1784
#  Expected mean                       2475
#  Expected std. dev.                  167.891
#  Standardized score                  -4.11576
#  Asymptotic inference:
#   One-sided P-value: Pr{stat<obs.}      1.92955e-05
#   One-sided P-value: Pr{stat>obs.}      0.999981
#   Two-sided P-value: Pr{|stat|>|obs.|}  3.8591e-05
#statistic	p-score
 1.784000e+03  3.859101e-05

In this case the test rejects the hypothesis of the absence of any trend with a p-score of \(3.8\, 10^{-5}\), and the hypothesis of the absence of a decreasing trend with a p-score of \(1.9\, 10^{-5}\).

Measures of correlation

Pearson correlation (R)

The program computes the statistic

\[ R= \frac{\sum_{i=1}^N (x_i-\bar{x}) (y_i-\bar{y})}{\sqrt{\sum_{i=1}^N (x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^N (y_i-\bar{y})^2}} \]

where \(N\) is the size of the samples, \(x_i\) and \(y_i\) are the sample data, and \(\bar{x}\) and \(\bar{y}\) their respective means. Notice that the two samples must have the same size.

The provided p-score refers to the null hypothesis of zero correlation and is computed from Fisher's transformation, assuming that the quantity

\[ \frac{\sqrt{N-3}}{2} \log \frac{1+R}{1-R} \]

is normally distributed with zero mean and unit variance. This is exact if the original variables are normal and uncorrelated, but it is known to be approximately valid also for non-normal variables, as long as the sample size is large enough and the variance of the variables exists.

As an example, I generate two samples of 100 observations (two columns of 100 rows each) independently drawn from a normal distribution of unit variance saving them in a file

>gbrand gaussian 1 -c 2 -r 100 > data.txt

and then I test their correlation, asking for the associated p-score

>gbtest -p R < data.txt
1.499019e-02  8.826211e-01

If I'm interested in the one-sided inference, I can obtain it using the option -v

>gbtest -v 2 -p R < data.txt
# loaded 100x2 data table
# Pearson's R 2-sample correlation
#  Size of the sample                  100
#  Observed Pearson's R                        0.0149902
#  Asymptotic inference:
#   Negative correlation P-value: Pr{stat<R}   0.558689
#   Positive correlation P-value: Pr{stat>R}   0.441311
#   Any correlation P-value:  Pr{|stat|>|R|}   0.882621
1.499019e-02  8.826211e-01
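
As a check, these p-scores can be reproduced from the Fisher transformation: with \(R \simeq 0.01499\) and \(N=100\),

\[ z=\frac{\sqrt{97}}{2} \log\frac{1+R}{1-R} \simeq 0.1477, \quad \Phi(z) \simeq 0.558689, \quad 2\,(1-\Phi(z)) \simeq 0.882621, \]

where \(\Phi\) denotes the standard normal distribution function.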

Difference in mean

t-test for paired samples (TP)

The Student's t-test for paired samples assumes two samples of equal size with paired observations. The pairing eliminates any fixed effect associated with a specific couple, as when testing some characteristic of a subject before and after a given treatment. The statistic is simply the ratio between the mean and the standard error of the paired differences. Formally,

\[ T = \sqrt{N (N-1)} \, \frac{\bar{x}-\bar{y}}{\sqrt{\sum_i (x_i-y_i-\bar{x}+\bar{y})^2}} \]
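
Equivalently, writing \(d_i=x_i-y_i\) for the paired differences, \(T=\bar{d}/(s_d/\sqrt{N})\) with \(s^2_d=\sum_i (d_i-\bar{d})^2/(N-1)\), which is the usual one-sample statistic applied to the differences.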

where \(N\) is the size of the samples, \(x_i\) and \(y_i\) are the sample data, and \(\bar{x}\) and \(\bar{y}\) their respective means. Notice that the two samples must have the same size.

The provided p-score refers to the null hypothesis of equal means in the two samples. It is based on the fact that if the means of the two samples are equal, the \(T\) statistic is asymptotically distributed according to a Student's t distribution with \(N-1\) degrees of freedom.

To generate paired samples, I add an independent random variable of zero mean to the observations of the first sample

>gbrand gaussian 1 -c 2 -r 100 | gbfun 'x1','x1+x2' | gbtest -p -v 2 TP  
# loaded 100x2 data table
# Student's T test; paired observations
#  Observed T                          -0.726799
#  Degrees of freedom                  99
#  Asymptotic inference:
#   One-sided P-value: Pr{stat>T}      0.765468
#   One-sided P-value: Pr{stat<T}      0.234532
#   Two-sided P-value: Pr{|stat|>|T|}  0.469064
-7.267989e-01  4.690642e-01

As expected, the test reports no difference in the means. When the observations are normally distributed, the test is quite powerful: with 100 observations it is possible to detect a deviation equal to 10% of the original standard deviation

>gbrand gaussian 1 -c 2 -r 100 | gbfun 'x1','x1+x2+0.1' | gbtest -p -v 2 TP
# loaded 100x2 data table
# Student's T test; paired observations
#  Observed T                          -1.68939
#  Degrees of freedom                  99
#  Asymptotic inference:
#   One-sided P-value: Pr{stat>T}      0.952854
#   One-sided P-value: Pr{stat<T}      0.0471459
#   Two-sided P-value: Pr{|stat|>|T|}  0.0942917
-1.689390e+00  9.429174e-02

Don't expect this kind of performance when the observations are more fat-tailed.
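
For instance, repeating the last experiment with Cauchy observations (a sketch; the outcome varies from run to run) will typically yield a much larger p-score for the same shift of \(0.1\):

>gbrand cauchy 1 -c 2 -r 100 | gbfun 'x1','x1+x2+0.1' | gbtest -p TP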

Author: Giulio Bottazzi

Created: 2024-02-25 Sun 12:38
