# Overview of tests in gbutils

## Intro

Many tests are implemented in the `gbtest` utility. In the examples below, we use the utility `gbrand` to generate random samples for the tests and the utility `gbfun` to manipulate the data when necessary. Before starting, let us review the command's options. You can list them by passing `-h` on the command line or with `man gbtest`:

Options:

- `-p` compute associated significance
- `-s` input data are already sorted in ascending order
- `-F` specify the input fields separators (default `" \t"`)
- `-v` verbosity: 0 none, 1 headings, 2+ description (default 0)
- `-h` this help

In general, the option `-p` prints a two-sided p-score. The one-sided scores are provided as further information at the higher verbosity level `-v 2` (see the examples below). Some statistics require the data to be sorted, and the program performs the sorting internally. If the data are already sorted, the option `-s` saves some computation time.

There are three types of tests implemented:

- those based on a single sample,
- those based on paired observations,
- those applicable to multiple samples.

This is the list returned by the option `-h`:

name | type | description |
---|---|---|
D+,D-,D,V | 1 samp | Kolmogorov-Smirnov tests on cumulated data |
W2,A2,U2 | 1 samp | Cramer-von Mises tests on cumulated data |
CHI2-1 | 1 samp | Chi-Sqrd, 1 samp. 2nd column: th. prob. |
WILCO | 1 samp | Wilcoxon (mode=0) |
TS | 1 samp | Student's T (mean=0) |
TR-TP | 1 samp | Test of randomness: turning points |
TR-DS | 1 samp | Test of randomness: difference sign |
TR-RT | 1 samp | Test of randomness: rank test |
R | pairs | Pearson's correlation coefficient |
RHO | pairs | Spearman's Rho rank correlation |
TAU | pairs | Kendall's Tau correlation |
CHI2-2 | pairs | Chi-Sqrd, 2 samples |
TP | pairs | Student's T with paired samples |
KS | 2 samp | Kolmogorov-Smirnov test |
T | 2 samp | Student's T with same variances |
TH | 2 samp | Student's T with different variances |
F | 2 samp | F-Test for different variances |
WMW | 2 samp | Wilcoxon-Mann-Whitney U |
FP | 2 samp | Fligner-Policello standardized U^ |
LEV-MEAN | 2+ samp | Levene equality of variances using means |
LEV-MED | 2+ samp | Levene equality of variances using medians |
KW | 3+ samp | Kruskal-Wallis test on 3+ samples |
CHI2-N | 3+ samp | Multi-columns contingency table analysis |

The tests based on a single sample (1 samp) expect a single column of data. If more columns are provided, the statistic is computed on each column separately.

The tests based on paired observations (pairs) or on two samples (2 samp) require exactly two columns of data. If more columns are provided, the test is in general performed separately for each pair of columns.

The output is then a matrix. The lower triangle contains the values of the statistic. If the option `-p` is specified, the upper triangle contains the corresponding p-scores. The value in the \(i\)-th row and \(j\)-th column of the lower triangle is obtained by taking the \(j\)-th column of data as the first sample and the \(i\)-th column as the second sample.
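The row and column convention above can be shown with a small Python sketch. It is our illustration, not gbutils code, and it fills such a matrix for \(k\) columns. The pairwise `test` function is hypothetical and returns a `(statistic, p_score)` pair. We also assume that each p-score sits at the position mirroring its statistic, since the text only says the p-scores are in the upper triangle.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical signature: test(first_sample, second_sample) -> (statistic, p_score)
PairTest = Callable[[Sequence[float], Sequence[float]], Tuple[float, float]]


def pairwise_matrix(columns: List[Sequence[float]],
                    test: PairTest) -> List[List[Optional[float]]]:
    """Statistic in the lower triangle, p-score at the mirrored upper position."""
    k = len(columns)
    out: List[List[Optional[float]]] = [[None] * k for _ in range(k)]
    for i in range(k):          # row i: column i of the data is the second sample
        for j in range(i):      # column j: column j of the data is the first sample
            stat, p = test(columns[j], columns[i])
            out[i][j] = stat    # lower triangle
            out[j][i] = p       # upper triangle (assumed mirror position)
    return out


# Toy run with a made-up "test": difference of first values and a fixed p-score.
demo = pairwise_matrix([[1.0], [2.0], [4.0]], lambda a, b: (a[0] - b[0], 0.5))
for row in demo:
    print(row)
```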

The multi-sample tests (2+ samp or 3+ samp) compute a single statistic using all the provided columns.

In the case of paired samples, the number of observations in each column must be the same. In all other cases, this is not required and "NAN" values in the input are automatically ignored by the program.

## Kolmogorov-Smirnov type tests

These tests check the hypothesis that a set of data \((x_1,\ldots,x_n)\) are independently drawn from a uniform distribution on \([0,1]\). To test whether a set of data \((z_1,\ldots,z_n)\) are independently drawn from a given distribution \(F(x)\), supply the tests with the transformed data \((F(z_1),\ldots,F(z_n))\).

Before computing the statistics, the data are sorted in increasing order. If you already supply sorted data, consider using the option `-s` to save some time. In what follows, the function \([x]\) is the integer part of \(x\), and \([x]^+\) is equal to \(x\) if \(x>0\) and zero otherwise.

### \(D^+\) statistics (D+) and \(D^-\) statistics (D-)

Consider the maximum positive and maximum negative deviations observed in the sample from the uniform distribution

\[ D^+=\max_{i=1,\ldots,n} [x_i - i/n]^+, \quad D^-=\max_{i=1,\ldots,n} [i/n - x_i]^+. \]

Under the null, both statistics are exactly distributed according to

\[ G(x) = 1-x \sum_{j=0}^{[n (1-x)]} \binom{n}{j} \left( 1-x-j/n \right)^{n-j} \left(x+j/n \right)^{j-1}. \]

The asymptotic distribution of \(\sqrt{n} D^+\) and \(\sqrt{n} D^-\) for large \(n\) is

\[ H(x) = 1-e^{-2x^2}. \]
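The definitions above translate directly into code. The following Python sketch uses function names of our own, not gbtest's, and assumes the data have already been transformed to \((0,1)\). It computes:

- \(D^+\) and \(D^-\);
- the exact upper tail \(1-G(d)\), with the finite series summed from \(j=0\) as in the Birnbaum-Tingey formula;
- the asymptotic tail \(1-H(\sqrt{n}\,d)=e^{-2nd^2}\).

It is an illustration of the formulas, not gbtest's implementation.

```python
import math


def d_plus_minus(xs, presorted=False):
    """D+ and D- as defined above, for data already transformed to (0, 1)."""
    x = list(xs) if presorted else sorted(xs)
    n = len(x)
    # With 1-based rank i, compare the i-th smallest value with i/n.
    d_plus = max(max(x[i - 1] - i / n, 0.0) for i in range(1, n + 1))
    d_minus = max(max(i / n - x[i - 1], 0.0) for i in range(1, n + 1))
    return d_plus, d_minus


def exact_upper_tail(d, n):
    """Pr{stat >= d} = 1 - G(d), with the finite series summed from j = 0."""
    if d <= 0.0:
        return 1.0
    if d >= 1.0:
        return 0.0
    total = 0.0
    for j in range(math.floor(n * (1.0 - d)) + 1):
        total += (math.comb(n, j)
                  * (1.0 - d - j / n) ** (n - j)
                  * (d + j / n) ** (j - 1))
    return d * total


def asymptotic_upper_tail(d, n):
    """Pr{sqrt(n) stat > sqrt(n) d} = 1 - H(sqrt(n) d) = exp(-2 n d^2)."""
    return math.exp(-2.0 * n * d * d)
```

For instance, `asymptotic_upper_tail(0.0599563, 20)` gives about 0.866, the asymptotic one-sided value reported in the first example below. The exact series has at most \(n+1\) terms, so it is cheap for the sample sizes used here.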

    > gbrand exponential 1 -c 1 -r 20 | gbfun '1-exp(-x)' | gbtest -p -v 2 D-
    # loaded 20x1 data table
    # D- Kolmogorov-Smirnov test
    # Observed D- 0.0599563
    # Sample size 20
    # Standardized score 0.268133
    # Exact inference:
    # One-sided P-value: Pr{stat>D-} 0.823889
    # One-sided P-value: Pr{stat<D-} 0.176111
    # Asymptotic inference:
    # One-sided P-value: Pr{stat>D-} 0.866069
    # One-sided P-value: Pr{stat<D-} 0.133931
    #statistic p-score
    5.995630e-02 8.238891e-01

In this case we generate \(20\) observations from the exponential distribution and transform them using the correct distribution function, \(F(x)=1-e^{-x}\). As expected, the test does not reject the null. The asymptotic p-score (0.866) is already close to the exact one (0.824), despite the relatively small sample.

    > gbrand flat 0 1 -c 1 -r 20 | gbfun '1-exp(-x)' | gbtest -p -v 2 D-
    # loaded 20x1 data table
    # D- Kolmogorov-Smirnov test
    # Observed D- 0.367974
    # Sample size 20
    # Standardized score 1.64563
    # Exact inference:
    # One-sided P-value: Pr{stat>D-} 0.003084
    # One-sided P-value: Pr{stat<D-} 0.996916
    # Asymptotic inference:
    # One-sided P-value: Pr{stat>D-} 0.00444396
    # One-sided P-value: Pr{stat<D-} 0.995556
    #statistic p-score
    3.679745e-01 3.084003e-03

In this case, the data come from a uniform distribution. The test rejects the hypothesis that they are exponentially distributed with a p-score of \(3 \times 10^{-3}\).

### D statistics (D)

This is a two-sided version of the previous statistics, since both positive and negative deviations are considered:

\[ D=\max_{i=1,\ldots,n} |x_i - i/n|. \]

The test provides both a large-\(n\) asymptotic inference and a small-sample corrected version.
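The text does not spell out the small-sample correction. The corrected scores in the examples below are reproduced by Stephens' modification \((\sqrt{n}+0.12+0.11/\sqrt{n})\,D\), so the Python sketch assumes it. The p-scores then come from the Kolmogorov distribution \(Q(\lambda)=2\sum_{k\ge1}(-1)^{k-1}e^{-2k^2\lambda^2}\). Treat this as a reconstruction inferred from the printed numbers, not as gbtest's source. The function names are ours.

```python
import math


def d_statistic(xs):
    """Two-sided D = max_i |x_i - i/n| over the sorted sample."""
    x = sorted(xs)
    n = len(x)
    return max(abs(x[i - 1] - i / n) for i in range(1, n + 1))


def d_scores(d, n):
    """(asymptotic score, small-sample corrected score); the correction is assumed."""
    root = math.sqrt(n)
    return root * d, (root + 0.12 + 0.11 / root) * d


def kolmogorov_sf(lam, terms=100):
    """Pr{K > lam} = 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 lam^2)."""
    if lam < 0.2:
        return 1.0  # the Kolmogorov CDF is below 1e-12 here
    s = 0.0
    for k in range(1, terms + 1):
        s += (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
    return min(max(2.0 * s, 0.0), 1.0)


asym, corr = d_scores(0.16291, 20)
print(asym, corr, kolmogorov_sf(corr), kolmogorov_sf(asym))
```

With \(D=0.16291\) and \(n=20\), this gives scores of about 0.7286 and 0.7521 and p-scores of about 0.6636 (asymptotic) and 0.6236 (corrected). These match the first example below.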

    > gbrand cauchy 2 -c 1 -r 20 | gbfun 'atan(x1/2)/pi+0.5' | gbtest -p -v 2 D
    # loaded 20x1 data table
    # D Kolmogorov-Smirnov test
    # Observed D 0.16291
    # Sample size 20
    # Asymptotic standardized score 0.728555
    # Small-sample corrected score 0.752111
    # Small-sample corrected inference:
    # One-sided P-value: Pr{stat>D} 0.623614
    # One-sided P-value: Pr{stat<D} 0.376386
    # Asymptotic inference:
    # One-sided P-value: Pr{stat>D} 0.663323
    # One-sided P-value: Pr{stat<D} 0.336677
    #statistic p-score
    1.629098e-01 6.236143e-01

We generate \(20\) observations independently from a Cauchy distribution with scale parameter \(2\). We then transform them with the correct cumulative distribution function, \(F(x)=\frac{1}{\pi}\arctan(x/2) + \frac{1}{2}\). As expected, the \(D\) statistic does not reject the correct hypothesis.

    > gbrand gaussian 1 -c 1 -r 20 | gbfun 'atan(x1/2)/pi+0.5' | gbtest -p -v 2 D
    # loaded 20x1 data table
    # D Kolmogorov-Smirnov test
    # Observed D 0.302732
    # Sample size 20
    # Asymptotic standardized score 1.35386
    # Small-sample corrected score 1.39763
    # Small-sample corrected inference:
    # One-sided P-value: Pr{stat>D} 0.0402113
    # One-sided P-value: Pr{stat<D} 0.959789
    # Asymptotic inference:
    # One-sided P-value: Pr{stat>D} 0.0511635
    # One-sided P-value: Pr{stat<D} 0.948836
    #statistic p-score
    3.027317e-01 4.021128e-02

Suppose instead the data are generated from a Normal distribution and then transformed with the same Cauchy distribution function. The test detects the mismatch and rejects the hypothesis with a p-score of \(4 \times 10^{-2}\).

### Two-sample Kolmogorov-Smirnov Test (KS)

This is a two-sample version of the previous test. It checks whether two samples \((x_1,\ldots,x_n)\) and \((y_1,\ldots,y_m)\) are independently drawn from the same distribution. The two sets of observations need not have the same size. Starting from the empirical distribution functions \(\hat{F}_x\) and \(\hat{F}_y\), the relevant statistic is defined as

\[ D = \sup_z | \hat{F}_x(z)- \hat{F}_y(z)|. \]

The test reports a one-sided p-score. It is obtained by applying the inference of the previous test with an effective sample size equal to \(nm/(n+m)\). For instance, two samples of \(20\) observations each give an effective size of \(10\).
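Here is a Python sketch of the two-sample statistic; the names are ours. It walks through the pooled sorted values and tracks the largest gap between the two empirical distribution functions.

For the p-score, it assumes the same small-sample corrected score \((\sqrt{N_e}+0.12+0.11/\sqrt{N_e})\,D\) and Kolmogorov series as for \(D\), with \(N_e=nm/(n+m)\). This is an inference on our side, not gbtest's documented method. Under this assumption, \(D=0.3\) with \(N_e=10\) gives about 0.2753, in line with the first example below.

```python
import math


def ks_two_sample(xs, ys):
    """Return (D, effective size nm/(n+m)) for samples of possibly different size."""
    x, y = sorted(xs), sorted(ys)
    n, m = len(x), len(y)
    i = j = 0
    d = 0.0
    # Step through the pooled sorted values; advance both ECDFs past ties.
    while i < n and j < m:
        z = min(x[i], y[j])
        while i < n and x[i] == z:
            i += 1
        while j < m and y[j] == z:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Once one sample is exhausted, the gap can only shrink, so we can stop.
    return d, n * m / (n + m)


def ks_two_sample_pvalue(d, ne, terms=100):
    """One-sided p-score, assuming the corrected D-test inference with n -> ne."""
    root = math.sqrt(ne)
    lam = (root + 0.12 + 0.11 / root) * d
    if lam < 0.2:
        return 1.0  # Kolmogorov CDF is negligible here
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(max(2.0 * s, 0.0), 1.0)
```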

    > gbrand gaussian 1 -c 2 -r 20 | gbtest -p -v 2 KS
    # loaded 20x2 data table
    # Kolmogorov-Smirnov Test; 2-samples
    # Observed KS 0.3
    # Effective Size 10
    # Asymptotic inference:
    # One-sided P-value: Pr{stat>KS} 0.275269
    #statistic p-score
    3.000000e-01 2.752689e-01

The two columns of data come from the same Normal distribution. The test does not reject the hypothesis.

    > paste <(gbrand -R 1 gaussian 1 -c 1 -r 20) <(gbrand -R 2 cauchy 1 -c 1 -r 30) | gbtest KS -p -v 2
    # loaded 30x2 data table
    # Kolmogorov-Smirnov Test; 2-samples
    # Observed KS 0.466667
    # Effective Size 12
    # Asymptotic inference:
    # One-sided P-value: Pr{stat>KS} 0.00672794
    #statistic p-score
    4.666667e-01 6.727939e-03

Comparing a sample from a Normal distribution with a sample from a Cauchy distribution, the test rejects the hypothesis with a p-score of \(6.7 \times 10^{-3}\).

## Tests of Randomness

These tests are performed over a sequence of observations \((x_1,\ldots,x_n)\). They detect deviations from the null hypothesis that the observations are independent and identically distributed (iid). In what follows, the function \(\theta(x)\) is equal to one if \(x>0\) and zero otherwise.

### The turning point test (TR-TP)

The following statistic counts the number of *turning points* (local maxima and minima) in the sequence:

\[ T=\sum_{i=2}^{n-1} \left[ \theta(x_i -x_{i-1})\, \theta(x_i -x_{i+1}) + \theta(x_{i-1} - x_i)\, \theta(x_{i+1} -x_i) \right]. \]

Under the null, the statistics is asymptotically normally distributed with mean and variance given by

\[ \bar{T}=2 (n-2)/3, \quad \sigma^2_T = (16 n -29)/90. \]

A large positive standardized score signals that the sequence oscillates too much to be iid. A large negative standardized score reveals anomalous local serial correlation. The two-sided test is based on the absolute value of the standardized score and signals either serial correlation or anti-correlation.
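The count and its standardization can be sketched in Python as follows; the names are illustrative. In this sketch, ties count as neither a peak nor a trough, which is our assumption. The two-sided p-score uses the normal approximation through `math.erfc`.

```python
import math


def turning_points(xs):
    """Number of strict local maxima and minima among x_2, ..., x_{n-1}."""
    t = 0
    for i in range(1, len(xs) - 1):
        peak = xs[i] > xs[i - 1] and xs[i] > xs[i + 1]
        trough = xs[i] < xs[i - 1] and xs[i] < xs[i + 1]
        if peak or trough:
            t += 1
    return t


def tp_score(t, n):
    """Standardized score and two-sided p-score under the iid null."""
    mean = 2.0 * (n - 2) / 3.0
    sd = math.sqrt((16.0 * n - 29.0) / 90.0)
    z = (t - mean) / sd
    return z, math.erfc(abs(z) / math.sqrt(2.0))


seq = [1.0, 3.0, 2.0, 4.0, 3.0]
print(turning_points(seq), tp_score(turning_points(seq), len(seq)))
```

For example, `tp_score(63, 100)` gives a score of about \(-0.5585\) and a two-sided p-score of about 0.5765, as in the first example below.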

    > gbrand gaussian 1 -c 1 -r 100 | gbtest -p -v 2 TR-TP
    # loaded 100x1 data table
    # Turning Points Test of Randomness (1 sample)
    # Observed Turning Points 63
    # Expected mean 65.3333
    # Expected std. dev. 4.17798
    # Standardized score -0.558483
    # Asymptotic inference:
    # One-sided P-value: Pr{stat<obs.} 0.288257
    # One-sided P-value: Pr{stat>obs.} 0.711743
    # Two-sided P-value: Pr{|stat|>|obs.|} 0.576515
    #statistic p-score
    6.300000e+01 5.765146e-01

In this case the test is unable to reject the hypothesis of iid observations. This is expected as the sample was made of \(100\) independent observations drawn from a Gaussian distribution.

    > gbrand gaussian 1 -c 1 -r 100 | gbfun 'x1+sin(x0)' | gbtest -p -v 2 TR-TP
    # loaded 100x1 data table
    # Turning Points Test of Randomness (1 sample)
    # Observed Turning Points 57
    # Expected mean 65.3333
    # Expected std. dev. 4.17798
    # Standardized score -1.99458
    # Asymptotic inference:
    # One-sided P-value: Pr{stat<obs.} 0.0230442
    # One-sided P-value: Pr{stat>obs.} 0.976956
    # Two-sided P-value: Pr{|stat|>|obs.|} 0.0460885
    #statistic p-score
    5.700000e+01 4.608848e-02

In this case, a non-linear drift was added to the observations. The test rejects the iid hypothesis with a p-score of \(4.6 \times 10^{-2}\).

### The difference-sign test (TR-DS)

This test is specifically designed to detect trends in data. Consider the statistic

\[ S=\sum_{i=2}^{n} \theta(x_i -x_{i-1}). \]

Under the null, this statistics is asymptotically normally distributed with mean and variance given by

\[ \bar{S}=(n-1)/2, \quad \sigma^2_S = (n+1)/12. \]

A large positive standardized score signals a positive trend in the data, while a large negative standardized score is associated with a negative trend. The two-sided test is based on the absolute value of the standardized score and signals the presence of either a positive or a negative trend.
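A Python sketch of the difference-sign statistic with the normal approximation above. The names are ours, and ties (equal consecutive values) are simply not counted as increases.

```python
import math


def positive_differences(xs):
    """S: number of indices i >= 2 with x_i > x_{i-1}."""
    return sum(1 for i in range(1, len(xs)) if xs[i] > xs[i - 1])


def ds_score(s, n):
    """Standardized score and two-sided p-score under the iid null."""
    mean = (n - 1) / 2.0
    sd = math.sqrt((n + 1) / 12.0)
    z = (s - mean) / sd
    return z, math.erfc(abs(z) / math.sqrt(2.0))


print(positive_differences([1.0, 2.0, 1.0, 3.0]), ds_score(57, 100))
```

With \(S=57\) and \(n=100\), the score is about 2.585 and the two-sided p-score about 0.0097, as in the trend example below.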

    > gbrand gaussian 1 -c 1 -r 100 | gbtest -p -v 2 TR-DS
    # loaded 100x1 data table
    # Difference Sign Test of Randomness (1 sample)
    # Observed Positive Differences 51
    # Expected mean 49.5
    # Expected std. dev. 2.90115
    # Standardized score 0.517036
    # Asymptotic inference:
    # One-sided P-value: Pr{stat<obs.} 0.697435
    # One-sided P-value: Pr{stat>obs.} 0.302565
    # Two-sided P-value: Pr{|stat|>|obs.|} 0.605131
    #statistic p-score
    5.100000e+01 6.051307e-01

In this case the test is unable to reject the hypothesis of iid observations. This is expected as the sample was made of \(100\) independent observations drawn from a Gaussian distribution.

    > gbrand gaussian 1 -c 1 -r 100 | gbfun 'x1+.2*x0' | gbtest -p -v 2 TR-DS
    # loaded 100x1 data table
    # Difference Sign Test of Randomness (1 sample)
    # Observed Positive Differences 57
    # Expected mean 49.5
    # Expected std. dev. 2.90115
    # Standardized score 2.58518
    # Asymptotic inference:
    # One-sided P-value: Pr{stat<obs.} 0.995134
    # One-sided P-value: Pr{stat>obs.} 0.00486637
    # Two-sided P-value: Pr{|stat|>|obs.|} 0.00973275
    #statistic p-score
    5.700000e+01 9.732748e-03

After a positive linear trend is added to the data, the test rejects the iid hypothesis with a p-score of \(9.7 \times 10^{-3}\).

### The rank test (TR-RT)

This test is particularly suited to detecting a linear trend in the data. The following statistic counts the number of increasing pairs:

\[ P=\sum_{i=1}^{n-1} \sum_{j=i+1}^n \theta(x_j -x_i). \]

If the observations are independent and identically distributed, \(P\) is asymptotically normally distributed with mean and variance given by (see Kendall and Stuart, "The Advanced Theory of Statistics", vol.3, section 45.24)

\[ \bar{P}=n (n-1)/4, \quad \sigma^2_P=n (n-1) (2n+5)/72. \]

A large positive standardized score signals an increasing trend, while a large negative score signals a decreasing trend. The two-sided test is based on the absolute value of the standardized score and signals the presence of a trend of either sign.
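A direct Python transcription of \(P\) and its standardization; the names are ours. The double loop costs \(O(n^2)\), which is fine for samples of a few thousand observations. A sort-based count would be needed for much larger ones.

```python
import math


def increasing_pairs(xs):
    """P: number of pairs i < j with x_j > x_i (O(n^2) double loop)."""
    n = len(xs)
    return sum(1 for i in range(n - 1) for j in range(i + 1, n) if xs[j] > xs[i])


def rt_score(p, n):
    """Standardized score and two-sided p-score under the iid null."""
    mean = n * (n - 1) / 4.0
    sd = math.sqrt(n * (n - 1) * (2 * n + 5) / 72.0)
    z = (p - mean) / sd
    return z, math.erfc(abs(z) / math.sqrt(2.0))


print(increasing_pairs([1.0, 3.0, 2.0]), rt_score(1784, 100))
```

With \(P=1784\) and \(n=100\), the score is about \(-4.116\) and the two-sided p-score about \(3.86 \times 10^{-5}\), as in the decreasing-trend example below.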

    > gbrand gaussian 1 -c 1 -r 100 | gbtest -p -v 2 TR-RT
    # loaded 100x1 data table
    # Rank Test of Randomness (1 sample)
    # Observed increasing couples 2613
    # Expected mean 2475
    # Expected std. dev. 167.891
    # Standardized score 0.82196
    # Asymptotic inference:
    # One-sided P-value: Pr{stat<obs.} 0.79445
    # One-sided P-value: Pr{stat>obs.} 0.20555
    # Two-sided P-value: Pr{|stat|>|obs.|} 0.4111
    #statistic p-score
    2.613000e+03 4.110995e-01

In this case the test does not reject the hypothesis of iid observations, as expected.

    > gbrand gaussian 1 -c 1 -r 100 | gbfun 'x1-0.02*x0' | gbtest -p -v 2 TR-RT
    # loaded 100x1 data table
    # Rank Test of Randomness (1 sample)
    # Observed increasing couples 1784
    # Expected mean 2475
    # Expected std. dev. 167.891
    # Standardized score -4.11576
    # Asymptotic inference:
    # One-sided P-value: Pr{stat<obs.} 1.92955e-05
    # One-sided P-value: Pr{stat>obs.} 0.999981
    # Two-sided P-value: Pr{|stat|>|obs.|} 3.8591e-05
    #statistic p-score
    1.784000e+03 3.859101e-05

In this case the test rejects the hypothesis of no trend with a p-score of \(3.9 \times 10^{-5}\). It rejects the hypothesis of no decreasing trend with a p-score of \(1.9 \times 10^{-5}\).

## Measures of correlation

### Pearson correlation (R)

The program computes the statistic

\[ R= \frac{\sum_{i=1}^N (x_i-\bar{x}) (y_i-\bar{y})}{\sqrt{\sum_{i=1}^N (x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^N (y_i-\bar{y})^2}} \]

where \(N\) is the size of the samples, \(x_i\) and \(y_i\) are the sample data, and \(\bar{x}\) and \(\bar{y}\) are their respective means. Notice that the two samples must have the same size.

The p-score refers to the null hypothesis of zero correlation. It is computed from Fisher's transformation, assuming that the quantity

\[ \frac{\sqrt{N-3}}{2} \log \frac{1+R}{1-R} \]

is normally distributed with zero mean and unit variance. The approximation is accurate when the original variables are normal and uncorrelated. It is known to remain approximately valid for non-normal variables, as long as the sample size is large enough and the variables have finite variance.
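The following Python sketch computes \(R\) and the Fisher-transformation p-scores using only the standard library; the names are ours. The three probabilities correspond to the negative, positive and two-sided alternatives printed by `gbtest -v 2`, under the normal approximation stated above.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient R of two equally sized samples."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("the two samples must have the same size")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fisher_pvalues(r, n):
    """(Pr{stat<R}, Pr{stat>R}, Pr{|stat|>|R|}); requires |r| < 1 and n > 3."""
    z = math.sqrt(n - 3) * math.atanh(r)  # = sqrt(n-3)/2 * log((1+r)/(1-r))
    lower = 0.5 * math.erfc(-z / math.sqrt(2.0))
    upper = 0.5 * math.erfc(z / math.sqrt(2.0))
    return lower, upper, math.erfc(abs(z) / math.sqrt(2.0))


print(pearson_r([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0]))
print(fisher_pvalues(0.0149902, 100))
```

`fisher_pvalues(0.0149902, 100)` gives approximately (0.5587, 0.4413, 0.8826), the values reported by `gbtest` in the example below.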

As an example, I generate two samples of 100 observations (two columns of 100 rows each), independently drawn from a normal distribution with unit variance, and save them in a file:

    > gbrand gaussian 1 -c 2 -r 100 > data.txt

and then I test their correlation, asking for the associated p-score

    > gbtest -p R < data.txt
    1.499019e-02 8.826211e-01

If I am interested in the one-sided inference, I can obtain it by raising the verbosity with the option `-v 2`:

    > gbtest -v 2 -p R < data.txt
    # loaded 100x2 data table
    # Pearson's R 2-sample correlation
    # Size of the sample 100
    # Observed Pearson's R 0.0149902
    # Asymptotic inference:
    # Negative correlation P-value: Pr{stat<R} 0.558689
    # Positive correlation P-value: Pr{stat>R} 0.441311
    # Any correlation P-value: Pr{|stat|>|R|} 0.882621
    1.499019e-02 8.826211e-01

## Difference in means

### *t*-test for paired samples (TP)

The Student's *t*-test for paired samples assumes two samples of equal size with paired observations. The pairing eliminates any fixed effect specific to each pair. A typical case is measuring some characteristic of each subject before and after a given treatment. The statistic is simply the ratio between the mean and the standard error of the differences between paired values. Formally,

\[ T = \sqrt{N (N-1)} \, \frac{\bar{x}-\bar{y}}{\sqrt{\sum_{i=1}^N (x_i-y_i-\bar{x}+\bar{y})^2}} \]

where \(N\) is the size of the samples, \(x_i\) and \(y_i\) are the sample data, and \(\bar{x}\) and \(\bar{y}\) are their respective means. Notice that the two samples must have the same size.

The p-score refers to the null hypothesis of equal means in the two samples. If the means of the two samples are equal, the \(T\) statistic follows a Student's *t* distribution with \(N-1\) degrees of freedom. This holds exactly for normally distributed differences and approximately otherwise.
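A stdlib-only Python sketch of \(T\) follows; the names are ours. The p-score needs the Student's *t* distribution function, which the Python standard library does not provide. If SciPy is available, `scipy.stats.ttest_rel(x, y)` computes the same paired statistic, with the same sign convention \(\bar{x}-\bar{y}\), together with its p-value.

```python
import math


def paired_t(xs, ys):
    """T statistic and degrees of freedom for paired samples (x_i, y_i)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal size, with at least 2 pairs")
    diffs = [x - y for x, y in zip(xs, ys)]
    mean_d = sum(diffs) / n  # equals mean(x) - mean(y)
    ss = sum((d - mean_d) ** 2 for d in diffs)  # sum of (x_i - y_i - xbar + ybar)^2
    if ss == 0.0:
        raise ValueError("all differences are equal: T is undefined")
    return math.sqrt(n * (n - 1)) * mean_d / math.sqrt(ss), n - 1


print(paired_t([1.0, 2.0, 3.0, 4.0], [0.0, 2.0, 2.0, 3.0]))
```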

To generate paired samples, I add an independent random variable of zero mean to the observations of the first sample

    > gbrand gaussian 1 -c 2 -r 100 | gbfun 'x1','x1+x2' | gbtest -p -v 2 TP
    # loaded 100x2 data table
    # Student's T test; paired observations
    # Observed T -0.726799
    # Degrees of freedom 99
    # Asymptotic inference:
    # One-sided P-value: Pr{stat>T} 0.765468
    # One-sided P-value: Pr{stat<T} 0.234532
    # Two-sided P-value: Pr{|stat|>|T|} 0.469064
    -7.267989e-01 4.690642e-01

As expected, the test reports no difference in the means. When the observations are normally distributed, the test is quite powerful. With 100 observations, it can already detect, at least with a one-sided test, a shift equal to 10% of the original standard deviation:

    > gbrand gaussian 1 -c 2 -r 100 | gbfun 'x1','x1+x2+0.1' | gbtest -p -v 2 TP
    # loaded 100x2 data table
    # Student's T test; paired observations
    # Observed T -1.68939
    # Degrees of freedom 99
    # Asymptotic inference:
    # One-sided P-value: Pr{stat>T} 0.952854
    # One-sided P-value: Pr{stat<T} 0.0471459
    # Two-sided P-value: Pr{|stat|>|T|} 0.0942917
    -1.689390e+00 9.429174e-02

Do not expect the same performance when the observations have fatter tails.