Statistics
This chapter describes the statistical functions provided by the PLT Scheme Science Collection. The basic statistical functions include functions to compute the mean, variance, and standard deviation. More advanced functions allow you to calculate absolute deviation, skewness, and kurtosis, as well as the median and arbitrary percentiles. The algorithms use recurrance relations to compute average quantities in a stable way, without large intermediate values that might overflow.
The functions described in this chapter are defined in the statistics.ss file in the science collection and are made available ising the following form:
(require (planet "statistics.ss" ("williams" "science.plt")))
8.1 Mean, Standard Deviation, and Variance
Function:
(mean data) 
Contract: (> (vectorof real?) real?) 

Function:
(variance data mu) (variance data) 
Contract: (case> (> (vectorof real?) real? (>=/c 0.0)) (> (vectorof real?) (>=/c 0.0))) 

Function:
(standarddeviation data mu) (standarddeviation data) 
Contract: (case> (> (vectorof real?) real? (>=/c 0.0)) (> (vectorof real?) (>=/c 0.0))) 

Function:
(variancewithfixedmean data mu) 
Contract: (> (vectorof real?) real? (>=/c 0.0)) 

standarddeviationwithfixedmean
Function:
(standarddeviationwithfixedmean data mu) 
Contract: (> (vectorof real?) real> (>=/c 0.0)) 

8.2 Absolute Deviation
Function:
(absolutedeviation data mu) (absolutedeviation data) 
Contract: (case> (> (vectorof real?) real? (>=/c 0.0)) (> (vectorof real?) (>=/c 0.0))) 

8.3 Higher Moments (Skewness and Kurtosis)
Function:
(skew data mu sigma) (skew data) 
Contract: (case> (> (vectorof real?) real? (>=/c 0.0) real?) (> (vectorof real?) real?))) 

Function:
(kurtosis data mu sigma) (kurtosis data) 
Contract: (case> (> (vectorof real?) real? (>=/c 0.0) real?) (> (vectorof real?) real?))) 

8.4 Autocorrelation
Function:
(lag1autocorrelation data mu) (lag1autocorrelation data) 
Contract: (case> (> nonemptyvectorofreals real? real?) (> nonemptyvectorofreals real?)) 

8.5 Covariance
Function:
(covariance data1 data 2 mu1 mu2) (covariance data1 data2) 
Contract: (case> (>r ((data1 (vectorof real?)) (data2 (and/c (vectorof real?) (lambda (x) (= (vectorlength data1) (vectorlength data2))))) (mu1 real?) (mu2 real?)) real?) (>r ((data1 (vectorof real?)) (data2 (and/c (vectorof real?) (lambda (x) (= (vectorlength data1) (vectorlength data2))))) real?)) 

Function:
(covariancewithfixedmeans data1 data2) 
Contract: (>r ((data1 (vectorof real?)) (data2 (and/c (vectorof real?) (lambda (x) (= (vectorlength data1) (vectorlength data2))))) (mu1 real?) (mu2 real?)) real?) 

8.6 Weighted Samples
Function:
(weightedmean w data) 
Contract: (>r ((w (vectorof real?)) (data (and/c (vectorof real?) (lambda (x) (= (vectorlength w) (vectorlength data)))))) real?) 

Function:
(weightedvariance w data wmu) (weightedvariance w data) 
Contract: (case> (>r ((w (vectorof real?)) (data (and/c (vectorof real?) (lambda (x) (= (vectorlength w) (vectorlength data))))) (mu real?) (>=/c 0.0)) (>r ((w (vectorof real?)) (data (and/c (vectorof real?) (lambda (x) (= (vectorlength w) (vectorlength data))))) (>=/c 0.0))) 

Function:
(weightedstandarddeviation w data wmu) (weightedstandarddeviation w data) 
Contract: (case> (>r ((w (vectorof real?)) (data (and/c (vectorof real?) (lambda (x) (= (vectorlength w) (vectorlength data))))) (mu real?) (>=/c 0.0)) (>r ((w (vectorof real?)) (data (and/c (vectorof real?) (lambda (x) (= (vectorlength w) (vectorlength data))))) (>=/c 0.0))) 

weightedvariancewithfixedmean
Function:
(weightedvariancewithfixedmean w data wmu) 
Contract: (>r ((w (vectorof real?)) (data (and/c (vectorof real?) (lambda (x) (= (vectorlength w) (vectorlength data))))) (mu real?) (>=/c 0.0)) 

weightedstandarddeviationwithfixedmean
Function:
(weightedstandarddeviationwithfixedmean w data wmu) 
Contract: (>r ((w (vectorof real?)) (data (and/c (vectorof real?) (lambda (x) (= (vectorlength w) (vectorlength data))))) (mu real?) (>=/c 0.0)) 

Function:
(weightedabsolutedeviation w data wmu) (weightedabsolutedeviation w data) 
Contract: (case> (>r ((w (vectorof real?)) (data (and/c (vectorof real?) (lambda (x) (= (vectorlength w) (vectorlength data))))) (mu real?) (>=/c 0.0)) (>r ((w (vectorof real?)) (data (and/c (vectorof real?)+ (lambda (x) (= (vectorlength w) (vectorlength data))))) (>=/c 0.0))) 

Function:
(weightedskew w data wmu wsigma) (weightedskew w data) 
Contract: (case> (>r ((w (vectorof real?)) (data (and/c (vectorof real?) (lambda (x) (= (vectorlength w) (vectorlength data))))) (mu real?) (sigma (>=/c 0.0))) real?) (>r ((w (vectorof real?)) (data (and/c (vectorof real?) (lambda (x) (= (vectorlength w) (vectorlength data))))) real?)) 

Function:
(weightedkurtosis w data wmu wsigma) (weightedkurtosis w data) 
Contract: (case> (>r ((w (vectorof real?)) (data (and/c (vectorof real?) (lambda (x) (= (vectorlength w) (vectorlength data))))) (mu real?) (sigma (>=/c 0.0))) real?) (>r ((w (vectorof real?)) (data (and/c (vectorof real?) (lambda (x) (= (vectorlength w) (vectorlength data))))) real?)) 

8.7 Maximum and Minimum
Function:
(maximum data) 
Contract: (> nonemptyvectorofreals? real?) 

Function:
(minimun data) 
Contract: (> nonemptyvectorofreals? real?) 

Function:
(minimunmaximum data) 
Contract: (> nonemptyvectorofreals? (values real? real?)) 

Function:
(maximumindex data) 
Contract: (> nonemptyvectorofreals? naturalnumber?) 

Function:
(minimunindex data) 
Contract: (> nonemptyvectorofreals? naturalnumber?) 

Function:
(minimunmaximumindex data) 
Contract: (> nonemptyvectorofreals? (values naturalnumber? number?)) 

8.8 Median and Percentiles
The median and percentile functions described in this section operate on sorted data. The contracts for these functions enforce this. Also, for convenience we use quantiles measured on a scale of 0 to 1 instead of percentiles (which use a scale of 0 to 100).
Function:
(medianfromsorteddata sorteddata) 
Contract: (> (and/c nonemptyvectorofreals? sorted?) real?) 

Function:
(qualtilefromsorteddata sorteddata f) 
Contract: (> (and/c nonemptyvectorofreals? sorted?) (realin 0.0 1.0) real?) 

The quantile is found by interpolation using the formula
where i is floor((n  1)f) and delta is (n  1)f  1.
8.9 Example
This example generates two vectors of data from a unit Gaussian distribution and a vector of cosine squared weighting data. All of the vectors are of length 1,000. These data are used to test all of the statistics functions.
(require (planet "gaussian.ss" ("williams" "science.plt") "randomdistributions")) (require (planet "statistics.ss" ("williams" "science.plt"))) (require (planet "math.ss" ("williams" "science.plt"))) (define (naivesort! data) (let loop () (let ((n (vectorlength data)) (sorted? #t)) (do ((i 1 (+ i 1))) ((= i n) data) (if (< (vectorref data i) (vectorref data ( i 1))) (let ((t (vectorref data i))) (vectorset! data i (vectorref data ( i 1))) (vectorset! data ( i 1) t) (set! sorted? #f)))) (if (not sorted?) (loop))))) (let ((data1 (makevector 1000)) (data2 (makevector 1000)) (w (makevector 1000))) (do ((i 0 (+ i 1))) ((= i 1000) (void)) ;; Random data from unit gaussian (vectorset! data1 i (randomunitgaussian)) (vectorset! data2 i (randomunitgaussian)) ;; Cos^2 weighting (vectorset! w i (expt (cos ( (* 2.0 pi (/ i 1000.0)) pi)) 2))) (printf "Statistics Example~n") (printf " mean = ~a~n" (mean data1)) (printf " variance = ~a~n" (variance data1)) (printf " standard deviation = ~a~n" (standarddeviation data1)) (printf " variance from 0.0 = ~a~n" (variancewithfixedmean data1 0.0)) (printf " standard deviation from 0.0 = ~a~n" (standarddeviationwithfixedmean data1 0.0)) (printf " absolute deviation = ~a~n" (absolutedeviation data1)) (printf " absolute deviation from 0.0 = ~a~n" (absolutedeviation data1 0.0)) (printf " skew = ~a~n" (skew data1)) (printf " kurtosis = ~a~n" (kurtosis data1)) (printf " lag1 autocorrelation = ~a~n" (lag1autocorrelation data1)) (printf " covariance = ~a~n" (covariance data1 data2)) (printf " weighted mean = ~a~n" (weightedmean w data1)) (printf " weighted variance = ~a~n" (weightedvariance w data1)) (printf " weighted standard deviation = ~a~n" (weightedstandarddeviation w data1)) (printf " weighted variance from 0.0 = ~a~n" (weightedvariancewithfixedmean w data1 0.0)) (printf "weighted standard deviation from 0.0 = ~a~n" (weightedstandarddeviationwithfixedmean w data1 0.0)) (printf " weighted absolute deviation = ~a~n" (weightedabsolutedeviation w data1)) (printf "weighted absolute deviation from 0.0 = ~a~n" (weightedabsolutedeviation w data1 0.0)) (printf " weighted skew = ~a~n" (weightedskew w data1)) (printf " weighted kurtosis = ~a~n" (weightedkurtosis w data1)) (printf " maximum = ~a~n" (maximum data1)) (printf " minimum = ~a~n" (minimum data1)) (printf " index of maximum value = ~a~n" (maximumindex data1)) (printf " index of minimum value = ~a~n" (minimumindex data1)) (naivesort! data1) (printf " median = ~a~n" (medianfromsorteddata data1)) (printf " 10% quantile = ~a~n" (quantilefromsorteddata data1 .1)) (printf " 20% quantile = ~a~n" (quantilefromsorteddata data1 .2)) (printf " 30% quantile = ~a~n" (quantilefromsorteddata data1 .3)) (printf " 40% quantile = ~a~n" (quantilefromsorteddata data1 .4)) (printf " 50% quantile = ~a~n" (quantilefromsorteddata data1 .5)) (printf " 60% quantile = ~a~n" (quantilefromsorteddata data1 .6)) (printf " 70% quantile = ~a~n" (quantilefromsorteddata data1 .7)) (printf " 80% quantile = ~a~n" (quantilefromsorteddata data1 .8)) (printf " 90% quantile = ~a~n" (quantilefromsorteddata data1 .9)) )
Produces the following output:
Statistics Example mean = 0.03457693091555611 variance = 1.0285343857083422 standard deviation = 1.0141668431320077 variance from 0.0 = 1.028701415474174 standard deviation from 0.0 = 1.014249188056946 absolute deviation = 0.7987180852601665 absolute deviation from 0.0 = 0.7987898146946209 skew = 0.043402934671178436 kurtosis = 0.17722452271704014 lag1 autocorrelation = 0.0029930889831972143 covariance = 0.005782911085590894 weighted mean = 0.05096139259270008 weighted variance = 1.0500293763787367 weighted standard deviation = 1.0247094107007786 weighted variance from 0.0 = 1.0510513958491579 weighted standard deviation from 0.0 = 1.0252079768755011 weighted absolute deviation = 0.8054378524718832 weighted absolute deviation from 0.0 = 0.8052440544958938 weighted skew = 0.046448729539282155 weighted kurtosis = 0.3050060704791675 maximum = 3.731148814104969 minimum = 3.327265864298485 index of maximum value = 502 index of minimum value = 476 median = 0.019281803306206644 10% quantile = 1.243869878615807 20% quantile = 0.7816243947573505 30% quantile = 0.4708703241429585 40% quantile = 0.2299309332835332 50% quantile = 0.019281803306206644 60% quantile = 0.30022966479982344 70% quantile = 0.5317978807508836 80% quantile = 0.832291888537874 90% quantile = 1.3061151234700463