#lang scribble/doc
@(require scribble/manual
scribble/struct
scribblings/icons
(for-label racket
(planet williams/science/science-with-graphics)))
@title[#:tag "histograms"]{Histograms}
@local-table-of-contents[]
This chapter describes the functions for creating and using histograms provided by the Science Collection. Histograms provide a convenient way of summarizing the distribution of a set of data. A @italic{histogram} contains a vector of bins that count the number of events falling within a given range. The bins of a histogram can be used to record both integer and non-integer distributions.
The ranges can be either continuous or discrete. For continuous ranges, the bin widths can be either fixed or arbitrary. Also, for continuous ranges, both one- and two-dimensional histograms are supported.
@;----------
@; Histograms
@;----------
@section{Histograms}
The histogram functions described in this section are defined in the @filepath{histogram.rkt} file in the Science Collection and are made available using the form:
@defmodule[(planet williams/science/histogram)]
@defproc[(histogram? (x any/c)) boolean?]{
Returns true, @racket[#t], if @racket[x] is a histogram and false, @racket[#f], otherwise.}
@subsection{Creating Histograms}
@defproc[(make-histogram (n exact-positive-integer?)) histogram?]{
Returns a new, empty histogram with @racket[n] bins and @math{n + 1} range entries. The range entries must be set with a subsequent call to @racket[set-histogram-ranges!] or @racket[set-histogram-ranges-uniform!].}
@defproc[(make-histogram-with-ranges-uniform (n exact-positive-integer?)
(x-min real?)
(x-max (>/c x-min)))
histogram?]{
Returns a new, empty histogram with @racket[n] bins. The @math{n + 1} range entries are initialized to provide @racket[n] uniform width bins from @racket[x-min] to @racket[x-max].}
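As an informal illustration of uniform ranges, the following sketch computes the @math{n + 1} range entries for @racket[n] equal-width bins. The helper @racket[uniform-ranges] is hypothetical and is not part of the Science Collection; it only mirrors the description above and may differ from the actual implementation.
@racketmod[
racket
(code:comment "Hypothetical helper: n + 1 equally spaced edges from x-min to x-max.")
(define (uniform-ranges n x-min x-max)
  (let ((width (/ (- x-max x-min) n)))
    (for/vector #:length (+ n 1) ((i (in-range (+ n 1))))
      (+ x-min (* i width)))))
(uniform-ranges 4 0.0 1.0)]
For four bins from 0.0 to 1.0, this yields the edges 0.0, 0.25, 0.5, 0.75, and 1.0.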
@subsection{Updating and Accessing Histogram Elements}
@defproc[(histogram-n (h histogram?)) exact-positive-integer?]{
Returns the number of bins in the histogram @racket[h].}
@defproc[(histogram-ranges (h histogram?)) (vectorof real?)]{
Returns a vector of ranges for the histogram @racket[h]. The length of the vector is equal to the number of bins in @racket[h] plus one.}
@defproc[(set-histogram-ranges! (h histogram?)
(ranges (and/c (vectorof real?)
(lambda (x)
(= (vector-length ranges)
(+ (histogram-n h) 1))))))
void?]{
Sets the ranges for the histogram @racket[h] according to the given @racket[ranges]. The length of the @racket[ranges] vector must equal the number of bins in @racket[h] plus one. The bins in @racket[h] are also reset.}
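A minimal sketch of a histogram with arbitrary (non-uniform) bin widths, using only the functions documented in this section. The specific edge values are chosen purely for illustration.
@racketmod[
racket
(require (planet williams/science/histogram))
(code:comment "Three bins with edges at 0.0, 1.0, 10.0, and 100.0.")
(let ((h (make-histogram 3)))
  (set-histogram-ranges! h (vector 0.0 1.0 10.0 100.0))
  (code:comment "5.0 lies between 1.0 and 10.0, so it belongs to the second bin.")
  (histogram-increment! h 5.0)
  (histogram-bins h))]
The second element of the resulting bin vector should hold a count of one, and the other two bins should remain empty.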
@defproc[(set-histogram-ranges-uniform! (h histogram?)
(x-min real?)
(x-max (>/c x-min)))
void?]{
Sets the ranges for the histogram @racket[h] uniformly from @racket[x-min] to @racket[x-max]. The bins in @racket[h] are also reset.}
@defproc[(histogram-bins (h histogram?)) (vectorof real?)]{
Returns the vector of bins for the histogram @racket[h].}
@defproc*[(((histogram-increment! (h histogram?) (x real?)) void?)
((unchecked-histogram-increment! (h histogram?) (x real?)) void?))]{
Increments the bin in the histogram @racket[h] containing @racket[x]. The bin value is incremented by one.}
@defproc*[(((histogram-accumulate! (h histogram?) (x real?) (weight (>=/c 0.0))) void?)
((unchecked-histogram-accumulate! (h histogram?) (x real?) (weight (>=/c 0.0))) void?))]{
Increments the bin in the histogram @racket[h] containing @racket[x] by the specified @racket[weight]. Since in this implementation bin values are non-negative, the @racket[weight] must be non-negative.}
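The difference between incrementing and accumulating can be sketched as follows. The data values and weights are arbitrary and serve only as an illustration.
@racketmod[
racket
(require (planet williams/science/histogram))
(let ((h (make-histogram-with-ranges-uniform 4 0.0 4.0)))
  (code:comment "Adds one to the bin containing 0.5.")
  (histogram-increment! h 0.5)
  (code:comment "Adds a weight of 2.5 to the same bin.")
  (histogram-accumulate! h 0.5 2.5)
  (code:comment "Adds a weight of 0.25 to the bin containing 3.5.")
  (histogram-accumulate! h 3.5 0.25)
  (histogram-bins h))]
After these calls, the first bin should hold a total of 3.5 and the last bin 0.25.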
@defproc[(histogram-get (h histogram?)
(i (and/c exact-nonnegative-integer?
(</c (histogram-n h)))))
(>=/c 0.0)]{
Returns the contents of the @racket[i]@superscript{th} bin of the histogram @racket[h].}
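@defproc[(histogram-get-range (h histogram?)
(i (and/c exact-nonnegative-integer?
(</c (histogram-n h)))))
(values real? real?)]{
Returns the lower and upper range limits of the @racket[i]@superscript{th} bin of the histogram @racket[h].}
@subsection{Histogram Statistics}
@defproc[(histogram-max (h histogram?)) (>=/c 0.0)]{
Returns the maximum bin value for the histogram @racket[h]. Since in this implementation bin values are non-negative, the maximum value is also non-negative.}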
@defproc[(histogram-min (h histogram?)) (>=/c 0.0)]{
Returns the minimum bin value for the histogram @racket[h]. Since in this implementation bin values are non-negative, the minimum value is also non-negative.}
@defproc[(histogram-sum (h histogram?)) (>=/c 0.0)]{
Returns the sum of the data in the histogram @racket[h]. Since in this implementation bin values are non-negative, the sum is also non-negative.}
@defproc[(histogram-mean (h histogram?)) real?]{
Returns the mean of the data in the histogram @racket[h]. Although bin values are non-negative, the mean can be negative when the ranges of @racket[h] extend below zero.}
@defproc[(histogram-sigma (h histogram?)) (>=/c 0.0)]{
Returns the standard deviation of the data in the histogram @racket[h].}
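The statistics functions can be exercised on a small histogram as in the following sketch. The four data points are arbitrary; how the mean and standard deviation are derived from the bins (for example, from bin midpoints) is not stated here, so no values are claimed for them.
@racketmod[
racket
(require (planet williams/science/histogram))
(let ((h (make-histogram-with-ranges-uniform 4 0.0 4.0)))
  (for ((x (in-list '(0.5 1.5 1.5 2.5))))
    (histogram-increment! h x))
  (code:comment "Collect the summary statistics into a list.")
  (list (histogram-max h)
        (histogram-min h)
        (histogram-sum h)
        (histogram-mean h)
        (histogram-sigma h)))]
Here the fullest bin holds two events, the last bin is empty, and the bin values sum to four, so the first three results should be two, zero, and four.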
@subsection{Histogram Graphics}
The histogram graphics functions are defined in the file @filepath{histogram-graphics.rkt} in the Science Collection and are made available using the following form:
@defmodule[(planet williams/science/histogram-graphics)]
@defproc[(histogram-plot (h histogram?) (title string? "Histogram")) any]{
This function returns a plot of the histogram @racket[h] with the specified @racket[title]. If @racket[title] is not specified, @racket["Histogram"] is used. The plot is scaled to the maximum bin value. The plot is produced by the histogram plotting extension to the plot collection provided with Racket.}
@defproc[(histogram-plot-scaled (h histogram?) (title string? "Histogram")) any]{
This function returns a plot of the histogram @racket[h] with the specified @racket[title]. If @racket[title] is not specified, @racket["Histogram"] is used. The plot is scaled to the sum of the bin values. It is most useful for a small number of bins, generally ten or fewer. The plot is produced by the histogram plotting extension to the plot collection provided with Racket.}
@subsection{Histogram Examples}
@bold{Example:} Histogram of random variates from the unit Gaussian (normal) distribution.
@racketmod[
racket
(require (planet williams/science/random-distributions/gaussian)
(planet williams/science/histogram-with-graphics))
(let ((h (make-histogram-with-ranges-uniform 40 -3.0 3.0)))
(for ((i (in-range 10000)))
(histogram-increment! h (random-unit-gaussian)))
(histogram-plot h "Histogram of the Unit Gaussian (Normal) Distribution"))]
The following figure shows the resulting histogram:
@image["scribblings/images/unit-gaussian-hist.png"]
@bold{Example:} Scaled histogram of random variates from the exponential distribution with mean 1.0.
@racketmod[
racket
(require (planet williams/science/random-distributions/exponential)
(planet williams/science/histogram-with-graphics))
(let ((h (make-histogram-with-ranges-uniform 10 0.0 8.0)))
(for ((i (in-range 10000)))
(histogram-increment! h (random-exponential 1.0)))
(histogram-plot-scaled h "Histogram of the Exponential Distribution"))]
The following figure shows the resulting histogram:
@image["scribblings/images/histogram-plot-scaled.png"]
@;----------
@; 2D Histograms
@;----------
@section{2D Histograms}
The 2D histogram functions described in this section are defined in the @filepath{histogram-2d.rkt} file in the Science Collection and are made available using the form:
@defmodule[(planet williams/science/histogram-2d)]
@defproc[(histogram-2d? (x any/c)) boolean?]{
Returns true, @racket[#t], if @racket[x] is a 2D histogram and false, @racket[#f], otherwise.}
@subsection{Creating 2D Histograms}
@defproc[(make-histogram-2d (nx exact-positive-integer?)
(ny exact-positive-integer?))
histogram-2d?]{
Returns a new, empty 2D histogram with @racket[nx] bins in the x direction and @racket[ny] bins in the y direction and @math{nx + 1} range entries in the x direction and @math{ny + 1} range entries in the y direction. The range entries must be set with a subsequent call to @racket[set-histogram-2d-ranges!] or @racket[set-histogram-2d-ranges-uniform!].}
@defproc[(make-histogram-2d-with-ranges-uniform (nx exact-positive-integer?)
(ny exact-positive-integer?)
(x-min real?)
(x-max (>/c x-min))
(y-min real?)
(y-max (>/c y-min)))
histogram-2d?]{
Returns a new, empty 2D histogram with @racket[nx] bins in the x direction and @racket[ny] bins in the y direction. The @math{nx + 1} range entries in the x direction are initialized to provide @racket[nx] uniform width bins from @racket[x-min] to @racket[x-max]. The @math{ny + 1} range entries in the y direction are initialized to provide @racket[ny] uniform width bins from @racket[y-min] to @racket[y-max].}
@subsection{Updating and Accessing 2D Histogram Elements}
@defproc[(histogram-2d-nx (h histogram-2d?)) exact-positive-integer?]{
Returns the number of bins in the x direction in the 2D histogram @racket[h].}
@defproc[(histogram-2d-ny (h histogram-2d?)) exact-positive-integer?]{
Returns the number of bins in the y direction in the 2D histogram @racket[h].}
@defproc[(histogram-2d-x-ranges (h histogram-2d?)) (vectorof real?)]{
Returns a vector of ranges in the x direction for the 2D histogram @racket[h]. The length of the vector is equal to the number of bins in the x direction in @racket[h] plus one.}
@defproc[(histogram-2d-y-ranges (h histogram-2d?)) (vectorof real?)]{
Returns a vector of ranges in the y direction for the 2D histogram @racket[h]. The length of the vector is equal to the number of bins in the y direction in @racket[h] plus one.}
@defproc[(set-histogram-2d-ranges! (h histogram-2d?)
(x-ranges (and/c (vectorof real?)
(lambda (x)
(= (vector-length x-ranges)
(+ (histogram-2d-nx h) 1)))))
(y-ranges (and/c (vectorof real?)
(lambda (x)
(= (vector-length y-ranges)
(+ (histogram-2d-ny h) 1))))))
void?]{
Sets the ranges for the 2D histogram @racket[h] according to the given @racket[x-ranges] and @racket[y-ranges]. The length of the @racket[x-ranges] vector must equal the number of bins in the x direction in @racket[h] plus one. The length of the @racket[y-ranges] vector must equal the number of bins in the y direction in @racket[h] plus one. The bins in @racket[h] are also reset.}
@defproc[(set-histogram-2d-ranges-uniform! (h histogram-2d?)
(x-min real?)
(x-max (>/c x-min))
(y-min real?)
(y-max (>/c y-min)))
void?]{
Sets the ranges for the 2D histogram @racket[h] uniformly from @racket[x-min] to @racket[x-max] in the x direction and uniformly from @racket[y-min] to @racket[y-max] in the y direction. The bins in @racket[h] are also reset.}
@defproc[(histogram-2d-bins (h histogram-2d?)) (vectorof real?)]{
Returns the vector of bins for the 2D histogram @racket[h]. The length of the vector is @math{nx * ny}, where @math{nx} is the number of bins in the x direction in @racket[h] and @math{ny} is the number of bins in the y direction in @racket[h]. The @math{(i, j)}@superscript{th} index is computed as @math{(i * ny) + j}.}
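The index computation described above can be sketched in plain Racket. The helper @racket[bin-index] is hypothetical and is not part of the Science Collection; it simply restates the formula @math{(i * ny) + j}.
@racketmod[
racket
(code:comment "Hypothetical helper: flat index of bin (i, j) given ny bins in the y direction.")
(define (bin-index i j ny)
  (+ (* i ny) j))
(bin-index 2 1 4)]
With four bins in the y direction, bin (2, 1) maps to index 9 in the flat vector.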
@defproc*[(((histogram-2d-increment! (h histogram-2d?) (x real?) (y real?)) void?)
((unchecked-histogram-2d-increment! (h histogram-2d?) (x real?) (y real?)) void?))]{
Increments the bin in the 2D histogram @racket[h] containing (@racket[x], @racket[y]). The bin value is incremented by one.}
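A minimal sketch of incrementing a 2D histogram and inspecting its flat bin vector, using only functions documented in this section. The 2-by-2 layout and the data point are arbitrary.
@racketmod[
racket
(require (planet williams/science/histogram-2d))
(let ((h (make-histogram-2d-with-ranges-uniform 2 2 0.0 2.0 0.0 2.0)))
  (code:comment "(0.5, 1.5) lies in x bin 0 and y bin 1.")
  (histogram-2d-increment! h 0.5 1.5)
  (histogram-2d-bins h))]
Given the index formula @math{(i * ny) + j}, bin (0, 1) corresponds to index 1, so the second element of the returned vector should hold the count.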
@defproc*[(((histogram-2d-accumulate! (h histogram-2d?) (x real?) (y real?) (weight (>=/c 0.0))) void?)
((unchecked-histogram-2d-accumulate! (h histogram-2d?) (x real?) (y real?) (weight (>=/c 0.0))) void?))]{
Increments the bin in the 2D histogram @racket[h] containing (@racket[x], @racket[y]) by the specified @racket[weight]. Since in this implementation bin values are non-negative, the @racket[weight] must be non-negative.}
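@defproc[(histogram-2d-get (h histogram-2d?)
(i (and/c exact-nonnegative-integer?
(</c (histogram-2d-nx h))))
(j (and/c exact-nonnegative-integer?
(</c (histogram-2d-ny h)))))
(>=/c 0.0)]{
Returns the contents of the (@racket[i], @racket[j])@superscript{th} bin of the 2D histogram @racket[h].}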
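@defproc[(histogram-2d-get-x-range (h histogram-2d?)
(i (and/c exact-nonnegative-integer?
(</c (histogram-2d-nx h)))))
(values real? real?)]{
Returns the lower and upper range limits in the x direction of the @racket[i]@superscript{th} bin of the 2D histogram @racket[h].}
@subsection{2D Histogram Statistics}
@defproc[(histogram-2d-max (h histogram-2d?)) (>=/c 0.0)]{
Returns the maximum bin value for the 2D histogram @racket[h]. Since in this implementation bin values are non-negative, the maximum value is also non-negative.}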
@defproc[(histogram-2d-min (h histogram-2d?)) (>=/c 0.0)]{
Returns the minimum bin value for the histogram @racket[h]. Since in this implementation bin values are non-negative, the minimum value is also non-negative.}
@defproc[(histogram-2d-sum (h histogram-2d?)) (>=/c 0.0)]{
Returns the sum of the data in the 2D histogram @racket[h]. Since in this implementation bin values are non-negative, the sum is also non-negative.}
@defproc[(histogram-2d-x-mean (h histogram-2d?)) real?]{
Returns the mean of the data in the x direction in the 2D histogram @racket[h]. Although bin values are non-negative, the mean can be negative when the x ranges of @racket[h] extend below zero.}
@defproc[(histogram-2d-y-mean (h histogram-2d?)) real?]{
Returns the mean of the data in the y direction in the 2D histogram @racket[h]. Although bin values are non-negative, the mean can be negative when the y ranges of @racket[h] extend below zero.}
@defproc[(histogram-2d-x-sigma (h histogram-2d?)) (>=/c 0.0)]{
Returns the standard deviation of the data in the x direction in the 2D histogram @racket[h].}
@defproc[(histogram-2d-y-sigma (h histogram-2d?)) (>=/c 0.0)]{
Returns the standard deviation of the data in the y direction in the 2D histogram @racket[h].}
@defproc[(histogram-2d-covariance (h histogram-2d?)) real?]{
Returns the covariance of the data in the 2D histogram @racket[h]. Unlike the standard deviations, the covariance can be negative.}
@subsection{2D Histogram Graphics}
The 2D histogram graphics functions are defined in the file @filepath{histogram-2d-graphics.rkt} in the Science Collection and are made available using the following form:
@defmodule[(planet williams/science/histogram-2d-graphics)]
@defproc[(histogram-2d-plot (h histogram-2d?) (title string? "Histogram")) any]{
This function returns a plot of the 2D histogram @racket[h] with the specified @racket[title]. If @racket[title] is not specified, @racket["Histogram"] is used. The plot is scaled to the maximum bin value. The plot is produced by the histogram plotting extension to the plot collection provided with Racket.}
@subsection{2D Histogram Examples}
@bold{Example:} 2D histogram of random variates from the bivariate Gaussian distribution with standard deviation 1.0 in both the @math{x} and @math{y} directions and correlation coefficient 0.0.
@racketmod[
racket
(require (planet williams/science/random-distributions/bivariate)
(planet williams/science/histogram-2d-with-graphics))
(let ((h (make-histogram-2d-with-ranges-uniform
20 20 -3.0 3.0 -3.0 3.0)))
(for ((i (in-range 10000)))
(let-values (((x y) (random-bivariate-gaussian 1.0 1.0 0.0)))
(histogram-2d-increment! h x y)))
(histogram-2d-plot h "Histogram of the Bivariate Gaussian Distribution"))]
The following figure shows the resulting histogram:
@image["scribblings/images/bivariate-gaussian-hist.png"]
@;----------
@; Discrete Histograms
@;----------
@section{Discrete Histograms}
The discrete histogram functions described in this section are defined in the @filepath{discrete-histogram.rkt} file in the Science Collection and are made available using the form:
@defmodule[(planet williams/science/discrete-histogram)]
@defproc[(discrete-histogram? (x any/c)) boolean?]{
Returns true, @racket[#t], if @racket[x] is a discrete histogram and false, @racket[#f], otherwise.}
@subsection{Creating Discrete Histograms}
@defproc*[(((make-discrete-histogram (n1 integer?)
(n2 (and/c integer? (>=/c n1)))
(dynamic? boolean? #t))
discrete-histogram?)
((make-discrete-histogram) discrete-histogram?))]{
Returns a new, empty discrete histogram with range @racket[n1] to @racket[n2]. If @racket[dynamic?] is @racket[#t] or @racket[make-discrete-histogram] is called with no arguments, the resulting discrete histogram will grow dynamically to accommodate subsequent data points.}
@subsection{Updating and Accessing Discrete Histogram Elements}
@defproc[(discrete-histogram-n1 (h discrete-histogram?)) integer?]{
Returns the lower range of the discrete histogram @racket[h].}
@defproc[(discrete-histogram-n2 (h discrete-histogram?)) integer?]{
Returns the upper range of the discrete histogram @racket[h].}
@defproc[(discrete-histogram-dynamic? (h discrete-histogram?)) boolean?]{
Returns true, @racket[#t], if the discrete histogram @racket[h] is dynamic and false, @racket[#f], otherwise.}
@defproc[(discrete-histogram-bins (h discrete-histogram?)) (vectorof real?)]{
Returns the vector of bins for the discrete histogram @racket[h].}
@defproc*[(((discrete-histogram-increment! (h discrete-histogram?) (i integer?)) void?)
((unchecked-discrete-histogram-increment! (h discrete-histogram?) (i integer?)) void?))]{
Increments the bin in the discrete histogram @racket[h] containing @racket[i]. The bin value is incremented by one.}
@defproc*[(((discrete-histogram-accumulate! (h discrete-histogram?) (i integer?) (weight (>=/c 0.0))) void?)
((unchecked-discrete-histogram-accumulate! (h discrete-histogram?) (i integer?) (weight (>=/c 0.0))) void?))]{
Increments the bin in the discrete histogram @racket[h] containing @racket[i] by the specified @racket[weight]. Since in this implementation bin values are non-negative, the @racket[weight] must be non-negative.}
@defproc[(discrete-histogram-get (h discrete-histogram?)
(i (and/c integer?
(>=/c (discrete-histogram-n1 h))
(<=/c (discrete-histogram-n2 h)))))
(>=/c 0.0)]{
Returns the contents of the bin of the discrete histogram @racket[h] containing @racket[i].}
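A minimal sketch of a dynamic discrete histogram, using only the functions documented in this section. The data values are arbitrary, and exactly how the range grows is an implementation detail not specified here, so the range results are not stated.
@racketmod[
racket
(require (planet williams/science/discrete-histogram))
(code:comment "Called with no arguments, the histogram grows to fit the data.")
(let ((h (make-discrete-histogram)))
  (for ((i (in-list '(3 7 7 12))))
    (discrete-histogram-increment! h i))
  (list (discrete-histogram-n1 h)
        (discrete-histogram-n2 h)
        (discrete-histogram-get h 7)))]
Since 7 was recorded twice, the last element of the result should be a count of two.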
@subsection{Discrete Histogram Statistics}
@defproc[(discrete-histogram-max (h discrete-histogram?)) (>=/c 0.0)]{
Returns the maximum bin value for the discrete histogram @racket[h]. Since in this implementation bin values are non-negative, the maximum value is also non-negative.}
@defproc[(discrete-histogram-min (h discrete-histogram?)) (>=/c 0.0)]{
Returns the minimum bin value for the discrete histogram @racket[h]. Since in this implementation bin values are non-negative, the minimum value is also non-negative.}
@defproc[(discrete-histogram-sum (h discrete-histogram?)) (>=/c 0.0)]{
Returns the sum of the data in the discrete histogram @racket[h]. Since in this implementation bin values are non-negative, the sum is also non-negative.}
@defproc[(discrete-histogram-mean (h discrete-histogram?)) real?]{
Returns the mean of the data in the discrete histogram @racket[h]. Although bin values are non-negative, the mean can be negative when the range of @racket[h] extends below zero.}
@defproc[(discrete-histogram-sigma (h discrete-histogram?)) (>=/c 0.0)]{
Returns the standard deviation of the data in the discrete histogram @racket[h].}
@subsection{Discrete Histogram Graphics}
The discrete histogram graphics functions are defined in the file @filepath{discrete-histogram-graphics.rkt} in the Science Collection and are made available using the following form:
@defmodule[(planet williams/science/discrete-histogram-graphics)]
@defproc[(discrete-histogram-plot (h discrete-histogram?) (title string? "Histogram")) any]{
This function returns a plot of the discrete histogram @racket[h] with the specified @racket[title]. If @racket[title] is not specified, @racket["Histogram"] is used. The plot is scaled to the maximum bin value. The plot is produced by the histogram plotting extension to the plot collection provided with Racket.}
@defproc[(discrete-histogram-plot-scaled (h discrete-histogram?) (title string? "Histogram")) any]{
This function returns a plot of the discrete histogram @racket[h] with the specified @racket[title]. If @racket[title] is not specified, @racket["Histogram"] is used. The plot is scaled to the sum of the bin values. It is most useful for a small number of bins, generally ten or fewer. The plot is produced by the histogram plotting extension to the plot collection provided with Racket.}
@subsection{Discrete Histogram Examples}
@bold{Example:} Discrete histogram of random variates from the Poisson distribution with mean 10.0.
@racketmod[
racket
(require (planet williams/science/random-distributions/poisson)
(planet williams/science/discrete-histogram-with-graphics))
(let ((h (make-discrete-histogram)))
(for ((i (in-range 10000)))
(discrete-histogram-increment! h (random-poisson 10.0)))
(discrete-histogram-plot h "Histogram of the Poisson Distribution"))]
The following figure shows the resulting histogram:
@image["scribblings/images/poisson-hist.png"]
@bold{Example:} Scaled discrete histogram of random variates from the logarithmic distribution with probability 0.5.
@racketmod[
racket
(require (planet williams/science/random-distributions/logarithmic)
(planet williams/science/discrete-histogram-with-graphics))
(let ((h (make-discrete-histogram)))
(for ((i (in-range 10000)))
(discrete-histogram-increment! h (random-logarithmic 0.5)))
(discrete-histogram-plot-scaled h "Histogram of the Logarithmic Distribution"))]
The following figure shows the resulting histogram:
@image["scribblings/images/discrete-histogram-plot-scaled.png"]