#lang scribble/doc
@(require
scribble/manual
scribble/struct
scribblings/icons
(for-label racket/base
racket/contract
racket/mpair
plot
(planet williams/simulation/simulation-with-graphics)))
@title[#:tag "data-collection"]{Data Collection}
@local-table-of-contents[]
The purpose of most simulation models is to collect data that can be analyzed to gain insight into the system being simulated. In the simulation collection, (numeric) data that is subject to automatic data collection is stored in instances of the variable structure (i.e., @deftech{variables}).
Data for a variable may either be collected in a time-dependent manner, specified using the @scheme[accumulate] macro, or in a time-independent manner, specified using the @scheme[tally] macro.
Currently, both statistical data and history data can be collected automatically for a variable. (Each may in turn be either time-dependent or time-independent.) History data allows more sophisticated analysis to be performed using other analysis tools (e.g., the statistics routines in the science collection). A function to plot history data is also provided.
@section{Variables}
A @italic{variable} represents a numeric variable in the model for which data can automatically be collected as specified by the model developer.
@subsection{The @scheme[variable] Structure}
@defstruct*[variable ((initial-value (or/c 'uninitialized real?))
(value (or/c 'uninitialized real?))
(time-last-synchronized (>=/c 0.0))
(statistics (or/c statistics? false/c))
(history (or/c history? false/c))
(continuous boolean?)
(state-index (or/c -1 exact-nonnegative-integer?))
(get-monitors list?)
(set-monitors list?))
#:mutable]{
Instances of the @scheme[variable] structure represent variables in the simulation model. The @scheme[variable] structure has the following fields:
@itemize{
@item{@schemefont{initial-value}---The initial value of the variable. This is not currently being used, but may be used in the future to reset variables.}
@item{@schemefont{value}---The current value of the variable.}
@item{@schemefont{time-last-synchronized}---The time the variable was last synchronized. This is used internally to implement time-dependent data collectors.}
@item{@schemefont{statistics}---The statistics data collector for the variable or @scheme[#f].}
@item{@schemefont{history}---The history data collector for the variable or @scheme[#f].}
@item{@schemefont{continuous?}---True, @scheme[#t], if the variable is a continuous variable or false, @scheme[#f], otherwise.}
@item{@schemefont{state-index}---The index for the variable in the state vector, or @math{-1} if the variable is not a continuous variable or is not currently allocated to the state vector (i.e., the process owning the continuous variable is not currently in a @scheme[work/continuously]). (See Chapter 10, Continuous Simulation Models.)}
@item{@schemefont{get-monitors}---A list of get monitors for the variable.}
@item{@schemefont{set-monitors}---A list of set monitors for the variable.}
}
}
@defproc[(make-variable (initial-value (or/c 'uninitialized real?) 'uninitialized))
variable?]{
Returns a newly created variable with the specified @scheme[initial-value]. If @scheme[initial-value] is not provided, @scheme['uninitialized] is used, indicating that the variable has no value.
By default, all variables accumulate statistics on their values. To turn this off, set the @schemefont{statistics} field to @scheme[#f] using @scheme[(set-variable-statistics! _variable #f)].}
@defproc[(variable-initialized? (variable variable?)) boolean?]{
Returns true, @scheme[#t], if @scheme[variable] is initialized (i.e., its value is not @scheme['uninitialized]).}
@defproc[(variable-synchronize! (variable variable?)) void?]{
Synchronizes @scheme[variable] to the current (simulated) time.}
The following functions are shortcuts to the statistics for a variable. They raise an error if the variable has no statistics data collector (i.e., its @schemefont{statistics} field is @scheme[#f]).
@defproc[(variable-minimum (variable variable?)) real?]{
Returns the minimum value of @scheme[variable].}
@defproc[(variable-maximum (variable variable?)) real?]{
Returns the maximum value of @scheme[variable].}
@defproc[(variable-n (variable variable?)) real?]{
If the statistics for @scheme[variable] are time-independent (tallied), returns the number of values tallied for @scheme[variable]. If they are time-dependent (accumulated), returns the time over which @scheme[variable] has been accumulated.}
@defproc[(variable-sum (variable variable?)) real?]{
Returns the sum of the values of @scheme[variable].}
@defproc[(variable-mean (variable variable?)) real?]{
Returns the mean of the values of @scheme[variable].}
@defproc[(variable-variance (variable variable?)) real?]{
Returns the variance of the values of @scheme[variable].}
@defproc[(variable-standard-deviation (variable variable?)) real?]{
Returns the standard deviation of the values of @scheme[variable].}
To create continuous variables, see Chapter 10, Continuous Simulation Models.
@subsection{Tally and Accumulate}
The @scheme[tally] and @scheme[accumulate] macros specify the automatic data collection for a variable.
@defform*[#:id tally #:literals (variable-statistics variable-history)
((tally (variable-statistics variable))
(tally (variable-history variable)))]{
Specifies time-independent data collection for the specified @scheme[variable]. @scheme[variable-statistics] specifies that statistics are to be tallied for @scheme[variable]. @scheme[variable-history] specifies that a history is to be tallied for @scheme[variable].
Whenever the value of @scheme[variable] is changed, any tallied data collectors are updated with the new value.}
@defform*[#:id accumulate #:literals (variable-statistics variable-history)
((accumulate (variable-statistics variable))
(accumulate (variable-history variable)))]{
Specifies time-dependent data collection for the specified @scheme[variable]. @scheme[variable-statistics] specifies that statistics are to be accumulated for @scheme[variable]. @scheme[variable-history] specifies that a history is to be accumulated for @scheme[variable].
Whenever the value of @scheme[variable] is accessed or before its value is changed, any accumulated data collectors are synchronized with the current value over the time since it was last synchronized.}
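The synchronization step can be illustrated with a small sketch that does not use the simulation collection. The structure @schemefont{acc} and the functions @schemefont{make-acc}, @schemefont{acc-synchronize!}, and @schemefont{acc-set!} are hypothetical names chosen for illustration. The sketch mirrors the @schemefont{time-last-synchronized} field of a variable, keeps only @schemefont{n} and @schemefont{sum}, passes the current time explicitly, and assumes the variable is given a value when it is created.
@schememod[
racket/base
(code:comment "Sketch only: hypothetical names, not part of the simulation collection.")
(struct acc (value last-time n sum) #:mutable)

(define (make-acc value time)
  (acc value time 0 0))

(code:comment "Credit the current value with the time elapsed since the last synchronization.")
(define (acc-synchronize! a now)
  (define dt (- now (acc-last-time a)))
  (set-acc-n! a (+ (acc-n a) dt))
  (set-acc-sum! a (+ (acc-sum a) (* (acc-value a) dt)))
  (set-acc-last-time! a now))

(code:comment "Synchronize first, then change the value.")
(define (acc-set! a new-value now)
  (acc-synchronize! a now)
  (set-acc-value! a new-value))
]
Creating such a variable with value 1 at time 0, setting it to 2, 3, and 4 at times 2, 3, and 5, and synchronizing it at time 8 leaves @schemefont{n} = 8 and @schemefont{sum} = 22, which matches the accumulated results of the Tally and Accumulate example in this chapter. A tallied collector needs no synchronization: it simply adds each new value when it is set.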
@subsection{Variable Statistics}
@defstruct*[statistics ((time-dependent? boolean?)
(minimum real?)
(maximum real?)
(n (>=/c 0))
(sum real?)
(sum-of-squares real?))
#:mutable]{
The @scheme[statistics] structure maintains statistics for a variable.
@itemize{
@item{@schemefont{time-dependent?}---True, @scheme[#t], if the statistics are being accumulated (i.e., are time-dependent) or false, @scheme[#f], if the statistics are being tallied (i.e., are time-independent).}
@item{@schemefont{minimum}---The minimum value the variable has had. (The initial value is @scheme[+inf.0].)}
@item{@schemefont{maximum}---The maximum value the variable has had. (The initial value is @scheme[-inf.0].)}
@item{@schemefont{n}---See table below.}
@item{@schemefont{sum}---See table below.}
@item{@schemefont{sum-of-squares}---See table below.}
}
}
The following table shows the statistics that are gathered and how they are computed for both @scheme[tally] and @scheme[accumulate].
@(make-table
'boxed
(list
(list (make-flow (list @t{@bold{Statistic}}))
(make-flow (list @t{@bold{@scheme[tally]}}))
(make-flow (list @t{@bold{@scheme[accumulate]}})))
(list (make-flow (list @t{@scheme[n]}))
(make-flow (list @t{number of samples of @math{X}}))
(make-flow (list @t{@math{time_C - time_0}})))
(list (make-flow (list @t{@scheme[sum]}))
(make-flow (list @t{@math{Σ X}}))
(make-flow (list @t{@math{Σ (X × (time_C - time_L))}})))
(list (make-flow (list @t{@scheme[mean]}))
(make-flow (list @t{@schemefont{sum}/@schemefont{n}}))
(make-flow (list @t{@schemefont{sum}/@schemefont{n}})))
(list (make-flow (list @t{@scheme[sum-of-squares]}))
(make-flow (list @t{@math{Σ X^2}}))
(make-flow (list @t{@math{Σ (X^2 × (time_C - time_L))}})))
(list (make-flow (list @t{@scheme[mean-square]}))
(make-flow (list @t{@schemefont{sum-of-squares}/@schemefont{n}}))
(make-flow (list @t{@schemefont{sum-of-squares}/@schemefont{n}})))
(list (make-flow (list @t{@scheme[variance]}))
(make-flow (list @t{@schemefont{mean-square} - @schemefont{mean}@math{^2}}))
(make-flow (list @t{@schemefont{mean-square} - @schemefont{mean}@math{^2}})))
(list (make-flow (list @t{@scheme[standard-deviation]}))
(make-flow (list @t{@schemefont{variance}@superscript{½}}))
(make-flow (list @t{@schemefont{variance}@superscript{½}})))
(list (make-flow (list @t{@scheme[minimum]}))
(make-flow (list @t{minimum @math{X} for all @math{X}}))
(make-flow (list @t{minimum @math{X} for all @math{X}})))
(list (make-flow (list @t{@scheme[maximum]}))
(make-flow (list @t{maximum @math{X} for all @math{X}}))
(make-flow (list @t{maximum @math{X} for all @math{X}})))
))
where,
@itemize{
@item{@math{time_C} = current simulation time}
@item{@math{time_L} = simulation time the variable was set to its current value}
@item{@math{time_0} = simulation time the variable was created}
@item{@math{X} = variable value}
}
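As a cross-check on the table, the following sketch computes the tallied and accumulated statistics directly from a list of (value duration) pairs, without using the simulation collection. The names @schemefont{summarize}, @schemefont{tally-stats}, and @schemefont{accumulate-stats} are hypothetical. The sketch assumes the variable is created at time 0, takes each value in turn for the given duration, and has its statistics read when the last duration ends, so that @math{time_C - time_0} is the sum of the durations. The tallied version ignores the durations and treats each value as one sample. At least one pair is assumed.
@schememod[
racket/base
(code:comment "Sketch only: hypothetical helpers that follow the formulas in the table.")
(code:comment "Returns n, sum, mean, variance, minimum, and maximum.")
(define (summarize n sum sum-of-squares xs)
  (define mean (/ sum n))
  (define mean-square (/ sum-of-squares n))
  (values n sum mean (- mean-square (* mean mean))
          (apply min xs) (apply max xs)))

(code:comment "tally: every value counts as one sample.")
(define (tally-stats pairs)
  (define xs (map car pairs))
  (summarize (length xs)
             (for/sum ((x (in-list xs))) x)
             (for/sum ((x (in-list xs))) (* x x))
             xs))

(code:comment "accumulate: every value is weighted by how long it was held.")
(define (accumulate-stats pairs)
  (define xs (map car pairs))
  (summarize (for/sum ((p (in-list pairs))) (cadr p))
             (for/sum ((p (in-list pairs))) (* (car p) (cadr p)))
             (for/sum ((p (in-list pairs))) (* (car p) (car p) (cadr p)))
             xs))
]
For the pairs @scheme['((1 2) (2 1) (3 2) (4 3))], @schemefont{tally-stats} gives n = 4, sum = 10, and mean = 5/2, while @schemefont{accumulate-stats} gives n = 8, sum = 22, and mean = 11/4. The results are exact rationals because the inputs are exact integers.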
@defproc[(statistics-accumulate! (statistics statistics?)
(value real?)
(time (>=/c 0.0)))
any]{
Accumulates the @scheme[statistics] with the @scheme[value] at the specified @scheme[time].}
@defproc[(statistics-tally! (statistics statistics?)
(value real?))
any]{
Tallies the @scheme[statistics] with the @scheme[value].}
@defproc[(statistics-mean (statistics statistics?)) real?]{
Returns the mean of the values in @scheme[statistics].}
@defproc[(statistics-mean-square (statistics statistics?)) real?]{
Returns the mean of the squares of the values in @scheme[statistics].}
@defproc[(statistics-variance (statistics statistics?)) real?]{
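Returns the variance of the values in @scheme[statistics], computed as the mean square minus the square of the mean (see the table above). This is the population variance (divisor @schemefont{n}), not the unbiased sample variance.}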
@defproc[(statistics-standard-deviation (statistics statistics?)) real?]{
Returns the standard deviation of the values in @scheme[statistics].}
@subsection{Variable History}
@defstruct*[history ((time-dependent? boolean?)
(n exact-nonnegative-integer?)
(values mlist?)
(last-value-cell (or/c mpair? false/c))
(durations mlist?)
(last-duration-cell (or/c mpair? false/c)))
#:mutable]{
The @scheme[history] structure maintains a history of the values for a variable. For accumulated histories (i.e., those specified using the @scheme[accumulate] macro), the durations for each value are also computed.
@itemize{
@item{@schemefont{time-dependent?}---true, @scheme[#t], if the history is being accumulated (i.e., is time-dependent) or false, @scheme[#f], if the history is being tallied (i.e., is time-independent).}
@item{@schemefont{initial-time}---for a time-dependent history, the (simulated) time the history was created.}
@item{@schemefont{n}---the number of entries in the history.}
@item{@schemefont{values}---a mutable list of values for the history.}
@item{@schemefont{last-value-cell}---the last cell in @schemefont{values} or @scheme[#f] if @scheme[values] is empty. (Used to efficiently append to @scheme[values].)}
@item{@schemefont{durations}---for accumulated histories, a mutable list of durations for the history. (Not used for tallied histories.)}
@item{@schemefont{last-duration-cell}---the last cell in @schemefont{durations} or @scheme[#f] if @schemefont{durations} is empty. (Used to efficiently append to @schemefont{durations}.)}
}
}
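The @schemefont{last-value-cell} field is what makes appending to a history cheap: instead of walking the whole mutable list for every new value, the new cell is attached to the remembered last cell. The following is a minimal sketch of that technique for a tallied history. The structure @schemefont{hist} and the functions @schemefont{make-hist}, @schemefont{hist-tally!}, and @schemefont{hist->list} are hypothetical and independent of the simulation collection, and durations are omitted.
@schememod[
racket/base
(code:comment "Sketch only: a tallied history kept as a mutable list with a tail pointer.")
(struct hist (n values last-value-cell) #:mutable)

(define (make-hist)
  (hist 0 '() #f))

(code:comment "Append in constant time by linking the new cell after the last cell.")
(define (hist-tally! h value)
  (define cell (mcons value '()))
  (if (hist-last-value-cell h)
      (set-mcdr! (hist-last-value-cell h) cell)
      (set-hist-values! h cell))
  (set-hist-last-value-cell! h cell)
  (set-hist-n! h (+ (hist-n h) 1)))

(code:comment "Convert the mutable list of values to an ordinary list.")
(define (hist->list h)
  (let loop ((cell (hist-values h)))
    (if (null? cell)
        '()
        (cons (mcar cell) (loop (mcdr cell))))))
]
Tallying the values 1, 2, 3, and 4 in turn with @schemefont{hist-tally!} and then calling @schemefont{hist->list} returns @scheme['(1 2 3 4)]; each append touches only the last cell rather than traversing the list.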
@defproc[(history-accumulate! (history history?)
(value real?)
(time (>=/c 0.0)))
any]{
Accumulates the @scheme[history] with the @scheme[value] at the specified @scheme[time].}
@defproc[(history-tally! (history history?)
(value real?))
any]{
Tallies the @scheme[history] with the @scheme[value].}
@subsection{History Graphics}
@defproc[(history-plot (history history?) (title string? "History")) any]{
Plots @scheme[history] using the specified @scheme[title]. The string @scheme["History"] is used if @scheme[title] is not specified.}
@subsection{Variable Monitors}
Variable monitors are discussed in Chapter 9, Monitors.
@section{Example---Tally and Accumulate Example}
This example shows how the @scheme[tally] and @scheme[accumulate] macros work. Two variables are created: @schemefont{tallied} and @schemefont{accumulated}. Statistics and history data are collected for each, using @scheme[tally] for the variable @schemefont{tallied} and @scheme[accumulate] for the variable @schemefont{accumulated}. The process @schemefont{test-process} iterates through a list of values and durations, setting both variables to each value for the specified duration. Representative statistics (@schemefont{n}, @schemefont{sum}, and @schemefont{mean}) are printed and the history of each variable is plotted.
@schememod[
racket/base
(code:comment "Test Tally and Accumulate")
(require (planet williams/simulation/simulation-with-graphics))
(define tallied #f)
(define accumulated #f)
(define-process (test-process value-duration-list)
(for ((vdl (in-list value-duration-list)))
(let ((value (car vdl))
(duration (cadr vdl)))
(set-variable-value! tallied value)
(set-variable-value! accumulated value)
(wait duration))))
(define (main value-duration-list)
(with-new-simulation-environment
(set! tallied (make-variable))
(tally (variable-statistics tallied))
(tally (variable-history tallied))
(set! accumulated (make-variable))
(accumulate (variable-statistics accumulated))
(accumulate (variable-history accumulated))
(schedule #:at 0.0 (test-process value-duration-list))
(start-simulation)
(printf "--- Test Tally and Accumulate ---~n")
(printf "~n--- Tally ---~n")
(printf "N = ~a~n" (variable-n tallied))
(printf "Sum = ~a~n" (variable-sum tallied))
(printf "Mean = ~a~n" (variable-mean tallied))
(printf "~a~n"
(history-plot (variable-history tallied)))
(printf "~n--- Accumulate ---~n")
(printf "N = ~a~n" (variable-n accumulated))
(printf "Sum = ~a~n" (variable-sum accumulated))
(printf "Mean = ~a~n" (variable-mean accumulated))
(printf "~a~n"
(history-plot (variable-history accumulated)))))
(main '((1 2)(2 1)(3 2)(4 3)))
]
Here are the results of running the program for the following value/duration pairs: @scheme[((1 2)(2 1)(3 2)(4 3))]. That is, each variable will have a value of 1 for 2 units of time (from time 0 to time 2), a value of 2 for 1 unit of time (from time 2 to time 3), a value of 3 for 2 units of time (from time 3 to time 5), and a value of 4 for 3 units of time (from time 5 to time 8). The simulation ends at time 8.
The following is the resulting output.
@verbatim{
--- Test Tally and Accumulate ---
--- Tally ---
N = 4
Sum = 10.0
Mean = 2.5}
@image["scribblings/images/tally-and-accumulate-plot-1.png"]
@verbatim{
--- Accumulate ---
N = 8.0
Sum = 22.0
Mean = 2.75}
@image["scribblings/images/tally-and-accumulate-plot-2.png"]
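These figures follow directly from the table in the Variable Statistics section. The tallied variable counts each of the four values once: @math{n = 4}, @math{sum = 1 + 2 + 3 + 4 = 10}, and @math{mean = 10/4 = 2.5}. The accumulated variable weights each value by the time it was held: @math{n = 8 - 0 = 8}, @math{sum = 1 × 2 + 2 × 1 + 3 × 2 + 4 × 3 = 22}, and @math{mean = 22/8 = 2.75}.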
@section{Example---Data Collection}
The examples in previous chapters (Examples 0, 1, and 2) relied on @scheme[printf] statements to print the output of the simulation model. This was sufficient to show how the models worked, but would be impractical for large models. This example is the same simulation model as Example 2 from Chapter 7, Resources (using @scheme[with-resource] instead of the individual calls to @scheme[request] and @scheme[relinquish]), but uses automatic data collection to collect data.
No explicit variables are needed for this example. Resources already provide variables for their @schemefont{satisfied} and @schemefont{queue} fields, because those fields are in turn implemented using sets. (See Chapter 10, Sets.)
Note that the statements:
@schemeblock[
(accumulate (variable-statistics
(resource-queue-variable-n attendant)))
(accumulate (variable-statistics
(resource-satisfied-variable-n attendant)))]
are not required, since statistics are accumulated for every variable by default. (Including one or both of them would not change the results.)
@schememod[
racket/base
(code:comment "Example 3 - Data Collection")
(require (planet williams/simulation/simulation-with-graphics))
(define n-attendants 2)
(define attendant #f)
(define-process (generator n)
(for ((i (in-range n)))
(wait (random-exponential 4.0))
(schedule #:now (customer i))))
(define-process (customer i)
(with-resource (attendant)
(work (random-flat 2.0 10.0))))
(define (run-simulation n)
(with-new-simulation-environment
(set! attendant (make-resource n-attendants))
(schedule #:at 0.0 (generator n))
(accumulate (variable-statistics (resource-queue-variable-n attendant)))
(accumulate (variable-history (resource-queue-variable-n attendant)))
(start-simulation)
(printf "--- Example 3 - Data Collection ---~n")
(printf "Maximum queue length = ~a~n"
(variable-maximum (resource-queue-variable-n attendant)))
(printf "Average queue length = ~a~n"
(variable-mean (resource-queue-variable-n attendant)))
(printf "Variance = ~a~n"
(variable-variance (resource-queue-variable-n attendant)))
(printf "Utilization = ~a~n"
(variable-mean (resource-satisfied-variable-n attendant)))
(printf "Variance = ~a~n"
(variable-variance (resource-satisfied-variable-n attendant)))
(write-special (history-plot (variable-history
(resource-queue-variable-n attendant))))
(newline)))
(run-simulation 1000)
]
The following is the resulting output.
@verbatim{
--- Example 3 - Data Collection ---
Maximum queue length = 8
Average queue length = 0.9120534884951139
Variance = 2.2453788694193957
Utilization = 1.4320511974417858
Variance = 0.5885107114317054}
@image["scribblings/images/example-3.png"]
This is the first example we've shown that is useful in practice: it simulates enough customers for the results to be meaningful and it provides statistical output.
A few things to note here:
@itemize{
@item{We use the @scheme[with-resource] form here to request and relinquish the attendant. This is generally the easiest way to use a resource.}
@item{The plot produced by @scheme[history-plot] can be printed; here it is written with @scheme[write-special]. The plot is displayed as an image when the simulation model is run in DrRacket.}
}
@section{Data Collection Across Multiple Simulation Runs}
Simplistic as our example simulation model is, it is still useful for illustrating some advanced data collection techniques. In particular, this section shows how to collect statistics across multiple simulation runs.
@subsection{Open Loop Processing}
@deftech{Open loop processing} is a technique in which a resource is considered to have an infinite number of units available for allocation. That is, no process will ever block waiting for such a resource. Statistics on the demand for such resources can be collected by looking at the @scheme[resource-satisfied-variable-n] variable. Typically, this is done in a Monte Carlo fashion across multiple simulation runs.
In the simulation collection, we denote an open-loop resource by specifying an infinite number of units when the resource is created. In Racket, @scheme[+inf.0] denotes positive infinity and is used in specifying an open-loop resource.
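Because no process ever waits for an open-loop resource, the number of units in use at any instant equals the number of customers in service, so the peak demand in one run is the largest number of overlapping service intervals. The following sketch computes that peak with a sweep over arrival and departure events. It does not use the simulation collection: @schemefont{peak-demand} is a hypothetical name, each customer is given as an (arrival-time service-time) pair, and a departure is assumed to free its unit before an arrival at the same instant claims one.
@schememod[
racket/base
(code:comment "Sketch only: peak number of units in use for an open-loop resource.")
(define (peak-demand customers)
  (code:comment "Each event is (time . change): +1 on arrival, -1 on departure.")
  (define events
    (for*/list ((c (in-list customers))
                (e (in-list (list (cons (car c) 1)
                                  (cons (+ (car c) (cadr c)) -1)))))
      e))
  (code:comment "Sort by time; at equal times, departures (-1) come first.")
  (define sorted
    (sort events
          (lambda (a b)
            (or (< (car a) (car b))
                (and (= (car a) (car b)) (< (cdr a) (cdr b)))))))
  (code:comment "Sweep the events, tracking units in use and the largest value seen.")
  (define-values (in-use peak)
    (for/fold ((in-use 0) (peak 0))
              ((e (in-list sorted)))
      (define now-in-use (+ in-use (cdr e)))
      (values now-in-use (max peak now-in-use))))
  peak)
]
For example, the customers @scheme['((0 5) (1 2) (2 4) (6 1))] give a peak of 3, reached at time 2 when the first three service intervals overlap. Tallying one such peak per run is what the outer @schemefont{max-attendants} variable does in the following example.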
@subsection{Example---Open Loop Processing}
This example collects statistics on the maximum number of attendants required in the system (a measure of demand) when there is no blocking.
There is an outer simulation environment that exists solely for data collection and a variable @schemefont{max-attendants} to gather statistics on the maximum number of attendants required. Note that these statistics must be tallied at this level because (simulated) time does not exist across multiple simulation runs.
The inner loop creates a new simulation environment for each simulation run. This ensures that each run is properly initialized. It is in this inner loop that the attendant resource is created with an infinite number of units using @scheme[(make-resource +inf.0)]. When the simulation in the inner loop terminates, the @schemefont{max-attendants} variable (in the outer loop) is updated with the maximum number of attendants from the simulation. This is done with:
@schemeblock[
(set-variable-value! max-attendants
(variable-maximum
(resource-satisfied-variable-n attendant)))]
Finally, the statistics and histogram of the maximum number of attendants across all of the simulation runs are printed.
@schememod[
racket/base
(code:comment "Open Loop Example")
(require (planet williams/simulation/simulation-with-graphics))
(define attendant #f)
(define-process (generator n)
(for ((i (in-range n)))
(wait (random-exponential 4.0))
(schedule #:now (customer i))))
(define-process (customer i)
(with-resource (attendant)
(wait/work (random-flat 2.0 10.0))))
(define (run-simulation n1 n2)
(with-new-simulation-environment
(let ((max-attendants (make-variable)))
(tally (variable-statistics max-attendants))
(tally (variable-history max-attendants))
(for ((i (in-range n1)))
(with-new-simulation-environment
(set! attendant (make-resource +inf.0))
(schedule #:at 0.0 (generator n2))
(start-simulation)
(set-variable-value! max-attendants
(variable-maximum (resource-satisfied-variable-n attendant)))))
(printf "--- Open Loop Example ---~n")
(printf "Number of experiments = ~a~n"
(variable-n max-attendants))
(printf "Minimum maximum attendants = ~a~n"
(variable-minimum max-attendants))
(printf "Maximum maximum attendants = ~a~n"
(variable-maximum max-attendants))
(printf "Mean maximum attendants = ~a~n"
(variable-mean max-attendants))
(printf "Variance = ~a~n"
(variable-variance max-attendants))
(write-special (history-plot (variable-history max-attendants)
"Maximum Attendants"))
(newline))))
(run-simulation 1000 1000)
]
The following shows the output of the simulation for 1000 runs of 1000 customers each.
@verbatim{
--- Open Loop Example ---
Number of experiments = 1000
Minimum maximum attendants = 6
Maximum maximum attendants = 11
Mean maximum attendants = 7.525
Variance = 0.6653749999999903}
@image["scribblings/images/open-loop-processing.gif"]
This can be interpreted as follows: to serve every customer with no waiting, the number of attendants required ranged from six to eleven across the runs, with a mean of approximately 7.5.
Note the use of @scheme[write-special] to output the history plot (instead of the more convenient @scheme[printf]). This will produce the graphical plot when the output is directed to an editor canvas in GRacket as well as in DrRacket. The @scheme[(newline)] call performs the same function as the @scheme[~n] in @scheme[printf].
@subsection{Closed Loop Processing}
@deftech{Closed loop processing} is the "normal" processing mode in a simulation model, in which the number of units of a resource is specified and processes are queued (i.e., blocked) when there are not enough units of the resource to satisfy a request. Statistics on the contention for such resources can be collected by looking at the @scheme[resource-queue-variable-n] variable. Typically, this is done across multiple simulation runs.
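For a closed-loop resource, the time-averaged queue length can also be obtained from the waiting times, because the area under the queue-length curve equals the total time customers spend waiting. The following sketch uses that identity for a first-come, first-served resource with @math{k} units, each customer using one unit. It does not use the simulation collection: @schemefont{average-queue-length} is a hypothetical name, customers are (arrival-time service-time) pairs sorted by arrival, and the run is assumed to start at time 0 and end at the last departure.
@schememod[
racket/base
(code:comment "Sketch only: time-averaged queue length for k units, first come, first served.")
(define (average-queue-length customers k)
  (code:comment "free-at holds the time at which each unit next becomes free.")
  (define free-at (make-vector k 0))
  (define-values (total-wait end-time)
    (for/fold ((total-wait 0) (end-time 0))
              ((c (in-list customers)))
      (define arrival (car c))
      (code:comment "Pick the unit that becomes free earliest.")
      (define chosen
        (for/fold ((best 0))
                  ((j (in-range 1 k)))
          (if (< (vector-ref free-at j) (vector-ref free-at best)) j best)))
      (define start (max arrival (vector-ref free-at chosen)))
      (define finish (+ start (cadr c)))
      (vector-set! free-at chosen finish)
      (values (+ total-wait (- start arrival))
              (max end-time finish))))
  (code:comment "Area under the queue-length curve divided by the elapsed time.")
  (if (zero? end-time) 0 (/ total-wait end-time)))
]
For example, with the customers @scheme['((0 4) (1 2) (2 2))], one unit gives an average queue length of 7/8 (waits of 0, 3, and 4 over 8 time units), while two units give 1/5 (a single wait of 1 over 5 time units).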
@subsection{Example---Closed Loop Processing}
This example collects statistics on the average attendant queue length in the system (a measure of contention) when there is a specified number of attendants.
There is an outer simulation environment that exists solely for data collection and a variable @schemefont{avg-queue-length} to gather statistics on the average attendant queue length. Note that the statistics must be tallied at this level because (simulated) time does not exist across multiple simulation runs.
The inner loop creates a new simulation environment for each simulation run. This ensures that each run is properly initialized. It is in this inner loop that the attendant resource is created with the specified number of units using @scheme[(make-resource n-attendants)]. When the simulation in the inner loop terminates, the @schemefont{avg-queue-length} variable (in the outer loop) is updated with the average attendant queue length from the simulation. This is done with:
@schemeblock[
(set-variable-value! avg-queue-length
(variable-mean
(resource-queue-variable-n attendant)))]
Finally, the statistics and histogram of the average attendant queue length across all of the simulation runs are printed.
@schememod[
racket/base
(code:comment "Closed Loop Example")
(require (planet williams/simulation/simulation-with-graphics))
(define n-attendants 2)
(define attendant #f)
(define-process (generator n)
(for ((i (in-range n)))
(wait (random-exponential 4.0))
(schedule #:now (customer i))))
(define-process (customer i)
(with-resource (attendant)
(work (random-flat 2.0 10.0))))
(define (run-simulation n1 n2)
(with-new-simulation-environment
(let ((avg-queue-length (make-variable)))
(tally (variable-statistics avg-queue-length))
(tally (variable-history avg-queue-length))
(for ((i (in-range n1)))
(with-new-simulation-environment
(set! attendant (make-resource n-attendants))
(schedule #:at 0.0 (generator n2))
(start-simulation)
(set-variable-value! avg-queue-length
(variable-mean (resource-queue-variable-n attendant)))))
(printf "--- Closed Loop Example ---~n")
(printf "Number of attendants = ~a~n" n-attendants)
(printf "Number of experiments = ~a~n"
(variable-n avg-queue-length))
(printf "Minimum average queue length = ~a~n"
(variable-minimum avg-queue-length))
(printf "Maximum average queue length = ~a~n"
(variable-maximum avg-queue-length))
(printf "Mean average queue length = ~a~n"
(variable-mean avg-queue-length))
(printf "Variance = ~a~n"
(variable-variance avg-queue-length))
(write-special (history-plot (variable-history avg-queue-length)
"Average Queue Length"))
(newline))))
(run-simulation 1000 1000)
]
The following shows the output of the simulation for 1000 runs of 1000 customers each.
@verbatim{
--- Closed Loop Example ---
Number of attendants = 2
Number of experiments = 1000
Minimum average queue length = 0.5792057912006373
Maximum average queue length = 3.182757214703683
Mean average queue length = 1.1123279920475524
Variance = 0.08869696318792064}
@image["scribblings/images/closed-loop-processing.gif"]
This shows that with two attendants, averaged over 1000 runs of 1000 customers each, the mean queue length was about 1.1 customers waiting for an attendant.