NAME

math::statistics - Basic statistical functions and procedures

SYNOPSIS

package require Tcl 8.5
package require math::statistics 1

::math::statistics::mean data
::math::statistics::min data
::math::statistics::max data
::math::statistics::number data
::math::statistics::stdev data
::math::statistics::var data
::math::statistics::pstdev data
::math::statistics::pvar data
::math::statistics::median data
::math::statistics::basic-stats data
::math::statistics::histogram limits values ?weights?
::math::statistics::histogram-alt limits values ?weights?
::math::statistics::corr data1 data2
::math::statistics::interval-mean-stdev data confidence
::math::statistics::t-test-mean data est_mean est_stdev alpha
::math::statistics::test-normal data significance
::math::statistics::lillieforsFit data
::math::statistics::test-Duckworth list1 list2 significance
::math::statistics::test-anova-F alpha args
::math::statistics::test-Tukey-range alpha args
::math::statistics::test-Dunnett alpha control args
::math::statistics::quantiles data confidence
::math::statistics::quantiles limits counts confidence
::math::statistics::autocorr data
::math::statistics::crosscorr data1 data2
::math::statistics::mean-histogram-limits mean stdev number
::math::statistics::minmax-histogram-limits min max number
::math::statistics::linear-model xdata ydata intercept
::math::statistics::linear-residuals xdata ydata intercept
::math::statistics::test-2x2 n11 n21 n12 n22
::math::statistics::print-2x2 n11 n21 n12 n22
::math::statistics::control-xbar data ?nsamples?
::math::statistics::control-Rchart data ?nsamples?
::math::statistics::test-xbar control data
::math::statistics::test-Rchart control data
::math::statistics::test-Kruskal-Wallis confidence args
::math::statistics::analyse-Kruskal-Wallis args
::math::statistics::test-Levene groups
::math::statistics::test-Brown-Forsythe groups
::math::statistics::group-rank args
::math::statistics::test-Wilcoxon sample_a sample_b
::math::statistics::spearman-rank sample_a sample_b
::math::statistics::spearman-rank-extended sample_a sample_b
::math::statistics::kernel-density data opt -option value ...
::math::statistics::bootstrap data sampleSize ?numberSamples?
::math::statistics::wasserstein-distance prob1 prob2
::math::statistics::kl-divergence prob1 prob2
::math::statistics::logistic-model xdata ydata
::math::statistics::logistic-probability coeffs x
::math::statistics::tstat dof ?alpha?
::math::statistics::mv-wls wt1 weights_and_values
::math::statistics::mv-ols values
::math::statistics::pdf-normal mean stdev value
::math::statistics::pdf-lognormal mean stdev value
::math::statistics::pdf-exponential mean value
::math::statistics::pdf-uniform xmin xmax value
::math::statistics::pdf-triangular xmin xmax value
::math::statistics::pdf-symmetric-triangular xmin xmax value
::math::statistics::pdf-gamma alpha beta value
::math::statistics::pdf-poisson mu k
::math::statistics::pdf-chisquare df value
::math::statistics::pdf-student-t df value
::math::statistics::pdf-beta a b value
::math::statistics::pdf-weibull scale shape value
::math::statistics::pdf-gumbel location scale value
::math::statistics::pdf-pareto scale shape value
::math::statistics::pdf-cauchy location scale value
::math::statistics::pdf-laplace location scale value
::math::statistics::pdf-kumaraswamy a b value
::math::statistics::pdf-negative-binomial r p value
::math::statistics::cdf-normal mean stdev value
::math::statistics::cdf-lognormal mean stdev value
::math::statistics::cdf-exponential mean value
::math::statistics::cdf-uniform xmin xmax value
::math::statistics::cdf-triangular xmin xmax value
::math::statistics::cdf-symmetric-triangular xmin xmax value
::math::statistics::cdf-students-t degrees value
::math::statistics::cdf-gamma alpha beta value
::math::statistics::cdf-poisson mu k
::math::statistics::cdf-beta a b value
::math::statistics::cdf-weibull scale shape value
::math::statistics::cdf-gumbel location scale value
::math::statistics::cdf-pareto scale shape value
::math::statistics::cdf-cauchy location scale value
::math::statistics::cdf-F nf1 nf2 value
::math::statistics::cdf-laplace location scale value
::math::statistics::cdf-kumaraswamy a b value
::math::statistics::cdf-negative-binomial r p value
::math::statistics::empirical-distribution values
::math::statistics::random-normal mean stdev number
::math::statistics::random-lognormal mean stdev number
::math::statistics::random-exponential mean number
::math::statistics::random-uniform xmin xmax number
::math::statistics::random-triangular xmin xmax number
::math::statistics::random-symmetric-triangular xmin xmax number
::math::statistics::random-gamma alpha beta number
::math::statistics::random-poisson mu number
::math::statistics::random-chisquare df number
::math::statistics::random-student-t df number
::math::statistics::random-beta a b number
::math::statistics::random-weibull scale shape number
::math::statistics::random-gumbel location scale number
::math::statistics::random-pareto scale shape number
::math::statistics::random-cauchy location scale number
::math::statistics::random-laplace location scale number
::math::statistics::random-kumaraswamy a b number
::math::statistics::random-negative-binomial r p number
::math::statistics::histogram-uniform xmin xmax limits number
::math::statistics::incompleteGamma x p ?tol?
::math::statistics::incompleteBeta a b x ?tol?
::math::statistics::estimate-pareto values
::math::statistics::estimate-exponential values
::math::statistics::estimate-laplace values
::math::statistics::estimate-negative-binomial r values
::math::statistics::filter varname data expression
::math::statistics::map varname data expression
::math::statistics::samplescount varname list expression
::math::statistics::subdivide
::math::statistics::plot-scale canvas xmin xmax ymin ymax
::math::statistics::plot-xydata canvas xdata ydata tag
::math::statistics::plot-xyline canvas xdata ydata tag
::math::statistics::plot-tdata canvas tdata tag
::math::statistics::plot-tline canvas tdata tag
::math::statistics::plot-histogram canvas counts limits tag

DESCRIPTION

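The math::statistics package contains functions and procedures for basic statistical data analysis, such as descriptive statistics (mean, minimum, maximum, standard deviation), histograms and quantiles, basic hypothesis tests, and probability density and cumulative distribution functions.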

It is meant to help in developing data analysis applications or doing ad hoc data analysis. It is not in itself a full application, nor is it intended to rival full (non-)commercial statistical packages.

The purpose of this document is to describe the implemented procedures and provide some examples of their usage. As there is ample literature on the algorithms involved, we refer to relevant textbooks for more explanations.

The package contains a fairly large number of public procedures. They can be divided into four sets: general procedures, procedures that deal with specific statistical distributions, list procedures to select or transform data, and simple plotting procedures (these require Tk).

Note: The data to be analyzed are always contained in a simple list. Missing values are represented as empty list elements.

Note: With version 1.0.1 a mistake in the procedures pdf-lognormal, cdf-lognormal and random-lognormal has been corrected. In previous versions the argument for the standard deviation was actually used as if it were the variance.
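To illustrate the convention for missing values, the sketch below passes a list with one empty element to two of the procedures from the synopsis. The data values are made up for illustration, and the assumption (based on the note about missing values) is that empty elements are skipped rather than counted as zero.

```tcl
package require math::statistics

# A plain Tcl list; the empty element {} marks a missing value
# (illustrative data, not taken from the package)
set data {1.0 2.0 {} 3.0}

# Assumption: missing values are skipped, so three values remain
puts "number: [::math::statistics::number $data]"
puts "mean:   [::math::statistics::mean $data]"
```

If the assumption holds, the reported number of data is 3 and the mean is computed over the three non-empty values only.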

GENERAL PROCEDURES

The general statistical procedures are:
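As a small illustration of a few of these procedures, the sketch below computes some descriptive statistics, a histogram and a correlation coefficient. The data are invented for the example. The histogram limits are chosen so that no value coincides with a limit, which avoids relying on how boundary values are assigned to bins.

```tcl
package require math::statistics

# Illustrative data set
set data {2 4 4 4 5 5 7 9}

puts "mean:   [::math::statistics::mean   $data]"
puts "median: [::math::statistics::median $data]"
puts "pstdev: [::math::statistics::pstdev $data]"

# Histogram: two limits give three bins (below 2.5, 2.5 to 5.5, above 5.5)
set counts [::math::statistics::histogram {2.5 5.5} {1 2 3 4 5 6}]
puts "histogram counts: $counts"

# Correlation coefficient of two nearly linearly related series
set x {1 2 3 4 5}
set y {2.1 3.9 6.2 7.8 10.1}
puts "correlation: [::math::statistics::corr $x $y]"
```

The histogram returns one more count than there are limits, because the values below the first limit and above the last limit each get their own bin.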

MULTIVARIATE LINEAR REGRESSION

Besides the linear regression with a single independent variable, the statistics package provides two procedures for doing ordinary least squares (OLS) and weighted least squares (WLS) linear regression with several variables. They were written by Eric Kemp-Benedict.

In addition to these two, it provides a procedure (tstat) for calculating the value of the t-statistic for the specified number of degrees of freedom that is required to demonstrate a given level of significance.

Note: These procedures depend on the math::linearalgebra package.
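The tstat procedure can be combined with the basic statistics to estimate a confidence interval for a mean. The sketch below uses the textbook formula mean +/- t * s / sqrt(n) with invented data. Whether tstat interprets alpha as a one-sided or two-sided significance level is not stated in this section, so the two-sided reading used here is an assumption.

```tcl
package require math::statistics

# Illustrative measurements
set data {9.8 10.1 10.4 9.9 10.2 10.0 9.7 10.3}

set n    [::math::statistics::number $data]
set xbar [::math::statistics::mean   $data]
set s    [::math::statistics::stdev  $data]

# t value for n-1 degrees of freedom at significance level 0.05
# (assumed here to be the two-sided critical value)
set t [::math::statistics::tstat [expr {$n - 1}] 0.05]

# Textbook interval for the mean: xbar +/- t * s / sqrt(n)
set half [expr {$t * $s / sqrt($n)}]
puts [format "t = %.3f, mean = %.3f, interval = %.3f to %.3f" \
         $t $xbar [expr {$xbar - $half}] [expr {$xbar + $half}]]
```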

Description of the procedures

Example of the use:

# Store the Unicode "plus-minus" character for use in the output
set pm "\u00B1"

# Provide some data
set data {{  -.67  14.18  60.03 -7.5  }
          { 36.97  15.52  34.24 14.61 }
          {-29.57  21.85  83.36 -7.   }
          {-16.9   11.79  51.67 -6.56 }
          { 14.09  16.24  36.97 -12.84}
          { 31.52  20.93  45.99 -25.4 }
          { 24.05  20.69  50.27  17.27}
          { 22.23  16.91  45.07  -4.3 }
          { 40.79  20.49  38.92  -.73 }
          {-10.35  17.24  58.77  18.78}}

# Call the ols routine
set results [::math::statistics::mv-ols $data]

# Pretty-print the results
puts "R-squared: [lindex $results 0]"
puts "Adj R-squared: [lindex $results 1]"
puts "Coefficients $pm s.e. -- \[95% confidence interval\]:"
foreach val [lindex $results 2] se [lindex $results 3] bounds [lindex $results 4] {
    set lb [lindex $bounds 0]
    set ub [lindex $bounds 1]
    puts "   $val $pm $se -- \[$lb to $ub\]"
}

STATISTICAL DISTRIBUTIONS

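In the literature a large number of probability distributions can be found. The statistics package supports the normal, lognormal, exponential, uniform, triangular, symmetric triangular, gamma, Poisson, chi-square, Student's t, beta, Weibull, Gumbel, Pareto, Cauchy, Laplace, Kumaraswamy and negative binomial distributions.

In principle for each distribution one has procedures for the probability density function (pdf-*), the cumulative distribution function (cdf-*) and the generation of random numbers (random-*).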

The following procedures have been implemented:

TO DO: more function descriptions to be added
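A minimal sketch of the three kinds of distribution procedures, using the normal distribution and the argument order given in the synopsis (mean, stdev, then the value or the number of samples). The parameter values are arbitrary.

```tcl
package require math::statistics

# Standard normal distribution: density and cumulative probability at the mean
set p [::math::statistics::pdf-normal 0.0 1.0 0.0]
set c [::math::statistics::cdf-normal 0.0 1.0 0.0]
puts "pdf-normal at 0: $p"
puts "cdf-normal at 0: $c"

# Draw 1000 random numbers with mean 10 and standard deviation 2,
# then compare the sample statistics with these parameters
set sample [::math::statistics::random-normal 10.0 2.0 1000]
puts "sample size:  [llength $sample]"
puts "sample mean:  [::math::statistics::mean  $sample]"
puts "sample stdev: [::math::statistics::stdev $sample]"
```

Because the sample is random, the sample mean and standard deviation vary from run to run and only approximate the requested parameters.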

DATA MANIPULATION

The data manipulation procedures act on lists or lists of lists:
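As a short illustration, the sketch below applies filter and map to a simple list, following the signatures in the synopsis (variable name, data, expression). The variable name x and the data are arbitrary choices for the example.

```tcl
package require math::statistics

set data {1 2 3 4 5 6}

# filter: keep the elements for which the expression is true
set large [::math::statistics::filter x $data {$x > 3}]
puts "filtered: $large"

# map: evaluate the expression for every element
set squares [::math::statistics::map x $data {$x * $x}]
puts "squares:  $squares"
```

The first argument names the variable that stands for the current element inside the expression.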

PLOT PROCEDURES

The following simple plotting procedures are available:

THINGS TO DO

The following procedures are yet to be implemented:

EXAMPLES

The code below is a small example of how you can examine a set of data:

# Simple example:
# - Generate data (as a cheap way of getting some)
# - Perform statistical analysis to describe the data
#
package require math::statistics

#
# Two auxiliary procs
#
proc pause {time} {
   # vwait works on global variables, so use ::wait throughout
   set ::wait 0
   after [expr {$time*1000}] {set ::wait 1}
   vwait ::wait
}

proc print-histogram {counts limits} {
   foreach count $counts limit $limits {
      if { $limit != {} } {
         puts [format "<%12.4g\t%d" $limit $count]
         set prev_limit $limit
      } else {
         puts [format ">%12.4g\t%d" $prev_limit $count]
      }
   }
}

#
# Our source of arbitrary data
#
proc generateData { data1 data2 } {
   upvar 1 $data1 _data1
   upvar 1 $data2 _data2

   set d1 0.0
   set d2 0.0
   for { set i 0 } { $i < 100 } { incr i } {
      set d1 [expr {10.0-2.0*cos(2.0*3.1415926*$i/24.0)+3.5*rand()}]
      set d2 [expr {0.7*$d2+0.3*$d1+0.7*rand()}]
      lappend _data1 $d1
      lappend _data2 $d2
   }
   return {}
}

#
# The analysis session
#
package require Tk
console show
canvas .plot1
canvas .plot2
pack   .plot1 .plot2 -fill both -side top

generateData data1 data2

puts "Basic statistics:"
set b1 [::math::statistics::basic-stats $data1]
set b2 [::math::statistics::basic-stats $data2]
foreach label {mean min max number stdev var pstdev pvar} v1 $b1 v2 $b2 {
   puts "$label\t$v1\t$v2"
}
puts "Plot the data as function of \"time\" and against each other"
::math::statistics::plot-scale .plot1  0 100  0 20
::math::statistics::plot-scale .plot2  0 20   0 20
::math::statistics::plot-tline .plot1 $data1
::math::statistics::plot-tline .plot1 $data2
::math::statistics::plot-xydata .plot2 $data1 $data2

puts "Correlation coefficient:"
puts [::math::statistics::corr $data1 $data2]

pause 2
puts "Plot histograms"
.plot2 delete all
::math::statistics::plot-scale .plot2  0 20 0 100
set limits         [::math::statistics::minmax-histogram-limits 7 16 10]
set histogram_data [::math::statistics::histogram $limits $data1]
::math::statistics::plot-histogram .plot2 $histogram_data $limits

puts "First series:"
print-histogram $histogram_data $limits

pause 2
set limits         [::math::statistics::minmax-histogram-limits 0 15 10]
set histogram_data [::math::statistics::histogram $limits $data2]
::math::statistics::plot-histogram .plot2 $histogram_data $limits d2
.plot2 itemconfigure d2 -fill red

puts "Second series:"
print-histogram $histogram_data $limits

puts "Autocorrelation function:"
set  autoc [::math::statistics::autocorr $data1]
puts [::math::statistics::map x $autoc {[format "%.2f" $x]}]
puts "Cross-correlation function:"
set  crossc [::math::statistics::crosscorr $data1 $data2]
puts [::math::statistics::map x $crossc {[format "%.2f" $x]}]

::math::statistics::plot-scale .plot1  0 100 -1  4
::math::statistics::plot-tline .plot1  $autoc "autoc"
::math::statistics::plot-tline .plot1  $crossc "crossc"
.plot1 itemconfigure autoc  -fill green
.plot1 itemconfigure crossc -fill yellow

puts "Quantiles: 0.1, 0.2, 0.5, 0.8, 0.9"
puts "First:  [::math::statistics::quantiles $data1 {0.1 0.2 0.5 0.8 0.9}]"
puts "Second: [::math::statistics::quantiles $data2 {0.1 0.2 0.5 0.8 0.9}]"

If you run this example, then the following should be clear:

Bugs, Ideas, Feedback

This document, and the package it describes, will undoubtedly contain bugs and other problems. Please report such in the category math :: statistics of the Tcllib Trackers. Please also report any ideas for enhancements you may have for either package and/or documentation.

When proposing code changes, please provide unified diffs, i.e. the output of diff -u.

Note further that attachments are strongly preferred over inlined patches. Attachments can be made by going to the Edit form of the ticket immediately after its creation, and then using the left-most button in the secondary navigation bar.

KEYWORDS

data analysis, mathematics, statistics

CATEGORY

Mathematics