Tcl Library Source Code: Check-in [bfc68668cd]

Overview

Comment:	Add a procedure for estimating probability density functions by means of the kernel density estimation method
SHA1:	bfc68668cd5da876e3629480017441d4b370ae29
User & Date:	markus 2014-01-18 12:13:30.497

Context

2014-01-18
14:20		Added three test cases for the kernel density estimation. Resulted in a few small corrections (deal with missing values) check-in: 2456e0d413 user: markus tags: trunk
12:13		Add a procedure for estimating probability density functions by means of the kernel density estimation method check-in: bfc68668cd user: markus tags: trunk
2014-01-10
00:03		Fix installer breaking on empty doc directories of excluded packages. check-in: e889e36583 user: andreask tags: trunk

Changes

Changes to modules/math/ChangeLog.

Changes to modules/math/pkgIndex.tcl.

Added modules/math/stat_kernel.tcl.

Changes to modules/math/statistics.man.

Changes to modules/math/statistics.tcl.

--- modules/math/statistics.man
+++ modules/math/statistics.man
@@ old lines 1-16, new lines 1-16 @@
-[manpage_begin math::statistics n 0.8]
+[manpage_begin math::statistics n 0.9]
 [keywords {data analysis}]
 [keywords mathematics]
 [keywords statistics]
 [moddesc {Tcl Math Library}]
 [titledesc {Basic statistical functions and procedures}]
 [category Mathematics]
 [require Tcl 8.4]
-[require math::statistics 0.8]
+[require math::statistics 0.9]
 [description]
 [para]
 The [package math::statistics] package contains functions and procedures for
 basic statistical data analysis, such as:
 [list_begin itemized]
︙			︙
@@ old lines 413-426, new lines 413-511 @@
 Returns a list of subsamples (their indices) that indeed violate the
 limits.
 [list_begin arguments]
 [arg_def list control] - Control limits as returned by the "control-Rchart" procedure
 [arg_def list data] - List of observed data
 [list_end]
 [para]
+[call [cmd ::math::statistics::test-Kruskal-Wallis] [arg confidence] [arg args]]
+Check if the population medians of two or more groups are equal with a
+given confidence level, using the Kruskal-Wallis test.
+[list_begin arguments]
+[arg_def float confidence] - Confidence level to be used (0-1)
+[arg_def list args] - Two or more lists of data
+[list_end]
+[para]
+[call [cmd ::math::statistics::analyse-Kruskal-Wallis] [arg args]]
+Compute the statistical parameters for the Kruskal-Wallis test.
+Returns the Kruskal-Wallis statistic and the probability that that
+value would occur assuming the medians of the populations are
+equal.
+[list_begin arguments]
+[arg_def list args] - Two or more lists of data
+[list_end]
+[para]
+[call [cmd ::math::statistics::group-rank] [arg args]]
+Rank the groups of data with respect to the complete set.
+Returns a list consisting of the group ID, the value and the rank
+(possibly a rational number, in case of ties) for each data item.
+[list_begin arguments]
+[arg_def list args] - Two or more lists of data
+[list_end]
+[para]
+[call [cmd ::math::statistics::test-Wilcoxon] [arg sample_a] [arg sample_b]]
+Compute the Wilcoxon test statistic to determine if two samples have the
+same median or not. (The statistic can be regarded as standard normal, if the
+sample sizes are both larger than 10. Returns the value of this statistic.
+[list_begin arguments]
+[arg_def list sample_a] - List of data comprising the first sample
+[arg_def list sample_b] - List of data comprising the second sample
+[list_end]
+[para]
+[call [cmd ::math::statistics::spearman-rank] [arg sample_a] [arg sample_b]]
+Return the Spearman rank correlation as an alternative to the ordinary (Pearson's) correlation
+coefficient. The two samples should have the same number of data.
+[list_begin arguments]
+[arg_def list sample_a] - First list of data
+[arg_def list sample_b] - Second list of data
+[list_end]
+[para]
+[call [cmd ::math::statistics::spearman-rank-extended] [arg sample_a] [arg sample_b]]
+Return the Spearman rank correlation as an alternative to the ordinary (Pearson's) correlation
+coefficient as well as additional data. The two samples should have the same number of data.
+The procedure returns the correlation coefficient, the number of data pairs used and the
+z-score, an approximately standard normal statistic, indicating the significance of the correlation.
+[list_begin arguments]
+[arg_def list sample_a] - First list of data
+[arg_def list sample_b] - Second list of data
+[list_end]
+[call [cmd ::math::statistics::kernel-density] [arg data] opt [arg "-option value"] ...]]
+Return the density function based on kernel density estimation. The procedure is controlled by a small set of options,
+each of which is given a reasonable default.
+[nl]
+The return value consists of three lists: the centres of the bins, the associated probability density and a list of
+computational parameters (begin and end of the interval, mean and standard deviation and the used bandwidth). The
+computational parameters can be used for further analysis.
+[list_begin arguments]
+[arg_def list data] - The data to be examined
+[arg_def list args] - Option-value pairs:
+[list_begin definitions]
+[def "[option -weights] [arg weights]"] Per data point the weight (default: 1 for all data)
+[def "[option -bandwidth] [arg value]"] Bandwidth to be used for the estimation (default: determined from standard deviation)
+[def "[option -number] [arg value]"] Number of bins to be returned (default: 100)
+[def "[option -interval] [arg "{begin end}"]"] Begin and end of the interval for
+which the density is returned (default: mean +/- 3*standard deviation)
+[def "[option -kernel] [arg function]"] Kernel to be used (One of: gaussian, cosine,
+epanechnikov, uniform, triangular, biweight, logistic; default: gaussian)
+[list_end]
+[list_end]
 [list_end]
 [section "MULTIVARIATE LINEAR REGRESSION"]
 Besides the linear regression with a single independent variable, the
 statistics package provides two procedures for doing ordinary
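The kernel-density entry above describes the result as bin centres paired with an estimated probability density. As a rough illustration of that idea only, here is a minimal Gaussian-kernel sketch in plain Tcl. It is not the code from stat_kernel.tcl: the procedure name gaussianKde and its argument list are hypothetical, the bandwidth and interval must be passed in explicitly rather than derived from the standard deviation, and weights, the other kernels and the third list of computational parameters are left out.

```tcl
# Illustrative sketch only: NOT the implementation in stat_kernel.tcl.
# gaussianKde is a hypothetical helper; bandwidth and interval are explicit
# arguments here instead of being derived from the data.
proc gaussianKde {data bandwidth begin end number} {
    set n    [llength $data]
    set pi   3.141592653589793
    # Normalisation 1/(n*h*sqrt(2*pi)) for a Gaussian kernel
    set norm [expr {1.0 / ($n * $bandwidth * sqrt(2.0 * $pi))}]
    set step [expr {($end - $begin) / double($number)}]

    set centres {}
    set density {}
    for {set i 0} {$i < $number} {incr i} {
        # Centre of bin i
        set x   [expr {$begin + ($i + 0.5) * $step}]
        set sum 0.0
        foreach d $data {
            # Contribution of data point d to the density at x
            set u   [expr {($x - $d) / $bandwidth}]
            set sum [expr {$sum + exp(-0.5 * $u * $u)}]
        }
        lappend centres $x
        lappend density [expr {$norm * $sum}]
    }
    return [list $centres $density]
}

# Estimate the density of three made-up data points on [-5, 9] with 200 bins
set result  [gaussianKde {1.0 2.0 3.0} 0.5 -5.0 9.0 200]
set centres [lindex $result 0]
set density [lindex $result 1]

# Sanity check: a density should integrate to roughly 1 over a wide interval
set step  [expr {(9.0 - (-5.0)) / 200}]
set total 0.0
foreach p $density {
    set total [expr {$total + $p * $step}]
}
puts [format "integral of the estimate: %.4f" $total]
```

The sum of density times bin width should come out close to 1 when the interval comfortably covers the data, which is a quick way to check any density estimate, including the one returned by the library procedure.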
︙			︙
@@ old lines 907-983, new lines 992-1005 @@
 [list_end]
 [para]
 [call [cmd ::math::statistics::subdivide]]
 Routine [emph PM] - not implemented yet
 [para]
-[call [cmd ::math::statistics::test-Kruskal-Wallis] [arg confidence] [arg args]]
-Check if the population medians of two or more groups are equal with a
-given confidence level, using the Kruskal-Wallis test.
-[list_begin arguments]
-[arg_def float confidence] - Confidence level to be used (0-1)
-[arg_def list args] - Two or more lists of data
-[list_end]
-[para]
-[call [cmd ::math::statistics::analyse-Kruskal-Wallis] [arg args]]
-Compute the statistical parameters for the Kruskal-Wallis test.
-Returns the Kruskal-Wallis statistic and the probability that that
-value would occur assuming the medians of the populations are
-equal.
-[list_begin arguments]
-[arg_def list args] - Two or more lists of data
-[list_end]
-[para]
-[call [cmd ::math::statistics::group-rank] [arg args]]
-Rank the groups of data with respect to the complete set.
-Returns a list consisting of the group ID, the value and the rank
-(possibly a rational number, in case of ties) for each data item.
-[list_begin arguments]
-[arg_def list args] - Two or more lists of data
-[list_end]
-[para]
-[call [cmd ::math::statistics::test-Wilcoxon] [arg sample_a] [arg sample_b]]
-Compute the Wilcoxon test statistic to determine if two samples have the
-same median or not. (The statistic can be regarded as standard normal, if the
-sample sizes are both larger than 10. Returns the value of this statistic.
-[list_begin arguments]
-[arg_def list sample_a] - List of data comprising the first sample
-[arg_def list sample_b] - List of data comprising the second sample
-[list_end]
-[para]
-[call [cmd ::math::statistics::spearman-rank] [arg sample_a] [arg sample_b]]
-Return the Spearman rank correlation as an alternative to the ordinary (Pearson's) correlation
-coefficient. The two samples should have the same number of data.
-[list_begin arguments]
-[arg_def list sample_a] - First list of data
-[arg_def list sample_b] - Second list of data
-[list_end]
-[para]
-[call [cmd ::math::statistics::spearman-rank-extended] [arg sample_a] [arg sample_b]]
-Return the Spearman rank correlation as an alternative to the ordinary (Pearson's) correlation
-coefficient as well as additional data. The two samples should have the same number of data.
-The procedure returns the correlation coefficient, the number of data pairs used and the
-z-score, an approximately standard normal statistic, indicating the significance of the correlation.
-[list_begin arguments]
-[arg_def list sample_a] - First list of data
-[arg_def list sample_b] - Second list of data
-[list_end]
 [list_end]
 [section "PLOT PROCEDURES"]
 The following simple plotting procedures are available:
 [list_begin definitions]
 [call [cmd ::math::statistics::plot-scale] [arg canvas] \
︙			︙

︙			︙
--- modules/math/statistics.tcl
+++ modules/math/statistics.tcl
@@ old lines 12-27, new lines 12-28 @@
 # Eric Kemp-Benedict, february 2007
 # version 0.5: added the population standard deviation and variance,
 # as suggested by Dimitrios Zachariadis
 # version 0.6: added pdf and cdf procedures for various distributions
 # (provided by Eric Kemp-Benedict)
 # version 0.7: added Kruskal-Wallis test (by Torsten Berg)
 # version 0.8: added Wilcoxon test and Spearman rank correlation
+# version 0.9: added kernel density estimation
-package provide math::statistics 0.8.1
+package provide math::statistics 0.9
 package require math
 # ::math::statistics --
 # Namespace holding the procedures and variables
 #
 namespace eval ::math::statistics {
︙			︙
@@ old lines 1293-1319, new lines 1294-1317 @@
         if { $range < $rlower } { lappend result $i }
         if { $range > $rupper } { lappend result $i }
     }
     return $result
 }
-
-
-
-
 #
 # Load the auxiliary scripts
 #
 source [file join [file dirname [info script]] pdf_stat.tcl]
 source [file join [file dirname [info script]] plotstat.tcl]
 source [file join [file dirname [info script]] liststat.tcl]
 source [file join [file dirname [info script]] mvlinreg.tcl]
 source [file join [file dirname [info script]] kruskal.tcl]
 source [file join [file dirname [info script]] wilcoxon.tcl]
+source [file join [file dirname [info script]] stat_kernel.tcl]
 #
 # Define the tables
 #
 namespace eval ::math::statistics {
     variable student_t_table
︙			︙