Tcl Library Source Code

Check-in [2456e0d413]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Added three test cases for the kernel density estimation. Resulted in a few small corrections (deal with missing values)
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 2456e0d41399a53b8460aed292c54a4ca5011c03
User & Date: markus 2014-01-18 14:20:22.935
Context
2014-01-20
05:17
Corrected use of bandwidth, adjusted a test to take care of pecularities of the kernels check-in: 991cdf14aa user: markus tags: trunk
2014-01-18
14:20
Added three test cases for the kernel density estimation. Resulted in a few small corrections (deal with missing values) check-in: 2456e0d413 user: markus tags: trunk
12:13
Add a procedure for estimating probability density functions by means of the kernel density estimation method check-in: bfc68668cd user: markus tags: trunk
Changes
Unified Diff Ignore Whitespace Patch
Changes to modules/math/ChangeLog.
1
2
3
4
5

6
7
8
9
10
11
12
2014-01-18  Arjen Markus <[email protected]>
	* statistics.tcl: Added stat_kernel.tcl
	* stat_kernel.tcl: Implements a straightforward kernel density estimation procedure
	* statistics.man: Describe the kernel denstity estimation procedure, moved the description of several
	                  tests to the general section

	* pkgIndex.tcl: Bumped version of statistics package to 0.9

2013-12-20  Arjen Markus <[email protected]>
	* interpolate.tcl: [Ticket 843c2257d2] Added special case for points coincident with the data points
	* interpolate.test: [Ticket 843c2257d2] Added test case for coincident points

2013-12-17  Andreas Kupries  <[email protected]>





>







1
2
3
4
5
6
7
8
9
10
11
12
13
2014-01-18  Arjen Markus <[email protected]>
	* statistics.tcl: Added stat_kernel.tcl
	* stat_kernel.tcl: Implements a straightforward kernel density estimation procedure
	* statistics.man: Describe the kernel denstity estimation procedure, moved the description of several
	                  tests to the general section
	* statistics.test: Added three tests for the kernel density estimation (note: one result is a bit troublesome)
	* pkgIndex.tcl: Bumped version of statistics package to 0.9

2013-12-20  Arjen Markus <[email protected]>
	* interpolate.tcl: [Ticket 843c2257d2] Added special case for points coincident with the data points
	* interpolate.test: [Ticket 843c2257d2] Added test case for coincident points

2013-12-17  Andreas Kupries  <[email protected]>
Changes to modules/math/TODO.
1





2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
This file records outstanding actions for the math module






dd. 26 october 2005, Arjen Markus

qcomplex.test: extend the tests for cos/sin .. to include
               non-real results. 

dd. 28 september 2005, Arjen Markus

optimize.tcl: linear programming algorithm ignores certain
              constraints (of type x > 0). Needs to be 
              fixed

dd. 22 june 2004, Arjen Markus

interpolate.man: add examples
interpolate.tcl: more consistency in the calling convention
                 checks on arguments (add tests for them)

>
>
>
>
>




|




|







1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
This file records outstanding actions for the math module

dd. 18 january 2014, Arjen Markus
test cases for kernel-density:
One test case is troublesome - uniform kernel, checking the total density


dd. 26 october 2005, Arjen Markus

qcomplex.test: extend the tests for cos/sin .. to include
               non-real results.

dd. 28 september 2005, Arjen Markus

optimize.tcl: linear programming algorithm ignores certain
              constraints (of type x > 0). Needs to be
              fixed

dd. 22 june 2004, Arjen Markus

interpolate.man: add examples
interpolate.tcl: more consistency in the calling convention
                 checks on arguments (add tests for them)
Changes to modules/math/stat_kernel.tcl.
87
88
89
90
91
92
93




94
95
96
97
98
99
100
    if { $opt(-bandwidth) <= 0.0 } {
        return -code error -errorcode ARG -errorinfo "The bandwidth must be positive: $opt(-bandwidth)"
    }

    if { $opt(-number) <= 0.0 } {
        return -code error -errorcode ARG -errorinfo "The number of bins must be positive: $opt(-number)"
    }





    if { [llength [info proc $opt(-kernel)]] == 0 } {
        return -code error -errorcode ARG -errorinfo "Unknown kernel function: $opt(-kernel)"
    }

    #
    # Construct the weights







>
>
>
>







87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
    if { $opt(-bandwidth) <= 0.0 } {
        return -code error -errorcode ARG -errorinfo "The bandwidth must be positive: $opt(-bandwidth)"
    }

    if { $opt(-number) <= 0.0 } {
        return -code error -errorcode ARG -errorinfo "The number of bins must be positive: $opt(-number)"
    }

    if { [lindex $opt(-interval) 0] == [lindex $opt(-interval) 1] } {
        return -code error -errorcode ARG -errorinfo "The interval has length zero: $opt(-interval)"
    }

    if { [llength [info proc $opt(-kernel)]] == 0 } {
        return -code error -errorcode ARG -errorinfo "Unknown kernel function: $opt(-kernel)"
    }

    #
    # Construct the weights
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
            if { $d != {} } {
                lappend weight [expr {$w / $scale}]
            } else {
                lappend weight {}
            }
        }
    } else {
        set weight [lrepeat $ndata [expr {1.0/$ndata}]]
    }

    #
    # Construct the centres of the bins
    #
    set xbegin [lindex $opt(-interval) 0]
    set xend   [lindex $opt(-interval) 1]
    set dx     [expr {($xend - $xbegin) / $opt(-number)}]
    set xb     [expr {$xbegin + 0.5 * $dx}]
    set xvalue {}
    for {set i 0} {$i < $opt(-number)} {incr i} {
        lappend xvalue [expr {$xb + $i * $dx}]
    }

    #







|







|







121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
            if { $d != {} } {
                lappend weight [expr {$w / $scale}]
            } else {
                lappend weight {}
            }
        }
    } else {
        set weight [lrepeat [llength $data] [expr {1.0/$ndata}]] ;# Note: missing values have weight zero
    }

    #
    # Construct the centres of the bins
    #
    set xbegin [lindex $opt(-interval) 0]
    set xend   [lindex $opt(-interval) 1]
    set dx     [expr {($xend - $xbegin) / double($opt(-number))}]
    set xb     [expr {$xbegin + 0.5 * $dx}]
    set xvalue {}
    for {set i 0} {$i < $opt(-number)} {incr i} {
        lappend xvalue [expr {$xb + $i * $dx}]
    }

    #
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
# Note:
#    The standard deviation is 1.
#
proc ::math::statistics::gaussian {x} {
    return [expr {exp(-0.5*$x*$x) / sqrt(2.0*acos(-1.0))}]
}
proc ::math::statistics::uniform {x} {
    if { abs($x) < 1.0 } {
        return 0.5
    } else {
        return 0.0
    }
}
proc ::math::statistics::triangular {x} {
    if { abs($x) < 1.0 } {







|







174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
# Note:
#    The standard deviation is 1.
#
proc ::math::statistics::gaussian {x} {
    return [expr {exp(-0.5*$x*$x) / sqrt(2.0*acos(-1.0))}]
}
proc ::math::statistics::uniform {x} {
    if { abs($x) <= 1.0 } {
        return 0.5
    } else {
        return 0.0
    }
}
proc ::math::statistics::triangular {x} {
    if { abs($x) < 1.0 } {
Changes to modules/math/statistics.test.
595
596
597
598
599
600
601















































































602
603
604
} -result -0.175757575758

test "spearman-rank-extended-1.0" "Test extended Spearman rank correlation procedure" -match tolerant -body {
    ::math::statistics::spearman-rank-extended {106  86 100 101  99 103  97 113 112 110} \
                                               {  7   0  27  50  28  29  20  12   6  17}
} -result {-0.175757575758 10 -0.456397284}

















































































# End of test cases
testsuiteCleanup







>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>



595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
} -result -0.175757575758

test "spearman-rank-extended-1.0" "Test extended Spearman rank correlation procedure" -match tolerant -body {
    ::math::statistics::spearman-rank-extended {106  86 100 101  99 103  97 113 112 110} \
                                               {  7   0  27  50  28  29  20  12   6  17}
} -result {-0.175757575758 10 -0.456397284}

#
# Note: for the uniform kernel the sum is roughly 0.92. It would appear from a visual
# investigation that the sides may influence this, as well as the default bandwidth.
# Leave it for now.
#
test "kernel-density-1.0" "Test various kernel functions" -body {
    set data {1 2 3 4 5 6 7 8 9 10}

    set roughlyOne {}

    foreach kernel {gaussian uniform triangular epanechnikov biweight cosine logistic} {
        set result [::math::statistics::kernel-density $data -kernel $kernel]

        set sum 0.0
        set xbegin [lindex $result 2 0]
        set xend   [lindex $result 2 1]
        set number [llength [lindex $result 0]]
        set dx     [expr {($xend-$xbegin) / $number}]

        #
        # Integral should be roughly one
        #
        set sum 0.0
        foreach v [lindex $result 1] {
            set sum [expr {$sum + $dx * $v}]
        }

        lappend roughlyOne [expr {abs($sum-1.0) < 0.01}]
    }

    return $roughlyOne
} -result {1 0 1 1 1 1 1}

test "kernel-density-1.1" "Test various options - just that they have effect" -body {
    set subResults {}

    set data {1 2 3 4 5 6 7 8 9 10}

    set result [::math::statistics::kernel-density $data -number 20]
    lappend subResults [llength [lindex $result 0]]  ;# Number of bins
    lappend subResults [llength [lindex $result 1]]  ;# Number of density values

    set result [::math::statistics::kernel-density $data -interval {0 20}]
    lappend subResults [lindex $result 2 0]          ;# Beginning of interval
    lappend subResults [lindex $result 2 1]          ;# End of interval
    lappend subResults [expr {[lindex $result 0 0]   > [lindex $result 2 0]}] ;# First bin -- beginning of interval
    lappend subResults [expr {[lindex $result 0 0]   < [lindex $result 2 1]}] ;# First bin -- end of interval
    lappend subResults [expr {[lindex $result 0 end] > [lindex $result 2 0]}] ;# Last bin -- beginning of interval
    lappend subResults [expr {[lindex $result 0 end] < [lindex $result 2 1]}] ;# Last bin -- end of interval

    set result [::math::statistics::kernel-density $data -bandwidth 2]
    lappend subResults [lindex $result 2 end]        ;# Bandwidth

    return $subResults
} -result {20 20 0 20 1 1 1 1 2}

test "kernel-density-1.2" "Dealing with missing values" -body {
    set subResults {}

    set data {1 2 3 4 {} 6 7 8 9 10}

    set result [::math::statistics::kernel-density $data]

    set sum 0.0
    set xbegin [lindex $result 2 0]
    set xend   [lindex $result 2 1]
    set number [llength [lindex $result 0]]
    set dx     [expr {($xend-$xbegin) / $number}]

    #
    # Integral should be roughly one
    #
    set sum 0.0
    foreach v [lindex $result 1] {
        set sum [expr {$sum + $dx * $v}]
    }

    return [expr {abs($sum-1.0) < 0.01}]
} -result 1

# End of test cases
testsuiteCleanup