Tcl Library Source Code

Documentation
Login
[comment {-*- tcl -*- doctools manpage}]
[manpage_begin math::PCA n 1.0]
[keywords PCA]
[keywords statistics]
[keywords math]
[keywords tcl]
[moddesc   {Principal Components Analysis}]
[titledesc {Package for Principal Component Analysis}]
[category  Mathematics]
[require Tcl [opt 8.6]]
[require math::linearalgebra 1.0]

[description]
[para]
The PCA package provides a means to perform principal components analysis
in Tcl, using an object-oriented technique as facilitated by TclOO. It
actually defines a single public method, [term ::math::PCA::createPCA],
which constructs an object based on the data that are passed to perform
the actual analysis.

[para]
The methods of the PCA objects that are created with this command allow one
to examine the principal components, to approximate (new) observations
using all or a selected number of components only and to examine the
properties of the components and the statistics of the approximations.

[para]
The package has been modelled after the PCA example provided by the
original linear algebra package by Ed Hume.

[section "Commands"]
The [term math::PCA] package provides one public command:
[list_begin definitions]

[call [cmd ::math::PCA::createPCA] [arg data] [opt args]]
Create a new object, based on the data that are passed via the [term data] argument.
The principal components may be based on either correlations or covariances.
All observations will be normalised according to the mean and standard deviation
of the original data.

[list_begin arguments]
[arg_def list data] - A list of observations (see the example below).
[arg_def list args] - A list of key-value pairs defining the options. Currently there is
only one key: [term -covariances]. This indicates if covariances are to be used
(if the value is 1) or instead correlations (value is 0). The default is to use
correlations.
[list_end]

[list_end]

The PCA object that is created has the following methods:
[list_begin definitions]

[call [cmd "\$pca using"] [opt number]|[opt "-minproportion value"]]
Set the number of components to be used in the analysis (the number of retained components).
Returns the number of components, also if no argument is given.

[list_begin arguments]
[arg_def int number] - The number of components to be retained
[arg_def double value] - Select the number of components based on the minimum proportion
of variation that is retained by them. Should be a value between 0 and 1.
[list_end]


[call [cmd "\$pca eigenvectors"] [opt option]]
Return the eigenvectors as a list of lists.

[list_begin arguments]
[arg_def string option] - By default only the [emph retained] components are returned.
If all eigenvectors are required, use the option [term -all].
[list_end]


[call [cmd "\$pca eigenvalues"] [opt option]]
Return the eigenvalues as a list of lists.

[list_begin arguments]
[arg_def string option] - By default only the eigenvalues of the [emph retained] components are returned.
If all eigenvalues are required, use the option [term -all].
[list_end]


[call [cmd "\$pca proportions"] [opt option]]
Return the proportions for all components, that is, the amount of variations that each
components can explain.


[call [cmd "\$pca approximate"] [arg observation]]
Return an approximation of the observation based on the retained components

[list_begin arguments]
[arg_def list observation] - The values for the observation.
[list_end]


[call [cmd "\$pca approximatOriginal"]]
Return an approximation of the original data, using the retained components. It is
a convenience method that works on the complete set of original data.


[call [cmd "\$pca scores"] [arg observation]]
Return the scores per retained component for the given observation.

[list_begin arguments]
[arg_def list observation] - The values for the observation.
[list_end]


[call [cmd "\$pca distance"] [arg observation]]
Return the distance between the given observation and its approximation. (Note:
this distance is based on the normalised vectors.)

[list_begin arguments]
[arg_def list observation] - The values for the observation.
[list_end]


[call [cmd "\$pca qstatistic"] [arg observation] [opt option]]
Return the Q statistic, basically the square of the distance, for the given observation.

[list_begin arguments]
[arg_def list observation] - The values for the observation.
[arg_def string option] - If the observation is part of the original data, you may want
to use the corrected Q statistic. This is achieved with the option "-original".
[list_end]

[list_end]

[section EXAMPLE]
TODO: NIST example



[vset CATEGORY PCA]
[include ../common-text/feedback.inc]
[manpage_end]