Tk Library Source Code

Ticket UUID: 1663970
Title: Submitting code for multivar linear regression
Type: RFE
Version: None
Submitter: erickb
Created on: 2007-02-20 01:30:48
Subsystem: tcllib: request for new module
Assigned To: arjenmarkus
Priority: 5 Medium
Severity:
Status: Closed
Last Modified: 2007-05-08 14:07:33
Resolution:
Closed By: arjenmarkus
Closed on: 2007-05-08 07:07:33
Description:
Hi,

I have written some code for carrying out multivariate linear regressions. I think it would be a good candidate for tcllib's math::statistics module, and when I posted a version of it to the Tcler's Wiki, I was asked (by AM & LV) to submit it here.

I'm attaching a zip file with the following contents:

 - mvlinreg.tcl : The code for the package
 - pkgIndex.tcl : A basic pkgIndex for the submitted code to work as its own package
 - mvlinreg_test.tcl : A single example that exercises the code in a "main path" way (no exceptions are tested)
 - mvlr.man.txt : A simple man page in plain text format (no *roff)

Some comments:
 - Although I don't offer a test for the wls function directly (it's tested indirectly, since it's called from the ols procedure), I've tested it as part of the SWLS code that I posted on the Tcler's Wiki, and reproduced published results.
 - For the example included in the zip file, I compared its output against the output of Excel's "LINEST" function for the same data, and they match.
 - I'm submitting this under the BSD license, on the understanding that this is the license used for tcllib. The BSD license itself is not the point, though: I want to submit the code under a license that is compatible with its use in tcllib, so let me know if BSD isn't the right choice.

Thanks!
Eric
User Comments: arjenmarkus added on 2007-05-08 14:07:33:
Logged In: YES 
user_id=400048
Originator: NO

Closed the request - the code is now included in Tcllib.

erickb added on 2007-03-21 04:38:09:
Logged In: YES 
user_id=816411
Originator: YES

That's great! Thank you.

arjenmarkus added on 2007-03-21 04:33:30:
Logged In: YES 
user_id=400048
Originator: NO

Done.

The regression routines now take a single argument: a list of data or an alternating list of
data and weights. This allows us to avoid [eval] and makes the precise formatting of the data much
less important.
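
As a rough illustration of the two argument shapes described above, here is a minimal sketch in plain Tcl. It does not call the tcllib routines: `weighted_mean_y` is a hypothetical stand-in that only shows how a proc can walk an alternating list. The row layout `{y x1 x2}` and the weight-then-row order are assumptions made for the example, as the comment above only says that data and weights alternate.

```tcl
# Plain data: a list of lists, one sublist per observation.
# The row layout {y x1 x2} is an assumption for this sketch.
set data {}
lappend data {2.0 1.0 0.5}
lappend data {4.0 2.0 1.5}
lappend data {9.0 3.0 2.0}

# Weighted data: one flat list that alternates weight and row.
# The order (weight first, then row) is also an assumption.
set weights {1.0 1.0 2.0}
set weighted {}
foreach w $weights row $data {
    lappend weighted $w $row
}

# Hypothetical consumer (not a tcllib command): it receives the whole
# alternating list as ONE argument and returns the weighted mean of y.
proc weighted_mean_y {weights_and_rows} {
    set sum_w  0.0
    set sum_wy 0.0
    foreach {w row} $weights_and_rows {
        set y [lindex $row 0]
        set sum_w  [expr {$sum_w + $w}]
        set sum_wy [expr {$sum_wy + $w * $y}]
    }
    return [expr {$sum_wy / $sum_w}]
}

set result [weighted_mean_y $weighted]
puts "weighted mean of y: $result"
```

Because the data is built with `lappend` and passed as a single value, the caller never needs [eval] and the layout of the data in the source file does not matter.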

arjenmarkus added on 2007-03-20 14:55:35:
Logged In: YES 
user_id=400048
Originator: NO

Okay, we'll do it that way - only a very small change required.
And avoiding eval is - in general - a worthy goal :)

erickb added on 2007-03-19 22:34:34:
Logged In: YES 
user_id=816411
Originator: YES

That sounds right. It also avoids the awkwardness of using "eval" when calling the proc when the argument has been built as a list.

arjenmarkus added on 2007-03-19 22:05:37:
Logged In: YES 
user_id=400048
Originator: NO

I realised what is going wrong without the backslashes: [eval] will then break up the
data into separate lines, treating each line as a command of its own!

Hm, perhaps we should use a different API:

   ols $alldata

and 

   wls $alldata_with_weights

instead of separate arguments.
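
The difference between the two calling conventions can be reproduced in plain Tcl. This is a minimal sketch, not the tcllib code: `count_varargs` and `count_single` are hypothetical stand-ins for a variable-argument proc and a single-list proc, and they only count the observations they receive.

```tcl
# Hypothetical stand-ins for the two calling conventions; each just
# reports how many observations it received.
proc count_varargs {args} { return [llength $args] }
proc count_single  {data} { return [llength $data] }

# Three observations written on separate lines, with no trailing backslashes.
set alldata {
    {1.0 2.0 3.0}
    {2.1 3.0 4.5}
    {3.9 4.1 6.0}
}

# Separate-arguments style: [eval] concatenates its arguments into a
# script, so each newline ends a command. Only the first row reaches the
# proc, and the second line is then run as a command of its own, which fails.
set failed [catch {eval count_varargs $alldata} message]
puts "eval style:  failed=$failed ($message)"

# Single-argument style: the value is passed as one list, so the
# whitespace between the rows (spaces or newlines) no longer matters.
set n [count_single $alldata]
puts "single list: $n observations"
```

With the single-argument form the data can be read from a file or built with `lappend` and handed over unchanged.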

arjenmarkus added on 2007-03-19 15:26:33:
Logged In: YES 
user_id=400048
Originator: NO

The dependency on math::linearalgebra is not the actual problem: I would rather
encourage the reuse of this functionality than duplicate it.

No, the issue is that the test utilities as we use them in Tcllib have to know 
of this dependency too. That is what I was asking about - it is new to me and it
was actually very puzzling.

More important is the issue with the backslashes - I will have a look myself too.

erickb added on 2007-03-19 04:11:17:
Logged In: YES 
user_id=816411
Originator: YES

Thank you for checking and spotting those problems! I should have mentioned the dependency on the linear algebra package. What is the best thing to do in this case? It will not be easy to recreate the features of the linear algebra package that I need, but I could do it.

Regarding reading in data files - you're right, it should allow for that. I see that it's not robust enough. I'll work on that.

Thanks again.

arjenmarkus added on 2007-03-19 02:59:50:
Logged In: YES 
user_id=400048
Originator: NO

I have updated the man page and converted the example into a test that follows
the tcltest conventions.

However, I came across two nasty/surprising aspects:

- The statistics package now depends on the linear algebra package. I had to 
  add the command "useLocal linalg.tcl math::linearalgebra" at the start to 
  avoid set-up errors (still some spurious messages occur).

- Leaving out the backslashes at the end of the data lines causes an error 
  in mv-wls: it will see but a single data point. This leads me to conclude 
  that the API is not quite appropriate yet: it should be possible to store the 
  data in a list of lists - right now it is more of a string that happens to
  contain some lists.

We will need to look into this more closely.

erickb added on 2007-02-28 17:33:50:
Logged In: YES 
user_id=816411
Originator: YES

Thanks! I have one fix to the man page. I realized that I had described incorrectly what the tstat function does. Sorry about that. The attached man page should be correct.

File Added: mvlr.man.txt

erickb added on 2007-02-28 17:33:49:

File Added - 218128: mvlr.man.txt

arjenmarkus added on 2007-02-28 03:35:07:
Logged In: YES 
user_id=400048
Originator: NO

I have downloaded the zip file and I will incorporate the code and 
the documentation in the math module. This means:
- Changing the namespace for the package to math::statistics
- Adding a standard header
- Transforming the documentation to man page format
- Using tcltest for the test case(s)

erickb added on 2007-02-20 08:30:48:

File Added - 216803: mvlinreg01.zip
