Ticket UUID: | 1663970
Title: | Submitting code for multivar linear regression
Type: | RFE | Version: | None
Submitter: | erickb | Created on: | 2007-02-20 01:30:48
Subsystem: | tcllib: request for new module | Assigned To: | arjenmarkus
Priority: | 5 Medium | Severity: |
Status: | Closed | Last Modified: | 2007-05-08 14:07:33
Resolution: | | Closed By: | arjenmarkus
Closed on: | 2007-05-08 07:07:33
Description: |
Hi,

I have written some code for carrying out multivariate linear regressions. I think it would be a good candidate for tcllib's math::statistics module, and when I posted a version of it to the tcler's wiki, I was asked (by AM & LV) to submit it here.

I'm attaching a zip file with the following contents:
- mvlinreg.tcl : The code for the package
- pkgIndex.tcl : A basic pkgIndex for the submitted code to work as its own package
- mvlinreg_test.tcl : A single example that exercises the code in a "main path" way (no exceptions are tested)
- mvlr.man.txt : A simple man page in plain text format (no *roff)

Some comments:
- Although I don't offer a test for the wls function directly (it's tested indirectly since it's called from the ols procedure), I've tested it as part of the SWLS code that I posted on the Tcler's wiki, and reproduced published results.
- For the example included in the zip file, I compared the output from the example against the output for the same data using Excel's "LINEST" function and they match.
- I'm submitting this using the BSD license, on the understanding that that is the license used for tcllib. But the main point is not BSD - I want to submit the code under a license that is compatible with it being used in tcllib, so let me know if BSD isn't the right choice.

Thanks!
Eric
User Comments: |
arjenmarkus added on 2007-05-08 14:07:33:
Logged In: YES user_id=400048 Originator: NO
Closed the request - the code is included in Tcllib.

erickb added on 2007-03-21 04:38:09:
Logged In: YES user_id=816411 Originator: YES
That's great! Thank you.

arjenmarkus added on 2007-03-21 04:33:30:
Logged In: YES user_id=400048 Originator: NO
Done. The call to the regression routines now takes a single argument: a list of data or an alternating list of data and weights. This allows us to avoid [eval] and makes the precise formatting of the data much less important.

arjenmarkus added on 2007-03-20 14:55:35:
Logged In: YES user_id=400048 Originator: NO
Okay, we'll do it that way - only a very small change required. And avoiding eval is - in general - a worthy goal :)

erickb added on 2007-03-19 22:34:34:
Logged In: YES user_id=816411 Originator: YES
That sounds right. It also avoids the awkwardness of using "eval" when calling the proc when the argument has been built as a list.

arjenmarkus added on 2007-03-19 22:05:37:
Logged In: YES user_id=400048 Originator: NO
I realised what is going wrong without the backslashes: [eval] will then break up the data into separate lines! Hm, perhaps we should use a different API: ols $alldata and wls $alldata_with_weights instead of separate arguments.

arjenmarkus added on 2007-03-19 15:26:33:
Logged In: YES user_id=400048 Originator: NO
The dependency on math::linearalgebra is not the actual problem: I would rather encourage the reuse of this functionality than duplicating it. No, the issue is that the test utilities as we use them in Tcllib have to know of this dependency too. That is what I was asking about - it is new to me and it was actually very puzzling. More important is the issue with the backslashes - I will have a look myself too.

erickb added on 2007-03-19 04:11:17:
Logged In: YES user_id=816411 Originator: YES
Thank you for checking and spotting those problems! I should have mentioned the dependency on the linear algebra package. What is the best thing to do in this case? It will not be easy to recreate the features of the linear algebra package that I need, but I could do it. Regarding reading in data files - you're right, it should allow for that. I see that it's not robust enough. I'll work on that. Thanks again.

arjenmarkus added on 2007-03-19 02:59:50:
Logged In: YES user_id=400048 Originator: NO
I have updated the man page and converted the example to a test in accordance with tcltest. However, I came across two nasty/surprising aspects:
- The statistics package now depends on the linear algebra package. I had to add the command "useLocal linalg.tcl math::linearalgebra" at the start to avoid set-up errors (still some spurious messages occur).
- Leaving out the backslashes at the end of the data lines causes an error in mv-wls: it will see but a single data point.
This leads me to conclude that the API is not quite appropriate yet: it should be possible to store the data in a list of lists - right now it is more of a string that happens to contain some lists. We will need to look into this more closely.

erickb added on 2007-02-28 17:33:50:
Logged In: YES user_id=816411 Originator: YES
Thanks! I have one fix to the man page. I realized that I made an incorrect description of what the tstat function does. Sorry about that. The attached man page should be correct.
File Added: mvlr.man.txt

erickb added on 2007-02-28 17:33:49:
File Added - 218128: mvlr.man.txt

arjenmarkus added on 2007-02-28 03:35:07:
Logged In: YES user_id=400048 Originator: NO
I have downloaded the zip file and I will incorporate the code and the documentation in the math module. This means:
- Changing the namespace for the package to math::statistics
- Adding a standard header
- Transforming the documentation to man page format
- Using tcltest for the test case(s)

erickb added on 2007-02-20 08:30:48:
File Added - 216803: mvlinreg01.zip
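
The [eval] problem and the single-argument API discussed in the comments above can be illustrated with a small sketch. The procs rowsFromArgs, rowsFromList and splitWeights are hypothetical stand-ins that only count or regroup observations; they are not the Tcllib regression routines. The element order of the weighted list (weight first, then the data row) is also an assumption made for the illustration.

```tcl
# Hypothetical stand-in for the original API: one argument per observation.
# It records how many observations it actually received.
set seen 0
proc rowsFromArgs {args} {
    set ::seen [llength $args]
    return $::seen
}

# Hypothetical stand-in for the revised API: all observations in one list.
proc rowsFromList {data} {
    return [llength $data]
}

# Hypothetical weighted variant: one list alternating weight and data row.
# (The weight-then-row order is an assumption for this sketch.)
proc splitWeights {weightsAndRows} {
    set pairs {}
    foreach {weight row} $weightsAndRows {
        lappend pairs [list $weight $row]
    }
    return $pairs
}

# Observations written one per line, without trailing backslashes.
set data {
    {10.0 1.0 2.0}
    {12.5 2.0 1.0}
    {15.1 3.0 4.0}
}

# [eval] concatenates its arguments into a script, so each newline ends
# a command: only the first row reaches the proc, and the second line is
# then run as a command of its own, which fails.
set failed [catch {eval rowsFromArgs $data} msg]
puts "eval failed: $failed, rows seen by proc: $seen"

# Passing the data as a single argument treats newlines as ordinary
# list whitespace, so all three rows arrive.
puts "rows seen via single list argument: [rowsFromList $data]"

set weighted {
    1.0 {10.0 1.0 2.0}
    0.5 {12.5 2.0 1.0}
}
puts "weighted observations: [llength [splitWeights $weighted]]"
```

With the variadic form, the layout of the data (backslashes or not) decides what the proc receives; with the single-list form, only the list structure matters, which is the point made in the comments about formatting becoming "much less important".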