Tk Library Source Code

Ticket UUID: 1825092
Title: lost char when extracting (some) parts
Type: Bug Version: None
Submitter: nobody Created on: 2007-11-03 10:27:03
Subsystem: mime Assigned To: andreas_kupries
Priority: 5 Medium Severity:
Status: Closed Last Modified: 2007-11-06 06:15:03
Resolution: Fixed Closed By: andreas_kupries
    Closed on: 2007-11-05 23:15:03
Description:
The problem is that some extracted files are shorter than the original.

I extracted the attachment with mime::copymessage, and one byte was
missing at the end of the extracted base64-encoded file.

So the error seems to occur where the mime package builds its state array, which stores the offset and length of each part.

The same problem occurs on Windows with ActiveState Tcl 8.5.0.0 beta-6: the last base64-encoded line has 63 chars, while the original has 64 chars.

####test code:
package require mime

set token [mime::initialize -file mail20071020-175849-0]

set allparts [mime::getproperty $token parts]
# I know this mail has two parts: one plain text, one attachment
set attachtok [lindex $allparts 1]
# test file name out:
set ts [clock format [clock seconds] -format {%Y%m%d-%H%M%S}]
set ofn a0036-test-${ts}.part

set ofh [open $ofn w]
fconfigure $ofh -translation binary
mime::copymessage $attachtok $ofh
close $ofh
User Comments: andreas_kupries added on 2007-11-06 06:15:03:
Logged In: YES 
user_id=75003
Originator: NO

The bug is fixed, and the change is now in the SF Tcllib CVS, together with an extended test suite.

Deeper analysis:

The trouble is at line 893 of mime.tcl.

parsepart has split one part out of the multipart body and is about to recurse into this sub-part for further analysis (part headers).

It has start and end offsets, both of which are correct, and now computes the size of the part. The general formula is
size = end-start-1 = end-(start+1)

However, the actual size of the part excludes the trailing EOL sequence. So this changes to

size = end-start-eol-1 = end-(start+eol+1)

Looking at the code we see an 'end-(start+3)', implying that mime assumes eol = 2, i.e. a 2-char EOL sequence, which is wrong for our case (a plain Unix file). Luckily for us, we can determine the actual length of the EOL sequence at line 855f: if a \r is detected and removed, eol = 2; otherwise eol = 1.

Replacing +3 with eol+1, with eol determined as above, gives the correct size for the part, and the resulting file is complete, with no missing characters.
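
As an illustration of the arithmetic above, here is a minimal sketch. It is not the actual mime.tcl code: the proc names partSize and partSizeOld and the rawLine argument are hypothetical, and rawLine is assumed to be a line as returned by gets on a binary channel, so a CRLF file leaves a trailing \r on it.

```tcl
# Hypothetical helper, NOT the real mime.tcl code.
# start/end are the offsets found by the parser; rawLine is the line
# as read by [gets] before any trailing \r has been stripped.
proc partSize {start end rawLine} {
    # A trailing \r means a CRLF file (eol = 2), otherwise LF only (eol = 1).
    set eol [expr {[string index $rawLine end] eq "\r" ? 2 : 1}]
    # size = end-(start+eol+1), the corrected formula from the analysis.
    return [expr {$end - ($start + $eol + 1)}]
}

# The old behaviour for comparison: eol is always assumed to be 2.
proc partSizeOld {start end} {
    return [expr {$end - ($start + 3)}]
}
```

With made-up offsets start = 100 and end = 200, the old formula always yields 97, while the sketch yields 97 for a line ending in \r and 98 for a plain LF line. That one-character difference matches the missing byte reported for the Unix file.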

mstrhh added on 2007-11-05 02:57:22:
Logged In: YES 
user_id=1928531
Originator: NO

Debugging gives this analysis:

The parsepart code has a fixed "crlf" line-ending sequence hard-coded. Converting the input file to crlf line endings solved the extraction problem.

So expect problems with this code on Mac and Unix.
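
The conversion mentioned above could be sketched roughly as follows, for installations that still run the affected version. The proc name toCrlf is hypothetical, and the sketch assumes the message contains only LF or CRLF line endings (lone \r characters are left untouched).

```tcl
# Hypothetical workaround helper: normalize line endings to CRLF.
proc toCrlf {data} {
    # Collapse existing CRLF to LF first so they are not doubled,
    # then expand every LF to CRLF.
    set lfOnly [string map [list "\r\n" "\n"] $data]
    return [string map [list "\n" "\r\n"] $lfOnly]
}
```

A possible use is to read the mail file with -translation binary, pass the contents through toCrlf, and hand the result to mime::initialize -string instead of using -file. This is only safe when the message has no raw 8-bit binary parts, since those would be altered by the conversion.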

mstrhh added on 2007-11-03 17:32:52:
Logged In: YES 
user_id=1928531
Originator: NO

I submitted the bug.

The bug makes it impossible to run automatic email processing with file attachments.

nobody added on 2007-11-03 17:27:06:

File Added - 252640: mail20071020-175849-0
