Ticket UUID: | 1825092 | |||
Title: | lost char when extracting (some) parts | |||
Type: | Bug | Version: | None | |
Submitter: | nobody | Created on: | 2007-11-03 10:27:03 | |
Subsystem: | mime | Assigned To: | andreas_kupries | |
Priority: | 5 Medium | Severity: | ||
Status: | Closed | Last Modified: | 2007-11-06 06:15:03 | |
Resolution: | Fixed | Closed By: | andreas_kupries | |
Closed on: | 2007-11-05 23:15:03 | |||
Description: |
The problem is that some extracted files are shorter than the original. I extracted the attachment with mime::copymessage, and one byte was missing at the end of the extracted base64-encoded file. So the error seems to occur where the mime package builds up its state array, in which the offset and length of parts are stored. The same problem occurs with ActiveState Tcl 8.5.0.0 beta-6 on Windows: the last base64-encoded line has 63 chars, the original 64 chars.

Test code:

    package require mime

    set token [mime::initialize -file mail20071020-175849-0]
    set allparts [mime::getproperty $token parts]
    # I know this mail has two parts: one plain text, one attachment
    set attachtok [lindex $allparts 1]
    # test file name out:
    set ts [clock format [clock seconds] -format {%Y%m%d-%H%M%S}]
    set ofn a0036-test-${ts}.part
    set ofh [open $ofn w]
    fconfigure $ofh -translation binary
    # copy the attachment part to the output file
    mime::copymessage $attachtok $ofh
    close $ofh
User Comments: |
andreas_kupries added on 2007-11-06 06:15:03:
Logged In: YES user_id=75003 Originator: NO

The bug has been fixed, and the modification is now in the SF Tcllib CVS, together with an extended testsuite.

Deeper analysis: The trouble is at line 893 of mime.tcl. parsepart has parsed a part of the multipart out of it and is about to recurse into this sub-part for further analysis (part headers). It has start/end offsets, both of which are ok, and now computes the size of the part. The general formula is

    size = end-start-1 = end-(start+1)

However, the actual size of the part excludes the last EOL char (sequence). So this changes to

    size = end-start-eol-1 = end-(start+eol+1)

Looking at the code we see an 'end-(start+3)', implying that mime assumes eol = 2, i.e. a 2-char EOL sequence, which is wrong for our case (a plain unix file). Luckily for us, we can determine the actual length of the EOL sequence in line 855f: if a \r is detected and removed, eol = 2; otherwise eol = 1. Replacing +3 with eol+1, with eol as per above, gives us a good size for the part, and the resulting file is ok, without missing characters.

mstrhh added on 2007-11-05 02:57:22:
Logged In: YES user_id=1928531 Originator: NO

Debugging gives this analysis: the parsepart code has a fixed line-ending sequence of "crlf" coded in. Converting the input file to crlf line endings solved the extraction problem. So expect to have problems on Mac and Unix with this code.

mstrhh added on 2007-11-03 17:32:52:
Logged In: YES user_id=1928531 Originator: NO

I submitted the bug. The bug makes it impossible to run automatic email processing with file attachments.

nobody added on 2007-11-03 17:27:06:
File Added - 252640: mail20071020-175849-0
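
For illustration, the size arithmetic from the analysis by andreas_kupries can be sketched as follows. This is a minimal sketch, not the actual mime.tcl code: the proc names eolLength, partSizeOld and partSizeFixed are hypothetical, and the offsets 100 and 200 are made-up values chosen only to show the one-character difference.

```tcl
# Hypothetical helpers illustrating the analysis; NOT the actual mime.tcl code.

# Length of the EOL sequence, judged from a line as read by [gets] without
# CRLF translation: a trailing \r means CRLF (2 chars), otherwise LF (1 char).
proc eolLength {line} {
    if {[string index $line end] eq "\r"} {
        return 2
    }
    return 1
}

# Old computation: unconditionally assumes a 2-char EOL sequence.
proc partSizeOld {start end} {
    expr {$end - ($start + 3)}
}

# Corrected computation: subtracts the detected EOL length instead.
proc partSizeFixed {start end eol} {
    expr {$end - ($start + $eol + 1)}
}

# Made-up offsets, for illustration only.
set start 100
set end   200

puts [partSizeOld   $start $end]                        ;# 97
puts [partSizeFixed $start $end [eolLength "abc\r"]]    ;# 97 (CRLF input)
puts [partSizeFixed $start $end [eolLength "abc"]]      ;# 98 (LF input)
```

With CRLF input both formulas agree. With LF input the old formula comes out one character short, which matches the 63 versus 64 chars reported for the last base64 line.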
Attachments:
- mail20071020-175849-0 added by [email protected] on 2007-11-03 17:27:03.