Ticket UUID: | 1969078 | |||
Title: | Base64 speedup | |||
Type: | Bug | Version: | None | |
Submitter: | andreas_kupries | Created on: | 2008-05-21 19:47:04 | |
Subsystem: | base64 | Assigned To: | andreas_kupries | |
Priority: | 9 Immediate | Severity: | ||
Status: | Closed | Last Modified: | 2008-05-23 22:45:56 | |
Resolution: | Fixed | Closed By: | andreas_kupries | |
Closed on: | 2008-05-23 15:45:56 | |||
Description: |
The current implementation of base64 in tcllib is very slow, even (or especially) when Trf is used. Funny enough, the problem is not the base64 encoding itself, but the reflow to a different width. The reflow is written in Tcl but a seldom example of inefficient code. Below is a patch to tcllib 1.10 with a much faster reflow code, and an omission of reflowing when not needed. Here are a few results comparing the speed of old and new reflow code string length time old time new factor 792800 792800 16638 419 1585600 34025025 33064 1029 3171200 149875843 66856 2241 In other words, for a 3MB file , the improvement is more than a factor of 2000. It should not take more than 2minutes on a recent processor to reflow the encoding of a 3MB file. Note, that the factor of improvement grows linearly with the size of the import. The base64 part of tcllib is used often in OpenACS for mailing files of the content repository, it is a shame to see that the server comes more or less to a halt when base64 encoding is performed. Please, if possible, apply the following patch to tcllib for the next release. Attached is a small program to show the speed differences leading to the table above. all the best -gustaf | |||
User Comments: |
andreas_kupries added on 2008-05-22 06:09:19:
Logged In: YES user_id=75003 Originator: YES Ah, maxlen 0 is special, it means: Do not wrap at all. ... The new reflow code should special case that, the loop can be skipped entirely, only 'string map' to remove \n is needed. andreas_kupries added on 2008-05-22 06:07:39: Logged In: YES user_id=75003 Originator: YES Negative -maxlen values ... Tcl implementation => The wrapchar sequence is added after _every_ regular coding character. Trf => The Reflow algorithm (current) will enter an infinite loop (string length can be reduced to at most 0, and will always be > maxlen, causing the loop to not terminate). The generated result will further not contain anything from the regular output, but only wrapchars. The commands extracting the regular result will return the empty string. The new reflow algorithm will not loop indefinitely, howver the result will be as bogus as before, only wrapchars, and nothing else. Definitely has to be checked. maxlen 0 is already bad. maxlen 1 is the smallest value we can work with. andreas_kupries added on 2008-05-22 03:57:37: Logged In: YES user_id=75003 Originator: YES Notes from further correspondence. (1) We can special maxlen 76 with different wrapchar, that case allows full-speed string map (2) RFC 2045, RFC 3548 section 2.1 state that MIME uses 76 by default, the 60 char default in tcllib is non-optimal. => Change default maxlen to 76 per mime. (4) As wrapchar default is \n this change will cause us to hit the zero-time special case (no reflow needed at all) most of the time, and for the remainder we get fast linear-time algorithms with this patch and per (1). No square-time at all. andreas_kupries added on 2008-05-22 02:50:04: Logged In: YES user_id=75003 Originator: YES An example script was submitted, but cannot be attached here, SF rejects it as too large even when max compressed (bzip2). andreas_kupries added on 2008-05-22 02:47:48: File Added - 278581: base64-tcllib-1-10.patch Logged In: YES user_id=75003 Originator: YES Original submitter of patch is gustaf neuman. File Added: base64-tcllib-1-10.patch |
Attachments:
- base64-tcllib-1-10.patch [download] added by andreas_kupries on 2008-05-22 02:47:48. [details]