Tk Library Source Code

View Ticket
Login
Ticket UUID: 1969078
Title: Base64 speedup
Type: Bug Version: None
Submitter: andreas_kupries Created on: 2008-05-21 19:47:04
Subsystem: base64 Assigned To: andreas_kupries
Priority: 9 Immediate Severity:
Status: Closed Last Modified: 2008-05-23 22:45:56
Resolution: Fixed Closed By: andreas_kupries
    Closed on: 2008-05-23 15:45:56
Description:
The current implementation of base64 in tcllib is very slow, even
(or especially) when Trf is used. Funny enough,
the problem is not the base64 encoding itself, but the
reflow to a different width. The reflow is written in Tcl
but a seldom example of inefficient code.

Below is a patch to tcllib 1.10 with a much faster reflow
code, and an omission of reflowing when not needed.

Here are a few results comparing the speed of old and new
reflow code

string length  time old  time new  factor
 792800          792800     16638     419
1585600        34025025     33064    1029
3171200       149875843     66856    2241

In other words, for a 3MB file , the improvement is
more than a factor of 2000. It should not take
more than 2minutes on a recent processor to reflow
the encoding of a 3MB file. Note, that the factor
of improvement grows linearly with the size of the
import.

The base64 part of tcllib is used often in OpenACS
for mailing files of the content repository, it is a shame
to see that the server comes more or less to a halt
when base64 encoding is performed.

Please, if possible, apply the following patch
to tcllib for the next release.

Attached is a small program to show the speed
differences leading to the table above.

all the best
-gustaf
User Comments: andreas_kupries added on 2008-05-22 06:09:19:
Logged In: YES 
user_id=75003
Originator: YES

Ah, maxlen 0 is special, it means: Do not wrap at all. ... The new reflow code should special case that, the loop can be skipped entirely, only 'string map' to remove \n is needed.

andreas_kupries added on 2008-05-22 06:07:39:
Logged In: YES 
user_id=75003
Originator: YES

Negative -maxlen values ...
Tcl implementation => The wrapchar sequence is added after _every_ regular coding character.
Trf => The Reflow algorithm (current) will enter an infinite loop (string length can be reduced to at most 0, and will always be > maxlen, causing the loop to not terminate). The generated result will further not contain anything from the regular output, but only wrapchars. The commands extracting the regular result will return the empty string.

The new reflow algorithm will not loop indefinitely, howver the result will be as bogus as before, only wrapchars, and nothing else.

Definitely has to be checked. maxlen 0 is already bad. maxlen 1 is the smallest value we can work with.

andreas_kupries added on 2008-05-22 03:57:37:
Logged In: YES 
user_id=75003
Originator: YES

Notes from further correspondence.
(1) We can special maxlen 76 with different wrapchar, that case allows full-speed string map
(2) RFC 2045, RFC 3548 section 2.1 state that MIME uses 76 by default, the 60 char default in tcllib is non-optimal.

=> Change default maxlen to 76 per mime.
(4) As wrapchar default is \n this change will cause us to hit the zero-time special case (no reflow needed at all) most of the time, and for the remainder we get fast linear-time algorithms with this patch and per (1). No square-time at all.

andreas_kupries added on 2008-05-22 02:50:04:
Logged In: YES 
user_id=75003
Originator: YES

An example script was submitted, but cannot be attached here, SF rejects it as too large even when max compressed (bzip2).

andreas_kupries added on 2008-05-22 02:47:48:

File Added - 278581: base64-tcllib-1-10.patch

Logged In: YES 
user_id=75003
Originator: YES

Original submitter of patch is gustaf neuman.

File Added: base64-tcllib-1-10.patch

Attachments: