Tk Library Source Code

Ticket UUID: 763731
Title: word_encode makes encoded words that are too long
Type: Bug
Version: None
Submitter: gunzel
Created on: 2003-07-01 07:44:54
Subsystem: mime
Assigned To: andreas_kupries
Priority: 5 Medium
Severity:
Status: Closed
Last Modified: 2006-10-26 06:19:41
Resolution: Fixed
Closed By: andreas_kupries
Closed on: 2006-10-25 23:19:40
Description:
RFC 2047 says that an encoded word must be no longer than 
75 characters, including all delimiters. The current 
implementation of word_encode places no bound on the 
length of the encoded words it creates. With the 
quoted-printable method it inserts soft newlines at 72 
characters of encoded content, which is not permitted 
inside an encoded word. With the base64 method it wraps 
the content of the encoded word at 60 encoded characters 
(45 bytes of source content), which is also not permitted. 
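
As a rough illustration of the 75-character rule, the following Python sketch scans a header value for encoded words and reports any that are too long. It is not part of the patch or of the mime package; the function name `overlong_encoded_words` and the regular expression are assumptions made for this example.

```python
import re

# Matches an RFC 2047 encoded word: =?charset?encoding?encoded-text?=
# (hypothetical helper pattern; encoded-text may not contain "?" or whitespace)
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=")

def overlong_encoded_words(header_value, limit=75):
    """Return the encoded words in header_value that exceed limit characters."""
    # The limit counts the whole word, including the =? ? ? ?= delimiters.
    return [w for w in ENCODED_WORD.findall(header_value) if len(w) > limit]
```

A header that passes this check can still violate other parts of RFC 2047; the check only covers the length rule described above.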

I have attached a patch that fixes these problems and also 
addresses another restriction of RFC 2047: each encoded 
word must contain only whole characters (no encoded 
character may be split across two encoded words). To 
honour this, the encoding proc needs to know the encoded 
length of each character in the chosen charset. To 
calculate that, the given string has to be converted back 
to UTF so that the encoded words can be built up character 
by character. As the caller has quite likely just performed 
this conversion in the other direction, a switch has been 
added to indicate that the string is still in UTF form and 
that conversion from the passed-in charset is not necessary.
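
The character-by-character approach can be sketched in Python as follows. This is only an illustration of the idea, not the Tcl code in the patch: the function name `encode_words_b64`, its parameters, and the choice of base64 with a default of `utf-8` are assumptions for the example. A single character too large for the limit is still emitted as its own word.

```python
import base64

def encode_words_b64(text, charset="utf-8", max_length=66):
    """Split text into base64 encoded words without splitting a character."""
    prefix = "=?%s?B?" % charset
    overhead = len(prefix) + len("?=")
    words, chunk = [], b""
    for ch in text:
        encoded_char = ch.encode(charset)   # bytes for this one character
        candidate = chunk + encoded_char
        # Base64 emits 4 output characters per 3 input bytes, rounded up.
        b64_len = 4 * ((len(candidate) + 2) // 3)
        if chunk and overhead + b64_len > max_length:
            # Adding this character would overflow: close the current word.
            words.append(prefix + base64.b64encode(chunk).decode("ascii") + "?=")
            chunk = encoded_char
        else:
            chunk = candidate
    if chunk:
        words.append(prefix + base64.b64encode(chunk).decode("ascii") + "?=")
    return words
```

Because each word is closed only between characters, every word decodes on its own to valid text in the chosen charset.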

There is also a new switch for the maximum length of the 
resulting encoded words. It is meant for situations where 
the maximum length must be less than the default (currently 
66, which works for the Subject: header), such as an address 
field, where the encoded word must coexist with the email 
address within the 76 character line limit.

Since a long input string, or a request for a shorter 
maximum length, may produce more than one encoded word, 
the words are returned in the "wrapped" format, separated 
by "\n ", so they can still be dropped straight into a 
header field and be valid.
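
The "wrapped" format and the line-length arithmetic can be illustrated with a short Python sketch. The helper name `fold_header` is hypothetical and the code is not taken from the patch. Note that "Subject: " is 9 characters, so a 66-character encoded word gives a 75-character first line, which is consistent with the default mentioned above.

```python
def fold_header(field_name, words, line_limit=76):
    # Join the encoded words with a newline followed by one space, so each
    # word after the first sits on its own continuation line.
    value = "\n ".join(words)
    header = "%s: %s" % (field_name, value)
    # Report whether every physical line stays within the line limit.
    fits = all(len(line) <= line_limit for line in header.split("\n"))
    return header, fits
```

With a longer field name, or with other text sharing the line, the same word length may no longer fit, which is the case the maximum-length switch is meant for.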

I have also fixed some reversed terminology in the docs for 
field_decode.

The attached patch includes these fixes and a large number 
of tests covering both RFC compliance and implementation 
errors made along the way. Some of the tests and parts of 
the implementation rely on improvements from Patch #758742 
"charset mapencoding improvements" and bugfixes from Bug 
#763669 "Problems with qp_encode and encoded_word, 
no_softbreak".

Matthew
User Comments:
andreas_kupries added on 2006-10-26 06:19:41:

File Added - 200051: word_encode-2.patch

andreas_kupries added on 2006-10-26 06:19:40:

Committed the updated patch (test fix, additional test case,
qp_encode fix).

gunzel added on 2003-07-01 14:44:54:

File Added - 54563: word_encode.patch
