Ticket UUID: 763731
Title: word_encode makes encoded words that are too long
Type: Bug
Version: None
Submitter: gunzel
Created on: 2003-07-01 07:44:54
Subsystem: mime
Assigned To: andreas_kupries
Priority: 5 Medium
Severity:
Status: Closed
Last Modified: 2006-10-26 06:19:41
Resolution: Fixed
Closed By: andreas_kupries
Closed on: 2006-10-25 23:19:40
Description:
RFC 2047 says that an encoded word may be no longer than 75 characters, including all delimiters. The current implementation of word_encode places no bound on the length of the encoded word it creates. With the quoted-printable method it inserts soft line breaks after 72 characters of encoded content, which is not permitted inside an encoded word. With the base64 method it wraps the content of the encoded word at 60 encoded characters (45 bytes of unencoded input), which is also not permitted.

I have attached a patch that fixes these problems. It also addresses another restriction of RFC 2047: each encoded word must contain only whole characters, so no character's encoded bytes may be split between two encoded words. This requires the encoding proc to know the byte length of each character in the chosen charset. To calculate this, the given string has to be converted back to UTF so that the encoded words can be built up character by character. Because the caller has quite likely just converted the string in the other direction, a switch has been added to indicate that the string is still in UTF form and that conversion from the passed-in charset is unnecessary.

There is also a new switch for the maximum length of the resulting encoded word. It is for situations where the maximum must be less than the default (currently 66, which works for the Subject: header), such as an address field, where the encoded word must share the 76-character line limit with the email address. Because a long input string, or a requirement for shorter words, may produce more than one encoded word, the words are returned in "wrapped" form, separated by "\n ", so they can still be dropped straight into a header field and remain valid.

I have also fixed some reversed terminology in the docs for field_decode.
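The character-by-character splitting described above can be sketched in Python. This sketch does not reproduce the patch's own code. It assumes a stateless charset such as UTF-8 and covers only the base64 ("B") encoding. The names encode_words_b and b_encoded_length and the max_len parameter are hypothetical and are not part of the mime package. The sketch encodes one character at a time and starts a new encoded word whenever the next character would push the current word past max_len, so no character is split between words. It then joins the words with "\n " as the ticket describes.

```python
import base64


def b_encoded_length(data: bytes, overhead: int) -> int:
    """Length of one 'B' encoded word holding `data` (base64: 3 bytes -> 4 chars)."""
    return overhead + 4 * ((len(data) + 2) // 3)


def encode_words_b(text: str, charset: str = "utf-8", max_len: int = 75) -> str:
    """Hypothetical sketch: split `text` into RFC 2047 'B' encoded words.

    Assumes a stateless charset, since each character is encoded on its own.
    """
    if max_len > 75:
        raise ValueError("RFC 2047 limits an encoded word to 75 characters")
    prefix, suffix = f"=?{charset}?B?", "?="
    overhead = len(prefix) + len(suffix)
    words = []   # byte chunks, each holding whole characters only
    chunk = b""
    for ch in text:
        raw = ch.encode(charset)
        if b_encoded_length(chunk + raw, overhead) <= max_len:
            chunk += raw  # the character fits in the current word
            continue
        if b_encoded_length(raw, overhead) > max_len:
            raise ValueError("max_len is too small for a single character")
        words.append(chunk)  # close the current word and start a new one
        chunk = raw
    if chunk:
        words.append(chunk)
    # "Wrapped" form: folding with "\n " keeps the result usable in a header
    return "\n ".join(
        prefix + base64.b64encode(w).decode("ascii") + suffix for w in words
    )
```

For example, encode_words_b("é" * 40) produces two encoded words that carry 22 and 18 characters. Each word stays within 75 characters, and each one decodes on its own. Passing max_len=66 mirrors the default mentioned in the ticket.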
The attached patch contains these fixes and a large number of tests, covering both RFC compliance and implementation errors made along the way. Some of the tests and the implementation rely on improvements from Patch #758742 "charset mapencoding improvements" and bug fixes from Bug #763669 "Problems with qp_encode and encoded_word, no_softbreak".

Matthew
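A compliance test of the kind mentioned above could check each piece of the wrapped output independently. This Python sketch is illustrative only, and the name check_encoded_words is hypothetical. It splits a header value on "\n " and requires each piece to be a well-formed encoded word no longer than max_len. It then decodes the payload strictly in the declared charset, which fails if a multi-byte character was cut between two words. Pieces that contain whitespace, such as an unfolded soft line break, are rejected as malformed.

```python
import base64
import quopri
import re

# One RFC 2047 encoded word: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=")


def check_encoded_words(value: str, max_len: int = 75) -> list:
    """Return a list of problems found in a wrapped run of encoded words."""
    problems = []
    for word in value.split("\n "):
        m = ENCODED_WORD.fullmatch(word)
        if not m:
            problems.append(f"not an encoded word: {word!r}")
            continue
        if len(word) > max_len:
            problems.append(f"too long ({len(word)} > {max_len}): {word!r}")
        charset, enc, payload = m.groups()
        try:
            data = payload.encode("ascii")
            if enc in "Bb":
                raw = base64.b64decode(data, validate=True)
            else:
                # header=True turns "_" back into spaces, as Q encoding requires
                raw = quopri.decodestring(data, header=True)
            raw.decode(charset)  # strict: fails on a split character
        except (ValueError, LookupError) as exc:
            problems.append(f"bad payload in {word!r}: {exc}")
    return problems
```

An empty result means that every word passed these checks. Each test case can then assert on the specific problem it expects.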
User Comments:

andreas_kupries added on 2006-10-26 06:19:41:
File Added - 200051: word_encode-2.patch

andreas_kupries added on 2006-10-26 06:19:40:
Logged In: YES
user_id=75003
Committed the updated patch (test fix, additional test case, qp_encode fix).

gunzel added on 2003-07-01 14:44:54:
File Added - 54563: word_encode.patch