QuotedPrintableEncoder provides two interfaces for performing quoted-printable encoding.
A high-level interface encodes from an input String to an output String. This interface comprises a single member function, encode().
A low-level interface allows encoding by passing multiple buffers to the encoder. The correct procedure for using this interface is described below.
QuotedPrintableEncoder allows you to change certain options, which affect the behavior of the encoder:
Maximum Line Length: The value of this option determines the maximum length of a line in the encoder's output. The maximum value allowed for this option is 76, which is the maximum line length allowed by the MIME standard. The default value is 76.
Output CR LF: If you set this option to true, the encoder uses CR LF as the end-of-line characters in the encoded output. If you set it to false, the encoder uses LF alone. The default value is true if TextUtil::EOL_CHARS is TextUtil::CRLF, and false otherwise.
Suppress Final Newline: If you set this option to true, the encoder does not put a final newline (CR LF or LF) at the end of the encoded output, unless the input ends with a hard line break. If you set this option to false, the encoder always ends the encoded output with a newline: if the input does not end with a hard line break, the encoder adds a soft line break (that is, "=\r\n" or "=\n") at the end of the encoded output. The default value is false.
Protect From: Because of the way many applications store mail messages in a file, the string "From " at the beginning of a line is often changed to ">From ". If this option is true, the encoder encodes "From " at the beginning of a line as "From=20"; that is, it encodes the space character using the hex encoding. The default value is true.
Protect Dot: In the SMTP and POP3 protocols, a dot (the character '.') at the beginning of a line is treated specially: a single dot on a line by itself marks the end of the message. For this reason, a protocol implementation must scan every line for a leading dot, "escape" it by adding an extra dot when sending, and "unescape" it by removing the extra dot when receiving. If this option is true, the encoder encodes every dot at the beginning of a line using the hex encoding. Setting this option may improve interoperability with SMTP or POP3 implementations that do not correctly escape or unescape the dot character. The default value is true.
Encode Map: This option lets you control precisely which characters the encoder encodes. Two encode maps, ENCODE_MAP_LOW_RISK and ENCODE_MAP_HIGH_RISK, are provided as static data members. See setEncodeMap() for details on how to create your own encode map. The default value is ENCODE_MAP_LOW_RISK, which encodes all the recommended characters listed in RFC 2045 (page 21).
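The two line-start options can be illustrated with a small standalone C++ sketch. hexEncode and protectLineStart are hypothetical helpers, not part of QuotedPrintableEncoder: the sketch applies only the Protect Dot and Protect From rules to a single line and copies every other character unchanged, whereas the real encoder also applies the encode map and the line-length limit.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hex-encodes one byte as "=XX" with upper-case hex digits.
std::string hexEncode(unsigned char c) {
    char buf[4];  // "=XX" plus the terminating NUL
    std::snprintf(buf, sizeof buf, "=%02X", static_cast<unsigned>(c));
    return buf;
}

// Hypothetical helper: applies only the Protect Dot and Protect From rules to
// the start of one line; all other characters are copied unchanged.
std::string protectLineStart(const std::string& line, bool protectFrom, bool protectDot) {
    if (protectDot && !line.empty() && line[0] == '.')
        return hexEncode('.') + line.substr(1);           // ".x"     -> "=2Ex"
    if (protectFrom && line.compare(0, 5, "From ") == 0)
        return "From" + hexEncode(' ') + line.substr(5);  // "From x" -> "From=20x"
    return line;
}
```

For example, protectLineStart("From me", true, true) yields "From=20me", which a mail store will no longer rewrite as ">From me".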
Using the Low-Level Interface
The low-level interface allows you to encode 8-bit text data one buffer at a time; thus you may encode text data of unlimited size using a limited amount of memory. For example, if you want to encode data from an input file to an output file, you may read from the input file one buffer at a time, pass each buffer to the encoder, and write to the output file one buffer at a time.
The low-level interface comprises three member functions: start(), encodeSegment(), and finish(). The procedure is as follows:
1. Call start() to begin the encode operation.
2. Initialize the input and output buffers. To initialize an input buffer named inBuf, set inBuf.bytes to a char array that contains the data to be encoded, set inBuf.pos to the offset of the beginning of the data in inBuf.bytes, and set inBuf.endPos to the offset of the first byte past the end of the data in inBuf.bytes. To initialize an output buffer named outBuf, set outBuf.chars to a char array, set outBuf.pos to zero, and set outBuf.endPos to the length of the array referenced by outBuf.chars.
3. Call encodeSegment() to encode data from inBuf to outBuf. When the function returns, if outBuf.pos == outBuf.endPos, the output buffer is full, and you must make room in the output buffer before you call encodeSegment() again. If inBuf.pos == inBuf.endPos, the input buffer is empty, and you must supply the input buffer with more data before you call encodeSegment() again.
4. After you have passed all input data to the encoder, call finish() to flush the data that the encoder has buffered internally.
You may use the same encoder object for multiple encode operations.
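The procedure above can be sketched in standalone C++. This is not the library's implementation: ByteBuffer, CharBuffer, ToyEncoder, and encodeInChunks are hypothetical stand-ins that model only the fields and calls named above, and the toy encoder hex-encodes every byte instead of performing real quoted-printable encoding. What the sketch does show is the calling pattern: start() once, encodeSegment() until each input buffer is empty (draining the output buffer whenever it fills), and finish() repeatedly until it returns with room left in the output buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-ins for the library's buffer types: only the fields
// named in the procedure (bytes/chars, pos, endPos) are modelled.
struct ByteBuffer { const char* bytes; std::size_t pos; std::size_t endPos; };
struct CharBuffer { char* chars; std::size_t pos; std::size_t endPos; };

// Toy encoder with the same start/encodeSegment/finish shape. It writes every
// byte as "=XX", which is NOT quoted-printable; it only models the protocol.
class ToyEncoder {
public:
    void start() { pending_.clear(); }

    // Encodes until the input buffer is empty or the output buffer is full.
    void encodeSegment(ByteBuffer* in, CharBuffer* out) {
        assert(in->pos < in->endPos && out->pos < out->endPos);
        static const char hex[] = "0123456789ABCDEF";
        for (;;) {
            flushPending(out);
            if (out->pos == out->endPos || in->pos == in->endPos) return;
            unsigned char c = static_cast<unsigned char>(in->bytes[in->pos++]);
            pending_ += '=';
            pending_ += hex[c >> 4];
            pending_ += hex[c & 0x0F];
        }
    }

    // Flushes internally buffered output; may need to be called repeatedly.
    void finish(CharBuffer* out) {
        assert(out->pos < out->endPos);
        flushPending(out);
    }

private:
    void flushPending(CharBuffer* out) {
        std::size_t n = std::min(pending_.size(), out->endPos - out->pos);
        pending_.copy(out->chars + out->pos, n);
        out->pos += n;
        pending_.erase(0, n);
    }
    std::string pending_;  // encoded characters not yet written to an output buffer
};

// Hypothetical driver: feeds the input in chunks of inChunk bytes through an
// output buffer of outSize chars (both must be greater than zero).
std::string encodeInChunks(const std::string& input, std::size_t inChunk, std::size_t outSize) {
    std::string storage(outSize, '\0');
    CharBuffer out{&storage[0], 0, outSize};
    std::string result;
    auto drain = [&] { result.append(out.chars, out.pos); out.pos = 0; };

    ToyEncoder enc;
    enc.start();
    for (std::size_t off = 0; off < input.size(); off += inChunk) {
        ByteBuffer in{input.data(), off, std::min(input.size(), off + inChunk)};
        while (in.pos < in.endPos) {
            enc.encodeSegment(&in, &out);
            if (out.pos == out.endPos) drain();  // output full: make room
        }
    }
    do {  // call finish() until it returns with room left in the output buffer
        if (out.pos == out.endPos) drain();
        enc.finish(&out);
    } while (out.pos == out.endPos);
    drain();
    return result;
}
```

Deliberately tiny buffers, as in encodeInChunks("abc", 1, 1), exercise every branch: the output fills mid-escape, and finish() must be called more than once.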
Public Member Functions
| Return type | Member function | Description |
|---|---|---|
| | QuotedPrintableEncoder () | Default constructor. |
| | ~QuotedPrintableEncoder () | Destructor. |
| void | setMaxLineLen (int len) | Sets the maximum line length of the encoded output. |
| int | maxLineLen () | Gets the maximum line length of the encoded output. |
| void | setOutputCrLf (bool b) | Sets the CRLF end-of-line characters option. |
| bool | outputCrLf () | Gets the CRLF end-of-line characters option. |
| void | setSuppressFinalNewline (bool b) | Sets the option to suppress a final newline in the output. |
| bool | suppressFinalNewline () | Gets the option to suppress a final newline in the output. |
| void | setProtectFrom (bool b) | Sets the option to protect "From " at the beginning of a line. |
| bool | protectFrom () | Gets the option to protect "From " at the beginning of a line. |
| void | setProtectDot (bool b) | Sets the option to protect a dot at the beginning of a line. |
| bool | protectDot () | Gets the option to protect a dot at the beginning of a line. |
| void | setEncodeMap (const unsigned char *map) | Sets the lookup table that determines how characters are encoded. |
| void | start () | Starts a multiple-buffer encode operation. |
| void | encodeSegment (ByteBuffer *inBuf, CharBuffer *outBuf) | Encodes data from the input buffer to the output buffer. |
| void | finish (CharBuffer *outBuf) | Finishes a multiple-buffer encode operation. |
| String | encode (const String &decoded) | Performs single-step string-to-string quoted-printable encoding. |

Static Public Attributes
| Type | Attribute | Description |
|---|---|---|
| const unsigned char *const | ENCODE_MAP_LOW_RISK = QpEncoderImpl_LOW_RISK_MAP | Low risk encoding map. |
| const unsigned char *const | ENCODE_MAP_HIGH_RISK = QpEncoderImpl_HIGH_RISK_MAP | High risk encoding map. |
QuotedPrintableEncoder::QuotedPrintableEncoder ()
Default constructor. The constructor sets default values for all the options.
QuotedPrintableEncoder::~QuotedPrintableEncoder ()
Destructor.
String QuotedPrintableEncoder::encode (const String &decoded)
Performs single-step string-to-string quoted-printable encoding. To use this function, create a String containing the data you want to encode and pass it as the function's argument; the returned String contains the encoded output. This member function is the simplest way to perform quoted-printable encoding, but it requires all the data to be kept in memory for processing. To encode large data using limited memory, use the low-level interface described in the overview section. This member function uses the low-level interface internally, so any options set for the encoder object have the same effect with either interface.
void QuotedPrintableEncoder::encodeSegment (ByteBuffer *inBuf, CharBuffer *outBuf)
Encodes data from the input buffer to the output buffer. This member function is an essential part of the low-level interface and performs most of the work of encoding for the QuotedPrintableEncoder class. It takes an input buffer and an output buffer as parameters, and encodes data from the input buffer until the input buffer is empty or the output buffer is full. In other words, one of the following conditions is guaranteed to be satisfied when the function returns:
- inBuf->pos == inBuf->endPos (the input buffer is empty)
- outBuf->pos == outBuf->endPos (the output buffer is full)
You may call the function multiple times to encode multiple buffers of input data. However, before you call the function, both of the following conditions should be true:
- inBuf->pos < inBuf->endPos (the input buffer is not empty)
- outBuf->pos < outBuf->endPos (the output buffer is not full)
For more information on using the low-level interface, see the overview section for QuotedPrintableEncoder.
void QuotedPrintableEncoder::finish (CharBuffer *outBuf)
Finishes a multiple-buffer encode operation. When you use the low-level interface, the encoder buffers some data internally. Therefore, after you have passed all input data to the encoder, you must call this member function to flush the internal buffer. The following condition must be satisfied when you call the function:
- outBuf->pos < outBuf->endPos (the output buffer is not full)
The above condition must also be satisfied after the function returns in order to guarantee that all output data has been written to the output buffer. You may need to call finish() more than once before the function returns with the above condition satisfied. For more information on using the low-level interface, see the overview section for QuotedPrintableEncoder.
int QuotedPrintableEncoder::maxLineLen ()
Gets the maximum line length of the encoded output.
bool QuotedPrintableEncoder::outputCrLf ()
Gets the CRLF end-of-line characters option.
bool QuotedPrintableEncoder::protectDot ()
Gets the option to protect a dot at the beginning of a line.
bool QuotedPrintableEncoder::protectFrom ()
Gets the option to protect "From " at the beginning of a line.
void QuotedPrintableEncoder::setEncodeMap (const unsigned char *map)
Sets the lookup table that determines how characters are encoded. The encode map serves as a lookup table for the encoder, determining how each character (that is, each byte or 8-bit character) is encoded. Two encode maps, QuotedPrintableEncoder::ENCODE_MAP_LOW_RISK and QuotedPrintableEncoder::ENCODE_MAP_HIGH_RISK, are provided as static data members. By setting a user-defined encode map, you can specify precisely which characters the encoder encodes using the hex encoding. The default is QuotedPrintableEncoder::ENCODE_MAP_LOW_RISK.
An encode map must be an array of at least 256 unsigned chars. Each of the first 256 entries must be set to an encoding value such as QuotedPrintableEncoder::SAFE (the character is output as-is), QuotedPrintableEncoder::UNSAFE (the character is encoded using the hex encoding), or QuotedPrintableEncoder::SPECIAL (used for CR and LF in the example below). The following code example shows how to create and set an encode map that encodes all characters except CR, LF, the digits, and the letters:
```cpp
// `encoder` is an existing QuotedPrintableEncoder object.
static unsigned char myEncodeMap[256];

// First, set all entries to UNSAFE (encode using the hex encoding)
int i;
for (i = 0; i < 256; ++i) {
    myEncodeMap[i] = QuotedPrintableEncoder::UNSAFE;
}
// Set LF (10) and CR (13) to SPECIAL
myEncodeMap[10] = QuotedPrintableEncoder::SPECIAL;
myEncodeMap[13] = QuotedPrintableEncoder::SPECIAL;
// Set the digits '0' to '9' (48 to 57) to SAFE
for (i = 48; i < 58; ++i) {
    myEncodeMap[i] = QuotedPrintableEncoder::SAFE;
}
// Set the upper-case letters 'A' to 'Z' (65 to 90) to SAFE
for (i = 65; i < 91; ++i) {
    myEncodeMap[i] = QuotedPrintableEncoder::SAFE;
}
// Set the lower-case letters 'a' to 'z' (97 to 122) to SAFE
for (i = 97; i < 123; ++i) {
    myEncodeMap[i] = QuotedPrintableEncoder::SAFE;
}
encoder.setEncodeMap(myEncodeMap);
```

A user-defined encode map also makes it possible to encode arbitrary binary data. Normally, quoted-printable encoding is not a good choice for binary data, because the CR LF sequence (or sometimes just LF) is treated as a hard line break, and most binary file formats (image or sound files, for instance) don't have line breaks. To work around this, you can make the encoder encode CR and LF using the hex encoding by setting a user-defined encode map whose entries for CR (13) and LF (10) are set to UNSAFE.
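The effect of marking CR and LF as UNSAFE can be shown with a standalone C++ sketch. The Entry enum is a hypothetical local stand-in for QuotedPrintableEncoder::SAFE, UNSAFE, and SPECIAL (their real numeric values are not given here), and buildMap and encodeWithMap are illustrative helpers rather than library functions. For simplicity, SPECIAL bytes are copied through unchanged, whereas the real encoder treats a CR LF (or LF) sequence as a hard line break.

```cpp
#include <array>
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical local stand-ins for QuotedPrintableEncoder::SAFE, UNSAFE, SPECIAL.
enum Entry : unsigned char { SAFE, UNSAFE, SPECIAL };

using EncodeMap = std::array<unsigned char, 256>;

// Builds a map like the example above: digits and letters SAFE, everything
// else UNSAFE. CR (13) and LF (10) are SPECIAL unless `binary` is true, in
// which case they are UNSAFE and get hex-encoded like any other byte.
EncodeMap buildMap(bool binary) {
    EncodeMap table{};
    for (int i = 0; i < 256; ++i) {
        bool alnum = (i >= '0' && i <= '9') || (i >= 'A' && i <= 'Z') || (i >= 'a' && i <= 'z');
        table[i] = alnum ? SAFE : UNSAFE;
    }
    table[13] = table[10] = binary ? UNSAFE : SPECIAL;
    return table;
}

// Simplified encoder: UNSAFE bytes become "=XX"; SAFE and SPECIAL bytes are
// copied as-is (line-break and line-length handling are omitted).
std::string encodeWithMap(const std::string& data, const EncodeMap& table) {
    std::string out;
    for (unsigned char c : data) {
        if (table[c] == UNSAFE) {
            char buf[4];  // "=XX" plus the terminating NUL
            std::snprintf(buf, sizeof buf, "=%02X", static_cast<unsigned>(c));
            out += buf;
        } else {
            out += static_cast<char>(c);
        }
    }
    return out;
}
```

With buildMap(true), the input "A\r\nB" encodes as "A=0D=0AB", so the encoded text contains no line break that a decoder could reinterpret.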
void QuotedPrintableEncoder::setMaxLineLen (int len)
Sets the maximum line length of the encoded output. For MIME-compliant Internet mail, lines should be no longer than 76 characters, and the library enforces that rule: if the len parameter is larger than 76, the library sets the encoder's maximum line length to 76. The default value is 76.
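The two line-length rules can be sketched in standalone C++. clampLineLen mirrors the documented limit of 76, and wrapTokens inserts soft line breaks so that no output line, including the '=' of a soft break, exceeds the limit (RFC 2045 counts that '=' toward the 76 characters). Both functions are hypothetical; the rule that a three-character "=XX" escape is never split across lines comes from RFC 2045 rather than from this document. For simplicity, the sketch reserves a column for the '=' on every line, including the last.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

const int kMimeMaxLineLen = 76;  // RFC 2045 limit, not counting the line break itself

// Mirrors the documented rule: lengths above 76 are reduced to 76.
int clampLineLen(int len) { return std::min(len, kMimeMaxLineLen); }

// Joins encoded tokens (a literal character or a three-character "=XX" escape)
// into lines of at most maxLen characters. The '=' of a soft line break counts
// toward maxLen, and a token is never split across lines.
std::string wrapTokens(const std::vector<std::string>& tokens, int maxLen, const std::string& eol) {
    std::string out;
    int lineLen = 0;  // characters on the current output line
    for (const std::string& tok : tokens) {
        int n = static_cast<int>(tok.size());
        if (lineLen > 0 && lineLen + n > maxLen - 1) {  // keep one column for '='
            out += "=" + eol;                           // soft line break
            lineLen = 0;
        }
        out += tok;
        lineLen += n;
    }
    return out;
}
```

With maxLen set to 4, the tokens "a", "b", "=3D", "c" become the lines "ab=", "=3D=", and "c": the escape moves to the next line rather than being split.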
void QuotedPrintableEncoder::setOutputCrLf (bool b)
Sets the CRLF end-of-line characters option. If this option is true, the encoder uses CR LF as the end-of-line characters in the encoded output. If this option is false, the encoder uses LF alone. Normally, you do not need to set this option, because the encoder chooses the correct value by default. When your program starts, and before you create any threads, set TextUtil::EOL_CHARS to either TextUtil::LF or TextUtil::CRLF (the default is TextUtil::LF). The quoted-printable encoder then sets the value of this option based on the value of TextUtil::EOL_CHARS.
void QuotedPrintableEncoder::setProtectDot (bool b)
Sets the option to protect a dot at the beginning of a line. In the SMTP and POP3 protocols, a dot (the character '.') at the beginning of a line is treated specially: a single dot on a line by itself marks the end of the message. For this reason, a protocol implementation must scan every line for a leading dot, "escape" it by adding an extra dot when sending, and "unescape" it by removing the extra dot when receiving. If this option is true, the encoder encodes every dot at the beginning of a line using the hex encoding. Setting this option may improve interoperability with SMTP or POP3 implementations that do not correctly escape or unescape the dot character. The default value is true.
void QuotedPrintableEncoder::setProtectFrom (bool b)
Sets the option to protect "From " at the beginning of a line. Because of the way many applications store mail messages in a file, the string "From " at the beginning of a line is often changed to ">From ". If this option is true, the encoder encodes "From " at the beginning of a line as "From=20"; that is, it encodes the space character using the hex encoding. The default value is true.
void QuotedPrintableEncoder::setSuppressFinalNewline (bool b)
Sets the option to suppress a final newline in the output. If this option is true, the encoder does not put a final newline (CR LF or LF) at the end of the encoded output, unless the input ends with a hard line break. If this option is false, the encoder always ends the encoded output with a newline: if the input does not end with a hard line break, the encoder adds a soft line break (that is, "=\r\n" or "=\n") at the end of the encoded output. The default value is false (meaning that the encoder always adds a final newline).
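The end-of-output rules above, together with the Output CR LF option, can be summarized in a short standalone C++ sketch. finishOutput is a hypothetical helper, not a library function. It assumes the caller already knows whether the input ended with a hard line break, in which case the encoded output already ends with the end-of-line characters.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: completes the encoded output according to the
// Suppress Final Newline and Output CR LF options.
std::string finishOutput(std::string encoded, bool inputEndsWithHardBreak,
                         bool suppressFinalNewline, bool outputCrLf) {
    const std::string eol = outputCrLf ? "\r\n" : "\n";
    // A hard line break at the end of the input was already emitted as eol,
    // so in that case the output ends with a newline whatever the option says.
    if (!inputEndsWithHardBreak && !suppressFinalNewline) {
        encoded += "=" + eol;  // soft line break: a newline that adds no data
    }
    return encoded;
}
```

Because a decoder removes a soft line break entirely, appending "=\r\n" ends the output with a newline without changing the decoded data.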
void QuotedPrintableEncoder::start ()
Starts a multiple-buffer encode operation. If you use the low-level interface for multiple-buffer encoding, you must call start() to begin the encode operation. You may use a QuotedPrintableEncoder instance for many encode operations, but you must call start() to begin each operation. For more information on using the low-level interface, see the overview section for QuotedPrintableEncoder. You do not need to call this method if you use the encode() member function for encoding.
bool QuotedPrintableEncoder::suppressFinalNewline ()
Gets the option to suppress a final newline in the output.
static const unsigned char *const QuotedPrintableEncoder::ENCODE_MAP_HIGH_RISK
High risk encoding map. This char array acts as a look-up table that determines precisely which characters the encoder encodes using the hex encoding. The high-risk map specifies that hex encoding should be used for the following categories of characters:
static const unsigned char *const QuotedPrintableEncoder::ENCODE_MAP_LOW_RISK
Low risk encoding map. This char array acts as a look-up table that determines precisely which characters the encoder encodes using the hex encoding. The low-risk map specifies that hex encoding should be used for the following categories of characters:
Copyright © 2001-2007 Hunny Software, Inc. All rights reserved.