QuotedPrintableEncoder Class Reference

Detailed Description

Quoted-printable encoding converts 8-bit text into printable ASCII characters so that it can be sent through the Internet mail system, which does not reliably transport 8-bit characters. A character that is not in the 7-bit ASCII character set is encoded as an equals sign (the character '=') followed by two hex digits (0-9 and A-F). Certain ASCII characters, including the equals sign itself, are encoded the same way. Because long lines cannot pass reliably through the Internet mail system either, the encoding also inserts "soft" line breaks. The details of quoted-printable encoding can be found in RFC 2045.
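
As a rough illustration of the hex encoding described above, the following standalone sketch converts a single byte to its "=XX" form. The names hexDigit and hexEncodeByte are hypothetical helpers written for this example; they are not part of the QuotedPrintableEncoder interface.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: maps a value in the range 0-15 to an upper-case hex digit.
char hexDigit(unsigned value) {
    assert(value < 16);
    return "0123456789ABCDEF"[value];
}

// Hypothetical helper: returns the hex encoding of one byte, that is, an
// equals sign followed by two hex digits.
std::string hexEncodeByte(unsigned char byte) {
    std::string encoded = "=";
    encoded += hexDigit(byte >> 4);    // high four bits
    encoded += hexDigit(byte & 0x0F);  // low four bits
    return encoded;
}
```

For example, hexEncodeByte('=') yields "=3D", which is how a literal equals sign appears in quoted-printable text.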

QuotedPrintableEncoder provides two interfaces for performing quoted-printable encoding.

A high-level interface encodes from an input String to an output String. This interface comprises a single member function, encode().

A low-level interface allows encoding by passing multiple buffers to the encoder. The correct procedure for using this interface is described below.

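QuotedPrintableEncoder also allows you to change certain options that affect the behavior of the encoder: the maximum line length, the end-of-line characters, suppression of the final newline, protection of "From " and of a dot at the beginning of a line, and the encode map. Each option is described with its member function below.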

Using the Low-Level Interface

The low-level interface allows you to encode 8-bit text data one buffer at a time; thus you may encode text data of unlimited size using a limited amount of memory. For example, if you want to encode data from an input file to an output file, you may read from the input file one buffer at a time, pass each buffer to the encoder, and write to the output file one buffer at a time.

The low-level interface comprises three member functions: start(), encodeSegment(), and finish(). The procedure is described here:

  1. Call start() to initialize the encoder.

  2. Initialize an input buffer and an output buffer. These buffers are instances of ByteBuffer and CharBuffer, respectively. To initialize an input buffer named inBuf, set inBuf.bytes to a char array that contains the data to be encoded, set inBuf.pos to the offset of the beginning of the data in inBuf.bytes, and set inBuf.endPos to the offset of the first byte past the end of the data in inBuf.bytes. To initialize an output buffer named outBuf, set outBuf.chars to a char array, set outBuf.pos to zero, and set outBuf.endPos to the length of the array referenced by outBuf.chars.

  3. Call encodeSegment() with the input buffer and output buffer as arguments.

  4. Check to see if the output buffer is full or if the input buffer is empty. If outBuf.pos == outBuf.endPos, then the output buffer is full, and you must make room in the output buffer before you call encodeSegment() again. If inBuf.pos == inBuf.endPos, then the input buffer is empty, and you must supply the input buffer with more data before you call encodeSegment() again.

  5. Repeat steps 3 and 4 until the last input buffer is empty.

  6. Call finish() to flush any internally buffered data to the output buffer. If the output buffer is full after finish() returns, you must make room in the output buffer and call finish() again. If finish() returns and the output buffer is not full, then the encoding is finished.

You may use the same encoder object for multiple encode operations.
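
The six steps above can be sketched as follows. Because this example has to stand alone, ByteBuffer and CharBuffer are declared here as minimal stand-in structs that use the field names from the text, and ToyEncoder is a hypothetical, greatly simplified substitute for QuotedPrintableEncoder: it hex-encodes '=' and every byte outside the range 32-126, and it ignores line breaks and line lengths. Only the driver function reflects the documented calling procedure; encodeStream and encodeViaStreams are names invented for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Stand-ins for the library's buffer types (field names follow the text above).
struct ByteBuffer { const char* bytes; std::size_t pos; std::size_t endPos; };
struct CharBuffer { char* chars; std::size_t pos; std::size_t endPos; };

// Toy stand-in for QuotedPrintableEncoder (an assumption, not the real class).
class ToyEncoder {
public:
    void start() { pending_.clear(); }
    void encodeSegment(ByteBuffer* in, CharBuffer* out) {
        for (;;) {
            drain(out);
            // Stop when output is full (data still pending) or input is empty.
            if (!pending_.empty() || in->pos == in->endPos) return;
            unsigned char c = static_cast<unsigned char>(in->bytes[in->pos++]);
            if (c >= 32 && c <= 126 && c != '=') {
                pending_ += static_cast<char>(c);
            } else {
                const char* hex = "0123456789ABCDEF";
                pending_ += '=';
                pending_ += hex[c >> 4];
                pending_ += hex[c & 0x0F];
            }
        }
    }
    void finish(CharBuffer* out) { drain(out); }
private:
    // Moves as much internally buffered data as fits into the output buffer.
    void drain(CharBuffer* out) {
        std::size_t n = 0;
        while (n < pending_.size() && out->pos < out->endPos) {
            out->chars[out->pos++] = pending_[n++];
        }
        pending_.erase(0, n);
    }
    std::string pending_;
};

// Driver: encodes everything from in to out using fixed-size buffers.
void encodeStream(std::istream& in, std::ostream& out, std::size_t bufSize) {
    assert(bufSize > 0);
    std::vector<char> inArr(bufSize), outArr(bufSize);
    ToyEncoder encoder;
    encoder.start();                                          // step 1
    CharBuffer outBuf{outArr.data(), 0, outArr.size()};       // step 2
    while (in.read(inArr.data(), static_cast<std::streamsize>(bufSize)),
           in.gcount() > 0) {
        ByteBuffer inBuf{inArr.data(), 0, static_cast<std::size_t>(in.gcount())};
        while (inBuf.pos < inBuf.endPos) {                    // step 5
            encoder.encodeSegment(&inBuf, &outBuf);           // step 3
            if (outBuf.pos == outBuf.endPos) {                // step 4: output full
                out.write(outBuf.chars, static_cast<std::streamsize>(outBuf.pos));
                outBuf.pos = 0;
            }
        }
    }
    bool wasFull;
    do {                                                      // step 6
        encoder.finish(&outBuf);
        wasFull = (outBuf.pos == outBuf.endPos);
        out.write(outBuf.chars, static_cast<std::streamsize>(outBuf.pos));
        outBuf.pos = 0;
    } while (wasFull);
}

// Convenience wrapper for trying the sketch on in-memory data.
std::string encodeViaStreams(const std::string& text, std::size_t bufSize) {
    std::istringstream in(text);
    std::ostringstream out;
    encodeStream(in, out, bufSize);
    return out.str();
}
```

The result does not depend on the buffer size: a one-byte buffer simply forces more trips through steps 3, 4, and 6 than a large buffer does.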


Public Member Functions

 QuotedPrintableEncoder ()
 Default constructor.
 ~QuotedPrintableEncoder ()
 Destructor.
void setMaxLineLen (int len)
 Sets the maximum line length of the encoded output.
int maxLineLen ()
 Gets the maximum line length of the encoded output.
void setOutputCrLf (bool b)
 Sets the CRLF end-of-line characters option.
bool outputCrLf ()
 Gets the CRLF end-of-line characters option.
void setSuppressFinalNewline (bool b)
 Sets the option to suppress a final newline in the output.
bool suppressFinalNewline ()
 Gets the option to suppress a final newline in the output.
void setProtectFrom (bool b)
 Sets the option to protect "From " at the beginning of a line.
bool protectFrom ()
 Gets the option to protect "From " at the beginning of a line.
void setProtectDot (bool b)
 Sets the option to protect a dot at the beginning of a line.
bool protectDot ()
 Gets the option to protect a dot at the beginning of a line.
void setEncodeMap (const unsigned char *map)
 Sets the lookup table that determines how characters are encoded.
void start ()
 Starts a multiple-buffer encode operation.
void encodeSegment (ByteBuffer *inBuf, CharBuffer *outBuf)
 Encodes data from the input buffer to the output buffer.
void finish (CharBuffer *outBuf)
 Finishes a multiple-buffer encode operation.
String encode (const String &decoded)
 Performs single-step buffer-to-buffer quoted-printable encoding.

Static Public Attributes

const unsigned char *const ENCODE_MAP_LOW_RISK = QpEncoderImpl_LOW_RISK_MAP
 Low risk encoding map.
const unsigned char *const ENCODE_MAP_HIGH_RISK = QpEncoderImpl_HIGH_RISK_MAP
 High risk encoding map.


Constructor & Destructor Documentation

QuotedPrintableEncoder ()

Default constructor.

The constructor sets default values for all the options.

~QuotedPrintableEncoder ()

Destructor.


Member Function Documentation

String encode (const String &decoded)

Performs single-step buffer-to-buffer quoted-printable encoding.

To perform quoted-printable encoding using this function, create a String containing the data you want to encode and pass it as the function's argument. The returned String contains the encoded output.

This member function makes it very simple to perform quoted-printable encoding. The disadvantage of this function is that it requires all the data to be kept in memory for processing. You may use the low-level interface, described in the overview section, to perform quoted-printable encoding of large data using limited memory.

This member function uses the low-level interface internally. Any options set for the encoder object have the same effect using either this function or the low-level interface.

Parameters:
decoded string containing the data to be encoded
Returns:
string containing the encoded data

void encodeSegment (ByteBuffer *inBuf, CharBuffer *outBuf)

Encodes data from the input buffer to the output buffer.

This member function is an essential part of the low-level interface and performs most of the work of encoding for the QuotedPrintableEncoder class. It takes an input buffer and an output buffer as parameters, and encodes data from the input buffer until the input buffer is empty or the output buffer is full. In other words, one of the following conditions is guaranteed to be satisfied when the function returns:

  • inBuf.pos == inBuf.endPos (input buffer empty)
  • outBuf.pos == outBuf.endPos (output buffer full)

You may call the function multiple times to encode multiple buffers of input data. However, before you call the function, both of the following conditions should be true:

  • inBuf.pos < inBuf.endPos (input buffer data available)
  • outBuf.pos < outBuf.endPos (output buffer space available)

For more information on using the low-level interface, see the overview section for QuotedPrintableEncoder.

Parameters:
inBuf input buffer
outBuf output buffer
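
The conditions listed above can be checked mechanically around each call. The following is a minimal sketch under stated assumptions: ByteBuffer and CharBuffer are stand-in structs with the documented field names, CopyEncoder is a trivial placeholder that copies bytes unchanged (it exists only so that the example runs), and checkedEncodeSegment is a hypothetical wrapper, not a library function.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for the library's buffer types.
struct ByteBuffer { const char* bytes; std::size_t pos; std::size_t endPos; };
struct CharBuffer { char* chars; std::size_t pos; std::size_t endPos; };

// Placeholder encoder that copies input to output without encoding anything.
struct CopyEncoder {
    void encodeSegment(ByteBuffer* in, CharBuffer* out) {
        while (in->pos < in->endPos && out->pos < out->endPos) {
            out->chars[out->pos++] = in->bytes[in->pos++];
        }
    }
};

// Hypothetical wrapper: asserts the documented conditions before and after
// a call to encodeSegment().
template <class Encoder>
void checkedEncodeSegment(Encoder& encoder, ByteBuffer* inBuf, CharBuffer* outBuf) {
    assert(inBuf->pos < inBuf->endPos);    // input buffer data available
    assert(outBuf->pos < outBuf->endPos);  // output buffer space available
    encoder.encodeSegment(inBuf, outBuf);
    // On return, the input is empty or the output is full (or both).
    assert(inBuf->pos == inBuf->endPos || outBuf->pos == outBuf->endPos);
}
```

A wrapper of this kind may be useful in debug builds to catch a driver loop that forgets to refill the input buffer or to drain the output buffer.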

void finish (CharBuffer *outBuf)

Finishes a multiple-buffer encode operation.

When you use the low-level interface, the encoder buffers some data internally. Therefore, after you have passed all input data to the encoder, you must call this member function to flush the internal buffer.

The following condition must be satisfied when you call the function:

  • outBuf.pos < outBuf.endPos (output buffer space available)

The above condition must also be satisfied after the function returns in order to guarantee that all output data has been written to the output buffer. If the output buffer is full when the function returns, make room in the output buffer and call finish() again. You may need to call finish() more than once before the condition is satisfied on return.

For more information on using the low-level interface, see the overview section for QuotedPrintableEncoder.

Parameters:
outBuf output buffer

int maxLineLen ()

Gets the maximum line length of the encoded output.

Returns:
maximum line length of the encoded output
See also:
setMaxLineLen()

bool outputCrLf ()

Gets the CRLF end-of-line characters option.

Returns:
boolean value of this option
See also:
setOutputCrLf()

bool protectDot ()

Gets the option to protect a dot at the beginning of a line.

Returns:
boolean value of this option
See also:
setProtectDot()

bool protectFrom ()

Gets the option to protect "From " at the beginning of a line.

Returns:
boolean value of this option
See also:
setProtectFrom()

void setEncodeMap (const unsigned char *map)

Sets the lookup table that determines how characters are encoded.

The encode map serves as a lookup table for the encoder, determining how characters (that is, bytes or 8-bit characters) are encoded. Two encode maps, QuotedPrintableEncoder::ENCODE_MAP_LOW_RISK and QuotedPrintableEncoder::ENCODE_MAP_HIGH_RISK, are provided as static data members. By setting a user-defined encode map, you can precisely specify the characters that the encoder will encode using the hex encoding.

The default is QuotedPrintableEncoder::ENCODE_MAP_LOW_RISK.

An encode map must be a char array of at least 256 chars. Each of the first 256 chars in the array must have a value of SAFE, UNSAFE, or SPECIAL. If a byte is set to UNSAFE, then the corresponding character will be encoded using the hex encoding. If a byte is set to SAFE, then the corresponding character will not be encoded. The value SPECIAL may be used only for CR or LF.

The following code example shows how to create and set an encode map that encodes all characters except CR, LF, the digits, and the letters:

    static unsigned char myEncodeMap[256];
    // First, set all entries to UNSAFE
    for (int i = 0; i < 256; ++i) {
        myEncodeMap[i] = QuotedPrintableEncoder::UNSAFE;
    }
    // Set LF (10) and CR (13) to SPECIAL
    myEncodeMap[10] = QuotedPrintableEncoder::SPECIAL;
    myEncodeMap[13] = QuotedPrintableEncoder::SPECIAL;
    // Set the digits '0'-'9' to SAFE
    for (int i = '0'; i <= '9'; ++i) {
        myEncodeMap[i] = QuotedPrintableEncoder::SAFE;
    }
    // Set the upper-case letters 'A'-'Z' to SAFE
    for (int i = 'A'; i <= 'Z'; ++i) {
        myEncodeMap[i] = QuotedPrintableEncoder::SAFE;
    }
    // Set the lower-case letters 'a'-'z' to SAFE
    for (int i = 'a'; i <= 'z'; ++i) {
        myEncodeMap[i] = QuotedPrintableEncoder::SAFE;
    }
    // encoder is an existing QuotedPrintableEncoder instance
    encoder.setEncodeMap(myEncodeMap);

A user-defined encode map also makes it possible to encode arbitrary binary data. Normally, the quoted-printable encoding is not a good choice for encoding binary data, because the CR LF sequence (or sometimes just LF) is treated as a hard line break, and most binary file formats (image or sound files, for instance) don't have line breaks. To avoid this problem, you can cause the encoder to encode CR and LF using the hex encoding. You do this by setting a user-defined encode map that has the entries for CR (13) and LF (10) set to UNSAFE.

Parameters:
map the encode map to set
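
The binary-data variant described above might be built along the following lines. This is a sketch under stated assumptions: the constants in the sketch namespace stand in for QuotedPrintableEncoder::SAFE, UNSAFE, and SPECIAL, and their numeric values are invented for this example; makeBinaryMap is a hypothetical helper.

```cpp
#include <cassert>
#include <cstring>

namespace sketch {

// Stand-ins for the library's map values. The numeric values are assumptions.
const unsigned char SAFE = 0;
const unsigned char UNSAFE = 1;
const unsigned char SPECIAL = 2;

const int LF = 10;
const int CR = 13;

// Hypothetical helper: copies a 256-entry base map into binaryMap and marks
// CR and LF as UNSAFE, so that they would be hex encoded instead of being
// treated as line breaks.
void makeBinaryMap(const unsigned char* base, unsigned char* binaryMap) {
    assert(base != nullptr && binaryMap != nullptr);
    std::memcpy(binaryMap, base, 256);
    binaryMap[CR] = UNSAFE;
    binaryMap[LF] = UNSAFE;
}

}  // namespace sketch
```

With the library itself, the resulting array would then be passed to setEncodeMap(), using the library's own constants in place of the stand-ins.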

void setMaxLineLen (int len)

Sets the maximum line length of the encoded output.

For MIME-compliant Internet mail, the lines should be no longer than 76 characters, and the library enforces that rule. If the len parameter is larger than 76, the library sets the encoder's maximum line length to 76.

The default value is 76.

Parameters:
len maximum line length in the encoded output
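
To illustrate how a maximum line length interacts with soft line breaks, the following standalone sketch wraps text that has already been encoded. It is not the library's algorithm: softWrap is a hypothetical helper, it assumes the input contains no hard line breaks, and it counts the trailing '=' of a soft line break toward the line length.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: inserts soft line breaks ('=' followed by eol) so that
// no output line exceeds maxLen characters, not counting eol.
std::string softWrap(const std::string& encoded, std::size_t maxLen,
                     const std::string& eol) {
    assert(maxLen >= 4);           // room for one "=XX" token plus '='
    if (maxLen > 76) maxLen = 76;  // mirror the documented upper limit
    std::string out;
    std::size_t lineLen = 0;
    std::size_t i = 0;
    while (i < encoded.size()) {
        // Keep a hex-encoded token "=XX" together on one line.
        std::size_t tokLen = (encoded[i] == '=') ? 3 : 1;
        if (tokLen > encoded.size() - i) tokLen = encoded.size() - i;
        const bool isLast = (i + tokLen == encoded.size());
        // Every line except the last must leave one column for the '='.
        const std::size_t limit = isLast ? maxLen : maxLen - 1;
        if (lineLen + tokLen > limit) {
            out += '=';
            out += eol;
            lineLen = 0;
        }
        out.append(encoded, i, tokLen);
        lineLen += tokLen;
        i += tokLen;
    }
    return out;
}
```

For instance, softWrap("abcdef", 4, "\n") returns "abc=\ndef": the first line is cut after three characters so that the '=' still fits within four columns.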

void setOutputCrLf (bool b)

Sets the CRLF end-of-line characters option.

If this option is true, then the encoder uses CR LF as the end-of-line characters in the encoded output. If this option is false, then the encoder uses LF alone.

Normally, you do not need to set this option, because the encoder takes its value from TextUtil::EOL_CHARS by default. TextUtil::EOL_CHARS may be set to either TextUtil::LF or TextUtil::CRLF. (The default is TextUtil::LF.) If you change TextUtil::EOL_CHARS, do so when your program starts, before you create any threads.

Parameters:
b true value causes the encoder to output CR LF for the end-of-line characters; false causes it to output LF

void setProtectDot (bool b)

Sets the option to protect a dot at the beginning of a line.

Because of the way mail messages are sent in the SMTP and POP3 protocols, a dot (the character ".") at the beginning of a line is treated specially. A single dot on a line by itself indicates the end of the message. For this reason, the protocol implementation must scan every line for a dot at the beginning and "escape" it by adding an extra dot when sending, and "unescape" it by removing the extra dot when receiving. If this option is true, then the encoder encodes every dot at the beginning of a line using the hex encoding. Setting this option may provide better interoperability with SMTP or POP3 implementations that do not correctly handle the escaping or unescaping of the dot character.

The default value is true.

Parameters:
b if true, the encoder encodes "." at the beginning of a line as "=2E"
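
The effect of this option on a single line can be sketched as follows. protectLeadingDot is a hypothetical helper written for this example, not a library function; it expects one line without its end-of-line characters.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the line with a leading dot replaced by its
// hex encoding "=2E". Dots elsewhere in the line are left alone.
std::string protectLeadingDot(const std::string& line) {
    assert(line.find('\n') == std::string::npos);  // single line expected
    if (!line.empty() && line[0] == '.') {
        return "=2E" + line.substr(1);
    }
    return line;
}
```

A line consisting of a single dot becomes "=2E", so it can no longer be mistaken for the end-of-message marker.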

void setProtectFrom (bool b)

Sets the option to protect "From " at the beginning of a line.

Because of the way many applications store mail messages in a file, if the string "From " occurs at the beginning of a line, it is often changed to ">From ". If this option is true, then when "From " occurs at the beginning of a line, the encoder encodes it as "From=20" -- that is, the space character is encoded using the hex encoding.

The default value is true.

Parameters:
b if true, the encoder encodes "From " at the beginning of a line as "From=20"
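
The effect of this option on a single line can be sketched in the same way. protectLeadingFrom is a hypothetical helper written for this example, not a library function; it expects one line without its end-of-line characters.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: if the line begins with "From ", returns the line with
// that space replaced by its hex encoding "=20". Other lines are unchanged.
std::string protectLeadingFrom(const std::string& line) {
    assert(line.find('\n') == std::string::npos);  // single line expected
    if (line.compare(0, 5, "From ") == 0) {
        return "From=20" + line.substr(5);
    }
    return line;
}
```

Only the first five characters are examined, so "From " in the middle of a line, or a line that already begins with ">From ", is left as it is.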

void setSuppressFinalNewline (bool b)

Sets the option to suppress a final newline in the output.

If this option is true, then the encoder does not put a final newline (CR LF, or LF) at the end of the encoded output, unless the input ends with a hard line break. If this option is false, then the encoder always adds a newline to the end of the encoded output. If the option is false and the input does not end with a hard line break, then the encoder adds a soft line break (that is, "=\r\n" or "=\n") at the end of the encoded output.

The default value is false (meaning that the encoder always adds a final newline).

Parameters:
b if true, the encoder suppresses a final newline; if false, the encoder always adds a final newline
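
The decision described above can be written out as a small function. This is a sketch of the documented behavior only: addFinalNewline is a hypothetical helper, and it assumes that when the input ended with a hard line break, the encoded text passed in already ends with the corresponding end-of-line characters.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the encoded text with the final newline
// handling applied.
std::string addFinalNewline(const std::string& encoded, bool endsWithHardBreak,
                            bool suppress, const std::string& eol) {
    assert(eol == "\n" || eol == "\r\n");
    if (endsWithHardBreak) {
        return encoded;  // the hard line break already supplies the newline
    }
    if (suppress) {
        return encoded;  // no final newline is added
    }
    return encoded + "=" + eol;  // soft line break ends the output
}
```

Because a soft line break is removed on decoding, adding it does not change the decoded data.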

void start ()

Starts a multiple-buffer encode operation.

If you use the low-level interface for multiple-buffer encoding, you must call start() to begin the encode operation. You may use a QuotedPrintableEncoder instance for many encode operations, but you must call start() to begin each operation.

For more information on using the low-level interface, see the overview section for QuotedPrintableEncoder.

You do not need to call this member function if you use the encode() member function for encoding.

bool suppressFinalNewline ()

Gets the option to suppress a final newline in the output.

Returns:
boolean value of this option
See also:
setSuppressFinalNewline()


Member Data Documentation

const unsigned char *const ENCODE_MAP_HIGH_RISK = QpEncoderImpl_HIGH_RISK_MAP [static]
 

High risk encoding map.

This char array acts as a look-up table that determines precisely which characters the encoder encodes using the hex encoding. The high-risk map specifies that hex encoding should be used for the following categories of characters:

  • All control characters except HT, CR, and LF (including DEL)
  • All characters with numeric value greater than 127
  • The EQUALS character ('=')
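
A map that follows the three categories above could be filled in as shown below. This is a sketch, not the contents of the library's table: the constants stand in for QuotedPrintableEncoder::SAFE, UNSAFE, and SPECIAL with invented numeric values, fillHighRiskLikeMap is a hypothetical helper, and marking CR and LF as SPECIAL is an assumption based on the setEncodeMap() description.

```cpp
#include <cassert>

namespace sketch {

// Stand-ins for the library's map values. The numeric values are assumptions.
const unsigned char SAFE = 0;
const unsigned char UNSAFE = 1;
const unsigned char SPECIAL = 2;

const int HT = 9;
const int LF = 10;
const int CR = 13;
const int DEL = 127;

// Hypothetical helper: fills a 256-entry map using the categories listed for
// the high-risk map.
void fillHighRiskLikeMap(unsigned char* map) {
    assert(map != nullptr);
    for (int c = 0; c < 256; ++c) {
        const bool control = (c < 32) || (c == DEL);
        const bool highBit = (c > 127);
        map[c] = (control || highBit || c == '=') ? UNSAFE : SAFE;
    }
    map[HT] = SAFE;     // HT is exempt from hex encoding
    map[CR] = SPECIAL;  // assumption: line-break characters are SPECIAL
    map[LF] = SPECIAL;
}

}  // namespace sketch
```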

const unsigned char *const ENCODE_MAP_LOW_RISK = QpEncoderImpl_LOW_RISK_MAP [static]
 

Low risk encoding map.

This char array acts as a look-up table that determines precisely which characters the encoder encodes using the hex encoding. The low-risk map specifies that hex encoding should be used for the following categories of characters:

  • All control characters except CR and LF (including HT and DEL)
  • All characters with numeric value greater than 127
  • The EQUALS character ('=')
  • All the ASCII characters that are listed in RFC 2045 as risky (page 21)

Copyright © 2001-2007 Hunny Software, Inc. All rights reserved.