com.hunnysoft.jmime
Class QuotedPrintableEncoder

java.lang.Object
  extended by com.hunnysoft.jmime.QuotedPrintableEncoder

public final class QuotedPrintableEncoder
extends java.lang.Object

Class that performs quoted-printable encoding.

Quoted-printable encoding converts 8-bit text into printable ASCII characters for sending through the Internet mail system, which cannot reliably carry 8-bit characters. In the quoted-printable encoding, a character that is not in the ASCII 7-bit character set is encoded as an equals sign (the character "=") followed by two hex digits (0-9 and A-F). Certain ASCII characters, including the equals sign itself, are also encoded. The quoted-printable encoding also inserts "soft" line breaks, because long lines cannot pass reliably through the Internet mail system either. The details of quoted-printable encoding can be found in RFC 2045.
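
The hex encoding described above can be illustrated with a short, self-contained sketch. It is not the library's implementation: the class and method names (HexEscapeSketch, hexEscape, encodeBytes) are hypothetical, and the sketch ignores line-length limits and the rules for trailing white space.

```java
final class HexEscapeSketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    /** Encodes one byte value as "=" followed by two upper-case hex digits. */
    static String hexEscape(int b) {
        int v = b & 0xFF;
        return "=" + HEX[v >>> 4] + HEX[v & 0x0F];
    }

    /**
     * Hex-encodes every byte outside printable ASCII, plus the equals sign.
     * TAB, CR, and LF pass through unchanged in this simplified sketch.
     */
    static String encodeBytes(byte[] data) {
        StringBuilder out = new StringBuilder();
        for (byte b : data) {
            int v = b & 0xFF;
            boolean control = v < 32 && v != '\t' && v != '\r' && v != '\n';
            if (v > 126 || v == '=' || control) {
                out.append(hexEscape(v));
            } else {
                out.append((char) v);
            }
        }
        return out.toString();
    }
}
```

For example, the ISO-8859-1 bytes of "café=1" would be encoded as "caf=E9=3D1".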

QuotedPrintableEncoder provides two interfaces for performing quoted-printable encoding.

A high-level interface encodes from an input ByteString to an output ByteString. This interface comprises a single method, encode(ByteString).

A low-level interface allows encoding by passing multiple buffers to the encoder. The correct procedure for using this interface is described below.

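QuotedPrintableEncoder allows you to change certain options, which affect the behavior of the encoder: the maximum line length (setMaxLineLen), the end-of-line characters (setOutputCrLf), suppression of the final newline (setSuppressFinalNewline), protection of "From " and of a dot at the beginning of a line (setProtectFrom and setProtectDot), and the encode map (setEncodeMap).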

Using the Low-Level Interface

The low-level interface allows you to encode 8-bit text data one buffer at a time; thus you may encode text data of unlimited size using a limited amount of memory. For example, if you want to encode data from an input file to an output file, you may read from the input file one buffer at a time, pass each buffer to the encoder, and write to the output file one buffer at a time.

The low-level interface comprises three methods: start(), encodeSegment(ByteBuffer,ByteBuffer), and finish(ByteBuffer). The procedure is described here:

  1. Call start() to initialize the encoder.
  2. Initialize an input buffer and an output buffer. These buffers are instances of ByteBuffer. To initialize an input buffer named inBuf, set inBuf.bytes to a byte array that contains the data to be encoded, set inBuf.pos to the offset of the beginning of the data in inBuf.bytes, and set inBuf.endPos to the offset of the first byte past the end of the data in inBuf.bytes. To initialize an output buffer named outBuf, set outBuf.bytes to a byte array, set outBuf.pos to zero, and set outBuf.endPos to the length of the array referenced by outBuf.bytes.
  3. Call encodeSegment(ByteBuffer,ByteBuffer) with the input buffer and output buffer as arguments.
  4. Check to see if the output buffer is full or if the input buffer is empty. If outBuf.pos == outBuf.endPos, then the output buffer is full, and you must make room in the output buffer before you call encodeSegment again. If inBuf.pos == inBuf.endPos, then the input buffer is empty, and you must supply the input buffer with more data before you call encodeSegment again.
  5. Repeat steps 3 and 4 until the last input buffer is empty.
  6. Call finish(ByteBuffer) to flush any internally buffered data to the output buffer. If the output buffer is full after finish returns, you must make room in the output buffer and call finish again. If finish returns and the output buffer is not full, then the encoding is finished.

You may use the same encoder object for multiple encode operations.
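
The procedure in steps 1 through 6 can be sketched as a driver loop. So that the sketch compiles without the library, it uses stand-ins that are assumptions, not library types: SketchBuffer mirrors the bytes, pos, and endPos fields of ByteBuffer, SegmentEncoder mirrors the three low-level method names, and CopyEncoder is a trivial pass-through that only copies bytes. With the real library you would presumably pass a QuotedPrintableEncoder and ByteBuffer instances instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Stand-in for the library's ByteBuffer, with the three fields named in step 2. */
final class SketchBuffer {
    byte[] bytes;
    int pos;
    int endPos;
}

/** Stand-in for the three low-level methods (hypothetical interface). */
interface SegmentEncoder {
    void start();
    void encodeSegment(SketchBuffer inBuf, SketchBuffer outBuf);
    void finish(SketchBuffer outBuf);
}

/** Pass-through "encoder" that only copies bytes, so that the loop can run. */
final class CopyEncoder implements SegmentEncoder {
    @Override public void start() { }
    @Override public void encodeSegment(SketchBuffer inBuf, SketchBuffer outBuf) {
        while (inBuf.pos < inBuf.endPos && outBuf.pos < outBuf.endPos) {
            outBuf.bytes[outBuf.pos++] = inBuf.bytes[inBuf.pos++];
        }
    }
    @Override public void finish(SketchBuffer outBuf) { }
}

final class LowLevelLoopSketch {
    /** Writes the filled part of the output buffer to dst and empties the buffer. */
    static void drain(SketchBuffer outBuf, OutputStream dst) throws IOException {
        dst.write(outBuf.bytes, 0, outBuf.pos);
        outBuf.pos = 0;
    }

    /** Follows steps 1 to 6; inSize and outSize must be greater than zero. */
    static void pump(InputStream src, OutputStream dst, SegmentEncoder enc,
                     int inSize, int outSize) throws IOException {
        enc.start();                                   // step 1
        SketchBuffer inBuf = new SketchBuffer();       // step 2
        inBuf.bytes = new byte[inSize];
        SketchBuffer outBuf = new SketchBuffer();
        outBuf.bytes = new byte[outSize];
        outBuf.pos = 0;
        outBuf.endPos = outBuf.bytes.length;

        int n;
        while ((n = src.read(inBuf.bytes)) != -1) {    // refill the input buffer
            inBuf.pos = 0;
            inBuf.endPos = n;
            while (inBuf.pos < inBuf.endPos) {         // steps 3 to 5
                enc.encodeSegment(inBuf, outBuf);
                if (outBuf.pos == outBuf.endPos) {
                    drain(outBuf, dst);                // output full: make room
                }
            }
        }
        while (true) {                                 // step 6
            enc.finish(outBuf);
            if (outBuf.pos < outBuf.endPos) {
                break;                                 // not full: encoding is finished
            }
            drain(outBuf, dst);
        }
        drain(outBuf, dst);                            // write the remaining output
    }

    /** Convenience wrapper for in-memory data. */
    static byte[] run(byte[] input, SegmentEncoder enc, int inSize, int outSize) {
        ByteArrayOutputStream dst = new ByteArrayOutputStream();
        try {
            pump(new ByteArrayInputStream(input), dst, enc, inSize, outSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return dst.toByteArray();
    }
}
```

Draining the output buffer as soon as it fills keeps the precondition for the next call simple: the output buffer always has room when encodeSegment or finish is called.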

See Also:
QuotedPrintableEncoderW, Quoted-printable in RFC 2045

Field Summary
static byte[] ENCODE_MAP_HIGH_RISK
          High risk encoding map.
static byte[] ENCODE_MAP_LOW_RISK
          Low risk encoding map.
static int SAFE
          Named constant for safe characters in the encode map.
static int SPECIAL
          Named constant for special characters in the encode map.
static int UNSAFE
          Named constant for unsafe characters in the encode map.
 
Constructor Summary
QuotedPrintableEncoder()
          Default constructor.
 
Method Summary
 ByteString encode(ByteString decoded)
          Performs single-step buffer-to-buffer quoted-printable encoding.
 void encodeSegment(ByteBuffer inBuf, ByteBuffer outBuf)
          Encodes data from the input buffer to the output buffer.
 void finish(ByteBuffer outBuf)
          Finishes a multiple-buffer encode operation.
 int maxLineLen()
          Gets the maximum line length of the encoded output.
 boolean outputCrLf()
          Gets the CRLF end-of-line characters option.
 boolean protectDot()
          Gets the option to protect a dot at the beginning of a line.
 boolean protectFrom()
          Gets the option to protect "From " at the beginning of a line.
 void setEncodeMap(byte[] map)
          Sets the lookup table that determines how characters are encoded.
 void setMaxLineLen(int len)
          Sets the maximum line length of the encoded output.
 void setOutputCrLf(boolean b)
          Sets the CRLF end-of-line characters option.
 void setProtectDot(boolean b)
          Sets the option to protect a dot at the beginning of a line.
 void setProtectFrom(boolean b)
          Sets the option to protect "From " at the beginning of a line.
 void setSuppressFinalNewline(boolean b)
          Sets the option to suppress a final newline in the output.
 void start()
          Starts a multiple-buffer encode operation.
 boolean suppressFinalNewline()
          Gets the option to suppress a final newline in the output.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

SAFE

public static final int SAFE
Named constant for safe characters in the encode map.

See Also:
Constant Field Values

UNSAFE

public static final int UNSAFE
Named constant for unsafe characters in the encode map.

See Also:
Constant Field Values

SPECIAL

public static final int SPECIAL
Named constant for special characters in the encode map.

See Also:
Constant Field Values

ENCODE_MAP_LOW_RISK

public static final byte[] ENCODE_MAP_LOW_RISK
Low risk encoding map.

This byte array acts as a look-up table that determines precisely which characters the encoder encodes using the hex encoding. The low-risk map specifies that hex encoding should be used for the following categories of characters:


ENCODE_MAP_HIGH_RISK

public static final byte[] ENCODE_MAP_HIGH_RISK
High risk encoding map.

This byte array acts as a look-up table that determines precisely which characters the encoder encodes using the hex encoding. The high-risk map specifies that hex encoding should be used for the following categories of characters:

Constructor Detail

QuotedPrintableEncoder

public QuotedPrintableEncoder()
Default constructor.

The constructor sets default values for all the options.

Method Detail

setMaxLineLen

public void setMaxLineLen(int len)
Sets the maximum line length of the encoded output.

For MIME-compliant Internet mail, the lines should be no longer than 76 characters, and the library enforces that rule. If the len parameter is larger than 76, the library sets the encoder's maximum line length to 76.

The default value is 76.
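
To illustrate how a maximum line length interacts with soft line breaks, the following self-contained sketch wraps one already hex-encoded logical line. It is an illustration only, not the library's algorithm; the names SoftWrapSketch and softWrap are hypothetical. It assumes the trailing "=" of a soft line break counts toward the limit, as RFC 2045 specifies, and it clamps the requested length to 76 in the way described above.

```java
final class SoftWrapSketch {
    /** Upper limit for MIME-compliant mail. */
    static final int MIME_MAX = 76;

    /**
     * Inserts soft line breaks ("=" followed by CR LF) into one already
     * hex-encoded logical line. Assumes requestedMax is at least 4.
     */
    static String softWrap(String line, int requestedMax) {
        int max = Math.min(requestedMax, MIME_MAX);   // clamp values larger than 76
        StringBuilder out = new StringBuilder();
        int col = 0;
        int i = 0;
        while (i < line.length()) {
            // Keep an "=XX" triplet together; any other character is one token.
            int tokenLen = (line.charAt(i) == '=') ? 3 : 1;
            tokenLen = Math.min(tokenLen, line.length() - i);
            boolean isLast = (i + tokenLen == line.length());
            // Reserve one column for the trailing "=" unless this token ends the line.
            int limit = isLast ? max : max - 1;
            if (col > 0 && col + tokenLen > limit) {
                out.append("=\r\n");
                col = 0;
            }
            out.append(line, i, i + tokenLen);
            col += tokenLen;
            i += tokenLen;
        }
        return out.toString();
    }
}
```

With a limit of 5, the line "ab=E9cd" becomes "ab=" and "=E9cd" on two output lines: the break falls before the "=E9" triplet rather than inside it.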

Parameters:
len - maximum line length of the encoded output

maxLineLen

public int maxLineLen()
Gets the maximum line length of the encoded output.

Returns:
maximum line length of the encoded output
See Also:
setMaxLineLen(int)

setOutputCrLf

public void setOutputCrLf(boolean b)
Sets the CRLF end-of-line characters option.

If this option is true, then the encoder uses CR LF as the end-of-line characters in the encoded output. If this option is false, then the encoder uses LF alone.

Normally, you do not need to set this option, because the encoder takes its initial value from TextUtil.EOL. When your program starts, and before you create any threads, set TextUtil.EOL to either TextUtil.LF_EOL or TextUtil.CRLF_EOL. (The default is TextUtil.LF_EOL.) The quoted-printable encoder then sets the value of this option based on the value of TextUtil.EOL.

Parameters:
b - true value causes the encoder to output CR LF for the end-of-line characters; false causes it to output LF.

outputCrLf

public boolean outputCrLf()
Gets the CRLF end-of-line characters option.

Returns:
boolean value of this option
See Also:
setOutputCrLf(boolean)

setSuppressFinalNewline

public void setSuppressFinalNewline(boolean b)
Sets the option to suppress a final newline in the output.

If this option is true, then the encoder does not put a final newline (CR LF, or LF) at the end of the encoded output, unless the input ends with a hard line break. If this option is false, then the encoder always adds a newline to the end of the encoded output. If the option is false and the input does not end with a hard line break, then the encoder adds a soft line break (that is, "=\r\n" or "=\n") at the end of the encoded output.

The default value is false (meaning that the encoder always adds a final newline).
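
The rule described above can be summarized in a small, self-contained sketch. The class FinalNewlineSketch and its method are hypothetical and operate on an already encoded string; they only illustrate the decision and are not the library's code.

```java
final class FinalNewlineSketch {
    /**
     * Returns the encoded output with the final-newline rule applied.
     * eol is "\r\n" or "\n", matching the end-of-line option in use.
     */
    static String finishOutput(String encoded, boolean suppressFinalNewline, String eol) {
        if (encoded.endsWith(eol)) {
            return encoded;               // input ended with a hard line break
        }
        if (suppressFinalNewline) {
            return encoded;               // option is true: add nothing
        }
        return encoded + "=" + eol;       // option is false: add a soft line break
    }
}
```

Because a soft line break is removed on decoding, the added "=" plus newline does not change the decoded data.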

Parameters:
b - if true, the encoder suppresses a final newline; if false, the encoder always adds a final newline

suppressFinalNewline

public boolean suppressFinalNewline()
Gets the option to suppress a final newline in the output.

Returns:
boolean value of this option
See Also:
setSuppressFinalNewline(boolean)

setProtectFrom

public void setProtectFrom(boolean b)
Sets the option to protect "From " at the beginning of a line.

Because of the way many applications store mail messages in a file, if the string "From " occurs at the beginning of a line, it is often changed to ">From ". If this option is true, then when "From " occurs at the beginning of a line, the encoder encodes it as "From=20" -- that is, the space character is encoded using the hex encoding.

The default value is true.
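
As an illustration of the transformation only (the helper below is hypothetical and is not part of the library), protecting "From " on a single line might be sketched like this:

```java
final class ProtectFromSketch {
    /** Hex-encodes the space of a leading "From " so the line no longer starts with it. */
    static String protectFrom(String line) {
        if (line.startsWith("From ")) {
            return "From=20" + line.substring(5);
        }
        return line;   // "From " elsewhere in the line is left unchanged
    }
}
```

The match is case-sensitive and applies only at the start of a line, so "from me" and "Mail From me" would be left unchanged in this sketch.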

Parameters:
b - if true, the encoder encodes "From " at the beginning of a line as "From=20"

protectFrom

public boolean protectFrom()
Gets the option to protect "From " at the beginning of a line.

Returns:
boolean value of this option
See Also:
setProtectFrom(boolean)

setProtectDot

public void setProtectDot(boolean b)
Sets the option to protect a dot at the beginning of a line.

Because of the way mail messages are sent in the SMTP and POP3 protocols, a dot (the character ".") at the beginning of a line is treated specially. A single dot on a line by itself indicates the end of the message. For this reason, the protocol implementation must scan every line for a dot at the beginning and "escape" it by adding an extra dot when sending, and "unescape" it by removing the extra dot when receiving. If this option is true, then the encoder encodes every dot at the beginning of a line using the hex encoding. Setting this option may provide better interoperability with SMTP or POP3 implementations that do not correctly handle the escaping or unescaping of the dot character.

The default value is true.
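
As an illustration of the transformation only (the helper below is hypothetical and is not part of the library), protecting a leading dot on a single line might be sketched like this:

```java
final class ProtectDotSketch {
    /** Hex-encodes a dot at the beginning of a line as "=2E". */
    static String protectDot(String line) {
        if (line.startsWith(".")) {
            return "=2E" + line.substring(1);
        }
        return line;   // dots elsewhere in the line are left unchanged
    }
}
```

A line consisting of a single dot becomes "=2E", so a transport that mishandles dot escaping cannot mistake it for the end of the message.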

Parameters:
b - if true, the encoder encodes "." at the beginning of a line as "=2E"

protectDot

public boolean protectDot()
Gets the option to protect a dot at the beginning of a line.

Returns:
boolean value of this option
See Also:
setProtectDot(boolean)

setEncodeMap

public void setEncodeMap(byte[] map)
Sets the lookup table that determines how characters are encoded.

The encode map serves as a lookup table for the encoder, determining how characters (that is, bytes or 8-bit characters) are encoded. Two encode maps, ENCODE_MAP_LOW_RISK and ENCODE_MAP_HIGH_RISK, are provided as class fields. By setting a user-defined encode map, you can precisely specify the characters that the encoder will encode using the hex encoding.

The default is ENCODE_MAP_LOW_RISK.

An encode map must be a byte array of at least 256 bytes. Each of the first 256 bytes in the array must have a value of SAFE, UNSAFE, or SPECIAL. If a byte is set to UNSAFE, then the corresponding character will be encoded using the hex encoding. If a byte is set to SAFE, then the corresponding character will not be encoded. The value SPECIAL may be used only for CR or LF.

The following code example shows how to create and set an encode map that encodes all characters except CR, LF, the digits, and the letters:

    byte[] myEncodeMap = new byte[256];
    // First, set all entries to UNSAFE
    int i;
    for (i = 0; i < 256; ++i) {
        myEncodeMap[i] = QuotedPrintableEncoder.UNSAFE;
    }
    // Set CR and LF to SPECIAL
    myEncodeMap[10] = QuotedPrintableEncoder.SPECIAL;
    myEncodeMap[13] = QuotedPrintableEncoder.SPECIAL;
    // Set all digits to SAFE
    for (i = 48; i < 58; ++i) {
        myEncodeMap[i] = QuotedPrintableEncoder.SAFE;
    }
    // Set upper-case letters to SAFE
    for (i = 65; i < 91; ++i) {
        myEncodeMap[i] = QuotedPrintableEncoder.SAFE;
    }
    // Set lower-case letters to SAFE
    for (i = 97; i < 123; ++i) {
        myEncodeMap[i] = QuotedPrintableEncoder.SAFE;
    }
    encoder.setEncodeMap(myEncodeMap);

A user-defined encode map also makes it possible to encode arbitrary binary data. Normally, the quoted-printable encoding is not a good choice for encoding binary data, because the CR LF sequence (or sometimes just LF) is treated as a hard line break. (This is bad because most binary file formats -- image or sound files, for instance -- don't have line breaks.) To avoid this problem, you can cause the encoder to encode CR and LF using the hex encoding. You do this by setting a user-defined encode map that has the entries for CR (13) and LF (10) set to UNSAFE.
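
A map for binary data might be built as in the following sketch. The SAFE and UNSAFE values here are local placeholders so that the sketch compiles on its own; in real code you would use QuotedPrintableEncoder.SAFE and QuotedPrintableEncoder.UNSAFE, whose numeric values are not given in this documentation. Leaving space and TAB as UNSAFE is a conservative choice made for this sketch, not a requirement stated by the library.

```java
import java.util.Arrays;

final class BinaryEncodeMapSketch {
    // Placeholder values; the real constants are defined by QuotedPrintableEncoder.
    static final byte SAFE = 0;
    static final byte UNSAFE = 1;

    /** Builds a 256-entry map in which CR and LF are hex-encoded like any other byte. */
    static byte[] binaryMap() {
        byte[] map = new byte[256];
        Arrays.fill(map, UNSAFE);            // CR (13) and LF (10) stay UNSAFE
        for (int c = 33; c <= 126; ++c) {    // printable ASCII, excluding space
            if (c != '=') {                  // the equals sign must always be encoded
                map[c] = SAFE;
            }
        }
        return map;
    }
}
```

The resulting array would then be passed to setEncodeMap(byte[]) before encoding begins.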

Parameters:
map - the encode map to set

start

public void start()
Starts a multiple-buffer encode operation.

If you use the low-level interface for multiple-buffer encoding, you must call start to begin the encode operation. You may use a QuotedPrintableEncoder instance for many encode operations, but you must call start to begin each operation.

For more information on using the low-level interface, see the overview section for QuotedPrintableEncoder.

You do not need to call this method if you use the encode(ByteString) method for encoding.


encodeSegment

public void encodeSegment(ByteBuffer inBuf,
                          ByteBuffer outBuf)
Encodes data from the input buffer to the output buffer.

This method is an essential part of the low-level interface and performs most of the work of encoding for the QuotedPrintableEncoder class. It takes an input buffer and an output buffer as parameters, and encodes data from the input buffer until the input buffer is empty or the output buffer is full. In other words, at least one of the following conditions is guaranteed to be satisfied when the method returns: inBuf.pos == inBuf.endPos (the input buffer is empty), or outBuf.pos == outBuf.endPos (the output buffer is full).

You may call the method multiple times to encode multiple buffers of input data. However, before you call the method, both of the following conditions should be true: inBuf.pos < inBuf.endPos (the input buffer contains data), and outBuf.pos < outBuf.endPos (the output buffer has room).

For more information on using the low-level interface, see the overview section for QuotedPrintableEncoder.

Parameters:
inBuf - input buffer
outBuf - output buffer

finish

public void finish(ByteBuffer outBuf)
Finishes a multiple-buffer encode operation.

When you use the low-level interface, the encoder buffers some data internally. Therefore, after you have passed all input data to the encoder, you must call this method to flush the internal buffer.

The following condition must be satisfied when you call the method: outBuf.pos < outBuf.endPos (that is, the output buffer is not full).

The above condition must also be satisfied after the method returns in order to guarantee that all output data has been written to the output buffer. You may need to call finish more than once before the above condition is satisfied when the method returns.

For more information on using the low-level interface, see the overview section for QuotedPrintableEncoder.

Parameters:
outBuf - output buffer

encode

public ByteString encode(ByteString decoded)
Performs single-step buffer-to-buffer quoted-printable encoding.

To perform quoted-printable encoding using this method, create a ByteString containing the data you want to encode and pass it as the method's argument. The returned ByteString contains the encoded output.

This method makes it very simple to perform quoted-printable encoding. The disadvantage of this method is that it requires all the data to be kept in memory for processing. You may use the low-level interface, described in the overview section, to perform quoted-printable encoding of large data using limited memory.

This method uses the low-level interface internally. Any options set for the encoder object have the same effect using either this method or the low-level interface.

Parameters:
decoded - byte string containing the data to be encoded
Returns:
byte string containing the encoded data