java.lang.Object
  com.hunnysoft.jmime.QuotedPrintableEncoder
Class that performs quoted-printable encoding.
Quoted-printable encoding encodes 8-bit text into printable ASCII characters for sending through the Internet mail system. The encoding is required because 8-bit characters cannot pass reliably through the Internet mail system. In the quoted-printable encoding, a character that is not in the ASCII 7-bit character set is encoded as an equals sign (the character "=") followed by two hex digits (0-9 and A-F). Certain ASCII characters are also encoded. The quoted-printable encoding also encodes "soft" line breaks, since long lines also cannot pass reliably through the Internet mail system. The details of quoted-printable encoding can be found in RFC 2045.
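As a rough illustration of the hex-encoding rule described above, the following sketch encodes every byte outside the range 32..126, plus the equals sign itself, as "=" followed by two hex digits. It is not the library's implementation: the names HexEscapeSketch and escape are hypothetical, and the sketch ignores line breaks, line-length limits, and trailing white space, all of which a real encoder must handle.

```java
import java.nio.charset.StandardCharsets;

final class HexEscapeSketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    /** Copies bytes in 32..126 (other than '=') unchanged; writes all others as "=XX". */
    static String escape(byte[] input) {
        StringBuilder out = new StringBuilder();
        for (byte value : input) {
            int b = value & 0xFF; // treat the byte as unsigned
            if (b >= 32 && b <= 126 && b != '=') {
                out.append((char) b);
            } else {
                out.append('=').append(HEX[b >> 4]).append(HEX[b & 0x0F]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The 8-bit character e-acute (0xE9 in ISO-8859-1) and '=' are both hex-encoded.
        byte[] text = "caf\u00e9 = 1".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(escape(text));
    }
}
```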
QuotedPrintableEncoder provides two interfaces for
performing quoted-printable encoding.
A high-level interface encodes from an input ByteString
to an output ByteString. This interface comprises a single
method, encode(ByteString).
A low-level interface allows encoding by passing multiple buffers to the encoder. The correct procedure for using this interface is described below.
QuotedPrintableEncoder allows you to change certain
options, which affect the behavior of the encoder:
Maximum Line Length: The value of this option determines the maximum length of a line in the encoder's output. The maximum value allowed for this option is 76, which is the maximum line length allowed by the MIME standard. The default value is 76.

Output CR LF: If you set this option to true, then the encoder uses CR LF as the end-of-line characters in the encoded output. If you set this option to false, then the encoder uses LF alone. The default value is true if TextUtil.EOL is TextUtil.CRLF_EOL, and false otherwise.

Suppress Final Newline: If you set this option to true, then the encoder does not put a final newline (CR LF, or LF) at the end of the encoded output, unless the input ends with a hard line break. If you set this option to false, then the encoder always adds a newline to the end of the encoded output. If the option is false and the input does not end with a hard line break, then the encoder adds a soft line break (that is, "=\r\n" or "=\n") at the end of the encoded output. The default value is false.

Protect From: Because of the way many applications store mail messages in a file, if the string "From " occurs at the beginning of a line, it is often changed to ">From ". If this option is true, then when "From " occurs at the beginning of a line, the encoder encodes it as "From=20"; that is, the space character is encoded using the hex encoding. The default value is true.

Protect Dot: Because of the way mail messages are sent in the SMTP and POP3 protocols, a dot (the character ".") at the beginning of a line is treated specially. A single dot on a line by itself indicates the end of the message. For this reason, the protocol implementation must scan every line for a dot at the beginning and "escape" it by adding an extra dot when sending, and "unescape" it by removing the extra dot when receiving. If this option is true, then the encoder encodes every dot at the beginning of a line using the hex encoding. Setting this option may provide better interoperability with SMTP or POP3 implementations that do not correctly handle the escaping or unescaping of the dot character. The default value is true.

Encode Map: This option allows you to control precisely the characters that the encoder encodes. Two encoding maps, ENCODE_MAP_LOW_RISK and ENCODE_MAP_HIGH_RISK, are provided by class fields. See setEncodeMap(byte[]) for the details of how to create your own encode map. The default value is ENCODE_MAP_LOW_RISK, which encodes all the recommended characters listed in RFC 2045 (page 21).
Using the Low-Level Interface
The low-level interface allows you to encode 8-bit text data one buffer at a time; thus you may encode text data of unlimited size using a limited amount of memory. For example, if you want to encode data from an input file to an output file, you may read from the input file one buffer at a time, pass each buffer to the encoder, and write to the output file one buffer at a time.
The low-level interface comprises three methods: start(),
encodeSegment(ByteBuffer,ByteBuffer), and finish(ByteBuffer). The procedure is described here:
1. Call start() to initialize the encoder.
2. Create an input buffer and an output buffer of type ByteBuffer. To initialize an input buffer named inBuf, set inBuf.bytes to a byte array that contains the data to be encoded, set inBuf.pos to the offset of the beginning of the data in inBuf.bytes, and set inBuf.endPos to the offset of the first byte past the end of the data in inBuf.bytes. To initialize an output buffer named outBuf, set outBuf.bytes to a byte array, set outBuf.pos to zero, and set outBuf.endPos to the length of the array referenced by outBuf.bytes.
3. Call encodeSegment(ByteBuffer,ByteBuffer) with the input buffer and output buffer as arguments.
4. If outBuf.pos == outBuf.endPos, then the output buffer is full, and you must make room in the output buffer before you call encodeSegment again. If inBuf.pos == inBuf.endPos, then the input buffer is empty, and you must supply the input buffer with more data before you call encodeSegment again.
5. When there is no more input data, call finish(ByteBuffer) to flush any internally buffered data to the output buffer. If the output buffer is full after finish returns, you must make room in the output buffer and call finish again. If finish returns and the output buffer is not full, then the encoding is finished.

You may use the same encoder object for multiple encode operations, but you must call start() to begin each one.
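The buffer-at-a-time procedure above can be sketched as a stream-to-stream loop. So that the example stands alone, it does not use the library: Buf is a hypothetical stand-in for ByteBuffer (it models only the bytes, pos, and endPos fields named above), and StandInEncoder is a hypothetical stand-in that merely hex-encodes bytes outside 32..126 and keeps up to three encoded bytes pending internally. With the real classes, the shape of the driver loop in pump should be similar, but treat that as an assumption to verify.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for the library's ByteBuffer. */
final class Buf {
    byte[] bytes;
    int pos;
    int endPos;
}

/** Hypothetical stand-in encoder with a small internal buffer. */
final class StandInEncoder {
    private static final byte[] HEX = "0123456789ABCDEF".getBytes(StandardCharsets.US_ASCII);
    private final byte[] pending = new byte[3]; // encoded bytes not yet written out
    private int pendingPos;
    private int pendingLen;

    void start() {
        pendingPos = 0;
        pendingLen = 0;
    }

    /** Returns when the input buffer is empty or the output buffer is full. */
    void encodeSegment(Buf in, Buf out) {
        drain(out);
        while (pendingPos == pendingLen && in.pos < in.endPos && out.pos < out.endPos) {
            int b = in.bytes[in.pos++] & 0xFF;
            pendingPos = 0;
            if (b >= 32 && b <= 126 && b != '=') {
                pending[0] = (byte) b;
                pendingLen = 1;
            } else {
                pending[0] = '=';
                pending[1] = HEX[b >> 4];
                pending[2] = HEX[b & 0x0F];
                pendingLen = 3;
            }
            drain(out);
        }
    }

    void finish(Buf out) {
        drain(out);
    }

    private void drain(Buf out) {
        while (pendingPos < pendingLen && out.pos < out.endPos) {
            out.bytes[out.pos++] = pending[pendingPos++];
        }
    }
}

final class LowLevelLoopSketch {
    /** Encodes src to sink using two fixed-size buffers; bufSize must be at least 1. */
    static void pump(InputStream src, OutputStream sink, int bufSize) throws IOException {
        StandInEncoder encoder = new StandInEncoder();
        encoder.start(); // initialize the encoder
        Buf in = new Buf();
        in.bytes = new byte[bufSize];
        Buf out = new Buf();
        out.bytes = new byte[bufSize];
        out.pos = 0;
        out.endPos = out.bytes.length;
        int n;
        while ((n = src.read(in.bytes)) != -1) {
            in.pos = 0; // refill the input buffer
            in.endPos = n;
            while (in.pos < in.endPos) {
                encoder.encodeSegment(in, out);
                if (out.pos == out.endPos) { // output full: write it out to make room
                    sink.write(out.bytes, 0, out.pos);
                    out.pos = 0;
                }
            }
        }
        boolean full;
        do { // repeat finish until it returns with room left in the output buffer
            encoder.finish(out);
            full = out.pos == out.endPos;
            sink.write(out.bytes, 0, out.pos);
            out.pos = 0;
        } while (full);
    }

    static String encodeAll(byte[] data, int bufSize) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            pump(new ByteArrayInputStream(data), sink, bufSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(sink.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(encodeAll("a=\u00e9".getBytes(StandardCharsets.UTF_8), 2));
    }
}
```

The two-byte buffers in main are deliberately tiny so that the loop has to handle a full output buffer in the middle of an encoded triplet; a real program would use much larger buffers.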
See Also:
QuotedPrintableEncoderW, Quoted-printable in RFC 2045

| Field Summary | |
| static byte[] | ENCODE_MAP_HIGH_RISK - High risk encoding map. |
| static byte[] | ENCODE_MAP_LOW_RISK - Low risk encoding map. |
| static int | SAFE - Named constant for safe characters in the encode map. |
| static int | SPECIAL - Named constant for special characters in the encode map. |
| static int | UNSAFE - Named constant for unsafe characters in the encode map. |
| Constructor Summary | |
| QuotedPrintableEncoder() - Default constructor. | |
| Method Summary | |
| ByteString | encode(ByteString decoded) - Performs single-step buffer-to-buffer quoted-printable encoding. |
| void | encodeSegment(ByteBuffer inBuf, ByteBuffer outBuf) - Encodes data from the input buffer to the output buffer. |
| void | finish(ByteBuffer outBuf) - Finishes a multiple-buffer encode operation. |
| int | maxLineLen() - Gets the maximum line length of the encoded output. |
| boolean | outputCrLf() - Gets the CRLF end-of-line characters option. |
| boolean | protectDot() - Gets the option to protect a dot at the beginning of a line. |
| boolean | protectFrom() - Gets the option to protect "From " at the beginning of a line. |
| void | setEncodeMap(byte[] map) - Sets the lookup table that determines how characters are encoded. |
| void | setMaxLineLen(int len) - Sets the maximum line length of the encoded output. |
| void | setOutputCrLf(boolean b) - Sets the CRLF end-of-line characters option. |
| void | setProtectDot(boolean b) - Sets the option to protect a dot at the beginning of a line. |
| void | setProtectFrom(boolean b) - Sets the option to protect "From " at the beginning of a line. |
| void | setSuppressFinalNewline(boolean b) - Sets the option to suppress a final newline in the output. |
| void | start() - Starts a multiple-buffer encode operation. |
| boolean | suppressFinalNewline() - Gets the option to suppress a final newline in the output. |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
public static final int SAFE
public static final int UNSAFE
public static final int SPECIAL
public static final byte[] ENCODE_MAP_LOW_RISK
This byte array acts as a look-up table that determines precisely which characters the encoder encodes using the hex encoding. The low-risk map specifies that hex encoding should be used for the following categories of characters:
public static final byte[] ENCODE_MAP_HIGH_RISK
This byte array acts as a look-up table that determines precisely which characters the encoder encodes using the hex encoding. The high-risk map specifies that hex encoding should be used for the following categories of characters:
| Constructor Detail |
public QuotedPrintableEncoder()
The constructor sets default values for all the options.
| Method Detail |
public void setMaxLineLen(int len)
For MIME-compliant Internet mail, the lines should be no longer
than 76 characters, and the library enforces that rule. If the
len parameter is larger than 76, the library sets the
encoder's maximum line length to 76.
The default value is 76.
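To make the effect of the maximum line length concrete, here is a minimal sketch of the clamping rule and of soft line break insertion. It is an illustration under stated assumptions, not the library's code: SoftBreakSketch, clampMaxLineLen, and wrap are hypothetical names, the input is assumed to be a single line that has already been hex-encoded, the line length is assumed to be at least 4, and the "=" of a soft line break is counted toward the line length as RFC 2045 requires.

```java
final class SoftBreakSketch {
    static final int MIME_MAX_LINE_LEN = 76;

    /** Values larger than 76 are reduced to 76. */
    static int clampMaxLineLen(int len) {
        return Math.min(len, MIME_MAX_LINE_LEN);
    }

    /**
     * Inserts soft line breaks ("=" followed by CR LF) so that no output
     * line, counting the trailing "=", is longer than the clamped limit.
     */
    static String wrap(String encodedLine, int maxLineLen) {
        int limit = clampMaxLineLen(maxLineLen);
        StringBuilder out = new StringBuilder();
        int lineLen = 0;
        int i = 0;
        while (i < encodedLine.length()) {
            // A "=XX" triplet is one token and must not be split across lines.
            int tokenLen = encodedLine.charAt(i) == '=' ? 3 : 1;
            int end = Math.min(i + tokenLen, encodedLine.length());
            boolean isLast = end == encodedLine.length();
            // Reserve one column for the "=" of a soft break unless this is the last token.
            int room = isLast ? limit : limit - 1;
            if (lineLen > 0 && lineLen + (end - i) > room) {
                out.append("=\r\n");
                lineLen = 0;
            }
            out.append(encodedLine, i, end);
            lineLen += end - i;
            i = end;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(wrap("ab=C3=A9", 5));
    }
}
```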
Parameters:
len - maximum line length of the encoded output

public int maxLineLen()
See Also:
setMaxLineLen(int)

public void setOutputCrLf(boolean b)
If this option is true, then the encoder uses CR LF as the end-of-line characters in the encoded output. If this option is false, then the encoder uses LF alone.
Normally, you do not need to set this option, because the
encoder performs correctly by default. When your program starts,
and before you create any threads, set TextUtil.EOL to
either TextUtil.LF_EOL or TextUtil.CRLF_EOL. (The
default is TextUtil.LF_EOL.) Then, the
quoted-printable encoder sets the value of this option based on the
value of TextUtil.EOL.
Parameters:
b - true value causes the encoder to output CR LF for the end-of-line characters; false causes it to output LF.

public boolean outputCrLf()
See Also:
setOutputCrLf(boolean)

public void setSuppressFinalNewline(boolean b)
If this option is true, then the encoder does not put a final newline (CR LF, or LF) at the end of the encoded output, unless the input ends with a hard line break. If this option is false, then the encoder always adds a newline to the end of the encoded output. If the option is false and the input does not end with a hard line break, then the encoder adds a soft line break (that is, "=\r\n" or "=\n") at the end of the encoded output.
The default value is false (meaning that the encoder always adds a final newline).
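The rules above reduce to a small decision table. The sketch below writes it out as code; FinalNewlineSketch and tail are hypothetical names, and the method models only what, if anything, is appended after the last encoded input byte according to the description above. It is not taken from the library.

```java
final class FinalNewlineSketch {
    /**
     * Returns the characters appended to the encoded output after the
     * last input byte has been encoded.
     */
    static String tail(boolean suppressFinalNewline,
                       boolean inputEndsWithHardBreak,
                       boolean outputCrLf) {
        if (inputEndsWithHardBreak) {
            // The encoded hard line break already ends the output with a newline.
            return "";
        }
        if (suppressFinalNewline) {
            return "";
        }
        // Otherwise a soft line break supplies the final newline.
        return outputCrLf ? "=\r\n" : "=\n";
    }

    public static void main(String[] args) {
        System.out.println(tail(false, false, false).length()); // length of "=\n"
    }
}
```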
Parameters:
b - if true, the encoder suppresses a final newline; if false, the encoder always adds a final newline

public boolean suppressFinalNewline()
See Also:
setSuppressFinalNewline(boolean)

public void setProtectFrom(boolean b)
Because of the way many applications store mail messages in a file, if the string "From " occurs at the beginning of a line, it is often changed to ">From ". If this option is true, then when "From " occurs at the beginning of a line, the encoder encodes it as "From=20" -- that is, the space character is encoded using the hex encoding.
The default value is true.
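The transformation described above can be approximated with a multi-line regular expression. This is only a sketch of the visible effect on text, with hypothetical names (ProtectFromSketch, protectFrom); the library applies the rule while encoding and its internals are not shown here.

```java
import java.util.regex.Pattern;

final class ProtectFromSketch {
    // (?m) makes ^ match at the start of every line, not only at the start of the input.
    private static final Pattern FROM_AT_LINE_START = Pattern.compile("(?m)^From ");

    /** Hex-encodes the space in "From " wherever it begins a line. */
    static String protectFrom(String text) {
        return FROM_AT_LINE_START.matcher(text).replaceAll("From=20");
    }

    public static void main(String[] args) {
        System.out.println(protectFrom("From here on\r\nNot From here"));
    }
}
```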
Parameters:
b - if true, the encoder encodes "From " at the beginning of a line as "From=20"

public boolean protectFrom()
See Also:
setProtectFrom(boolean)

public void setProtectDot(boolean b)
Because of the way mail messages are sent in the SMTP and POP3 protocols, a dot (the character ".") at the beginning of a line is treated specially. A single dot on a line by itself indicates the end of the message. For this reason, the protocol implementation must scan every line for a dot at the beginning and "escape" it by adding an extra dot when sending, and "unescape" it by removing the extra dot when receiving. If this option is true, then the encoder encodes every dot at the beginning of a line using the hex encoding. Setting this option may provide better interoperability with SMTP or POP3 implementations that do not correctly handle the escaping or unescaping of the dot character.
The default value is true.
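For comparison, the sketch below shows both treatments of a leading dot mentioned above: the hex encoding that this option enables, and the dot doubling that an SMTP sender is expected to perform on its own. The names ProtectDotSketch, protectDot, and dotStuff are hypothetical, and the code illustrates the effect on text rather than the library's implementation.

```java
import java.util.regex.Pattern;

final class ProtectDotSketch {
    // (?m) makes ^ match at the start of every line.
    private static final Pattern DOT_AT_LINE_START = Pattern.compile("(?m)^\\.");

    /** Hex-encodes a dot at the beginning of each line as "=2E". */
    static String protectDot(String text) {
        return DOT_AT_LINE_START.matcher(text).replaceAll("=2E");
    }

    /** What a sending protocol implementation does instead: adds an extra dot. */
    static String dotStuff(String text) {
        return DOT_AT_LINE_START.matcher(text).replaceAll("..");
    }

    public static void main(String[] args) {
        System.out.println(protectDot(".profile\r\na.b"));
        System.out.println(dotStuff(".profile\r\na.b"));
    }
}
```

Once the leading dot is hex-encoded, no encoded line begins with a dot, so the result no longer depends on the transport escaping and unescaping it correctly.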
Parameters:
b - if true, the encoder encodes "." at the beginning of a line as "=2E"

public boolean protectDot()
See Also:
setProtectDot(boolean)

public void setEncodeMap(byte[] map)
The encode map serves as a lookup table for the encoder,
determining how characters (that is, bytes or 8-bit characters) are
encoded. Two encode maps, ENCODE_MAP_LOW_RISK and ENCODE_MAP_HIGH_RISK, are provided as class fields. By setting a
user-defined encode map, you can precisely specify the characters
that the encoder will encode using the hex encoding.
The default is ENCODE_MAP_LOW_RISK.
An encode map must be a byte array of at least 256 bytes. Each
of the first 256 bytes in the array must have a value of
SAFE, UNSAFE, or SPECIAL. If
a byte is set to UNSAFE, then the corresponding
character will be encoded using the hex encoding. If a byte is set
to SAFE, then the corresponding character will not be
encoded. The value SPECIAL may be used only for CR or
LF.
The following code example shows how to create and set an encode map that encodes all characters except CR, LF, the digits, and the letters:
import com.hunnysoft.jmime.QuotedPrintableEncoder;

QuotedPrintableEncoder encoder = new QuotedPrintableEncoder();
byte[] myEncodeMap = new byte[256];
// First, set all entries to UNSAFE.
// The constants are declared as int, so cast them to byte.
for (int i = 0; i < 256; ++i) {
    myEncodeMap[i] = (byte) QuotedPrintableEncoder.UNSAFE;
}
// Set LF (10) and CR (13) to SPECIAL
myEncodeMap[10] = (byte) QuotedPrintableEncoder.SPECIAL;
myEncodeMap[13] = (byte) QuotedPrintableEncoder.SPECIAL;
// Set all digits ('0' to '9') to SAFE
for (int i = 48; i < 58; ++i) {
    myEncodeMap[i] = (byte) QuotedPrintableEncoder.SAFE;
}
// Set upper-case letters ('A' to 'Z') to SAFE
for (int i = 65; i < 91; ++i) {
    myEncodeMap[i] = (byte) QuotedPrintableEncoder.SAFE;
}
// Set lower-case letters ('a' to 'z') to SAFE
for (int i = 97; i < 123; ++i) {
    myEncodeMap[i] = (byte) QuotedPrintableEncoder.SAFE;
}
encoder.setEncodeMap(myEncodeMap);
A user-defined encode map also makes it possible to encode arbitrary binary data. Normally, the quoted-printable encoding is not a good choice for binary data, because the CR LF sequence (or sometimes just LF) is treated as a hard line break, and most binary file formats (image or sound files, for instance) do not have line breaks. To avoid this, you can make the encoder encode CR and LF using the hex encoding: set a user-defined encode map in which the entries for CR (13) and LF (10) are set to UNSAFE.
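The following self-contained sketch shows how a 256-entry lookup table of this kind can drive hex encoding of binary data. It does not use the library: BinaryMapSketch is a hypothetical class, its SAFE and UNSAFE constants have made-up values standing in for QuotedPrintableEncoder.SAFE and QuotedPrintableEncoder.UNSAFE, and soft line breaks are left out for brevity.

```java
import java.util.Arrays;

final class BinaryMapSketch {
    // Hypothetical values; the real constants are defined by QuotedPrintableEncoder.
    static final byte SAFE = 0;
    static final byte UNSAFE = 1;

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    /** Builds a map in which CR and LF are UNSAFE, so they are hex-encoded like any other byte. */
    static byte[] binaryEncodeMap() {
        byte[] map = new byte[256];
        Arrays.fill(map, UNSAFE); // includes CR (13) and LF (10)
        for (int c = 33; c <= 126; ++c) {
            if (c != '=') {
                map[c] = SAFE; // printable ASCII other than '=' passes through
            }
        }
        return map;
    }

    /** Hex-encodes every byte whose map entry is not SAFE. */
    static String encode(byte[] data, byte[] map) {
        StringBuilder out = new StringBuilder();
        for (byte value : data) {
            int b = value & 0xFF; // index the map with the unsigned byte value
            if (map[b] == SAFE) {
                out.append((char) b);
            } else {
                out.append('=').append(HEX[b >> 4]).append(HEX[b & 0x0F]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] data = {13, 10, 'A', (byte) 0xFF};
        System.out.println(encode(data, binaryEncodeMap()));
    }
}
```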
Parameters:
map - the encode map to set

public void start()
If you use the low-level interface for multiple-buffer encoding,
you must call start to begin the encode operation. You
may use a QuotedPrintableEncoder instance for many
encode operations, but you must call start to begin
each operation.
For more information on using the low-level interface, see the
overview section for QuotedPrintableEncoder.
You do not need to call this method if you use the encode(ByteString) method for encoding.
public void encodeSegment(ByteBuffer inBuf,
ByteBuffer outBuf)
This method is an essential part of the low-level interface and
performs most of the work of encoding for the
QuotedPrintableEncoder class. It takes an input buffer
and an output buffer as parameters, and encodes data from the input
buffer until the input buffer is empty or the output buffer is full.
In other words, one of the following conditions is guaranteed to be
satisfied when the method returns:
inBuf.pos == inBuf.endPos (input buffer empty)
outBuf.pos == outBuf.endPos (output buffer full)

You may call the method multiple times to encode multiple buffers of input data. However, before you call the method, both of the following conditions should be true:

inBuf.pos < inBuf.endPos (input buffer data available)
outBuf.pos < outBuf.endPos (output buffer space available)

For more information on using the low-level interface, see the overview section for QuotedPrintableEncoder.

Parameters:
inBuf - input buffer
outBuf - output buffer

public void finish(ByteBuffer outBuf)
When you use the low-level interface, the encoder buffers some data internally. Therefore, after you have passed all input data to the encoder, you must call this method to flush the internal buffer.
The following condition must be satisfied when you call the method:
outBuf.pos < outBuf.endPos (output buffer space available)

The above condition must also be satisfied after the method returns in order to guarantee that all output data has been written to the output buffer. You may need to call finish more than once before the above condition is satisfied when the method returns.

For more information on using the low-level interface, see the overview section for QuotedPrintableEncoder.

Parameters:
outBuf - output buffer

public ByteString encode(ByteString decoded)
To perform quoted-printable encoding using this method, create a
ByteString containing the data you want to encode and
pass it as the method's argument. The returned
ByteString contains the encoded output.
This method makes it very simple to perform quoted-printable encoding. The disadvantage of this method is that it requires all the data to be kept in memory for processing. You may use the low-level interface, described in the overview section, to perform quoted-printable encoding of large data using limited memory.
This method uses the low-level interface internally. Any options set for the encoder object have the same effect using either this method or the low-level interface.
Parameters:
decoded - byte string containing the data to be encoded