Most applications represent text as characters from the Unicode character set. When an application exports text, for example when it sends an email or writes text to a file, it encodes the text into an external encoding, such as UTF-8, ISO 8859-5, or GB2312.
To understand the TextDecoder and TextEncoder classes, it helps to think of Unicode text as "decoded" (that is, "unencoded") text, and of text in an external encoding as "encoded" text. In this mental model, TextDecoder decodes text from an external encoding into Unicode, and TextEncoder encodes Unicode text into an external encoding. Note that UTF-8 is itself a perfectly valid external encoding.
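To make the two directions concrete, here is a minimal sketch that does not use MIME++ at all. It hand-codes the conversion for a single external encoding, ISO 8859-1, with UTF-8 as the Unicode representation. The function names `latin1ToUtf8` and `utf8ToLatin1` are hypothetical and exist only for this illustration, and the sketch assumes well-formed UTF-8 input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of MIME++): "decodes" ISO 8859-1 bytes into
// UTF-8. Every Latin-1 byte is the Unicode code point with the same value.
std::string latin1ToUtf8(const std::string& encoded) {
    std::string decoded;
    for (unsigned char octet : encoded) {
        if (octet < 0x80) {
            decoded.push_back(static_cast<char>(octet));  // ASCII: one byte
        } else {
            // U+0080..U+00FF: two bytes, 110xxxxx 10xxxxxx
            decoded.push_back(static_cast<char>(0xC0 | (octet >> 6)));
            decoded.push_back(static_cast<char>(0x80 | (octet & 0x3F)));
        }
    }
    return decoded;
}

// Hypothetical helper (not part of MIME++): "encodes" UTF-8 into ISO 8859-1.
// Code points above U+00FF have no Latin-1 form and are replaced with '?'.
std::string utf8ToLatin1(const std::string& decoded) {
    std::string encoded;
    const std::size_t size = decoded.size();
    for (std::size_t i = 0; i < size; ) {
        const unsigned char lead = static_cast<unsigned char>(decoded[i]);
        if (lead < 0x80) {
            encoded.push_back(static_cast<char>(lead));
            i += 1;
        } else if ((lead & 0xFE) == 0xC2 && i + 1 < size &&
                   (static_cast<unsigned char>(decoded[i + 1]) & 0xC0) == 0x80) {
            // Lead byte 0xC2 or 0xC3: code point in U+0080..U+00FF
            const unsigned char trail = static_cast<unsigned char>(decoded[i + 1]);
            encoded.push_back(static_cast<char>(((lead & 0x03) << 6) | (trail & 0x3F)));
            i += 2;
        } else {
            // Anything else: emit '?' and skip the rest of the sequence
            encoded.push_back('?');
            i += 1;
            while (i < size &&
                   (static_cast<unsigned char>(decoded[i]) & 0xC0) == 0x80) {
                ++i;
            }
        }
    }
    return encoded;
}

// Round trip for "café": 0xE9 in ISO 8859-1, 0xC3 0xA9 in UTF-8.
void roundTripExample() {
    const std::string external = "caf\xE9";
    const std::string unicode = latin1ToUtf8(external);   // decode
    assert(unicode == "caf\xC3\xA9");
    assert(utf8ToLatin1(unicode) == external);            // encode
}
```

In this picture, `latin1ToUtf8` plays the role of a decoder and `utf8ToLatin1` the role of an encoder; a real implementation would delegate to a conversion library rather than hard-code one character set.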
Windows and Unix (including Linux) differ in an important way in how programs represent text internally. Both represent text as Unicode characters, but Windows uses the UTF-16 encoding internally, while Unix programs typically use UTF-8. In addition, wchar_t is a 16-bit value on Windows and a 32-bit value on Unix. To maintain cross-platform portability, Hunny MIME++ uses UTF-8 as its standard internal Unicode text format, which is convenient for Unix programmers but less so for Windows programmers. The rationale for this decision is that UTF-8 is compatible with the standard C strings used in many applications.
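The compatibility with C strings can be illustrated with a short sketch. UTF-8 uses a zero byte only for U+0000, so NUL-terminated `char` arrays and functions such as `strlen` keep working; the one caveat is that they count bytes, not characters. The helper names below are hypothetical and not part of MIME++, and the code assumes well-formed UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Byte length of a NUL-terminated UTF-8 string. Plain strlen is enough,
// because no multi-byte UTF-8 sequence contains a zero byte.
std::size_t utf8ByteLength(const char* text) {
    return std::strlen(text);
}

// Number of Unicode code points: count every byte that is not a
// continuation byte (continuation bytes have the form 10xxxxxx).
std::size_t utf8CodePointCount(const char* text) {
    std::size_t count = 0;
    for (const unsigned char* p = reinterpret_cast<const unsigned char*>(text);
         *p != 0; ++p) {
        if ((*p & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// "naïve" holds five characters, but the ï (U+00EF) takes two bytes.
void cStringExample() {
    const char* word = "na\xC3\xAFve";
    assert(utf8ByteLength(word) == 6);
    assert(utf8CodePointCount(word) == 5);
}
```

Buffer sizes for UTF-8 text should therefore always be computed in bytes, which matches the byte-based lengths used by the functions documented below.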
TextEncoder and TextDecoder depend on external libraries or components to perform the encoding and decoding. Developers may choose an implementation at application start-up time. The implementation options include the following:
Public Member Functions

| Type | Member | Description |
| --- | --- | --- |
| virtual | ~TextDecoder () | Destructor. |
| virtual int | getCharsLen (const char *enc, int encLen)=0 | Gets the length of the decoded text (number of bytes). |
| virtual int | getChars (const char *enc, int encLen, char *dec, int decMaxLen)=0 | Gets the decoded text. |

Static Public Member Functions

| Type | Member | Description |
| --- | --- | --- |
| int | initialize () | Initializes the class. |
| void | finalize () | Finalizes the class. |
| TextDecoder * | create (const char *charset) | Creates a TextDecoder instance. |
virtual TextDecoder::~TextDecoder ()

Destructor.
static TextDecoder * TextDecoder::create (const char *charset)

Creates a TextDecoder instance.
static void TextDecoder::finalize ()

Finalizes the class. You should not call this function directly. Instead, you may call mimepp::Finalize() just before your application exits, which calls TextDecoder::finalize() for you. Unless your application checks for memory leaks, however, calling mimepp::Finalize() is not necessary.
virtual int TextDecoder::getChars (const char *enc, int encLen, char *dec, int decMaxLen)=0

Gets the decoded text. This function decodes the encoded text in enc and writes the decoded (UTF-8 Unicode) text to the array that dec points to. You may call getCharsLen() first to get the required array size for the decoded text.
virtual int TextDecoder::getCharsLen (const char *enc, int encLen)=0

Gets the length of the decoded text (number of bytes). Call this function first to find the size of the array required to hold the decoded text.
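The two functions are meant to be used together: ask for the length, allocate, then decode. The sketch below shows that pattern without depending on MIME++. `DecoderLike` is a stand-in that only mirrors the two documented signatures, and `Latin1Decoder` and `decodeToString` are hypothetical names. The documentation above does not say what getChars() returns or whether the output is NUL-terminated, so the sketch assumes it returns the number of bytes written (or -1 if the buffer is too small) and writes no terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for illustration only; NOT the real MIME++ class. It mirrors the
// two pure virtual functions documented above so the sketch compiles alone.
class DecoderLike {
public:
    virtual ~DecoderLike() {}
    virtual int getCharsLen(const char* enc, int encLen) = 0;
    virtual int getChars(const char* enc, int encLen, char* dec, int decMaxLen) = 0;
};

// Hypothetical ISO 8859-1 to UTF-8 decoder used to exercise the pattern.
class Latin1Decoder : public DecoderLike {
public:
    // Number of UTF-8 bytes needed: 1 for ASCII, 2 for bytes 0x80..0xFF.
    int getCharsLen(const char* enc, int encLen) override {
        int len = 0;
        for (int i = 0; i < encLen; ++i) {
            len += (static_cast<unsigned char>(enc[i]) < 0x80) ? 1 : 2;
        }
        return len;
    }

    // ASSUMPTION: returns bytes written, or -1 if dec is too small.
    int getChars(const char* enc, int encLen, char* dec, int decMaxLen) override {
        if (getCharsLen(enc, encLen) > decMaxLen) {
            return -1;
        }
        int pos = 0;
        for (int i = 0; i < encLen; ++i) {
            const unsigned char octet = static_cast<unsigned char>(enc[i]);
            if (octet < 0x80) {
                dec[pos++] = static_cast<char>(octet);
            } else {
                dec[pos++] = static_cast<char>(0xC0 | (octet >> 6));
                dec[pos++] = static_cast<char>(0x80 | (octet & 0x3F));
            }
        }
        return pos;
    }
};

// The two-call pattern: size the buffer with getCharsLen(), then decode
// into it with getChars(). Returns an empty string on failure.
std::string decodeToString(DecoderLike& decoder, const std::string& encoded) {
    const int encLen = static_cast<int>(encoded.size());
    const int needed = decoder.getCharsLen(encoded.data(), encLen);
    if (needed <= 0) {
        return std::string();
    }
    std::vector<char> buffer(static_cast<std::size_t>(needed));
    const int written = decoder.getChars(encoded.data(), encLen, buffer.data(), needed);
    if (written < 0) {
        return std::string();
    }
    return std::string(buffer.data(), static_cast<std::size_t>(written));
}
```

With MIME++ itself, the decoder object would come from TextDecoder::create() rather than from a hand-written class. Ownership of the returned pointer and the error conventions of the real functions are not described in this section, so check them before adapting the sketch.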
static int TextDecoder::initialize ()

Initializes the class. You should not call this function directly. Your application should call the mimepp::Initialize() function at start-up time, which calls TextDecoder::initialize() for you.
Copyright © 2001-2007 Hunny Software, Inc. All rights reserved.