Concepts
Before using the MimeParser module, it is important to understand a few basic concepts behind the operation of the MIME parser.
First, the MIME parser is stream-based: a MIME message is presented to the parser one buffer at a time. This allows a program to parse very large messages using only a small amount of memory. For example, you may choose a buffer size of 8192 bytes and present every message to the parser 8192 bytes at a time. If a message is 1,000,000 bytes long, you would call the parser's parsing function 122 times with a full 8192-byte buffer and one final time with a partial buffer of 576 bytes (1,000,000 = 122 * 8192 + 576). The parser itself uses very little memory: in addition to the 8192-byte buffer, it needs only a few thousand bytes.
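To check the arithmetic in this example, the following sketch computes how many full buffers, and what final partial buffer, a message of a given size produces. The helper name plan_chunks is hypothetical; it is not part of MimeParser.

```c
#include <stddef.h>

/* Hypothetical helper: splits a message of total_len bytes into
   buf_size-byte buffers. *full_calls receives the number of calls made
   with a completely filled buffer; *last_len receives the length of the
   final partial buffer (0 if the message divides evenly). */
static void plan_chunks(size_t total_len, size_t buf_size,
                        size_t *full_calls, size_t *last_len)
{
    *full_calls = total_len / buf_size;  /* completely filled buffers */
    *last_len = total_len % buf_size;    /* bytes left for the last call */
}
```

For the message above, plan_chunks(1000000, 8192, &full, &last) sets full to 122 and last to 576.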
Second, the MIME parser is event-based: as it parses the buffers, it reports significant events to your program. It is your program's responsibility to track these events and update its state accordingly, so you must understand the events to use the parser effectively. Examples of events include Begin Message, Begin Headers, End Headers, Begin Body Part, End Body Part, and End Message. See the reference documentation below for a complete listing of the events that are reported. Experienced XML programmers will recognize the event-based parser interface as being similar to the SAX interface used for parsing XML documents.
To handle the events, you define your event handler functions, create an instance of the MimeParserVtable structure, and assign pointers to your functions to the members of the structure. The event handler functions are callback functions; they are documented in MimeParserVtable.h. Before you begin parsing, you assign the vtable structure to the MimeParser object. It is not necessary to allocate a MimeParserVtable structure dynamically: you can define the structure and initialize it statically. See the example code in simple.c.
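The callback-table pattern can be sketched as follows. The actual member names and signatures of MimeParserVtable are defined in MimeParserVtable.h and are not reproduced on this page, so the sketch uses a hypothetical stand-in, DemoVtable, with made-up beginBodyPart, endBodyPart, and bytes members. It shows a statically initialized table of function pointers and a client-data pointer that each callback converts back to the application's own state, here used to track body-part nesting as events arrive.

```c
#include <stddef.h>

/* Hypothetical stand-in for MimeParserVtable: the real member names and
   signatures are defined in MimeParserVtable.h. */
typedef struct {
    void (*beginBodyPart)(void *data);
    void (*endBodyPart)(void *data);
    void (*bytes)(void *data, const char *buf, size_t len);
} DemoVtable;

/* Application state that the callbacks update as events arrive. */
typedef struct {
    int depth;          /* current body-part nesting level */
    int max_depth;      /* deepest nesting seen so far */
    size_t body_bytes;  /* total body bytes delivered */
} AppState;

static void on_begin_body_part(void *data)
{
    AppState *st = data;  /* client data handed back to the callback */
    if (++st->depth > st->max_depth)
        st->max_depth = st->depth;
}

static void on_end_body_part(void *data)
{
    AppState *st = data;
    st->depth--;
}

static void on_bytes(void *data, const char *buf, size_t len)
{
    AppState *st = data;
    (void)buf;            /* a real handler would save or decode buf */
    st->body_bytes += len;
}

/* Statically initialized table: no dynamic allocation is needed. */
static DemoVtable app_vtable = {
    .beginBodyPart = on_begin_body_part,
    .endBodyPart   = on_end_body_part,
    .bytes         = on_bytes,
};
```

With the real library, you would fill in a MimeParserVtable in the same way and pass it, together with a pointer to your state, to MimeParser_setVtable(parser, &vtable, &state).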
Using MimeParser
To use the MIME parser, follow these steps:
1. Call MimeParser_create() to create an instance of the parser.
2. Call MimeParser_setVtable() to install your callback functions and client data.
3. Call MimeParser_start() to start a parse operation. When you call this function, the parser calls some of the callback functions that you installed in Step 2.
4. Call MimeParser_parseBuffer() as many times as necessary to present the entire MIME message to the parser. When you call this function, the parser calls some of the callback functions that you installed in Step 2.
5. Call MimeParser_finish() to finish the parse operation. When you call this function, the parser calls some of the callback functions that you installed in Step 2.
You may repeat Steps 3 through 5 many times. When you have finished using the parser, call MimeParser_destroy() to destroy the parser object and to free its memory.
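The steps above can be sketched as follows. To keep the sketch self-contained, it begins with stand-in stubs that use the signatures from the Functions table but merely count calls; in a real program you would delete them, include the library's header, and link against the library. parse_message and parse_all are hypothetical helpers: the first performs Steps 3 through 5 on one in-memory message, and the second reuses a single parser for several messages. The null-pointer check after MimeParser_create() is an assumption about how allocation failure is reported.

```c
#include <stddef.h>

/* ---- Stand-in stubs (illustration only) ----
   They use the documented signatures but only count calls. In a real
   program, delete this section, include the library's header, and link
   against the MimeParser library. */
typedef struct MimeParserVtable MimeParserVtable; /* left incomplete here */
typedef struct {
    int starts, buffers, finishes;
    size_t bytes;
} MimeParser;

static MimeParser stub_parser;
static MimeParser *MimeParser_create(void) { return &stub_parser; }
static void MimeParser_setVtable(MimeParser *p, MimeParserVtable *vt, void *data)
{
    (void)p; (void)vt; (void)data;
}
static void MimeParser_start(MimeParser *p) { p->starts++; }
static void MimeParser_parseBuffer(MimeParser *p, const char *buf, size_t len)
{
    (void)buf;
    p->buffers++;
    p->bytes += len;
}
static void MimeParser_finish(MimeParser *p) { p->finishes++; }
static void MimeParser_destroy(MimeParser *p) { (void)p; }
/* ---- End of stubs ---- */

/* Steps 3-5 for one in-memory message: start, present the data
   buf_size bytes at a time, then finish. */
static void parse_message(MimeParser *parser, const char *msg, size_t len,
                          size_t buf_size)
{
    size_t off = 0;

    MimeParser_start(parser);                          /* Step 3 */
    while (off < len) {                                /* Step 4 */
        size_t n = len - off < buf_size ? len - off : buf_size;
        MimeParser_parseBuffer(parser, msg + off, n);
        off += n;
    }
    MimeParser_finish(parser);                         /* Step 5 */
}

/* One parser, created and configured once, reused for every message. */
static int parse_all(const char *const *msgs, const size_t *lens, size_t count,
                     MimeParserVtable *vt, void *data)
{
    MimeParser *parser = MimeParser_create();          /* Step 1 */
    if (parser == NULL)  /* assumption: null pointer signals failure */
        return -1;
    MimeParser_setVtable(parser, vt, data);            /* Step 2 */
    for (size_t i = 0; i < count; i++)
        parse_message(parser, msgs[i], lens[i], 8192); /* Steps 3-5, repeated */
    MimeParser_destroy(parser);                        /* free the parser */
    return 0;
}
```

Reusing one parser across the loop follows the advice under Tips about avoiding repeated creation and destruction.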
All programs that use the MIME parser must follow the steps listed above. To accomplish your programming goals, however, you must create callback functions specific to your application. The callback functions are described in the MimeParserCallback.h reference page.
Tips
There is some overhead in saving the parser state when MimeParser_parseBuffer() returns and restoring it when MimeParser_parseBuffer() is called again. This overhead is negligible when buffers are large. A good buffer size for file I/O is 8192 bytes.
Be careful that the buffer you use is not too large. You will get much better performance if the data you pass to the parser is in the processor's secondary cache. If you use a buffer that is too large, you will exceed the capacity of the secondary cache when you fill the buffer. The processor will then have to wait for the data to be retrieved from main memory, slowing performance.
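A minimal sketch of reading a file in fixed 8192-byte chunks, as these tips suggest. The callback type ChunkSink and the example sink count_chunks are hypothetical; in a real program the sink would forward each chunk to MimeParser_parseBuffer(). The same moderately sized buffer is reused for every read, so memory use stays fixed regardless of message size.

```c
#include <stddef.h>
#include <stdio.h>

#define IO_BUF_SIZE 8192  /* buffer size suggested in the tips above */

/* Hypothetical callback type: receives each chunk read from the file.
   In a real program it would call MimeParser_parseBuffer(). */
typedef void (*ChunkSink)(void *ctx, const char *buf, size_t len);

/* Reads fp to end of file in IO_BUF_SIZE chunks and hands each chunk to
   sink. The same stack buffer is reused for every read. Returns 0 on
   success, -1 on a read error. */
static int feed_file(FILE *fp, ChunkSink sink, void *ctx)
{
    char buf[IO_BUF_SIZE];
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        sink(ctx, buf, n);
    return ferror(fp) ? -1 : 0;
}

/* Example sink that only records statistics about the chunks it receives. */
typedef struct {
    size_t calls, total, last_len;
} ChunkStats;

static void count_chunks(void *ctx, const char *buf, size_t len)
{
    ChunkStats *s = ctx;
    (void)buf;
    s->calls++;
    s->total += len;
    s->last_len = len;
}
```

For a 20,000-byte regular file, feed_file() delivers chunks of 8192, 8192, and 3616 bytes.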
Because of the way the parser operates, it is inefficient to present the message data to the parser one line at a time. The parser can't send the line-terminating CRLF to the Bytes() callback function until it knows that the CRLF is not part of a multipart boundary that follows the CRLF. Therefore, when you present the message data one line at a time, the parser makes two calls to the Bytes() callback function for every line in the body of a body part.
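The effect can be illustrated with a toy model. This is not the parser's implementation: it assumes the body contains no boundary, looks only at a CRLF that ends an input buffer, and counts simulated Bytes() calls. ToyBody and toy_feed are hypothetical names.

```c
#include <stddef.h>

/* Toy model, NOT the library's implementation. It assumes the body holds
   no boundary and inspects only a CRLF that ends an input buffer. */
typedef struct {
    int pending_crlf;    /* 1 while a trailing CRLF is being held back */
    size_t bytes_calls;  /* number of simulated Bytes() callbacks */
} ToyBody;

/* Simulates presenting one buffer of body data. */
static void toy_feed(ToyBody *t, const char *buf, size_t len)
{
    if (len == 0)
        return;
    if (t->pending_crlf) {
        /* More data arrived, so (in this model) the held CRLF was not
           followed by a boundary: release it in a call of its own. */
        t->bytes_calls++;
        t->pending_crlf = 0;
    }
    if (len >= 2 && buf[len - 2] == '\r' && buf[len - 1] == '\n') {
        /* A boundary might start in the next buffer, so hold the CRLF back. */
        len -= 2;
        t->pending_crlf = 1;
    }
    if (len > 0)
        t->bytes_calls++;  /* everything before the held CRLF goes in one call */
}
```

Feeding "one\r\n", "two\r\n", and "three\r\n" one line at a time produces five calls (one for the first line, then two for each later line: the held-back CRLF and the line's text), whereas presenting the same 17 bytes as one buffer produces a single call, with the final CRLF still held back.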
For best performance, create an instance of the parser and re-use that instance for multiple parse operations. Not only does this avoid the overhead of creating and destroying a parser object -- which can be significant if many small messages must be parsed -- but it also allows the parser object to use its pool of cached objects (these are objects that are used internally by the parser).
Functions
| Return type | Function | Description |
| void | MimeParser_setMinimumFieldBodyBufferSize (size_t n) | Sets the minimum buffer size for saving a partial header field body. |
| MimeParser * | MimeParser_create () | Creates and returns a new, initialized MimeParser object. |
| void | MimeParser_setVtable (MimeParser *parser, MimeParserVtable *vtable, void *data) | Installs the vtable of callback functions into a MimeParser object. |
| void | MimeParser_destroy (MimeParser *parser) | Destroys a MimeParser object. |
| void | MimeParser_start (MimeParser *parser) | Starts the parsing of a MIME message. |
| void | MimeParser_parseBuffer (MimeParser *parser, const char *buffer, size_t length) | Continues the parsing of a MIME message. |
| void | MimeParser_finish (MimeParser *parser) | Finishes the parsing of a message. |
| size_t | MimeParser_bytePos (MimeParser *parser) | Gets the current byte position in the stream. |
size_t MimeParser_bytePos (MimeParser *parser)
Gets the current byte position in the stream. The parser tracks the byte position in the stream, and you may call this function to get the current position. Currently, this function returns an accurate byte position only when it is called from the Event callback function. Note: The byte position is relative to the beginning of the stream. If the stream is presented to the parser in multiple buffers, the byte position might not be a position in the current buffer. For example, if buffer A contains the bytes at positions 0-8191 and buffer B contains the bytes at positions 8192-16383, a byte position reported while the parser processes buffer B may be 8190, which refers to a position in buffer A.
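Because the reported position is relative to the start of the stream, a program that wants to locate it in the current buffer must know where that buffer starts. A minimal sketch, using the example above; stream_pos_to_offset is a hypothetical helper, and the caller is assumed to maintain buf_start by summing the lengths of the buffers it has already passed to MimeParser_parseBuffer().

```c
#include <stddef.h>

/* Hypothetical helper: maps an absolute stream position (for example a
   value from MimeParser_bytePos()) to an offset in the current buffer.
   buf_start is the stream position of the buffer's first byte. Returns 1
   and stores the offset if pos lies in this buffer; returns 0 if pos lies
   in an earlier or later buffer. */
static int stream_pos_to_offset(size_t pos, size_t buf_start, size_t buf_len,
                                size_t *offset)
{
    if (pos < buf_start || pos - buf_start >= buf_len)
        return 0;
    *offset = pos - buf_start;
    return 1;
}
```

With buffer B starting at 8192 and holding 8192 bytes, position 8190 is reported as outside B (it is in buffer A), while position 9000 maps to offset 808 in B.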
MimeParser * MimeParser_create ()
Creates and returns a new, initialized MimeParser object.
The function allocates memory for the object, which you must eventually free by calling
If there is an error allocating memory, the function returns a
void MimeParser_destroy (MimeParser *parser)
Destroys a MimeParser object. The function frees all memory. Failure to call this function when you have finished using a parser object causes a memory leak.
After
void MimeParser_finish (MimeParser *parser)
Finishes the parsing of a message.
You must call this function after your last call to
While executing
void MimeParser_parseBuffer (MimeParser *parser, const char *buffer, size_t length)
Continues the parsing of a MIME message.
While executing
void MimeParser_setMinimumFieldBodyBufferSize (size_t n)
Sets the minimum buffer size for saving a partial header field body. When MimeParser_parseBuffer() reaches the end of a buffer while it is parsing header fields, either in the message or in a body part, it might need to copy a partial header field body to a buffer it maintains. MimeParser creates and resizes this buffer dynamically, as needed; this parameter sets the buffer's minimum initial size. The default value is 8192. The parameter is a global value: if you want a value other than the default, set it once when your application starts.
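The policy described here, a dynamically resized buffer whose first allocation is at least a global minimum, can be sketched generically as below. This is an illustration under assumptions, not MimeParser's internal code: min_field_buf_size, FieldBuf, and fieldbuf_append are hypothetical names, and the capacity-doubling strategy is an assumption. With the real library you would only call MimeParser_setMinimumFieldBodyBufferSize() once when your application starts.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical global minimum, meant to be set once at program start. */
static size_t min_field_buf_size = 8192;

/* Hypothetical growable buffer holding a partial header field body. */
typedef struct {
    char *data;
    size_t len;  /* bytes currently stored */
    size_t cap;  /* bytes allocated */
} FieldBuf;

/* Appends n bytes from src. The first allocation is at least
   min_field_buf_size bytes; afterwards the capacity doubles until the data
   fits. Returns 0 on success, -1 if memory cannot be allocated. */
static int fieldbuf_append(FieldBuf *fb, const char *src, size_t n)
{
    size_t need = fb->len + n;

    if (need > fb->cap) {
        size_t cap = fb->cap ? fb->cap
                             : (min_field_buf_size ? min_field_buf_size : 1);
        while (cap < need)
            cap *= 2;  /* overflow is ignored in this sketch */
        char *p = realloc(fb->data, cap);
        if (p == NULL)
            return -1;
        fb->data = p;
        fb->cap = cap;
    }
    memcpy(fb->data + fb->len, src, n);
    fb->len = need;
    return 0;
}
```

With the default minimum of 8192, appending 11 bytes allocates 8192 bytes, and appending 9000 more grows the buffer to 16,384 bytes.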
void MimeParser_setVtable (MimeParser *parser, MimeParserVtable *vtable, void *data)
Installs the vtable of callback functions into a MIME parser object.
See the MimeParserVtable.h reference page for detailed descriptions of the callback functions.
Note: The library code does not free the memory pointed to by the
void MimeParser_start (MimeParser *parser)
Starts the parsing of a MIME message.
You must call this function before you call
While executing
Copyright © 2001-2006 Hunny Software, Inc. All rights reserved.