
MimeParser.h File Reference


Detailed Description

The MimeParser module provides an object that can parse MIME messages or documents, including complex nested multipart documents. The parser is highly optimized for speed. It is, perhaps, the fastest MIME parser available.

Concepts

Before using the MimeParser module, it is important to understand a few basic concepts behind the operation of the MIME parser.

First, the MIME parser is stream-based. This means that a MIME message is presented to the parser one buffer at a time. This allows a program to parse very large messages using only a very small amount of memory. For example, you may choose to use a buffer size of 8192 bytes, and present all messages to the parser 8192 bytes at a time. Thus, if a message were very large, say 1,000,000 bytes, you could call the parser's parsing function 122 times with a full 8192-byte buffer and call it one final time with a partial buffer of 576 bytes (1,000,000 = 122 * 8192 + 576). The amount of memory used by the parser is very small: in addition to the 8192-byte buffer, only a few thousand additional bytes of memory are required.
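The buffer arithmetic in the example above can be checked with a few lines of C. This is only a sketch: splitIntoBuffers() is a hypothetical helper written for this illustration and is not part of the MimeParser API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of MimeParser): given a message size and a
   buffer size, compute how many full buffers are presented to the parser
   and how many bytes are left over for the final, partial buffer. */
void splitIntoBuffers(size_t messageSize, size_t bufferSize,
                      size_t *fullBuffers, size_t *tailBytes)
{
    assert(bufferSize > 0);
    *fullBuffers = messageSize / bufferSize;
    *tailBytes = messageSize % bufferSize;
}
```

For a 1,000,000-byte message and an 8192-byte buffer, the helper reports 122 full buffers and a 576-byte tail, matching the figures in the text.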

Second, the MIME parser is event-based. This means that the parser will report to your program various events that are considered significant as it parses the buffers. It is your program's responsibility to track these events and update its state based on them. You must understand these events to use the parser effectively. Examples of events include: Begin Message, Begin Headers, Begin Body Part, End Body Part, End Headers, End Message, and so on. See the reference documentation below for a complete listing of the events that are reported. Experienced XML programmers will recognize the event-based parser interface as being similar to the SAX interface used for parsing XML documents.

To handle the events, you must define your event handler functions, create an instance of the MimeParserVtable structure, and assign pointers to your functions to the members of the structure. The event handler functions are callback functions. They are documented on the MimeParserVtable.h reference page. Before you begin parsing, you assign the vtable structure to the MimeParser object. Note that it is not necessary to allocate a MimeParserVtable structure dynamically: you can define the structure and initialize it statically. See the example code in simple.c.
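The general pattern of a statically initialized table of callbacks plus an opaque client data pointer might look like the sketch below. DemoVtable, DemoCounts, and every member and handler name here are hypothetical stand-ins invented for illustration; the real MimeParserVtable members and callback signatures are defined in MimeParserVtable.h and may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for MimeParserVtable. The real structure and its
   member names are defined in MimeParserVtable.h and may differ. */
typedef struct DemoVtable {
    void (*beginMessage)(void *clientData);
    void (*bytes)(void *clientData, const char *data, size_t length);
    void (*endMessage)(void *clientData);
} DemoVtable;

/* Application state, reached through the opaque client data pointer. */
typedef struct DemoCounts {
    int messages;
    size_t bodyBytes;
} DemoCounts;

void onBeginMessage(void *clientData)
{
    assert(clientData != NULL);
    ((DemoCounts *)clientData)->messages++;
}

void onBytes(void *clientData, const char *data, size_t length)
{
    assert(clientData != NULL);
    (void)data; /* this handler only counts bytes */
    ((DemoCounts *)clientData)->bodyBytes += length;
}

void onEndMessage(void *clientData)
{
    (void)clientData; /* nothing to do in this sketch */
}

/* Statically initialized: no dynamic allocation is needed, and one table
   can be shared by any number of parser instances. */
DemoVtable demoVtable = { onBeginMessage, onBytes, onEndMessage };
```

Because the handlers receive their state through the client data pointer, the table itself holds no per-message data and can safely be a single static object.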

Using MimeParser

To use the MIME parser, follow these steps:

  1. Call MimeParser_create() to create an instance of the parser.

  2. Call MimeParser_setVtable() to install your callback functions and client data.

  3. Call MimeParser_start() to start a parse operation. When you call this function, the parser calls some of the callback functions that you installed in Step 2.

  4. Call MimeParser_parseBuffer() as many times as necessary to present the entire MIME message to the parser. When you call this function, the parser calls some of the callback functions that you installed in Step 2.

  5. Call MimeParser_finish() to finish the parse operation. When you call this function, the parser calls some of the callback functions that you installed in Step 2.

  6. You may repeat Steps 3 through 5 many times. When you have finished using the parser, call MimeParser_destroy() to destroy the parser object and to free its memory.

All programs that use the MIME parser must follow the steps listed above. To accomplish your programming goals, however, you must create callback functions specific to your application. The callback functions are described in the MimeParserVtable.h reference page.
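The six steps above can be sketched as a single function. The prototypes follow the ones listed on this page. So that the sketch compiles without the library, it supplies stand-in definitions that only count calls; these stand-ins, the demo counters, and parseMessages() itself are assumptions made for illustration. In a real program you would delete the stand-in section and include MimeParser.h instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* --- Stand-ins so the sketch compiles without the library. ---
   Replace this whole section with #include "MimeParser.h" in real code.
   The bodies below do no parsing; they only count calls. */
typedef struct MimeParser { int unused; } MimeParser;
typedef struct MimeParserVtable { int unused; } MimeParserVtable;
int demoStarts, demoBuffers, demoFinishes, demoDestroys;

MimeParser *MimeParser_create(void) { return calloc(1, sizeof(MimeParser)); }
void MimeParser_setVtable(MimeParser *p, MimeParserVtable *v, void *d)
{ (void)p; (void)v; (void)d; }
void MimeParser_start(MimeParser *p) { (void)p; demoStarts++; }
void MimeParser_parseBuffer(MimeParser *p, const char *b, size_t n)
{ (void)p; (void)b; (void)n; demoBuffers++; }
void MimeParser_finish(MimeParser *p) { (void)p; demoFinishes++; }
void MimeParser_destroy(MimeParser *p) { free(p); demoDestroys++; }
/* --- End of stand-ins. --- */

/* Parse `count` in-memory messages with one parser instance, presenting
   each message in chunks of at most chunkSize bytes.
   Returns 0 on success, -1 if the parser could not be created. */
int parseMessages(const char *const *messages, const size_t *lengths,
                  size_t count, size_t chunkSize,
                  MimeParserVtable *vtable, void *clientData)
{
    assert(chunkSize > 0);
    MimeParser *parser = MimeParser_create();               /* Step 1 */
    if (parser == NULL)
        return -1;
    MimeParser_setVtable(parser, vtable, clientData);       /* Step 2 */
    for (size_t i = 0; i < count; i++) {
        MimeParser_start(parser);                           /* Step 3 */
        for (size_t off = 0; off < lengths[i]; off += chunkSize) {
            size_t n = lengths[i] - off;
            if (n > chunkSize)
                n = chunkSize;
            MimeParser_parseBuffer(parser, messages[i] + off, n); /* Step 4 */
        }
        MimeParser_finish(parser);                          /* Step 5 */
    }
    MimeParser_destroy(parser);                             /* Step 6 */
    return 0;
}
```

Note that the parser is created and destroyed once, while Steps 3 through 5 repeat for each message.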

Tips

There is a certain amount of overhead associated with saving the parser state when leaving MimeParser_parseBuffer() and restoring the parser state when entering MimeParser_parseBuffer(). This overhead is negligible when buffer sizes are large. A good buffer size for file I/O is 8192 bytes.

Be careful that the buffer you use is not too large. You will get much better performance if the data you pass to the parser is in the processor's secondary cache. If you use a buffer that is too large, you will exceed the capacity of the secondary cache when you fill the buffer. The processor will then have to wait for the data to be retrieved from main memory, slowing performance.

Because of the way the parser operates, it is inefficient to present the message data to the parser one line at a time. The parser can't send the line-terminating CRLF to the Bytes() callback function until it knows that the CRLF is not part of a multipart boundary that follows the CRLF. Therefore, when you present the message data one line at a time, the parser makes two calls to the Bytes() callback function for every line in the body of a body part.
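If your input naturally arrives one line at a time, one possible workaround is to collect lines into a larger buffer and hand the parser a full buffer at once. The sketch below shows that idea. LineBatcher, BatchSink, SinkStats, and countingSink are hypothetical names invented for this example; in a real program the sink would wrap a call to MimeParser_parseBuffer().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { BATCH_CAPACITY = 8192 };

/* Hypothetical sink: in a real program this would forward to
   MimeParser_parseBuffer(parser, data, length). */
typedef void (*BatchSink)(void *context, const char *data, size_t length);

typedef struct LineBatcher {
    char data[BATCH_CAPACITY];
    size_t used;
    BatchSink sink;
    void *context;
} LineBatcher;

void LineBatcher_init(LineBatcher *b, BatchSink sink, void *context)
{
    b->used = 0;
    b->sink = sink;
    b->context = context;
}

/* Pass any buffered bytes to the sink. Call once more after the last line. */
void LineBatcher_flush(LineBatcher *b)
{
    if (b->used > 0) {
        b->sink(b->context, b->data, b->used);
        b->used = 0;
    }
}

/* Append one line, including its CRLF; flush only when the batch is full. */
void LineBatcher_add(LineBatcher *b, const char *line, size_t length)
{
    assert(b->sink != NULL);
    while (length > 0) {
        size_t room = BATCH_CAPACITY - b->used;
        size_t n = length < room ? length : room;
        memcpy(b->data + b->used, line, n);
        b->used += n;
        line += n;
        length -= n;
        if (b->used == BATCH_CAPACITY)
            LineBatcher_flush(b);
    }
}

/* Example sink that only counts calls and bytes. */
typedef struct SinkStats { int calls; size_t bytes; } SinkStats;

void countingSink(void *context, const char *data, size_t length)
{
    SinkStats *stats = context;
    (void)data;
    stats->calls++;
    stats->bytes += length;
}
```

With this arrangement, 1000 lines of 12 bytes each reach the sink in two calls rather than 1000.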

For best performance, create an instance of the parser and re-use that instance for multiple parse operations. Not only does this avoid the overhead of creating and destroying a parser object -- which can be significant if many small messages must be parsed -- but it also allows the parser object to use its pool of cached objects (these are objects that are used internally by the parser).


Functions

void MimeParser_setMinimumFieldBodyBufferSize (size_t n)
 Sets the minimum buffer size for saving a partial header field body.
MimeParser * MimeParser_create ()
 Creates and returns a new, initialized MimeParser object.
void MimeParser_setVtable (MimeParser *parser, MimeParserVtable *vtable, void *clientData)
 Installs the vtable of callback functions into a MimeParser object.
void MimeParser_destroy (MimeParser *parser)
 Destroys a MimeParser object.
void MimeParser_start (MimeParser *parser)
 Starts the parsing of a MIME message.
void MimeParser_parseBuffer (MimeParser *parser, const char *buffer, size_t length)
 Continues the parsing of a MIME message.
void MimeParser_finish (MimeParser *parser)
 Finishes the parsing of a message.
size_t MimeParser_bytePos (MimeParser *parser)
 Gets the current byte position in the stream.


Function Documentation

size_t MimeParser_bytePos (MimeParser *parser)

Gets the current byte position in the stream.

The parser tracks the byte position in the stream. You may call this function to get the current byte position.

Currently, this function returns the accurate byte position only when it's called from the Event callback function.

Note: The byte position is relative to the beginning of the stream. If the stream is presented to the parser in multiple buffers, the byte position might not be a position in the current buffer. For example, if buffer A contains the bytes at positions 0-8191, and buffer B contains bytes at positions 8192-16383, a byte position reported while the parser processes buffer B may be 8190, which refers to a position in buffer A.

Parameters:
parser the parser object
Returns:
current byte position in the stream
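Because the reported position is relative to the stream, an application that wants an offset into the current buffer has to do the translation itself. The sketch below shows one way to do that. offsetInCurrentBuffer() is a hypothetical helper, and it assumes the caller tracks the stream position of the first byte of the current buffer (for example, by summing the lengths passed to earlier MimeParser_parseBuffer() calls).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of MimeParser).
   streamPos:    position reported by MimeParser_bytePos()
   bufferStart:  stream position of the first byte of the current buffer,
                 tracked by the caller
   bufferLength: number of bytes in the current buffer
   Returns 1 and stores the offset if streamPos lies inside the current
   buffer; returns 0 if it refers to an earlier or later buffer. */
int offsetInCurrentBuffer(size_t streamPos, size_t bufferStart,
                          size_t bufferLength, size_t *offset)
{
    assert(offset != NULL);
    if (streamPos < bufferStart || streamPos - bufferStart >= bufferLength)
        return 0;
    *offset = streamPos - bufferStart;
    return 1;
}
```

Using the figures from the note, position 8190 with buffer B starting at 8192 is reported as outside the current buffer, while position 8200 maps to offset 8 in buffer B.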

MimeParser * MimeParser_create ()

Creates and returns a new, initialized MimeParser object.

The function allocates memory for the object, which you must eventually free by calling MimeParser_destroy().

If there is an error allocating memory, the function returns a NULL pointer.

Returns:
pointer to an initialized parser object if successful; NULL pointer if unsuccessful

void MimeParser_destroy (MimeParser *parser)

Destroys a MimeParser object.

The function frees all memory. Failure to call this function when you have finished using a parser object causes a memory leak.

After MimeParser_destroy() returns, the parser pointer should be considered invalid.

Parameters:
parser the parser object

void MimeParser_finish (MimeParser *parser)

Finishes the parsing of a message.

You must call this function after your last call to MimeParser_parseBuffer() in order to finish any pending processing. Before you use the parser to parse another message, you must call MimeParser_start() again after you call MimeParser_finish().

While executing MimeParser_finish(), the parser calls the callback functions in the installed vtable.

Parameters:
parser the parser object

void MimeParser_parseBuffer (MimeParser *parser, const char *buffer, size_t length)

Continues the parsing of a MIME message.

MimeParser_parseBuffer() presents a single buffer of byte data to the parser for processing. You may call this function multiple times after you call MimeParser_start() and before you call MimeParser_finish().

buffer is a pointer to the byte data to be processed by the parser. length is the number of bytes in buffer to be processed.

While executing MimeParser_parseBuffer(), the parser calls the callback functions in the installed vtable.

Parameters:
parser the parser object
buffer pointer to a char array containing bytes to parse
length number of bytes in the buffer to parse

void MimeParser_setMinimumFieldBodyBufferSize (size_t n)

Sets the minimum buffer size for saving a partial header field body.

When MimeParser_parseBuffer() finishes a buffer while it's in the process of parsing header fields, either in the message or a body part, it might need to copy a partial header field body to a buffer it maintains. MimeParser creates and resizes this buffer dynamically, as needed. This parameter determines the minimum initial buffer size.

The default value for this parameter is 8192.

This parameter is a global value. If you want a value other than the default value, then you should set it once when your application starts.

void MimeParser_setVtable (MimeParser *parser, MimeParserVtable *vtable, void *clientData)

Installs the vtable of callback functions into a MIME parser object.

parser is the MIME parser object into which you want to install the vtable. vtable is the struct containing pointers to callback functions that the parser will call. clientData is the client data that is supplied as the client data argument to the callback functions.

See the MimeParserVtable.h reference page for detailed descriptions of the callback functions.

Note: The library code does not free the memory pointed to by the vtable argument. This means you may create a single static instance of the vtable to be shared by all parser instances. However, if you dynamically allocate memory for a vtable structure, then your code also has the responsibility to free the memory. Similarly, the library code does not free the memory pointed to by the clientData parameter. This makes sense, because the library code treats the client data pointer as an opaque pointer, about which it has no information.

Parameters:
parser the parser object
vtable structure containing pointers to callback functions
clientData opaque data passed back to the client in the callback functions

void MimeParser_start (MimeParser *parser)

Starts the parsing of a MIME message.

You must call this function before you call MimeParser_parseBuffer() for the first time. You may use a single parser to parse more than one MIME message, provided that you call MimeParser_start() to start the parsing of each message and call MimeParser_finish() to finish the parsing of each message.

While executing MimeParser_start(), the parser calls the callback functions in the installed vtable.

Parameters:
parser the parser object

Copyright © 2001-2006 Hunny Software, Inc. All rights reserved.