A Simplified Guide to MIME
MIME stands for Multipurpose Internet Mail Extensions. As its name suggests, MIME is a set of extensions to standard Internet email. However, MIME has been around for quite a while, and it may now be safe to say that MIME is the standard for Internet email.
MIME offers many features that are considered essential for modern email usage. These features are listed here and explained in more detail later in this article:
Support for character sets other than ASCII, required for sending email in languages other than English.
A content type labeling system, which allows multimedia content to be handled intelligently by computer programs.
Support for content in email messages that is not text, which allows email to contain multimedia content, including images, audio, office documents, and more.
Support for compound documents, which allows a single email message to contain multiple parts (multiple images, file attachments, and so on).
A Brief History of Internet Email
Without common standards, we probably would not have the widespread use of email that we have today. The standards are what enable email products from hundreds of companies to interoperate. You probably know that you can use Microsoft Outlook Express to send an email to someone who uses AOL for their email, or to someone who uses Yahoo Mail. This is possible because Microsoft, AOL, and Yahoo, along with hundreds of other companies, all agree to abide by the same email standards.
The standard for the format of Internet email dates back to 1982, when RFC 822 was published. RFC 822 (RFC stands for Request for Comments, in case you were wondering) is simply a document, freely available on the Web, that describes the format of Internet email messages. It was written by volunteers, and its successors (RFC 2822 and, most recently, RFC 5322) have been published by the Internet Engineering Task Force (IETF), the organization that today is responsible for establishing almost all the standards for the Internet.
By 1992, ten years after its publication, RFC 822 was holding up well, but many felt that it could be improved. MIME was created at that time to provide "extensions" to standard Internet email. Perhaps the greatest need MIME filled was the ability to send email using characters other than those of the basic Roman alphabet as used in the United States. With MIME, email could be sent using the Cyrillic alphabet (used in Russian), the Greek alphabet, or even the ideographic characters of Chinese. MIME also met the need to send non-text content, such as images or video clips. In June 1992, the IETF approved RFC 1341, which described the MIME message format. RFC 1341 has since been revised and republished (the core of the current specification is RFC 2045 and RFC 2046, part of the series RFC 2045 through RFC 2049 published in 1996), and the MIME standard has seen widespread adoption.
Support for Non-ASCII Character Sets
For users in the U.S. and the U.K., the original email standard was just fine for sending text messages. However, for billions of people around the world whose native language is not English, it was badly inadequate. The problem is the limited alphabet allowed in plain text email: essentially only the unaccented letters of the Roman alphabet, plus digits and common punctuation. MIME solves this problem by allowing other alphabets to be used.
In technical terms, computers store characters as numbers, and the American Standard Code for Information Interchange (ASCII) assigns each unaccented Roman letter, digit, and punctuation mark a number between 0 and 127. ASCII is what is commonly referred to as a coded character set, and many other coded character sets are used around the world. BIG5, for example, is a coded character set widely used in Taiwan. ASCII is by far the most commonly used character set in the U.S. In Western Europe and many parts of North and South America, the ISO 8859-1 character set is common: it keeps ASCII's first 128 codes and uses codes between 128 and 255 for many of the accented characters used in languages such as Spanish, French, and German.
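To make the difference between the two character sets concrete, here is a small Python sketch (Python is an illustrative choice; the article itself uses no code). It encodes the same word with Python's built-in "ascii" and "iso-8859-1" codecs; the helper name encode_or_none is hypothetical.

```python
from typing import Optional


def encode_or_none(text: str, charset: str) -> Optional[bytes]:
    """Encode text in the given charset, or return None if the charset
    has no code for one of the characters."""
    try:
        return text.encode(charset)
    except UnicodeEncodeError:
        return None


word = "café"
print(encode_or_none(word, "ascii"))             # None: ASCII has no code for "é"
print(encode_or_none(word, "iso-8859-1"))        # b'caf\xe9': "é" is byte 0xE9
print(list(encode_or_none(word, "iso-8859-1")))  # [99, 97, 102, 233]
```

The first three bytes are identical in both character sets, which is what "extends the basic character set of ASCII" means in practice.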
In an email that follows the MIME format, you will usually find a label that indicates what character set was used to encode the text in the message. Look in the message header for text that looks like this:
Content-Type: text/plain; charset="ISO-8859-1"
This line indicates that the text in the message body is encoded using the ISO 8859-1 character set. This line is important to the program you use to read email, since it provides the information your program needs to display the text correctly. The charset parameter may not always be present; if it is missing, the text is assumed to be encoded in ASCII (which the MIME standard calls US-ASCII).
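As a rough illustration of what an email program does with this header, the Python sketch below reads the charset parameter using the standard library's email package. The function name body_charset is hypothetical; passing "us-ascii" as the fallback applies MIME's default when the parameter is absent, since get_content_charset() itself returns None in that case.

```python
from email import message_from_string


def body_charset(raw_message: str) -> str:
    """Return the charset declared for the message body, lowercased."""
    msg = message_from_string(raw_message)
    # get_content_charset() returns its argument when no charset
    # parameter is present; MIME's default for text is US-ASCII.
    return msg.get_content_charset("us-ascii")


labeled = 'Content-Type: text/plain; charset="ISO-8859-1"\n\nHello\n'
unlabeled = "Content-Type: text/plain\n\nHello\n"

print(body_charset(labeled))    # iso-8859-1
print(body_charset(unlabeled))  # us-ascii
```

Charset names are not case-sensitive, which is why the library lowercases them.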
Note: Most email programs don't display all the header fields, but most let you view them if you want to. If you are not sure how, ask someone who knows your email program well.
Content Type Labeling
Since the World Wide Web became popular, many Internet users have become very comfortable with non-text content, such as images (pictures, diagrams, cartoons), audio (MIDI files, WAVE files, the now popular MP3 music files), and even video. To clearly identify the type of content in a file, most operating systems use a file name extension. The following table lists some of the commonly used file name extensions and a description of the content the files contain:
| Extension | MIME Type | Description |
|---|---|---|
| .htm | text/html | Styled text in HTML format |
| .jpg | image/jpeg | Picture in JPEG format |
| .gif | image/gif | Picture in GIF format |
| .wav | audio/wav | Sound in WAVE format |
| .mp3 | audio/mpeg | Music in MP3 format |
| .mpg | video/mpeg | Video in MPEG format |
| .zip | application/zip | Compressed file in PK-ZIP format |
The table also lists the MIME type for each extension. While file name extensions are somewhat limited (typically they use only three or four characters), MIME types are much more descriptive. A MIME type consists of a primary type and a subtype. The primary type provides only the most general indication of the type of content, such as whether the file is text, audio, video, or some other general type. The subtype refines the primary type, providing the information a computer program needs to "render" the content. For example, there are many kinds of image files, all of which have the primary type "image"; the subtype distinguishes between them. Examples of image subtypes include jpeg, gif, png, and bmp. The complete MIME type is the primary type, plus a slash character, plus the subtype. In "image/jpeg", for example, "image" is the primary type and "jpeg" is the subtype.
The MIME type is clearly a much better alternative than file name extensions for identifying content types. For one thing, if you don't recognize a file name extension, there is no way to guess what kind of content the file contains. In contrast, the primary type of a MIME content type always indicates the general kind of content. So, if you don't recognize the MIME type image/foo, at least you know that it is some type of image. If you don't recognize the .foo file name extension, you have no idea whether the file contains text, an image, audio, or some other kind of content.
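The primary/subtype structure and the fallback it allows can be sketched in a few lines of Python. The lookup tables and the functions split_mime_type and describe are hypothetical, for illustration only; the sketch strips any parameters (such as charset) and lowercases the type, because MIME type names are not case-sensitive.

```python
# Hypothetical table of fully recognized MIME types.
KNOWN_TYPES = {
    "image/jpeg": "Picture in JPEG format",
    "image/gif": "Picture in GIF format",
    "text/html": "Styled text in HTML format",
}

# What can still be said when only the primary type is recognized.
PRIMARY_DESCRIPTIONS = {
    "text": "some kind of text",
    "image": "some kind of image",
    "audio": "some kind of audio",
    "video": "some kind of video",
    "application": "data for some application",
}


def split_mime_type(content_type: str):
    """Split 'text/plain; charset=...' into ('text', 'plain')."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    primary, _, subtype = media_type.partition("/")
    return primary, subtype


def describe(content_type: str) -> str:
    primary, subtype = split_mime_type(content_type)
    full = f"{primary}/{subtype}"
    if full in KNOWN_TYPES:
        return KNOWN_TYPES[full]
    # Unknown subtype: fall back on the primary type.
    return PRIMARY_DESCRIPTIONS.get(primary, "unknown content")


print(split_mime_type('text/plain; charset="ISO-8859-1"'))  # ('text', 'plain')
print(describe("image/jpeg"))  # Picture in JPEG format
print(describe("image/foo"))   # some kind of image
```

There is no equivalent fallback for an unknown extension such as .foo, which is the point the paragraph above makes.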
As you can probably imagine, MIME types have become an extremely important factor in making the Internet and the World Wide Web run smoothly. Consider the Web for a moment. A web server sends content to a web browser. This content might be HTML text, JPEG or GIF images, a MIDI sound file, a PDF file, or numerous other kinds of files. Along with each file, the web server sends your browser the file's MIME type in a header field named Content-Type. Your web browser understands the MIME type: it knows to display HTML text and images, to start Adobe Acrobat Reader for a PDF file, or to play a MIDI file through your PC's speakers. MIME types are important in email, too. Your email program knows how to interpret the MIME types, and what to do with the different kinds of content that can be sent through email.
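A simple web server often has nothing to go on but the file name, so it maps the extension to a MIME type before sending the Content-Type header. Python's own http.server module takes a similar approach using the standard mimetypes module; the sketch below is a simplified illustration of that idea, not that module's code. Note that mimetypes may also consult the system's mime.types files, so results for less common extensions can vary between machines. The function name content_type_for is hypothetical.

```python
import mimetypes


def content_type_for(filename: str) -> str:
    """Guess a Content-Type value from a file name extension."""
    # The second value is a compression encoding such as gzip, not a charset.
    mime_type, _encoding = mimetypes.guess_type(filename)
    # application/octet-stream is the conventional label for unknown binary data.
    return mime_type or "application/octet-stream"


for name in ["index.htm", "photo.jpg", "song.mp3", "mystery.foo"]:
    print(f"{name} -> Content-Type: {content_type_for(name)}")
```

Falling back on application/octet-stream for .foo reflects the weakness of extensions: the server simply cannot say anything more specific.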
As a side note, if you are accustomed to sending and receiving email messages that contain styled text -- that is, text with varied fonts, styles (bold, italic), and colors -- then you can be thankful for the content type labeling system that is part of the MIME standard. This styled text has the content type text/html. Unsurprisingly, plain text (non-styled text) has the content type text/plain.
Support for Non-Text Content
Email would be one of the Internet's most popular applications even if only text messages were allowed, as suggested by the fact that most email messages sent on the Internet are text messages without any attachments. The widespread adoption of MIME, however, allows office documents, pictures, MP3 files, video clips, and other kinds of non-text content to be sent through the Internet mail system.
The original Internet mail system allowed only ASCII text messages to be sent. Users of the original mail system did invent a way to send arbitrary files, such as office documents, by encoding them as text and inserting them into their mail messages. The encoding they used was called uuencode. It was a kludge: difficult for the average user and unreliable. Incidentally, uuencode long remained a common way to post pictures, MP3 files, and other file types to Usenet newsgroups.
The Internet mail system today is still constrained by the legacy of the original mail system. In particular, the mail system is very unfriendly to messages that contain unencoded, non-text content. Consequently, before non-text content is sent through the mail system, it is encoded into text characters using an encoding called base64. Base64 uses only 64 characters (the letters A to Z and a to z, the digits 0 to 9, "+" and "/", plus "=" for padding) that pass through mail systems unchanged, and MIME labels the encoding explicitly in the message, so unlike the old uuencode approach, base64 is very reliable. And because most email software today supports MIME, it has become easy even for the casual user to send file attachments.
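The following Python sketch shows base64 at work using the standard base64 module. Every 3 bytes of input become 4 output characters, so the encoded form is about a third larger than the original. base64.encodebytes() produces MIME-style output broken into lines of at most 76 characters; the sample data is arbitrary.

```python
import base64
import string

# Three bytes become four base64 characters.
print(base64.b64encode(b"Man"))  # b'TWFu'

# Arbitrary binary data: every byte value 0-255, twice (512 bytes).
data = bytes(range(256)) * 2

# encodebytes() emits MIME-style output: lines of at most 76 characters.
encoded = base64.encodebytes(data)
text = encoded.decode("ascii")  # works because the output is pure ASCII
lines = text.splitlines()

print(len(data), "bytes ->", sum(len(line) for line in lines), "characters")
# 512 bytes -> 684 characters
print(max(len(line) for line in lines))  # 76

# Only the 64-character alphabet plus "=" padding appears.
alphabet = set(string.ascii_letters + string.digits + "+/=")
print(all(ch in alphabet for line in lines for ch in line))  # True

# Decoding restores the original bytes exactly.
print(base64.decodebytes(encoded) == data)  # True
```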
You might find it interesting that the constraints imposed by the original Internet mail system are even more stringent than you might think. The system can reliably transport only messages that consist entirely of text encoded in the ASCII character set. Therefore, not even all text content can be sent reliably through the mail system without first encoding it into ASCII characters. Base64 is, of course, one option for encoding non-ASCII text content. MIME also provides an alternative encoding called quoted-printable, which is suitable for character sets like ISO 8859-1 that are mostly ASCII-compatible: ASCII characters pass through unchanged, and every other byte is written as "=" followed by two hexadecimal digits. Quoted-printable encoding happens behind the scenes in your email program, so it's nothing you normally need to be aware of. But if you ever see "=20" (an encoded space) or a lone "=" at the end of a line (a "soft" line break) at seemingly random places in the text of a message, especially one that has been forwarded numerous times, those characters are probably a vestige of text that was not correctly decoded from the quoted-printable encoding.
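Here is a small Python sketch of quoted-printable using the standard quopri module. It shows where "=E9", "=20", and soft line breaks come from; the sample text is made up for illustration.

```python
import quopri

# ISO 8859-1 text: plain ASCII except "é" (byte 0xE9), with a trailing space on line 1.
original = "Café au lait \nsecond line".encode("iso-8859-1")

encoded = quopri.encodestring(original)
print(encoded.decode("ascii"))
# Caf=E9 au lait=20
# second line

# A line longer than 76 characters is split with a soft line break ("=" at line end).
long_line = quopri.encodestring(b"x" * 100)
print(b"=\n" in long_line)  # True

# Decoding undoes both the =XX escapes and the soft line breaks.
print(quopri.decodestring(encoded) == original)      # True
print(quopri.decodestring(long_line) == b"x" * 100)  # True
```

If a program skips the decoding step, the reader sees exactly these "=20" and "=" leftovers.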
Support for Multipart Messages
While support for sending non-text content is an important feature in MIME, without support for multipart messages you would not be able to send file "attachments". When you send an email with a file attachment, most email programs create a two-part message, the first part containing the text of your message and the second part containing the attached file (base64 encoded, of course!). The MIME standard specifies how these two parts are combined to form a single message.
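To see the two-part structure, the Python sketch below builds such a message with the standard library's email.message.EmailMessage class. The subject, file name, and attachment bytes are hypothetical stand-ins; a real program would read the bytes from a file.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Vacation photo"
msg.set_content("Here is the picture I promised.\n")  # part 1: the message text

# Hypothetical stand-in for a real JPEG file's bytes.
photo = b"\xff\xd8\xff\xe0" + bytes(60)
msg.add_attachment(photo, maintype="image", subtype="jpeg", filename="beach.jpg")  # part 2

print(msg.get_content_type())  # multipart/mixed
for part in msg.iter_parts():
    print(part.get_content_type(), part["Content-Transfer-Encoding"])

# print(msg.as_string()) shows both parts separated by boundary lines.
```

The library converts the single-part text message into a multipart/mixed container when the attachment is added, and it base64-encodes the attachment by default.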
MIME's support for multipart messages is actually quite impressive. The standard sets no limit on the number of parts a message may have. Not only that, but parts can themselves contain multiple parts. Thus, parts can be nested inside other parts, and the standard sets no limit on how many levels deep the nesting can be. Deeply nested parts are uncommon. However, it is quite common to have three or more parts. If, for example, you attached three files to your message, your email program would create a four-part message: one part for the message text and three additional parts for the file attachments.
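A short Python sketch makes the part count visible. It attaches three hypothetical files to a message and prints the tree of parts with a small recursive helper (outline is a hypothetical name), which would also show deeper nesting if a part were itself multipart.

```python
from email.message import EmailMessage


def outline(part, depth=0):
    """List each part's content type, indented by how deeply it is nested."""
    lines = ["  " * depth + part.get_content_type()]
    if part.is_multipart():
        for child in part.iter_parts():
            lines.extend(outline(child, depth + 1))
    return lines


msg = EmailMessage()
msg.set_content("Three files are attached.\n")

# Hypothetical attachments for illustration.
msg.add_attachment("Agenda for Monday\n", filename="agenda.txt")
msg.add_attachment(b"GIF89a" + bytes(20), maintype="image", subtype="gif", filename="chart.gif")
msg.add_attachment(b"%PDF-1.4\n", maintype="application", subtype="pdf", filename="report.pdf")

print(len(list(msg.iter_parts())))  # 4
print("\n".join(outline(msg)))
# multipart/mixed
#   text/plain
#   text/plain
#   image/gif
#   application/pdf
```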
The multipart feature of MIME is used in another rather interesting way. The most popular email programs in use today (Outlook, Outlook Express, Netscape, Eudora, AOL, and so on) can compose and display styled text messages (i.e., HTML email), but to accommodate less capable email programs, a message is often sent in both styled and plain text form. To do this, the originating email program sends a two-part message with the content type multipart/alternative: the first part contains the plain text form of the message, and the second part contains the styled text form. If the receiving email program can display styled text, it displays the styled version from the second part. If it cannot, it displays the plain text version from the first part.
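The Python sketch below builds a multipart/alternative message with EmailMessage and then uses get_body() with a preference list to mimic how a receiving program chooses which version to show. The message text is hypothetical.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Meeting notes"
msg.set_content("The meeting is at noon.\n")  # plain text form
msg.add_alternative("<p>The meeting is at <b>noon</b>.</p>\n", subtype="html")  # styled form

print(msg.get_content_type())  # multipart/alternative
print([p.get_content_type() for p in msg.iter_parts()])  # ['text/plain', 'text/html']

# A program that can display styled text prefers the HTML part...
print(msg.get_body(preferencelist=("html", "plain")).get_content_type())  # text/html
# ...while a plain-text-only program asks for the plain part.
print(msg.get_body(preferencelist=("plain",)).get_content_type())  # text/plain
```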
The multipart feature of MIME is useful in areas other than email. Multipart MIME messages can be thought of as compound documents. Office documents are frequently compound documents, containing text, embedded graphics, embedded spreadsheets, and so forth. In fact, MIME has already been adapted for use outside the email realm as a compound document technology for web pages. As you might already know, most web pages are put together from a bunch of files: an HTML file plus a number of image files, and maybe other files as well. To save such a web page for offline use, you must save all of these files. If you use Internet Explorer, you have several options when saving a web page to your computer: (1) you can save just the HTML file; (2) you can save the HTML file and all the image files to a directory; or (3) you can save everything as a single file. When you choose the last option, Internet Explorer saves the HTML file and all the image files as a compound document, which is actually a multipart MIME message (er, I mean document). The file has a ".mht" extension, and the format is known as MHTML. If you want to see what this compound document file really looks like, open it with a plain text editor such as Notepad. You'll see the multiple parts separated by boundary lines, and you'll notice that the images are encoded with base64.
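The general shape of such a compound document can be sketched with Python's EmailMessage: an HTML part and an image part wrapped in a multipart/related container. This is a simplified illustration, not a reproduction of Internet Explorer's exact output (its .mht files typically identify parts with Content-Location headers, whereas this sketch uses a Content-ID). The image bytes and the Content-ID value are hypothetical.

```python
from email.message import EmailMessage

page = EmailMessage()
page["Subject"] = "Saved web page"
# The HTML refers to the image through a "cid:" URL matching the image's Content-ID.
page.set_content('<html><body><img src="cid:logo@example.com"></body></html>\n', subtype="html")

# Hypothetical stand-in for the bytes of a logo.gif file.
logo = b"GIF89a" + bytes(20)
page.add_related(logo, maintype="image", subtype="gif", cid="<logo@example.com>")

print(page.get_content_type())  # multipart/related
print([p.get_content_type() for p in page.iter_parts()])  # ['text/html', 'image/gif']
print(page.as_string())  # boundary lines, an HTML part, and a base64-encoded image part
```

Opening the printed text in an editor shows the same features the paragraph above describes: parts separated by boundary lines, with the image encoded in base64.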
Internet email has come a long way since RFC 822 was published in 1982. Today, all the mainstream email programs are fully compatible with the MIME standard for email, allowing for some very advanced features and very good interoperability. The user-visible features that depend on MIME include styled text, text in non-Roman alphabets, file attachments, and multimedia content.