The proper MHTML syntax

October 3rd, 2010. Tagged: CSS, IE, images, performance

Reducing the number of HTTP requests is a must. Sprites are cool but a pain to maintain, so along come data URIs (for all other browsers) and MHTML (for IE6 and IE7). I've talked about these things on this blog to the point where it comes up in the top 10 search results for queries like "mhtml" and "data url". Therefore I think it's my duty to clarify a point for the good of mankind :)

MHTML works in IE6 and IE7 even in the deadly IE7/Vista and IE7/Win7 combos

In the community we've long considered MHTML in IE7/Vista a problem, and I've personally come up with complex voodoo workarounds to mitigate the issue and still make use of the technique. Turns out the problem all along was a small syntax glitch.

As pointed out in a comment on a previous post, all we ever needed was to close the boundary delimiter by adding two dashes at the end. The double-dash of doom, as I like to call them, since I've spent so much time wrestling with them.

So let's take a look at the syntax.

Update: Also check this comment for additional insight from Vincent, Aaron and _cphr_ regarding the double line break of doom.

One part

MHTML is a multipart document: one document containing several parts. One part looks like this:

Content-Location: myimage
Content-Transfer-Encoding: base64

iVBORw0KGgoAAAANSU....U5ErkJggg==

In other words, it has headers and base64-encoded content, with an empty line separating the two and another empty line after the content.
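To make the layout of a part concrete, here is a minimal Python sketch (not the PHP class mentioned further down) that builds one part from raw image bytes. The function name `mhtml_part` is hypothetical, and the CRLF line endings are an assumption taken from the MIME specs; the post doesn't say which line endings IE actually needs.

```python
import base64

CRLF = "\r\n"  # MIME line ending (assumption: the post doesn't say what IE requires)


def mhtml_part(location: str, data: bytes) -> str:
    """Build one MHTML part from raw bytes.

    Layout: header lines, an empty line, the base64 body, another empty line.
    """
    headers = [
        f"Content-Location: {location}",
        "Content-Transfer-Encoding: base64",
    ]
    body = base64.b64encode(data).decode("ascii")  # one long line, like the example
    # CRLF + CRLF after the headers = end of the last header line + the empty line.
    # CRLF + CRLF after the body = end of the body line + the trailing empty line.
    return CRLF.join(headers) + CRLF + CRLF + body + CRLF + CRLF


if __name__ == "__main__":
    print(mhtml_part("myimage", b"abc"))
```

The trailing empty line is easy to drop when copy-pasting, and the comments below report that PNGs stop showing up without it, which is why the sketch always writes it.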

Multiple parts

The different parts of the document are divided by a separator string, which you define at the top of the document. It can be anything you like. Each separator line is that string prefixed with two dashes. So:

Content-Type: multipart/related; boundary="MYSEPARATOR"

--MYSEPARATOR
[here comes part one]
--MYSEPARATOR
[here's part two]
--MYSEPARATOR--

Did you notice the -- at the very end? Yes, this is the double-dash of doom. Forget it and you get IE7/Vista problems (only on cached documents) and permanent hair loss. The thing is, in IE6 and in IE7 on other versions of Windows you can omit the whole last separator and all is good. So historically you never needed it, but come Vista and Win7, the problems start.
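Here is a similar sketch for assembling the parts, assuming each part string already consists of its headers, an empty line, the base64 body and a trailing empty line. The names `mhtml_document` and `has_closing_delimiter` are hypothetical; what matters is that the last line is the separator with two dashes on both sides, and that the check flags a document that forgets them.

```python
CRLF = "\r\n"


def mhtml_document(boundary: str, parts: list[str]) -> str:
    """Join parts into one multipart/related document.

    Each part is expected to be its header lines, an empty line, the
    base64 body and a trailing empty line, all ending in CRLF.
    """
    doc = f'Content-Type: multipart/related; boundary="{boundary}"' + CRLF + CRLF
    for part in parts:
        # A separator line is the boundary prefixed with two dashes; the part's
        # headers start on the very next line.
        doc += f"--{boundary}" + CRLF + part
    # The double-dash of doom: the closing delimiter gets two trailing dashes too.
    doc += f"--{boundary}--" + CRLF
    return doc


def has_closing_delimiter(doc: str, boundary: str) -> bool:
    """True if some line of the document is exactly the closing delimiter."""
    return f"--{boundary}--" in doc.splitlines()
```

Running `has_closing_delimiter` over an older stylesheet is a cheap way to spot files that stop at a plain `--MYSEPARATOR`, which is exactly the case that breaks on Vista and Win7.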

All together now

Finally, let's see the whole thing: a complete CSS file, including the way you refer to the parts later on in the CSS.

/*
Content-Type: multipart/related; boundary="MYSEPARATOR"

--MYSEPARATOR
Content-Location: myimage
Content-Transfer-Encoding: base64

iVBORw0KGgoAAAANSUhEUgAAABgAAAAYCAMAAADXqc3KAAAD....U5ErkJggg==

--MYSEPARATOR
Content-Location: another
Content-Transfer-Encoding: base64

iVBORw0KGgoAAAANSUhEUgAAABgAAAAYCAMAAADXqc3KAAAA....U5ErkJggg==

--MYSEPARATOR--
*/
.myclass {
    background-image:url(mhtml:http://example.org/styles.css!myimage);
}
.myotherclass {
    background-image:url(mhtml:http://example.org/styles.css!another);
}
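If you generate the stylesheet instead of writing it by hand, the whole file can be produced in one go. Below is a minimal Python sketch, with a hypothetical `mhtml_stylesheet` function, that wraps the MHTML document in a CSS comment and then emits one rule per selector using the `mhtml:URL!name` syntax. It assumes `css_url` is exactly the URL the stylesheet is served from, and it always writes the empty line after each base64 block and the closing double-dash.

```python
import base64

CRLF = "\r\n"  # assumption: CRLF throughout, as MIME specifies


def mhtml_stylesheet(css_url: str, images: dict[str, bytes], rules: dict[str, str],
                     boundary: str = "MYSEPARATOR") -> str:
    """Return CSS text: an MHTML document inside a comment, then rules that use it.

    images maps a Content-Location name to raw image bytes;
    rules maps a CSS selector to the Content-Location name it should show.
    css_url must be the exact URL this stylesheet is served from.
    """
    out = ["/*", f'Content-Type: multipart/related; boundary="{boundary}"', ""]
    for name, data in images.items():
        out += [
            f"--{boundary}",
            f"Content-Location: {name}",
            "Content-Transfer-Encoding: base64",
            "",                                       # ends the part's headers
            base64.b64encode(data).decode("ascii"),
            "",                                       # empty line after the data
        ]
    out += [f"--{boundary}--", "*/"]                  # closing delimiter, then end comment
    for selector, name in rules.items():
        out += [
            f"{selector} {{",
            f"    background-image: url(mhtml:{css_url}!{name});",
            "}",
        ]
    return CRLF.join(out) + CRLF
```

Because the base64 alphabet (A-Z, a-z, 0-9, +, / and =) contains no asterisk, the encoded data can't accidentally close the CSS comment.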

Updated PHP class

Previously, while fighting IE7/Vista, I came up with a PHP class that takes some images and creates "data sprites" on the fly in two versions, one with data URIs and one with MHTML, served depending on the browser. The old code is here.

Now I've updated it (basically just deleted the sizable portions that dealt with Vista) and put it up on GitHub, right here.
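The data URI flavor of the same rules is simpler, since the image goes straight into the url(). Here is a rough Python sketch of that half plus a naive way to pick a version per request. `data_uri`, `data_uri_rules` and `wants_mhtml` are hypothetical names, not the API of the PHP class on GitHub, and the user agent check is only an illustration.

```python
import base64


def data_uri(data: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data: URI."""
    return f"data:{mime};base64," + base64.b64encode(data).decode("ascii")


def data_uri_rules(images: dict[str, bytes], rules: dict[str, str]) -> str:
    """One CSS rule per selector, with the image inlined as a data URI.

    images maps a name to raw PNG bytes; rules maps a selector to a name.
    """
    lines = []
    for selector, name in rules.items():
        lines.append(f'{selector} {{ background-image: url("{data_uri(images[name])}"); }}')
    return "\n".join(lines) + "\n"


def wants_mhtml(user_agent: str) -> bool:
    """Naive, hypothetical check: IE6 and IE7 send "MSIE 6." or "MSIE 7."."""
    return "MSIE 6." in user_agent or "MSIE 7." in user_agent
```

User agent sniffing is fragile and makes caching harder, which is why one of the comments suggests IE conditional comments pointing to two separate stylesheet URLs instead.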

Updated test pages

Thanks for reading, that's about it. Now off I go to correct the older posts, catch ya later :)


26 Responses

  1. Nice. I need to try it.

    Are any of the large sites using this technique?

    Also, do you know what the data URI limits are in the major browsers?

    Thanks

  2. Very cool. Sad that it turned out to be such a small detail but awesome that it was discovered.

  3. Hi, nice to have a clean and bug-free class for generating data uris.

    But we should not use PHP browser detection; we should use conditional comments instead. That way we have two separate URIs for the CSS.

    That makes it a lot easier to implement on a large website that uses caching, for example (and you'd better not have a separate cache for IE and others; your sysadmin will not like it…).

    Doing browser detection on the server side is not a good idea (nor on the client side). I also understand that your class is more of a proof of concept than a real-world example, but just for readers out there, you could modify it and use it like this:

    http://gist.github.com/608421

    Smart code taken from http://duris.ru/. Also, do not use a query string in the CSS URL like my example does; use .htaccess instead! :)

  4. Thank you, Stoyan. That’s really useful to know. We’ll add this into the Sqweeze packaging tool.

    Is there a way of avoiding the absolute, external URL? It seems pretty redundant to have this, when the data is contained inline. But I don’t think it can be omitted, or incorrect, can it?

  5. Are you relying on compression to remove the duplication between support for FF/Chrome etc. and IE6/7?

    Your original post inspired me to experiment with this last year. That syntax was buried in an old MSDN article about MHTML. I didn't try it on anything but XP, so I was unaware that was the fix for Vista/Win7. In my case, I used it to create an external .mht file, which I referenced in my CSS instead of inlining the data. That prevented other browsers from downloading the commented text. It would be great if data: URIs had that capability so that we could hide them from IE6 and IE7 without using a separate stylesheet. Anyway, thanks for clarifying this solution for those of us who are using this technique.

  7. BREAKING NEWS // DOUBLE LINE BREAK OF DOOM

    When using http://www.phpied.com/files/mhtml/mhtml-fixed.css, I couldn't get MHTML working.

    I first thought it was a server-related problem because your test page worked... Well, NO! Your test page doesn't work for MHTML, sorry. (WinXP IE7 on WPT: http://www.webpagetest.org/result/101116_6b69205c5d48549949df17d75e291588/)

    Because I copy-pasted your test page, I was wrong from the beginning (my bad).

    The test page lacks a double line break (a blank line) after the base64-encoded images in the MHTML part of the CSS.

    The start of the blog post isn't wrong; it has the blank line. But the “All together now” section is wrong, and so is the test page.

    Bonus points:
    - In your test page, the domain name is http://www.phpied.com, but the CSS file refers to the MHTML without the www. This means it is a different resource for IE, resulting in a double CSS file download.
    The double download also happens if the query string is different.
    In fact, if anything in the mhtml: URL differs from the URL of the original CSS file that contains the MHTML, it is a different resource and is downloaded twice.

    - I did a lot of tests, and I can tell you that the double line break is only needed for PNG files; GIF files in MHTML are OK without it (and I do not want to know why)

    - You owe us, and all the other readers who just copy-pasted your work, beers

    Thanks :
    http://twitter.com/aaronpeters (helped a lot on testing)
    http://twitter.com/_cphr_ (discovered the “blank line” problem when working on the webperf contest website)

    Hours wasted : 4

    (I got a lot of bad words from Aaron but I will not add them here :D )

  8. [...] you can embed base 64 image data in the same CSS file! MHTML is quite ugly and easy to break with minor issues such as missing carriage returns. Fortunately CSS Embed takes care of this for you. You never even have to look at the [...]

  9. [...] While this post is an interesting study, the problem it solves turns out to be much simpler. The details are here. In summary: you need a closing separator and it all works fine in [...]

  10. Hi Stoyan,

    Thanks! Your original post really inspired me.

    Before seeing this blog post, I wrote a command-line tool similar to yours:
    http://github.com/josephj/dataurize

    And a web interface for others to test:
    http://josephj.com/lab/dataurize/web/demo.php

    In my opinion, browser detection isn't necessary, because it's still worth it even if the CSS files become larger. After encoding my CSS files using data URIs and MHTML, the total download time and page onload time decreased noticeably. In the real world, however, it's not suitable for encoding large images, because the file size becomes unacceptable. So I added a --size-limit option (default 2KB) to avoid encoding large images.

    Now I use this tool during our product deployment process. It seems to work well and gives better performance without CSS sprites. :D

  11. [...] For details on how to use mhtml see the proper mhtml syntax. [...]

  12. it looks like microsoft broke this with “important security update” kb2503658 (details: http://www.microsoft.com/technet/security/bulletin/MS11-026.mspx ).

    after installing kb2503658 on installations of both ie6 and ie7, your test page breaks (the background images are not shown). after uninstalling kb2503658, the test page works again.

  13. +1 to sbrudenell's comment. I'm having problems with MHTML rendering in IE7 that I hadn't experienced before!

  14. [...] Explorer browsers. For more background, Stoyan Stefanov has written a couple of good posts here and here.) Obviously, browser bugs are not the most exciting thing in the world, but as a [...]

    IMPORTANT: as of the Jun 11 MHTML security update KB2544893 (also called KB2503658 AIUI), you *MUST* deliver the stylesheet with the MIME type “message/rfc822”, otherwise it will be ignored (including in IE6/7 on XP). That can be done via server configuration or server-side scripting. On IIS (and possibly others) you can simply give your stylesheet the extension “.mht” and that will do the trick.

    IE8 will then also work, FWIW. IE9/10 work too, but present an extra hurdle: the MHTML content must be MIME type “message/rfc822” (per above), but stylesheets must be “text/css”. That means you can't have both the MHTML images and the CSS rules in the same file; you must use a .css file that references a separate .mht file, and then it will work perfectly.

    Aside: I tried sniffing the HTTP Accept header and returning a different Content-Type accordingly, but IE then treats the file as separate resources and downloads it twice, so you're better off just using physically separate files.

    Why would we want to use MHTML in IE8+ when “data:” URIs are supported instead? For most use cases, no reason. I do like the fact that you can use images from the cached MHTML pool in inline IMG tags (etc.), though; you can't do that with “data:” URI images (unless you use JavaScript). e.g.:

    Or how about a script that automatically builds up a collection of the images requested and adds them to the MHTML cache pool:

    Interesting potential; I like the ability to re-use the image wherever you like without JS, just by referencing its name. To emulate this in other browsers, you'd have to use JavaScript to cache the base64 data in the new localStorage “super cookie”.

  16. [...] work in IE7, a browser that we have to support (No, we don’t need IE6). I’ve followed Stoyan’s guide and actually gotten it to work, but after a recent Microsoft security update (KB2544893, as [...]

  17. I seem to be running into the same double-dash-of-doom problem in Chrome v20. I’m having heaps of problems generating files that will display at all just now (e.g. need to use Windows line endings), but my initial tests suggest the double dash is one of the problems… Will post more here as I nail it down :)

  18. Really sorry – but I’m having no joy from this at all.

    I've noted the update and the IE9/10 aspects mentioned in the comments.

    1 HTML file
    1 CSS file
    1 MHT file

    HTML references the CSS
    CSS references the MHT

    IE6 and IE7 both show blank – no background image present.

    IE8 seems to accept the “normal” data URI … but doesn't see the MHT version.

    Can anyone confirm and show this working?
    Or should I simply blow off IE, leave it with individual image requests, and keep data URIs for the better browsers?

  19. [...] careful with the separators here, or you will have issues with Vista / Windows 7. The boundary declaration can be used to define any separator string you want, however be sure to [...]

  20. [...] UPDATE: It’s very important to have a closing separator in the MHTML document, otherwise there are known issues in IE7 on Vista or Windows 7. The details are here. [...]

  21. I tried using the code below, but somehow the background image is not showing up in my output. I'm on IE 7. Please let me know if I am making a mistake anywhere…

    mhtml test page

    /*
    Content-Type: multipart/related; boundary="_ANY_STRING_WILL_DO_AS_A_SEPARATOR"
    --_ANY_STRING_WILL_DO_AS_A_SEPARATOR
    Content-Location:locoloco
    Content-Transfer-Encoding:base64

    iVBORw0KGgoAAAANSUhEUgAAAAQAAAADCAIAAAA7ljmRAAAAGElEQVQIW2P4DwcMDAxAfBvMAhEQMYgcACEHG8ELxtbPAAAAAElFTkSuQmCC
    --_ANY_STRING_WILL_DO_AS_A_SEPARATOR--
    */

    #test1 {
    background-image: url("");
    *background-image: url(mhtml:http://phpied.com/files/mhtml/mhtml.css!locoloco);
    }

    div {
    width: 100px;
    height: 100px;
    font: bold 24px Arial;
    }

    test #1

  22. In the above, I included the style as part of the HTML itself, not in a separate .css file. I tried putting the style in a separate .css file, but got the same result… the image is not showing up.

  23. [...] to do it in the CSS. You can use this technique starting with IE8, and consider the MHTML equivalent for [...]

  24. My HTC Incredible used to be able to upload photos with no problem. Since the latest update I get an error every time I try. I tried reinstalling but am getting the same error.

  25. Wow that was unusual. I just wrote an incredibly long comment but
    after I clicked submit my comment didn’t appear. Grrrr… well I’m not writing all that over again.
    Anyways, just wanted to say superb blog!

  26. hey there! , I like your creating and so a whole lot! percent most of us communicate excess about your article for AOL? I require an authority within this dwelling to resolve the issue. Possibly that is definitely a person! Taking a look forward to see you.

Leave a Reply