Caching vs. inlining / Stoyan's phpied.com

2010 update:
Lo, the Web Performance Advent Calendar hath moved

Dec 10 This post is part of the 2009 performance advent calendar experiment. Stay tuned for the next articles.

Looking back at the life of Page 2.0. and all the opportunities for optimization, the posts I've put up so far as part of this advent calendar have been mostly around optimizing the waterfall - making it shorter by having fewer components in it. This post will be probably the last in this area of optimization. After that let's talk about having smaller components and then move away from the waterfall and talk about what happens next - rendering, user interaction, JavaScript...

Caching

Caching comes into the picture for repeat page visits. When the user comes to your page for the second time, wouldn't it be beautiful if all components are in the cache and the waterfall is really short?

Here are the two views - first and repeat - from visiting the same page.

first view	repeat view

Ideally the repeat views should be with full cache and everything should be loaded from the local disk. That's what many people still assume: "Who cares about CSS? It's in the cache. Images? How do you mean sprites? But it's all in the cache, right?" Wrong. There are surprisingly few visits with full cache that we're used to believe.

This research demonstrates that on an extremely popular site - yahoo.com - about 50% of the visits are empty cache visits and 20% of all page *views*.

You cannot control what the users do what their cache, but you can suggest that their browsers keep a copy of your files for as long as they can.

"Never expire" policy

The never expire policy is to set a far future Expires (or Cache-Control) header effectively saying to the browser - hey this copy is good for a long time, keep it and reuse it.

The way to set the expires header for your static components is to tweak the Apache configuration (in .htaccess if your host gives you no other options) for example like so:

ExpiresActive On  
ExpiresByType application/x-javascript "access plus 10 years"  
ExpiresByType text/css "access plus 10 years"
ExpiresByType image/png "access plus 10 years"

Now if you access a PNG on Dec 10th 2009, Apache will add this header to the HTTP response:

Expires: Sun, 10 Dec 2019 05:47:47 GMT

The browser should never request this file again till 2019, effectively "forever".

The drawback is, of course, that you cannot modify this file anymore, since some users have already cached it forever and ever till the end of 2019. If you need to change the file, you have to save it under a different name and update all references to it. For the new name some use consecutive numbers, some use timestamps, some even hashes of the content, so that the name reflects the actual body of the component.

Caching vs. inlining

External components have a shot at being cached. But on the other hand having inline components (script, styles) means having fewer HTTP requests. Which one is better?

It depends from one case to the next but generally inlining is better for first visits (for homepages) and external components are better for repeat visits. But homepages are so 1998, with the modern search engines people find deep content pages and don't browse from the homepage. There are exceptions, of course. Some pages are common user homepages (the first page when the browser loads).

Anyway - inline vs. external? How about both?

Inlining, then caching

During first visits you can inline some components, then once the page is loaded, lazy-load the same components but this time as external files (and proper far-future Expires date).

For repeat visits, you only link to the external components as they are already in the cache!

So how do you determine first visits? How can you tell that the user's cache is empty?

You can use cookies and assume missing cookie means empty cache. Drawbacks: 1. someone can delete cache without deleting cookies and 2. what happens when the external component updates

Both of these scenarios are not so bad, because the page will still work, only less optimally. And you can take care of 2. by introducing (complexity and) a component version information in the cookie.

Oh, and drawback 3. - writing longer term cookies increases the size of your headers.

Another option is to track sessions only and assume empty cache for all new sessions. This way you don't need longer-term (and unreliable, see drawback 1.) cookies. In this case the problem might be that you may be missing out on optimizing the first visit in the session. The user may have visited the site yesterday, closed the browser and there she comes today with the full bag of cached components. But you assume first visit = empty handed visit, so you don't benefit from the components already in the cache during the first page view.

Which case is the right for you? Only your logs and experimentation can tell.

For an example in the wild, check Bing's search results. The first visit in the session you get a number of inline stylesheets, which then get lazy-loaded as external files. Second search in the same session and you get the stylesheets as external files.

Thanks for reading!

And that concludes the "fewer HTTP requests" part of the calendar. Tomorrow's topic will be about making smaller those components that slip through the cracks of the HTTP reduction effort.

Comments? Find me on BlueSky, Mastodon, LinkedIn, Threads, Twitter