The techniques used to make it faster are simple and effective: better resource packaging, reducing the number of requests, inlining CSS, reducing the amount of CSS and JS by untangling dependencies, cleaning up, and sometimes simply rewriting. You can read more about these in the previous posts.
Now the results. The before-after comparison is accurate because it uses snapshots taken at the same time. This is possible because we kept an old endpoint serving the old code path. The official URL is /plugins/recommendations.php but we kept the legacy URL /widgets/recommendations.php pointing to the old code for a little while.
The total payload change is drastic. Mainly due to better packaging. And better JS modules and dependencies.
The number of requests is reduced by 1/3. Not as drastic, but not too bad either. Most of the requests are images, which is ok. They don't block anything and whenever they arrive, they are welcome. But they are not on the critical path. The reduced requests are all JavaScript. We already had a previous optimization so CSS was already inlined. But thanks to rewrites, now the HTML payload (including inline CSS) is 6.2K gzipped, down from 9.4K. Which means the initial paint can start sooner.
The initial paint (render start) now happens in half the time. And, even better, the initial paint is a complete plugin, except for the images, whereas before it was just partial content. This is because CSS is here early and all JS is out of the way (async loaded, also take a peek here).
The fully loaded time is not all that important since the user has a usable list of recommendations already delivered with the initial paint. But it's still 2x faster which makes me happy.
Just want to take a second to mention how good it feels to be working on such high-impact performance optimizations. These social plugins are everywhere on the web. By making them faster I am fortunate to have the opportunity to make the whole web faster. Meaning make millions of sites faster, affecting the lives of billions of people, every day.
What can I say, Facebook is a great place to work. The people, the impact. Every line you write matters. It's also up to you to pick what you want to work on and where your talents and interests will have the greatest impact. And then there are the hackathons and hackamonths which means even more freedom.
I recently finished a hackamonth project, which explains why I've been silent here and on Twitter and everywhere (yet, thanks to O'Reilly folks, even though I missed a few deadlines, we were able to push this baby out the door). Let me tell you - a hack-a-month is better than vacation. Being left alone for a month to explore a completely new (to you) territory - priceless!
(Oh, if that sounds like something you'd like to do, hit me up on ssttoo at ymail with your resume. FB now has engineering offices in NYC, Seattle and London, so if moving was a problem, now there are more options)
This excellent Google I/O talk mentions that Chrome for Android moves the CSS animations off of the UI thread, which is, of course, a great idea. Playing around with it, here's what I found:
In non-supporting browsers, which is most of them, the kill switch kills all the animations. Business as usual.
In the supporting browsers (all Safaris and Android Chrome) the kill only affects the blue button, the one that animates a CSS property, as opposed to using a CSS transform. But the animations that use a transform keep on going!
Take aways
Rejoice! The future is here! Drink and dance uncontrollably around the campfire!
After you sober up, make sure your CSS animations use transform: where possible
Problem: too much JavaScript in your page to handle 3rd party widgets (e.g. Like buttons)
Possible solution: a common piece of JavaScript to handle all third parties' needs
What JavaScript?
If you've read the previous post, you've seen that most features in a third party widget are possible only if you inject JavaScript from the third party provider into your page. Having "a secret agent" on your page, the provider can take care of problems such as appropriately resizing the widget.
Why is this a problem?
Third party scripts can be a SPOF (an outage), unless you load them asynchronously. They can block onload, unless the provider lets you load it in an iframe (and most don't). There can be security implications because you're hosting the script in your page with all permissions associated with that. And in any case, it's just too much JavaScript for the browser to parse and execute (think of mobile devices)
If you include the most common Like, Tweet and +1 buttons and throw in Disqus comments, you're looking at well over 100K (minified, gzipped) worth of JavaScript (wpt for this jsbin)
This is more than the whole of jQuery, which previous experiments show can take a noticeable 200ms just to parse and evaluate (assuming it's cached) on an iPhone or Android.
What does all of this JS do?
The JavaScript used by third parties is not always all about social widgets. The JS also provides API call utilities, other dialogs and so on. But the tasks related to social widgets are:
Find html tags that say "there be widget here!" and insert an iframe at that location, pointing to a URL hosted by the third party
Listen to requests from the new iframes and fulfill these requests. The most common request is "resize me, please"
Now, creating an iframe and resizing it doesn't sound like much, right? But every provider has to do it over and over again. It's just a wasted code duplication that the browser has to deal with.
Can't we just not duplicate this JavaScript? Can we have a common library that can take care of all widgets there are?
C3PO draft
Here's a demo page of what I have in mind. The page is loading third party widgets: like, tweet, +1 and another one I created just for illustration of the messaging part.
It has a possible solution I drafted as the c3po object. View source, the JS is inline.
What does c3po do?
The idea is that the developer should not have to make any changes to existing sites, other than remove FB, G, Tw, etc JS files and replace with the single c3po library. In other words, only the JS loading part should be changed, not the individual widgets code.
c3po is a small utility which can be packaged together with the rest of your application code, so there will be no additional HTTP requests.
Parsing and inserting iframes
The first task for c3po is to insert iframes. It looks for HTML tags such as
The only additional parameter passed to the third party URL is cpo-guid=..., a unique ID so that the iframe can identify itself when requesting services.
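To make the URL part concrete, here's a minimal sketch of how the iframe URL could be put together. This is not the actual c3po source: the helper names `makeGuid` and `widgetUrl`, the GUID format and the `provider.example` host are all made up for illustration. Only the `cpo-guid` parameter name comes from the description above.

```javascript
// Hypothetical helpers, NOT the real c3po code.
var guidCounter = 0;

// Produce an ID that is unique within this page load (format is an assumption).
function makeGuid() {
  guidCounter += 1;
  return 'cpo-' + guidCounter + '-' + Date.now().toString(36);
}

// Append the widget's own attributes plus cpo-guid to the provider's URL.
function widgetUrl(baseUrl, attrs, guid) {
  var pairs = [];
  for (var name in attrs) {
    if (Object.prototype.hasOwnProperty.call(attrs, name)) {
      pairs.push(encodeURIComponent(name) + '=' + encodeURIComponent(attrs[name]));
    }
  }
  // The only extra parameter: lets the iframe identify itself later.
  pairs.push('cpo-guid=' + encodeURIComponent(guid));
  var separator = baseUrl.indexOf('?') === -1 ? '?' : '&';
  return baseUrl + separator + pairs.join('&');
}

// Example with a made-up provider URL:
// widgetUrl('https://provider.example/widget', {href: 'http://phpied.com/'}, makeGuid());
```

Keeping the URL building separate from the DOM work means the same function can serve every provider; only the base URL and the attributes read from the tag differ.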
The parsing and inserting frames works today, as the demo shows. The only problem is you don't know how big the iframes should be. You can guess, but you'll be wrong, given i18n labels and different layouts for the widgets. It's best if the widget tells you (tells c3po) how big it should be by sending a message to it.
X-domain messaging
What we need here is the iframe hosted on the provider's domain to communicate with the page (and c3po script) hosted on your page. X-domain messaging is hard, it requires different methods for browsers and I'm not even going to pretend I know how it works. But, if the browser supports postMessage, it becomes pretty easy. At the time of writing 94.42% of the browsers support it. Should we let the other 5% drag us down? I'd say No!
c3po is meant to only work in the browsers that support postMessage, which means for IE7 and below, the implementers can resort to the old way of including all providers' JS. Or just have less-than-ideally-resized widgets with reasonable defaults.
When the widget wants something, it should send a message, e.g.
The c3po code that handles the message will check the GUID and the origin of the message and if all checks out it will do something with the iframe, e.g. resize it.
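Here's roughly what that check could look like. Again a sketch, not the real c3po code: the message shape (a JSON string with `guid`, `type`, `width`, `height`), the `registry` object and the function name `handleWidgetMessage` are my assumptions, since the providers would first have to agree on an actual format.

```javascript
// registry maps GUID -> { origin: expected provider origin, iframe: the element }
// The message format below is hypothetical, nothing is standardized yet.
function handleWidgetMessage(registry, event) {
  var msg;
  try {
    msg = JSON.parse(event.data);
  } catch (e) {
    return false; // not JSON, not for us
  }
  if (!msg || typeof msg !== 'object') {
    return false;
  }
  var known = Object.prototype.hasOwnProperty.call(registry, msg.guid);
  var entry = known ? registry[msg.guid] : null;
  // Both the GUID and the origin of the message must check out.
  if (!entry || entry.origin !== event.origin) {
    return false;
  }
  if (msg.type === 'resize') {
    var w = Number(msg.width);
    var h = Number(msg.height);
    if (!(w > 0 && h > 0)) {
      return false;
    }
    entry.iframe.style.width = w + 'px';
    entry.iframe.style.height = h + 'px';
    return true;
  }
  return false; // unknown request type
}

// Browser wiring (not executed here):
// window.addEventListener('message', function (e) {
//   handleWidgetMessage(registry, e);
// }, false);
//
// And inside the widget's iframe, something like:
// parent.postMessage(JSON.stringify(
//   {guid: myGuid, type: 'resize', width: 200, height: 50}), '*');
```

Checking the origin against the one recorded when the iframe was created is what keeps a random frame from resizing somebody else's widget just by guessing a GUID.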
As you see in the demo, only the example widget is resized properly. This is because it's the only one that sends messages that make sense to c3po.
Next step will be to have all widget providers agree on the messages and we're good to go! The ultimate benefit: one JS for all your widget-y needs. One JS you can package with your own code and have virtually 0 cost during initial load. And when you're ready: c3po.parse() and voila! - widgets appear.
Of course, this is just a draft for c3po, I'm surely missing a lot of things, but the idea is to have something to start the dialogue and have this developed in the open. Here's the github repo for your forking pleasure.
or "How to help your users share your content on Facebook and not hurt performance"
Facebook's like button is much much faster now than it used to be. It also uses much fewer resources. And lazy-evaluates JavaScript on demand. And so on. But it's still not the only option when it comes to putting a "share this article on Facebook" widgety thing on your site.
The options are listed roughly in order from fastest (and fewest features) to slowest (and most features).
#1: A share link
Note that this feature has been deprecated but it still does work. And you see it all over the place.
A simple link to sharer.php endpoint is all it takes. The u parameter is your URL. E.g.:
<a href="https://www.facebook.com/sharer/sharer.php?u=phpied.com" target="_blank">
  Share on Facebook
</a>
The above is a hardcoded URL. You can, of course, spit the current URL on the server side. A JS-only client-side solution could be to take the document.location. You can also pop a window. And use a button, or an image. Say something like:
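A rough sketch of the client-side variant, assuming nothing beyond the sharer.php endpoint and its u parameter mentioned above. The function names `sharerUrl` and `openShare`, the window name and the popup size are mine and arbitrary.

```javascript
// Build the share URL for any page URL.
function sharerUrl(pageUrl) {
  return 'https://www.facebook.com/sharer/sharer.php?u=' +
    encodeURIComponent(pageUrl);
}

// Browser-only: pop a small window for the current page.
// `win` is the window object; dimensions are arbitrary picks.
function openShare(win) {
  return win.open(
    sharerUrl(win.document.location.href),
    'share',
    'width=600,height=400'
  );
}
```

You could then hook it to a button or an image, e.g. with an onclick that calls openShare(window).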
This is just a link you host in your HTML or bit of JavaScript you can inline or package with your own JavaScript (it is, after all, your own JavaScript)
#2: Feed dialog
The feed dialog is the next incarnation of the share popup.
You need a redirect_uri which can be something like a thank you page. But instead of "thank you", you can simply go back to the article by making redirect_uri and link point to the same URL
Again, a client-only solution could be something like:
But this feed dialog can also be a popup. You do this by adding &display=popup. This hides the FB chrome. And you can also make the "thank you" page just a simple page that closes the window.
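Putting the pieces together, a minimal sketch of building the feed dialog URL. The function name `feedDialogUrl` and the `options` object are mine; the endpoint path and the app_id parameter name are what I understand the dialog to use, so double-check them against the docs before relying on this.

```javascript
// Build a feed dialog URL. By default redirect_uri points back to the
// article itself, as described above.
function feedDialogUrl(appId, link, options) {
  options = options || {};
  var params = [
    ['app_id', String(appId)],
    ['link', link],
    ['redirect_uri', options.redirectUri || link]
  ];
  if (options.popup) {
    params.push(['display', 'popup']); // hides the FB chrome
  }
  var query = params.map(function (pair) {
    return pair[0] + '=' + encodeURIComponent(pair[1]);
  }).join('&');
  return 'https://www.facebook.com/dialog/feed?' + query;
}

// Example (the app ID is a placeholder, use your own):
// feedDialogUrl('YOUR_APP_ID', document.location.href, {popup: true});
```

For the popup flavor you'd pass a redirectUri that points to your tiny "close the window" page instead of the article.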
Try it:
The result:
The other required thing is the app id. You need one. But that's actually cool because it has side benefits. For example better error messages for you (the app admin) that the users don't see. It also gives you a little "via phpied.com" attribution linked to the App URI which is a nice traffic boost hopefully as your sharer's friends see the story in their newsfeed or timeline and click the "via".
Additionally there's a bunch of other params you can pass to the feed dialog to control how the story is displayed. You can provide title, description, image, etc. Full list here.
Method #2's performance price: none
Feed dialog has the same (non-existing) performance requirements as the share links. It's all inline. Any content coming from Facebook is only on user interaction.
BTW, this is the method youtube currently uses.
#3: Feed dialog via JS SDK
Now we move on from simple links and popups to using the JavaScript SDK.
As you can see, this is now a real properly resized popup. No FB chrome, nice and clean. In general the JS SDK makes everything better. But you need to load it first - the performance price you pay for all the magic.
Method #3's performance price: an async JS
Opening the feed dialog this way requires you to load the Facebook JavaScript SDK. It's one JS file with a short expiration time (20 mins). When it loads, it also makes two additional requests required for cross-domain communication. These requests are small though and with long-expiration caching headers. Since the JS SDK is loaded many times during a regular user's surfing throughout the web, these two additional requests have a very high probability of being cached. So is the JS SDK itself. If not cached, at least it's a conditional request with likely a 304 Not Modified response.
Here's the waterfall of loading the jsbin test page where you can see the JS SDK loading (all.js) and the two x-domain thingies (xd_arbiter.php)
Note that by default the JS SDK sends an additional request checking whether the user is logged in. If you don't need that, make sure you set the login status init property to false, as shown in the test page, like:
FB.init({appId: 179150165472010, status: false});
When loading the JS SDK you must absolutely make sure it's loaded asynchronously, and even better - in an iframe, so the onload of your page is never blocked.
#4: Like button in an iframe
We're coming to the Like button. There are two ways to load it: either you create an iframe and point it to /plugins/like.php or you include the JS SDK and let the SDK create the iframe. Let's take a look at the you-create-iframe option first.
The integration is straightforward: You go to the help page, use the "wizard" configurator found there and end up with something like:
The button comes in three layouts: standard (biggest), box_count and button_count
Try it:
Standard
Box count
Button count
As you can see, you get quite a bit more features here, e.g. number of likes and social context (who else has liked) in the standard layout. Also in the standard layout you get a little comment input. You don't get one in the other layouts because there's no space in the little iframe. You define the iframe and the code inside the iframe cannot break out of it and do something wild (or useful), e.g. open a big commenting dialog. Or make the iframe bigger because the word "Like" may be significantly longer in some languages. When you "trap" the iframe in your dimensions, it stays there.
Method #4's performance price: iframe content
In this method every time someone loads your page, they also visit a page (like.php) hosted by facebook.com. Now, this page is highly optimized: it only has html, sprite and async lazy-executed JS (which doesn't block onload). 3 requests in total. Maybe some faces (profile photos), depending on the layout and whether the user's friends have liked the URL.
As you probably know, every iframe's onload blocks the parent window's onload. So, if you feel so inclined you can always do any old lazy-load trick in the book. E.g. create the iframe after window.onload, or "double-frame" it, or (for the webkits out there) write the iframe src with a setTimeout of 0.
Another thing to consider is to always load the iframe via https, so there's no http-https redirect if the user has opted to always use facebook via https.
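Here's a sketch of the "create the iframe after onload" trick combined with the always-https advice. The function names, the `container` argument and the inline styles are my own choices; the like.php path and the three layout names are from the text above, and `href` and `layout` are the query parameters as I know them.

```javascript
var LAYOUTS = ['standard', 'box_count', 'button_count'];

// Always https, so there is no http-to-https redirect.
function likeFrameUrl(href, layout) {
  layout = layout || 'standard';
  if (LAYOUTS.indexOf(layout) === -1) {
    throw new Error('Unknown layout: ' + layout);
  }
  return 'https://www.facebook.com/plugins/like.php' +
    '?href=' + encodeURIComponent(href) +
    '&layout=' + encodeURIComponent(layout);
}

// Browser-only: insert the iframe once the host page has loaded,
// so the iframe never holds back the page's own onload.
function lazyLikeButton(win, container, href, layout, width, height) {
  function insert() {
    var iframe = win.document.createElement('iframe');
    iframe.src = likeFrameUrl(href, layout);
    iframe.style.cssText = 'border:0;overflow:hidden;' +
      'width:' + width + 'px;height:' + height + 'px';
    container.appendChild(iframe);
  }
  if (win.document.readyState === 'complete') {
    insert();
  } else {
    win.addEventListener('load', insert, false);
  }
}
```

Note you still pick the width and height yourself, which is exactly the "trapped iframe" limitation of this method.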
#5: Like button via SDK
This is building on what you already know about #3 and #4: You load the SDK. You sprinkle <fb:like> (or <div class="fb-like">) where you want buttons to appear. The SDK finds these and replaces them with iframes.
<!-- all defaults -->
<fb:like></fb:like>
<!-- layout, send button -->
<div class="fb-like" data-send="true"></div>
If you don't specify the URL to like, it defaults to the current page.
Try it:
Standard
box count
button count
This is the most full-featured button implementation. It will resize the button as required by content and i18n. It will always present a comment dialog. (When people share with their own comment, these stories do better, because it's always nice to see a friend's comment attached to a URL, right?)
The good thing about this method is that you can load any other FB plugin (e.g. follow button by just adding an fb:follow in the HTML) without re-loading the SDK, it's already there and can handle all the plugins, dialogs and API requests.
Combining the features of methods #3 and #4 also combines their performance impact. Again, the like.php iframe is heavily optimized and tiny. Also the SDK has a chance of being cached from the user's visit on another page. And, of course, you always load the SDK asynchronously so its impact on your initial page loading is minimal. Or load the SDK in an iframe so the impact is virtually 0.
So the total cost in terms of number of requests in empty cache view is 6. 3 from the iframe + 3 from the SDK. Full cache view should be 1 request - just the like.php frame with the current count, faces and so on.
But again, to minimize the impact, you just load the SDK in an iframe (so the whole widget doesn't block onload and doesn't SPOF) or asynchronously (so it doesn't SPOF and doesn't block onload in IEs)
Summary
| # | Method | Features | Cost |
|---|--------|----------|------|
| 1 | Share link | link opens popup, no like count, no social context | none |
| 2 | Feed dialog | link opens page, no like count or context. You can pass customized description, image, etc for the story. Up to you to do a "thank you" page. | none |
| 3 | Feed via SDK | properly resized popup, JS control over the flow. No like count or context | Loading JS SDK |
| 4 | Like button in your frame | like count, social context, but no i18n resizing, comment option only sometimes | like.php iframe (3 requests) |
| 5 | Like button via SDK | All features plus proper resizing, comment dialog, easier to implement via fb:like tags in HTML | like.php + SDK |
I mentioned it a few times in the article but let me repeat once again for the TL;DR folks. If you're loading the JS SDK, it's absolutely mandatory that you make sure it's either loaded asynchronously to avoid SPOF, or even better - in an iframe to avoid blocking onload.
1. You write a new test to confirm a JavaScript-related performance speculation
2. You click
3. Your test runs in a bunch of browsers
Glossary
JSperf.com is the site where all your JavaScript performance guesswork should go to die or be confirmed. You know how the old wise people say "JSperf URL or it didn't happen! Now off my lawn!". Yup, that jsperf.com
WebPagetest.org (WPT) is the site where you get answers to the ol' question: "Why do people say my oowsome site is slow? And what should I do about it?"
Bookmarklet is a little piece of JavaScript you conveniently access from your browser bookmarks and inject into other non-suspecting sites.
Bookmaker tool makes a bookmarklet from a .js file URL (probably hosted on github)
Trouble in paradise
These days we're so happy and spoiled with all these amazing tools around us. And yet, when you create a JSPerf test, you have to open all these browsers and run the test everywhere. Even IE. And, when on Mac, IE is usually not readily available. Plus it comes in a bunch of versions - from almost-but-not-quite-forgotten IE6, all the way to IE10 The Greatest - and they have different, sometimes contradicting, performance characteristics.
To the rescue: WPT
WebPagetest has: a/ ability to run in a bunch of browsers and b/ an API
It starts by inquiring about your WPT API key. I know, you have to get one. You can read the API docs on how to get one, but let me save you the trip: you just need to ask pmeenan@[the tool's domain].org for a key. Politely. Tell him I sent you. Promise not to abuse.
Looks like something somewhere on jsperf is doing window.prompt = function(){}, same for window.open and probably others. Makes sense, you don't want popup-y stuff (by the thousands) while running a test a gazillion times. So the bookmarklet has to go to window.__proto__ for the original prompt
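The prototype trick can be sketched like so. The helper name `nativeMethod` is mine. It assumes, as the bookmarklet does, that the original method still lives on the object's prototype; I have not verified that this holds in current browsers, where such methods may be defined on the window object itself.

```javascript
// Return the method as defined on the object's prototype, bound to the
// object, bypassing whatever the page assigned directly on the object.
// Returns null if the prototype has no such method.
function nativeMethod(obj, name) {
  var proto = Object.getPrototypeOf(obj); // what obj.__proto__ refers to
  if (!proto || typeof proto[name] !== 'function') {
    return null;
  }
  return proto[name].bind(obj);
}

// In the bookmarklet this would be used roughly as:
// var prompt = nativeMethod(window, 'prompt');
// var key = prompt('Your WPT API key?');
```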
Moving on.
Setting up the constant params of the API call. The variable param will be the location which will tell what browser to use. We also give the (undocumented) time a value of 60s, so that the test has time to run. We also want only one run and just the first run (no full cache run).
The URL to test will be the current page loaded in jsperf.com which is where you run the bookmarklet. And we'll append #run for autorun.
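As a sketch, the API call URL could be assembled like this. The function name `wptTestUrl` is mine, and the parameter names (`k`, `location`, `runs`, `fvonly`, `url`) are the WPT API ones as I know them, so verify against the API docs. The `time` parameter is the undocumented one mentioned above.

```javascript
// Build the runtest.php URL for one browser/location.
function wptTestUrl(apiKey, location, pageUrl) {
  var params = {
    k: apiKey,            // your WPT API key
    location: location,   // the variable part: which browser to use
    runs: 1,              // only one run
    fvonly: 1,            // first view only, no full cache run
    time: 60,             // undocumented: give the test 60s to run
    url: pageUrl.split('#')[0] + '#run' // #run makes jsperf autorun
  };
  var pairs = Object.keys(params).map(function (name) {
    return name + '=' + encodeURIComponent(params[name]);
  });
  return 'http://www.webpagetest.org/runtest.php?' + pairs.join('&');
}

// Browser usage, one tab per location (location strings are up to you):
// locations.forEach(function (loc) {
//   open(wptTestUrl(key, loc, document.location.href));
// });
```

Since jsperf also neuters window.open, the real bookmarklet would need the original open too, the same way it gets the original prompt.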
1. Go to any jsperf test, e.g. http://jsperf.com/array-proto-vs/3
2. Click the bookmarklet
3. Observe 5 new tabs with 5 IE versions running your test!
More browsers
In addition to the browsers (locations) I've defined you can always add more, like Chrome and Firefox. However you probably have these already handy so no need to kill WPT's servers. But the option is there, just edit your localStorage.wpt_locations
This post brought to you via Facebook engineers Jeff Morrison and Andrey Sukhachev, who discovered and helped isolate the issue.
Use case
Think a "single page app" use case. You click a button. Content comes via XHR. But content is complex (and app is as lazy-loading as possible) and content requires extra CSS. In an external file.
Only when the external CSS arrives should the app show the content. Otherwise content will be weirdly styled.
Execution
Two "modules" (or "widgets") of the app require two different CSS files. Both modules are requested at about the same time. We listen to onload of the CSS files. Expected behavior: whenever a module and its CSS dependency arrive - show that module. Asynchronously. No one cares which module shows first, as long as they show up as soon as possible.
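The expected behavior can be sketched as a tiny "gate" per module: show the module only once both its content and its CSS are in. This is my illustration, not the app's actual code; the names `createModuleGate`, `contentArrived` and `cssArrived` are made up.

```javascript
// One gate per module. `show` is called exactly once, as soon as both
// the module content and its CSS dependency have arrived, in any order.
function createModuleGate(show) {
  var content = null;
  var cssLoaded = false;
  var shown = false;

  function maybeShow() {
    if (!shown && cssLoaded && content !== null) {
      shown = true;
      show(content);
    }
  }

  return {
    contentArrived: function (html) {
      content = html;
      maybeShow();
    },
    cssArrived: function () {
      cssLoaded = true;
      maybeShow();
    }
  };
}

// Browser wiring (not executed here):
// var gate = createModuleGate(function (html) { el.innerHTML = html; });
// var link = document.createElement('link');
// link.rel = 'stylesheet';
// link.href = cssUrl;
// link.onload = gate.cssArrived;
// document.getElementsByTagName('head')[0].appendChild(link);
// xhr.onload = function () { gate.contentArrived(xhr.responseText); };
```

Each module gets its own gate, so the fast module never has to wait for the slow module's stylesheet, at least as far as the JavaScript is concerned.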
Experimentation
Two modules. Two CSS files. 1st CSS happens to take one second. The second CSS takes 5 seconds.
You know that browsers batch layout and paints tasks because these tend to be expensive. For example they wait for all CSS (even useless print and other @media stylesheets) to arrive and block the rendering of the page. (More on these topics: here, here)
So turns out that here webkit also waits for both CSS files to arrive before rendering anything.
You see in the console we know (in JavaScript) that CSS has arrived. But webkit (chrome, safari, mobile safari) doesn't paint anything, waiting for the second CSS. Bummer!
Issue #2: painting unstyled content
While issue #1 is just a bummer that can be done better for the progressive feedback-y user experience, #2 is a bug. This is the issue that Jeff and Andrey found and were floored.
If there's a paint going on between the two stylesheets, the browser dumps the unstyled content on the page. Ugly stuff.
This was only happening sometimes, but after forming and testing a hypothesis, I was able to distill a reproducible test case. The only change is: after the load of the first CSS, flush the rendering queue by requesting style information, e.g.
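The exact line from the test case isn't reproduced here, so take this as one common way to do it rather than the original: reading a layout-dependent property such as offsetWidth makes the browser flush pending style and layout work. The function name `flushRenderingQueue` is mine.

```javascript
// Reading a layout property forces the browser to apply pending
// style/layout changes before it can answer.
function flushRenderingQueue(el) {
  return el.offsetWidth;
}

// In the test page it would sit in the first stylesheet's load handler:
// firstLink.onload = function () {
//   flushRenderingQueue(document.body);
// };
```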
I was faced with "registration wall" trying to file a webkit bug, hence this post. Someone please file a bug and feel free to use the provided test cases.
The solution, IMO, is to make webkit behave like FF. No waiting for all CSS. This solves both issues. In the worst case, at least the unstyled bug (issue #2) should be addressed.
Interim solution for web developers: inline CSS required by the module together with the module content.
Problemo: there's no such information in HTTPArchive. However there's table requests with a list of URLs as you can see in the previous post.
Solution: Get a list of 1000 random jpegs (mimeType='image/jpeg'), download them all and run imagemagick's identify to figure out the percentage.
How?
You have a copy of the DB as described in the previous post. Now connect to mysql (assuming you have an alias by now):
$ mysql -u root httparchive
Now just for kicks, let's get one jpeg:
mysql> select requestid, url, mimeType from requests \
where mimeType = 'image/jpeg' limit 1;
+-----------+--------------------------------------------+------------+
| requestid | url | mimeType |
+-----------+--------------------------------------------+------------+
| 404421629 | http://www.studymode.com/education-blog....| image/jpeg |
+-----------+--------------------------------------------+------------+
1 row in set (0.01 sec)
Looks promising.
Now let's fetch 1000 random images, while at the same time dump them into a file. For convenience let's make this file a shell script so it's easy to run. And the contents will be one curl command per line. Let's use mysql to do all the string concatenation.
One way to do web performance research is to dig into what's out there. It's a tradition dating back from Steve Souders and his HPWS where he was looking at the top 10 Alexa sites for proof that best practices are or aren't followed. This involves loading each page and inspecting the body or the response headers. Pros: real sites. Cons: manual labor, small sample.
I've done some image and CSS optimization research grabbing data in any way that looks easy: using the Yahoo image search API to get URLs or using Fiddler to monitor and export the traffic and loading a bazillion sites in IE with a script. Or using HTTPWatch. Pros: big sample. Cons: reinvent the wheel and use a different sampling criteria every time.
Today we have httparchive.org which makes performance research so much easier. It already has a bunch of data you can export and dive into immediately. It's also a common starting point so two people can examine the same data independently and compare or reproduce each other's results.
Let's see how to get started with the HTTP archive's data.
(Assuming MacOS, but the differences with other OS are negligible)
1. Install MySQL
2. Your mysql binary will be in /usr/local/mysql/bin/mysql. Feel free to create an alias. Your username is root and no password. This is of course terribly insecure but for a local machine with no important data, it's probably tolerable. Connect:
$ /usr/local/mysql/bin/mysql -u root
You'll see some text and a friendly cursor:
...
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
While you're there get the latest DB dump. That would be the link that says IE. Today it says Dec 15 and is 2.5GB. So be prepared. Save it and unzip it as, say, ~/Downloads/dump.sql
Hm, I wonder what are common mime types these days. Limiting to 10000 or more occurrences of the same mime type, because there's a lot of garbage out there. If you've never looked into real web data, you'd be surprised how much misconfiguration is going on. It's a small miracle the web even works.
So the web is mostly made of JPEGs. GIFs are still more than PNGs despite all best efforts. Although OTOH (assuming these are comparable datasets), PNG is definitely gaining compared to this picture from two and a half years ago. Anyway.
It's you time!
So this is how easy it is to get started with the HTTPArchive. What experiment would you run with this data?
Asynchronous JS is cool but it still blocks window.onload event (except in IE before 10). That's rarely a problem, because window.onload is increasingly less important, but still...
At my Velocity conference talk today Philip "Log Normal" Tellis asked if there was a way to load async JS without blocking onload. I said I don't know, which in retrospect was duh! because I spoke about Meebo's non-onload-blocking frames (without providing details) earlier in the talk.
Stage fright I guess.
Minutes later in a moment of clarity I figured Meebo's way should help. Unfortunately all Meebo docs are gone from their site, but we still have their Velocity talk from earlier years (PPT). There are missing pieces there but I was able to reconstruct a snippet that should load a JavaScript asynchronously without blocking onload.
The demo page is right here. It loads a script (asyncjs1.php) that is intentionally delayed for 5 seconds.
Features
loads a javascript file asynchronously
doesn't block window.onload nor DOMContentLoaded
works in Safari, Chrome, Firefox, IE6789 *
works even when the script is hosted on a different domain (third party, CDN, etc), so no x-domain issues.
no loading indicators, the page looks done and whenever the script arrives, it arrives and does its thing silently in the background. Good boy!
* The script works fine in Opera too, but blocks onload. Opera is weird here. Even regular async scripts block DOMContentLoaded which is a shame.
Drawback
The script (asyncjs1.php) runs in an iframe, so all document and window references point to the iframe, not the host page.
There's an easy solution for that without changing the whole script. Just wrap it in an immediate function and pass the document object the script expects:
(function(document){
  document.getElementById('r')... // all fine
})(parent.document);
How does it work
create an iframe without setting src to a new URL. This fires onload of the iframe immediately and the whole thing is completely out of the way
style the iframe to make it invisible
get the last script tag so far, which is the snippet itself. This is in order to glue the iframe to the snippet that includes it.
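The three steps above, plus the part that actually pulls in the script, can be sketched like this. The first three steps follow the list; the rest (writing a body into the iframe whose onload appends the script) is my reconstruction of the technique, not a copy of the original snippet. The function names are mine.

```javascript
// Escape a string for use inside a double-quoted HTML attribute.
function escapeAttr(s) {
  return s.replace(/&/g, '&amp;').replace(/"/g, '&quot;');
}

// Markup written into the blank iframe: its own onload appends the
// script, so the host page's onload is not involved.
function iframeBootstrapHtml(url) {
  var js = "var js = document.createElement('script');" +
    'js.src = ' + JSON.stringify(url) + ';' +
    'document.body.appendChild(js);';
  return '<body onload="' + escapeAttr(js) + '">';
}

// Browser-only. `doc` is the host page's document.
function loadScriptInIframe(doc, url) {
  // 1. iframe without a src pointing to a new URL
  var iframe = doc.createElement('iframe');
  // 2. make it invisible
  iframe.style.cssText = 'width:0;height:0;border:0';
  // 3. glue it to the last script tag so far (the snippet itself)
  var scripts = doc.getElementsByTagName('script');
  var last = scripts[scripts.length - 1];
  last.parentNode.insertBefore(iframe, last);
  // Reconstructed part: write the bootstrap markup into the iframe.
  var frameDoc = iframe.contentWindow.document;
  frameDoc.open();
  frameDoc.write(iframeBootstrapHtml(url));
  frameDoc.close();
  return iframe;
}
```

Because the script ends up running inside the iframe, the wrapping trick from the Drawback section (passing parent.document) applies to whatever it loads.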
Say hello to the 3PO extension for YSlow. It checks your site for integration with popular 3rd parties, such as Facebook, Twitter widgets, Google Analytics and so on.
3PO (3rd party optimization) extension currently has 5 checks: two of them generic to all 3rd parties and three specific to Facebook plugins. I'm looking forward to adding more checks and more specific to a particular provider's best practices.
The extension is currently available as a bookmarklet, but since YSlow is a platform available on many platforms, it can be built as a Firefox or Chrome extension, command line tool, etc.
Install
Click this link to test, or drag to your bookmarks to install
Here's the list of checks along with some explanation.
Load 3rd party JS asynchronously
Category: Common
Use the JavaScript snippets that load the JS files asynchronously in order to speed up the user experience. Most providers offer you an asynchronous version of the script you're including on your page. If they don't, let them know and meanwhile do it yourself
If you don't include the script asynchronously, you create a SPOF (Single Point of Failure) and your site effectively goes down when the 3rd party goes down. See for yourself.
Load the 3rd party JS only once
Category: Common
Loading the 3rd party JS files more than once per page is not necessary and slows down the user experience. Sometimes people copy-paste snippets multiple times on the page, e.g. when you have one widget per blog post in a blog post listing. The script only needs to load once and serve multiple widgets.
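To illustrate what such a check boils down to, here's a sketch that finds script URLs included more than once. This is not the 3PO extension's actual implementation; the function name `duplicateScripts` and the normalization (ignoring the protocol and the hash) are my assumptions.

```javascript
// Given the src values of the scripts on a page, return the ones that
// appear more than once. http/https/protocol-relative count as the same.
function duplicateScripts(srcs) {
  var counts = Object.create(null);
  var dupes = [];
  srcs.forEach(function (src) {
    var key = src.replace(/^https?:/, '').split('#')[0];
    counts[key] = (counts[key] || 0) + 1;
    if (counts[key] === 2) {
      dupes.push(key); // report each duplicate only once
    }
  });
  return dupes;
}

// In a browser the input could come from:
// [].map.call(document.getElementsByTagName('script'),
//   function (s) { return s.src; }).filter(Boolean);
```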
Define XML namespace
Category: Facebook
If you use tags like <fb:like> you need to define an XML namespace to make the plugin work in old IE versions. Same for any tag that has :
Add an #fb-root element
Category: Facebook
The Facebook JS SDK needs an element with id="fb-root". So add this to your page, before you include the Facebook JS SDK
<div id="fb-root"></div>
Include OG (Open Graph) meta tags
Category: Facebook
Open graph tags let you better describe your content. To learn more, see the documentation. And run the tool to validate your page.
All the HTML tags are in there. But do you have all the tags in the page? Unlikely. So there's excess CSS even at the very base. It usually gets much worse from here. Whole features may or may not be in the page or combined in different ways, but the CSS to handle all combinations is always there, omnipresent.
To style="" attrib
I saw today that Mailchimp has this CSS inliner tool. (Because mail clients often strip <style>). It takes the <style> tags in the markup, strips them and adds style="" attributes where applicable.
I decided to give it a spin with Facebook like and Google search's HTML. Remember: these are two already highly optimized pages.
Assuming the tool works correctly, the results were pretty impressive.
Like: 8,133 bytes from 10,115 (20% reduction, 23% after `gzip -9`)
Search: 63,508 from 90,846 (30% reduction, 27% post gzip)
I know, I know what you'll say: inline style="" is an abomination. Should we bring <font> back? What about the cascade? Is this transformation needed on every page view with dynamic content, how's that scalable? What if there's a lot of content with the same class, lot of duplicates?
I know, I know.
But, but... look at the results. 25% reduction of the HTML payload!
With web development moving more and more toward transformations and compilation (css preprocessors, coffee script, minification, etc) it may not be unthinkable.
Back to Earth
On a more realistic note, just reduce the CSS to under 2K or thereabouts, inline it in the head, send it with the first server flush (even before any data fetching) and you'll be in a good place already!
There are two concepts to remember when working on your YSlow extensions and customizations:
rules (or "recommendations" if you will, or "best practices" or simply "lint checks"), and
rulesets which are lists of rules
An example rule is "Reduce HTTP requests". An example ruleset is "Small site or blog" (which is less strict than the default ruleset, because it assumes a small site has no CDN budget for example)
YSlow has a number of rules defined. How many? Easy to check once you have your setup from the last blog post. Open the console and go:
The weights define what is the relative importance of each rule in the final score. And the rules contain rule-name => rule-config pairs. Because each rule is configurable. For an example configuration consider the "Thou shalt use CDN" rule. The patterns that match CDN hostnames are configurable. So is the number of points subtracted from the score for each violation.
(I can talk more about scores, but it's not all that important. The thinking was that people might be offended by and disagree with the scores. So we should let them customize the scoring algo)
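To make the weights/rules split concrete, here's a rough sketch of what such a ruleset object could look like and how weights might feed a final score. The rule ids, the `points` and `patterns` options and the weighted-average formula are assumptions for illustration, not a dump of YSlow's actual defaults.

```javascript
// Hypothetical ruleset in the shape described above: an id, a display name,
// rule-name => rule-config pairs, and a relative weight per rule.
var sampleRuleset = {
  id: 'sample',
  name: 'Sample ruleset',
  rules: {
    ynumreq: {},                 // empty config: use the rule's defaults
    ycompress: {},
    ycdn: {                      // assumed option names, for illustration
      points: 10,                // points subtracted per violation
      patterns: ['^cdn\\.example\\.com$']
    }
  },
  weights: { ynumreq: 8, ycompress: 8, ycdn: 6 }
};

// One plausible way to fold per-rule scores (0-100) into a final score:
// a weighted average using the ruleset's weights.
function overallScore(ruleset, scores) {
  var total = 0;
  var weightSum = 0;
  Object.keys(ruleset.rules).forEach(function (name) {
    var weight = ruleset.weights[name] || 0;
    total += (scores[name] || 0) * weight;
    weightSum += weight;
  });
  return weightSum ? total / weightSum : 0;
}

console.log(overallScore(sampleRuleset, { ynumreq: 100, ycompress: 50, ycdn: 100 }));
```

With this formula a bad score on a heavy rule drags the total down more than the same score on a light one, which is the whole point of having weights.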
Alrighty, enough talking, let's create one new custom ruleset.
New ruleset from the UI
Click "Edit" next to the rulesets dropdown. A list of rules appears, each with a helpful hint on mouseover and a friendly checkbox for your checking pleasure
Click "New Set" to clear all default checks
Check the most "duh!" rules, those that require no effort and are just sanity
Click "Save ruleset as..."
Type a name, like "Duh", save
Congratulations! You have a new ruleset.
If that wasn't the bookmarklet version, YSlow would remember this new ruleset. But YSlow doesn't (yet) remember settings in bookmarklet version. (Try another YSlow run in a different tab if you don't believe it).
But you can still save your ruleset, and even share it with others in your team.
Coded ruleset
The above was the all-UI way of creating the ruleset. Behind the UI there's a simple JS object (that can be serialized to JSON for future use) that defines the ruleset as explained above.
Now just take this JSON string, paste it into your mystuff/stuff.js (from the previous post), clean it up a little and add a call to the YSlow API to register this new ruleset.
$ make bookmarklet config="config-phpied.js"; \
scp build/bookmarklet/* \
username@perfplanet.com:~/phpied.com/files/yslow
So we have our own ruleset and we can run it and it can spit out reports.
(Note: Small correction from the previous post: in the Makefile your mystuff.js should go before the bookmarklet controller, which is responsible for the initialization. Because you want your registerRuleset() call to run before the initialization)
(Another Note: Disable Chrome's cache if you're testing with Chrome, because it's pretty aggressive in this bookmarklet scenario)
If we decide to tweak the scores and weights a little bit (take 50 out of 100 points for a single non-gzipped component and increase the rule's relative weight), we can do:
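The original listing isn't reproduced here, so what follows is only a sketch of the idea. The ruleset id, the `points` option name and the scoring helper are assumptions; the `YSLOW.registerRuleset()` call is the one mentioned in the note above and only runs when the `YSLOW` global is actually around.

```javascript
// Hypothetical tweaked ruleset: a stricter gzip rule with a bigger weight.
var strictRuleset = {
  id: 'duh-strict',
  name: 'Duh (strict gzip)',
  rules: {
    ynumreq: {},
    ycompress: { points: 50 }  // assumed option: 50 points per violation
  },
  weights: { ynumreq: 8, ycompress: 20 }  // gzip now matters more
};

// Score for a single rule: start at 100, subtract `points` for every
// violation and never go below zero.
function ruleScore(config, violations) {
  return Math.max(0, 100 - violations * config.points);
}

// Register only when running inside YSlow, where the global exists.
if (typeof YSLOW !== 'undefined' && YSLOW.registerRuleset) {
  YSLOW.registerRuleset(strictRuleset);
}
```

So under these assumptions one non-gzipped component already halves the rule's score, and a second one zeroes it.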
As promised, let's set up for YSlow development using the easiest option - the bookmarklet version. The journey of conquering the world with your rules and extensions... starts with the first step.
Checkout
First you need to get teh codez. Go to the Github repository and click that big ol' Fork button. Then checkout the repository somewhere.
Alternatively, if you don't have a github account and don't care to install and deal with git on your computer, this is OK. Just download the repository as a 1.1MB zip file from:
For this next step you need make. Good luck if you're on Windows. On Mac, it seems the most "blessed" version you can get is by installing this package called Command Line Tools for Xcode. Which (I'm not sure but probably) also requires Xcode. Xcode is in the App Store. It's about 1.5GB. You go, I'll wait.
In the top directory of the code you've downloaded, there's a readme and /src (where it gets interesting) and a Makefile.
Since we're building the bookmarklet we'll go like:
$ make bookmarklet
But. Not yet. First things first.
The bookmarklet consists of one largish JS file and one smallish CSS. The bookmark that you'll click in the browser will load the JS file. Then the JS needs to know where to find the CSS. So you need a bit of config.
If you look in /src/bookmarklet you'll see some config-*.js files. You need a new one for you too.
If you've already installed the YSlow bookmarklet in your browser, you can go and edit the location of the JS file. If not, visit http://yslow.org/mobile for the instructions.
All you need to do is change yslow.org to your location, e.g. phpied.com/files/yslow.
Then bookmark the page.
Then edit the bookmark and remove everything up to and including the hash # (http://yslow.org/mobile/#)
Run
Go to a page of your choosing
Click the bookmarklet
See the YSlow UI appear
It works so well that you even need to look at a network analyzer to believe it's really using your own hosted version.
And your own version is just a big javascript really, so there's nothing new and nothing extension-y to learn like XUL or manifest.json. You can just start tinkering immediately. You can even edit that .js file directly and make it like a real tight web programming loop: save-refresh. Or you can edit source files and rebuild, repush combining the make and scp commands. Let's do that.
Console: the best friend
YSlow takes extra care to run unobtrusively to the page. In an iframe, not leaving any globals behind. Meh, I want to play in the console. So I want to access the two globals: YUI and YSLOW. Let's see how you add your codes to YSlow. That's as good an exercise as any.
Create a new file in a new dir like: mystuff/stuff.js with this in it:
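The listing itself isn't shown here. As a rough idea of what such a file could do, assuming the only goal is to make the two objects reachable from the console, here's a minimal sketch. The helper name is made up, and writing to `window.top` assumes YSlow's iframe is allowed to touch the top window.

```javascript
// Copy selected objects onto a window-like target so they can be reached
// from the console. Returns the names that were actually exposed.
function exposeForConsole(target, objects) {
  return Object.keys(objects).filter(function (name) {
    if (typeof objects[name] === 'undefined') { return false; }
    target[name] = objects[name];
    return true;
  });
}

// Inside the bookmarklet build both YUI and YSLOW would be in scope; the
// typeof guards keep this file harmless anywhere else.
if (typeof window !== 'undefined') {
  exposeForConsole(window.top, {
    YUI: typeof YUI !== 'undefined' ? YUI : undefined,
    YSLOW: typeof YSLOW !== 'undefined' ? YSLOW : undefined
  });
}
```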
Since version 2.0, YSlow is no longer just a tool, it's a platform. You can create your own rules (performance or otherwise), combine them into rulesets, tweak scores to your liking and so on.
Then Marcel took over and did version 3.0. YSlow can now run in many many environments: as a Firebug extension (like version 1.0 did), as a Firefox extension, Chrome extension, command-line and so on... including running as a bookmarklet in any browser (including mobile browsers). Funny aside is that YSlow version 0.XYZ was originally just a bookmarklet. Now it's a bookmarklet among everything else.
Now, setting up for browser extension development can be intimidating if you've never done it. But worry not, I want to show you how you can create YSlow extensions and customizations knowing nothing but JavaScript.
We'll be using the bookmarklet version for development.
What's even lovelier is that YSlow is open source now on Github.
Stay tuned
I wish I could tell you more, but it's father's day and the backyard BBQ party (including a rare live appearance from Anaconda Limousine) starts in an hour. And something tells me I won't feel very bloggy after the party. So YSlow would have to wait.
So I was flipping through recent slides from Steve Souders and came across a reference to a nice post from Pat Meenan explaining how he setup blackhole.webpagetest.org and how you can edit your hosts file to send third party scripts to the black hole simulating a firewall-blocked or down third party and the effect on your site. (whew, long sentence)
I thought it would be nice to make that easier and have people see (and demonstrate to bosses and clients) how damaging frontend SPOF (Single Point Of Failure) can be. A browser extension maybe. A Chrome extension, because I've never made one. The idea marinated undisturbed for a few days and last night all of a sudden I got to work.
Well, yeah. What happens to your site when a 3rd party goes down? Does it still work?
Is it true that your site is only down when it's down? Or it's down when:
It's down or
Facebook is down or
Google is down or
Twitter is blocked in your office or
code.jquery.com is down ...and so on and so on.
This extension helps you check what happens with a click of button.
What 3PO#fail does
Very simple: it's looking for scripts from a list of suspects (api.google.com, platform.twitter.com, etc) and redirects them to blackhole.webpagetest.org
Install the extension. Load your page. Or mashable.com for example. Then this happens:
It's a button with # on it. Click it. It turns red.
The extension now listens to script requests made to one of the suspect domains.
Now shift-reload the page. If a 3rd party script is found, it's redirected to the black hole and then a counter appears.
Observe whether or not the page is usable when a third party is down. Enjoy and demo to your boss. Tell them: sites do go down, companies ban social networking sites, and btw what do you think will happen when you visit China and load our site?
If you're looking for a page to try, go to mashable or business insider or, ironically, test the extension's page in Chrome web store. Turns out they include Google+'s button synchronously.
Dupe
Here come the LOLz. I blasted this extension out to Steve Souders and back he came with: doh, Pat Meenan also did a Chrome extension to do just this.
Bwahaha. What? You snooze, you miss a whole new tool by Pat Meenan himself.
Here's Pat's extension: SPOF-O-Matic. Try it, use it. It looks more thought out than mine definitely. And there's more code. Maybe Pat spent more time than a night on it. Or maybe he didn't, he's an amazing hacker and a half! I mean, uh, webpagetest, hello!
I'll definitely "borrow" his list of 3rd parties which has more entries than mine.
Oh well, you live, you learn (to write Chrome extensions)
Chrome extensions
Creating a Chrome extension was a first for me and was mostly frictionless. Well documented, plenty of samples (try to browse the samples in the repository, because downloading ZIP files is too many clicks). Debugging the extension in the same web inspector is a big plus! Overall I think it's easier to write a Chrome extension than a FF one. Although the last I checked, FF has improved a lot.
Now for the nitpicks.
The API is sometimes irritating. I mean things like
setTitle({title: "My title"});
or
setBadgeText({text: "My text"});
Duplicating title, title, title is annoying. Sometimes it's title, sometimes text, or path or name. Method name appears short but in fact you have to remember one more thing - a property name in a config object. Sounds more like setTitleWithTitle(title) which is just as ridiculous (and popular in Obj-C it seems). Anyway.
The web store asks you for 5 bucks to register and submit an extension. Credit card and all. I didn't like that.
My extension was held for a review which doesn't always happen. The help section says 2-3 business days, but it turned out to be only hours for me. Got a nice email saying the extension is approved and also an explanation why it was held for review. Nice touch.
Code
The code is here: https://github.com/stoyan/3PO-fail. There's not a lot of it. A manifest file and a script that listens to specific URLs and request types in an onBeforeRequest event.
Stripping away UI stuff here's all there is to it.
There's no logic here because the API allows you to let the browser do request inspection and filtering for you. Here all you do is return an object with a redirectUrl property.
You specify your callback to be invoked only for script requests and only those that match a URL in the url array (see above)
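The description above can be sketched roughly like this, using Chrome's `chrome.webRequest.onBeforeRequest` with a blocking listener. The suspect list, the helper names and the `http://` scheme on the black hole URL are assumptions for illustration, not the extension's actual code.

```javascript
// Example suspects only; the real extension ships its own list.
var SUSPECTS = ['platform.twitter.com', 'connect.facebook.net'];
var BLACKHOLE = 'http://blackhole.webpagetest.org/';

// Turn hostnames into match patterns for the request filter.
function toPatterns(hosts) {
  return hosts.map(function (host) { return '*://' + host + '/*'; });
}

// The whole "logic": answer every matching request with a redirect.
function redirectToBlackhole() {
  return { redirectUrl: BLACKHOLE };
}

// Wire up the listener only where the webRequest API is available, i.e.
// inside an extension that has the webRequest permissions.
if (typeof chrome !== 'undefined' && chrome.webRequest) {
  chrome.webRequest.onBeforeRequest.addListener(
    redirectToBlackhole,
    { urls: toPatterns(SUSPECTS), types: ['script'] },  // browser-side filter
    ['blocking']                                        // needed to redirect
  );
}
```

The filter object is what keeps the callback free of conditionals: the browser only calls it for script requests whose URL matches one of the patterns.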
The end to the SPOF
All you have to do is load third party scripts asynchronously. See here the BFF function for an example. Yet, so many sites are not doing it. There's a need for people to understand this problem. Let's call it demand for advocacy. And now there's supply of 2 brand new tools that make it in-your-face obvious what the damaging effects are.
Random
I went over some of the pages that Steve has listed in his calendar blog post: Business Insider and O'Reilly. O'Reilly is better now and it uses my BFF script (nice, 'scuse me there's something in my eye). Business Insider is almost there. The social stuff is async now, but code.jquery.com is still a SPOF. Funny enough they have a blocking script tag pointing to twitter, but it has a class "post-load". So a script kicks in before this tag and replaces it with async loading. I wonder: why the trouble and not just go async to begin with?
In the spirit of the true high-performance non-blocking asynchronous delivery, we'll have the Web Performance Daybook volume 2 published before volume 1. I hope you'll enjoy reading the book as much as I enjoyed working on it and rubbing (virtual) shoulders with some of the brightest people in our industry.
Back in December of 2009 I wanted to give an overview of the web performance optimization (WPO) discipline. I decided on a self-imposed deadline of an-article-a-day from December 1st to 24th: the format of an advent calendar similar to 24ways.org. As it turned out, 24 articles in a row was quite a challenge and so I was happy and grateful to accept the offers for help from a few friends from the industry: Christian Heilmann (Mozilla), Eric Goldsmith (AOL) and two posts from Ara Pehlivanian (Yahoo!).
The articles were warmly accepted by the community and in the following year, in December 2010, the calendar was already something people were looking forward to reading. The calendar also got a new home at http://calendar.perfplanet.com as a subdomain of the “Planet Performance” feed aggregator. And this time around more people were willing to help. Developers from all around our industry were willing to contribute their time, to share and spread their knowledge, announce new tools, and this way create a much better set of 24 articles than a single person could. This is what soon will become volume 1 of the series of Daybooks.
Then came December 2011 and we had so much good content and enthusiasm that we kept going past December 24, all the way to December 31st and even publishing 2 articles on the last day. This is the content that you have in your hands in a book format as Web Performance Daybook vol.2.
Our WPO community is young, small, but growing, and in need of nourishment in the form of community building events such as the advent calendar. That's why it was exciting to have the opportunity to collaborate on this title with O'Reilly and all 32 authors. I'm really happy with the result and I know that both volumes will serve as a reference and introduction to performance tools, research, techniques and approaches for years to come. There's always the risk with outdated content in offline technical publications but I see references to the calendar articles in the latest conferences today all the time, so I'm confident this knowledge is to remain fresh for quite a while and some of it is even destined to become timeless.
Enjoy the book, prepare to learn from the brightest in the industry and, most of all, be ready to make the Web a better place for all of us!
(So far the expected publication date is Velocity US, end of this month, fingers crossed!)
Oh, and authors' royalties are donated to a charity and a WPO charity at that. (Stay tuned for the announcement.) So feel free to get a copy for everyone in your org, it's for a good cause.
Back when I was still actively into speaking at public events (way, way back, something like year and a half ago (which strangely roughly coincides with the time I joined Facebook, hmmm (hmm? (huh? what's with the parentheses? sure all of them are closed at this point?)))) I remember showing this slide:
Scott observed that (with one happy Opera nonsense exception) all browsers will load all this junk, all this CSS that they don't need.
(BTW, Opera 11.64 loaded nonsense css for me too)
Blocking rendering?
Having recently remembered how browsers block rendering because of print stylesheets, I speculated that all the nonsense media will also block rendering. Unfortunately I was right.
So not only do browsers download useless bytes, but they also block the rendering of the page (or block window.onload, or both) until all the crap is downloaded. And by blocked rendering I mean showing a white page of death. Most browsers wait until all CSS is loaded because they don't like doing extra layouts and painting (except Opera).
The correct browser behavior should be:
1. load only the CSS you need
2. render
3. fire onload
Maybe even:
0. render if step 1. takes too long
Instead, randomness ensues: Firefox treats us to a white page for 10 seconds while downloading nonsense. Chrome takes 15 seconds to fire onload. (see the print CSS post for more)
Browsers download CSS they don't need, e.g. print, tv, device-ratio... And most browsers (except Opera and Webkit) block rendering because of these too
Sometimes CSS blocks the other downloads too (not just block rendering, but block images and scripts that follow):
When building high-performance pages we want to stay off the critical path. Critical is the path from the user following a link to the first impression and then the working experience. That's why we load javascript asynchronously and so on.
But I argue that CSS is not only on the critical path, it is the critical path. And because it's a jungle (network, 3g, edge) out there, anything on the critical path will fail. Guaranteed.
Think about this: you have an HTML page and then you have components. Without the HTML, there is no path really. Game over. Without images? Depends on the page, but you can live without images most of the time. Without JavaScript? Well you should build the pages so the important stuff, links, forms, content works without javascript. Without webfonts? You're kidding me, I don't need no stinkin' fonts when I'm late and running to the airport and checking in for the flight on the damn phone with the spotty mobile network while Wifi wants to connect and I have to say no, because if I say yes I'll wait for another page where I have to say "I accept" and aim at a minuscule checkbox with these sweaty fat fingers or worse I have to enter usernameandpassword, and omg-omg-OMG mobile.southwest.com wants to look like native iPhone and won't let me click until mountains of JS arrive, so no, don't talk to me about no damn fonts!
What's left on the critical path is CSS. Not only is the page ugly without CSS (we can live with that), but there is no page without CSS, because the browser waits and waits and takes forever to time out, showing us a blank white page.
Get the CSS out of the way
So if you worry about performance, you should get the CSS out of the way as soon as possible. Get off the critical path. Make CSS small, minify, compress, load from the same hostname even (no DNS) and inline, if small enough. Yup, inline.
Take a look at these highly optimized experiences...
Look ma, no CSS!
Yes, these pages make no CSS requests whatsoever.
If your CSS is not puny enough to be all inline (Guy has some observations on what puny means) it should at least be a single file, way at the top of the document, with the first flush. Just get it over with. Your users will love you and praise you and use words like smooth and snappy.
NOTE: This is late night mumbling from a week or two ago, leaving here for posterity. Don't read it. Meanwhile Steve wrote up a proper blog post which is highly recommended: Self-updating scripts. Read his post instead!
tl;dr: Load the component in an iframe, then reload()
Backstory
Steve Souders and I were chatting last week about his blog post and he was expressing his disappointment with the status quo of the short expiration time of those omnipresent third party scripts like the Facebook SDK (which loads Like button among others). We need short expiration on those because we need to be able to push quick fixes to critical bugs. And we cannot ask webmasters to keep up with versioned file names.
So the question is: how do you set a far-future Expires header on a static URL (like http://connect.facebook.net/en_US/all.js) and at the same time be able to update the content of this file without changing its filename?
Ahaa!
Today I had this sudden moment of clarity that we can use location.reload() of an iframe that contains the resource in question.
So I can load the component in an iframe and then reload() the iframe. Cross-origin restrictions kick in when the static resource is on a CDN domain. This can be solved by loading an HTML page in an iframe and reloading it. All resources referred to by this page will be revalidated with a conditional GET.
The versioncheck.html does nothing but include the static resource. Here it is.
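On the embedding page, the iframe-then-reload() part could look something like this sketch. The function name and the `/versioncheck.html` path in the usage note are placeholders, and calling reload() from the parent like this assumes the iframe's document is same-origin with the caller; otherwise the page inside the iframe would have to reload itself.

```javascript
// Load a tiny HTML page (which includes the static resource) in a hidden
// iframe, then reload it once so the browser revalidates everything in it.
// `doc` is a document; `pageUrl` points at the version-check page.
function revalidateViaIframe(doc, pageUrl) {
  var iframe = doc.createElement('iframe');
  var reloaded = false;
  iframe.style.display = 'none';
  iframe.onload = function () {
    if (reloaded) { return; }   // the reload fires onload again; stop there
    reloaded = true;
    iframe.contentWindow.location.reload();
  };
  iframe.src = pageUrl;
  doc.body.appendChild(iframe);
  return iframe;
}
```

You'd call it lazily, e.g. `revalidateViaIframe(document, '/versioncheck.html')` some time after window.onload, so the extra requests stay out of the user's way.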
Results
Safari, IE9, FF, Chrome are all consistent - when you reload() the versioncheck.html, conditional GETs go out to the HTML and all the resources in it. Then the server can reply 304 if the static resource hasn't changed. Or send a new file if the content has changed.
Not entirely trusting the browser tools, I also validated the 200 and 304 responses by tailing the access log file on the server.
Conclusion
Plus: you can update the content of static URL with far-future Expires fairly easily.
Minus:
1. Additional 304s. Two if CDN domain is different than the page, 1 otherwise.
2. The user sees the updated resource on the next page view, sorta like App Cache
So for those omnipresent third party JavaScripts... They currently have short expiration times - between 20 mins and 2 hours. In that time if the user needs the script again, no requests go out.
But if you use the technique outlined, you can set their Expires header to 10 years in the future and for every single page view make two more conditional requests with likely a 304 Not Modified response. These extra requests, of course will not interfere with the user experience, because they will be lazy.
So you get faster user experience with more HTTP requests and delayed-till-next-page-view update.
In the experiment I have screen.css delayed with 5 seconds and print.css delayed 10 seconds.
Results 5 years ago
Browsers blocked rendering waiting for print.css. Some took 10 seconds (downloading print.css and screen.css in parallel), some took 15. Why, oh why? It's a print CSS, you don't need this sh...eet.
Results today
Good guy Opera, doesn't even wait for screen.css. After some timeout, O renders unstyled page and restyles after screen.css arrives. Yes, brave O takes rendering risks this way, all others wait for the screen.css before styling anything. Still, onload fires in ~10s, so this is bad. All your onload JS code is blocked on a useless print.css
FF blocks rendering on the print.css. Boo! Nothing renders for 10 seconds. And it fires onload after ~10s. Boo-boo! At least it loads both stylesheets in parallel. Galaxy (Android) waits for print.css too. How often do you print from a mobile device? Same in IE8 and IE9. Even worse in IE: the DOMContentLoaded event also waits for 10 seconds. speech=less.
Safari, Chrome, Mobile Safari - render after 5 seconds, meaning only after screen.css. There is hope for humanity. However the onload fires in 15 seconds. So the two CSS files are downloaded sequentially. Kinda makes sense, print.css is low priority and should give way to everything else. Still could start earlier if there are no other downloads competing for precious resources.
So on the wall of shame - IE worst, FF yuck, Webkit bad, O least bad.
Recommendation
Ditch media="print" if you have one! (Hey why isn't this a yslow/pagespeed rule?). Ditch it because in the best case scenario it will only block onload. In the worst case it will block initial paint, onload and DOMContentLoaded. Sitting in front of a white page with no feedback is the worst possible user experience.
Put all (should be minimal anyway) print rules inline in your normal screen stylesheet.
The web performance and operations conference, Velocity Europe, is just around the corner. This always sold-out event is making its EU debut this year in Berlin.
Get your ticket now (with 20% discount no less) or punish your users and clients with slow user experiences
TL;DR: Loading JavaScript asynchronously is critical for the performance of your web app. Below is an idea how to do it for the most common social buttons out there so you can make sure these don't interfere with the loading of the rest of your content. After all people need to see your content first, then decide if it's share-worthy.
Facebook now offers a new asynchronous snippet to load the JavaScript SDK, which lets you load social plugins (e.g. Like button) among doing other more powerful things.
It has always been possible to load the JS SDK asynchronously but since recently it's the default. The code looks pretty damn nice (I know, right!), here's what it looks like (taken from here):
immediate (self-invoking) function so not to bleed vars into global namespace
pass oft-used objects (document) and strings ("script", "facebook-jssdk") to the immediate function. Sort of rudimentary manual minification, while keeping the code readable
append script node by using the first available script element. That's 99.99% guaranteed to work unless all your code is in body onload="..." or img onload or something similar (insanity, I know, but let's allow generous 0.01% for it)
assign an ID to the node you append so you don't append it twice by mistake (e.g. like button in the header, footer and article)
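For reference, here's a sketch in the spirit of that snippet that shows the points above. In a page it would be wrapped in an immediate function; here it's a named function (the name is made up) so it can also be exercised outside a browser, and the SDK URL should be checked against Facebook's current docs.

```javascript
// Appends a script node with the given id unless one is already present.
// Returns true when a node was inserted.
function loadSdk(d, s, id) {
  var js, fjs = d.getElementsByTagName(s)[0];  // first script on the page
  if (d.getElementById(id)) { return false; }  // already there: do nothing
  js = d.createElement(s);
  js.id = id;
  js.src = '//connect.facebook.net/en_US/all.js#xfbml=1';
  fjs.parentNode.insertBefore(js, fjs);
  return true;
}

// The guard only exists so the sketch is harmless outside a browser.
if (typeof document !== 'undefined') {
  loadSdk(document, 'script', 'facebook-jssdk');
}
```

Passing document and the two strings as arguments is the "manual minification" bit: inside the function they're just d, s and id.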
All buttons' JS files
Other buttons exist, most notably the Twitter and Google+1 buttons. Both of these can be loaded with async JavaScript whether or not this is the default in their respective configurators.
So why not make them all get along and shelter them under the same facebook immediate function? We'll save some bytes and extra script tags in the HTML. For G+/T buttons all we need is a new script node. Google+'s snippet has some additional attribs such as type and async, but these are not really needed. Because type is always text/javascript and async is always true. Plus we kinda take care of the async part anyways.
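A sketch of what sheltering all three under one function might look like. The ids and script URLs are illustrative (grab the current ones from each provider's configurator), and the function name is made up.

```javascript
// Illustrative ids and URLs; check each provider's configurator.
var BUTTON_SCRIPTS = [
  { id: 'facebook-jssdk', src: '//connect.facebook.net/en_US/all.js#xfbml=1' },
  { id: 'gplus1js', src: 'https://apis.google.com/js/plusone.js' },
  { id: 'tweetjs', src: '//platform.twitter.com/widgets.js' }
];

// Insert one script node per entry, skipping ids already in the document.
// Returns the ids that were actually added.
function loadButtons(d, s, scripts) {
  var fjs = d.getElementsByTagName(s)[0];
  var added = [];
  scripts.forEach(function (script) {
    if (d.getElementById(script.id)) { return; }
    var js = d.createElement(s);
    js.id = script.id;
    js.src = script.src;
    fjs.parentNode.insertBefore(js, fjs);
    added.push(script.id);
  });
  return added;
}

// The guard only exists so the sketch is harmless outside a browser.
if (typeof document !== 'undefined') {
  loadButtons(document, 'script', BUTTON_SCRIPTS);
}
```

No type or async attributes here: script nodes inserted from JS don't block the parser anyway.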
Next is actually advising the scripts where the widgets should be rendered. Facebook offers XFBML syntax, with tags such as <fb:like>, but it also offers pure HTML(5) with data-* attributes. Luckily, so do all others.
Here's an example:
<!-- facebook like -->
<div class="fb-like" data-send="false" data-width="280"></div>
<!-- twitter -->
<a class="twitter-share-button" data-count="horizontal">Tweet</a>
<!-- g+ -->
<div class="g-plusone" data-size="medium"></div>
G+ requires a div element (with g-plusone class name), Twitter requires an a (with a twitter-share-button class name). Facebook will take any element you like with a fb-like class name (or fb-comments or fb-recommendations or any other social plugin you may need)
Also very important to note that you can (and should) load the JS files once and then render as many different buttons as you need. In Facebook's case these can be any type of plugin, not just like buttons. Economy of scale - one JS file, many plugins.
All together now
So here's the overall strategy for loading all those buttons.
Copy the JS above at the bottom of the page right before /body just to be safe (G+ failed to load when the markup is after the JS). This also gives you a single place to load the JS files, although the snippet takes care of dedupe-ing.
sprinkle plugins and buttons any way you like anywhere on your pages using the appropriate configurator to help you deal with the data-* attributes (FB, G+, Tw)
Enjoy all the social traffic you deserve!
To see it all in action - go to my abandoned phonydev.com blog. Yep, those buttons play nice in mobile too.
#1 This guest post from Billy Hoffman is the last post in the Velocity countdown series. Velocity starts first thing tomorrow! Hope you enjoyed the ride and please welcome Billy Hoffman!
Billy Hoffman (@zoompf) is the founder and CEO of Zoompf, a web performance startup whose scanning technology helps website owners find and fix performance issues which are slowing down their sites. Previously Billy was a web security researcher at SPI Dynamics and managed a research team at HP. He can open a Coke can without using his hands.
(tl;dr: Images make up the majority of the Internet, yet we consistently fail to apply the most basic of optimizations. Even big sites like Twitter are completely screwing this up. Furthermore, there are huge unexplored areas when it comes to image optimization which would provide significant savings. We should stop worrying about esoteric performance optimizations when there is so much other low hanging fruit.)
Images constitute the bulk of content on the Internet, both in terms of content size and number of resources. Using data from the wonderful HTTP Archive we see that 60% of the bytes that make up an Alexa Top 1000 website are images. The average webpage references 81 external resources, and 64% of these are images. And the dominance of images is growing. In the last 6 months, total page content size increased by 70 kB. 75% of that increase (52 kB) came from images.
We know that lossless image optimization tools reduce content size anywhere from 5-20%. Occasionally you will see 70% savings or more, but that only happens when the image contains an embedded thumbnail. That level of savings doesn't sound all that impressive. After all HTTP compression can save 60-70% on real world text resources like HTML, JavaScript, or CSS. However text resources only make up on average 188 kB, or 24% of total content size. Saving 66% on 24% of content saves about as much as 5-20% savings on 60% of the content. In fact, if you could reduce images by 25%, that would have more of an effect on reducing total content size than using HTTP compression!
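A quick back-of-the-envelope check of that comparison, using only the shares quoted above (24% text, 60% images). The helper name is made up and the reduction figures are the ones from the paragraph.

```javascript
// Shares of total page weight quoted in the text.
var TEXT_SHARE = 0.24;
var IMAGE_SHARE = 0.60;

// Fraction of the whole page saved when `reduction` is applied to `share`.
function pageSavings(share, reduction) {
  return share * reduction;
}

var rows = [
  ['text, 60% compression', pageSavings(TEXT_SHARE, 0.60)],
  ['text, 70% compression', pageSavings(TEXT_SHARE, 0.70)],
  ['images, 20% lossless', pageSavings(IMAGE_SHARE, 0.20)],
  ['images, 25% reduction', pageSavings(IMAGE_SHARE, 0.25)]
];

rows.forEach(function (row) {
  console.log(row[0] + ': ' + (row[1] * 100).toFixed(1) + '% of page bytes');
});
```

With these inputs text compression is worth roughly 14-17% of total page bytes, and a 25% cut in image bytes is worth 15%, so the two land in the same range.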
If you work in front-end performance, none of this should be a surprise. Obviously any front-end performance strategy needs to include image optimizations. Image optimization is an old topic. Shouldn't we instead be focusing on more esoteric optimizations, like refactoring CSS rules so that external fonts render faster on Blackberry Webkit? No, we shouldn't, because sadly we collectively suck at optimizing images.
Give PNG a Chance? Nope.
One of the most basic image optimizations that you can make is converting GIFs to PNGs. PNGs can do everything that GIFs can do and more, and the browser issues with PNGs are largely a problem of the past. Even without applying additional lossless optimization tools on the PNG, converting a GIF file will almost without exception create a smaller PNG. This is because the fundamental way graphics data is compressed in a PNG file, using the DEFLATE algorithm, is more efficient than GIF's LZW compression scheme. Once you apply lossless tools on the converted PNGs they get even smaller. Animated GIFs are the exception here, as PNGs are not animated and alternatives for simple animations (MNG, Flash) are either not widely supported or result in larger files. So what is the breakdown of image formats on the web today?
37% of images on the Alexa Top 1000 websites are GIFs. That makes no sense given what we know about PNGs over GIFs. 37% of the images on the Internet are not animated "Under construction" icons or Ajax status thumper animations. People are not being intelligent about file formats they use for images.
The Internet, now with more bloat!
Applying lossless image optimization tools is one of the simplest optimizations to do. Take an image, run a program, get an optimized image. Stoyan and I love optimizations like this because they are so easy to automate. Just add a step to the website build process or to your staging-to-production publishing process that automatically optimizes images. It should be transparent, something you setup once and forget about. So how are we doing?
82% of Alexa Top 1000 websites contain images which were not losslessly optimized. Applying lossless optimizations across all the images from the Alexa Top 1000 would reduce file size by an additional 15%.
Surely there are just a small number of smaller sites which aren't properly optimizing images and which are pulling down the statistics, right? Sadly no. Twitter, the ninth largest website in the world by traffic, doesn't losslessly optimize any of their images. 33% of total page load bytes could be eliminated solely by applying lossless image optimization. Let me phrase that a different way: 1 out of every 3 bytes Twitter sends you is unnecessary! This is an incredible waste.
Unplowed Fields
It's clear we are not applying the image optimizations we already know about. However, there is much more work to be done with images. This is a largely unresearched and unadvocated area which needs more attention.
Consider choosing the correct image format. Are people saving images as a PNG when they should be saved as a JPEG? Indeed they are. Tumblr's background image is a 76 kB PNG image and it would be 33 kB (55% smaller) if it was a JPEG. This is better than their old 827 kB PNG background image, which would be 47 kB (94% smaller) if it was a JPEG. Unfortunately I know of no other tool besides Zoompf's free performance scan which identifies PNG candidate images for conversion to JPEG.
What about JPEGs saved with a high quality setting? This is a large enough topic for its own blog post. To quickly summarize, JPEG "quality" is an arbitrary, non-linear scale. Quality is not a percentage of anything, and "quality of 80" does not mean "discard 20% of graphics data." Thought leaders like Adobe recommend a quality setting of 70-80 for JPEGs published on the web. Zoompf found that 36% of Alexa Top 1000 images have a quality setting over 80, and reducing them to quality 70 would on average reduce image size by 48%! While not all of these images can be reduced in quality, surely some of them can. Again, this is an area that needs more attention, more best practices and guidance, and more tools to help validate.
Not "Instead of" but "In addition to"
I am not saying other performance optimizations are not important. Zoompf checks for over 380 performance issues and we are adding more all the time. Many of them are esoteric and low impact. We flag things like duplicate cookies, unnecessary HTTP headers, and even when your <META> contains duplicate keywords. However, these checks are for when you have already handled all the more important ones. Image optimizations, and research into new image optimization techniques, should be done not instead of other work, but in addition to it. Just remember to prioritize what you are working on so that it will affect the greatest number of people in the largest possible way.
Conclusions
Images are a huge component of the web and modern web performance. This importance is only growing. Sadly, there are only one or two widely recognized image optimization techniques. Unfortunately, these most basic optimizations are ignored, forgotten, and not uniformly applied by even the largest of websites today. Additionally, there are a lot of unexplored areas of image optimization, including lossy image optimization, with no clear recommendations or best practices and virtually no tool support. Some areas for further research include:
Lossy image optimizations
Comparison of JPEG encoders
PNG-to-JPEG and GIF-to-JPEG best practices, recommendations, and processes
Image quality for Desktop vs. mobile browsing experiences
I will be discussing many of these topics this week during my presentation Take it all Off! Lossy Image Optimization at Velocity 2011 on Wednesday. I hope you all can make it.
With only 2 days to Velocity, it's time for a drop in the quality of these posts (but the one tomorrow will be great, I promise) with today's announcement of the immediate availability of the project called http://sultansofspeed.com.
I think we've had enough of experts, gurus, ninjas, jedis, pirates and overlords. Time for the sultans to step in!
So there: a slideshow of bios and photos of a number of Web Performance Sultans.
The background music is my heavy metal cover (sorry!) of "Sultans of Swing" by Dire Straits.
The Sultans you see there are the people who have written for the Perfplanet Calendar. But this is just the initial seed. (And because these are the bios/photos I have easy access to.)
Are you a sultan? Add/delete/edit your bio in the GitHub repo in the sultans.js file.
Without further ado, please point your browser to the newborn bookofspeed.com.
It's a free (public domain), online, open-source, not yet finished, book about web performance.
Contributions welcome
The source files are on GitHub - https://github.com/stoyan/Book-of-Speed. I'll be glad to receive any errata, technical mistakes, requests, grammar checks, anything really. Just edit the stuff in /src and send a patch. /src holds only the text of the chapters; what you see on the site and in the main directory (the TOC and the chapters) is generated by a build script (in JavaScript, of course).
How did we end up here
A year and a half ago I did this Performance advent calendar experiment (since moved to a new home), writing an article a day for 24 days (sounds vaguely familiar?). PeachPit press approached me about publishing a book based on those. PeachPit publishes mostly web design books (like Designing with Web Standards) and I thought designers should know about performance. Also business folks, product managers. So why not write something more accessible and less technical?
Fast forward... I kept missing deadlines (a favorite thing, ask Douglas Adams) until eventually, after 5 and a half chapters out of 9, the publisher decided to cancel the project. Fair enough. Wasn't meant to be. We're grown-ups, no hard feelings. (Well, I did try to save the project by suggesting that Marcel Duran, who now works on YSlow, finish it. PeachPit expressed interest initially but then didn't bother to follow up with a comment or explanation.)
So instead of letting PeachPit keep the content and maybe publish it on their site, I decided to keep the chapters and return the royalty advance they had given me. After all, I had wanted to try self-publishing for some time.
Fast forward again... I didn't do anything further. Changing computers, failing disks and non-existent backups convinced me I should set this content free sooner rather than later. "Information wants to be free". So I managed to restore the text from emails (but not the images, I had to copy those from Word) and thought the Velocity countdown was a good excuse to release this thing.
I mentioned the project to my good friend and designer Yavor two days ago. He had a few free cycles and sent me a mock. Awesome! The only "brief" I gave him was "it's to be a free online book, like diveintohtml5.com and eloquentjavascript.net". And here's what he came up with, how cool is that! (Oh, and I gave him a turtle drawing, see below.)
(As you can see, he's so humble he doesn't want any credit on the site. But this is my blog and I can give as much credit as I want now, can't I?)
So last night between writing last night's post and today, I turned this mock into HTML (not fully complete, missing ego-header and pagination) and converted the 5 chapters I have so far from word docs to HTML.
Audience
If you follow my blog there isn't much new for you. Like I said, the audience was to be less technical. But there are a few new, never-before-seen bits and pieces.
Assuming the html-writing part of the PeachPit audience will be still very attached to XHTML, I decided to do what I generally tend to avoid - closing tags, using type="text/javascript" etc. Further edits should convert these to more compact html5-allowed syntax.
In the markup for the site though, in the rush to convert everything, I started not closing P and LI to save time. Feel free to send a patch.
No credits
I was planning on having one round of credit-giving, either as footnotes or as an appendix, once the book is done. But the book is not done, so forgive me if I haven't given you credit where it was due.
No links
It's silly to have no links in an online publication, but given the rush, I didn't edit the content at all to add them. Again, I was planning on an appendix, or actually a companion site. Will do. Will accept a patch.
On editing
My editor from PeachPit sent me notes and edits. These are not in the online edition. Partly because I don't think it's fair (what's in it for them?) and partly because, trivially, I didn't have the time.
On reviewing
I got technical reviews from Marcel Duran and Sergey Chikuyonok while working on the book. I haven't incorporated their feedback. Will do. (Sergey said my chapter on image optimization was too basic. It is, especially compared with his articles on Smashing Magazine and his blog.)
But Annie Sullivan from Google went way above and beyond any review I have seen. She actually read the chapter with her husband (not technical) and explained to him what's going on. So I had very eye-opening observations and I'm grateful and indebted for this.
(As you guessed, the feedback is not yet reflected in the text)
PageSpeed
PageSpeed runs on Dreamhost where the site is. So I thought I should check the "use pagespeed" checkbox in DH's panel. Not bad, not bad at all. Having your images and other stuff taken care of for you automagically. I have 99/100 Page Speed score and 94/100 YSlow.
I do minify the CSS myself though, and inline it, because it's small.
Turtle
I couldn't use the turtle (nor the title) from Speed Matters. But my kid drew a turtle in drawing class so I thought I should use it. Here's what it looked like before my designer friend took over:
Happy reading!
Like I mentioned, regulars on this blog won't find much new information, but feel free to send your junior team members to learn from the free source.
And don't forget to send patches - book editing via GitHub sounds pretty nice to me.
I'm working on tomorrow's kind of big thing, so will take it easy today, with a stroll down memory lane.
I was clearing up my space at home a few days ago and came across this oldish notepad. In there (among the usual amount of lists of todos and ideas in the spirit of i-wanna-do-this-tool/site/experiment!) I found these early sketches of what has since become YSlow 2.0. These are all still pretty relevant, so why not take a minute to review them and get acquainted with the YSlow internals.
Back at the time Steve Souders and I had just released YSlow 0.9 and Steve had moved to Google. It was the right time to have a quick bugfix-or-two release of YSlow 1.0 and in parallel get cranking on a complete YSlow 2.0 rewrite.
The motivation behind the total rewrite (aside from the usual "I didn't make this mess" ego-driven desire to start fresh and do a better job the second time around) was that we were getting a lot of "meh, these are Yahoo's problems/rules, not yours". For example, a normal mere mortal blog with no CDN budget should still try to get an A in most other checks. Another reason, somewhat forward-thinking as opposed to reactive, was that I was a big fan of letting others contribute rules and checks of their own. The idea was to make YSlow your own tool, not only Yahoo's. For example, if you want to set a rule that there should be no more than 5 images on a page, you should be able to codify this into a rule. And share the rule with the rest of the team or the world. (Here's an example). Another goal was to decouple the tool from Firebug. Make it work without Firebug and even without Firefox. Go back to having a bookmarklet version (Steve's original very first version) and versions for other browsers. (Thanks to Marcel Duran this is also becoming a reality now.)
So the new architecture (a big name for a bunch of objects) was conceived on these sketches while en route to Bulgaria on a red-eye flight. My little kids were asleep, taking over my seat as well, so here I was standing up in the aisle on the plane or sitting on the seat's handrail, scribbling these notes.
The main idea was divide and conquer. Split this monolithic piece of code into smaller components.
When you run YSlow, it starts by "peeling off" the page, extracting all possible information. Hence the Peeler singleton.
Peeler has methods such as getJavaScript() and getDocuments() (as in document + any frames). This can work most anywhere (bookmarklet too). Then if YSlow is running inside Firebug and has access to the Net Panel (or any other browser or environment that lets you access stuff happening on the network, not only DOM crawling), it can also find things such as XHR requests or image beacons, which are not part of the DOM, using a NetMonitor listener object of some sort.
Whatever Peeler finds, it sticks into a ComponentSet which is just an array of components along with some convenience methods such as getComponentsByType('css').
Moving on, the ComponentSet contains Component objects which have all the data, like headers, type, content, URL, the whole thing.
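To make the shape of these two objects concrete, here is a minimal sketch. It is not the actual YSlow source: the constructor arguments, the add() method and the field names are assumptions for illustration. Only the names Component, ComponentSet and getComponentsByType() come from the description above.

```javascript
// Hypothetical sketch of a single page component (not the real YSlow code).
class Component {
  constructor({ url, type, headers = {}, content = "" }) {
    this.url = url;
    this.type = type;       // e.g. "css", "js", "image", "doc"
    this.headers = headers; // response headers as a plain object
    this.content = content;
  }
}

// An array of components plus a few convenience methods.
class ComponentSet {
  constructor() {
    this.components = [];
  }

  // add() is an assumed helper name; returning `this` allows chaining.
  add(component) {
    this.components.push(component);
    return this;
  }

  getComponentsByType(type) {
    return this.components.filter((c) => c.type === type);
  }
}

// Usage: what a Peeler might hand over after crawling a page.
const set = new ComponentSet()
  .add(new Component({ url: "/css/site.css", type: "css" }))
  .add(new Component({ url: "/img/logo.png", type: "image" }));

console.log(set.getComponentsByType("css").map((c) => c.url)); // ["/css/site.css"]
```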
K, now we have a bunch of components waiting and willing to be inspected. To make this inspection as lego-like as possible, there's no big-ass inspector, but there are many little Rule objects. Each Rule object has a bunch of properties like name, URL with more info, etc, but the main thing is - it needs to implement a lint() method. The lint() method takes a reference to the ComponentSet and then returns a Result object.
The Result objects are fairly simple - they have a grade/score, message and optionally a list of offending components (e.g. images without Expires header). A bunch of result objects make a ResultSet which has methods to get the final total score.
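As a sketch of what a custom rule could look like, take the "no more than 5 images on a page" idea from earlier. Everything below is hypothetical and only mirrors the shape described here (a lint() method that takes the ComponentSet and returns a Result with a score, a message and the offenders). The property names, the scoring formula and the URL are placeholders, not YSlow's real API.

```javascript
// Hypothetical custom rule: complain when a page has more than 5 images.
const maxImagesRule = {
  id: "max-images",
  name: "Keep the number of images low",
  url: "https://example.com/rules/max-images", // placeholder "more info" URL
  config: { maxImages: 5, pointsPerOffender: 10 },

  // Takes a ComponentSet-like object and returns a Result-like object.
  lint(componentSet) {
    const images = componentSet.getComponentsByType("image");
    // Every image beyond the allowed number counts as an offender.
    const offenders = images.slice(this.config.maxImages);
    const penalty = offenders.length * this.config.pointsPerOffender;
    return {
      score: Math.max(0, 100 - penalty), // assumed 0-100 scale
      message: offenders.length
        ? `${offenders.length} image(s) over the limit of ${this.config.maxImages}`
        : "",
      components: offenders,
    };
  },
};
```

Anything that offers getComponentsByType() can be passed to lint(), which keeps a rule like this easy to test on its own.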
A bunch of Rule objects go into a RuleSet. The idea is to mash those up as you wish. So a Rule object is for example "Use CDN". (It's also configurable, e.g. how many score points to take away for each offender.) Also within a RuleSet you can define the relative weight of each Rule. E.g. is an F on the "Expires" rule as bad as an F on "CSS expressions"? You can create your own RuleSets (e.g. "Small blog"), including and configuring any of the existing rules you like, and also add more custom Rules. It's one big happy pool of Rules to pick from and configure. In fact YSlow 2.0 shipped with three rulesets - the new one with more rules, the old yslow1 and a "small site or blog".
At the end there is one central lint() method which takes a RuleSet, loops over the Rules in it, calls each Rule's lint() and collects the results into a ResultSet.
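Here is a minimal sketch of that central loop. It assumes a RuleSet is a list of { rule, weight } pairs and that the total score is a weighted average of the individual scores. Both are assumptions made for this illustration, and the real YSlow scoring may well differ.

```javascript
// Hypothetical central lint(): run every rule, collect results into a ResultSet.
function lint(ruleSet, componentSet) {
  const results = ruleSet.rules.map(({ rule, weight }) => ({
    ruleId: rule.id,
    weight,
    ...rule.lint(componentSet), // score, message, components
  }));

  return {
    results,
    // Weighted average of the rule scores (an assumed formula).
    getOverallScore() {
      const totalWeight = results.reduce((sum, r) => sum + r.weight, 0);
      if (totalWeight === 0) return 100;
      const weighted = results.reduce((sum, r) => sum + r.score * r.weight, 0);
      return Math.round(weighted / totalWeight);
    },
  };
}

// Usage with two stub rules; "small-blog" is a made-up ruleset.
const smallBlog = {
  id: "small-blog",
  rules: [
    { weight: 1, rule: { id: "cdn", lint: () => ({ score: 50, message: "No CDN", components: [] }) } },
    { weight: 3, rule: { id: "expires", lint: () => ({ score: 100, message: "", components: [] }) } },
  ],
};

const resultSet = lint(smallBlog, { getComponentsByType: () => [] });
console.log(resultSet.getOverallScore()); // 88, i.e. (50*1 + 100*3) / 4 rounded
```

Giving the CDN rule a low weight is how a "small blog" ruleset could keep a missing CDN from dominating the grade.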
From there it's a question of rendering the ResultSet, grades, offenders, etc. Additionally there are tools that can run on the ComponentSet (e.g. JSLint) and stats. In addition to the YSlow UI, you should be able to render these results in any way you like, including exporting JSON or what have you.
Whew!
I may have missed some details but that's about all there is to the core of YSlow 2.0.
Here's also a presentation that talks about these things and offers some diagrams that hopefully clarify even further.
That was it for today, only 4 days to go to Velocity. Hope you learned something you can use and you're ready to start coding your own rules and create rulesets to customize what YSlow can do for you.