When you care about performance, or SEO (or just doing a good job as web dev) an interesting data point is the ratio of page content vs. the markup used to present this content. Or... how much crap we put in HTML in order to present what the users want to see - the content.
So I played tonight with a bookmarklet to provide this piece of stats.
Install
Right-click, add to favourites/bookmarks. Or simply click to see the ratio of this page.
How it works
Since the scripts on the page may modify the content and markup, the bookmarklet makes an Ajax request to get a fresh copy of the page from the server. Then it runs a few regular expressions ("borrowed" from prototype.js) to strip all tags and the scripts/styles content. The first metric it provides is the size of the stripped content divided by the size of the original markup.
Then the bookmarklet tries to be more fair and count the alt
, title
and value
attributes as content, including the size of attribute names themselves. And this is the second, "fair", metric. The content attributes are inspected using DOM methods, not regexp, so they can be affected by any javascript that has modified the page. Oh well, life's not fair.
Code
The bookmarklet code is served from here. The code is also on github.
Results
Here are some random results of running the bookmarklet on different sites.
http://www.cnn.com:
Total size: 92004 bytes
Content size: 11475 bytes
Content-to-markup ratio: 0.12
Fair ratio * : 0.16
http://www.sitepoint.com
Total size: 65989 bytes
Content size: 16199 bytes
Content-to-markup ratio: 0.25
Fair ratio * : 0.60
Article on http://en.wikipedia.org:
Total size: 21648 bytes
Content size: 3315 bytes
Content-to-markup ratio: 0.15
Fair ratio * : 0.35
https://www.phpied.com
Total size: 31899 bytes
Content size: 7933 bytes
Content-to-markup ratio: 0.25
Fair ratio * : 0.48
http://www.google.com SERP
Total size: 29963 bytes
Content size: 3351 bytes
Content-to-markup ratio: 0.11
Fair ratio * : 0.14