Content-to-markup ratio bookmarklet

March 5th, 2009. Tagged: Ajax, bookmarklets, JavaScript, performance, SEO

When you care about performance or SEO (or just about doing a good job as a web dev), an interesting data point is the ratio of page content to the markup used to present that content. Or... how much crap do we put in the HTML in order to present what the users want to see - the content.

So tonight I played with a bookmarklet that provides this stat.


Right-click, add to favourites/bookmarks. Or simply click to see the ratio of this page.


How it works

Since the scripts on the page may modify the content and markup, the bookmarklet makes an Ajax request to get a fresh copy of the page from the server. Then it runs a few regular expressions ("borrowed" from prototype.js) to strip all tags, along with the contents of any script and style elements. The first metric it provides is the size of the stripped content divided by the size of the original markup.
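To make the first metric concrete, here is a minimal sketch of the idea. It is not the bookmarklet's actual code: the function names are made up for illustration, and the regular expressions are only similar in spirit to the ones in prototype.js. Sizes are measured with `String.length`, which counts characters rather than bytes, so treat the numbers as an approximation.

```javascript
// Hypothetical helper: fetch a fresh copy of the current page via Ajax,
// so that changes made by scripts on the page do not affect the result.
function fetchFreshCopy(url, callback) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url, true);
  xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && xhr.status === 200) {
      callback(xhr.responseText);
    }
  };
  xhr.send(null);
}

// Remove <script> and <style> elements together with their contents.
function stripScriptsAndStyles(html) {
  return html
    .replace(/<script[^>]*>[\s\S]*?<\/script\s*>/gi, '')
    .replace(/<style[^>]*>[\s\S]*?<\/style\s*>/gi, '');
}

// Remove anything that still looks like a tag, keeping the text in between.
function stripTags(html) {
  return html.replace(/<\/?[^>]+>/g, '');
}

// Size of the stripped content divided by the size of the original markup.
function contentToMarkupRatio(html) {
  var content = stripTags(stripScriptsAndStyles(html));
  return {
    totalSize: html.length,     // characters, not bytes (assumption)
    contentSize: content.length,
    ratio: html.length ? content.length / html.length : 0
  };
}
```

In a browser, something like `fetchFreshCopy(location.href, function (html) { alert(contentToMarkupRatio(html).ratio.toFixed(2)); })` would report the ratio for the current page.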

Then the bookmarklet tries to be fairer and counts the alt, title and value attributes as content, including the size of the attribute names themselves. This is the second, "fair", metric. The content attributes are inspected using DOM methods, not regular expressions, so they can be affected by any JavaScript that has modified the page. Oh well, life's not fair.
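The "fair" metric could be sketched roughly like this. Again, this is an illustration under assumptions rather than the bookmarklet's real code: the function names are hypothetical, and the formula (attribute sizes added to the content size, then divided by the total size) is my reading of the description above.

```javascript
// Attributes whose values are treated as content.
var CONTENT_ATTRIBUTES = ['alt', 'title', 'value'];

// Walk every element under `root` using DOM methods and add up the size of
// each content attribute, counting both the attribute name and its value.
function contentAttributesSize(root) {
  var elements = root.getElementsByTagName('*');
  var size = 0;
  for (var i = 0; i < elements.length; i++) {
    for (var j = 0; j < CONTENT_ATTRIBUTES.length; j++) {
      var name = CONTENT_ATTRIBUTES[j];
      var value = elements[i].getAttribute(name);
      if (value) { // skip missing (null) and empty attributes
        size += name.length + value.length;
      }
    }
  }
  return size;
}

// Assumed formula: attribute sizes count as extra content.
function fairRatio(totalSize, contentSize, attributesSize) {
  return totalSize ? (contentSize + attributesSize) / totalSize : 0;
}
```

In a browser, `contentAttributesSize(document)` inspects the live DOM, which is exactly why this number can be affected by scripts that have already modified the page.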


The bookmarklet code is served from here. The code is also on GitHub.
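For anyone curious how a bookmarklet can pull its code from a server, here is a hedged sketch of one common approach: the bookmark itself is a tiny `javascript:` URL that injects a script tag. The helper name and the example URL are hypothetical, and this is not necessarily how this particular bookmarklet is wired up.

```javascript
// Build a javascript: URL that, when clicked, appends a <script> element
// pointing at `scriptUrl` to the current page.
function makeBookmarkletHref(scriptUrl) {
  var source =
    '(function(){' +
    'var s=document.createElement("script");' +
    's.src=' + JSON.stringify(scriptUrl) + ';' +
    'document.getElementsByTagName("head")[0].appendChild(s);' +
    '})();';
  // Percent-encode the source so it survives being stored as a bookmark URL.
  return 'javascript:' + encodeURIComponent(source);
}

// Hypothetical script location, for illustration only.
var href = makeBookmarkletHref('https://example.com/ratio.js');
```

The resulting string goes into the `href` of a link, which users can then drag or add to their bookmarks.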


Here are some random results of running the bookmarklet on different sites.

Total size: 92004 bytes
Content size: 11475 bytes
Content-to-markup ratio: 0.12
Fair ratio * : 0.16

Total size: 65989 bytes
Content size: 16199 bytes
Content-to-markup ratio: 0.25
Fair ratio * : 0.60

Article on
Total size: 21648 bytes
Content size: 3315 bytes
Content-to-markup ratio: 0.15
Fair ratio * : 0.35

Total size: 31899 bytes
Content size: 7933 bytes
Content-to-markup ratio: 0.25
Fair ratio * : 0.48

SERP
Total size: 29963 bytes
Content size: 3351 bytes
Content-to-markup ratio: 0.11
Fair ratio * : 0.14

Meanwhile, hit me up on twitter @stoyanstefanov