Archive for the 'SEO' Category

Content-to-markup ratio bookmarklet

Thursday, March 5th, 2009

When you care about performance or SEO (or just about doing a good job as a web dev), an interesting data point is the ratio of page content to the markup used to present that content. Or... how much crap we put in the HTML in order to present what the users want to see - the content.

So tonight I played with a bookmarklet that provides this stat.

Install

Right-click, add to favourites/bookmarks. Or simply click to see the ratio of this page.

content/markup

How it works

Since the scripts on the page may modify the content and markup, the bookmarklet makes an Ajax request to get a fresh copy of the page from the server. Then it runs a few regular expressions ("borrowed" from prototype.js) to strip all tags and the scripts/styles content. The first metric it provides is the size of the stripped content divided by the size of the original markup.
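To make the first metric concrete, here's a minimal sketch of the strip-and-divide step. The regular expressions are simplified stand-ins, not the ones from prototype.js or from the actual bookmarklet, and the names stripToContent and contentToMarkupRatio are made up for illustration. It also measures string length rather than bytes, which is the same thing only for plain ASCII pages.

```javascript
// Illustrative sketch only: simplified regexes, hypothetical function names.

// Remove script and style blocks (including their bodies), then all remaining tags.
function stripToContent(html) {
  return html
    .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '') // scripts and their content
    .replace(/<style[^>]*>[\s\S]*?<\/style>/gi, '')   // styles and their content
    .replace(/<\/?[^>]+>/g, '');                      // every other tag
}

// Size of the stripped content divided by the size of the original markup.
// Uses string length (UTF-16 code units), not bytes.
function contentToMarkupRatio(html) {
  if (html.length === 0) {
    return 0; // avoid dividing by zero on an empty response
  }
  return stripToContent(html).length / html.length;
}
```

In the bookmarklet's setting, the html argument would be the response text of the Ajax request, so the numbers reflect what the server sent rather than what scripts did to the page afterwards.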

Then the bookmarklet tries to be fairer and counts the alt, title and value attributes as content, including the size of the attribute names themselves. This is the second, "fair", metric. The content attributes are inspected using DOM methods, not regular expressions, so they can be affected by any JavaScript that has modified the page. Oh well, life's not fair.
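The "fair" metric could be sketched roughly like this. It's an assumption-heavy illustration, not the bookmarklet's code: the function names are hypothetical, I'm assuming each attribute counts as name length plus value length (ignoring the quotes and the equals sign), and I'm assuming the fair ratio is simply content plus attribute sizes divided by total size.

```javascript
// Illustrative sketch only: hypothetical names, assumed counting rules.

// Attributes treated as content in the "fair" metric.
const CONTENT_ATTRIBUTES = ['alt', 'title', 'value'];

// Sum the sizes of content attributes over a collection of elements.
// Each non-empty attribute counts as name length + value length (an assumption).
function attributeContentSize(elements) {
  let size = 0;
  for (const el of Array.from(elements)) {
    for (const name of CONTENT_ATTRIBUTES) {
      const value = el.getAttribute(name); // null when the attribute is absent
      if (value) {
        size += name.length + value.length;
      }
    }
  }
  return size;
}

// Assumed formula: (stripped content + content attributes) / original markup size.
function fairRatio(contentSize, attributeSize, totalSize) {
  return totalSize === 0 ? 0 : (contentSize + attributeSize) / totalSize;
}
```

In a browser you could pass document.getElementsByTagName('*') as the elements argument. That reads the live DOM, which is exactly why this metric can be skewed by scripts that have already modified the page.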

Code

The bookmarklet code is served from here. The code is also on GitHub.

Results

Here are some random results of running the bookmarklet on different sites.

http://www.cnn.com:
Total size: 92004 bytes
Content size: 11475 bytes
Content-to-markup ratio: 0.12
Fair ratio * : 0.16

http://www.sitepoint.com
Total size: 65989 bytes
Content size: 16199 bytes
Content-to-markup ratio: 0.25
Fair ratio * : 0.60

Article on http://en.wikipedia.org:
Total size: 21648 bytes
Content size: 3315 bytes
Content-to-markup ratio: 0.15
Fair ratio * : 0.35

http://www.phpied.com
Total size: 31899 bytes
Content size: 7933 bytes
Content-to-markup ratio: 0.25
Fair ratio * : 0.48

http://www.google.com SERP
Total size: 29963 bytes
Content size: 3351 bytes
Content-to-markup ratio: 0.11
Fair ratio * : 0.14

 

The “best programmer ever”

Monday, September 17th, 2007

Go ahead, do a Yahoo search for "best programmer ever". Not surprisingly, the #1 result is the blog of yours truly :)

best-programmer-ever.png

For some inexplicable reason, I'm not #1 in Google search results for the same query. Bizarre, isn't it? Not even on the first page. But hey, there's a difference between being yet another piece of software that mines an insane amount of pages and spits out matches to a query, and being a smart piece of software that mines an insane amount of pages.

So, yeah, sweet stuff, and let me return the compliment with some link love back - thank you, best search engine ever. :D

 

No! to pagerank

Sunday, July 22nd, 2007

Joining an initiative started here.

Basically the idea is to stop worrying about pagerank. I think the web will become a much better place if webmasters stop reading about, discussing and worrying about their pagerank. If they altogether forget pagerank ever existed. Better for the webmasters (they'll have more time to think about their live visitors), better for the visitors (of course), better for the search engines (they can concentrate on their primary business, which is providing the best search results). I believe it would be easier for the search engines to tell spammers from decent webmasters if the latter stop worrying about pagerank and stop experimenting with what's good and what's not for it. If we stop worrying about things like "will this new AJAXy thing with hidden divs be considered spam or not" and give the search engines some time to catch their breath, they'll figure it out. They're smart, they have to be, it's their business. Think about it this way - if Google allows itself to be tricked by spammers, it will start providing bad results, people will stop using it (just as quickly as they started) and ... do you need to worry about your pagerank in a dead search engine?

So if we all concentrate on providing quality web applications or content and use some common sense (friendly URLs and semantic markup - h tags and such), then the rest will follow.

***SAY NO TO PAGERANK, YES TO LIFE WEBMASTERS LIST***

Buzz Marketing Blog
Young Entrepreneurs Blog
No Nonsense Business Advice
Sell your blog
CS Developer
Aplliance Journal
Madkane Humor Blog
Find New Leads
PHPied
UK SEO Directory
Bikinifigur: Abnehmen ohne Hunger
Bob Meets World
BlueJar Webmasters Guide
All Sux Dot Com
CodingPad
The Next Post
Tech Blog
Reality Wired
Tom Wilson Google Blog
Price Filter
a few loose screws
Clickon Web Design
Woody Maxim
Crystals Quest
What Simply Works
Affiliate profit center
Internet Marketing Blueprints
Cash 4 blogging
Hits USA
Wendy Haney
Sheterk Marketing
How To Build a website blog
The Block Party
Search4article
Best Online Earning Strategies
Online Security Authority
1nf0rmat10n.com
gems4friends

***List Ends Here***

(Learn how to help spread the meme while getting a few backlinks along the way.)

 

Marquee search-engine spam

Thursday, November 10th, 2005

Marquee (<marquee>) - does anybody remember this IE-only HTML tag? Does anybody still use it? This sooo ooold, pre-historic, 20th century, Web1.0-ish tag :) . Thinking about marquee and falling into a nostalgic mood, how about the blink tag, eh? OK, I simply cannot resist the temptation of using them here.

marquee - Look ma, stuff moves without JS!
This blinks, doesn't it?

How cool is that! :D

Actually I was a bit surprised to find out that Firefox supports marquee. I wonder when this non-standard, behavioural and otherwise just plain wrong tag slipped through the cracks. Anyway, that's not the point.

Today I was visiting a site (no URL, sorry) and I noticed some weirdness at the bottom of the page. Viewing the source had a surprise in store - a 1 pixel wide marquee tag full of keywords, many keywords, repeated keywords, search engine keywords... you get the point. Apparently this was aimed at the poor Googlebot and the other search engine spiders, trying to convince them that the page is worth more (in keyword weight) than it actually is. Spammers!

Here's an example of the spam technique.
<marquee width="1">Blah, blah, some keywords and more keywords</marquee>

Oh well... :(