Archive for the 'ydn' Category

YSlow 2.0: the first sketches

Saturday, June 11th, 2011

#4 This post is part of the Velocity countdown series. Stay tuned for the articles to come.

I'm working on tomorrow's kind of big thing, so will take it easy today, with a stroll down memory lane.

I was clearing up my space at home few days ago and came across this oldish notepad. In there (among the usual amount of lists of todos and ideas in the spirit of i-wanna-do-this-tool/site/experiment!) I found these early sketches of what has since become YSlow 2.0. These are all still pretty relevant, so why not take a minute to review them and get acquainted with the YSlow internals.

Back at the time Steve Souders and I had just released YSlow 0.9 and Steve had moved to Google. It was the right time to have a quick bugfix-or-two release of YSlow 1.0 and in parallel get cranking on a complete YSlow 2.0 rewrite.

The motivation behind the total rewrite was (aside from the usual "I didn't make this mess" ego-driven desire to start fresh and do a better job the second time around) was that we were getting a lot of "meh, these are Yahoo's problems/rules, not yours". For example a normal mere mortal blog with no CDN budget should still try to get an A in most other checks. Another, somewhat forward thinking, as opposed to reactive reason was that I was a big fan of letting others contribute rules and checks of their own. The idea was to make YSlow your own tool, not only Yahoo's. For example if you want to set a rule that there should be no more than 5 images on a page, you should be able to codify this into a rule. And share the rule with the rest of team or the world. (Here's an example). Another thing was also to decouple the tool from Firebug. Make it work without Firebug and even without Firefox. Go back to having a bookmarklet version (Steve's original very first version) and versions for other browsers. (Thanks to Marcel Duran this is also becoming a reality now)

So the new architecture (a big name for a bunch of objects) was conceived on these sketches while en route a red-eye flight to Bulgaria. My little kids were asleep taking over my seat as well, so here I was standing up in the aisle on the plane or sitting on the seat's handrail, scribbling these notes.

The main idea was divide and conquer. Split this monolithic piece of code into smaller components.

When you run YSlow, it starts by "peeling off" the page, extracting all possible information. Hence the Peeler singleton.

Peeler has methods such as getJavaScript() and getDocuments() (as in document + any frames). This can work most anywhere (bookmarklet too). Then if YSlow is running inside Firebug and has access to Net Panel (or any other browser or environment that lets you access stuff happening on the network, not only DOM crawling), it can also find things such as XHR requests or image beacons, which are not part of the DOM, using a NetMonitor listener object of some sorts.

Whatever Peeler finds, it sticks into a ComponentSet which is just an array of components along with some convenience methods such as getComponentsByType('css').

Moving on, the ComponentSet contains Component objects which have all the data, like headers, type, content, URL, the whole thing.

K, now we have a bunch of components waiting and willing to be inspected. To make this inspection as lego-like as possible, there's no big-ass inspector, but there are many little Rule objects. Each Rule object has a bunch of properties like name, URL with more info, etc, but the main thing is - it needs to implement a lint() method. The lint() method takes a reference to the ComponentSet and then returns a Result object.

The Result objects are fairly simple - they have a grade/score, message and optionally a list of offending components (e.g. images without Expires header). A bunch of result objects make a ResultSet which has methods to get the final total score.

A bunch of Rule objects go into a RuleSet. The idea is to mash those up as you wish. So a Rule object is for example "Use CDN". (it's also configurable, e.g. how many score points to take away for each offender). Also within a RuleSet you can define what is the relative weight of each Rule. E.g. is F on "Expires" rule as bad as F on "CSS expressions". You can create your own RuleSets (e.g. "Small blog") including an configuring any of the existing rules you like and also add more custom Rules. It's one big happy pool of Rules to pick from and configure. In fact YSlow 2.0 shipped with three rulesets - the new one with more rules, the old yslow1 and a "small site or blog"

At the end there is one central lint() method which takes a RuleSet, loops over the Rules in it, calls each Rule's lint() and collects the results into a ResultSet.

From there it's a question of rendering the ResultSet, grades, offenders, etc. Additionally there are tools that can run on the ComponentSet (e.g. JSLint) and stats. In addition to the YSlow UI, you should be able to render these results in any way you like, including exporting a JSON or whathaveyou.

Whew!

I may have missed some details but that's about all there is to the core of YSlow 2.0

Here's also a presentation that talks about these things and offers some diagrams that hopefully clarify even further

Thanks for reading!

That was it for today, only 4 days to go to Velocity. Hope you learned something you can use and you're ready to start coding your own rules and create rulesets to customize what YSlow can do for you.

To stay connected, there's now a Facebook page for YSlow and there's always the YDN (Yahoo! Developer Network) section about YSlow

 

Collecting web data with a faster, free server

Tuesday, December 8th, 2009

2010 update:
Lo, the Web Performance Advent Calendar hath moved

Dec 8 This article is part of the 2009 performance advent calendar experiment. This is also the first ever guest post to this blog.
Please welcome the world-famous Christian Heilmann! And stay tuned for the next articles.

Christian HeilmannChris Heilmann is a self confessed data junkie and worked for over 12 years as a professional web developer. Having published several books on JavaScript, Accessibility and web development using web services he right now works as a developer evangelist for the Yahoo Developer Network. He blogs at http://wait-till-i.com/, has all his talks and videos at http://icant.co.uk/ and can be found on Twitter as @codepo8.

 

RSS is a wonderful format to get information from all kind of different sources. It is dead easy to provide, has a predictable (albeit limited) format and is very easy to use. The problem of course is that with the amount of different feeds used in one page its performance goes down.

The reason is the classic HTTP request issue - the more you negotiate, find and pull the slower your page renders. Therefore you need to try to shorten the time the calls happen.

Say you want to pull the following five RSS feeds and display them:

  • http://code.flickr.com/blog/feed/rss/
  • http://feeds.delicious.com/v2/rss/codepo8?count=15
  • http://www.stevesouders.com/blog/feed/rss
  • http://www.yqlblog.net/blog/feed/
  • http://www.quirksmode.org/blog/index.xml

The least effective way of doing that is pulling and displaying them one after the other:

Retrieving five RSS feeds using curl takes about 11 seconds.

$oldtime = microtime(true);
$url = 'http://code.flickr.com/blog/feed/rss/';
$content[] = get($url);
$url = 'http://feeds.delicious.com/v2/rss/codepo8?count=15';
$content[] = get($url);
$url = 'http://www.stevesouders.com/blog/feed/rss';
$content[] = get($url);
$url = 'http://www.yqlblog.net/blog/feed/';
$content[] = get($url);
$url = 'http://www.quirksmode.org/blog/index.xml';
$content[] = get($url);
display($content);
echo '<p>Time spent: <strong>' . (microtime(true)-$oldtime) .'</strong></p>';
function get($url){
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  $output = curl_exec($ch);
  curl_close($ch);
  return $output;
}
function display($data){
  foreach($data as $d){
    $obj = simplexml_load_string($d);
    echo '<div><h2><a href="'.$obj->channel->link.'">'.
          $obj->channel->title.'</a></h2>';
    echo '<ul>';
    foreach($obj->channel->item as $i){
      echo '<li><a href="'.$i->link.'">'.$i->title.'</a></li>';
    }
    echo '</ul></div>';
  }
}

Using Stoyan's Multi Curl function you already realise quite an increase in speed.

Retrieving five RSS feeds with multicurl takes about 2.8 seconds.

$data = array(
  'http://code.flickr.com/blog/feed/rss/',
  'http://feeds.delicious.com/v2/rss/codepo8?count=15',
  'http://www.stevesouders.com/blog/feed/rss',
  'http://www.yqlblog.net/blog/feed/',
  'http://www.quirksmode.org/blog/index.xml'
 
);
$r = multiRequest($data);
display($r);
 
function multiRequest($data, $options = array()) {
  // array of curl handles
  $curly = array();
  // data to be returned
  $result = array();
  // multi handle
  $mh = curl_multi_init();
  // loop through $data and create curl handles
  // then add them to the multi-handle
  foreach ($data as $id => $d) {
    $curly[$id] = curl_init();
    $url = (is_array($d) && !empty($d['url'])) ? $d['url'] : $d;
    curl_setopt($curly[$id], CURLOPT_URL,            $url);
    curl_setopt($curly[$id], CURLOPT_HEADER,         0);
    curl_setopt($curly[$id], CURLOPT_RETURNTRANSFER, 1);
    // post?
    if (is_array($d)) {
      if (!empty($d['post'])) {
        curl_setopt($curly[$id], CURLOPT_POST,       1);
        curl_setopt($curly[$id], CURLOPT_POSTFIELDS, $d['post']);
      }
    }
    // extra options?
    if (!empty($options)) {
      curl_setopt_array($curly[$id], $options);
    }
    curl_multi_add_handle($mh, $curly[$id]);
  }
  // execute the handles
  $running = null;
  do {
    curl_multi_exec($mh, $running);
  } while($running > 0);
  // get content and remove handles
  foreach($curly as $id => $c) {
    $result[$id] = curl_multi_getcontent($c);
    curl_multi_remove_handle($mh, $c);
  }
  // all done
  curl_multi_close($mh);
  return $result;
}
function display($data){
  foreach($data as $d){
    $obj = simplexml_load_string($d);
    echo '<div><h2><a href="'.$obj->channel->link.'">'.
          $obj->channel->title.'</a></h2>';
    echo '<ul>';
    foreach($obj->channel->item as $i){
      echo '<li><a href="'.$i->link.'">'.$i->title.'</a></li>';
    }
    echo '</ul></div>';
  }
}

However, there are still two things that are annoying:

  • You pull far more data than you really need
  • You do all the request from your server

Yahoo Pipes has been used for that kind of task for quite a while, but the issue was that it is a visual interface and therefore hard to maintain. The server was also not the best performing out there.

The good news is that there is a new(er) kid on the block in Yahoo Land called YQL running on a massively fast server farm and with one purpose: making it easier to use web services, mix them and get only the data back that you want.

YQL in itself is a web service and you post queries to it that access other web services in a SQL style syntax. Normally you'd get RSS feeds using the RSS table, like so:

select * from rss where url="http://feeds.delicious.com/v2/rss/codepo8?count=15"

See the single RSS in the YQL console (you need a Yahoo account to log in). You can also see the single RSS retrieval output.

The issue with this is that it only retrieves the items of the RSS feed and not the title, which we need for the headings. Therefore we need to use the XML table:

select * from xml where url="http://feeds.delicious.com/v2/rss/codepo8?count=15"

See the single XML in the YQL console (you need a Yahoo account to log in). You can also see the single XML retrieval output.

This gives us the same data the normal cURL calls give us. The cool thing about YQL is though that you can filter the data you get back to the bare minimum. In our case, this means replacing the * with the title and the link of the feed and of the items:

select channel.title,channel.link,channel.item.title,channel.item.link 
  from xml 
  where url="http://feeds.delicious.com/v2/rss/codepo8?count=15"

See the filtered RSS in the YQL console (you need a Yahoo account to log in). You can also see the filtered RSS retrieval output.

In order to use this with all of our RSS feeds, we can use the in() command:

select channel.title,channel.link,channel.item.title,channel.item.link 
    from xml where url in(
      'http://code.flickr.com/blog/feed/rss/',
      'http://feeds.delicious.com/v2/rss/codepo8?count=15',
      'http://www.stevesouders.com/blog/feed/rss',
      'http://www.yqlblog.net/blog/feed/',
      'http://www.quirksmode.org/blog/index.xml'
    )

Check the aggregation in the console or the aggregation output.

This leaves all the hard work to the Yahoo Server farm. YQL pulls all the RSS feeds, adds one after the other and then gives it back to us as XML. We could simply use the generated URL from the console, but it is much more versatile to assemble the query in PHP:

$data = array(
  'http://code.flickr.com/blog/feed/rss/',
  'http://feeds.delicious.com/v2/rss/codepo8?count=15',
  'http://www.stevesouders.com/blog/feed/rss',
  'http://www.yqlblog.net/blog/feed/',
  'http://www.quirksmode.org/blog/index.xml'
);
$url ='http://query.yahooapis.com/v1/public/yql?q=';
$query = "select channel.title,channel.link,channel.item.title,channel.item.link from xml where url in('".implode("','",$data)."')";
$url.=urlencode($query).'&format=xml';
$content = get($url);
display($content);
function get($url){
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  $output = curl_exec($ch);
  curl_close($ch);
  return $output;
}
function display($data){
  $data = simplexml_load_string($data);
  $sets = $data->results->rss;
  $all = sizeof($sets);
  for($i=0;$i<$all;$i++){
    $r = $sets[$i];
    $title = $r->channel->title.'';
    if($title != $oldtitle){
      echo '<div><h2><a href="'.($r->channel->link.'').'">'.
           ($r->channel->title.'').'</a></h2><ul>';
    }
      echo '<li><a href="'.($r->channel->item->link.'').'">'.
           ($r->channel->item->title.'').'</a></li>';
    if($title != $sets[$i+1]->channel->title.''){
      echo '</ul></div>';
    }
    $oldtitle = $r->channel->title.'';
  };
}

As you can see the loop to display the different RSS feeds a bit clunky and we could use an open YQL table to move the whole conversion to a server-side JavaScript. However, as it is the performance of this way of retrieving the RSS feeds beats all the others hands-down already:

Retrieving five RSS feeds using YQL takes about half a second

You can try it yourself, get the demo code from GitHub and run it on your own server to see the magic of YQL.

 

Yahoo Music API

Thursday, August 7th, 2008

This was meant to be a longer posts with examples and such, but Jim Bumgardner said it and coded it better than I could :) He's been with Y!Music way longer than me and has done way cooler stuff.

As a front-end engineer for Yahoo! Music, I've always thought it would great if the web services we use to create the Y! Music pages were available to developers outside of Yahoo!, and, as of today, due to the herculean labors of our web services team, they are!

Full blog post and code examples are on the YDN (Yahoo! Developer Network) blog and the Music API docs are there too.

Could you come up with an idea of a mashup using music data like artists, songs, similarities? The APIs are here.

 

My online footprint lately

Wednesday, July 23rd, 2008

This is a sort of a catch-up post for listing what I've been up to lately.

  • YUI Blog just published my first article, I'm so proud. It's about loading JavaScript in non-blocking fashion, because JavaScripts, they, you know, like, block downloads. Luckily, there's an easy fix - DOM includes, which I've previously discussed, discussed and discussed.
  • SitePoint published an update to my older article that introduces AJAX, ok, Ajax, by creating a command-line-like interface with PHP on the server side. The updated article features improved code, jQuery example, YUI example, JSON discussion and example. Check it out, bookmark and recommend to your friends that keep asking you "What's this AJAX (they are new, don't know it's now spelled "Ajax") thing? Do you know of a good article?"
  • YDN (developer.yahoo.com) published a video presentation of me and my lovely teammate Nicole Sullivan where we talk about some new and cool front-end performance techniques. So if you wandered how I look and are eager to hear my fabulous Balkan peninsula accent, give it a shot. The talk is called "After YSlow 'A'" and is targeted at those of you who have reached performance nirvana, but are still hungry for more. We talk about preloading components, post-loading, javascript, images, using flush() in PHP to send first byte early on and other fun stuff.
  • Last, not least, I decided to try and find some time to update my JavaScript patterns site. Unfortunately I got sidetracked (yep, I'm easily distracted by shiny objects) and played with a not-so-javascript pattern. The post I published (includes a pretty lame screencast! and) demonstrates how you can use animated background position to indicate loading progress.

Whew, c'est tout pout ce moment, expect a lot more now that the JavaScript book is out of the way. Ah, yep, if you feel like it, join me on Facebook, I created a JS book page.

 

Simultaneuos HTTP requests in PHP with cURL

Tuesday, February 19th, 2008

The basic idea of a Web 2.0-style "mashup" is that you consume data from several services, often from different providers and combine them in interesting ways. This means you often need to do more than one HTTP request to a service or services. In PHP if you use something like file_get_contents() this means all the requests will be synchronous: a new one is fired only after the previous has completed. If you need to make three HTTP requests and each call takes a second, your app is delayed at least three seconds.

Solution

An improvement of course is to cache responses as much as possible, but at one point or another you still need to make those requests.

Using the curl_multi* family of cURL functions you can make those requests simultaneously. This way your app is as slow as the slowest request, as opposed to the sum of all requests. And that's something.

A function

Here's a little function I coded that will allow you do multi requests.

<?php
 
function multiRequest($data, $options = array()) {
 
  // array of curl handles
  $curly = array();
  // data to be returned
  $result = array();
 
  // multi handle
  $mh = curl_multi_init();
 
  // loop through $data and create curl handles
  // then add them to the multi-handle
  foreach ($data as $id => $d) {
 
    $curly[$id] = curl_init();
 
    $url = (is_array($d) && !empty($d['url'])) ? $d['url'] : $d;
    curl_setopt($curly[$id], CURLOPT_URL,            $url);
    curl_setopt($curly[$id], CURLOPT_HEADER,         0);
    curl_setopt($curly[$id], CURLOPT_RETURNTRANSFER, 1);
 
    // post?
    if (is_array($d)) {
      if (!empty($d['post'])) {
        curl_setopt($curly[$id], CURLOPT_POST,       1);
        curl_setopt($curly[$id], CURLOPT_POSTFIELDS, $d['post']);
      }
    }
 
    // extra options?
    if (!empty($options)) {
      curl_setopt_array($curly[$id], $options);
    }
 
    curl_multi_add_handle($mh, $curly[$id]);
  }
 
  // execute the handles
  $running = null;
  do {
    curl_multi_exec($mh, $running);
  } while($running > 0);
 
 
  // get content and remove handles
  foreach($curly as $id => $c) {
    $result[$id] = curl_multi_getcontent($c);
    curl_multi_remove_handle($mh, $c);
  }
 
  // all done
  curl_multi_close($mh);
 
  return $result;
}
 
?>

Consuming

The function accepts an array of URLs to hit and optionally an array of cURL options if you need to pass any. The first array can be a simple indexed array or URLs or it can be an array of arrays where the second has a key named "url". If you use the second way and you also have a key called "post", the function will do a post request.

The function returns an array of responses as strings. The keys in the result array match the keys in the input.

A GET example

Let's say you want to use some Yahoo search web services (consult YDN) to create a music artist band-o-pedia kind of mashup. Here's how you can search audio, video and images at the same time:

<?php
 
$data = array(
  'http://search.yahooapis.com/VideoSearchService/V1/videoSearch?appid=YahooDemo&query=Pearl+Jam&output=json',
  'http://search.yahooapis.com/ImageSearchService/V1/imageSearch?appid=YahooDemo&query=Pearl+Jam&output=json',
  'http://search.yahooapis.com/AudioSearchService/V1/artistSearch?appid=YahooDemo&artist=Pearl+Jam&output=json'
);
$r = multiRequest($data);
 
echo '<pre>';
print_r($r);
 
?>

This will print something like:

Array
(
    [0] => {"ResultSet":{"totalResultsAvailable":"633","totalResultsReturned":...
    [1] => {"ResultSet":{"totalResultsAvailable":"105342","totalResultsReturned":...
    [2] => {"ResultSet":{"totalResultsAvailable":10,"totalResultsReturned":...
)

A POST example

There's an interesting Yahoo search service called term extraction which analyses content. It accepts POST requests. Here's how to consume this service with the function above, making two simultaneous requests:

<?php
$data = array(array(),array());
 
$data[0]['url']  = 'http://search.yahooapis.com/ContentAnalysisService/V1/termExtraction';
$data[0]['post'] = array();
$data[0]['post']['appid']   = 'YahooDemo';
$data[0]['post']['output']  = 'php';
$data[0]['post']['context'] = 'Now I lay me down to sleep,
                               I pray the Lord my soul to keep;
                               And if I die before I wake,
                               I pray the Lord my soul to take.';
 
 
$data[1]['url']  = 'http://search.yahooapis.com/ContentAnalysisService/V1/termExtraction';
$data[1]['post'] = array();
$data[1]['post']['appid']   = 'YahooDemo';
$data[1]['post']['output']  = 'php';
$data[1]['post']['context'] = 'Now I lay me down to sleep,
                               I pray the funk will make me freak;
                               If I should die before I waked,
                               Allow me Lord to rock out naked.';
 
$r = multiRequest($data);
 
print_r($r);
?>

And the result:

Array
(
    [0] => a:1:{s:9:"ResultSet";a:1:{s:6:"Result";s:5:"sleep";}}
    [1] => a:1:{s:9:"ResultSet";a:1:{s:6:"Result";a:3:{i:0;s:5:"freak";i:1;s:5:"sleep";i:2;s:4:"funk";}}}
)