Audio sprites

April 13th, 2011. Tagged: mobile, Music, performance

Another "brilliant" idea that I had recently - how about combining audio files into a single file to reduce HTTP requests, just like we do with CSS sprites? Then use the audio APIs to play only selected parts of the audio. Unlike pretty much all brilliant ideas I have, I decided to search for this one before diving in. Turns out Remy Sharp has already talked about this. So I knew it was possible and wanted to check the server side of things. (Remy is amazing, by the way, and I was happy to have him as a reviewer of "JavaScript Patterns".)

Here's the demo - parts of Voodoo Chile covered by yours truly.

Playing separate files

Markup is a few audio elements:

<audio id="in">
  <source src="in.mp3">
  <source src="in.ogg" type="video/ogg">
</audio>
<audio id="1">
  <source src="1.mp3">
  <source src="1.ogg" type="video/ogg">
</audio>
<!-- same again for 2 and out -->

I have the files:

  1. in.mp3 - intro
  2. 1.mp3 - figure 1
  3. 2.mp3 - figure 2
  4. out.mp3 - the end

1 and 2 repeat a few times.

I play these files in JavaScript using a next() iterator function, which contains (in a private closure) the melody (which file after which) and a pointer to the current file being played. After play()-ing each audio, I subscribe to the "ended" event and play the next() file.

var thing = 'the thing';
var next = (function() {
    //log('#: file');
    var these = ['in', '1', '2', '1', '2', '1', '2', '1', 'out'],
        current = 0;
    return function() {
        thing = document.getElementById(these[current]);
        //log(current + ': ' + these[current] + ' ' + thing.currentSrc);
        thing.play();
        if (current < these.length - 1) {
            // when this file ends, move on to the next one in the melody
            thing.addEventListener('ended', next, false);
            current += 1;
        } else {
            // last file: reset so the melody can be started again
            current = 0;
        }
    };
}());
next();

That was easy enough. And worked (except on iPhone, see later). But we should be able to do better with sprites.


For the sprite I have this audio:

<audio id="sprite">
  <source src="combo.mp3">
  <source src="combo.ogg" type="video/ogg">
</audio>

There's only one file - combo.mp3 which contains all the other four files played one after the other.

So we need to know the start and the length of each piece of audio. There are two parts to playing the sprite. First is knowing the lengths and the "song" (meaning the succession of audios) and starting to play:

var sprites = {
        // id: [start, length]
        'in': [0, 3.07],
        '1': [3.07, 2.68],
        '2': [3.07 + 2.68, 2.68],
        out: [3.07 + 2.68 + 2.68, 11.79]
    },
    song = ['in', '1', '2', '1', '2', '1', '2', '1', 'out'],
    current = 0,
    id = song[current],
    start = 0,
    end = sprites[id][1],
    thing = document.getElementById('sprite');

thing.play();
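
Each start offset is just the running total of the lengths before it, so the table could also be derived from an ordered list of lengths instead of being typed by hand. A minimal sketch, assuming a hypothetical buildSprites() helper (it is not part of the demo) that rounds to milliseconds to keep floating-point drift out of the offsets:

```javascript
// Hypothetical helper: turn an ordered list of [id, length] pairs
// into the {id: [start, length]} map used by the sprite code.
function buildSprites(pieces) {
    var result = {},
        offset = 0;
    pieces.forEach(function (piece) {
        var id = piece[0],
            length = piece[1];
        result[id] = [offset, length];
        // round to milliseconds to avoid floating-point drift
        offset = Math.round((offset + length) * 1000) / 1000;
    });
    return result;
}

var derived = buildSprites([
    ['in', 3.07],
    ['1', 2.68],
    ['2', 2.68],
    ['out', 11.79]
]);
// derived['2'] is [5.75, 2.68] and derived.out is [8.43, 11.79]
```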

Next is "listening" and stopping when one audio should be stopped, then seeking through the file and playing another part of it. This is done with a setInterval(), since I couldn't find a better audio event to listen to.

// change
var interval = setInterval(function() {
    if (thing.currentTime > end) {
        thing.pause();
        if (current === song.length - 1) {
            // end of the song, stop polling
            clearInterval(interval);
            return;
        }
        current += 1;
        id = song[current];
        start = sprites[id][0];
        end = start + sprites[id][1];
        thing.currentTime = start;
        thing.play();
        log(id + ': start: ' + sprites[id].join(', length: '));
    }
}, 10);

And this is it. The property currentTime is read/write - you can read it to figure out where you are and also set it to fast-forward or rewind to where you want to go.
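
For playing a single piece on demand (say, on a button press), the seek-and-poll logic could be wrapped in one function. This is only a rough sketch: createSpritePlayer() and its check() method are made-up names, not part of the demo. It relies on nothing more than play(), pause() and currentTime, so it should work with any object that has those members.

```javascript
// Hypothetical wrapper around the seek/poll logic. `audio` is an
// <audio> element (or anything with play(), pause() and currentTime).
function createSpritePlayer(audio, sprites, pollMs) {
    var timer = null,
        end = 0,
        done = null;

    function stop() {
        if (timer !== null) {
            clearInterval(timer);
            timer = null;
        }
        audio.pause();
    }

    // runs on every poll: stop once playback passes the end of the piece
    function check() {
        var callback = done;
        if (timer !== null && audio.currentTime >= end) {
            stop();
            done = null;
            if (callback) {
                callback();
            }
        }
    }

    function play(id, onDone) {
        stop();
        end = sprites[id][0] + sprites[id][1];
        done = onDone || null;
        audio.currentTime = sprites[id][0]; // seek to the start of the piece
        audio.play();
        timer = setInterval(check, pollMs || 10);
    }

    return {play: play, stop: stop, check: check};
}
```

With the markup above, usage would be along the lines of createSpritePlayer(document.getElementById('sprite'), sprites).play('in').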


  • Sprites play fine in FF, Chrome, O, Safari, iPhone's mobile webkit.
  • I haven't tested IE9.
  • All the browsers played my stuff off by a few milliseconds, I think some early, some late. This is probably due to unreliable setTimeout(). Also I didn't cut the audio pieces very well, so that might have something to do with it. Adding a few milliseconds of silence between the sprites may also help. A follow-up experiment will be to have a piano of sorts and see how timely the audio is played after a click/button press.
  • iPhone didn't play the non-sprited version properly - I believe because it won't let you autoplay unless there's a user action. There might be a workaround, but I only cared about the sprites and they are fine!

Server side

I was imagining the whole thing as a combo service like YUI's JS/CSS combo handler. The browser says: I need these 5 files, the server then creates a new audio file and sends it back. In this case it should also somehow send the data about start/length of each audio, so maybe a JSONP thing. I was mostly curious about those file formats and how the stitching would work.
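
To make the JSONP idea a bit more concrete, here is a rough sketch of what the service could send back. Everything in it is an assumption: the comboResponse() helper, the payload shape (a file URL plus the id: [start, length] map) and the onSprite callback name are made up for illustration, and no such service exists.

```javascript
// Hypothetical: build the JSONP body a combo service could return.
// `callback` is the function name the client asked for.
function comboResponse(callback, url, sprites) {
    // accept plain identifiers only, so the callback name can't inject code
    if (!/^[A-Za-z_$][\w$]*$/.test(callback)) {
        throw new Error('invalid callback name');
    }
    return callback + '(' + JSON.stringify({url: url, sprites: sprites}) + ');';
}

var body = comboResponse('onSprite', 'combo.mp3', {
    'in': [0, 3.07],
    '1': [3.07, 2.68]
});
// body is a call to onSprite() with the URL and the offsets in one object
```

On the client, onSprite() would receive the offsets and could hand them to the sprite-playing code.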

In terms of file formats, it's not that bad, turns out all I need is MP3 and OGG in order to support all these browsers (I was prepared for worse).

(I could also probably support IE3? and above with a <bgsound> and a WAV, but the WAV is too big to be practical. So any IE (before 9) enthusiasm should probably end up in Flash.)

I recorded my audio pieces in Garage Band and exported as MP3.

ffmpeg is teh tool! It's like imagemagick for audio/video.

Cutting out the extra 4-5 seconds Garage Band adds to each file you export:

$ ffmpeg -i in.mp3 -ss 0 -t 2.43 in-ok.mp3

(I didn't do that very precisely I think)

Converting MP3 to OGG is like:

$ ffmpeg -i in.mp3 -acodec vorbis -aq 60 -strict experimental in.ogg

Then the stitching.

MP3 files can actually be concatenated together just like JS/CSS, provided they have the same bitrate. I've done it in the past.
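
In a combo service that byte-for-byte stitching could look roughly like this. It is a sketch only, assuming Node.js on the server (the post doesn't settle on a platform) and a hypothetical concatFiles() helper. It does not check that the bitrates match, and any tags inside the files are copied along as-is.

```javascript
var fs = require('fs');

// Hypothetical helper: append files byte for byte, like `cat a b > out`.
function concatFiles(paths, outPath) {
    var parts = paths.map(function (p) {
        return fs.readFileSync(p); // each file as a Buffer
    });
    fs.writeFileSync(outPath, Buffer.concat(parts));
}
```

For the files in this post the call would be concatFiles(['in.mp3', '1.mp3', '2.mp3', 'out.mp3'], 'combo.mp3').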

You can also combine by reading the files and piping them into ffmpeg. That somehow feels better:

$ cat in.mp3 1.mp3 2.mp3 out.mp3 | ffmpeg -i - combo.mp3

You can also consider putting a bit of silence between the separate audios.

In order to get the length info to return it to the client, you can use ffmpeg -i filename.mp3 (I haven't done that part)
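
Among its diagnostics, ffmpeg -i prints a line such as "Duration: 00:00:03.07, start: ...", so the length could be pulled out with a regular expression. A minimal sketch, assuming that line format and a hypothetical parseDuration() helper. Capturing ffmpeg's output (it goes to stderr) is left out.

```javascript
// Hypothetical helper: extract the length in seconds from the text
// that `ffmpeg -i file.mp3` prints. Returns null without a Duration line.
function parseDuration(ffmpegOutput) {
    var match = /Duration:\s*(\d+):(\d{2}):(\d{2}(?:\.\d+)?)/.exec(ffmpegOutput);
    if (!match) {
        return null;
    }
    // hours, minutes and (fractional) seconds
    return Number(match[1]) * 3600 + Number(match[2]) * 60 + Number(match[3]);
}

var seconds = parseDuration('  Duration: 00:00:03.07, start: 0.000000, bitrate: 128 kb/s');
// seconds is 3.07
```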

OGGs cannot be concatenated like MP3, so the combo service should `cat` the mp3s as shown above, then convert to OGG (also shown :) )


You can now roll your own on-demand audio combo handler and use audio sprites to have fewer HTTP requests and a more responsive app/game/html5 thing.


13 Responses

  1. >This is probably due to unreliable setTimeout().

    Even 100% accurate 1msec timing wouldn’t be good enough to make it completely seamless. For this kind of magic you need some audio buffer which you fill yourself. (Firefox’s new-ish experimental audio API should do the trick.)

    >Also I didn’t cut the audio pieces very well, so that might have something to do with it.

    Maybe. Well, you should be aware that MP3 always adds some leading and trailing silence. It’s not suitable for looping. (Flash’s authoring tool works around this by automatically generating some cue-in and cue-out points, which are used to make the loop seamless.)

    >all I need is MP3 and OGG

    You should avoid MP3 if possible. The quality/size ratio isn’t that good – especially at lower bitrates – and then there is also that leading/trailing silence thing, I mentioned earlier.

    I use Ogg/Vorbis for all nice browsers, M4A/AAC for IE & Safari, and MP3 for Flash.

    For reference, my (low-q) batch file looks like this:

    rem m4a/aac
    for %%i in (*.wav) do faac -q 70 -w -s "%%i"
    del ..\m4a\*.m4a
    copy/y *.m4a ..\m4a\
    del *.m4a

    rem ogg/vorbis
    for %%i in (*.wav) do oggenc2 -q 0 "%%i"
    del ..\ogg\*.ogg
    copy/y *.ogg ..\ogg\
    del *.ogg

    rem mp3
    for %%i in (*.wav) do lame -b 64 -h "%%i" "%%~ni.mp3"
    del ..\mp3\*.mp3
    copy/y *.mp3 ..\mp3\
    del *.mp3

    There is res/m4a, res/mp3, res/ogg, and res/wav. The batch is in the wav directory.

    >combining audio files into a single file to reduce HTTP requests

    By the way, I’m currently using some kind of gzipped mxhr (with an index) for this. All of my resources (images, audio files, json, whatever) are in one file.

    Haven’t tried blobs/object-URLs yet, but they should work even better. Annoyingly this will mean that there are even more packages I need to generate. 2 (gzip/uncompressed) * 3 (ogg/m4a/noaudio) * 2 (b64/raw) = 12… oh joy! :)

  2. Thanks Jos, very informative and much appreciated! I’m a total noob in the audio formats and their quirks.

  3. Oh I forgot to mention one really annoying detail. See that “no audio” thing in the last paragraph? The reason for that is Safari.

    Safari + Windows – QuickTime -> Audio is undefined!

    One would expect that the Audio object always exists and that canPlayType would just return nothing for everything if the decoders don’t exist, but no, that’s not the case with bloody Safari. The Audio object itself doesn’t exist. So, guard your audio stuff with “if (window.Audio)”.

    I only do this check once during the initialization phase. If Audio doesn’t exist my framework creates dummy audio objects which got all the methods, but don’t do anything. This way I can write code which looks like I’m living in a world where the Audio object always exists.

    Another thing I want to drop here, because this information is kinda hard to find/test, are the Ogg/Vorbis and M4A/AAC checks:

    if (audio.canPlayType('audio/ogg; codecs=vorbis') === 'probably') {
    [...go with ogg...]
    } else if (audio.canPlayType('audio/mp4; codecs=mp4a') === 'probably') {
    [...go with m4a...]
    }

    The exact codec name of HE-AAC is "mp4a.40.5" and the one for AAC-LC is "mp4a.40.2".

  4. Great concept. Really something new and refreshing to web developers.

  5. Nice code… What about android devices? Will it work there ?

  6. Hello there

    After countless hours finding out what to do with crackling & choppy sound playback on the Android platform as I am working on a page/mobile app in HTML 5 that can be used by people with aphasia, ms, ALS etc.

    Prototype choppy sound:

    I have been scrambling the web now to find out if I am incapable of implementing the HTML5 audio tag or if there was something else afoot – and I must say that I am nearly at the conclusion that creating a sound sprite is indeed the solution I’ve been looking for – I’ve merged a number of mp3′s as suggested (using SoX) and they playback without noise per se, it seems like a rambling but here’s the test:

    The test is heavily inspired by your experiments here and it seems like it just might do the trick….

    Naturally I need to combine the test with the layout already in place above but given some time I might actually have a prototype with non (or less) choppy sound up and running, which is very neat indeed – I would like to sincerely thank you for sharing the sound sprite idea and hope all the best for you and your efforts in the future – Keep up the good work!!!

    @Ritesh August 3rd, 2011 at 6:17 am

    Nice code… What about android devices? Will it work there ?

    What android version works ?
    As for the above I have currently only tested and verified it to be working on Samsung Galaxy S2 that runs kernel

    Android 2.3.3




  7. Seems like I pasted the same link twice in my comment above…. Grrr – Prototype choppy sound should have been – Sorry if it has caused any confusion.

    Yes I was very tired last night!

    – Anyway – As I said before – Thank you for sharing your experiences with HTML5 audio.

  10. Great article, I’m going to refer to this in a course I’m taking. One thing, shouldn’t the ogg type be audio/ogg (not video/ogg)? Also to be safe you could also include the audio/mpeg type for mp3.

  11. There’s a nice example of something similar in the HTML5 track element spec, done by adding a track to a sound:

  13. [...] thought: an audio sprite would be a good idea, put all samples in one file, then JS can update the UI depending on which [...]
