How many bytes is “normal” for a web font: a study using Google fonts / Stoyan's phpied.com

For images I think we (web developers) have a sense of how many bytes we can expect an image we see on a page to be. A JPEG photo? 100-ish K is ok for a decent quality. Less is nice. How about 200K? Hmmm..., ok. Half a meg? This must be a Hero of some sort. 2 megs? That better be a downloadable hi-res photo of Neptune or something.

But file sizes of web fonts? I personally don't have a gut feeling for how much is too much and how much is to be expected. So here's my attempt to find out.

Data set

Turns out one can download all Google Fonts from GitHub. Under a gigabyte of stuff, lots of fonts. For my purposes I decided to only look into regular fonts (no bold, italics), which is still plenty. I took only the TTF files that have "Regular" in the name and that's 1128 files.

  find /gfonts -type f -iname "*regular*" -print0 | xargs -0 cp -t ../regulars

Tools I used

Glyphhanger is a nice and easy Node.js library and CLI that uses Python's fonttools and makes it trivial to subset fonts, while also converting to WOFF2, which is the format that will end up on the web.

Fontkit is also a Node.js library. It can inspect a font file and tell you some metadata such as number of characters and number of glyphs (those two are not synonymous, turns out). And there's also a nice crisp web UI on top of fontkit for all your font introspection needs.

US ASCII subset

Because I was sure some of these fonts may be wild (big sizes, tons of glyphs), I thought I'd level the playing field by subsetting each font to only the 95 characters of basic English, so no umlauts and so on. This is the Unicode range U+0020-007E, also conveniently called US_ASCII in Glyphhanger.
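As a quick sanity check on that 95, here is a minimal sketch that walks the range U+0020-007E and collects the characters. It's plain JavaScript and has nothing to do with how Glyphhanger works internally; the variable names are just for illustration.

```javascript
// Collect every character in the inclusive range U+0020 (space) to U+007E (tilde)
const asciiChars = [];
for (let codePoint = 0x0020; codePoint <= 0x007e; codePoint++) {
  asciiChars.push(String.fromCodePoint(codePoint));
}

console.log(asciiChars.length); // 95
console.log(asciiChars.join('')); // space, punctuation, digits, A-Z, a-z
```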

Converting all fonts is a one-liner:

$ glyphhanger --subset="*.ttf" --US_ASCII --formats=woff2

Randomly inspecting some fonts, I saw that some have just a handful of characters, not the expected 95. The reason is that some fonts, say Japanese-only ones, have very few characters in the US_ASCII Unicode range. So I decided to keep only the fonts that have all 95 characters.

The complete script is available, but the salient parts are just looping all files, reading the content and passing each one to fontkit for introspection:

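const fs = require('fs');
const path = require('path');
const fontkit = require('fontkit');

// assumption: directory holding the subset fonts, adjust as needed
const fontDirectory = '.';

// all files
fs.readdir(fontDirectory, (err, files) => {
  if (err) throw err;
  files.forEach((file) => {
    const fontPath = path.join(fontDirectory, file);
    fs.readFile(fontPath, (err, fontBuffer) => {
      if (err) throw err;
      const font = fontkit.create(fontBuffer);

      // and now some handy properties are available:
      font.familyName;
      font.numGlyphs;
      font.characterSet;
    });
  });
});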

font.characterSet.length lets us work only with the fonts that have all 95 characters and discard the rest. This results in a total of 1074 files from which to draw general conclusions. And here are the results...
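The filtering step could look roughly like this. It's a sketch, not a copy of the complete script: it assumes each font has already been boiled down to a plain record, and the record shape, the function name and the sample data are all made up for illustration. numChars stands in for font.characterSet.length.

```javascript
const EXPECTED_CHARS = 95; // size of the US_ASCII range U+0020-007E

// Keep only the fonts whose subset covers all expected characters.
function keepComplete(records, expected = EXPECTED_CHARS) {
  return records.filter((record) => record.numChars === expected);
}

// Made-up sample records, for illustration only
const sampleRecords = [
  { file: 'example-a-subset.woff2', numChars: 95, numGlyphs: 96 },
  { file: 'example-b-subset.woff2', numChars: 12, numGlyphs: 14 },
];

const kept = keepComplete(sampleRecords);
console.log(kept.map((record) => record.file)); // [ 'example-a-subset.woff2' ]
```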

Results

Average File Size: 19751.88 bytes
Median File Size: 12380 bytes
Average Glyph Count: 144.92
Median Glyph Count: 107
Number of font files: 1074

As you can see there are usually a few more glyphs than there are characters.
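For reference, averages and medians like these take only a few lines to compute. This is a minimal sketch, not necessarily how the complete script does it. The four byte counts below are the first four rows of the stats excerpt further down (Great Vibes, Gudea, Grey Qo, Griffy), so the output describes those four files only, not the full data set.

```javascript
// Arithmetic mean of a non-empty array of numbers
function mean(values) {
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}

// Median: middle value of the sorted list, or the mean of the two middle values
function median(values) {
  const sorted = [...values].sort((a, b) => a - b); // numeric sort on a copy
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

const sampleBytes = [40260, 4248, 16088, 47676];
console.log(mean(sampleBytes)); // 27068
console.log(median(sampleBytes)); // 28174
```

The gap between the average and the median in the full results is what you'd expect when a handful of very large files pull the average up, which is why the median is the more useful "normal" here.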

And so, a conclusion: the median font file with English-only subset of characters should be around 12K. If you look at your network requests and your font is much larger, well there's work for you to do.
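One way to do that check without eyeballing the network panel is to filter resource timing entries by size. The sketch below is only an illustration: the function name, the budget value and the sample entries are made up, and it works on plain objects so it runs anywhere. It relies on the name and encodedBodySize properties that the browser's PerformanceResourceTiming entries expose.

```javascript
// Return font requests whose compressed body is larger than the budget.
function findHeavyFonts(entries, budgetBytes) {
  const fontUrl = /\.(woff2?|ttf|otf)([?#]|$)/i; // font extension at end of path
  return entries
    .filter((entry) => fontUrl.test(entry.name))
    .filter((entry) => entry.encodedBodySize > budgetBytes)
    .map((entry) => ({ url: entry.name, bytes: entry.encodedBodySize }));
}

// Made-up entries that mimic the shape of resource timing entries
const sampleEntries = [
  { name: 'https://example.com/fonts/body.woff2', encodedBodySize: 12000 },
  { name: 'https://example.com/fonts/display.woff2?v=2', encodedBodySize: 250000 },
  { name: 'https://example.com/img/hero.jpg', encodedBodySize: 500000 },
];

const BUDGET_BYTES = 24 * 1024; // arbitrary budget, roughly twice the median
const heavy = findHeavyFonts(sampleEntries, BUDGET_BYTES);
console.log(heavy);
```

In a browser console you would pass performance.getEntriesByType('resource') as the entries. Keep in mind that cross-origin responses report an encodedBodySize of 0 unless the server sends a Timing-Allow-Origin header, so fonts served from another host may not show up.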

Stats

The full stats are available here in CSV format but here's a taste...

Num chars	Num glyphs	Bytes	File	Font name
...	...	...	...	...
95	175	40260	GreatVibes-Regular-subset.woff2	Great Vibes
95	96	4248	Gudea-Regular-subset.woff2	Gudea
95	116	16088	GreyQo-Regular-subset.woff2	Grey Qo
95	96	47676	Griffy-Regular-subset.woff2	Griffy
95	123	14660	Gruppo-Regular-subset.woff2	Gruppo
95	107	13760	Gupter-Regular-subset.woff2	Gupter
95	156	17964	Gulzar-Regular-subset.woff2	Gulzar
95	116	24364	Gwendolyn-Regular-subset.woff2	Gwendolyn
95	213	14468	HachiMaruPop-Regular-subset.woff2	Hachi Maru Pop
95	98	10452	Halant-Regular-subset.woff2	Halant
95	98	6648	Habibi-Regular-subset.woff2	Habibi
95	96	10736	HammersmithOne-Regular-subset.woff2	Hammersmith One
95	96	10696	Handlee-Regular-subset.woff2	Handlee
95	107	34260	Hanalei-Regular-subset.woff2	Hanalei
95	107	16448	HanaleiFill-Regular-subset.woff2	Hanalei Fill
95	96	8356	Gurajada-Regular-subset.woff2	Gurajada
95	96	14912	HeadlandOne-Regular-subset.woff2	HeadlandOne
...	...	...	...	...

Outliers

What about some font files far from the median, at either end?

Some small files (2K) are hardly usable:

Others (also 2K) are perfectly fine, though simple:

And even 3K can "buy" you a fine font that makes your visitors say, hey this website is not like the others:

On the larger side (250K) we have

(what happened to the capital F?)

and

I suspect more hole-y fonts are more complicated to draw and therefore weigh more, compared to simple strokes, like an old-timey digital watch.

LATIN

Alright, 95 characters is fine and all, but you're one Voilà! away from embarrassment, because your font doesn't have an à. So how about a more character-complete LATIN subset? Glyphhanger's LATIN is a more involved set of Unicode ranges:

U+0000-00FF
U+0131
U+0152-0153
U+02BB-02BC
U+02C6
U+02DA
U+02DC
U+2000-206F
U+2074
U+20AC
U+2122
U+2191
U+2193
U+2212
U+2215
U+FEFF
U+FFFD

I'm not going to pretend I understand why this is the range, but I can tell you these are 385 characters in total, I checked.

let count = 13; // single chars: U+0131, U+02C6, etc

for (let codePoint = 0x0000; codePoint <= 0x00FF; codePoint++) {
  count++;
}
for (let codePoint = 0x0152; codePoint <= 0x0153; codePoint++) {
  count++;
}
for (let codePoint = 0x02BB; codePoint <= 0x02BC; codePoint++) {
  count++;
}
for (let codePoint = 0x2000; codePoint <= 0x206F; codePoint++) {
  count++;
}
console.log(count); // 385
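The same count can also be done by parsing the range list itself, which saves hand-counting the single code points. A minimal sketch: the helper names are made up, it only handles the U+XXXX and U+XXXX-YYYY forms used above (no wildcards), and it assumes the ranges don't overlap.

```javascript
const LATIN_RANGES =
  'U+0000-00FF,U+0131,U+0152-0153,U+02BB-02BC,U+02C6,U+02DA,U+02DC,' +
  'U+2000-206F,U+2074,U+20AC,U+2122,U+2191,U+2193,U+2212,U+2215,U+FEFF,U+FFFD';

// "U+0152-0153" -> [0x0152, 0x0153]; "U+0131" -> [0x0131, 0x0131]
function parseRange(token) {
  const [first, last = first] = token.trim().replace(/^U\+/i, '').split('-');
  return [parseInt(first, 16), parseInt(last, 16)];
}

// Sum the sizes of all (inclusive, non-overlapping) ranges in the list
function countCodePoints(rangeList) {
  return rangeList
    .split(',')
    .map(parseRange)
    .reduce((total, [first, last]) => total + (last - first + 1), 0);
}

console.log(countCodePoints(LATIN_RANGES)); // 385
console.log(countCodePoints('U+0020-007E')); // 95
```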

Subsetting to LATIN is just as easy as US_ASCII:

$ glyphhanger --subset="*.ttf" --LATIN --formats=woff2

With US_ASCII we had 95 characters in most fonts and removed the ones with fewer characters to keep it all equal. Here, rarely, if ever, is there a font that has all 385 characters. Most have a little over 200. So I somewhat arbitrarily picked 200 as the number under which a font is not considered for comparison. We still have over 1000 font files to compare, but that's a little caveat: not all fonts support the same characters. (I did keep the number of characters in the stats, see below.)
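Here's roughly what that cutoff looks like in code, with a coverage figure thrown in to show how far from 385 a font is. It's a sketch under the same assumptions as before: plain records rather than fontkit objects, and made-up names. The Arya and Arvo character counts come from the stats excerpt below; the third record is invented to show a font being dropped.

```javascript
const LATIN_TOTAL = 385; // code points in Glyphhanger's LATIN ranges
const MIN_CHARS = 200; // the somewhat arbitrary cutoff

// Drop fonts under the cutoff and note how much of LATIN the rest cover.
function summarizeCoverage(records, minChars = MIN_CHARS) {
  return records
    .filter((record) => record.numChars >= minChars)
    .map((record) => ({ ...record, coverage: record.numChars / LATIN_TOTAL }));
}

const latinSample = [
  { name: 'Arya', numChars: 262 },
  { name: 'Arvo', numChars: 209 },
  { name: 'Example Sparse', numChars: 150 }, // invented, below the cutoff
];

for (const font of summarizeCoverage(latinSample)) {
  console.log(font.name, (font.coverage * 100).toFixed(1) + '%');
}
```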

Results

Average File Size: 29045.30 bytes
Median File Size: 19092 bytes
Average Glyph Count: 287.03
Median Glyph Count: 236
Number of font files: 1009

Conclusion: the median font file with Latin-extended subset of characters should be a little under 20K. If you look at your network requests and your font is much larger, well there's work for you to do.

Stats

The full stats are available here in CSV format but here's a taste...

Num chars	Num glyphs	Bytes	File	Font name
262	315	15884	Arya-Regular-subset.woff2	Arya
224	260	32052	Arizonia-Regular-subset.woff2	Arizonia
224	247	40712	AreYouSerious-Regular-subset.woff2	Are You Serious
235	236	17488	Armata-Regular-subset.woff2	Armata
209	210	16920	Arvo-Regular-subset.woff2	Arvo
228	233	23044	Asar-Regular-subset.woff2	Asar
216	217	24424	Artifika-Regular-subset.woff2	Artifika
231	350	23464	Arsenal-Regular-subset.woff2	Arsenal
231	348	21244	AsapCondensed-Regular-subset.woff2	Asap Condensed
230	261	20792	Athiti-Regular-subset.woff2	Athiti
...	...	...	...	...
221	340	12504	ZenKakuGothicAntique-Regular-subset.woff2	Zen Kaku Gothic Antique
216	229	15872	ZenLoop-Regular-subset.woff2	Zen Loop
227	921	107016	YujiMai-Regular-subset.woff2	Yuji Mai
221	340	12516	ZenKakuGothicNew-Regular-subset.woff2	Zen Kaku Gothic New
226	350	15928	ZenKurenaido-Regular-subset.woff2	Zen Kurenaido
226	348	15564	ZenMaruGothic-Regular-subset.woff2	Zen Maru Gothic
221	341	34128	ViaodaLibre-Regular-subset.woff2	Viaoda Libre
226	350	19696	ZenOldMincho-Regular-subset.woff2	Zen Old Mincho
225	590	43104	ZillaSlab-Regular-subset.woff2	Zilla Slab
227	921	94288	YujiSyuku-Regular-subset.woff2	Yuji Syuku
216	317	32700	ZenTokyoZoo-Regular-subset.woff2	Zen Tokyo Zoo
229	595	43912	ZillaSlabHighlight-Regular-subset.woff2	Zilla Slab Highlight

Next time...

So here it is, folks, a web font file that supports extended Latin characters, your Às and your Ás and Â, Ã, Ä, Å... should weigh around 20K. Anything a little over (or a lot over) 20K is up to you to decide. Is the font worth it, can it be subset, etc, etc.

That's, of course, just, like, my opinion. Curious to see other folks' thoughts and/or further experimentation.

As a follow-up I want to try to see how much subsetting really helps. Stay tuned.


Meanwhile, hit me up on twitter @stoyanstefanov