CSS Lexer

November 27th, 2010. Tagged: CSS

I have so much stuff to do and I've been feeling a little overwhelmed lately. Not depressed, because it's next to impossible to be depressed in a climate with 320 sunny days a year and a beach. So I thought, why not drop everything and relax? I'm currently staying at home, enjoying my unused vacation days: no work, no meetings, no nothing. I decided to relax by taking on a task that requires some concentration, as opposed to just jumping from one task to the next.

I have ideas for a bunch of CSS-related tools and utilities, all of which, naturally, require an understanding of CSS code. So I need a parser, and I thought I should write one in JavaScript.

The first step is a lexer (a scanner), and I'm happy to share what I committed to GitHub today. It's right here. I called it cssex (yep, cheesy, but I didn't want to spend any time thinking of a proper name).

It's not doing much currently, but it's a step. It takes a piece of CSS code and tokenizes it, producing the following token types:

  • comment
  • string
  • white (spaces or tabs)
  • line (new lines)
  • identifier (could be anything, such as a property or value or font name)
  • number
  • match (the five two-character operators used in attribute selectors: ~= |= ^= $= *=)
  • operator (such as . # % * and so on)
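To make these token types concrete, here is a minimal sketch of a lexer that produces them. It is not cssex's code: the { type, value } token shape, the rule order, and the simplified patterns (no escapes in identifiers, units lexed as separate identifiers, and so on) are all assumptions. Each rule is a sticky regular expression tried at the current position, the first match wins, and a catch-all single-character "operator" rule guarantees progress, so joining the token values always reproduces the input.

```javascript
// Minimal CSS lexer sketch (not cssex's actual implementation).
// Assumed token shape: { type, value }. Rules are tried in order at the
// current position; the first sticky regex that matches wins.
const RULES = [
  ["comment", /\/\*[\s\S]*?\*\//y],
  ["string", /"(?:[^"\\\n]|\\[\s\S])*"|'(?:[^'\\\n]|\\[\s\S])*'/y],
  ["white", /[ \t]+/y],              // spaces or tabs
  ["line", /\r\n|\r|\n/y],           // one new line per token
  ["number", /\d+(?:\.\d+)?|\.\d+/y],
  ["identifier", /-?[a-zA-Z_][\w-]*/y],
  ["match", /[~|^$*]=/y],            // ~= |= ^= $= *=
  ["operator", /[\s\S]/y],           // catch-all: any other single character
];

function lex(source) {
  const tokens = [];
  let pos = 0;
  while (pos < source.length) {
    for (const [type, re] of RULES) {
      re.lastIndex = pos; // sticky: the match must start exactly here
      const m = re.exec(source);
      if (m) {
        tokens.push({ type, value: m[0] });
        pos += m[0].length;
        break;
      }
    }
  }
  return tokens;
}

// Inverse of lex: every character lives in exactly one token.
function toSource(tokens) {
  return tokens.map((t) => t.value).join("");
}

console.log(lex("a:hover { color: #fff }").map((t) => t.type + " " + JSON.stringify(t.value)));
```

Rule order is what resolves ambiguity here: `*=` is tried as a match before a lone `*` falls through to operator, and `10px` comes out as a number followed by an identifier.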

You can see a test page here, and I'll be happy to hear any bug reports. The test page takes CSS, tokenizes it, then recreates the source from the tokens to check that the original is reproduced exactly. It also highlights the different token types in different colors and finally dumps the types and values of the tokens (the complete dump goes to console.log).
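The two checks the test page performs (round-trip and highlighting) can be sketched like this, assuming the same { type, value } token shape. toSource here just concatenates token values; it mirrors the role of CSSEX.toSource but is not that code. highlight and escapeHtml are hypothetical helpers: each token is wrapped in a <span> whose class is the token type, so a stylesheet can color it. The hand-written token array stands in for real lexer output.

```javascript
// Round-trip: the lexer is lossless if joining token values gives back the input.
function toSource(tokens) {
  return tokens.map((t) => t.value).join("");
}

function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

// Hypothetical highlighter: one <span class="TYPE"> per token.
function highlight(tokens) {
  return tokens.map((t) => `<span class="${t.type}">${escapeHtml(t.value)}</span>`).join("");
}

// Hand-written tokens for `a { color: red }`, standing in for real lexer output.
const tokens = [
  { type: "identifier", value: "a" },
  { type: "white", value: " " },
  { type: "operator", value: "{" },
  { type: "white", value: " " },
  { type: "identifier", value: "color" },
  { type: "operator", value: ":" },
  { type: "white", value: " " },
  { type: "identifier", value: "red" },
  { type: "white", value: " " },
  { type: "operator", value: "}" },
];

const original = "a { color: red }";
console.log(toSource(tokens) === original); // true
console.log(highlight(tokens.slice(0, 1))); // <span class="identifier">a</span>
```

Escaping matters because string and comment tokens can contain `<` or quotes that would otherwise break the highlighted HTML.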

As you can see, I'm continuing with the cheesiness:

  • foreplay.html is the test page (instead of "playground")
  • test-osterone.js (instead of simply "test") is the test runner that uses JavaScriptCore
  • penthouse.sh (instead of "suite") runs the tests against the 213 CSS files from CSSZenGarden.com
  • sex.js is the lexer itself, which defines the global CSSEX object with two methods: lex(source) and its inverse, toSource(tokens)
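A rough analogue of the check a suite like penthouse.sh performs can be written for Node.js. This is an assumption-laden sketch: the real script is a shell script driving JavaScriptCore, whereas this version uses Node's fs module, takes a lex/toSource pair as parameters (runSuite is a hypothetical name), and reports every .css file in a directory whose tokens do not join back into the exact original text.

```javascript
// Rough Node.js analogue of a round-trip test suite (penthouse.sh itself is a
// shell script that drives JavaScriptCore; this is only an illustration).
const fs = require("fs");
const path = require("path");
const os = require("os");

// Lexes every .css file in `dir` and lists the files whose tokens, joined
// back together, do not reproduce the original text exactly.
function runSuite(dir, lex, toSource) {
  const files = fs.readdirSync(dir).filter((name) => name.endsWith(".css"));
  const failures = [];
  for (const name of files) {
    const source = fs.readFileSync(path.join(dir, name), "utf8");
    if (toSource(lex(source)) !== source) failures.push(name);
  }
  return { total: files.length, failures };
}

// Demo: a throwaway directory and a trivial stand-in lexer (one token per file).
// Pass a real lex/toSource pair instead to test an actual lexer.
const join = (tokens) => tokens.map((t) => t.value).join("");
const wholeFile = (src) => [{ type: "all", value: src }];
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "css-suite-"));
fs.writeFileSync(path.join(dir, "a.css"), "a { color: red }\n");
fs.writeFileSync(path.join(dir, "b.css"), "b {}\n");
console.log(runSuite(dir, wholeFile, join)); // { total: 2, failures: [] }
fs.rmSync(dir, { recursive: true, force: true });
```

Running the same check over a large, varied corpus such as the Zen Garden stylesheets is a cheap way to catch a lexer that silently drops or merges characters.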

What's next is a proper parser that validates the tokens produced by the lexer. Then come the tools: a minifier, highlighter, lint, and whatnot (for example, something that automatically adds all the -moz-, -o- and other prefixes to your border-radius). But first I need to draw me some railroad diagrams like the ones Douglas Crockford has for JavaScript and JSON; they should be immensely helpful when parsing. As you can probably guess, Crockford's JSLint, his JSON parser, and his write-up on Pratt's top-down operator precedence are my source of "view source" :)
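The prefixing tool mentioned above could, at its simplest, look like the following sketch. It assumes a hypothetical parser output of { property, value } declarations, and addPrefixes is an illustrative name, not an existing API. Prefixed copies are inserted before the standard property, so browsers that support the standard one end up using it, and any prefix the author already wrote is left alone.

```javascript
// Hypothetical prefixer over a simplified list of { property, value } declarations.
// The prefix list is a parameter; the default here is only an example.
function addPrefixes(declarations, property = "border-radius", prefixes = ["-moz-", "-webkit-"]) {
  const present = new Set(declarations.map((d) => d.property));
  const out = [];
  for (const decl of declarations) {
    if (decl.property === property) {
      for (const prefix of prefixes) {
        const name = prefix + property;
        // Skip prefixes the author already wrote by hand.
        if (!present.has(name)) out.push({ property: name, value: decl.value });
      }
    }
    out.push(decl); // standard property goes last so it wins where supported
  }
  return out;
}

const result = addPrefixes([{ property: "border-radius", value: "4px" }]);
console.log(result.map((d) => `${d.property}: ${d.value};`).join(" "));
// -moz-border-radius: 4px; -webkit-border-radius: 4px; border-radius: 4px;
```

Working on parsed declarations rather than raw text is what makes this safe: the tool never has to guess where a value ends.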

My main motivation behind all this (other than the itch) is a proper minifier written in JavaScript (and therefore running everywhere), not just a collection of regular expressions, which is what YUICSSmin is right now. Also a proper validator, one that understands the nature of the frontend beast and can handle everything from CSS2.1, CSS3 media queries, and transitions, through the latest -webkit and -moz craziness, all the way down to IE hacks, expressions, behaviors, and filters. And everything in between. Because more often than not, we don't validate CSS simply because the W3C validator is too strict and out of touch with reality.
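Here is a rough idea of why a token-based minifier is safer than a pile of regular expressions: string tokens pass through untouched, and whitespace is only dropped where the neighboring token makes it insignificant. This is a hypothetical sketch over { type, value } tokens using the token type names listed earlier; its rules (drop whitespace next to { } ; , > and after a colon, treat comments as whitespace) are deliberately conservative assumptions, not a complete minifier. Whitespace before a colon is kept because in a selector such as `a :hover` it is a descendant combinator.

```javascript
// Hypothetical token-based minifier sketch. Tokens are { type, value }.
const NO_SPACE_AROUND = new Set(["{", "}", ";", ",", ">"]);
const isPunct = (tok, set) => tok.type === "operator" && set.has(tok.value);

function minify(tokens) {
  let out = "";
  let prev = null;          // last token actually written
  let pendingSpace = false; // whitespace or a comment seen since `prev`
  for (const tok of tokens) {
    if (tok.type === "white" || tok.type === "line" || tok.type === "comment") {
      pendingSpace = true; // comments count as whitespace so neighbors never fuse
      continue;
    }
    if (pendingSpace && prev !== null) {
      const droppable =
        isPunct(prev, NO_SPACE_AROUND) ||
        isPunct(tok, NO_SPACE_AROUND) ||
        (prev.type === "operator" && prev.value === ":");
      if (!droppable) out += " ";
    }
    out += tok.value;
    prev = tok;
    pendingSpace = false;
  }
  return out;
}

// Tokens for "a > b {\n  margin: 0 auto; /* c */\n}\n", written out by hand.
const T = (type, value) => ({ type, value });
const tokens = [
  T("identifier", "a"), T("white", " "), T("operator", ">"), T("white", " "),
  T("identifier", "b"), T("white", " "), T("operator", "{"), T("line", "\n"),
  T("white", "  "), T("identifier", "margin"), T("operator", ":"), T("white", " "),
  T("number", "0"), T("white", " "), T("identifier", "auto"), T("operator", ";"),
  T("white", " "), T("comment", "/* c */"), T("line", "\n"), T("operator", "}"),
  T("line", "\n"),
];
console.log(minify(tokens)); // a>b{margin:0 auto;}
```

A real minifier would also have to preserve the comment-based IE hacks mentioned above, which this sketch simply drops.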


21 Responses

  1. [...] More: CSS Lexer / Stoyan's phpied.com [...]

  2. [...] This post was mentioned on Twitter by Vladimir Carrer and Stoyan Stefanov, Ben Alman. Ben Alman said: RT @stoyanstefanov: Blog: CSS Lexer http://www.phpied.com/css-lexer/ [...]

  3. [...] more here: CSS Lexer / Stoyan's phpied.com Related Posts:Javascript Patterns By Stoyan Stefanov Ben Nadel reviews Javascript Patterns by [...]

  4. Stoyan, if you can come up with a way to make the file names any more lascivious, I think your blog will get the clap.

  5. yeah, could be an invite to comment spammers to go crazy with the viagra stuff :)

  6. Stoyan nice work :D

  7. Kudos! Especially for the names :-)

  8. Ouch, that looks like a lot of work :)

    I’ve written a CSS parser using the token expressions almost straight from the CSS spec page:


    You can easily use them in a normal regex engine if you build a regular expression like:


    and then the number of the capture group that matched gives you the index of the matched token (or a parse error if the last one matches). Just repeat it and you’ve got a fast tokenizer!
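The single-regex technique described in this comment can be sketched as follows. The commenter's actual expressions did not survive, so the token names and simplified patterns here are assumptions, not the CSS spec's token macros. Every token pattern gets exactly one capture group, the groups are joined with |, and a final catch-all group is appended; whichever group is defined in a match identifies the token type, and a hit on the last group is a lex error. The individual patterns must use only non-capturing (?:...) groups, or the numbering breaks.

```javascript
// Hypothetical token names and simplified patterns (not the CSS spec's macros).
// Each pattern uses only non-capturing groups, so group i+1 <=> TOKENS[i].
const TOKENS = [
  ["comment", String.raw`\/\*[\s\S]*?\*\/`],
  ["string", String.raw`"(?:[^"\\\n]|\\[\s\S])*"|'(?:[^'\\\n]|\\[\s\S])*'`],
  ["whitespace", String.raw`\s+`],
  ["number", String.raw`\d+(?:\.\d+)?|\.\d+`],
  ["ident", String.raw`-?[a-zA-Z_][\w-]*`],
  ["match", String.raw`[~|^$*]=`],
  ["delim", String.raw`[{}()\[\];:,.#>+~*%!@\/=-]`],
];
// The last alternative catches any other character: if it matches, that's an error.
const MASTER = new RegExp(TOKENS.map(([, p]) => `(${p})`).join("|") + "|([\\s\\S])", "y");

function tokenize(source) {
  const tokens = [];
  MASTER.lastIndex = 0;
  let m;
  while (MASTER.lastIndex < source.length && (m = MASTER.exec(source)) !== null) {
    // The index of the defined capture group identifies the token type.
    const i = m.findIndex((g, k) => k > 0 && g !== undefined) - 1;
    if (i === TOKENS.length) {
      throw new SyntaxError(`Unexpected "${m[0]}" at offset ${m.index}`);
    }
    tokens.push({ type: TOKENS[i][0], value: m[0] });
  }
  return tokens;
}

console.log(tokenize("a > b { color: red }").map((t) => t.type));
```

Because the sticky flag anchors every match at the previous match's end, the tokenizer cannot silently skip unrecognized text; anything unexpected, such as an unterminated string, lands in the catch-all group and raises an error.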

  9. Hi Stoyan,
    Sounds cool, or I should say… sexy! :)
    One thing I noticed is that the universal selector (*) is not recognized as an identifier.

  10. Very interesting idea and great names :D

  11. You might want to reuse WebKit’s lexer/grammar.

    CSSOM.js currently doesn’t use it, but it’s on the radar.

  12. Hey, this looks pretty cool. Question: would CSSEX.lex(someSource).toSource() effectively function as a CSS source prettifier? I need one of those for my own project.

  13. Also, https://github.com/nzakas/parser-lib/tree/master/src/css/

  14. illl

  15. https://github.com/kpdecker/cssParser/blob/master/tests/testLexer.js#L63 has some test productions from the css parser that I was attempting to put together way back when. Feel free to pull if you find it helpful (and the token productions are a close enough match between your work and mine).

    All that I ask is that you please ignore the embarrassing for loop with an empty block. I’m sure I had a reason for that in the local repository ;)

  16. Thanks buddy. :D Very helpful

  17. Seriously. How did I not come up with this?

  18. Hey!

    I would like to mention work from Daniel Glazman: http://glazman.org/JSCSSP/

    Enjoy =)

  19. Thanks for every other wonderful article. The place else may anyone get that kind of info in such an ideal manner of writing? I’ve a presentation next week, and I am on the look for such info.

  20. learn…

    [...]CSS Lexer / Stoyan’s phpied.com[...]…

  21. Useful info. Fortunate me I discovered your site unintentionally, and I am stunned why this accident didn’t took place in advance! I bookmarked it.
