CSS Lexer

I have so much to do and I've been feeling a little overwhelmed lately. Not depressed, because it's next to impossible to be depressed in a climate with 320 sunny days a year and a beach. So I thought, why not drop everything and relax? I'm currently staying at home, enjoying my unused vacation days. So no work, no meetings, no nothing. I thought I should relax by taking on a task that requires some degree of concentration, as opposed to just jumping from one task to the next.

I have ideas for a bunch of CSS-related tools and utilities, all of which, naturally, require an understanding of CSS code. So I need a parser, and I thought I should write one in JavaScript.

The first step is a lexer (scanner), and I'm happy to share what I committed to GitHub today. It's right here. I called it cssex (yep, cheesy, but I didn't want to spend any time thinking of a proper name).

It's not doing much currently, but it's a step. It takes a piece of CSS code and tokenizes it, producing the following token types:

  • comment
  • string
  • white (spaces or tabs)
  • line (new lines)
  • identifier (could be anything, such as a property or value or font name)
  • number
  • match (5 combinations of two operators used in attribute matching such as ^=)
  • operator (such as . # % * and so on)
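
To make the token types above concrete, here is a minimal sketch of a lexer that produces them. It is not how cssex itself is implemented: the rule table, the regular expressions, and the `{ type, value }` token shape are all assumptions made for illustration, and it cuts corners (no escapes in identifiers, no unicode ranges, and so on).

```javascript
// Hypothetical rule table: each rule is tried in order at the current
// position. The sticky flag (y) makes a regex match only at lastIndex.
// These patterns are illustrative assumptions, not the rules cssex uses.
const SKETCH_RULES = [
  ['comment',    /\/\*[\s\S]*?\*\//y],
  ['string',     /"(?:[^"\\\n]|\\[\s\S])*"|'(?:[^'\\\n]|\\[\s\S])*'/y],
  ['white',      /[ \t]+/y],
  ['line',       /\r\n|\r|\n|\f/y],
  ['number',     /\d*\.\d+|\d+/y],
  ['identifier', /-?[A-Za-z_][\w-]*/y],
  ['match',      /[~|^$*]=/y],
  // Catch-all: any single character that nothing else claimed.
  ['operator',   /[\s\S]/y],
];

function lex(source) {
  const tokens = [];
  let pos = 0;
  while (pos < source.length) {
    for (const [type, re] of SKETCH_RULES) {
      re.lastIndex = pos;
      const m = re.exec(source);
      if (m) {
        tokens.push({ type, value: m[0] });
        pos = re.lastIndex; // the catch-all guarantees progress
        break;
      }
    }
  }
  return tokens;
}

// Every character ends up in exactly one token, so joining the values
// gives back the original source.
function toSource(tokens) {
  return tokens.map((t) => t.value).join('');
}
```

Rule order matters in a sketch like this: number is tried before identifier so that 10px splits into a number and an identifier, and match is tried before the catch-all so that ^= stays a single token.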

You can see a test page here, and I'll be happy to hear any bug reports. The test page takes CSS, tokenizes it, then recreates the source from the tokens to check that the original is reproducible. It also highlights the different tokens in different colors and finally dumps the types and values of the tokens (the complete dump goes to console.log).

As you can see, I'm continuing with the cheesiness:

  • foreplay.html is the test page (instead of "playground")
  • test-osterone.js (instead of simply "test") is the test runner that uses JavaScriptCore
  • penthouse.sh (instead of "suite") runs tests with the 213 CSS files from CSSZenGarden.com
  • sex.js is the lexer itself which defines the global CSSEX object with two methods - lex(source) and its opposite toSource(tokens)
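
The round-trip check that the test page and the test suite perform can be sketched roughly as follows. The helper name `firstMismatch` and the stand-in `charLexer` are hypothetical; the only thing taken from the post is the shape of the API, an object with lex(source) and toSource(tokens) methods, which is how CSSEX is described above.

```javascript
// Hypothetical helper: returns -1 when the source survives the trip
// through lex() and toSource(), otherwise the offset of the first
// character where the rebuilt text differs from the original.
function firstMismatch(lexer, source) {
  const rebuilt = lexer.toSource(lexer.lex(source));
  if (rebuilt === source) {
    return -1;
  }
  let i = 0;
  while (i < source.length && i < rebuilt.length && source[i] === rebuilt[i]) {
    i++;
  }
  return i;
}

// Stand-in lexer so the sketch runs on its own: one token per character.
// With the real library loaded, CSSEX would be passed in its place.
const charLexer = {
  lex: (source) => Array.from(source, (ch) => ({ type: 'char', value: ch })),
  toSource: (tokens) => tokens.map((t) => t.value).join(''),
};

console.log(firstMismatch(charLexer, 'a { color: red }'));
```

Reporting an offset rather than a plain true or false is a design choice for the sketch: when running against a couple of hundred stylesheets, knowing where the rebuilt source first diverges makes a lexer bug much quicker to find.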

So what's next is a proper parser that validates the tokens produced by the lexer. Then come the tools: a minifier, a highlighter, a lint, and whatnot (for example, something that will automatically add all the -moz- and -o- and stuff to your border-radius). But first I need to draw me some railroad diagrams like those Douglas Crockford has for JavaScript and JSON; they should be immensely helpful when parsing. As you can probably guess, Crockford's JSLint, his JSON parser, and his writeup on Pratt's top-down operator precedence are my source of "view source" :)

My main motivation behind all this (other than the itch) is a proper minifier written in JavaScript (and therefore running everywhere), not just the collection of regular expressions that YUICSSmin is right now. Also a proper validator, one that understands the nature of the frontend beast and can handle everything from CSS2.1 and CSS3's media queries and transitions, through the latest -webkit and -moz craziness, all the way down to IE hacks, expressions, behaviors and filters. And everything in between. Because more often than not we don't validate CSS simply because the W3C validator is too strict and out of touch with reality.

This entry was posted on Saturday, November 27th, 2010 and is filed under CSS.


21 Responses to “CSS Lexer”

  1. CSS Lexer / Stoyan's phpied.com : : css Says:

    [...] More: CSS Lexer / Stoyan's phpied.com [...]

  2. Tweets that mention CSS Lexer / Stoyan's phpied.com -- Topsy.com Says:

    [...] This post was mentioned on Twitter by Vladimir Carrer and Stoyan Stefanov, Ben Alman. Ben Alman said: RT @stoyanstefanov: Blog: CSS Lexer http://www.phpied.com/css-lexer/ [...]

  3. HTML Scripts Tips and Secrets » Blog Archive » CSS Lexer / Stoyan's phpied.com Says:

    [...] more here: CSS Lexer / Stoyan's phpied.com Related Posts:Javascript Patterns By Stoyan Stefanov Ben Nadel reviews Javascript Patterns by [...]

  4. Robert Gentel Says:

    Stoyan, if you can come up with a way to make the file names any more lascivious I think your blog will get the clap.

  5. Stoyan Says:

    yeah, could be an invite to comment spammers to go crazy with the viagra stuff :)

  6. Peter van der Zee Says:

    Stoyan nice work :D

  7. tuzemec Says:

    Kudos! Especially for the names :-)

  8. porneL Says:

    Ouch, that looks like a lot of work :)

    I’ve written a CSS parser using token expressions almost straight from the CSS spec page:

    https://github.com/pornel/CSS-Preprocessor/blob/master/csstokens.php

    You can easily use them in a normal regex engine if you build a regular expression like:

    /(token1)|(token2)|(token3)|(.)/

    and then the number of the capture group will give you the index of the matched token (or a parse error if the last one matches). Just repeat it and you’ve got a fast tokenizer!

  9. Thierry Koblentz Says:

    Hi Stoyan,
    Sounds cool, or I should say… sexy! :)
    One thing I noticed is that the universal selector (*) is not recognized as an identifier.

  10. Alex Says:

    Very interesting idea and great names :D

  11. Nikita Vasilyev Says:

    You might want to reuse WebKit’s lexer/grammar.
    http://svn.webkit.org/repository/webkit/trunk/WebCore/css/tokenizer.flex
    http://svn.webkit.org/repository/webkit/trunk/WebCore/css/CSSGrammar.y

    CSSOM.js currently doesn’t use it, but it’s on the radar.

  12. Greg Reimer Says:

    Hey, this looks pretty cool. Question: would CSSEX.lex(someSource).toSource() effectively function as a CSS source prettifier? I need one of those for my own project.

  13. mseeley Says:

    Also, https://github.com/nzakas/parser-lib/tree/master/src/css/

  14. bbn Says:

    illl

  15. Kevin Decker Says:

    https://github.com/kpdecker/cssParser/blob/master/tests/testLexer.js#L63 has some test productions from the css parser that I was attempting to put together way back when. Feel free to pull if you find it helpful (and the token productions are a close enough match between your work and mine).

    All that I ask is that you please ignore the embarrassing for loop with an empty block. I’m sure I had a reason for that in the local repository ;)

  16. Random Knowledge Says:

    Thanks buddy. :D Very helpful

  17. Deloris Prendergast Says:

    Seriously. How would not I come up with this?

  18. Adrien Leygues Says:

    Hey!

    I would like to mention work from Daniel Glazman: http://glazman.org/JSCSSP/

    Enjoy =)

  19. Chung Terrezza Says:

    Thanks for every other wonderful article. The place else may anyone get that kind of info in such an ideal manner of writing? I’ve a presentation next week, and I am on the look for such info.

  20. learn Says:

    learn…

    [...]CSS Lexer / Stoyan’s phpied.com[...]…

  21. learn css l تعلم css Says:

    Useful info. Fortunate me I discovered your site unintentionally, and I am stunned why this accident didn’t took place in advance! I bookmarked it.
