screaming.org

This is the third installment in a series that parallels Matt Brubeck’s guided tour through making a browser engine so simplistic that its design can be captured in a handful of blog entries. The first entry offered a minimal set of types for a document object model. The second entry described a basic parser that could reify that model from an HTML document. This entry repeats those two exercises for a different kind of document that browsers must understand: a cascading style sheet.

As usual, you should read our fearless leader’s post prior to reading this one. After all, Matt works for Mozilla, while I’m just someone who thought it would be fun to follow along.

CSS Model

Cascading Style Sheets (CSS) is a declarative language specific to the domain of associating elements in some markup language with properties describing their format and layout. The structure of the language, at the largest scale, is just a list of rules.

public struct Stylesheet {
    public let rules: [Rule]
}

One rule might look like this:

body { background-color: #4a525a; }

Each rule is made of a comma-delimited list of selectors followed by a declaration block, the former being everything prior to the opening curly-brace, and the latter being a semicolon-delimited list of declarations bounded by curly-braces.

public struct Rule {
    public let selectors: [Selector]
    public let declarations: [Declaration]
}

We’re going to confine ourselves to a well-trafficked subset of selector syntax, but we’re going to allow a place to extend our model later should we decide to support selector combinators or selector namespaces. We’ll do this by defining an enum – which, for now, will only have one case:

public enum Selector {
    case Simple(SimpleSelector)
}

That case encompasses the familiar:

…a simple selector can include a tag name, an ID prefixed by ‘#’, any number of class names prefixed by ‘.’, or some combination of the above.

public struct SimpleSelector {
    public var tagName: String?
    public var id: String?
    public var classes: [String]
}

Each declaration in the semicolon-delimited list is just a name/value pair separated by a colon.

public struct Declaration {
    public let name: String
    public let value: Value
}

The values in these pairs, however, can take many forms. Again, we aim to ignore the devil hiding in the mountain of details so we limit ourselves to simple keywords, pixel-unit values, and colors.

public enum Value {
    case Keyword(String)
    case Length(Float, Unit)
    case Color(UInt8, UInt8, UInt8, UInt8) // RGBA
}

public enum Unit {
    case Px
}

Now that we have some types we can use to represent simple stylesheets, we move on to making a corresponding parser.

CSS Parser

Our quick-and-dirty parser implementation looks very much like the HTML parser in the last post, with a handlful of functions that invoke the consumeWhile() function with various predicates.

// Parse a property name or keyword.
mutating func parseIdentifier() -> String {
    return self.consumeWhile(validIdentifierChar)
}

This is a good place to mention a little bit of follow-up from the last post. Recall that I ran into a little hurdle when I tried to test character-set membership using a Swift Character type, and ended up creating bridging the gap through NSString, which was a bit of a kluge. It turns out that this hack only became necessary because I was insisting on using the old NSCharacterSet class. If I had given up on that API, I might have ended up with some predicates that looked a lot more like their Rust counterparts in the Robinson code.

After playing around in a Swift Playground, I figured out that ranges of Characters actually do work after all. I just had to use ... instead of .., and replace Rust’s | operator with commas.

func validIdentifierChar(c: Character) -> Bool {
    switch c {
    case "a"..."z", "A"..."Z", "0"..."9", "-", "_": return true // TODO: Include U+00A0 and higher.
    case _: return false
    }
}

That said, I would have had to write some extra code either way, because Matt’s code gets to lean on Rust’s std::char::is_whitespace method. Swift’s Character doesn’t have such a method (yet) so either I create one of my own – similar to the validIdentifierChar() method above – or I set up the Rube Goldberg bridge to NSCharacterSet.

Now we can look at our method to parse a SimpleSelector. You may have noticed that I declared its fields as var above. There was a reason for that: the way the following method imitates its Rust counterpart requires it. Notice how we allocate the SimpleSelector before entering the loop, and mutate its fields as we encounter characters in the loop. If I really wanted to keep my structure immutable, I could have created more local variables and constructed the SimpleSelector at the end of the loop.

// Parse one simple selector, e.g.: `type#id.class1.class2.class3`
mutating func parseSimpleSelector() -> SimpleSelector {
    var selector = SimpleSelector(tagName: nil, id: nil, classes: [])
    outerLoop: while !self.eof() {
        switch self.nextCharacter() {
        case "#":
            self.consumeCharacter()
            selector.id = self.parseIdentifier()
        case ".":
            self.consumeCharacter()
            selector.classes.append(self.parseIdentifier())
        case "*":
            // universal selector
            self.consumeCharacter()
        case let c where validIdentifierChar(c):
            selector.tagName = self.parseIdentifier()
        case _:
            break outerLoop
        }
    }
    return selector;
}

The type alias for the Specificity tuple looks very much like the equivalent in Robinson. I was tempted here to use Swift’s feature that allows us to name the three fields of the tuple. There were a few moments where I had to keep reminding myself which field represented which part of the selector, so I probably should have leveraged that feature.

public typealias Specificity = (Int, Int, Int)

I wrote the function that computes the specificity of a selector turned out as a Swift “computed property”. There was no particular reason for that, and I’m not yet sure when one should or should not do that.

There is more to say about how this compares to the equivalent method in Robinson, though. Notice how Rust’s std::option is an actual collection with an iterator that you can query for cardinality. This is something you cannot yet do with Swift’s Optional. I don’t mind having to substitute some ternary expressions here, but I do like the Rust way.

extension Selector {
    public var specificity: Specificity {
        switch self {
        case .Simple(let simple):
            let a = simple.id == nil ? 0 : 1
            let b = simple.classes.count
            let c = simple.tagName == nil ? 0 : 1
            return Specificity(a, b, c)
        }
    }
}

Here is the last pair of methods mentioned in Matt’s post. As with most of what we have seen so far, the Swift versions looks very much like the Rust counterparts. The most obvious difference here is that the break statement requires an outerLoop label to signify that it is the while loop that we want to break out of, and not the switch. Rust doesn’t have this ambiguity. The reason can be traced back to Rust’s expression-oriented nature: its match expression returns a value. What value should it return if one were allowed to use break to escape the match expression?

I bet that Swift will likely be the last major new language to go down the statement-oriented path. If I could have only one wish to make Swift better, I would wish that it had taken the high road.

// Parse a rule set: `<selectors> { <declarations> }`.
mutating func parseRule() -> Rule {
    return Rule(selectors: self.parseSelectors(), declarations: self.parseDeclarations())
}

// Parse a comma-separated list of selectors.
mutating func parseSelectors() -> [Selector] {
    var selectors: [Selector] = []
    outerLoop: while true {
        selectors.append(.Simple(self.parseSimpleSelector()))
        self.consumeWhitespace()
        let c = self.nextCharacter()
        switch c {
            case ",":
                self.consumeCharacter()
                self.consumeWhitespace()
            case "{":
                break outerLoop
            case _:
                assert(false, "Unexpected character \(c) in selector list")
        }
    }
    // Return selectors with highest specificity first, for use in matching.
    selectors.sort {
        $0 > $1
    }
    return selectors
}

Finally, I would like to point out that the closure given to the sort method above works because I overloaded the > operator. This is another thing that Rust gives for free. Since every slot in Matt’s Specificity tuple is occupied by something that implements Ord, the entire tuple automatically implements Ord, and can therefore be compared in a manner somewhat similar to how Python allows. In the absence of such linguistic luxury, I implemented this:

func > (lhs:Specificity, rhs:Specificity) -> Bool {
    if lhs.0 == rhs.0 {
        if lhs.1 == rhs.1 {
            return lhs.2 > rhs.2
        } else {
            return lhs.1 > rhs.1
        }
    } else {
        return lhs.0 > rhs.0
    }
}

func > (lhs: Selector, rhs: Selector) -> Bool {
    return lhs.specificity > rhs.specificity
}

In the next post, we will take up the two proton packs we have built and ignore Dr. Egon Spengler’s admonition to never cross the streams.

In the mean time, you may want to check out Martin Tomasi’s project, which is following this same exercise, but is implemented in Java.