Compiler addSlot: #Lexer valued: Cloneable derive.
Compiler Lexer addSlot: #stream.
Compiler Lexer addSlot: #undoBuffer.
Compiler Lexer addSlot: #lineNumber.
"The lexer takes an input character stream and divides it up into tokens,
using a buffer as necessary to hold the tokenized contents. Also, debugging
information is stored for now in terms of the line number that the current
stream position has reached."

l@(Compiler Lexer traits) newOn: stream
"Target the lexer to the particular stream and initialize it."
[| newL |
  newL: l clone.
  newL stream: stream.
  newL undoBuffer: ExtensibleSequence newEmpty.
  newL lineNumber: 1.
  newL
].

l@(Compiler Lexer traits) atEnd
"The lexer has reached its end when the stream is exhausted and the buffer is
empty."
[
  l undoBuffer isEmpty and: [l stream atEnd]
].

l@(Compiler Lexer traits) peekCharacter
"Grab the next character, but leave it in the buffer, so the position is not
advanced."
[
  l undoCharacter: l nextCharacter
].

l@(Compiler Lexer traits) nextCharacter
"To get the next character, either pull one from the buffer or read from the
stream of characters. Raise an error if this is used at the end, and advance
the line number if a new-line is reached."
[| c |
  l undoBuffer isNotEmpty
    ifTrue: [c: l undoBuffer removeLast]
    ifFalse:
      [
        l stream atEnd
          ifTrue: [error: 'Line ' ; l lineNumber ; ': Unexpected end of stream'].
        c: l stream next
      ].
  c = $\n ifTrue: [l lineNumber: l lineNumber + 1].
  c
].

l@(Compiler Lexer traits) undoCharacter: c
"Put the character back into the buffer, and decrement the line number if
it's a new-line."
[
  c = $\n ifTrue: [l lineNumber: l lineNumber - 1].
  l undoBuffer addLast: c
].

l@(Compiler Lexer traits) readInteger: radix
"The general method for building integers from the raw characters, with a
radix (number of digits) parameter. Grab all following digits for the radix,
multiplying the accumulator by the radix and adding the numeric equivalent
of the character."
[| number |
  number: 0.
  [l atEnd not and: [l peekCharacter isDigit: radix]] whileTrue:
    [
      number: number * radix + (l nextCharacter toDigit: radix)
    ].
  number
].

l@(Compiler Lexer traits) readMantissa
"Build a floating-point number's fractional part."
[| number place |
  number: 0.
  place: 1.
  [l atEnd not and: [l peekCharacter isDigit]] whileTrue:
    [
      number: number * 10 + l nextCharacter toDigit.
      place: place * 10
    ].
  (number as: Float) / (place as: Float)
].

l@(Compiler Lexer traits) readExponent
"Build a floating-point number's exponent as an integer."
[| sign c |
  sign: 1.
  c: l nextCharacter.
  (c = $+ or: [c = $-])
    ifTrue: [c = $- ifTrue: [sign: -1]]
    ifFalse: [l undoCharacter: c].
  sign * (l readInteger: 10)
].

l@(Compiler Lexer traits) readNumber
"The overall routine for building numbers."
[| token number sign c |
  "Assign the default sign, then override it based on the presence of an
  explicit sign character."
  sign: 1.
  c: l nextCharacter.
  (c = $+ or: [c = $-])
    ifTrue: [c = $- ifTrue: [sign: -1]]
    ifFalse: [l undoCharacter: c].
  "Now read in all the continuous string of digits possible as an integer."
  number: (l readInteger: 10).
  "Reaching the end of the lexing stream just finalizes the process."
  l atEnd ifTrue:
    [
      token: Compiler LiteralToken clone.
      token value: sign * number.
      ^ token
    ].
  "Conditionalize on the next character: it may set up a radix or a decimal."
  c: l nextCharacter.
  (c = $r or: [c = $R])
    ifTrue: [number: (l readInteger: number)]
    ifFalse:
      [
        c = $. ifTrue:
          [
            number: (number as: Float) + l readMantissa.
            l atEnd ifTrue:
              [
                token: Compiler LiteralToken clone.
                token value: sign * number.
                ^ token
              ].
            c: l nextCharacter.
          ].
        (c = $e or: [c = $E])
          ifTrue: [number: (number as: Float) * (10.0 raisedTo: l readExponent)]
          ifFalse: [l undoCharacter: c]
      ].
  token: Compiler LiteralToken clone.
  token value: sign * number.
  token
].

l@(Compiler Lexer traits) readEscapedCharacter
"Language support for character escapes.
This should be called at the point after the initial escape is seen, whether
as a character or part of a string."
"TODO: ensure this case-list is complete."
[| c |
  c: l nextCharacter.
  c caseOf: {
    $n -> [$\n].
    $t -> [$\t].
    $r -> [$\r].
    $b -> [$\b].
    $s -> [$\s].
    $a -> [$\a].
    $v -> [$\v].
    $f -> [$\f].
    $0 -> [$\0]
  } otherwise: [c]
].

l@(Compiler Lexer traits) readString
"Build a string until the next single-quote character is encountered.
Escaping is accounted for."
[| writeStream token c |
  writeStream: (WriteStream newOn: '').
  [c: l nextCharacter. c = $'] whileFalse:
    [writeStream nextPut: (c = $\\ ifTrue: [l readEscapedCharacter] ifFalse: [c])].
  token: Compiler LiteralToken clone.
  token value: writeStream contents.
  token
].

l@(Compiler Lexer traits) readComment
"Build a comment string until the next double-quote character is encountered.
Escaping is accounted for."
[| writeStream token c |
  writeStream: (WriteStream newOn: '').
  [c: l nextCharacter. c = $"] whileFalse:
    [writeStream nextPut: (c = $\\ ifTrue: [l readEscapedCharacter] ifFalse: [c])].
  token: Compiler CommentToken clone.
  token comment: writeStream contents.
  token
].

l@(Compiler Lexer traits) readSelector: type
"Read a selector symbol into a token."
[| writeStream token c |
  writeStream: (WriteStream newOn: '').
  [
    l atEnd
      or: [c: l peekCharacter. c isWhitespace]
      or: ['()[]{}@.|!' includes: c]
  ] whileFalse: [writeStream nextPut: l nextCharacter].
  token: type clone.
  token selector: (writeStream contents as: Symbol).
  token
].

l@(Compiler Lexer traits) readLiteral
"This handles the literal syntaxes that follow a #: literal parentheses,
arrays, and blocks, as well as quoted and unquoted symbols."
[| writeStream token c |
  writeStream: (WriteStream newOn: '').
  (l atEnd not and: [l peekCharacter = $(])
    ifTrue: [^ Compiler BeginLiteralParenthesisToken].
  (l atEnd not and: [l peekCharacter = ${])
    ifTrue: [^ Compiler BeginLiteralArrayToken].
  (l atEnd not and: [l peekCharacter = $[])
    ifTrue: [^ Compiler BeginLiteralBlockToken].
  (l atEnd not and: [l peekCharacter = $']) ifTrue:
    [
      l nextCharacter.
      [c: l nextCharacter. c = $'] whileFalse:
        [writeStream nextPut: (c = $\\ ifTrue: [l readEscapedCharacter] ifFalse: [c])].
      token: Compiler LiteralToken clone.
      token value: (writeStream contents as: Symbol).
      ^ token
    ].
  [
    l atEnd
      or: [c: l peekCharacter. c isWhitespace]
      or: ['()[]{}@.|!' includes: c]
  ] whileFalse: [writeStream nextPut: l nextCharacter].
  token: Compiler LiteralToken clone.
  token value: (writeStream contents as: Symbol).
  token
].

l@(Compiler Lexer traits) readCharacter
"Read in a single character into a token or an escaped one."
[| token c |
  c: l nextCharacter.
  c = $\\ ifTrue: [c: l readEscapedCharacter].
  token: Compiler LiteralToken clone.
  token value: c.
  token
].

l@(Compiler Lexer traits) readToken
"The overall handler for tokenization, this conditionalizes on the various
initializing characters to build the various token objects."
"TODO: place these dispatch tables in persistent places, much like a Lisp
read-table."
[| c |
  [
    l atEnd ifTrue: [^ Compiler EndStreamToken].
    c: l nextCharacter.
    c isWhitespace
  ] whileTrue.
  c caseOf: {
    $' -> [l readString].
    $" -> [l readComment].
    $$ -> [l readCharacter].
    $# -> [l readLiteral].
    $( -> [Compiler BeginParenthesisToken].
    $) -> [Compiler EndParenthesisToken].
    ${ -> [Compiler BeginArrayToken].
    $} -> [Compiler EndArrayToken].
    $[ -> [Compiler BeginBlockToken].
    $] -> [Compiler EndBlockToken].
    $@ -> [Compiler AtToken].
    $. -> [Compiler EndStatementToken].
    $| -> [Compiler BeginVariablesToken].
    $! -> [Compiler TypeToken].
    $` -> [l readSelector: Compiler MacroSelectorToken]
  } otherwise:
    [
      ((c = $+ or: [c = $-]) and: [l peekCharacter isDigit])
        ifTrue: [l undoCharacter: c. l readNumber]
        ifFalse:
          [
            l undoCharacter: c.
            c isDigit
              ifTrue: [l readNumber]
              ifFalse: [l readSelector: Compiler SelectorToken]
          ]
    ]
].