fri.patterns.interpreter.parsergenerator.lexer
Class StandardLexerRules

java.lang.Object
  |
  +--fri.patterns.interpreter.parsergenerator.lexer.StandardLexerRules

public abstract class StandardLexerRules
extends java.lang.Object

Standard lexer rules are building blocks for lexers that process text input. This class resolves nonterminals enclosed in `backquotes` within an EBNF specification, e.g. `cstylecomment`.

Furthermore, it provides methods that return sets of rules describing common scan items such as `number` or `identifier`. The resulting arrays can be combined with SyntaxUtil.catenizeRules(...).
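As the example below suggests, each rule is a String[] whose first element names the left-hand nonterminal and whose remaining elements are the right-hand symbols. Names in backquotes refer to standard rules. A minimal sketch (hypothetical class BackquoteRefs, not part of this library) that lists the backquoted names in a rule set:

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of this library: collects the names written in
// `backquotes` on the right-hand side of a String[][] rule set, i.e. the names
// a builder would have to resolve to standard rules.
class BackquoteRefs {
    static Set<String> backquotedNames(String[][] rules) {
        Set<String> names = new LinkedHashSet<>(); // keeps first-seen order, drops repeats
        for (String[] rule : rules) {
            // rule[0] is the left-hand nonterminal, rule[1..] are the right-hand symbols
            for (int i = 1; i < rule.length; i++) {
                String sym = rule[i];
                if (sym.length() > 2 && sym.startsWith("`") && sym.endsWith("`")) {
                    names.add(sym.substring(1, sym.length() - 1));
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // "token", "ignored" and "butnot" are placeholder strings standing in for
        // Token.TOKEN, Token.IGNORED and Token.BUTNOT, whose real values are not shown here
        String[][] rules = {
            { "token", "others" },
            { "ignored", "`cstylecomment`" },
            { "other", "`char`", "butnot", "`cstylecomment`", "butnot", "`stringdef`" },
        };
        System.out.println(backquotedNames(rules)); // [cstylecomment, char, stringdef]
    }
}
```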

It also provides rules for comments with an arbitrary start character or start/end sequence (see getCustomOneLineCommentRules and getCustomMultiLineCommentRules).

Example (CStyleCommentStrip):

        String [][] rules = {
                { Token.TOKEN, "others" },	// define what we want to receive
                { Token.TOKEN, "`stringdef`" },	// need this rule as string definitions could contain comments
                { Token.IGNORED, "`cstylecomment`" },
                { "others", "others", "other" },
                { "others", "other" },
                { "other", "`char`", Token.BUTNOT, "`cstylecomment`", Token.BUTNOT, "`stringdef`" },
        };
        Syntax syntax = new Syntax(rules);
        SyntaxSeparation separation = new SyntaxSeparation(syntax);
        LexerBuilder builder = new LexerBuilder(separation.getLexerSyntax(), separation.getIgnoredSymbols());
        Lexer lexer = builder.getLexer();
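The grammar above keeps everything except comments, and it needs `stringdef` because a string literal may itself contain "/*". A hand-written scanner with the same intent might look like the following sketch (hypothetical class CStyleCommentStripSketch, not the lexer this grammar produces). Assumptions: a // comment stops before the line break, which is kept, and character literals are not treated specially.

```java
// Hypothetical hand-written equivalent of the CStyleCommentStrip idea:
// copies the input, drops /* ... */ and // ... comments, but copies string
// literals unchanged, since a string such as "/* not a comment */" must survive.
class CStyleCommentStripSketch {
    static String strip(String in) {
        StringBuilder out = new StringBuilder();
        int i = 0, n = in.length();
        while (i < n) {
            char c = in.charAt(i);
            if (c == '"') {
                // copy a string literal, stepping over backslash-escaped characters
                int j = i + 1;
                while (j < n && in.charAt(j) != '"') {
                    j += (in.charAt(j) == '\\') ? 2 : 1;
                }
                j = Math.min(j + 1, n); // include the closing quote if present
                out.append(in, i, j);
                i = j;
            } else if (in.startsWith("/*", i)) {
                int end = in.indexOf("*/", i + 2);
                i = (end < 0) ? n : end + 2; // an unterminated comment swallows the rest
            } else if (in.startsWith("//", i)) {
                int end = in.indexOf('\n', i);
                i = (end < 0) ? n : end; // stop before the newline so it is kept
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("int a; /* c */ String s = \"/* kept */\"; // tail"));
    }
}
```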
        
TODO: Refactor this class and make smaller units with better names.

Author:
(c) 2002, Fritz Ritzberger
See Also:
LexerBuilder

Field Summary
static java.lang.String[][] chardefRules
          Rules describing C/Java-like character definitions: 'c', '\r', '\007'.
static java.lang.String[][] digitRules
          Numerical rules for binary and octal digits.
static java.lang.String[][] lexerSyntax
          Premade lexer syntax used to scan textual EBNF-like syntax specifications.
static java.lang.String[][] newlinesRules
          Rules describing one or more newlines.
static java.lang.String[][] numberRules
          Numerical rules for numbers within source code: number ::= integer | float.
static java.lang.String[][] whitespaceRules
          Rules describing whitespace: one or more spaces or newlines.
static java.lang.String[][] xmlCharRules
          XML Char definitions of W3C.
static java.lang.String[][] xmlCombinigAndExtenderRules
          XML CombiningChar and XML Extender definitions of W3C.
 
Method Summary
static java.lang.String[][] catenizeRules(java.lang.String[][][] arrays)
          Concatenates several rule sets into one rule set.
static java.lang.String[][] getBinDigitsRules()
          Rules for binary number chars.
static java.lang.String[][] getCommentRules()
          Rules to scan C-style slash-star and slash-slash AND shell-style # comments.
static java.lang.String[][] getCStyleCommentRules()
          Rules to scan C-style slash-star and slash-slash comments.
static java.lang.String[][] getCustomMultiLineCommentRules(java.lang.String nonterminalName, java.lang.String startSeq, java.lang.String endSeq)
          Returns rules for a custom multi-line comment (like C-style "/*"), with a caller-supplied start and end sequence.
static java.lang.String[][] getCustomOneLineCommentRules(java.lang.String nonterminalName, java.lang.String startChar)
          Returns rules for a custom one-line comment (like C-style "//"), with a caller-supplied start sequence.
static java.lang.String[][] getFloatRules()
          Rules for float number chars.
static java.lang.String[][] getHexDigitRules()
          Rules to scan one hexadecimal digit.
static java.lang.String[][] getHexDigitsRules()
          Rules to scan the hexadecimal digits of a number, excluding the leading "0x".
static java.lang.String[][] getIntegerRules()
          Rules for integer number chars.
static java.lang.String[][] getNewlineRules()
          Rules to scan one platform-independent newline.
static java.lang.String[][] getNewlinesRules()
          Rules to scan one or more platform-independent newlines.
static java.lang.String[][] getNumberRules()
          Rules for general number chars (integer, float).
static java.lang.String[][] getOctDigitsRules()
          Rules for octal number chars.
static java.lang.String[][] getQuantifierRules()
          Rules to read quantifiers "*+?" within EBNF syntax specifications.
static java.lang.String[][] getRulerefRules()
          Rules to read a `lexerrule` within EBNF syntax specifications.
static java.lang.String[][] getShellStyleCommentRules()
          Rules to scan # shell-style comments.
static java.lang.String[][] getSpaceRules()
          Rules to scan one space.
static java.lang.String[][] getSpacesRules()
          Rules to scan spaces.
static java.lang.String[][] getUnicodeBNFChardefRules()
          Rules to scan BNF-like character definitions such as 'c'.
static java.lang.String[][] getUnicodeChardefRules()
          Rules to scan C/Java-like character definitions: '\377', 'A', '\n'.
static java.lang.String[][] getUnicodeCharRules()
          Rules to scan one Unicode character: 0x0 .. 0xFFFF.
static java.lang.String[][] getUnicodeCombiningCharRules()
          Rules for XML combining chars.
static java.lang.String[][] getUnicodeDigitRules()
          Rules to scan one digit.
static java.lang.String[][] getUnicodeDigitsRules()
          Rules to scan digits.
static java.lang.String[][] getUnicodeExtenderCharRules()
          Rules for XML extender chars.
static java.lang.String[][] getUnicodeIdentifierRules()
          Rules to scan identifiers that start with a letter and continue with letters, digits or '_'.
static java.lang.String[][] getUnicodeLetterRules()
          Rules to scan one letter.
static java.lang.String[][] getUnicodeStringdefRules()
          Rules to scan "string definitions" (double-quoted strings) in which the backslash serves as escape character.
static java.lang.String[][] getUnicodeXmlCharRules()
          Rules for XML chars (the W3C XML Char definitions).
static java.lang.String[][] getWhitespaceRules()
          Rules to scan one space or newline.
static java.lang.String[][] getWhitespacesRules()
          Rules to scan spaces or newlines.
static void printRules(java.lang.String[][] syntax)
          Print a grammar to System.out.
static java.lang.String[][] rulesForIdentifier(java.lang.String id)
          Returns the rules for a given name, e.g. the letter rules of getUnicodeLetterRules() for id "letter".
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

lexerSyntax

public static final java.lang.String[][] lexerSyntax
Premade lexer syntax used to scan textual EBNF-like syntax specifications.


xmlCharRules

public static final java.lang.String[][] xmlCharRules
XML Char definitions of W3C.


xmlCombinigAndExtenderRules

public static final java.lang.String[][] xmlCombinigAndExtenderRules
XML CombiningChar and XML Extender definitions of W3C.


digitRules

public static final java.lang.String[][] digitRules
Numerical rules for binary and octal digits.


numberRules

public static final java.lang.String[][] numberRules
Numerical rules for numbers within source code: number ::= integer | float.


newlinesRules

public static final java.lang.String[][] newlinesRules
Rules describing one or more newlines.


chardefRules

public static final java.lang.String[][] chardefRules
Rules describing C/Java-like character definitions: 'c', '\r', '\007'.


whitespaceRules

public static final java.lang.String[][] whitespaceRules
Rules describing whitespace: one or more spaces or newlines.

Method Detail

rulesForIdentifier

public static java.lang.String[][] rulesForIdentifier(java.lang.String id)
Returns the rules for a given name, e.g. the letter rules of getUnicodeLetterRules() for id "letter". This lets you write names such as `identifier` in a lexer specification text, because LexerBuilder tries to resolve such names by calling this method. Possible values for id are:


getCustomOneLineCommentRules

public static final java.lang.String[][] getCustomOneLineCommentRules(java.lang.String nonterminalName,
                                                                      java.lang.String startChar)
Returns rules for a custom one-line comment (like C-style "//"), with a caller-supplied start sequence.

Parameters:
nonterminalName - name of the comment nonterminal to be used within the syntax, e.g. "basicComment".
startChar - string of one or more characters defining the start sequence of the comment, e.g. ";"
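To illustrate what such a one-line comment covers, here is a sketch (hypothetical class OneLineCommentSketch, not the library's rules) that matches a comment starting with a custom sequence such as ";" and running to the end of the line. Whether the library's rules include the line break is not stated here; the sketch excludes it.

```java
// Hypothetical sketch, not the library's rules: a one-line comment runs from
// a custom start sequence to the end of the line (the line break is excluded).
class OneLineCommentSketch {
    /** Returns the index just past the comment starting at pos, or -1 if none starts there. */
    static int match(String text, int pos, String startSeq) {
        if (!text.startsWith(startSeq, pos)) return -1;
        int i = pos + startSeq.length();
        while (i < text.length() && text.charAt(i) != '\n' && text.charAt(i) != '\r') i++;
        return i;
    }

    public static void main(String[] args) {
        String line = "MOV A, 1 ; load one\nRET";
        int start = line.indexOf(';');
        int end = match(line, start, ";");
        System.out.println(line.substring(start, end)); // ; load one
    }
}
```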

getCustomMultiLineCommentRules

public static final java.lang.String[][] getCustomMultiLineCommentRules(java.lang.String nonterminalName,
                                                                        java.lang.String startSeq,
                                                                        java.lang.String endSeq)
Returns rules for a custom multi-line comment (like C-style "/*"), with a caller-supplied start and end sequence.

Parameters:
nonterminalName - name of the comment nonterminal to be used within the syntax, e.g. "pascalComment".
startSeq - string defining the start sequence of the comment, e.g. "(*"
endSeq - string defining the end sequence of the comment, e.g. "*)"
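A matching sketch for the multi-line case (hypothetical class MultiLineCommentSketch, not the library's rules). It assumes comments do not nest and treats an unterminated comment as no match; the library's rules may behave differently.

```java
// Hypothetical sketch: a multi-line comment with custom start and end sequences,
// e.g. Pascal-style "(*" ... "*)". Non-nesting; unterminated means no match.
class MultiLineCommentSketch {
    /** Returns the index just past the end sequence, or -1 if no complete comment starts at pos. */
    static int match(String text, int pos, String startSeq, String endSeq) {
        if (!text.startsWith(startSeq, pos)) return -1;
        // search for the end only after the start sequence, so "(*)" is not a comment
        int end = text.indexOf(endSeq, pos + startSeq.length());
        return end < 0 ? -1 : end + endSeq.length();
    }

    public static void main(String[] args) {
        String src = "x := 1; (* spans\n two lines *) y := 2;";
        int start = src.indexOf("(*");
        System.out.println(src.substring(start, match(src, start, "(*", "*)")));
    }
}
```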

getUnicodeCharRules

public static final java.lang.String[][] getUnicodeCharRules()
Rules to scan one Unicode character: 0x0 .. 0xFFFF.


getNewlineRules

public static final java.lang.String[][] getNewlineRules()
Rules to scan one platform-independent newline.


getNewlinesRules

public static final java.lang.String[][] getNewlinesRules()
Rules to scan one or more platform-independent newlines.
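A sketch of platform-independent newline scanning (hypothetical class NewlineSketch), assuming "platform independent" means accepting "\r\n", "\n" and "\r", with "\r\n" counted as a single newline. This is an assumption about the rules' intent, not a transcription of them.

```java
// Sketch, assuming a newline is any of "\r\n" (Windows), "\n" (Unix) or "\r"
// (classic Mac OS), with "\r\n" counted as a single newline.
class NewlineSketch {
    /** Length of the newline at pos: 2 for CRLF, 1 for LF or CR, 0 if there is none. */
    static int newlineLength(String s, int pos) {
        if (s.startsWith("\r\n", pos)) return 2;
        if (pos < s.length() && (s.charAt(pos) == '\n' || s.charAt(pos) == '\r')) return 1;
        return 0;
    }

    /** Counts the newlines in s, treating CRLF as one. */
    static int countNewlines(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int len = newlineLength(s, i);
            if (len > 0) { count++; i += len; } else { i++; }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countNewlines("a\r\nb\nc\rd")); // 3
    }
}
```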


getSpaceRules

public static final java.lang.String[][] getSpaceRules()
Rules to scan one space.


getSpacesRules

public static final java.lang.String[][] getSpacesRules()
Rules to scan spaces.


getWhitespaceRules

public static final java.lang.String[][] getWhitespaceRules()
Rules to scan one space or newline.


getWhitespacesRules

public static final java.lang.String[][] getWhitespacesRules()
Rules to scan spaces or newlines.


getHexDigitRules

public static final java.lang.String[][] getHexDigitRules()
Rules to scan one hexadecimal digit.


getHexDigitsRules

public static final java.lang.String[][] getHexDigitsRules()
Rules to scan the hexadecimal digits of a number, excluding the leading "0x".
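A sketch of scanning hex digits after the "0x" prefix has already been consumed (hypothetical class HexDigitsSketch, not the library's rules); the resulting digit run can then be converted with Integer.parseInt(..., 16).

```java
// Sketch: the hex digits of a number whose "0x" prefix has already been consumed.
class HexDigitsSketch {
    static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    /** Index just past the run of hex digits starting at pos (pos itself if there are none). */
    static int scan(String s, int pos) {
        int i = pos;
        while (i < s.length() && isHexDigit(s.charAt(i))) i++;
        return i;
    }

    public static void main(String[] args) {
        String src = "0x1F;";
        int end = scan(src, 2); // start after the "0x" prefix
        System.out.println(Integer.parseInt(src.substring(2, end), 16)); // 31
    }
}
```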


getUnicodeLetterRules

public static final java.lang.String[][] getUnicodeLetterRules()
Rules to scan one letter.


getUnicodeDigitRules

public static final java.lang.String[][] getUnicodeDigitRules()
Rules to scan one digit.


getUnicodeDigitsRules

public static final java.lang.String[][] getUnicodeDigitsRules()
Rules to scan digits.


getUnicodeIdentifierRules

public static final java.lang.String[][] getUnicodeIdentifierRules()
Rules to scan identifiers that start with a letter and continue with letters, digits or '_'.
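A sketch of the identifier shape described above (hypothetical class IdentifierSketch). It uses java.lang.Character's Unicode classification as an approximation; the library's letter and digit ranges may differ.

```java
// Sketch of "a letter, then letters, digits or '_'", using java.lang.Character's
// Unicode classification; the library's exact character ranges may differ.
class IdentifierSketch {
    static boolean isIdentifier(String s) {
        if (s.isEmpty() || !Character.isLetter(s.charAt(0))) return false;
        for (int i = 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!Character.isLetterOrDigit(c) && c != '_') return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isIdentifier("max_value2")); // true
        System.out.println(isIdentifier("_tmp"));       // false: must start with a letter
    }
}
```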


getUnicodeChardefRules

public static final java.lang.String[][] getUnicodeChardefRules()
Rules to scan C/Java-like character definitions: '\377', 'A', '\n'.
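A sketch of how such character definitions map to char values (hypothetical class ChardefSketch, not the library's rules), assuming the usual C/Java meaning: a plain character, a named escape such as '\n', or an octal escape such as '\377' (255). Other escape forms are not handled.

```java
// Hypothetical decoder for C/Java-like character definitions such as 'A', '\n', '\377'.
class ChardefSketch {
    static char decode(String def) {
        if (def.length() < 3 || def.charAt(0) != '\'' || def.charAt(def.length() - 1) != '\'')
            throw new IllegalArgumentException("not a chardef: " + def);
        String body = def.substring(1, def.length() - 1);
        if (body.length() == 1) return body.charAt(0);          // plain character: 'A'
        if (body.charAt(0) != '\\') throw new IllegalArgumentException(def);
        String esc = body.substring(1);
        if (!esc.isEmpty() && esc.chars().allMatch(ch -> ch >= '0' && ch <= '7'))
            return (char) Integer.parseInt(esc, 8);             // octal escape: '\377'
        switch (esc) {                                          // named escapes
            case "n": return '\n';
            case "r": return '\r';
            case "t": return '\t';
            case "b": return '\b';
            case "f": return '\f';
            case "\\": return '\\';
            case "'": return '\'';
            case "\"": return '"';
            default: throw new IllegalArgumentException(def);
        }
    }

    public static void main(String[] args) {
        System.out.println((int) decode("'\\377'")); // 255
    }
}
```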


getUnicodeBNFChardefRules

public static final java.lang.String[][] getUnicodeBNFChardefRules()
Rules to scan BNF-like character definitions such as 'c'. Unlike C/Java character definitions, they can also be written as numbers, e.g. 0x20.
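A sketch of the BNF variant (hypothetical class BnfChardefSketch, not the library's rules), assuming the numeric form is hexadecimal with a "0x" prefix as in the 0x20 example. Escapes and other numeric forms are left out.

```java
// Hypothetical: a BNF chardef is either a quoted character like 'c'
// or, as assumed here, a hexadecimal number like 0x20.
class BnfChardefSketch {
    static char decode(String def) {
        if (def.startsWith("0x") || def.startsWith("0X"))
            return (char) Integer.parseInt(def.substring(2), 16); // 0x20 is a space
        if (def.length() == 3 && def.charAt(0) == '\'' && def.charAt(2) == '\'')
            return def.charAt(1);                                 // 'c'
        throw new IllegalArgumentException("unsupported chardef: " + def);
    }

    public static void main(String[] args) {
        System.out.println((int) decode("0x20")); // 32
    }
}
```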


getUnicodeStringdefRules

public static final java.lang.String[][] getUnicodeStringdefRules()
Rules to scan "string definitions" (double-quoted strings) in which the backslash serves as escape character.
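A sketch of scanning such a string definition (hypothetical class StringdefSketch, not the library's rules): the backslash masks the following character, so an escaped quote does not end the string.

```java
// Sketch of a "stringdef" scan: a double-quoted literal in which a backslash
// masks (escapes) the next character, so \" does not end the string.
class StringdefSketch {
    /** Index just past the closing quote, or -1 if no complete string starts at pos. */
    static int scan(String s, int pos) {
        if (pos >= s.length() || s.charAt(pos) != '"') return -1;
        for (int i = pos + 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\') i++;              // skip the masked character
            else if (c == '"') return i + 1; // closing quote
        }
        return -1;                           // unterminated string
    }

    public static void main(String[] args) {
        String src = "print(\"say \\\"hi\\\"\");";
        int start = src.indexOf('"');
        System.out.println(src.substring(start, scan(src, start))); // prints "say \"hi\""
    }
}
```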


getRulerefRules

public static final java.lang.String[][] getRulerefRules()
Rules to read a `lexerrule` within EBNF syntax specifications.


getQuantifierRules

public static final java.lang.String[][] getQuantifierRules()
Rules to read quantifiers "*+?" within EBNF syntax specifications.


getCommentRules

public static final java.lang.String[][] getCommentRules()
Rules to scan C-style slash-star and slash-slash AND shell-style # comments.


getCStyleCommentRules

public static final java.lang.String[][] getCStyleCommentRules()
Rules to scan C-style slash-star and slash-slash comments.


getShellStyleCommentRules

public static final java.lang.String[][] getShellStyleCommentRules()
Rules to scan # shell-style comments.


getUnicodeXmlCharRules

public static final java.lang.String[][] getUnicodeXmlCharRules()
Rules for XML chars (the W3C XML Char definitions).


getUnicodeCombiningCharRules

public static final java.lang.String[][] getUnicodeCombiningCharRules()
Rules for XML combining chars.


getUnicodeExtenderCharRules

public static final java.lang.String[][] getUnicodeExtenderCharRules()
Rules for XML extender chars.


getOctDigitsRules

public static final java.lang.String[][] getOctDigitsRules()
Rules for octal number chars.


getBinDigitsRules

public static final java.lang.String[][] getBinDigitsRules()
Rules for binary number chars.


getNumberRules

public static final java.lang.String[][] getNumberRules()
Rules for general number chars (integer, float).


getIntegerRules

public static final java.lang.String[][] getIntegerRules()
Rules for integer number chars.


getFloatRules

public static final java.lang.String[][] getFloatRules()
Rules for float number chars.
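A sketch of the number ::= integer | float alternative (hypothetical class NumberSketch, not the library's rules). The formats are simplified assumptions (integer = digits, float = digits '.' digits); the library's rules for source-code numbers may accept more forms, which this page does not list.

```java
// Sketch of number ::= integer | float with deliberately simplified, assumed formats:
// integer = digit+, float = digit+ '.' digit+ (no exponents, signs or suffixes).
class NumberSketch {
    enum Kind { INTEGER, FLOAT, NONE }

    static Kind classify(String s) {
        if (s.matches("[0-9]+")) return Kind.INTEGER;
        if (s.matches("[0-9]+\\.[0-9]+")) return Kind.FLOAT;
        return Kind.NONE;
    }

    public static void main(String[] args) {
        System.out.println(classify("42") + " " + classify("3.14") + " " + classify("3.")); // INTEGER FLOAT NONE
    }
}
```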


printRules

public static void printRules(java.lang.String[][] syntax)
Print a grammar to System.out.
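The output format of printRules is not documented here. As an assumption, the following sketch (hypothetical class RulePrinter, not this library's code) renders each String[] rule as "lhs ::= rhs1 rhs2 ...", treating the first element as the left-hand side.

```java
// Hypothetical formatting of a String[][] grammar; the library's actual
// printRules output format may differ.
class RulePrinter {
    static String format(String[][] rules) {
        StringBuilder sb = new StringBuilder();
        for (String[] rule : rules) {
            sb.append(rule[0]).append(" ::=");                            // left-hand nonterminal
            for (int i = 1; i < rule.length; i++) sb.append(' ').append(rule[i]); // right-hand symbols
            sb.append('\n');
        }
        return sb.toString();
    }

    static void printRules(String[][] rules) {
        System.out.print(format(rules));
    }

    public static void main(String[] args) {
        printRules(new String[][] { { "others", "others", "other" }, { "others", "other" } });
    }
}
```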


catenizeRules

public static final java.lang.String[][] catenizeRules(java.lang.String[][][] arrays)
Concatenates several rule sets into one rule set. Does not check for duplicate rules.
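A sketch of the documented behavior (hypothetical class CatenizeSketch, not this library's code): rules are appended in argument order and duplicates are kept. The varargs parameter also accepts a String[][][] like catenizeRules does. Whether the library copies the inner rule arrays is not stated; this sketch shares them.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of concatenating rule sets: rules are appended in argument order and
// duplicates are kept (no uniqueness check). Inner rule arrays are shared, not copied.
class CatenizeSketch {
    static String[][] catenize(String[][]... arrays) {
        List<String[]> all = new ArrayList<>();
        for (String[][] rules : arrays)
            for (String[] rule : rules)
                all.add(rule);
        return all.toArray(new String[0][]);
    }

    public static void main(String[] args) {
        String[][] a = { { "x", "y" } };
        String[][] b = { { "x", "y" }, { "z" } };
        System.out.println(catenize(a, b).length); // 3: the duplicate rule x ::= y is kept
    }
}
```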