1.1 Notation

The descriptions of lexical analysis and syntax use a modified BNF grammar notation where necessary. This uses the following style of definition:

    name:      lc_letter (lc_letter | "_")*
    lc_letter: "a"..."z"

The first line says that a name is an lc_letter followed by a sequence
of zero or more lc_letters and underscores. An lc_letter in turn is any of
the single characters "a" through "z". (This rule
is actually adhered to for the names defined in lexical and grammar rules
in this document.)

In lexical definitions (as in the example above), two more conventions are
used: Two literal characters separated by three dots mean a choice of
any single character in the given (inclusive) range of ASCII characters.
A phrase between angle brackets (<...>) gives an informal description
of the symbol defined; e.g., this could be used to describe the notion
of 'control character' if needed.

2 Lexical Analysis

An emBASIC program is read by a parser. Input to the parser is a stream
of tokens, generated by the lexical analyzer. This chapter describes how
the lexical analyzer breaks entered text into tokens.

2.1 Line structure

An emBASIC program is divided into a number of logical lines. Each line is parsed and interpreted as soon as it is read from the input stream. The input stream can be either serial line input or a BitBUS message sent by emBASIC WorkShop.

2.1.1 Logical lines

The end of a logical line is represented by the token EOL. Statements cannot cross logical line boundaries except where EOL is allowed by the syntax (e.g., between statements in compound statements). A logical line is constructed from one or more physical lines by following the explicit line joining rules.

2.1.2 Physical lines

A physical line ends in whatever the current platform's convention is for terminating lines. On the current platform, this is the ASCII LF (linefeed) character.

2.1.3 Comments

A comment starts with a REM token that is not part of a string literal,
and ends at the end of the physical line. A comment entered after program
code in the same line ends the logical line. Comments are ignored by the
syntax; they are not tokens. A line starting with * or // is also regarded
as a comment line.

2.1.4 Explicit line joining

Two or more physical lines may be joined into logical lines using backslash characters (\), as follows: when a physical line ends in a backslash that is not part of a string literal or comment, it is joined with the following line to form a single logical line, and the backslash and the following end-of-line character are deleted. For example:
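
The joining rule can be sketched in Python (not emBASIC) as follows. This is a minimal illustration under stated assumptions, not the actual lexical analyzer: the function names are hypothetical, REM is matched case-insensitively, triple-quoted strings are not treated specially, and the sample statements in the usage part are invented.

```python
import string

# Characters allowed in identifiers, used to decide whether REM stands alone.
IDENT_CHARS = set(string.ascii_letters + string.digits + "_$%")


def starts_rem(line, i):
    """True if a stand-alone REM keyword starts at index i.

    Case-insensitive matching is an assumption of this sketch.
    """
    if line[i:i + 3].upper() != "REM":
        return False
    before_ok = i == 0 or line[i - 1] not in IDENT_CHARS
    after_ok = i + 3 == len(line) or line[i + 3] not in IDENT_CHARS
    return before_ok and after_ok


def joins_next(line):
    """True if the physical line ends in a backslash outside strings and comments."""
    if line.lstrip(" \t\f").startswith(("*", "//")):
        return False  # whole-line comment: a backslash does not continue it
    quote = None  # delimiter of the string literal being scanned, if any
    i = 0
    while i < len(line):
        ch = line[i]
        if quote is not None:
            if ch == "\\":
                i += 1  # skip the escaped character as well
            elif ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif starts_rem(line, i):
            return False  # the rest of the line is a comment
        i += 1
    return quote is None and line.endswith("\\")


def join_physical_lines(text):
    """Split text at LF and merge backslash-continued lines into logical lines."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # the final LF terminates the last line, it does not start a new one
    logical, current, open_join = [], "", False
    for line in lines:
        if joins_next(line):
            current += line[:-1]  # delete the backslash; the LF is already gone
            open_join = True
        else:
            logical.append(current + line)
            current, open_join = "", False
    if open_join:
        logical.append(current)  # dangling backslash on the last line (assumption)
    return logical


if __name__ == "__main__":
    sample = "total = a + \\\n  b\nREM a comment \\\nPRINT total\n"
    for logical_line in join_physical_lines(sample):
        print(repr(logical_line))
```

In this sketch the first two physical lines of the sample become one logical line, while the backslash at the end of the REM line is left alone because it belongs to a comment.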
A line ending in a backslash cannot carry a comment. A backslash does
not continue a comment. Tokens and string literals cannot be split so
that part of the token or literal falls on the continuation line.

2.1.5 Line numbering

emBASIC does not need line numbers. It may generate them on demand and
signal errors with line numbers (or line offsets from the last label or
procedure start) but it uses line numbers only internally. Branches depend
on labels and function/procedure names.

2.1.6 Blank lines

A logical line that contains only spaces, tabs, formfeeds and possibly
a comment is not ignored (i.e., an EOL element is generated). During interactive
input of statements, handling of a blank line may differ depending on
the implementation of the WorkShop. In the current implementation, an
entirely blank logical line (i.e., one containing not even whitespace or
a comment) instructs the TCE generator to emit an EOL element into the code segment.

2.1.7 Indentation

Leading whitespace (spaces and tabs) at the beginning of a logical line
is used to provide the indentation level of the line.

2.1.8 Whitespace between tokens

Except at the beginning of a logical line or in string literals, the whitespace characters space, tab and formfeed should be used to separate tokens. Whitespace is required between two tokens whenever their concatenation could otherwise be interpreted as a different token (e.g., ab is one token, but a b is two tokens).

2.1.9 Other tokens

Besides EOL, the following categories of tokens exist: identifiers, keywords, literals, operators, and delimiters. Whitespace characters (other than line terminators, discussed earlier) are not tokens, but serve to delimit tokens. Where ambiguity exists, a token comprises the longest possible string that forms a legal token, when read from left to right.

2.2 Identifiers and keywords

Identifiers can contain the letters A..Z or a..z, the digits 0..9, and the underscore "_", dollar sign "$", or percent sign "%". The length of an identifier is limited to 32 characters. A variable name must not begin with a digit and must not be identical to a reserved word. Case is not significant.

2.2.1 Keywords

The following identifiers are used as reserved words, or keywords of the language, and cannot be used as ordinary identifiers. They must be spelled exactly as written here:

2.2.2 Reserved classes of identifiers

Certain classes of identifiers (besides keywords) have special meanings.
These are system variables and system constants. The prefix character
"@" identifies system variables, and the prefix character
"$" identifies system constants.

$VERSION $BUILD $COUNTRY $TERMINAL $TIMEZONE
$CPUBUS $TSMBUS $ECBBUS $CANBUS $I2CBUS
$DIN $DOUT $AIN $AOUT $PWM $EVTCNT $FREQ $POS
$CFG_START_SIMULATION $CFG_STOP_SIMULATION $CFG_DOWN $CFG_ENABLE
$CFG_SET_CHANNEL_RANGE $CFG_SET_CONV_SPEED $CFG_SET_GAIN $CFG_SET_ATTENTUATE
$CFG_SET_OFFSET $CFG_SET_LINTAB $CFG_SET_PWM_FREQ $CFG_SET_INC_MODE $CFG_SET_DIR
$CFG_SSI_SET_TURNS $CFG_SSI_SET_STEPS
$BGM_MODE_FIFO $BGM_MODE_LIFO $BGM_MODE_RING $BGM_MODE_RANDOM $BGM_MODE_FIT
$BGM_CMD_GETSIZE $BGM_CMD_SETSIZE $BGM_CMD_FORMAT
$COM_BR76800 $COM_BR57600 $COM_BR38400 $COM_BR19200 $COM_BR9600 $COM_BR4800 $COM_BR2400 $COM_BR1200 $COM_BR300
$COM_NONE $COM_EVEN $COM_ODD $COM_MARK $COM_SPACE
$COM_7BPC $COM_8BPC $COM_9MARK $COM_9SPACE $COM_9MARKONFIRST
$COM_NOHS $COM_RTS $COM_XON

2.2.3 Literals

Literals are notations for constant values of some built-in types.

2.2.4 String literals

String literals can be enclosed in matching single quotes (') or double
quotes ("). They can also be enclosed in matching groups of three
single or double quotes (these are generally referred to as triple-quoted
strings). The backslash (\) character is used to escape characters that
otherwise have a special meaning, such as newline, backslash itself, or
the quote character.

2.2.5 String literal concatenation

Multiple adjacent string literals (separated by "+"), possibly
using different quoting conventions, are allowed, and their meaning is
the same as their concatenation.
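
The quoting and concatenation rules of sections 2.2.4 and 2.2.5 can be sketched in Python (not emBASIC) as follows. This is an illustration under assumptions rather than the actual lexical analyzer: the names ESCAPES, STRING_RE, literal_value and concat_literals are hypothetical, the escape table covers only the characters the text mentions (the full set supported by emBASIC is not given here), and unknown escapes are kept verbatim.

```python
import re

# Assumed escape table: newline, backslash and the two quote characters.
ESCAPES = {"n": "\n", "\\": "\\", "'": "'", '"': '"'}

# Triple-quoted alternatives come first so that an opening ''' or """ is not
# read as an empty string followed by a stray quote.
STRING_RE = re.compile(
    r'"""(?:\\.|[^\\])*?"""'
    r"|'''(?:\\.|[^\\])*?'''"
    r'|"(?:\\.|[^"\\\n])*"'
    r"|'(?:\\.|[^'\\\n])*'",
    re.DOTALL,
)


def literal_value(token):
    """Strip the quotes from one literal and resolve its backslash escapes."""
    width = 3 if token[:3] in ('"""', "'''") else 1
    body = token[width:-width]
    return re.sub(
        r"\\(.)",
        lambda m: ESCAPES.get(m.group(1), m.group(0)),  # unknown escapes stay as-is
        body,
        flags=re.DOTALL,
    )


def skip_whitespace(source, pos):
    """Advance past spaces, tabs and formfeeds."""
    while pos < len(source) and source[pos] in " \t\f":
        pos += 1
    return pos


def concat_literals(source):
    """Evaluate a sequence of string literals separated by "+"."""
    parts = []
    pos = 0
    while True:
        pos = skip_whitespace(source, pos)
        match = STRING_RE.match(source, pos)
        if match is None:
            raise ValueError(f"string literal expected at offset {pos}")
        parts.append(literal_value(match.group(0)))
        pos = skip_whitespace(source, match.end())
        if pos == len(source):
            return "".join(parts)
        if source[pos] != "+":
            raise ValueError(f'"+" expected at offset {pos}')
        pos += 1


if __name__ == "__main__":
    print(concat_literals("\"abc\" + 'def'"))
```

Literals with different quoting conventions can be mixed freely in this sketch; each one is decoded on its own and the results are joined in order.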