General — Specification
Definitions
To avoid any misunderstandings, let's agree on some important definitions that hold throughout this specification:
A syntax of a formal language (such as a programming language) is a set of rules that determine how to discover the syntactic structure of any phrase in that language. A syntactic structure of a phrase is a tree-like (i.e., hierarchical) structure induced on that phrase (given as a character string) that enables the meaning of the phrase (or of a constituent phrase thereof) to be formulated by means of a straightforward recursive definition.1 Different syntaxes may sometimes adequately describe the same programming language. A syntax is normally specified by a context-free grammar plus lexical rules.
A semantics of a formal language is a set of rules that determine how to discover the meaning of any phrase in that language.
Note that these definitions may be at odds with your intuition and may differ slightly from the corresponding definitions as they appear in other contexts (such as specifications of other programming languages, theoretical and applied linguistics, etc.).
General Program Structure
A program in MANOOL consists of one or more source files, referred to as program units (or more accurately, native program units), each written in the formal language of MANOOL forms. Thus, the MANOOL specification is, in fact, concerned with the syntactic structure and meaning of MANOOL forms (or equally, with the syntax and semantics of the language of forms).
A MANOOL program contains a designated main program unit and all program units it depends on, either directly or indirectly (that is, recursively). Circular dependencies between program units would result in a meaningless program and may even cause non-terminating behavior.
Note that program units written in MANOOL may also depend on foreign program units, implemented in other programming languages. For more information on program units and their dependencies, refer to Program Units.
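The dependency closure described above can be illustrated with a minimal sketch. It assumes that the dependencies of each unit are already known and given as a plain dictionary; the function name `collect_units` and the unit names are hypothetical and are not part of MANOOL. The sketch gathers every unit reachable from the main unit, dependencies first, and reports a circular dependency instead of looping forever.

```python
def collect_units(main, deps):
    """Return all units reachable from `main`, dependencies first.

    `deps` maps a unit name to the names of the units it depends on directly
    (a hypothetical representation chosen for this sketch).
    Raises ValueError if a circular dependency is found.
    """
    order = []        # finished units, each listed after its dependencies
    done = set()      # units whose dependencies are fully processed
    in_progress = []  # units on the current depth-first path

    def visit(unit):
        if unit in done:
            return
        if unit in in_progress:
            # The unit depends on itself, directly or indirectly.
            cycle = in_progress[in_progress.index(unit):] + [unit]
            raise ValueError("circular dependency: " + " -> ".join(cycle))
        in_progress.append(unit)
        for dep in deps.get(unit, ()):
            visit(dep)
        in_progress.pop()
        done.add(unit)
        order.append(unit)

    visit(main)
    return order
```

For example, `collect_units("main", {"main": ["a", "b"], "a": ["b"], "b": []})` returns `["b", "a", "main"]`, whereas a graph in which `a` depends back on `main` raises `ValueError`.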
The Abstract Machine
This specification introduces the concept of a fictional device implementing MANOOL, called the abstract machine, and on a few occasions its structure and behavior are discussed explicitly. This is for illustration purposes only; by no means does the MANOOL specification place requirements on either the structure or the internal mode of operation of conforming implementations, which instead are to emulate the observable behavior of the abstract machine.2 In practice, this extends to its asymptotic complexity characteristics whenever such characteristics are explicitly specified.
Translation Overview
To figure out the meaning of a form that makes up a program unit, the abstract machine transforms (compiles) the contents of the source file into an internal run-time representation, called run-time code.
Note that the distinction between a compilation phase and a post-compilation (i.e., execution) phase is introduced here not just for illustration purposes: in particular, some constituent expressions may actually need to be evaluated (once!) during compilation of the whole expression.3 Hereinafter, this specification refers to the compilation phase as compile-time and to the post-compilation phase as run-time.
A three-stage translation (i.e., compilation) scheme is suggested for the abstract machine:
- lexical analysis: The input string of characters is split into lexical elements (lexemes), whose meaning is then encoded in left-to-right order as a sequence of tokens.4 Note that in practice, whatever internal syntactic structure individual lexemes may have, it is generally unimportant for determining their meaning; rather, the lexical syntax is used merely to classify them.
- syntactic analysis: The string of terminal symbols that corresponds to the sequence of tokens resulting from the previous stage undergoes a syntactic analysis guided by a context-free grammar, which ultimately yields an abstract syntax tree (AST) encoded as a MANOOL (semantic) value. Note that in contrast to lexical analysis, here the syntactic structure is essential for correct interpretation of source code.
- semantic analysis and code generation: The form and consistency (e.g., the presence and placement of certain keywords) of the AST resulting from the previous stage are checked, and finally, the run-time code is produced.5 Note that no new structural features are to be exposed at this stage, or they would at least closely reflect those of the AST.
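The first two stages can be sketched for a deliberately tiny toy language. The language is hypothetical and is not MANOOL: an expression is either a number or a parenthesized operator followed by two expressions. The names `lex` and `parse`, the token shapes, and the nested-tuple AST are assumptions made for this sketch only.

```python
import re

# Hypothetical toy grammar (not MANOOL):  expr ::= NUMBER | "(" OP expr expr ")"
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([-+*/])|([()]))")

def lex(source):
    """Stage 1: split the character string into lexemes, classified as tokens."""
    tokens = []
    source = source.rstrip()
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character near offset {pos}")
        number, op, paren = m.groups()
        if number is not None:
            tokens.append(("NUM", int(number)))  # the lexeme "12" becomes the token NUM 12
        elif op is not None:
            tokens.append(("OP", op))
        else:
            tokens.append((paren, None))
        pos = m.end()
    return tokens

def parse(tokens):
    """Stage 2: build an AST (nested tuples) from the token sequence."""
    def expr(i):
        if i >= len(tokens):
            raise SyntaxError("unexpected end of input")
        kind, value = tokens[i]
        if kind == "NUM":
            return value, i + 1
        if kind == "(":
            if i + 1 >= len(tokens) or tokens[i + 1][0] != "OP":
                raise SyntaxError("operator expected after '('")
            op = tokens[i + 1][1]
            left, i = expr(i + 2)
            right, i = expr(i)
            if i >= len(tokens) or tokens[i][0] != ")":
                raise SyntaxError("')' expected")
            return (op, left, right), i + 1
        raise SyntaxError(f"unexpected token {kind!r}")

    ast, i = expr(0)
    if i != len(tokens):
        raise SyntaxError("trailing tokens after expression")
    return ast
```

With these definitions, `parse(lex("(+ 1 (* 2 3))"))` yields `("+", 1, ("*", 2, 3))`. The lexer only classifies lexemes, while the nesting that matters for interpretation appears only in the parser's output.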
Semantic analysis and code generation is a compositional process; that is, to carry out the semantic analysis and code generation for a form (encoded in an AST), the abstract machine performs (among other things) the semantic analysis and code generation for its constituent forms (represented by some subtrees of the original AST). For a description of this process, refer to Compiler Dispatcher.
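As a rough illustration of this compositional process (and not of the actual Compiler Dispatcher), the sketch below compiles a toy AST made of nested tuples into run-time code represented by Python closures. The AST shape, the operator table `OPS`, and the name `compile_form` are assumptions made for this sketch and are not part of MANOOL.

```python
import operator

# Hypothetical operator table for a toy AST of the shape (op, left, right).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def compile_form(ast):
    """Check a form and return its run-time code (a zero-argument closure)."""
    if isinstance(ast, int):
        return lambda: ast  # a literal compiles to code that yields its value
    # Semantic check: the form must be a known operator with two operands.
    if not (isinstance(ast, tuple) and len(ast) == 3 and ast[0] in OPS):
        raise ValueError(f"ill-formed form: {ast!r}")
    fn = OPS[ast[0]]
    # Compositional step: compile the constituent forms first...
    left = compile_form(ast[1])
    right = compile_form(ast[2])
    # ...then combine their run-time code into the code for the whole form.
    return lambda: fn(left(), right())
```

Here `compile_form(("+", 1, ("*", 2, 3)))` performs all checks once, at compile-time, and returns a closure; calling that closure at run-time evaluates to `7`. An ill-formed subtree such as `("+", 1)` is rejected during compilation, before any code runs.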