ICS 142
Klefstad
Review
Outline
Compiler Design
Lexical Analysis
Syntax Analysis
Error Handling and Recovery
Syntax-Directed Translation
Type Checking
Run-time Environments
Intermediate Code Generation
Code Generation
Code Optimization
Interpreters
Compiler Design
Separation of activities is good design
know the purpose, input, output, and alternatives of each
Compiler activities
Lexical Analysis
Syntax Analysis
Semantic Analysis
Error Handling
Intermediate Code Generation
Code Optimization
Code Generation
General goal of compiler science
automatic generation of a high-quality compiler from a language specification
Lexical Analysis
Primary purpose
convert input characters into a sequence of tokens
Class of languages
regular languages (only repetition, concatenation, disjunction)
Notations
regular expressions
Formal model
Finite-State Automaton
Issues
must be fast (it examines every character of the source program)
Algorithms: RE --> NFA --> DFA --> Minimal DFA
Alternatives
hand-coded state machine
generated automatically from regular definition
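As an illustration of the NFA --> DFA step, here is a minimal sketch of the subset construction. It assumes a hypothetical NFA encoding in which `trans` maps (state, symbol) to a set of states and the symbol None marks an epsilon move; the names `epsilon_closure`, `subset_construction`, and `dfa_accepts` are illustrative only. The example NFA recognizes (a|b)*abb.

```python
from collections import deque

def epsilon_closure(states, trans):
    """All NFA states reachable from `states` using only epsilon (None) moves."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in trans.get((s, None), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def subset_construction(start, accepting, trans, alphabet):
    """Build a DFA whose states are (frozen) sets of NFA states."""
    d_start = epsilon_closure({start}, trans)
    d_states, d_trans = {d_start}, {}
    work = deque([d_start])
    while work:
        S = work.popleft()
        for a in alphabet:
            # move(S, a), then take its epsilon-closure
            moved = {t for s in S for t in trans.get((s, a), ())}
            if not moved:
                continue  # missing entry = implicit dead state
            U = epsilon_closure(moved, trans)
            d_trans[(S, a)] = U
            if U not in d_states:
                d_states.add(U)
                work.append(U)
    d_accepting = {S for S in d_states if S & accepting}
    return d_start, d_accepting, d_trans

def dfa_accepts(dfa, text):
    start, accepting, d_trans = dfa
    state = start
    for ch in text:
        state = d_trans.get((state, ch))
        if state is None:
            return False
    return state in accepting

# Hypothetical NFA for (a|b)*abb: 0 -eps-> 1, loop on 1, then a b b to state 4.
NFA_TRANS = {(0, None): {1}, (1, "a"): {1, 2}, (1, "b"): {1},
             (2, "b"): {3}, (3, "b"): {4}}
DFA = subset_construction(0, {4}, NFA_TRANS, "ab")
```

The resulting DFA is not necessarily minimal: here the start state {0, 1} and the state {1} behave identically, which is exactly what the final "Minimal DFA" step would merge.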
Syntax Analysis
Primary purpose
group tokens into constructs, to see if program has correct syntax
builds a parse tree showing the derivation from the start symbol
Input
a stream of tokens
Output
a parse tree or a syntax tree
Class of languages
context-free languages (RL + nesting/balancing)
cannot check correct use of identifiers (e.g., types, subprograms)
Notations
Context-Free Grammars
Formal model
push-down automata
- FSA with unbounded stack
Alternatives
CKY, Earley's
Operator-Precedence parsing
LR Parsing (handle pruning, i.e., shift/reduce)
SLR, LALR
LL Parsing
With back-tracking
Predictive Parsing (must compute FIRST and FOLLOW sets)
Issues
Algorithms: LR parser generation
Writing a grammar:
associativity
precedence
left-factoring
removing left recursion (required for top-down/LL parsing)
Ambiguity
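The grammar-writing issues above can be seen together in a small predictive (recursive-descent) parser. This is a sketch for a hypothetical integer-expression language, not the course's parser. The left-recursive grammar E -> E + T | T is rewritten as E -> T E', E' -> + T E' | eps; each E' tail is coded as a loop, which keeps + and - left-associative, and the E/T/F layering gives * and / higher precedence. Every choice is made from one token of lookahead, with no back-tracking.

```python
import re

# Hypothetical token set: integers, + - * /, and parentheses.
TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    text, pos, tokens = text.rstrip(), 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"bad character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens + ["$"]  # end-of-input marker

class Parser:
    def __init__(self, text):
        self.toks, self.i = tokenize(text), 0

    def peek(self):
        return self.toks[self.i]

    def eat(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {self.peek()!r}")
        self.i += 1

    # E -> T E' ;  E' -> (+|-) T E' | eps   (loop = left associativity)
    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.peek()
            self.eat(op)
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    # T -> F T' ;  T' -> (*|/) F T' | eps   (binds tighter than + and -)
    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.peek()
            self.eat(op)
            rhs = self.factor()
            value = value * rhs if op == "*" else value // rhs  # integer division
        return value

    # F -> ( E ) | num
    def factor(self):
        tok = self.peek()
        if tok == "(":
            self.eat("(")
            value = self.expr()
            self.eat(")")
            return value
        if tok.isdigit():
            self.eat(tok)
            return int(tok)
        raise SyntaxError(f"unexpected {tok!r}")

def parse(text):
    p = Parser(text)
    value = p.expr()
    p.eat("$")  # the whole input must be consumed
    return value
```

For example, `parse("8 - 3 - 2")` groups as (8 - 3) - 2 and `parse("2 + 3 * 4")` multiplies first.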
Error Handling and Recovery
Purpose
to detect and report program errors, then continue checking the rest of the program
Alternatives
panic mode
discard input tokens until one of a designated set of synchronizing tokens is found
phrase-level recovery
replace a prefix of the remaining input with a string that lets the parser continue
error productions
add productions to the grammar for common errors
global correction
choose a minimal sequence of changes in the input to obtain a correct string
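Panic mode is the simplest of these to sketch. The toy "parser" below assumes a hypothetical statement form `name = number ;` and uses `;` as the only synchronizing token; on a malformed statement it reports one error, discards tokens through the next `;`, and resumes, so later statements are still checked.

```python
SYNC = {";"}  # hypothetical synchronizing set: the statement terminator

def parse_statements(tokens):
    """Parse `name = number ;` statements, recovering from errors in panic mode."""
    assignments, errors = {}, []
    i, n = 0, len(tokens)
    while i < n:
        t = tokens[i:i + 4]
        if (len(t) == 4 and t[0].isidentifier() and t[1] == "="
                and t[2].isdigit() and t[3] == ";"):
            assignments[t[0]] = int(t[2])
            i += 4
            continue
        # Panic mode: report once, then skip input up to a synchronizing token.
        errors.append(f"syntax error near token {i}: {tokens[i]!r}")
        while i < n and tokens[i] not in SYNC:
            i += 1
        i += 1  # consume the ';' itself (or step past the end of input)
    return assignments, errors
```

Given `"x = 1 ; y = = 2 ; z = 3 ;".split()`, the bad `y` statement is reported and skipped, and both `x` and `z` are still recorded.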
Syntax-Directed Translation
Primary purpose
a notation for specifying translation along with syntax
Input
a syntax-directed definition (CFG with semantic actions)
Two Varieties of Attributes
inherited attributes
synthesized attributes
Two Restrictions of SDD
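S-attributed definition
- only synthesized attributes
L-attributed definition
- synthesized attributes, plus inherited attributes that depend only on the parent and on siblings to the left
any L-attributed definition may be converted to an S-attributed one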
These attribute definitions can be evaluated while doing LR parsing
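For the S-attributed case, evaluation is a single bottom-up (postorder) pass: each node's synthesized attribute is computed from its children's, which is the same order in which an LR parser performs reductions. The sketch below uses a hypothetical tree encoding, (production, children) tuples with tokens as leaves, and the classic `val` rules for an expression grammar.

```python
# Semantic rules of a hypothetical S-attributed definition, keyed by production:
#   E -> E + T  { E.val = E1.val + T.val }     E -> T  { E.val = T.val }
#   T -> T * F  { T.val = T1.val * F.val }     T -> F  { T.val = F.val }
#   F -> num    { F.val = num.lexval }
RULES = {
    "E -> E + T": lambda e, _plus, t: e + t,
    "E -> T": lambda t: t,
    "T -> T * F": lambda t, _times, f: t * f,
    "T -> F": lambda f: f,
    "F -> num": lambda num: num,
}

def evaluate(node):
    """Postorder walk: children's synthesized attributes first, then the parent's."""
    if not isinstance(node, tuple):
        return node  # a token: its attribute is its lexical value
    production, children = node
    child_vals = [evaluate(c) for c in children]
    return RULES[production](*child_vals)

def num(v):
    """Helper building the subtree F -> num for a literal v."""
    return ("F -> num", [v])

# Parse tree for 2 + 3 * 4
TREE = ("E -> E + T", [
    ("E -> T", [("T -> F", [num(2)])]),
    "+",
    ("T -> T * F", [("T -> F", [num(3)]), "*", num(4)]),
])
```

Here `evaluate(TREE)` yields 14. An LR parser would compute the same values on its semantic stack during reductions, without building the tree at all.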
Symbol Tables
Purpose
for managing inherited attributes
Functions
insert, lookup, enter scope, exit scope
Representation
For Algol-like languages: a stack of tables
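A stack of tables can be sketched as a list of dictionaries, one per open scope. The class name and the error behavior below are assumptions for illustration; a real compiler would store richer attribute records than the strings used here.

```python
class SymbolTable:
    """Stack of dictionaries, one per open scope (innermost scope last)."""

    def __init__(self):
        self.scopes = [{}]  # the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        if len(self.scopes) == 1:
            raise RuntimeError("cannot exit the global scope")
        self.scopes.pop()

    def insert(self, name, attrs):
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"{name} already declared in this scope")
        scope[name] = attrs

    def lookup(self, name):
        # Search from the innermost scope outward: inner declarations hide outer ones.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None  # undeclared
```

Exiting a scope simply pops its table, so every name declared in that block disappears at once.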
Type Checking
Formal model
Type Systems
Notations
Type Expressions
Alternatives
structural equivalence
- types are built by the same type constructors from equivalent components
name equivalence
- types have same name
Efficient Representation for Type Systems
a directed acyclic graph (DAG) in which identical type expressions share a single node
allows type checking by pointer-equality comparison
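One way to get such a DAG is hash-consing: a constructor function returns the existing node whenever the same constructor is applied to the same (already unique) children. The sketch below assumes hypothetical constructor names ("int", "char", "pointer", "array", "->"); because equal type expressions become the same object, structural equivalence reduces to Python's `is` check.

```python
class TypeNode:
    """One node of the type DAG: a constructor name plus child type nodes."""
    __slots__ = ("ctor", "args")

    def __init__(self, ctor, args):
        self.ctor, self.args = ctor, args

    def __repr__(self):
        if not self.args:
            return self.ctor
        return f"{self.ctor}({', '.join(map(repr, self.args))})"

_table = {}  # (ctor, child nodes) -> the unique node for that type expression

def make_type(ctor, *args):
    # Children are already unique nodes, and TypeNode hashes/compares by
    # identity, so (ctor, args) is a sound key for structural sharing.
    key = (ctor, args)
    node = _table.get(key)
    if node is None:
        node = TypeNode(ctor, args)
        _table[key] = node
    return node

INT = make_type("int")
CHAR = make_type("char")
```

Under name equivalence, each declared type name would instead get its own node, so two differently named but identically built types would not compare equal.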
Run-time Environments
Language issues affect storage organization and allocation
recursion, lifetime of variables, parameter passing modes, non-local references, dynamic storage allocation and deallocation, multi-tasking
Typical organization of memory
code segment, initialized data segment, uninitialized data segment, stack, heap
Issues
activation record
alignment, padding, packing
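Alignment and padding can be made concrete with a small layout routine for the fields of a record or activation record. The sizes and alignments used in the examples (char 1/1, int 4/4, double 8/8) are assumptions about a typical 64-bit target, and the `packed` flag is a hypothetical option that drops all padding.

```python
def layout(fields, packed=False):
    """Assign offsets to (name, size, alignment) fields in declaration order.

    Each field goes at the next offset that is a multiple of its alignment
    (the gap is padding); the total is rounded up to the largest alignment
    so that arrays of the record stay aligned."""
    offset, max_align, offsets = 0, 1, {}
    for name, size, align in fields:
        if packed:
            align = 1  # packing: no padding, fields may be misaligned
        offset = (offset + align - 1) // align * align  # round up
        offsets[name] = offset
        offset += size
        max_align = max(max_align, align)
    total = (offset + max_align - 1) // max_align * max_align
    return offsets, total

CHAR_F, INT_F, DOUBLE_F = (1, 1), (4, 4), (8, 8)  # assumed (size, alignment)
```

With fields c (char), d (double), i (int) in that order, the layout takes 24 bytes; reordering them as d, i, c takes 16, and packing the original order takes 13 at the cost of misaligned accesses.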
Intermediate Code Generation
Purpose
allows machine-independent optimizations and easier porting (retargeting) of the compiler
Alternatives
three address code
syntax trees
DAGs
postfix
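Two of these intermediate forms can be produced from the same syntax tree with short walks. The tree encoding below is an assumption: leaves are names or numbers, interior nodes are (op, left, right). Each interior node gets a fresh temporary in three-address code, while the postfix form simply lists operands before their operator.

```python
import itertools

def gen_tac(tree):
    """Emit three-address code; return (instructions, address of the result)."""
    code, temps = [], itertools.count(1)

    def walk(node):
        if not isinstance(node, tuple):
            return str(node)  # a leaf is already an address
        op, left, right = node
        l, r = walk(left), walk(right)
        t = f"t{next(temps)}"  # fresh temporary for this node's value
        code.append(f"{t} = {l} {op} {r}")
        return t

    result = walk(tree)
    return code, result

def to_postfix(node):
    """Postorder listing: operands first, then the operator."""
    if not isinstance(node, tuple):
        return [str(node)]
    op, left, right = node
    return to_postfix(left) + to_postfix(right) + [op]

TREE = ("+", ("*", "b", "c"), ("*", "b", "d"))  # b * c + b * d
```

For this tree, `gen_tac` produces `t1 = b * c`, `t2 = b * d`, `t3 = t1 + t2`, and `to_postfix` gives `b c * b d * +`.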
Code Generation
Purpose
to generate machine-specific instructions from intermediate code
Issues
instruction selection
register allocation
instruction scheduling (for RISC)
Sethi-Ullman register numbering and code generation from expression trees
Code Generator Generators
Peephole Optimization - to fix local inefficiencies
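A sketch of Sethi-Ullman numbering and generation, in the simplified variant where every leaf must be loaded into a register (so leaves get label 1). The target instruction forms `LD Rd, x` and `OP Rd, Rs1, Rs2` (Rd = Rs1 op Rs2), the register names, and the tree encoding (op, left, right) are all assumptions. The child needing more registers is evaluated first, so a node labeled k needs only k registers and no spills.

```python
OPS = {"+": "ADD", "-": "SUB", "*": "MUL"}  # hypothetical opcode names

def label(node):
    """Registers needed to evaluate `node` without spilling (leaves need 1)."""
    if not isinstance(node, tuple):
        return 1
    _, left, right = node
    l, r = label(left), label(right)
    return max(l, r) if l != r else l + 1

def gen(node, base=1, code=None):
    """Emit code that leaves node's value in register R(base + label(node) - 1),
    using only registers R(base) .. R(base + label(node) - 1)."""
    if code is None:
        code = []
    if not isinstance(node, tuple):
        code.append(f"LD R{base}, {node}")
        return code
    op, left, right = node
    k, l, r = label(node), label(left), label(right)
    if l == r:
        # Equal needs: right child in the higher registers first, then left.
        gen(right, base + 1, code)
        gen(left, base, code)
        code.append(f"{OPS[op]} R{base + k - 1}, R{base + k - 2}, R{base + k - 1}")
    elif r > l:
        gen(right, base, code)  # bigger subtree first; result in R(base+r-1)
        gen(left, base, code)   # uses lower registers only; result in R(base+l-1)
        code.append(f"{OPS[op]} R{base + r - 1}, R{base + l - 1}, R{base + r - 1}")
    else:
        gen(left, base, code)
        gen(right, base, code)
        code.append(f"{OPS[op]} R{base + l - 1}, R{base + l - 1}, R{base + r - 1}")
    return code
```

Labels are recomputed on each visit for brevity; a real code generator would store them in the tree after one bottom-up pass.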
Code Optimization
Purpose
apply techniques to yield space/time improvements
Requirement
must not introduce errors
must provide measurable improvement
Some Optimizations
Subprograms
static allocation of activation records
eliminating recursion
in-line substitution of subprogram bodies
Loops
loop unrolling
code motion (moving loop invariants out of loops)
reduction in strength
induction-variable elimination
Blocks
assigning variables to registers
algebraic transformations
switching order of evaluation within a block (to minimize registers)
eliminating common subexpressions
constant and copy propagation
constant folding
eliminating dead code
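Constant propagation and constant folding within a basic block fit in one forward pass. The sketch below assumes a hypothetical quadruple form (dest, op, a, b), where op "=" is a copy with b = None and operands are ints or variable names; it does not perform copy propagation or dead-code elimination, which would be separate passes.

```python
import operator

FOLD = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_block(block):
    """Constant propagation + constant folding over one basic block."""
    known = {}  # variable -> constant value it holds at this point
    out = []
    for dest, op, a, b in block:
        # Propagate: replace variables whose constant value is known.
        a = known.get(a, a) if isinstance(a, str) else a
        b = known.get(b, b) if isinstance(b, str) else b
        if op == "=" and isinstance(a, int):
            known[dest] = a
            out.append((dest, "=", a, None))
        elif op in FOLD and isinstance(a, int) and isinstance(b, int):
            value = FOLD[op](a, b)  # fold: compute at compile time
            known[dest] = value
            out.append((dest, "=", value, None))
        else:
            known.pop(dest, None)   # dest no longer holds a known constant
            out.append((dest, op, a, b))
    return out

BLOCK = [("x", "=", 4, None),   # x = 4
         ("y", "*", "x", 2),    # y = x * 2
         ("z", "+", "y", "w")]  # z = y + w
```

On `BLOCK`, `y = x * 2` becomes `y = 8` and `z = y + w` becomes `z = 8 + w`; if x and y are not live after the block, a dead-code pass could then remove their assignments.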
Techniques
DAGs for basic blocks
Flow Graphs - to represent control flow
Data-Flow analysis - gathers facts over the flow graph that justify optimizations
Call graph - shows who can call whom and when
focus optimizations on inner loops
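Live-variable analysis is a standard example of data-flow analysis over a flow graph. The sketch below iterates the backward equations to a fixed point; the block names, the precomputed use/def sets, and the small loop used as input are hypothetical.

```python
def liveness(blocks, succ):
    """Iterative backward data-flow analysis for live variables.

    blocks: name -> (use, defs); use = variables read before any write in
            the block, defs = variables written in the block.
    succ:   name -> list of successor block names.
    Equations: out[B] = union of in[S] over successors S of B
               in[B]  = use[B] | (out[B] - defs[B])"""
    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    changed = True
    while changed:  # repeat until no set changes (a fixed point)
        changed = False
        for b, (use, defs) in blocks.items():
            out = set().union(*(live_in[s] for s in succ.get(b, [])))
            new_in = use | (out - defs)
            if out != live_out[b] or new_in != live_in[b]:
                live_out[b], live_in[b] = out, new_in
                changed = True
    return live_in, live_out

# Hypothetical loop: B1: i = 0; s = 0   B2: if i < n goto B3 else B4
#                    B3: s = s + i; i = i + 1; goto B2   B4: return s
BLOCKS = {"B1": (set(), {"i", "s"}), "B2": ({"i", "n"}, set()),
          "B3": ({"s", "i"}, {"s", "i"}), "B4": ({"s"}, set())}
SUCC = {"B1": ["B2"], "B2": ["B3", "B4"], "B3": ["B2"], "B4": []}
```

Only n is live on entry to B1, and s, i, n are all live around the loop, which is the kind of fact a register allocator or dead-code pass would consume.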
Interpreters
Purpose
an alternative means of executing programs
provides quicker, more interactive program development
Components
representation for interpreted programs - REP
a scanner/parser - TEXT --> REP
a pretty-printer - REP --> TEXT
a symbol table manager - NAMES --> ATTRIBUTES
a memory manager - storage for parameters, activations
a compiler - REP --> CODE
The Lisp interpreter
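The components listed above can be seen in miniature in a tiny Lisp-style interpreter. This is a hypothetical sketch, not the Lisp interpreter the course refers to: the scanner/parser turns TEXT into REP (nested Python lists), `Env` plays the symbol-table role (names to values, one table per activation), and `evaluate` executes REP directly. Only numbers, symbols, `quote`, `if`, `define`, `lambda`, and application are supported.

```python
import operator

def tokenize(text):
    return text.replace("(", " ( ").replace(")", " ) ").split()

def read(tokens):
    """TEXT --> REP: parse one s-expression into nested lists."""
    tok = tokens.pop(0)
    if tok == "(":
        lst = []
        while tokens[0] != ")":
            lst.append(read(tokens))
        tokens.pop(0)  # drop ')'
        return lst
    try:
        return int(tok)
    except ValueError:
        return tok  # a symbol

class Env(dict):
    """One scope of name -> value bindings, linked to its enclosing scope."""
    def __init__(self, names=(), values=(), outer=None):
        super().__init__(zip(names, values))
        self.outer = outer

    def find(self, name):
        env = self
        while env is not None:
            if name in env:
                return env
            env = env.outer
        raise NameError(name)

GLOBAL = Env()
GLOBAL.update({"+": operator.add, "-": operator.sub, "*": operator.mul,
               "<": operator.lt, "=": operator.eq})

def evaluate(x, env=GLOBAL):
    if isinstance(x, str):  # symbol reference
        return env.find(x)[x]
    if isinstance(x, int):  # literal
        return x
    head = x[0]
    if head == "quote":
        return x[1]
    if head == "if":
        _, test, then, alt = x
        return evaluate(then if evaluate(test, env) else alt, env)
    if head == "define":
        _, name, expr = x
        env[name] = evaluate(expr, env)
        return None
    if head == "lambda":
        _, params, body = x
        # Each call gets a fresh scope whose outer link is the defining scope.
        return lambda *args: evaluate(body, Env(params, args, env))
    fn = evaluate(head, env)
    return fn(*[evaluate(arg, env) for arg in x[1:]])

def run(text, env=GLOBAL):
    return evaluate(read(tokenize(text)), env)
```

After `run("(define fact (lambda (n) (if (< n 2) 1 (* n (fact (- n 1))))))")`, evaluating `(fact 5)` walks the list representation directly, with no separate compile step.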