288x Filetype PDF File size 0.04 MB Source: courses.cs.washington.edu
CSE 401: Introduction to Compiler Construction Course Outline
Text:ModernCompilerImplementationinJava,SecondEdition, Compiler front-ends:
by Appel, with Palsberg • lexical analysis (scanning): characters → tokens
Suggested: Compilers - Principles, Techniques, and Tools, by • syntactic analysis (parsing): tokens → abstract syntax trees
Ahoet al. (the "Dragon Book") • semantic analysis (typechecking): annotate ASTs
Goals: Midterm
• learn principles & practice of language implementation
• brings together theory & pragmatics of previous courses Compiler back-ends:
• understand compile-time vs. run-time processing • intermediate code generation: ASTs → intermediate code
• study interactions among: • target code generation: intermediate code → target code
• language features
• implementation efficiency • run-time storage layout
• compiler complexity • target instruction selection
• architectural features • register allocation
• gain more experience with object-oriented design & Java • optimizations
• gain more experience working on a team
Final
Prerequisites: 322, 326, 341, 378
Sign up on course mailing list!
Craig Chambers 1 CSE 401 Craig Chambers 2 CSE 401
Project Grading
Start with compiler for MiniJava, written in Java Project: 40% total
Homework: 20% total
Add: Midterm: 15%
• comments Final: 25%
• floating-point values
• arrays
• static (class) variables Homework & projects due at the start of class
• for loops
• break statements 3 free late days, per person, for the whole quarter
• and more • thereafter, 25% off per calendar day late
Completed in stages over the quarter
Strongly encourage working in a 2-person team on project
• but only if joint work, not divided work
Grading based on:
• correctness
• clarity of design & implementation
• quality of test cases
Craig Chambers 3 CSE 401 Craig Chambers 4 CSE 401
An example compilation First step: lexical analysis
Sample (extended) MiniJava program: Factorial.java “Scanning”, “tokenizing”
// Computes 10! and prints it out Read in characters, clump into tokens
class Factorial { • strip out whitespace & comments in the process
public static void main(String[] a) {
System.out.println(
new Fac().ComputeFac(10));
}
}
class Fac {
// the recursive helper function
public int ComputeFac(int num) {
int numAux;
if (num < 1)
numAux = 1;
else
numAux = num * this.ComputeFac(num-1);
return numAux;
}
}
Craig Chambers 5 CSE 401 Craig Chambers 6 CSE 401
Specifying tokens: regular expressions Second step: syntactic analysis
Example: “Parsing”
Ident ::= Letter AlphaNum* Read in tokens, turn into a tree based on syntactic structure
Integer ::= Digit+ • report any errors in syntax
AlphaNum::= Letter | Digit
Letter ::= 'a' | ... | 'z' | 'A' | ... | 'Z'
Digit ::= '0' | ... | '9'
Craig Chambers 7 CSE 401 Craig Chambers 8 CSE 401
Specifying syntax: context-free grammars Third step: semantic analysis
EBNF is a popular notation for CFG’s “Name resolution and typechecking”
Example: Given AST:
Stmt ::= if ( Expr ) Stmt [else Stmt] • figure out what declaration each name refers to
| while ( Expr ) Stmt • perform typechecking and other static consistency checks
| ID = Expr;
| ... Key data structure: symbol table
Expr ::= Expr + Expr | Expr < Expr | ... • maps names to info about name derived from declaration
| ! Expr • tree of symbol tables corresponding to nesting of scopes
| Expr . ID ( [Expr {, Expr}] )
| ID Semantic analysis steps:
| Integer | ... 1. Process each scope, top down
| (Expr)
| ... 2. Process declarations in each scope into symbol table for
scope
EBNF specifies concrete syntax of language 3. Process body of each scope in context of symbol table
Parser usually constructs tree representing abstract syntax of
language
Craig Chambers 9 CSE 401 Craig Chambers 10 CSE 401
Fourth step: intermediate code generation Example
Given annotated AST & symbol tables, int Fac.ComputeFac(*? this, int num) {
translate into lower-level intermediate code int T1, numAux, T8, T3, T7, T2, T6, T0;
T0 := 1;
Intermediate code is a separate language T1 := num < T0;
• Source-language independent ifnonzero T1 goto L0;
• Target-machine independent T2 := 1;
T3 := num - T2;
T6 := Fac.ComputeFac(this, T3);
Intermediate code is simple and regular T7 := num * T6;
⇒ good representation for doing optimizations numAux := T7;
goto L2;
Mightbeareasonabletargetlanguageitself,e.g.Javabytecode label L0;
T8 := 1;
numAux := T8;
label L2;
return numAux;
}
Craig Chambers 11 CSE 401 Craig Chambers 12 CSE 401
Fifth step: target (machine) code generation Summary of compiler phases
Analysis Synthesis
Translate intermediate code into target code of input program of output program
(front-end) (back-end)
Need to do: character
• instruction selection: choose target instructions for stream intermediate
(subsequences of) intermediate code instructions form
• register allocation: allocate intermediate code variables to Lexical Analysis
machine registers, spilling excess to stack token Optimization
• compute layout of each procedure’s stack frame & stream
other run-time data structures Syntactic Analysis
• emit target code intermediate
abstract form
syntax
tree Code Generation
Semantic Analysis target
language
annotated
AST
Intermediate
Code Generation
Ideal: many front-ends, many back-ends sharing one
intermediate language
Craig Chambers 13 CSE 401 Craig Chambers 14 CSE 401
Other language processing tools Engineering issues
Compilers translate the input language into Compilers are hard to design so that they are
a different, usually lower-level, target language • fast
• highly optimizing
Interpreters directly execute the input language • extensible & evolvable
• same front-end structure as a compiler • correct
• then evaluate the annotated AST,
or translate to intermediate code and evaluate that Some parts of compilers can be automatically generated from
specifications, e.g., scanners, parsers, & target code
Software engineering tools can resemble compilers generators
• same front-end structure as a compiler • generated parts are fast & correct
• then: • specifications are easily evolvable
• pretty-print/reformat/colorize (Some of my current research is on generating fast, correct
• analyze to compute relationships like declarations/uses, optimizations from specifications.)
calls/callees, etc.
• analyze to find potential bugs Need good management of software complexity
• aid in refactoring/restructuring/evolving programs
Craig Chambers 15 CSE 401 Craig Chambers 16 CSE 401
no reviews yet
Please Login to review.