Revision as of 09:05, 4 January 2015
A C program can consist of one or more files; the text of the program is kept in units called source files. The phases of translation are a series of steps a translator, or compiler, must go through to convert a source file into an executable program. During these phases, the source file gets converted into a preprocessing translation unit, then into a translation unit, and finally into an executable program. It is also possible to translate individual units separately and then later link them to produce an executable program.[1]
Translation phases
The C11 standard (ISO/IEC 9899:2011) specifies eight translation phases:
- Character mapping
- Line splicing
- Tokenization
- Preprocessing
- Character-set mapping
- String concatenation
- Translation
- Linkage
Character mapping
During the first phase of translation, the physical source file is mapped to the source character set in an implementation-defined manner. For example, the compiler may choose to interpret the source as UTF-8 or simply as ASCII and convert it to the implementation's internal source representation if necessary.[2]
In addition to the character mapping, trigraph sequences are replaced by their corresponding single-character internal representations. For example:
#include <stdio.h>
int main()??<
char hello??(??) = "Hello World!";
puts(hello);
return 0;
??>
becomes:
#include <stdio.h>
int main(){
char hello[] = "Hello World!";
puts(hello);
return 0;
}
Line splicing
During the second phase of translation, each backslash character (\) immediately followed by a new-line character is deleted, splicing physical source lines to form logical source lines.[3] For example:
#include <stdio.h>
int main() {
p\
u\
t\
s("Hello World");
return 0;
}
becomes:
#include <stdio.h>
int main() {
puts("Hello World");
return 0;
}
Tokenization
In the third phase of translation, the preprocessor decomposes the source file into preprocessing tokens and sequences of whitespace characters. Each comment is replaced by a single space character, and new-line characters are retained.[4]
Preprocessing
During the fourth phase, all preprocessing directives are executed, macro invocations are expanded, and _Pragma operator expressions are executed. Any file included with #include is processed from phase 1 through phase 4, recursively. At the conclusion of this phase, all preprocessing directives are deleted.[5]
Character-set mapping
In the fifth phase of translation, each source character set member and escape sequence in character constants and string literals is converted to the corresponding member of the execution character set. If there is no corresponding member, the character is converted in an implementation-defined manner to some member other than the null character.[6]
String concatenation
In this phase, all adjacent string literal tokens are concatenated. For example, "A" "B" "C" becomes "ABC", and "A" u"B" "C" becomes u"ABC".[7]
Translation
In the seventh phase, whitespace characters separating tokens are no longer significant. Every preprocessing token is converted into a token. The resulting tokens are syntactically and semantically analyzed and translated as a translation unit.[8]
Linkage
In the final phase of translation, all external object and function references are resolved. Library components are linked to satisfy external references to functions and objects not defined in the current translation. All translator output is then collected into a single program image, which contains the information needed for execution in its execution environment.[9]
References
- [1] ISO/IEC 9899:2011 §5.1.1.1 p1
- [2] ISO/IEC 9899:2011 §5.1.1.2 p1.1
- [3] ISO/IEC 9899:2011 §5.1.1.2 p1.2
- [4] ISO/IEC 9899:2011 §5.1.1.2 p1.3
- [5] ISO/IEC 9899:2011 §5.1.1.2 p1.4
- [6] ISO/IEC 9899:2011 §5.1.1.2 p1.5
- [7] ISO/IEC 9899:2011 §5.1.1.2 p1.6
- [8] ISO/IEC 9899:2011 §5.1.1.2 p1.7
- [9] ISO/IEC 9899:2011 §5.1.1.2 p1.8