Understanding C Language-3: Compilation & Linking
Intro
Building upon the previous article, which explained how the C language manipulates data through statements, this article continues to focus on the “why”. It begins with the end goal in mind: the memory image the operating system requires in order to execute a program. From there, it works backwards to deduce the tasks that the compilation and linking subsystem must perform to bridge the gap between the source code and the final executable file.
Memory Image of a Program
From the operating system’s perspective, executing a program is a seemingly straightforward process: load the required machine instructions and associated data into memory, then jump to the specified entry point and start execution. That’s all!
However, this simplicity relies on several underlying prerequisites:
- The requirement is for machine instructions, not the original C language code.
- The jumps and references between these instructions and data must form a complete and correct structure.
- To avoid redundant work, the obtained instructions and data should be persisted in a format that is most convenient for loading into memory.
- A single, unique entry point must be designated.
This implies that before C code can be executed, it must be translated into an executable file in a specific format, one that is as close as possible to the memory image the program will have once it is loaded into memory.
Before discussing the memory image, here is a brief introduction to virtual memory:
Virtual Memory
Virtual memory is a key mechanism designed by the operating system for managing memory resources, sitting between the physical memory and applications. Its existence offers significant advantages:
- Isolates the address space of applications
Each application corresponds to an independent virtual memory space (e.g., a 4GB space for 32-bit addresses). This greatly simplifies the design of a program’s memory layout, as it only needs to consider its own space.
- Overcomes the size limitations of physical memory
Since the memory required by a program at any given time is only a subset of its virtual address space, the operating system uses paging and page reclamation to keep only the hot pages of each process in physical memory. This allows multiple programs to run normally even when physical memory is much smaller than the total virtual memory allocated (although frequent paging can hurt performance).
- Saves physical memory overhead
The same physical pages can be mapped into the virtual address spaces of multiple processes; with Copy-on-Write (CoW), a private copy of a page is made only when a process writes to it. This saves substantial physical memory and forms the basis for dynamic shared library technology.
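Address-space isolation and Copy-on-Write can be observed directly from C. The following is a minimal sketch, assuming Linux or another POSIX system that provides fork() and waitpid(); the names global_value and value_seen_by_parent_after_child_write are made up for illustration. After fork(), parent and child see the global at the same virtual address, yet the child’s write never becomes visible to the parent.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* An ordinary global: after fork(), parent and child see it at the same
   virtual address, but each process has its own address space. */
int global_value = 1;

/* Forks a child that overwrites global_value, waits for it, and returns the
   value the parent sees afterwards. Returns -1 if fork() or waitpid() fails. */
int value_seen_by_parent_after_child_write(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: the first write to this shared page triggers Copy-on-Write,
           so the writing process (the child) gets its own private copy. */
        global_value = 999;
        _exit(0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    /* Parent: its copy of the page was never written, so it is unchanged. */
    return global_value;
}
```

Calling value_seen_by_parent_after_child_write() in the parent returns 1, not 999.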
Memory Image
A memory image refers to the concrete distribution of program code and data within the virtual address space. The following diagram (Figure 1), borrowed from Computer Systems: A Programmer’s Perspective, illustrates its general structure:
The key points:
- Segmentation
The operating system manages the virtual address space in segments. Different segments possess different permissions (Read/Write/Execute), which enhances security.
Data within a segment can either be loaded from the executable file (e.g., the code segment .text, data segments .data/.bss), reserved for data structures the program needs while executing (the stack), or allocated dynamically during program runtime (the heap).
- Segment Distribution Follows Platform-Specific ABI
- The entire address space is divided into a part usable by regular programs and a part reserved for kernel use (this isolation improves security).
- The starting address of the code segment is fixed (its value depends on the ABI; e.g., 0x400000 for non-position-independent executables on x86-64 Linux), and the data segment is placed after the code segment.
- The stack grows from high addresses towards lower addresses.
- The heap grows from a specific address towards higher addresses.
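To see this layout in a real process, one can take one address from each region and compare them. This is a minimal sketch, assuming x86-64 Linux with GCC or Clang; the variable and function names are hypothetical, and the C standard itself guarantees none of this ordering. With ASLR and PIE the absolute values change from run to run, but the relative order usually stays text < data < bss < heap < stack.

```c
#include <stdint.h>
#include <stdlib.h>

int initialized_global = 42;   /* initialized global: placed in .data */
int uninitialized_global;      /* uninitialized global: placed in .bss */

/* An ordinary function; its machine code lives in .text. */
int text_marker(void) { return initialized_global + uninitialized_global; }

/* One sample address from each region of the memory image. */
struct layout {
    uintptr_t text, data, bss, heap, stack;
};

struct layout probe_layout(void) {
    int local = 0;                      /* automatic variable: on the stack */
    void *block = malloc(16);           /* small allocation: usually served from the heap */
    struct layout l;
    /* Pointer-to-integer conversions are implementation-defined, but they
       yield the plain virtual address on mainstream Linux toolchains. */
    l.text  = (uintptr_t)text_marker;
    l.data  = (uintptr_t)&initialized_global;
    l.bss   = (uintptr_t)&uninitialized_global;
    l.heap  = (uintptr_t)block;
    l.stack = (uintptr_t)&local;
    free(block);
    return l;
}
```

Printing the five fields in hexadecimal is an easy way to compare them with Figure 1.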
Engineering Efficiency
Laziness Drives Innovation.
Now, let’s assume we are the designers of the C programming language. We have already established the syntax of C and defined the required memory image for programs to execute on a specific platform.
Building upon this foundation, we decide to develop a program—let’s tentatively call it the “Terminator”—that can directly translate C source code into the desired memory image.
The executable file generated by the Terminator contains the following sections:
- A data section
- A code section (.text)
Global variables and functions are referenced via offsets, and the file has a fixed entry point.
- The absolute starting addresses of the data and code sections within the virtual address space.
When the program is run, the operating system loads these sections into memory at the specified addresses and begins execution directly.
Initially, everything seemed perfect, and everyone was satisfied.
However, as time passed, efficiency issues began to emerge:
On one hand, individual C source files were becoming increasingly large, leading to correspondingly longer processing times for the Terminator.
On the other hand, the same functionality was being reimplemented again and again across different C codebases and projects.
These problems stemmed from the increasing scale of software development. Consequently, we needed to seek a solution at the fundamental level.
Splitting Strategy
After analysis, it becomes evident that the project can be divided into different modules based on functionality, which can be maintained separately and reused across different projects.
This is a classic application of the divide-and-conquer strategy. The “Terminator” then only needs to reprocess the source files that have changed, and modules can be shared between projects, which solves both of the aforementioned problems in one fell swoop.
However, this splitting strategy introduces new challenges:
We refer to variables and functions as symbols.
- Global symbols that originally belonged to the same file may now have their definitions and references distributed across different files. How can the Terminator write the address of a symbol from another file into the code section of one file?
- When combining different files from the same project, how does one coordinate the addresses of code sections belonging to different files?
- How can we fully utilize the Copy-on-Write (CoW) mechanism provided by the operating system to reduce the memory footprint of function libraries?
These questions all point to a single solution: the work of the Terminator must be split into two distinct phases:
- Compilation
Translates C source files into relocatable files containing a symbol table, in which symbol addresses are offsets relative to their section.
- Linking
Merges multiple relocatable files, determines the final address of each symbol definition, and overwrites all references to that symbol with its final address.
Interface Strategy
Once compilation and linking are separated, we observe that the compilation phase only needs declarations (which provide type information) for external symbols, i.e., symbols not defined within the current module.
These declarations are an application of the interface strategy: they separate the stable declarations from the implementation code, which may change, thereby improving project development efficiency.
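A minimal sketch of this interface strategy follows. The names counter.h, counter.c, counter_next and counter_value are hypothetical; in a real project the two halves would be separate files, and they are merged here only so the block compiles on its own. Other source files would #include only the header: they compile against the declaration alone, and the linker later connects their references to the definition.

```c
/* ---- counter.h (hypothetical header): the stable interface ---- */
int counter_next(void);           /* declaration: type information only */

/* ---- counter.c (hypothetical implementation): free to change ---- */
static int counter_value = 0;     /* static: internal linkage, never exported */

int counter_next(void) {          /* definition: emits the global symbol counter_next */
    return ++counter_value;
}
```

Because counter_value is static, other modules cannot depend on it, so the implementation can change without touching anything that includes the header.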
From Source Code to Executable
Based on the previous explanation, Figure 2 illustrates the stages that C language source code needs to undergo in reality before execution:
Stages
Next, we will explain each stage in Figure 2 step by step. For clarity, we will use the following source code as an example:
// compile_link.c
Preprocessing
The preprocessing stage performs text substitution operations on the source file. Its input is C source code, and its output remains C source code.
Directive include
Through the #include directive, the entire content of the a.h file can be inserted at the current position.
The interface strategy mentioned above encourages placing a module’s declarations and constants into a separate header file (typically with a .h extension) for distribution.
Running the following command:
gcc -E compile_link.c -o compile_link.i
produces:
# 1 "compile_link.c"
The #include directive works exactly as expected. Note that the content included by #include is not limited to declarations: the preprocessor inserts whatever text the file contains.
Directive define
The #define directive performs macro substitution, which can be simply understood as a regular text replacement.
Besides constants, macro functions can also be defined to eliminate function call overhead:
After preprocessing:
int main() {
Note that there is one space after 1. Therefore, to ensure correct semantics, macro definitions are generally enclosed in parentheses.
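Why the parentheses matter can be shown with a small example. This is a sketch; SQUARE_UNSAFE, SQUARE and the two demo functions are hypothetical names, not taken from the original example. Because macro expansion is plain text substitution, an unparenthesized body lets operator precedence change the meaning of the expanded expression.

```c
/* Unsafe: the argument text is pasted in as-is, with no parentheses. */
#define SQUARE_UNSAFE(x) x * x

/* Safe: parenthesize every use of the parameter and the whole body. */
#define SQUARE(x) ((x) * (x))

/* SQUARE_UNSAFE(1 + 1) expands to 1 + 1 * 1 + 1, which evaluates to 3. */
int square_unsafe_demo(void) { return SQUARE_UNSAFE(1 + 1); }

/* SQUARE(1 + 1) expands to ((1 + 1) * (1 + 1)), which evaluates to 4. */
int square_safe_demo(void) { return SQUARE(1 + 1); }
```

Parentheses do not fix everything: SQUARE(i++) still evaluates i++ twice, which is undefined behavior. That is one reason a static inline function is often preferred over a macro function.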
Since macros process code, macro functions can also be used as code templates to generate C source code. Here is a technique for initializing an array:
// data.def
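The original data.def example is not preserved here, so the following is only a sketch of the general technique (often called “X-macros”), using a list macro instead of a separate .def file; COLOR_LIST, the AS_* helpers and the color data are all hypothetical. One list of entries is expanded several times with different definitions of X, so the enum, the value array and the name array can never fall out of sync. In the file-based variant, each expansion would instead be written as #define X(...) ..., #include "data.def", #undef X.

```c
/* The single source of truth: one X(...) entry per table row. */
#define COLOR_LIST(X)      \
    X(RED,   0xFF0000)     \
    X(GREEN, 0x00FF00)     \
    X(BLUE,  0x0000FF)

/* Expansion 1: enum constants COLOR_RED, COLOR_GREEN, COLOR_BLUE. */
#define AS_ENUM(name, value) COLOR_##name,
enum color { COLOR_LIST(AS_ENUM) COLOR_COUNT };
#undef AS_ENUM

/* Expansion 2: an array of values indexed by the enum. */
#define AS_VALUE(name, value) [COLOR_##name] = value,
const unsigned color_values[COLOR_COUNT] = { COLOR_LIST(AS_VALUE) };
#undef AS_VALUE

/* Expansion 3: the entry names as strings, via the # operator. */
#define AS_NAME(name, value) [COLOR_##name] = #name,
const char *const color_names[COLOR_COUNT] = { COLOR_LIST(AS_NAME) };
#undef AS_NAME
```

Adding a row to COLOR_LIST updates all three tables at once.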
Preprocessing includes other directives, such as conditional compilation (#ifdef, #ifndef, #endif, etc.), which will not be elaborated on here.
Compilation
The compilation stage is responsible for translating C source code into assembly code for the target platform. Different target platforms will yield different results.
// compile_link3.c
Run the following command:
gcc -S compile_link3.c -o compile_link3.s
The assembly code obtained from compilation (the .s extension stands for source) is as follows:
.file "compile_link3.c" ; logical filename
All identifiers ending with a colon (:) are labels. Labels starting with .L are local labels used only by the assembler; they do not appear in the object file’s symbol table.
All other lines starting with a period (.) are directives, which guide the subsequent work of the assembler and linker.
It is particularly important to note that instructions starting with .cfi record Call Frame Information (CFI) in DWARF format.
This information will be collected by the assembler into the .eh_frame section (introduced by the C++ ABI for exception handling) or into sections starting with .debug for use by debuggers.
The assembly code obtained from compiling another source file is as follows:
// compile_link
It can be observed that both global variables (initialized/uninitialized) and functions are symbols. Furthermore, global variables are referenced in the code section via an offset relative to the %rip register.
Assembly
The objective of the assembly stage is to generate a corresponding relocatable object file for each source file. Object files can have various formats, with ELF (Executable and Linkable Format) being the most commonly used on Linux.
The typical structure of an ELF object file is shown in Figure 3:
The most critical components directly serving the linking process are the Section Header Table, the Symbol Table (.symtab), and the Relocation Entries (.rel.text/.rel.data; on x86-64 they are named .rela.text/.rela.data).
The .data and .bss sections contain the corresponding data.
The .text section contains the machine code directly translated from the assembly code generated in the previous stage.
// compile_link3.o
The .eh_frame section contains the Call Frame Information (CFI) used for stack unwinding during debugging or exception handling.
Section Header Table
The Section Header Table records the size and location (relative offset within the file) of each section.
Compile the file with the following command:
gcc -c compile_link3.c -o compile_link3.o
then inspect its section headers with objdump -h compile_link3.o:
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000014 0000000000000000 0000000000000000 00000040 2**0
CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
1 .data 00000000 0000000000000000 0000000000000000 00000054 2**0
CONTENTS, ALLOC, LOAD, DATA
2 .bss 00000000 0000000000000000 0000000000000000 00000054 2**0
ALLOC
3 .comment 0000002e 0000000000000000 0000000000000000 00000054 2**0
CONTENTS, READONLY
4 .note.GNU-stack 00000000 0000000000000000 0000000000000000 00000082 2**0
CONTENTS, READONLY
5 .eh_frame 00000038 0000000000000000 0000000000000000 00000088 2**3
CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
Symbol Table
The Symbol Table records all symbols present in the file (automatic local variables live on the stack at run time and have no symbols, so they need no relocation). Its key attributes include:
- Symbol Name
Section names and the file name itself also appear as entries in the symbol table.
- Type
Indicates the symbol’s type, such as data object, function, file name, or section name.
- Defining Section
Points to the section to which the symbol belongs. It can also be one of three special pseudo-sections (readelf abbreviates them as ABS, COM and UND):
  - ABS (symbols that do not require relocation)
  - COMMON (uninitialized global variables)
  - UNDEF (symbols referenced here but defined elsewhere)
- Value (Address)
The offset of the symbol relative to the start of its defining section.
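The link between C declarations and these attributes can be summarized in one small file. This is an illustrative sketch with hypothetical names (not the article’s compile_link.c); the comments state the entry that GCC on ELF typically produces when compiling without optimization, and are expectations rather than captured output. Running readelf -s on the resulting object file is a quick way to check them.

```c
int g_initialized = 1;        /* Bind GLOBAL, Type OBJECT, Ndx = .data */
int g_uninitialized;          /* GLOBAL OBJECT; Ndx = COM with -fcommon, .bss with -fno-common (GCC 10+ default) */
static int s_hidden = 5;      /* Bind LOCAL, Type OBJECT, Ndx = .data: invisible to other files */
extern int g_elsewhere;       /* declaration only: no entry unless used, then Ndx = UND */

int add_hidden(int x) {       /* Bind GLOBAL, Type FUNC, Ndx = .text */
    int sum = x + s_hidden;   /* automatic variable: lives on the stack, no symbol at all */
    return sum + g_initialized;
}
```

Referencing g_elsewhere from add_hidden would add an UND entry and, at link time, require some other object file to define it.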
The symbol table for compile_link3.o is as follows:
Symbol table ‘.symtab’ contains 10 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS compile_link3.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 3
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
5: 0000000000000000 0 SECTION LOCAL DEFAULT 6
6: 0000000000000000 0 SECTION LOCAL DEFAULT 7
7: 0000000000000000 0 SECTION LOCAL DEFAULT 5
8: 0000000000000000 20 FUNC GLOBAL DEFAULT 1 extern_func3
9: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND global_b
The symbol table for compile_link.o is as follows:
Symbol table ‘.symtab’ contains 18 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS compile_link.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 3
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
5: 0000000000000008 4 OBJECT LOCAL DEFAULT 3 static_e
6: 0000000000000000 0 SECTION LOCAL DEFAULT 6
7: 0000000000000000 0 SECTION LOCAL DEFAULT 7
8: 0000000000000000 0 SECTION LOCAL DEFAULT 5
9: 0000000000000000 12 FUNC GLOBAL DEFAULT 1 extern_func
10: 0000000000000000 4 OBJECT GLOBAL DEFAULT 3 extern_f
11: 0000000000000004 4 OBJECT GLOBAL DEFAULT 3 global_a
12: 0000000000000004 4 OBJECT GLOBAL DEFAULT COM global_b
13: 0000000000000004 4 OBJECT GLOBAL DEFAULT COM global_c
14: 0000000000000004 4 OBJECT GLOBAL DEFAULT COM global_d
15: 000000000000000c 14 FUNC GLOBAL DEFAULT 1 global_func
16: 000000000000001a 93 FUNC GLOBAL DEFAULT 1 main
17: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND extern_func3
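It can be observed that for uninitialized (COMMON) symbols, the Value field holds the alignment boundary rather than a section offset.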
Relocation Entries
Since the final absolute addresses of each section are determined during the linking process, the symbol definitions and references within each section need to be modified accordingly.
Relocation entries list every location in a section that refers to a symbol whose final address must be patched in. Note that there may be multiple references to the same symbol within a single section.
The relocation entries for compile_link3.o are as follows:
RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
0000000000000009 R_X86_64_PC32 global_b-0x0000000000000004
RELOCATION RECORDS FOR [.eh_frame]:
OFFSET TYPE VALUE
0000000000000020 R_X86_64_PC32 .text
The relocation entries for compile_link.o are as follows:
RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
000000000000002f R_X86_64_PC32 global_func-0x0000000000000004
000000000000003c R_X86_64_PC32 extern_func-0x0000000000000004
0000000000000049 R_X86_64_PC32 extern_func3-0x0000000000000004
000000000000004f R_X86_64_PC32 global_b-0x0000000000000004
0000000000000059 R_X86_64_PC32 extern_func-0x0000000000000004
000000000000005f R_X86_64_PC32 global_c-0x0000000000000004
0000000000000065 R_X86_64_PC32 extern_f-0x0000000000000004
000000000000006b R_X86_64_PC32 global_d-0x0000000000000004
0000000000000071 R_X86_64_PC32 .data+0x0000000000000004
RELOCATION RECORDS FOR [.eh_frame]:
OFFSET TYPE VALUE
0000000000000020 R_X86_64_PC32 .text
0000000000000040 R_X86_64_PC32 .text+0x000000000000000c
0000000000000060 R_X86_64_PC32 .text+0x000000000000001a
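Descriptions of Fields:
- OFFSET
The offset, within the corresponding section, of the field to be patched.
- TYPE
How the value is computed. Common types include:
  - R_X86_64_PC32: a 32-bit PC-relative address, computed as S + A - P
  - R_X86_64_32: a 32-bit absolute address, computed as S + A
Here S is the symbol’s final address, A is the addend, and P is the final address of the field being patched.
- VALUE
The symbol and addend used to calculate the specific address.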
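The two formulas can be checked against the numbers in this article. The sketch below is illustrative, not linker code: reloc_pc32 and reloc_abs32 are hypothetical names, and overflow checking (which real linkers perform) is omitted. The inputs come from the .text relocation entries of the object file that defines main and from the final addresses in the linked executable shown later. P is main’s final address (0x400507) plus the entry’s offset minus main’s offset within that object’s .text (0x1a); for the call to extern_func3 at offset 0x49, P = 0x400507 + 0x49 - 0x1a = 0x400536.

```c
#include <stdint.h>

/* R_X86_64_PC32: the linker stores S + A - P in a signed 32-bit field.
   S = final address of the symbol, A = addend from the relocation entry,
   P = final address of the 4-byte field being patched.
   (No overflow check: real linkers report values that do not fit.) */
int32_t reloc_pc32(uint64_t S, int64_t A, uint64_t P) {
    return (int32_t)((int64_t)S + A - (int64_t)P);
}

/* R_X86_64_32: the linker stores S + A in an unsigned 32-bit field. */
uint32_t reloc_abs32(uint64_t S, int64_t A) {
    return (uint32_t)((int64_t)S + A);
}
```

For example, reloc_pc32(0x400564, -4, 0x400536) gives 0x2a, matching the bytes e8 2a 00 00 00 of the call to extern_func3 in main’s disassembly. The backward call to global_func gives -39, whose 32-bit two’s-complement encoding is 0xffffffd9.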
Linking
Linking into an Executable
The executable file incorporates most of the sections from the relocatable files while introducing the concept of segments (each segment packages together specific sections from the object files). It also includes glue code from the C runtime, such as glibc’s crt1.o and crti.o and GCC’s crtbegin.o and crtend.o, which handles program startup and termination.
The work performed by the linker can be divided into three steps:
- Merge sections of the same type from all input object files (including modules extracted from static libraries; dynamic libraries are not merged but only recorded for load time) to form the corresponding segments. The starting addresses of these segments are determined according to the platform’s ABI.
- Analyze, combine, and deduplicate the symbol tables of all input object files, and set the value of each symbol to the absolute address of its definition.
- Traverse all relocation entries and patch all address references within the data and code sections.
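The first two steps can be modeled in a few lines. The following is a deliberately simplified sketch, not how ld is implemented: toy_symbol, toy_object, toy_resolve and example_objs are hypothetical, only .text is modeled, and alignment padding and duplicate-definition checks are ignored. The section sizes and symbol offsets come from this article’s two object files (the first object’s .text size, 0x77, is main’s offset 0x1a plus its size 93), and the base address 0x4004ed is where extern_func ends up in the final executable shown below.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified object file: one .text section with a
   size and the symbols defined in it (offsets are section-relative). */
struct toy_symbol { const char *name; uint64_t offset; };
struct toy_object { uint64_t text_size; const struct toy_symbol *syms; size_t nsyms; };

/* Step 1: lay out each object's .text back to back starting at base.
   Step 2: turn the requested symbol's section offset into an absolute address.
   Returns 0 if no object defines the symbol (it would stay UNDEF and the
   link would fail). */
uint64_t toy_resolve(const struct toy_object *objs, size_t nobjs,
                     uint64_t base, const char *name) {
    uint64_t section_start = base;
    for (size_t i = 0; i < nobjs; i++) {
        for (size_t j = 0; j < objs[i].nsyms; j++)
            if (strcmp(objs[i].syms[j].name, name) == 0)
                return section_start + objs[i].syms[j].offset;
        section_start += objs[i].text_size;   /* next object's .text follows */
    }
    return 0;
}

/* Offsets and sizes taken from the two object files in this article. */
static const struct toy_symbol obj_main_syms[] = {
    {"extern_func", 0x0}, {"global_func", 0xc}, {"main", 0x1a},
};
static const struct toy_symbol obj_func3_syms[] = { {"extern_func3", 0x0} };
const struct toy_object example_objs[2] = {
    {0x77, obj_main_syms, 3},
    {0x14, obj_func3_syms, 1},
};
```

With base 0x4004ed this yields main = 0x400507 and extern_func3 = 0x400564, the same addresses that appear in the executable’s symbol table.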
Running the following command carries out this process and produces the final executable file. Let’s examine how its various parts change.
gcc compile_link3.c compile_link.c -o compile_link
First, we verify that the various sections have been assigned new addresses.
Key contents of the Program Header:
PHDR off 0x0000000000000040 vaddr 0x0000000000400040 paddr 0x0000000000400040 align 2**3
filesz 0x00000000000001f8 memsz 0x00000000000001f8 flags r-x
LOAD off 0x0000000000000000 vaddr 0x0000000000400000 paddr 0x0000000000400000 align 2**21
filesz 0x00000000000007b4 memsz 0x00000000000007b4 flags r-x
LOAD off 0x0000000000000e10 vaddr 0x0000000000600e10 paddr 0x0000000000600e10 align 2**21
filesz 0x0000000000000228 memsz 0x0000000000000238 flags rw-
Key contents of the Section Table:
Idx Name Size VMA LMA File off Algn
0 .interp 0000001c 0000000000400238 0000000000400238 00000238 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
…
4 .dynsym 00000048 00000000004002b8 00000000004002b8 000002b8 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
5 .dynstr 00000038 0000000000400300 0000000000400300 00000300 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
…
10 .init 0000001a 00000000004003a8 00000000004003a8 000003a8 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
11 .plt 00000030 00000000004003d0 00000000004003d0 000003d0 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
12 .text 000001f2 0000000000400400 0000000000400400 00000400 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
13 .fini 00000009 00000000004005f4 00000000004005f4 000005f4 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
14 .rodata 00000010 0000000000400600 0000000000400600 00000600 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
15 .eh_frame_hdr 0000004c 0000000000400610 0000000000400610 00000610 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
16 .eh_frame 00000154 0000000000400660 0000000000400660 00000660 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
17 .init_array 00000008 0000000000600e10 0000000000600e10 00000e10 2**3
CONTENTS, ALLOC, LOAD, DATA
18 .fini_array 00000008 0000000000600e18 0000000000600e18 00000e18 2**3
CONTENTS, ALLOC, LOAD, DATA
19 .jcr 00000008 0000000000600e20 0000000000600e20 00000e20 2**3
CONTENTS, ALLOC, LOAD, DATA
20 .dynamic 000001d0 0000000000600e28 0000000000600e28 00000e28 2**3
CONTENTS, ALLOC, LOAD, DATA
21 .got 00000008 0000000000600ff8 0000000000600ff8 00000ff8 2**3
CONTENTS, ALLOC, LOAD, DATA
22 .got.plt 00000028 0000000000601000 0000000000601000 00001000 2**3
CONTENTS, ALLOC, LOAD, DATA
23 .data 00000010 0000000000601028 0000000000601028 00001028 2**2
CONTENTS, ALLOC, LOAD, DATA
24 .bss 00000010 0000000000601038 0000000000601038 00001038 2**2
ALLOC
25 .comment 0000002d 0000000000000000 0000000000000000 00001038 2**0
CONTENTS, READONLY
It can be confirmed that .text is allocated at 0x400400, .data at 0x601028, and .bss follows immediately after. The two LOAD segments show how sections are grouped: the first (r-x) spans .interp through .eh_frame, and the second (rw-) spans .init_array through .bss. The second segment’s memsz exceeds its filesz by 0x10, exactly the size of .bss, which occupies memory but no space in the file. Besides these familiar sections, many unfamiliar ones appear (e.g., .init, .got, .got.plt), which we will set aside for now.
Second, we confirm that the symbol table has been merged and symbol values have been rewritten.
Key contents of the Symbol Table:
Symbol table ‘.dynsym’ contains 3 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.2.5 (2)
2: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
Symbol table ‘.symtab’ contains 72 entries:
Num: Value Size Type Bind Vis Ndx Name
…
27: 0000000000000000 0 FILE LOCAL DEFAULT ABS crtstuff.c
28: 0000000000600e20 0 OBJECT LOCAL DEFAULT 20 __JCR_LIST__
29: 0000000000400430 0 FUNC LOCAL DEFAULT 13 deregister_tm_clones // 2nd in .text
30: 0000000000400460 0 FUNC LOCAL DEFAULT 13 register_tm_clones // 3rd in .text
31: 00000000004004a0 0 FUNC LOCAL DEFAULT 13 __do_global_dtors_aux // 4th in .text
32: 0000000000601038 1 OBJECT LOCAL DEFAULT 25 completed.6355
33: 0000000000600e18 0 OBJECT LOCAL DEFAULT 19 __do_global_dtors_aux_fin
34: 00000000004004c0 0 FUNC LOCAL DEFAULT 13 frame_dummy // 5th in .text
35: 0000000000600e10 0 OBJECT LOCAL DEFAULT 18 __frame_dummy_init_array_
36: 0000000000000000 0 FILE LOCAL DEFAULT ABS compile_link.c
37: 0000000000601034 4 OBJECT LOCAL DEFAULT 24 static_e
38: 0000000000000000 0 FILE LOCAL DEFAULT ABS compile_link3.c
39: 0000000000000000 0 FILE LOCAL DEFAULT ABS crtstuff.c
40: 00000000004007b0 0 OBJECT LOCAL DEFAULT 17 __FRAME_END__
41: 0000000000600e20 0 OBJECT LOCAL DEFAULT 20 __JCR_END__
42: 0000000000000000 0 FILE LOCAL DEFAULT ABS
43: 0000000000600e18 0 NOTYPE LOCAL DEFAULT 18 __init_array_end
44: 0000000000600e28 0 OBJECT LOCAL DEFAULT 21 _DYNAMIC
45: 0000000000600e10 0 NOTYPE LOCAL DEFAULT 18 __init_array_start
46: 0000000000400610 0 NOTYPE LOCAL DEFAULT 16 __GNU_EH_FRAME_HDR
47: 0000000000601000 0 OBJECT LOCAL DEFAULT 23 _GLOBAL_OFFSET_TABLE_
48: 00000000004005f0 2 FUNC GLOBAL DEFAULT 13 __libc_csu_fini
49: 000000000060103c 4 OBJECT GLOBAL DEFAULT 25 global_b
50: 0000000000601040 4 OBJECT GLOBAL DEFAULT 25 global_d
51: 0000000000601028 0 NOTYPE WEAK DEFAULT 24 data_start
52: 0000000000601030 4 OBJECT GLOBAL DEFAULT 24 global_a
53: 0000000000601038 0 NOTYPE GLOBAL DEFAULT 24 _edata
54: 00000000004004ed 12 FUNC GLOBAL DEFAULT 13 extern_func // 6th in .text
55: 00000000004005f4 0 FUNC GLOBAL DEFAULT 14 _fini
56: 000000000060102c 4 OBJECT GLOBAL DEFAULT 24 extern_f
57: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@@GLIBC_
58: 0000000000601028 0 NOTYPE GLOBAL DEFAULT 24 __data_start
59: 0000000000400564 20 FUNC GLOBAL DEFAULT 13 extern_func3 // 9th in .text
60: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
61: 0000000000400608 0 OBJECT GLOBAL HIDDEN 15 __dso_handle
62: 0000000000400600 4 OBJECT GLOBAL DEFAULT 15 _IO_stdin_used
63: 0000000000601044 4 OBJECT GLOBAL DEFAULT 25 global_c
64: 0000000000400580 101 FUNC GLOBAL DEFAULT 13 __libc_csu_init
65: 0000000000601048 0 NOTYPE GLOBAL DEFAULT 25 _end
66: 0000000000400400 0 FUNC GLOBAL DEFAULT 13 _start // 1st in .text
67: 0000000000601038 0 NOTYPE GLOBAL DEFAULT 25 __bss_start
68: 0000000000400507 93 FUNC GLOBAL DEFAULT 13 main // 8th in .text
69: 0000000000601038 0 OBJECT GLOBAL HIDDEN 24 __TMC_END__
70: 00000000004003a8 0 FUNC GLOBAL DEFAULT 11 _init
71: 00000000004004f9 14 FUNC GLOBAL DEFAULT 13 global_func // 7th in .text
Ignoring the unfamiliar symbols, we can see that the symbols from the earlier object files’ symbol tables now have absolute addresses. Symbols that were UNDEF in one object file, such as global_b and extern_func3, now resolve to their definitions in the other.
Finally, we confirm that the references within the code and data sections have been modified:
The disassembly of main is as follows:
main:
400507: 55 push %rbp
400508: 48 89 e5 mov %rsp,%rbp
40050b: 48 83 ec 10 sub $0x10,%rsp
40050f: c7 45 fc 01 00 00 00 movl $0x1,-0x4(%rbp)
400516: bf 01 00 00 00 mov $0x1,%edi
// 0xffffffd9 is the two's-complement encoding of -39; 0x400520 - 39 = 0x4004f9 (global_func)
40051b: e8 d9 ff ff ff callq 4004f9
400520: 89 45 f8 mov %eax,-0x8(%rbp)
400523: bf 04 00 00 00 mov $0x4,%edi
400528: e8 c0 ff ff ff callq 4004ed
40052d: 89 45 f4 mov %eax,-0xc(%rbp)
400530: bf 00 00 00 00 mov $0x0,%edi
400535: e8 2a 00 00 00 callq 400564
40053a: 89 05 fc 0a 20 00 mov %eax,0x200afc(%rip) # 60103c
400540: bf 03 00 00 00 mov $0x3,%edi
400545: e8 a3 ff ff ff callq 4004ed
40054a: 89 05 f4 0a 20 00 mov %eax,0x200af4(%rip) # 601044
400550: 8b 05 d6 0a 20 00 mov 0x200ad6(%rip),%eax # 60102c
400556: 89 05 e4 0a 20 00 mov %eax,0x200ae4(%rip) # 601040
// 0x200ad2 = 0x601034-0x400562
40055c: 8b 05 d2 0a 20 00 mov 0x200ad2(%rip),%eax # 601034
400562: c9 leaveq
400563: c3 retq
Observing the Glue Code
By examining the ELF header (e.g., with readelf -h), we find that the entry point of the executable is 0x400400. This is the address of _start, the startup code from the C runtime object crt1.o that the compiler driver links into the executable:
0000000000400400 <_start>:
400400: 31 ed xor %ebp,%ebp ; clear %ebp to mark the outermost stack frame
400402: 49 89 d1 mov %rdx,%r9 ; 6th arg: rtld_fini
400405: 5e pop %rsi ; 2nd arg: argc (argument count)
400406: 48 89 e2 mov %rsp,%rdx ; 3rd arg: argv (argument list)
400409: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp ; align the stack pointer to 16 bytes
40040d: 50 push %rax ; padding, keeps the stack 16-byte aligned after the next push
40040e: 54 push %rsp ; 7th arg: stack_end
40040f: 49 c7 c0 f0 05 40 00 mov $0x4005f0,%r8 ; 5th arg: __libc_csu_fini
400416: 48 c7 c1 80 05 40 00 mov $0x400580,%rcx ; 4th arg: __libc_csu_init
40041d: 48 c7 c7 07 05 40 00 mov $0x400507,%rdi ; 1st arg: main
400424: e8 b7 ff ff ff callq 4003e0 <__libc_start_main@plt>
400429: f4 hlt
40042a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
Before this assembly code executes, the operating system initializes the program’s stack structure, as shown in Figure 4:
_start ultimately calls __libc_start_main, whose source code is in glibc (csu/libc-start.c). It performs various preparatory and cleanup tasks before and after calling the main function, including:
- Register the dynamic linker cleanup function rtld_fini
__cxa_atexit ((void (*) (void *)) rtld_fini, NULL, NULL);
- Register the exit cleanup function __libc_csu_fini
__cxa_atexit ((void (*) (void *)) fini, NULL, NULL);
- Execute the initialization function __libc_csu_init
(*init) (argc, argv, __environ MAIN_AUXVEC_PARAM);
- Execute the main function
result = main (argc, argv, __environ MAIN_AUXVEC_PARAM);
- Execute the exit functions
exit (result);
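User code can hook into both ends of this sequence. The sketch below relies on the GCC/Clang constructor attribute (a compiler extension, not standard C) and the standard atexit(); init_ran, before_main, say_goodbye and register_goodbye are hypothetical names. A constructor’s address is typically placed in .init_array, which the startup code walks before main runs (via __libc_csu_init in the glibc version used in this article), and atexit handlers run from exit() after main returns, in reverse order of registration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Set by the constructor; lets code in main confirm that it already ran. */
int init_ran = 0;

/* Compiler extension: the function's address goes into .init_array and is
   called by the startup code before main. */
__attribute__((constructor))
static void before_main(void) {
    init_ran = 1;
}

/* Runs from exit(), after main has returned. */
static void say_goodbye(void) {
    fputs("goodbye from an atexit handler\n", stderr);
}

/* Registers say_goodbye; returns 0 on success, as atexit() does. */
int register_goodbye(void) {
    return atexit(say_goodbye);
}
```

Inside main, init_ran is already 1, and the goodbye message appears only after main returns.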
Linking into static lib
You can use the ar command to package multiple modules containing common functions (none of which contains the entry point main) into a static library such as libname.a. An executable file is then generated by linking against this static library, which avoids repeatedly compiling the library code.
ar rcs libname.a a.o b.o
When linking static libraries, note the following:
- Only the modules referenced by the source files are copied into the executable. Multiple executables referencing the same static library each possess their own copy of the module’s code.
- Copying is performed on a per-module basis. Even if a source file references only a single function from the static library, the whole module (object file) containing that function is copied.
Linking into dynamic lib
You can also use gcc to package the modules into a dynamic library (shared library). The object files themselves must be compiled with -fPIC:
gcc -c -fPIC a.c b.c
gcc -shared -o liba.so a.o b.o
When generating an executable by linking against a dynamic library, the executable only records the filenames of the shared libraries used (to facilitate loading the actual code during subsequent execution) and generates an indirect jump table (.got/.plt) containing the addresses of the referenced library functions. The complete linking is deferred until load time.
Using dynamic libraries offers additional advantages:
- Saves memory space
Only a single copy of the dynamic library code resides in physical memory; upon loading, each executable maps this same code into its own virtual address space.
- Enables code updates
As long as function prototypes remain unchanged, code can be updated by providing a newer version of the dynamic library, without recompiling or relinking the executables that use it.
Because a dynamic library may be loaded at a different address in each process, it must be generated as PIC (Position-Independent Code): within the code section, references to global symbols go through PC-relative offsets to the corresponding entries in the Global Offset Table (.got). At load time, the dynamic linker fills each entry with the final address of its symbol. This indirection may cost a little performance, but it ensures that the code section of the dynamic library never needs to be modified after being loaded into memory (a modification would trigger Copy-on-Write (CoW) and leave each process with its own private copy).
Loading
The loading stage is responsible for loading the executable file and the required dynamic libraries into memory, completing any missing linkages, and initiating execution.
The dynamic libraries a program needs to link against can be viewed using the ldd command:
ldd compile_link
In practice, every dynamically linked C program depends on at least one dynamic library, because __libc_start_main, which the _start entry code mentioned earlier calls, resides in the dynamic library libc.so.6.
The specific dynamic libraries are loaded through the following steps:
Loading the Dynamic Linker (dynamic loader)
The dynamic linker (also called the dynamic loader) is responsible for loading the other dynamic libraries. Its path is stored in the .interp section of the executable; when the program starts, the kernel reads this path, maps the dynamic linker into memory, and hands control to it:
readelf -p .interp compile_link
String dump of section '.interp':
[ 0] /lib64/ld-linux-x86-64.so.2
The dynamic linker then loads the code of the other required dynamic libraries one after another.
The information the dynamic linker needs for this work (the required libraries, the dynamic symbol and string tables, the relocation tables, and the initialization and termination functions) is recorded in the .dynamic section, whose address is also stored in .got[0]:
readelf -d compile_link
Dynamic section at offset 0xe28 contains 24 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000000c (INIT) 0x4003a8 // _init
0x000000000000000d (FINI) 0x4005f4 // _fini
0x0000000000000019 (INIT_ARRAY) 0x600e10 // __frame_dummy_init_array_entry
0x000000000000001b (INIT_ARRAYSZ) 8 (bytes)
0x000000000000001a (FINI_ARRAY) 0x600e18 // __do_global_dtors_aux_fini_array_entry
0x000000000000001c (FINI_ARRAYSZ) 8 (bytes)
0x000000006ffffef5 (GNU_HASH) 0x400298
0x0000000000000005 (STRTAB) 0x400300 // string used by dynamic loader
0x0000000000000006 (SYMTAB) 0x4002b8 // symbol used by dynamic loader
0x000000000000000a (STRSZ) 56 (bytes)
0x000000000000000b (SYMENT) 24 (bytes)
0x0000000000000015 (DEBUG) 0x0
0x0000000000000003 (PLTGOT) 0x601000
0x0000000000000002 (PLTRELSZ) 48 (bytes)
0x0000000000000014 (PLTREL) RELA
0x0000000000000017 (JMPREL) 0x400378
0x0000000000000007 (RELA) 0x400360
0x0000000000000008 (RELASZ) 24 (bytes)
0x0000000000000009 (RELAENT) 24 (bytes)
0x000000006ffffffe (VERNEED) 0x400340
0x000000006fffffff (VERNEEDNUM) 1
0x000000006ffffff0 (VERSYM) 0x400338
0x0000000000000000 (NULL) 0x0
The dynamic linker corrects (relocates) the absolute addresses in the Global Offset Table (.got) entries of both the executable and the dynamic libraries.
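A running program can find its own .interp contents through the PT_INTERP program header, which the kernel uses to locate the dynamic linker. This is a sketch assuming Linux with glibc or musl: dl_iterate_phdr() is a GNU extension declared in <link.h>, while find_interp and interp_path are hypothetical names. dl_iterate_phdr() reports the main program first, and each segment’s p_vaddr must be added to the load bias dlpi_addr (0 for a non-PIE executable like compile_link).

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>

/* Callback for dl_iterate_phdr(): the first object reported is the main
   program. Look for its PT_INTERP segment, which holds the .interp string. */
static int find_interp(struct dl_phdr_info *info, size_t size, void *data) {
    (void)size;
    const char **out = data;
    for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++) {
        if (info->dlpi_phdr[i].p_type == PT_INTERP) {
            /* Segment address = load bias + p_vaddr. */
            *out = (const char *)(info->dlpi_addr + info->dlpi_phdr[i].p_vaddr);
            break;
        }
    }
    return 1;   /* non-zero: stop after the first object (the executable) */
}

/* Returns the dynamic linker path of the running executable, or NULL if it
   has none (for example, a fully static binary). */
const char *interp_path(void) {
    const char *path = NULL;
    dl_iterate_phdr(find_interp, &path);
    return path;
}
```

For compile_link, interp_path() would be expected to return the same /lib64/ld-linux-x86-64.so.2 that readelf -p .interp prints.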
Runtime Linking
Function Lazy Binding
A clever strategy for lazy binding of function addresses can be achieved through the intricate cooperation of the .got (Global Offset Table) and .plt (Procedure Linkage Table).
During the executable generation phase, the linking process creates corresponding .got and .plt entries with preset values. For example:
Disassembly of section .plt:
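In the code section, a call to __libc_start_main actually calls its PLT entry (__libc_start_main@plt at 0x4003e0), which jumps to the address stored in .got[3]. At load time, the dynamic linker fills .got[1] and .got[2] with their correct values: .got[1] identifies the module, and .got[2] holds the address of the dynamic linker’s resolver routine.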
When the function __libc_start_main is called for the first time, it triggers the binding logic:
- The PLT entry jumps to the address stored in .got[3], which is initially 0x4003e6: the next instruction within the same PLT entry. That instruction pushes a relocation index (here $0x0) onto the stack.
- It then jumps to the first entry of the .plt section, which pushes the value stored in .got[1] and finally jumps to the dynamic linker’s entry, whose address is stored in .got[2].
- The dynamic linker resolves the actual address of __libc_start_main and patches (overwrites) .got[3] with this resolved address.
- Control then transfers to the __libc_start_main function.
Subsequent calls to __libc_start_main jump directly to its actual address, now stored in .got[3], bypassing the binding logic.
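The essence of this mechanism (a call through a table slot that initially points at a resolver, which patches the slot on first use) can be imitated in plain C. This is only an analogy under stated assumptions: real lazy binding is performed by the PLT stubs and ld.so, not by C code, and real_add, got_add, resolve_and_call_add, call_add and resolver_calls are all hypothetical names.

```c
/* Stand-in for a library function whose address is "not yet known". */
static int real_add(int a, int b) { return a + b; }

typedef int (*add_fn)(int, int);

static int resolve_and_call_add(int a, int b);

/* One-entry "GOT": initially points at the resolver, just as .got[3]
   initially points back into the PLT stub. */
static add_fn got_add = resolve_and_call_add;

/* Counts how often the slow path ran (for illustration only). */
int resolver_calls = 0;

/* Plays the role of PLT0 plus the dynamic linker's resolver. */
static int resolve_and_call_add(int a, int b) {
    resolver_calls++;
    got_add = real_add;        /* patch the slot, like overwriting .got[3] */
    return got_add(a, b);      /* then run the real function */
}

/* Plays the role of the PLT entry: every call goes through the slot. */
int call_add(int a, int b) { return got_add(a, b); }
```

The first call_add() goes through the resolver; every later call jumps straight to real_add, so resolver_calls stays at 1.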
Runtime Symbol Resolution
Dynamic linking can also happen entirely at run time. With the POSIX dlfcn API (dlopen, dlsym, dlclose), a program can load a shared library and look up the address of a symbol while it is running, which makes it possible to decide, or change, which implementation of a function gets called during execution.
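A minimal sketch of such a run-time lookup follows, assuming a dynamically linked program on Linux (link with -ldl on glibc versions before 2.34); atoi_fn and lookup_atoi are hypothetical names, and atoi is just a convenient function known to live in libc. Passing NULL to dlopen() returns a handle for the main program, and dlsym() through that handle searches the main program and then the shared libraries loaded at startup.

```c
#include <dlfcn.h>
#include <stdio.h>

typedef int (*atoi_fn)(const char *);

/* Resolves the name "atoi" to an address while the program is running.
   Returns NULL (after printing the reason) if the lookup fails. */
atoi_fn lookup_atoi(void) {
    void *self = dlopen(NULL, RTLD_NOW);
    if (self == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return NULL;
    }
    dlerror();                                   /* clear any earlier error */
    atoi_fn fn = (atoi_fn)dlsym(self, "atoi");   /* void* to function pointer: sanctioned by POSIX */
    if (fn == NULL)
        fprintf(stderr, "dlsym failed: %s\n", dlerror());
    return fn;                                   /* handle left open for simplicity */
}
```

lookup_atoi()("42") returns 42. Calling through a pointer obtained this way is what lets a program switch implementations at run time; the other common route is preloading an interposing library with LD_PRELOAD.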
Reflection
Up to this point, we have comprehensively illustrated the entire lifecycle of a program, from source code to execution, and explained its step-by-step evolution process as thoroughly as possible.
Dynamic linking is a fascinating feature. Writing code that can be dynamically linked at runtime can endow statically compiled programs with some characteristics of interpreted languages. On the other hand, injecting dynamic libraries has also become a viable attack vector, warranting in-depth study.
References
- Randal E. Bryant, David R. O’Hallaron (2015). Computer Systems: A Programmer’s Perspective.
- x86_64-abi-0.99.pdf
- Robert W. Sebesta (2016). Concepts of Programming Languages.