Understanding C Language-3: Compilation & Linking

Intro

Building upon the previous article, which explained how C language manipulates data through statements, this article continues to adhere to the principle of addressing the “why”. It begins with the end goal in mind, starting from the memory image required by the operating system to execute a program. From there, it works backwards to deduce the tasks that the compilation and linking subsystem must perform to bridge the gap between the source code and the final executable file.

Memory Image of a Program

From the operating system’s perspective, executing a program is a seemingly straightforward process: load the required machine instructions and associated data into memory, jump to the specified entry point, and start execution. That is all!

However, this simplicity relies on several underlying prerequisites:

  1. The requirement is for machine instructions, not the original C language code.
  2. The jumps and references between these instructions and data must form a complete and correct structure.
  3. To avoid redundant work, the obtained instructions and data should be persisted in a format that is most convenient for loading into memory.
  4. A single, unique entry point must be designated.

This implies that before C code can be executed, it must be translated into an executable file in a specific format, one that is as similar as possible to the memory image obtained after the file is loaded into memory.

Before discussing the memory image, here is a brief introduction to virtual memory:

Virtual Memory

Virtual memory is a key mechanism designed by the operating system for managing memory resources, sitting between the physical memory and applications. Its existence offers significant advantages:

  1. Isolates the address space of applications
    Each application corresponds to an independent virtual memory space (e.g., a 4GB space for 32-bit addresses). This greatly simplifies the design of a program’s memory layout, as it only needs to consider its own space.
  2. Overcomes the size limitations of physical memory
    Since the memory required by a program at any given time is only a subset of its virtual address space, the operating system ensures that only the hot pages of each process reside in physical memory through paging and page reclamation mechanisms. This allows multiple programs to run normally even when physical memory is much smaller than the total virtual memory allocated (although frequent paging can impact performance).
  3. Saves physical memory overhead
    The Copy-on-Write (CoW) mechanism can map the same physical memory pages into the virtual address spaces of multiple processes. This saves substantial physical memory overhead and forms the basis for dynamic shared library technology.
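
The isolation and Copy-on-Write behaviour described above can be observed from user space. The following is a minimal sketch, assuming a POSIX system with fork(); the function and variable names are illustrative and not part of the original article. A child process writes to a global variable, and the parent then checks that its own copy, at the same virtual address, is unchanged.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared_looking_value = 1;   /* lives in this process's .data */

/* Fork a child that overwrites the global, then report what the parent
 * still sees. Returns the parent's value, or -1 if fork/wait failed. */
int parent_value_after_child_write(void) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* Child: the write lands in a private copy of the page, so the
         * parent's page is left untouched. */
        shared_looking_value = 42;
        _exit(shared_looking_value == 42 ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        return -1;
    }
    /* Parent: same virtual address, but a separate address space. */
    return shared_looking_value;
}
```

Calling parent_value_after_child_write() from main is expected to return 1 even though the child stored 42, because each process has its own address space.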

Memory Image

A memory image refers to the concrete distribution of program code and data within the virtual address space. The following diagram (Figure 1), borrowed from Computer Systems: A Programmer’s Perspective, illustrates its general structure:
Figure.1 Memory Image of C Runtime

The key points:

  • Segmentation
    The operating system manages the virtual address space in segments. Different segments possess different permissions (Read/Write/Execute), which enhances security.
    Data within a segment can either be loaded from the executable file (e.g., the code segment .text, data segments .data/.bss), reserved for program execution-dependent data structures (the stack), or allocated dynamically during program runtime (the heap).
  • Segment Distribution Follows Platform-Specific ABI
    1. The entire address space is divided into a part usable by regular programs and a part reserved for kernel use (this isolation improves security).
    2. The starting address of the code segment is fixed (which may vary across different ABIs), and the data segment is allocated immediately following the code segment.
    3. The stack grows from high addresses towards lower addresses.
    4. The heap grows from a specific address towards higher addresses.
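
To make the layout concrete, the sketch below samples one address from each region. It is only an illustration: the names are made up for this example, and the ordering noted in the comments (code below data, heap above .bss, stack near the top) is what is typically seen on Linux x86-64, not something the C standard guarantees.

```c
#include <stdint.h>
#include <stdlib.h>

int initialized_global = 1;   /* expected in .data */
int uninitialized_global;     /* expected in .bss  */

struct image_sample {
    uintptr_t text;   /* address of a function          */
    uintptr_t data;   /* address of an initialized global */
    uintptr_t bss;    /* address of an uninitialized global */
    uintptr_t heap;   /* address returned by malloc      */
    uintptr_t stack;  /* address of a local variable     */
};

/* Collect one representative address per region.
 * Returns 0 on success, -1 if the heap allocation failed. */
int sample_memory_image(struct image_sample *out) {
    int local = 0;               /* lives in the current stack frame */
    void *block = malloc(16);    /* small request, typically served from the heap */
    if (block == NULL) {
        return -1;
    }
    out->text = (uintptr_t)&sample_memory_image;
    out->data = (uintptr_t)&initialized_global;
    out->bss = (uintptr_t)&uninitialized_global;
    out->heap = (uintptr_t)block;
    out->stack = (uintptr_t)&local;
    free(block);
    /* Typical Linux x86-64 ordering: text < data < bss < heap < stack. */
    return 0;
}
```

Printing the five fields with printf("%#lx") from main and comparing them with Figure 1 is a quick way to check the layout on a given platform.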

Engineering Efficiency

Laziness Drives Innovation.

Now, let’s assume we are the designers of the C programming language. We have already established the syntax of C and defined the required memory image for programs to execute on a specific platform.
Building upon this foundation, we decide to develop a program—let’s tentatively call it the “Terminator”—that can directly translate C source code into the desired memory image.
The executable file generated by the Terminator contains the following sections:

  • A data section
  • A code section (.text)
    Global variables and functions are referenced via offsets, and the file has a fixed entry point.
  • The absolute starting addresses of the data and code sections within the virtual address space.
    When the program is run, the operating system loads these sections into memory at the specified addresses and begins execution directly.

Initially, everything seemed perfect, and everyone was satisfied.

However, as time passed, efficiency issues began to emerge:
On one hand, individual C source files were becoming increasingly large, leading to correspondingly longer processing times for the Terminator.
On the other hand, functionally redundant functions were appearing more frequently across different C codebases and projects.

These problems stemmed from the increasing scale of software development. Consequently, we needed to seek a solution at the fundamental level.

Splitting Strategy

After analysis, it becomes evident that the project can be divided into different modules based on functionality, which can be maintained separately and reused across different projects.
This is a classic application of the divide-and-conquer strategy. The “Terminator” then only needs to process the individual source files that have changed, which solves both of the aforementioned problems in one fell swoop.

However, this splitting strategy introduces new challenges (in what follows, variables and functions are collectively referred to as symbols):

  1. Global symbols that originally belonged to the same file may now have their definitions and references distributed across different files. How can the Terminator write the address of a symbol from another file into the code section of one file?
  2. When combining different files from the same project, how does one coordinate the addresses of code sections belonging to different files?
  3. How can we fully utilize the Copy-on-Write (CoW) mechanism provided by the operating system to reduce the memory footprint of function libraries?

These questions all point to a single solution. The work of the Terminator needs to be split into two distinct phases:

  1. Compilation
    Translates C source files into relocatable files containing a symbol table where symbol addresses are offsets relative to their section.
  2. Linking
    Merges multiple relocatable files, analyzes and determines the final address of each symbol definition, and overwrites all references to that symbol with its defined address.

Interface Strategy

Following the evolution of the separate compilation and linking processes, we observe that the compilation phase essentially only requires declarations (providing type information) for external symbols (those not defined within the current module).
The essence of these declarations is an application of an interface strategy, which separates the potentially changing code implementation from the stable declarations, thereby improving project development efficiency.
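
As a small illustration of this point, the sketch below shows a function compiled against declarations only. The names scale, scale_factor and scale_twice are hypothetical; in a real project the two definitions at the bottom would sit in a different source file and the declarations in a header.

```c
/* Declarations only: enough type information for the compiler to
 * type-check the calls and emit references that are resolved later. */
extern int scale_factor;
int scale(int value);

/* This function can be compiled without seeing any definition above. */
int scale_twice(int value) {
    return scale(scale(value));
}

/* Definitions: kept in the same file here only so the sketch is
 * self-contained. Their bodies can change without touching callers. */
int scale_factor = 3;

int scale(int value) {
    return value * scale_factor;
}
```

With these definitions, scale_twice(2) evaluates to 18, and changing the body of scale would not require any change to scale_twice.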

From Source Code to Executable

Based on the previous explanation, Figure 2 illustrates the stages that C language source code needs to undergo in reality before execution:
Figure.2 The C Compilation and Linking Process

Stages

Next, we will explain each stage in Figure 2 step by step. For clarity, we will use the following source code as an example:

// compile_link.c
#include "compile_link2.h"
#include "compile_link3.h"

int global_a = 1;
int global_b;
int global_c;
int global_d;
static int static_e = 1;

int global_func(int a);

int global_func(int a) {
    return 1;
}

int main() {
    int x = 1;
    int y = global_func(1);
    int z = extern_func(4);

    global_b = extern_func3(0);
    global_c = extern_func(3);
    global_d = extern_f;
    return static_e;
}

// compile_link2.h
int extern_func(int a) {
    return a;
}

int extern_f = 10;

// compile_link3.h
int extern_func3(int a);

// compile_link3.c
extern int global_b;

int extern_func3(int a) {
    return a + global_b;
}

Preprocessing

The preprocessing stage performs text substitution operations on the source file. Its input is C source code, and its output remains C source code.

Directive include

#include <a.h>

Through the #include directive, the entire content of the a.h file is inserted at the current position.
The interface strategy mentioned above encourages placing a module’s declarations and constants into a separate header file (typically with a .h extension) for distribution.

We run the following command:

gcc -E compile_link.c -o compile_link.i

then get:

# 1 "compile_link.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "compile_link.c"
# 1 "compile_link2.h" 1
int extern_func(int a) {
    return a;
}

int extern_f = 10;
# 2 "compile_link.c" 2
# 1 "compile_link3.h" 1
int extern_func3(int a);
# 3 "compile_link.c" 2

int global_a = 1;
int global_b;
int global_c;
int global_d;
static int static_e = 1;

int global_func(int a);

int global_func(int a) {
    return 1;
}

int main() {
    int x = 1;
    int y = global_func(1);
    int z = extern_func(4);

    global_b = extern_func3(0);
    global_c = extern_func(3);
    global_d = extern_f;
    return static_e;
}

The #include directive works exactly as expected. It’s important to note that the content pulled in by #include is not limited to declarations: here the full definitions of extern_func and extern_f were copied in as well.

Directive define

#define SYM a+b // every occurrence of SYM is replaced with a+b
#define MACRO(a, b) a+b // every MACRO(x, y) is replaced with x+y

The #define directive performs macro substitution, which can be understood as plain text replacement.
Besides constants, function-like macros can also be defined, which avoids function call overhead:

#define MACRO(a, b) a+b

int main() {
    int c = MACRO(1 ,2);
}

After preprocessing,

int main() {
    int c = 1 +2;
}

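Note that the substitution is purely textual: even the space after 1 is carried over. Because the expansion ignores operator precedence (MACRO(1, 2) * 3 would expand to 1+2 * 3, which evaluates to 7 rather than 9), macro parameters and the whole macro body are generally enclosed in parentheses: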

#define MACRO(a, b) ((a)+(b))
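
The difference between the two forms can be checked directly. In this sketch the macro names ADD_RAW and ADD_SAFE are made up for the comparison; they correspond to the unparenthesized and parenthesized definitions of MACRO above.

```c
/* Unparenthesized body: the expansion is pasted into the surrounding
 * expression and then parsed with the usual operator precedence. */
#define ADD_RAW(a, b) a+b

/* Parenthesized parameters and body: the expansion behaves like a
 * single operand wherever it is used. */
#define ADD_SAFE(a, b) ((a)+(b))

int raw_times_three(void) {
    return ADD_RAW(1, 2) * 3;    /* expands to 1+2 * 3 */
}

int safe_times_three(void) {
    return ADD_SAFE(1, 2) * 3;   /* expands to ((1)+(2)) * 3 */
}
```

raw_times_three() yields 7 while safe_times_three() yields 9, which is why the parenthesized form is preferred.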

Since macros process code, macro functions can also be used as code templates to generate C source code. Here is a technique for initializing an array:

// data.def
DEFINEDATA(data1_attr1, data1_attr2, data1_attr3)
DEFINEDATA(data2_attr1, data2_attr2, data2_attr3)

// enum.c
// Each expansion keeps one attribute and appends the separating comma.
#define DEFINEDATA(a1, a2, a3) a1,
attr1_type attr1_array[] = {
#include "data.def"
};
#undef DEFINEDATA // must be undefined before it is redefined

#define DEFINEDATA(a1, a2, a3) a2,
attr2_type attr2_array[] = {
#include "data.def"
};
#undef DEFINEDATA
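
The same technique can be tried in a single file by passing the per-row macro as a parameter instead of re-including a .def file. This is a variation on the approach above, not the article's exact code, and the color table and all names in it are invented for the example.

```c
#include <stddef.h>

/* One table, written once. X is the per-row macro supplied by the user. */
#define COLOR_TABLE(X)                \
    X(COLOR_RED,   "red",   0xff0000) \
    X(COLOR_GREEN, "green", 0x00ff00) \
    X(COLOR_BLUE,  "blue",  0x0000ff)

/* Expansion 1: keep the identifier to build an enum. */
#define AS_ENUM(id, name, rgb) id,
enum color { COLOR_TABLE(AS_ENUM) COLOR_COUNT };
#undef AS_ENUM

/* Expansion 2: keep the string to build a name array. */
#define AS_NAME(id, name, rgb) name,
static const char *const color_names[] = { COLOR_TABLE(AS_NAME) };
#undef AS_NAME

/* Expansion 3: keep the number to build a value array. */
#define AS_RGB(id, name, rgb) rgb,
static const unsigned color_rgb[] = { COLOR_TABLE(AS_RGB) };
#undef AS_RGB

const char *color_name(enum color c) {
    return ((unsigned)c < COLOR_COUNT) ? color_names[c] : NULL;
}

unsigned color_value(enum color c) {
    return ((unsigned)c < COLOR_COUNT) ? color_rgb[c] : 0u;
}
```

Adding a row to COLOR_TABLE updates the enum and both arrays together, which is the main benefit of generating them from one source.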

Preprocessing includes other directives, such as conditional compilation (#ifdef, #ifndef, #endif, etc.), which will not be elaborated on here.

Compilation

The compilation stage is responsible for translating C source code into assembly code for the target platform. Different target platforms will yield different results.

// compile_link3.c
extern int global_b;

int extern_func3(int a) {
    return a + global_b;
}

Run the following command:

gcc -S compile_link3.c -o compile_link3.s

The assembly code obtained from compilation (the .s extension stands for source) is as follows:

	.file	"compile_link3.c" ; logical filename
.text ; text section starts
.globl extern_func3 ; define global symbol
.type extern_func3, @function ; type of global symbol is function
extern_func3: ; label
.LFB0: ; local function start label
.cfi_startproc ; beginning of function that should have an entry in .eh_frame
pushq %rbp
.cfi_def_cfa_offset 16 ; CFA = current_location + 16 (previous rip + rbp)
.cfi_offset 6, -16 ; Previous value of register 6(rbp) is saved at CFA -16
movq %rsp, %rbp
.cfi_def_cfa_register 6 ; register 6(rbp) will be used for computing CFA
movl %edi, -4(%rbp)
movl global_b(%rip), %edx ; global_b is a symbol whose value is an offset address
movl -4(%rbp), %eax
addl %edx, %eax
popq %rbp
.cfi_def_cfa 7, 8 ; CFA = register 7(rsp) + 8 after pop rbp
ret
.cfi_endproc ; end of function
.LFE0: ; local function end label
.size extern_func3, .-extern_func3 ; size = current_location - label_extern_func3
.ident "GCC: (GNU) 4.8.5 20150623 (Red Hat 4.8.5-39)" ; assembler tags
.section .note.GNU-stack,"",@progbits ; add section .note.GNU-stack progbits means contains data

All strings ending with a colon (:) are labels. Among them, labels starting with .L are local labels used by the assembler.
All statements starting with a period (.) are directives that guide the subsequent operations of the assembler and linker.

It is particularly important to note that directives starting with .cfi record Call Frame Information (CFI) in DWARF format.
This information will be collected by the assembler into the .eh_frame section (introduced by the C++ ABI for exception handling) or into sections starting with .debug for use by debuggers.

The assembly code obtained from compiling another source file is as follows:

// compile_link.s
.file "compile_link.c"
.text ; text section starts
.globl extern_func
.type extern_func, @function
extern_func:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl %edi, -4(%rbp)
movl -4(%rbp), %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size extern_func, .-extern_func
.globl extern_f
.data ; data section starts
.align 4 ; align by 4 bytes
.type extern_f, @object ; type of extern_f symbol is data
.size extern_f, 4
extern_f:
.long 10 ; value of extern_f
.globl global_a
.align 4
.type global_a, @object
.size global_a, 4
global_a:
.long 1 ; value of global_a
.comm global_b,4,4 ; uninitialized data with 4 bytes size aligned by 4 bytes
.comm global_c,4,4
.comm global_d,4,4
.align 4
.type static_e, @object
.size static_e, 4
static_e:
.long 1
.text ; text section starts
.globl global_func
.type global_func, @function
global_func:
.LFB1:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl %edi, -4(%rbp)
movl $1, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE1:
.size global_func, .-global_func
.globl main
.type main, @function
main:
.LFB2:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp // allocate local variables
movl $1, -4(%rbp) // x = 1
movl $1, %edi // set argument a = 1 using edi
call global_func
movl %eax, -8(%rbp) // y = global_func(1)
movl $4, %edi // set argument a = 4 using edi
call extern_func
movl %eax, -12(%rbp) // z = extern_func(4)
movl $0, %edi // set argument a = 0 using edi
call extern_func3
movl %eax, global_b(%rip) // global_b = extern_func3(0)
movl $3, %edi // set argument a = 3 using edi
call extern_func
movl %eax, global_c(%rip) // global_c = extern_func(3)
movl extern_f(%rip), %eax
movl %eax, global_d(%rip) // global_d = extern_f
movl static_e(%rip), %eax // return static_e
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE2:
.size main, .-main
.ident "GCC: (GNU) 4.8.5 20150623 (Red Hat 4.8.5-39)"
.section .note.GNU-stack,"",@progbits

It can be observed that both global variables (initialized/uninitialized) and functions are symbols. Furthermore, global variables are referenced in the code section via an offset relative to the %rip register.

Assembly

The objective of the assembly stage is to generate a corresponding relocatable object file for each source file. Object files can have various formats, with ELF (Executable and Linkable Format) being the most commonly used on Linux.
The typical structure of an ELF object file is shown in Figure 3:
Figure.3 ELF Structure

The most critical components directly serving the linking process are the Section Header Table, the Symbol Table (.symtab), and the Relocation Entries (.rel.text/.rel.data; on x86-64 these sections are named .rela.text/.rela.data).
The .data and .bss sections contain the corresponding data.
The .text section contains the machine code directly translated from the assembly code generated in the previous stage.

// compile_link3.o
0000000000000000 <extern_func3>:
push %rbp
mov %rsp,%rbp
mov %edi,-0x4(%rbp)
mov 0x0(%rip),%edx # d <extern_func3+0xd>
mov -0x4(%rbp),%eax
add %edx,%eax
pop %rbp
retq

// compile_link.o
000000000000001a <main>:
push %rbp
mov %rsp,%rbp
sub $0x10,%rsp
movl $0x1,-0x4(%rbp)
mov $0x1,%edi
callq 33 <main+0x19>
mov %eax,-0x8(%rbp)
mov $0x4,%edi
callq 40 <main+0x26>
mov %eax,-0xc(%rbp)
mov $0x0,%edi
callq 4d <main+0x33>
mov %eax,0x0(%rip) # 53 <main+0x39>
mov $0x3,%edi
callq 5d <main+0x43>
mov %eax,0x0(%rip) # 63 <main+0x49>
mov 0x0(%rip),%eax # 69 <main+0x4f>
mov %eax,0x0(%rip) # 6f <main+0x55>
mov 0x0(%rip),%eax # 75 <main+0x5b>
leaveq
retq

The .eh_frame section contains the Call Frame Information (CFI) used for stack unwinding during debugging or exception handling.

Section Header Table

The Section Header Table records the size and location (relative offset within the file) of each section.

Run the following commands:

gcc -c compile_link3.c -o compile_link3.o
objdump -h compile_link3.o

They compile the file and print its section headers:

Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000014 0000000000000000 0000000000000000 00000040 2**0
CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
1 .data 00000000 0000000000000000 0000000000000000 00000054 2**0
CONTENTS, ALLOC, LOAD, DATA
2 .bss 00000000 0000000000000000 0000000000000000 00000054 2**0
ALLOC
3 .comment 0000002e 0000000000000000 0000000000000000 00000054 2**0
CONTENTS, READONLY
4 .note.GNU-stack 00000000 0000000000000000 0000000000000000 00000082 2**0
CONTENTS, READONLY
5 .eh_frame 00000038 0000000000000000 0000000000000000 00000088 2**3
CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
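
Tools such as objdump find this table through the ELF header, which stores its file offset and entry count. The following is a minimal sketch of that first step, assuming a Linux system that provides <elf.h> and a 64-bit ELF input; the helper names are invented for the example, and error handling is kept to the bare minimum.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Read the ELF header of `path` into *out.
 * Returns 0 on success, -1 if the file cannot be read or is not a
 * 64-bit ELF file. */
int read_elf64_header(const char *path, Elf64_Ehdr *out) {
    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        return -1;
    }
    size_t got = fread(out, sizeof *out, 1, f);
    fclose(f);
    if (got != 1) {
        return -1;
    }
    if (memcmp(out->e_ident, ELFMAG, SELFMAG) != 0) {
        return -1;                       /* missing "\177ELF" magic */
    }
    if (out->e_ident[EI_CLASS] != ELFCLASS64) {
        return -1;                       /* not a 64-bit object */
    }
    return 0;
}

/* Print where the Section Header Table lives inside the file. */
void print_section_table_location(const Elf64_Ehdr *h) {
    printf("%u section headers of %u bytes each at file offset 0x%llx\n",
           (unsigned)h->e_shnum, (unsigned)h->e_shentsize,
           (unsigned long long)h->e_shoff);
}
```

For example, calling read_elf64_header("compile_link3.o", &h) followed by print_section_table_location(&h) reports the location that objdump -h then walks entry by entry.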

Symbol Table

The Symbol Table records all symbols present in the file (excluding local variable symbols, which do not require relocation). Its key attributes include:

  • Symbol Name
    Includes section names and the file name itself, which are also entries in the symbol table.
  • Type
    Indicates the symbol’s type, such as data, function, file name, or section name.
  • Defining Section
    Points to the section to which the symbol belongs. It also includes three special virtual sections:
    • ABS (symbols that do not require relocation)
    • COMM (uninitialized global variables; shown as COM by readelf)
    • UNDEF (externally referenced symbols; shown as UND by readelf)
  • Value (Address)
    The offset of the symbol relative to the start of its defining section.
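
These attributes map onto the fields of an ELF symbol table entry. The sketch below assumes the Elf64_Sym type and the SHN_*, STB_*, STT_* constants from the Linux <elf.h> header; the helper functions are hypothetical, and the two sample entries are rebuilt by hand from the main and global_b rows of the compile_link.o table shown in this section.

```c
#include <elf.h>

/* Classify the "Defining Section" attribute of a symbol. */
const char *symbol_home(const Elf64_Sym *sym) {
    switch (sym->st_shndx) {
    case SHN_UNDEF:  return "UND";      /* referenced, defined elsewhere */
    case SHN_ABS:    return "ABS";      /* not affected by relocation    */
    case SHN_COMMON: return "COM";      /* uninitialized global          */
    default:         return "section";  /* index of a real section       */
    }
}

/* Combine the "Type" and binding attributes packed into st_info. */
int is_global_function(const Elf64_Sym *sym) {
    return ELF64_ST_BIND(sym->st_info) == STB_GLOBAL &&
           ELF64_ST_TYPE(sym->st_info) == STT_FUNC;
}

/* main: value 0x1a, size 93, FUNC, GLOBAL, section index 1 (.text). */
Elf64_Sym sample_main_symbol(void) {
    Elf64_Sym s = {0};
    s.st_info = ELF64_ST_INFO(STB_GLOBAL, STT_FUNC);
    s.st_shndx = 1;
    s.st_value = 0x1a;
    s.st_size = 93;
    return s;
}

/* global_b: value 4, size 4, OBJECT, GLOBAL, COM. */
Elf64_Sym sample_common_symbol(void) {
    Elf64_Sym s = {0};
    s.st_info = ELF64_ST_INFO(STB_GLOBAL, STT_OBJECT);
    s.st_shndx = SHN_COMMON;
    s.st_value = 4;
    s.st_size = 4;
    return s;
}
```

Passing the two samples to symbol_home gives "section" for main and "COM" for global_b, matching the Ndx column printed by readelf.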

The symbol table for compile_link3.o is as follows:

Symbol table ‘.symtab’ contains 10 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS compile_link3.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 3
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
5: 0000000000000000 0 SECTION LOCAL DEFAULT 6
6: 0000000000000000 0 SECTION LOCAL DEFAULT 7
7: 0000000000000000 0 SECTION LOCAL DEFAULT 5
8: 0000000000000000 20 FUNC GLOBAL DEFAULT 1 extern_func3
9: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND global_b

The symbol table for compile_link.o is as follows:

Symbol table ‘.symtab’ contains 18 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS compile_link.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 3
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
5: 0000000000000008 4 OBJECT LOCAL DEFAULT 3 static_e
6: 0000000000000000 0 SECTION LOCAL DEFAULT 6
7: 0000000000000000 0 SECTION LOCAL DEFAULT 7
8: 0000000000000000 0 SECTION LOCAL DEFAULT 5
9: 0000000000000000 12 FUNC GLOBAL DEFAULT 1 extern_func
10: 0000000000000000 4 OBJECT GLOBAL DEFAULT 3 extern_f
11: 0000000000000004 4 OBJECT GLOBAL DEFAULT 3 global_a
12: 0000000000000004 4 OBJECT GLOBAL DEFAULT COM global_b
13: 0000000000000004 4 OBJECT GLOBAL DEFAULT COM global_c
14: 0000000000000004 4 OBJECT GLOBAL DEFAULT COM global_d
15: 000000000000000c 14 FUNC GLOBAL DEFAULT 1 global_func
16: 000000000000001a 93 FUNC GLOBAL DEFAULT 1 main
17: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND extern_func3

It can be observed that the value of an uninitialized (COM) symbol is its alignment boundary.

Relocation Entries

Since the final absolute addresses of each section are determined during the linking process, the symbol definitions and references within each section need to be modified accordingly.
Relocation entries list all these definitions and references that require adjustment. Note that there may be multiple references to the same symbol within a single section.

The relocation entries for compile_link3.o are as follows:

RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
0000000000000009 R_X86_64_PC32 global_b-0x0000000000000004
RELOCATION RECORDS FOR [.eh_frame]:
OFFSET TYPE VALUE
0000000000000020 R_X86_64_PC32 .text

The relocation entries for compile_link.o are as follows:

RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
000000000000002f R_X86_64_PC32 global_func-0x0000000000000004
000000000000003c R_X86_64_PC32 extern_func-0x0000000000000004
0000000000000049 R_X86_64_PC32 extern_func3-0x0000000000000004
000000000000004f R_X86_64_PC32 global_b-0x0000000000000004
0000000000000059 R_X86_64_PC32 extern_func-0x0000000000000004
000000000000005f R_X86_64_PC32 global_c-0x0000000000000004
0000000000000065 R_X86_64_PC32 extern_f-0x0000000000000004
000000000000006b R_X86_64_PC32 global_d-0x0000000000000004
0000000000000071 R_X86_64_PC32 .data+0x0000000000000004
RELOCATION RECORDS FOR [.eh_frame]:
OFFSET TYPE VALUE
0000000000000020 R_X86_64_PC32 .text
0000000000000040 R_X86_64_PC32 .text+0x000000000000000c
0000000000000060 R_X86_64_PC32 .text+0x000000000000001a

Descriptions of Fields:

  • OFFSET
    The offset within the corresponding section to which this entry applies.
  • TYPE
    • R_X86_64_PC32
      Uses a 32-bit PC-relative address.
    • R_X86_64_32
      Uses a 32-bit absolute address.
  • VALUE
    The expression used to calculate the specific address.
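
For R_X86_64_PC32, the value written into the 4-byte field is S + A - P, where S is the final address of the symbol, A is the addend from the VALUE column, and P is the final address of the field being patched. The sketch below applies this to two records of the object file that defines main (offset 0x2f for global_func-4 and offset 0x71 for .data+4). The base addresses are assumptions derived from the linked executable examined in the Linking section: main ends up at 0x400507 and sits at offset 0x1a, so that file's .text starts at 0x4004ed, and its .data starts at 0x60102c, the address of extern_f.

```c
#include <stdint.h>

/* R_X86_64_PC32: field = S + A - P, truncated to 32 bits. */
uint32_t pc32_field(uint64_t symbol_addr, int64_t addend, uint64_t place_addr) {
    return (uint32_t)(symbol_addr + (uint64_t)addend - place_addr);
}

/* Record at .text offset 0x2f: global_func - 4 (the first call in main). */
uint32_t call_global_func_field(void) {
    const uint64_t text_base = 0x4004ed;     /* assumed .text placement  */
    const uint64_t global_func = 0x4004f9;   /* final address of symbol  */
    return pc32_field(global_func, -4, text_base + 0x2f);
}

/* Record at .text offset 0x71: .data + 4 (the load of static_e). */
uint32_t load_static_e_field(void) {
    const uint64_t text_base = 0x4004ed;     /* assumed .text placement  */
    const uint64_t data_base = 0x60102c;     /* assumed .data placement  */
    return pc32_field(data_base, 4, text_base + 0x71);
}
```

The two results, 0xffffffd9 and 0x200ad2, are the displacements that appear in the bytes of the final disassembly of main (e8 d9 ff ff ff and 8b 05 d2 0a 20 00).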

Linking


Linking into an Executable

The executable file incorporates most of the sections from the relocatable format while introducing the concept of segments (each segment packages together specific sections from the object files). It also includes the glibc glue code provided by the operating system, which handles program startup and termination.

The work performed by the linker can be divided into three steps:

  1. Merge sections of the same type from all input object files (including referenced static/dynamic libraries) to form the corresponding segments. The starting addresses of these segments are determined according to the platform’s ABI.
  2. Analyze, combine, and deduplicate the symbol tables from all input object files. Modify the value of each symbol to its defined absolute address.
  3. Traverse all relocation entries and modify all address references within the data and code sections.
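
Steps 1 and 2 can be sketched as a toy symbol resolver. This is a simplified illustration and not how a real linker is organized: the data structures and names are hypothetical, only .text symbols are modelled, and the two base addresses are taken from where the functions of each object file end up in the final executable examined below.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct toy_symbol {
    const char *name;
    int module;        /* index into module_text_base, -1 if undefined */
    uint64_t offset;   /* offset inside that module's .text            */
};

/* Result of step 1: where each module's .text was placed. */
static const uint64_t module_text_base[] = {
    0x4004ed,   /* module 0: compile_link.o  */
    0x400564,   /* module 1: compile_link3.o */
};

/* Symbol tables of both modules, concatenated. */
static const struct toy_symbol all_symbols[] = {
    {"extern_func",  0, 0x00},
    {"global_func",  0, 0x0c},
    {"main",         0, 0x1a},
    {"extern_func3", -1, 0x00},   /* UND reference inside module 0 */
    {"extern_func3", 1, 0x00},    /* definition inside module 1    */
};

/* Step 2: find the definition and turn its section-relative offset
 * into an absolute address. Returns 0 if no definition exists. */
uint64_t resolve_symbol(const char *name) {
    size_t count = sizeof all_symbols / sizeof all_symbols[0];
    for (size_t i = 0; i < count; i++) {
        const struct toy_symbol *s = &all_symbols[i];
        if (s->module >= 0 && strcmp(s->name, name) == 0) {
            return module_text_base[s->module] + s->offset;
        }
    }
    return 0;
}
```

resolve_symbol("main") returns 0x400507 and resolve_symbol("extern_func3") returns 0x400564, the same values that the merged symbol table lists. Step 3 then writes such addresses into the fields named by the relocation entries.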

Running the following command yields the final executable file. Let’s examine the changes in its various parts.

gcc compile_link3.c compile_link.c -o compile_link
  • First, we verify that the various sections have been assigned new addresses.
    Key contents of Program Header

    PHDR off 0x0000000000000040 vaddr 0x0000000000400040 paddr 0x0000000000400040 align 2**3
    filesz 0x00000000000001f8 memsz 0x00000000000001f8 flags r-x
    LOAD off 0x0000000000000000 vaddr 0x0000000000400000 paddr 0x0000000000400000 align 2**21
    filesz 0x00000000000007b4 memsz 0x00000000000007b4 flags r-x
    LOAD off 0x0000000000000e10 vaddr 0x0000000000600e10 paddr 0x0000000000600e10 align 2**21
    filesz 0x0000000000000228 memsz 0x0000000000000238 flags rw-

    Key contents of Section Table

    Idx Name Size VMA LMA File off Algn
    0 .interp 0000001c 0000000000400238 0000000000400238 00000238 2**0
    CONTENTS, ALLOC, LOAD, READONLY, DATA

    4 .dynsym 00000048 00000000004002b8 00000000004002b8 000002b8 2**3
    CONTENTS, ALLOC, LOAD, READONLY, DATA
    5 .dynstr 00000038 0000000000400300 0000000000400300 00000300 2**0
    CONTENTS, ALLOC, LOAD, READONLY, DATA

    10 .init 0000001a 00000000004003a8 00000000004003a8 000003a8 2**2
    CONTENTS, ALLOC, LOAD, READONLY, CODE
    11 .plt 00000030 00000000004003d0 00000000004003d0 000003d0 2**4
    CONTENTS, ALLOC, LOAD, READONLY, CODE
    12 .text 000001f2 0000000000400400 0000000000400400 00000400 2**4
    CONTENTS, ALLOC, LOAD, READONLY, CODE
    13 .fini 00000009 00000000004005f4 00000000004005f4 000005f4 2**2
    CONTENTS, ALLOC, LOAD, READONLY, CODE
    14 .rodata 00000010 0000000000400600 0000000000400600 00000600 2**3
    CONTENTS, ALLOC, LOAD, READONLY, DATA
    15 .eh_frame_hdr 0000004c 0000000000400610 0000000000400610 00000610 2**2
    CONTENTS, ALLOC, LOAD, READONLY, DATA
    16 .eh_frame 00000154 0000000000400660 0000000000400660 00000660 2**3
    CONTENTS, ALLOC, LOAD, READONLY, DATA
    17 .init_array 00000008 0000000000600e10 0000000000600e10 00000e10 2**3
    CONTENTS, ALLOC, LOAD, DATA
    18 .fini_array 00000008 0000000000600e18 0000000000600e18 00000e18 2**3
    CONTENTS, ALLOC, LOAD, DATA
    19 .jcr 00000008 0000000000600e20 0000000000600e20 00000e20 2**3
    CONTENTS, ALLOC, LOAD, DATA
    20 .dynamic 000001d0 0000000000600e28 0000000000600e28 00000e28 2**3
    CONTENTS, ALLOC, LOAD, DATA
    21 .got 00000008 0000000000600ff8 0000000000600ff8 00000ff8 2**3
    CONTENTS, ALLOC, LOAD, DATA
    22 .got.plt 00000028 0000000000601000 0000000000601000 00001000 2**3
    CONTENTS, ALLOC, LOAD, DATA
    23 .data 00000010 0000000000601028 0000000000601028 00001028 2**2
    CONTENTS, ALLOC, LOAD, DATA
    24 .bss 00000010 0000000000601038 0000000000601038 00001038 2**2
    ALLOC
    25 .comment 0000002d 0000000000000000 0000000000000000 00001038 2**0
    CONTENTS, READONLY

    It can be confirmed that .text is allocated at 0x400400, .data at 0x601028, and .bss follows immediately after. Besides these familiar sections, many unfamiliar ones appear (e.g., .init, .got, .got.plt, etc.), which we will set aside for now.

  • Second, we confirm that the symbol table has been merged and symbol values have been rewritten.
    Key contents of the Symbol Table:

    Symbol table ‘.dynsym’ contains 3 entries:
    Num: Value Size Type Bind Vis Ndx Name
    0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
    1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.2.5 (2)
    2: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__

    Symbol table ‘.symtab’ contains 72 entries:
    Num: Value Size Type Bind Vis Ndx Name

    27: 0000000000000000 0 FILE LOCAL DEFAULT ABS crtstuff.c
    28: 0000000000600e20 0 OBJECT LOCAL DEFAULT 20 __JCR_LIST__
    29: 0000000000400430 0 FUNC LOCAL DEFAULT 13 deregister_tm_clones // 2nd in .text
    30: 0000000000400460 0 FUNC LOCAL DEFAULT 13 register_tm_clones // 3rd in .text
    31: 00000000004004a0 0 FUNC LOCAL DEFAULT 13 __do_global_dtors_aux // 4th in .text
    32: 0000000000601038 1 OBJECT LOCAL DEFAULT 25 completed.6355
    33: 0000000000600e18 0 OBJECT LOCAL DEFAULT 19 __do_global_dtors_aux_fin
    34: 00000000004004c0 0 FUNC LOCAL DEFAULT 13 frame_dummy // 5th in .text
    35: 0000000000600e10 0 OBJECT LOCAL DEFAULT 18 __frame_dummy_init_array_
    36: 0000000000000000 0 FILE LOCAL DEFAULT ABS compile_link.c
    37: 0000000000601034 4 OBJECT LOCAL DEFAULT 24 static_e
    38: 0000000000000000 0 FILE LOCAL DEFAULT ABS compile_link3.c
    39: 0000000000000000 0 FILE LOCAL DEFAULT ABS crtstuff.c
    40: 00000000004007b0 0 OBJECT LOCAL DEFAULT 17 __FRAME_END__
    41: 0000000000600e20 0 OBJECT LOCAL DEFAULT 20 __JCR_END__
    42: 0000000000000000 0 FILE LOCAL DEFAULT ABS
    43: 0000000000600e18 0 NOTYPE LOCAL DEFAULT 18 __init_array_end
    44: 0000000000600e28 0 OBJECT LOCAL DEFAULT 21 _DYNAMIC
    45: 0000000000600e10 0 NOTYPE LOCAL DEFAULT 18 __init_array_start
    46: 0000000000400610 0 NOTYPE LOCAL DEFAULT 16 __GNU_EH_FRAME_HDR
    47: 0000000000601000 0 OBJECT LOCAL DEFAULT 23 _GLOBAL_OFFSET_TABLE_
    48: 00000000004005f0 2 FUNC GLOBAL DEFAULT 13 __libc_csu_fini
    49: 000000000060103c 4 OBJECT GLOBAL DEFAULT 25 global_b
    50: 0000000000601040 4 OBJECT GLOBAL DEFAULT 25 global_d
    51: 0000000000601028 0 NOTYPE WEAK DEFAULT 24 data_start
    52: 0000000000601030 4 OBJECT GLOBAL DEFAULT 24 global_a
    53: 0000000000601038 0 NOTYPE GLOBAL DEFAULT 24 _edata
    54: 00000000004004ed 12 FUNC GLOBAL DEFAULT 13 extern_func // 6th in .text
    55: 00000000004005f4 0 FUNC GLOBAL DEFAULT 14 _fini
    56: 000000000060102c 4 OBJECT GLOBAL DEFAULT 24 extern_f
    57: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@@GLIBC_
    58: 0000000000601028 0 NOTYPE GLOBAL DEFAULT 24 __data_start
    59: 0000000000400564 20 FUNC GLOBAL DEFAULT 13 extern_func3 // 9th in .text
    60: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
    61: 0000000000400608 0 OBJECT GLOBAL HIDDEN 15 __dso_handle
    62: 0000000000400600 4 OBJECT GLOBAL DEFAULT 15 _IO_stdin_used
    63: 0000000000601044 4 OBJECT GLOBAL DEFAULT 25 global_c
    64: 0000000000400580 101 FUNC GLOBAL DEFAULT 13 __libc_csu_init
    65: 0000000000601048 0 NOTYPE GLOBAL DEFAULT 25 _end
    66: 0000000000400400 0 FUNC GLOBAL DEFAULT 13 _start // 1st in .text
    67: 0000000000601038 0 NOTYPE GLOBAL DEFAULT 25 __bss_start
    68: 0000000000400507 93 FUNC GLOBAL DEFAULT 13 main // 8th in .text
    69: 0000000000601038 0 OBJECT GLOBAL HIDDEN 24 __TMC_END__
    70: 00000000004003a8 0 FUNC GLOBAL DEFAULT 11 _init
    71: 00000000004004f9 14 FUNC GLOBAL DEFAULT 13 global_func // 7th in .text

    Ignoring the unfamiliar symbols, we can see that the symbols from the object files’ symbol tables now have absolute addresses, and that none of the symbols we defined still carries the UND section index.

  • Finally, we confirm that the references within the code and data sections have been modified:
    The disassembled result of the main code section is as follows:

    main:
    400507: 55 push %rbp
    400508: 48 89 e5 mov %rsp,%rbp
    40050b: 48 83 ec 10 sub $0x10,%rsp
    40050f: c7 45 fc 01 00 00 00 movl $0x1,-0x4(%rbp)
    400516: bf 01 00 00 00 mov $0x1,%edi
    // 0xffffffd9 is the two’s complement encoding of -39; 0x400520 - 39 = 0x4004f9
    40051b: e8 d9 ff ff ff callq 4004f9
    400520: 89 45 f8 mov %eax,-0x8(%rbp)
    400523: bf 04 00 00 00 mov $0x4,%edi
    400528: e8 c0 ff ff ff callq 4004ed
    40052d: 89 45 f4 mov %eax,-0xc(%rbp)
    400530: bf 00 00 00 00 mov $0x0,%edi
    400535: e8 2a 00 00 00 callq 400564
    40053a: 89 05 fc 0a 20 00 mov %eax,0x200afc(%rip) # 60103c
    400540: bf 03 00 00 00 mov $0x3,%edi
    400545: e8 a3 ff ff ff callq 4004ed
    40054a: 89 05 f4 0a 20 00 mov %eax,0x200af4(%rip) # 601044
    400550: 8b 05 d6 0a 20 00 mov 0x200ad6(%rip),%eax # 60102c
    400556: 89 05 e4 0a 20 00 mov %eax,0x200ae4(%rip) # 601040
    // 0x200ad2 = 0x601034-0x400562
    40055c: 8b 05 d2 0a 20 00 mov 0x200ad2(%rip),%eax # 601034
    400562: c9 leaveq
    400563: c3 retq

  • Observing the Glue Code
    By examining the ELF header, we find the entry point of the executable is 0x400400. This is the address of the _start code implanted by the linker.

    0000000000400400 <_start>:
    400400: 31 ed xor %ebp,%ebp ; clear %ebp (marks the outermost frame)
    400402: 49 89 d1 mov %rdx,%r9 ; 6th arg: rtld_fini
    400405: 5e pop %rsi ; 2nd arg: argc (argument count)
    400406: 48 89 e2 mov %rsp,%rdx ; 3rd arg: argv (argument list)
    400409: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp ; align stack pointer by 16
    40040d: 50 push %rax ; save %rax
    40040e: 54 push %rsp ; 7th arg: stack_end
    40040f: 49 c7 c0 f0 05 40 00 mov $0x4005f0,%r8 ; 5th arg: __libc_csu_fini
    400416: 48 c7 c1 80 05 40 00 mov $0x400580,%rcx ; 4th arg: __libc_csu_init
    40041d: 48 c7 c7 07 05 40 00 mov $0x400507,%rdi ; 1st arg: main
    400424: e8 b7 ff ff ff callq 4003e0 <__libc_start_main@plt>
    400429: f4 hlt
    40042a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)

    Before this assembly code executes, the operating system initializes the program’s stack structure, as shown in Figure 4:
    Figure 4. Stack Initialization

    _start ultimately calls __libc_start_main, whose source code is in glibc (csu/libc-start.c). It performs various preparatory and cleanup tasks before and after calling the main function, including:

    1. Register the dynamic linker cleanup function rtld_fini
      __cxa_atexit ((void (*) (void *)) rtld_fini, NULL, NULL);
    2. Register the exit cleanup function __libc_csu_fini
      __cxa_atexit ((void (*) (void *)) fini, NULL, NULL);
    3. Execute the initialization function __libc_csu_init
      (*init) (argc, argv, __environ MAIN_AUXVEC_PARAM);
    4. Execute the main function
      result = main (argc, argv, __environ MAIN_AUXVEC_PARAM);
    5. Execute the exit functions
      exit (result);

Linking into static lib

You can use the ar command to package multiple modules containing common functions (none of which contain the entry point main) into a static library libname.a. An executable file is then generated by linking against this static library, which avoids recompiling the library code for every build.

ar rcs libname.a a.o b.o

When linking static libraries, note the following:

  • Only the modules referenced by the source files are copied into the executable. Multiple executables referencing the same static library each possess their own copy of the module’s code.
  • Copying is performed on a per-module basis. Even if a source file references only a single function from the static library, the whole module containing that function is copied.

Linking into dynamic lib

You can also use gcc to package the modules into a dynamic library (shared library).

gcc -shared -fPIC -o liba.so a.o b.o

When generating an executable by linking against a dynamic library, the executable only records the filenames of the shared libraries used (to facilitate loading the actual code during subsequent execution) and generates an indirect jump table (.got/.plt) containing the addresses of the referenced library functions. The complete linking is deferred until load time.

Using dynamic libraries offers additional advantages:

  • Saves memory space
    Only a single copy of the dynamic library code resides in memory. Upon loading, each executable maps this same code into its own virtual address space.
  • Enables code updates
    As long as function prototypes remain unchanged, code can be updated by providing a newer version of the dynamic library without needing to recompile and relink the executables that use it.

Because a dynamic library may be loaded at a different address in each process, it must be generated as PIC (Position-Independent Code): within the code section, a reference to a global symbol is encoded as an offset to the corresponding entry in the Global Offset Table (.got). At load time, the dynamic linker fills the final address of the symbol into that entry. This indirect referencing of global symbols may incur a slight performance cost, but it ensures that the code section of the dynamic library does not need to be modified after it is loaded into memory. Modifying it would trigger Copy-on-Write (CoW) and bring back one private copy per process.

Loading

The loading stage is responsible for loading the executable file and the required dynamic libraries into memory, completing any missing linkages, and initiating execution.

The dynamic libraries a program needs to link against can be viewed using the ldd command:

ldd compile_link
linux-vdso.so.1 => (0x00007ffbffffe000) // this lib is virtual
libc.so.6 => /lib64/libc.so.6 (0x00007efbf7a0f000)
/lib64/ld-linux-x86-64.so.2 (0x00007efbf7ddd000) // dynamic loader

In practice, a C program built with the default options links against at least one dynamic library, because the __libc_start_main function called from the _start entry point mentioned earlier resides within the dynamic library libc.so.6.

The specific dynamic libraries are loaded through the following steps:

  1. Load the dynamic linker (dynamic loader)
    The dynamic linker is responsible for loading the other dynamic libraries. Its location is given by the contents of the .interp section within the executable:

    readelf -p .interp compile_link

    String dump of section '.interp':
    [ 0] /lib64/ld-linux-x86-64.so.2
  2. The dynamic linker sequentially loads the code of other required dynamic libraries
    During this process, the dynamic linker reads what it needs (such as the addresses of the symbol table, string table, and relocation tables) from the .dynamic section, whose address is stored in .got[0]:

    readelf -d compile_link

    Dynamic section at offset 0xe28 contains 24 entries:
    Tag Type Name/Value
    0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
    0x000000000000000c (INIT) 0x4003a8 // _init
    0x000000000000000d (FINI) 0x4005f4 // _fini
    0x0000000000000019 (INIT_ARRAY) 0x600e10 // __frame_dummy_init_array_entry
    0x000000000000001b (INIT_ARRAYSZ) 8 (bytes)
    0x000000000000001a (FINI_ARRAY) 0x600e18 // __do_global_dtors_aux_fini_array_entry
    0x000000000000001c (FINI_ARRAYSZ) 8 (bytes)
    0x000000006ffffef5 (GNU_HASH) 0x400298
    0x0000000000000005 (STRTAB) 0x400300 // string used by dynamic loader
    0x0000000000000006 (SYMTAB) 0x4002b8 // symbol used by dynamic loader
    0x000000000000000a (STRSZ) 56 (bytes)
    0x000000000000000b (SYMENT) 24 (bytes)
    0x0000000000000015 (DEBUG) 0x0
    0x0000000000000003 (PLTGOT) 0x601000
    0x0000000000000002 (PLTRELSZ) 48 (bytes)
    0x0000000000000014 (PLTREL) RELA
    0x0000000000000017 (JMPREL) 0x400378
    0x0000000000000007 (RELA) 0x400360
    0x0000000000000008 (RELASZ) 24 (bytes)
    0x0000000000000009 (RELAENT) 24 (bytes)
    0x000000006ffffffe (VERNEED) 0x400340
    0x000000006fffffff (VERNEEDNUM) 1
    0x000000006ffffff0 (VERSYM) 0x400338
    0x0000000000000000 (NULL) 0x0
  3. The dynamic linker corrects (relocates) the absolute addresses of entries in the Global Offset Table (.got) of both the executable and the dynamic libraries.

Runtime Linking

Function Lazy Binding

Lazy binding of function addresses is achieved through the cooperation of the .got (Global Offset Table) and the .plt (Procedure Linkage Table).
When the executable is generated, the linker creates the corresponding .got and .plt entries with preset values. For example:

Disassembly of section .plt:

00000000004003d0 <.plt>:
4003d0: ff 35 32 0c 20 00 pushq 0x200c32(%rip) # 601008 <_GLOBAL_OFFSET_TABLE_+0x8>
4003d6: ff 25 34 0c 20 00 jmpq *0x200c34(%rip) # 601010 <_GLOBAL_OFFSET_TABLE_+0x10>
4003dc: 0f 1f 40 00 nopl 0x0(%rax)

00000000004003e0 <__libc_start_main@plt>:
4003e0: ff 25 32 0c 20 00 jmpq *0x200c32(%rip) # 601018 <__libc_start_main@GLIBC_2.2.5>
4003e6: 68 00 00 00 00 pushq $0x0
4003eb: e9 e0 ff ff ff jmpq 4003d0 <.plt>

00000000004003f0 <__gmon_start__@plt>:
4003f0: ff 25 2a 0c 20 00 jmpq *0x200c2a(%rip) # 601020 <__gmon_start__>
4003f6: 68 01 00 00 00 pushq $0x1
4003fb: e9 d0 ff ff ff jmpq 4003d0 <.plt>

Contents of section .got.plt:
601000 280e6000 00000000 ; got[0]: address of dynamic section
601008 00000000 00000000 ; got[1]: identifier for dynamic loader
601010 00000000 00000000 ; got[2]: entry for dynamic loader
601018 e6034000 00000000 ; got[3]: entry for __libc_start_main
601020 f6034000 00000000 ; got[4]: entry for __gmon_start__

A call to __libc_start_main in the code section goes to its .plt entry, which jumps to the address stored in .got[3]. At program load time, the dynamic linker rewrites .got[1] and .got[2] with their correct values (see the comments in the listing above).

When the function __libc_start_main is called for the first time, it triggers the binding logic:

  1. The .plt entry jumps to the address (0x4003e6) saved in .got[3]. The instruction at this address pushes a relocation index (here $0x0) onto the stack.
  2. It then jumps to the start of the .plt section (0x4003d0), which pushes the value stored in .got[1] and finally jumps to the dynamic linker’s entry (whose address is stored in .got[2]).
  3. The dynamic linker resolves the actual address and overwrites .got[3] with it.
  4. The __libc_start_main function is executed.

Subsequent calls to __libc_start_main jump directly to its actual address, now stored in .got[3], bypassing the binding logic.

Runtime Symbol Resolution

Dynamic linking can also be driven explicitly at run time. Using the API declared in dlfcn.h (dlopen, dlsym, dlclose), a program can load a library and resolve the address of a symbol while it is executing, effectively changing which code a function call runs.

Reflection

Up to this point, we have comprehensively illustrated the entire lifecycle of a program, from source code to execution, and explained its step-by-step evolution process as thoroughly as possible.

Dynamic linking is a fascinating feature. Writing code that can be dynamically linked at runtime can endow statically compiled programs with some characteristics of interpreted languages. On the other hand, injecting dynamic libraries has also become a viable attack vector, warranting in-depth study.

References

  1. Randal E. Bryant, David R. O’Hallaron, (2015). Computer Systems: A Programmer’s Perspective
  2. x86_64-abi-0.99.pdf
  3. Robert W. Sebesta, (2016). Concepts of Programming Languages

Series

  1. Understanding C Language-1: Memory Object
  2. Understanding C Language-2: Instruction Execution