Toucan Linux Project - 17
A Roll Your Own Distribution
All About Optimization
Goal 3 – Optimization Settings
There is one more optimization we can use that is relatively new to GCC. Though it has supported it for some time, only with version 8 was it useful, and in version 9, far better. It is called link time optimization, or LTO, and it occurs when the program is linked not when it is compiled. This is different from optimizing code since the optimizer can only look at the code of a single file. Before we get to far into LTO, let's first examine what the compiler does.How GCC Works
GCC has several passes it makes using various different programs including the pre-compiler which does work with macros, include files, and pragma. It builds a single file of source code that includes everything necessary to compile that single program. The next pass is compilation (which is a actually few passes) that converts the source code into an assembly file for the native processor (or chosen processor for a cross compile). It is important to note that GCC and other compilers don't optimize assembly directly but use several abstraction layers to do so, though for purposes of discussion we can think of it as optimizing the assembly. The assembly source code is then passed through an assembler which generates the executable object for the file. The optimization can occur by rewriting the assembly source or by examining the object file. Either way a new object file is created that contains the executable code, once again per source file. Finally the objects are linked together and all external references to libraries are found and added, startup code and compiler run time is added, and a file is written in the operating systems binary file structure (ELF for Linux) that if finally an executable file.Let's take an example program we'll write ourselves for demonstration purposes. It's called Vesuvius. It is contained in two files we'll create in the home directory. For this step use your regular (non-root account.)
Create a directory in your home area to work and create a file called !main.c in it.
cd
mkdir -p code/vesuvius
cd vesuvius
cat > main.c << "EOF"
// mainc Michael R Stute
// for The Toucan Linux Project
#include <stdlib.h>
#include <ncurses.h>
#include "particle.h"
#define NUM_COLORS 12
#define COLOR_VOLCANO 1
int scrX,scrY;
int colors[NUM_COLORS];
void error(char *msg)
{
endwin();
printf("%s",msg);
exit(1);
}
// Create colors
// Volcano color is grey
void
create_colors(int can_chg)
{
if(can_chg) {
init_color(0,00,00,00); // Background
init_color(1,300,300,350);
init_color(2,950,00,00); // White
init_color(3,700,00,00); // Dull white
init_color(4,950,0,00); // Blue White
init_color(5,850,0,00); // blue
init_color(6,750,0,00); // Yellow white
init_color(7,650,0,00); // Yellow
init_color(8,500,0,00); // Yellow red
init_color(9,450,00,00); // Bright red
init_color(10,350,00,0); // red
init_color(11,200,00,0); // dull red
init_pair(COLOR_VOLCANO,1,COLOR_BLACK);
init_pair(2,2,COLOR_BLACK);
init_pair(3,3,COLOR_BLACK);
init_pair(4,4,COLOR_BLACK);
init_pair(5,5,COLOR_BLACK);
init_pair(6,6,COLOR_BLACK);
init_pair(7,7,COLOR_BLACK);
init_pair(8,8,COLOR_BLACK);
init_pair(9,9,COLOR_BLACK);
init_pair(10,10,COLOR_BLACK);
init_pair(11,11,COLOR_BLACK);
}
else {
// Set volcano color
init_pair(COLOR_VOLCANO,COLOR_YELLOW,COLOR_BLACK);
// Create temperature colors
init_pair(1,COLOR_WHITE,COLOR_BLACK);
init_pair(2,COLOR_WHITE,COLOR_BLACK);
init_pair(3,COLOR_WHITE,COLOR_BLACK);
init_pair(4,COLOR_BLUE,COLOR_BLACK);
init_pair(5,COLOR_BLUE,COLOR_BLACK);
init_pair(6,COLOR_YELLOW,COLOR_BLACK);
init_pair(7,COLOR_YELLOW,COLOR_BLACK);
init_pair(8,COLOR_RED,COLOR_BLACK);
init_pair(9,COLOR_RED,COLOR_BLACK);
init_pair(10,COLOR_RED,COLOR_BLACK);
}
colors[0]=COLOR_PAIR(0);
colors[1]=COLOR_PAIR(1);
colors[2]=COLOR_PAIR(2);
colors[3]=COLOR_PAIR(3);
colors[4]=COLOR_PAIR(4);
colors[5]=COLOR_PAIR(5);
colors[6]=COLOR_PAIR(6);
colors[7]=COLOR_PAIR(7);
colors[8]=COLOR_PAIR(8);
colors[9]=COLOR_PAIR(9);
colors[10]=COLOR_PAIR(10);
colors[11]=COLOR_PAIR(11);
}
// ***
// *****
// *******
// **********
// *************
// ****************
void
draw_volcano(int lastline)
{
int x,y;
x=scrX / 2;
y=scrY-6;
attr_set(A_NORMAL,COLOR_VOLCANO,NULL);
mvprintw(y++,x-1,"***");
mvprintw(y++,x-2,"*****");
mvprintw(y++,x-3,"*******");
mvprintw(y++,x-4,"*********");
mvprintw(y++,x-5,"***********");
if(lastline)
mvprintw(y,x-7,"**************");
}
void
erupt(int rocks)
{
int counter=0,next_p=0,next_time,todo;
PARTICLE *particles[100]={NULL};
next_time=counter+1;
for(;;counter++) {
todo=1+ lrand48() % rocks;
for(int g=0;g<todo;g++) {
if(counter==next_time) {
// Start another rock unless there is no space
if(next_p>=0)
particles[next_p]=create_particle(scrX/2,scrY-7,1);
// Adjust next_p and next_time
next_time+=1;
int i;
for(i=0;i<100;i++) {
if(!particles[i]) {
next_p=i;
break;
}
}
if(i==100) // This means array is full
next_p=-1;
}
}
for(int i=0;i<100;i++) {
if(particles[i])
if(move_particle(particles[i])==TRUE) {
destroy_particle(particles[i]);
particles[i]=NULL;
}
else
draw_particle(particles[i]);
}
draw_volcano(0);
refresh();
if(getch()=='q')
break;
}
}
int
main(int argc, char *argv[])
{
// Setup curses
int chgClr=0;
initscr();
/* first test for color ability of the terminal */
if(!has_colors())
error("Terminal doesn't support colors\n");
/* next attempt to initialize curses colors */
if(start_color() != OK)
error("Starintg the color system failed.\n");
/* colors are okay; continue */
if(can_change_color())
chgClr=1;
curs_set(0);
halfdelay(2);
scrX=getmaxx(stdscr);
scrY=getmaxy(stdscr);
create_colors(chgClr);
draw_volcano(1);
erupt(5);
refresh();
getch();
endwin();
}
EOF
Create the particle file
cat > particle.c << "EOF"
// particle.c Michael R Stute
// for The Toucan Linux Project
#include <ncurses.h>
#include <stdlib.h>
#include "particle.h"
extern int colors[11];
extern int scrX, scrY;
PARTICLE *
create_particle(int sX, int sY, int temp)
{
PARTICLE *p;
p=(PARTICLE *)malloc(sizeof(PARTICLE));
// Parameters for parabola
// Formula is y = -(x^2 + hx) * ratio
// where h is the height and ratio is the opening of the parabola
// This guarantees it goes through the origin
p->rx=p->ry=0;
p->h=1+lrand48() % (scrY);
p->invert=lrand48() % 2;
p->ratio=1.0;
p->ratio=4.0 - (8.0*drand48());
p->color=(lrand48() % 10) + 1;
return(p);
}
int
move_particle(PARTICLE *p)
{
if(!p)
return(TRUE);
p->rx++;
p->ry=(int)((float)((p->rx*p->rx)) + (float)((-p->h*p->rx)) * p->ratio);
// Save last position for erasing
p->ox=p->x;
p->oy=p->y;
// Translate to screen
p->y=p->ry+scrY-7;
if(p->invert)
p->x=p->rx+(scrX/2);
else
p->x=(scrX/2) - p->rx;
if(p->y>scrX) { // Particle has landed
mvaddch(scrY-1,p->x,'*'|colors[p->color]);
return(TRUE);
}
return(FALSE);
}
void
destroy_particle(PARTICLE *p)
{
if(p)
free(p);
}
void
draw_particle(PARTICLE *p)
{
if(!p)
return;
// Erase old position
if(p->ox>=0 && p->ox<=scrX && p->oy>=0 && p->oy<=scrY)
mvaddch(p->oy,p->ox,' ');
// Draw new position
if(p->x>=0 && p->x<=scrX && p->y>=0 && p->y<=scrY)
mvaddch(p->y,p->x,'*'|colors[p->color]);
}
EOF
Create a header file to define the particle functions and structures.
cat > particle.h < "EOF"
// particle.h Michael R Stute
// for The Toucan Linux Project
typedef struct _particle
{
int x,y; // screen coords
int ox,oy; // Previous position
int h; // height
float ratio; // width of opening
int rx,ry; // Real x,y
int invert; // Invert the parabola on x axis
int color;
} PARTICLE;
PARTICLE *create_particle(int sX, int sY, int temp);
int move_particle(PARTICLE *p);
void destroy_particle(PARTICLE *p);
void draw_particle(PARTICLE *p);
EOF
Now we have two files that contain the program. Test them first by compiling them
gcc particle.c main.c -o vesuvius `pkg-config ncurses --libs --cflags`
You can run it to see a curses animated volcano spewing red hot rocks, press 'q' when you get tired of it. Now, lets use it to check out GCC's various stages.
The first step is the pre-compiler. By using the -E option GCC will stop after the pre-compiler
gcc -E main.c > /tmp/main_pc.c
You can examine the file using less or the editor. It will be much larger as the pre-compiler has retrieved and added all include files (including the includes they include) as well applying the macros in the program. That is the code that is actually passed to the compiler. Much larger? Yes.
Next step is to check out the assembly code the compiler generates
gcc -S main.c
It will output a file called main.s which contains the assembly code the compiler has created. This will be optimized using whatever options are passed on the command line, which is none (default -O0) in this case. You can use different optimization options and save the output to different file names, then use diff to find out the different optimizations.
Lastly, GCC will create only the assembled object files (but not the program with include startup code, etc.) using the -c option.
gcc -c main.c
which will leave a file called main.o behind.
Examining the Object File
Object files are the ELF format that Linux has used since around 1995. All binary object files (code), shared libraries, and executable files are stored in the ELF file format as adopted by most Unix-like operating systems support dynamic linking (and now even non-Unix OSes.) Though it is rare that a user might need to know anything about the ELF specification, we will use it to examine the output of the compiler.There are two binaries that come as part of the util-linux package: readelf and objdump. The readelf utility will display everything it can if passed the -a option (or --all). Here is the output as when it examines the !main.o file from the compiler
$ readelf -a main.o
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: REL (Relocatable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x0
Start of program headers: 0 (bytes into file)
Start of section headers: 6416 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 0 (bytes)
Number of program headers: 0
Size of section headers: 64 (bytes)
Number of section headers: 13
Section header string table index: 12
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS 0000000000000000 00000040
0000000000000803 0000000000000000 AX 0 0 1
[ 2] .rela.text RELA 0000000000000000 00000e88
00000000000009a8 0000000000000018 I 10 1 8
[ 3] .data PROGBITS 0000000000000000 00000843
0000000000000000 0000000000000000 WA 0 0 1
[ 4] .bss NOBITS 0000000000000000 00000843
0000000000000000 0000000000000000 WA 0 0 1
[ 5] .rodata PROGBITS 0000000000000000 00000848
000000000000008b 0000000000000000 A 0 0 8
[ 6] .comment PROGBITS 0000000000000000 000008d3
000000000000001b 0000000000000001 MS 0 0 1
[ 7] .note.GNU-stack PROGBITS 0000000000000000 000008ee
0000000000000000 0000000000000000 0 0 1
[ 8] .eh_frame PROGBITS 0000000000000000 000008f0
00000000000000b8 0000000000000000 A 0 0 8
[ 9] .rela.eh_frame RELA 0000000000000000 00001830
0000000000000078 0000000000000018 I 10 8 8
[10] .symtab SYMTAB 0000000000000000 000009a8
00000000000003a8 0000000000000018 11 9 8
[11] .strtab STRTAB 0000000000000000 00000d50
0000000000000135 0000000000000000 0 0 1
[12] .shstrtab STRTAB 0000000000000000 000018a8
0000000000000061 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
l (large), p (processor specific)
There are no section groups in this file.
There are no program headers in this file.
There is no dynamic section in this file.
Relocation section '.rela.text' at offset 0xe88 contains 103 entries:
Offset Info Type Sym. Value Sym. Name + Addend
00000000000d 000e00000004 R_X86_64_PLT32 0000000000000000 endwin - 4
00000000001b 000500000002 R_X86_64_PC32 0000000000000000 .rodata - 4
000000000025 000f00000004 R_X86_64_PLT32 0000000000000000 printf - 4
00000000002f 001000000004 R_X86_64_PLT32 0000000000000000 exit - 4
00000000005d 001200000004 R_X86_64_PLT32 0000000000000000 init_color - 4
000000000076 001200000004 R_X86_64_PLT32 0000000000000000 init_color - 4
00000000008f 001200000004 R_X86_64_PLT32 0000000000000000 init_color - 4
0000000000a8 001200000004 R_X86_64_PLT32 0000000000000000 init_color - 4
0000000000c1 001200000004 R_X86_64_PLT32 0000000000000000 init_color - 4
0000000000da 001200000004 R_X86_64_PLT32 0000000000000000 init_color - 4
0000000000f3 001200000004 R_X86_64_PLT32 0000000000000000 init_color - 4
00000000010c 001200000004 R_X86_64_PLT32 0000000000000000 init_color - 4
000000000125 001200000004 R_X86_64_PLT32 0000000000000000 init_color - 4
00000000013e 001200000004 R_X86_64_PLT32 0000000000000000 init_color - 4
000000000157 001200000004 R_X86_64_PLT32 0000000000000000 init_color - 4
000000000170 001200000004 R_X86_64_PLT32 0000000000000000 init_color - 4
000000000184 001300000004 R_X86_64_PLT32 0000000000000000 init_pair - 4
000000000198 001300000004 R_X86_64_PLT32 0000000000000000 init_pair - 4
0000000001ac 001300000004 R_X86_64_PLT32 0000000000000000 init_pair - 4
0000000001c0 001300000004 R_X86_64_PLT32 0000000000000000 init_pair - 4
0000000001d4 001300000004 R_X86_64_PLT32 0000000000000000 init_pair - 4
0000000001e8 001300000004 R_X86_64_PLT32 0000000000000000 init_pair - 4
0000000001fc 001300000004 R_X86_64_PLT32 0000000000000000 init_pair - 4
000000000210 001300000004 R_X86_64_PLT32 0000000000000000 init_pair - 4
000000000224 001300000004 R_X86_64_PLT32 0000000000000000 init_pair - 4
000000000238 001300000004 R_X86_64_PLT32 0000000000000000 init_pair - 4
00000000024c 001300000004 R_X86_64_PLT32 0000000000000000 init_pair - 4
000000000265 001300000004 R_X86_64_PLT32 0000000000000000 init_pair - 4
000000000279 001300000004 R_X86_64_PLT32 0000000000000000 init_pair - 4
00000000028d 001300000004 R_X86_64_PLT32 0000000000000000 init_pair - 4
0000000002a1 001300000004 R_X86_64_PLT32 0000000000000000 init_pair - 4
0000000002b5 001300000004 R_X86_64_PLT32 0000000000000000 init_pair - 4
0000000002c9 001300000004 R_X86_64_PLT32 0000000000000000 init_pair - 4
0000000002dd 001300000004 R_X86_64_PLT32 0000000000000000 init_pair - 4
0000000002f1 001300000004 R_X86_64_PLT32 0000000000000000 init_pair - 4
000000000305 001300000004 R_X86_64_PLT32 0000000000000000 init_pair - 4
000000000319 001300000004 R_X86_64_PLT32 0000000000000000 init_pair - 4
00000000032d 001300000004 R_X86_64_PLT32 0000000000000000 init_pair - 4
000000000333 000b00000002 R_X86_64_PC32 0000000000000020 colors - 8
00000000033d 000b00000002 R_X86_64_PC32 0000000000000020 colors - 4
000000000347 000b00000002 R_X86_64_PC32 0000000000000020 colors + 0
000000000351 000b00000002 R_X86_64_PC32 0000000000000020 colors + 4
00000000035b 000b00000002 R_X86_64_PC32 0000000000000020 colors + 8
000000000365 000b00000002 R_X86_64_PC32 0000000000000020 colors + c
00000000036f 000b00000002 R_X86_64_PC32 0000000000000020 colors + 10
000000000379 000b00000002 R_X86_64_PC32 0000000000000020 colors + 14
000000000383 000b00000002 R_X86_64_PC32 0000000000000020 colors + 18
00000000038d 000b00000002 R_X86_64_PC32 0000000000000020 colors + 1c
000000000397 000b00000002 R_X86_64_PC32 0000000000000020 colors + 20
0000000003a1 000b00000002 R_X86_64_PC32 0000000000000020 colors + 24
0000000003b9 000900000002 R_X86_64_PC32 0000000000000004 scrX - 4
0000000003cb 000a00000002 R_X86_64_PC32 0000000000000004 scrY - 4
0000000003d8 001500000002 R_X86_64_PC32 0000000000000000 stdscr - 4
0000000003e4 001500000002 R_X86_64_PC32 0000000000000000 stdscr - 4
000000000401 000500000002 R_X86_64_PC32 0000000000000000 .rodata - 1
00000000040f 001600000004 R_X86_64_PLT32 0000000000000000 mvprintw - 4
000000000425 000500000002 R_X86_64_PC32 0000000000000000 .rodata + 3
000000000433 001600000004 R_X86_64_PLT32 0000000000000000 mvprintw - 4
000000000449 000500000002 R_X86_64_PC32 0000000000000000 .rodata + 9
000000000457 001600000004 R_X86_64_PLT32 0000000000000000 mvprintw - 4
00000000046d 000500000002 R_X86_64_PC32 0000000000000000 .rodata + 11
00000000047b 001600000004 R_X86_64_PLT32 0000000000000000 mvprintw - 4
000000000491 000500000002 R_X86_64_PC32 0000000000000000 .rodata + 1b
00000000049f 001600000004 R_X86_64_PLT32 0000000000000000 mvprintw - 4
0000000004b5 000500000002 R_X86_64_PC32 0000000000000000 .rodata + 27
0000000004c3 001600000004 R_X86_64_PLT32 0000000000000000 mvprintw - 4
000000000525 001800000004 R_X86_64_PLT32 0000000000000000 lrand48 - 4
00000000056f 000a00000002 R_X86_64_PC32 0000000000000004 scrY - 4
000000000578 000900000002 R_X86_64_PC32 0000000000000004 scrX - 4
00000000058f 001900000004 R_X86_64_PLT32 0000000000000000 create_particle - 4
00000000064b 001a00000004 R_X86_64_PLT32 0000000000000000 move_particle - 4
000000000668 001b00000004 R_X86_64_PLT32 0000000000000000 destroy_particle - 4
000000000696 001c00000004 R_X86_64_PLT32 0000000000000000 draw_particle - 4
0000000006b4 001400000004 R_X86_64_PLT32 00000000000003ac draw_volcano - 4
0000000006bb 001500000002 R_X86_64_PC32 0000000000000000 stdscr - 4
0000000006c3 001d00000004 R_X86_64_PLT32 0000000000000000 wrefresh - 4
0000000006ca 001500000002 R_X86_64_PC32 0000000000000000 stdscr - 4
0000000006d2 001e00000004 R_X86_64_PLT32 0000000000000000 wgetch - 4
0000000006f9 001f00000004 R_X86_64_PLT32 0000000000000000 __stack_chk_fail - 4
000000000716 002100000004 R_X86_64_PLT32 0000000000000000 initscr - 4
00000000071b 002200000004 R_X86_64_PLT32 0000000000000000 has_colors - 4
000000000729 000500000002 R_X86_64_PC32 0000000000000000 .rodata + 3c
00000000072e 000c00000004 R_X86_64_PLT32 0000000000000000 error - 4
000000000733 002300000004 R_X86_64_PLT32 0000000000000000 start_color - 4
00000000073e 000500000002 R_X86_64_PC32 0000000000000000 .rodata + 64
000000000743 000c00000004 R_X86_64_PLT32 0000000000000000 error - 4
000000000748 002400000004 R_X86_64_PLT32 0000000000000000 can_change_color - 4
00000000075d 002500000004 R_X86_64_PLT32 0000000000000000 curs_set - 4
000000000767 002600000004 R_X86_64_PLT32 0000000000000000 halfdelay - 4
00000000076e 001500000002 R_X86_64_PC32 0000000000000000 stdscr - 4
00000000077a 001500000002 R_X86_64_PC32 0000000000000000 stdscr - 4
00000000078f 000900000002 R_X86_64_PC32 0000000000000004 scrX - 4
000000000796 001500000002 R_X86_64_PC32 0000000000000000 stdscr - 4
0000000007a2 001500000002 R_X86_64_PC32 0000000000000000 stdscr - 4
0000000007b7 000a00000002 R_X86_64_PC32 0000000000000004 scrY - 4
0000000007c1 001100000004 R_X86_64_PLT32 0000000000000033 create_colors - 4
0000000007cb 001400000004 R_X86_64_PLT32 00000000000003ac draw_volcano - 4
0000000007d5 001700000004 R_X86_64_PLT32 00000000000004ca erupt - 4
0000000007dc 001500000002 R_X86_64_PC32 0000000000000000 stdscr - 4
0000000007e4 001d00000004 R_X86_64_PLT32 0000000000000000 wrefresh - 4
0000000007eb 001500000002 R_X86_64_PC32 0000000000000000 stdscr - 4
0000000007f3 001e00000004 R_X86_64_PLT32 0000000000000000 wgetch - 4
0000000007f8 000e00000004 R_X86_64_PLT32 0000000000000000 endwin - 4
Relocation section '.rela.eh_frame' at offset 0x1830 contains 5 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000000020 000200000002 R_X86_64_PC32 0000000000000000 .text + 0
00000000003c 000200000002 R_X86_64_PC32 0000000000000000 .text + 33
00000000005c 000200000002 R_X86_64_PC32 0000000000000000 .text + 3ac
00000000007c 000200000002 R_X86_64_PC32 0000000000000000 .text + 4ca
00000000009c 000200000002 R_X86_64_PC32 0000000000000000 .text + 6ff
The decoding of unwind sections for machine type Advanced Micro Devices X86-64 is not currently supported.
Symbol table '.symtab' contains 39 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS main.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 3
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
5: 0000000000000000 0 SECTION LOCAL DEFAULT 5
6: 0000000000000000 0 SECTION LOCAL DEFAULT 7
7: 0000000000000000 0 SECTION LOCAL DEFAULT 8
8: 0000000000000000 0 SECTION LOCAL DEFAULT 6
9: 0000000000000004 4 OBJECT GLOBAL DEFAULT COM scrX
10: 0000000000000004 4 OBJECT GLOBAL DEFAULT COM scrY
11: 0000000000000020 48 OBJECT GLOBAL DEFAULT COM colors
12: 0000000000000000 51 FUNC GLOBAL DEFAULT 1 error
13: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _GLOBAL_OFFSET_TABLE_
14: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND endwin
15: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND printf
16: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND exit
17: 0000000000000033 889 FUNC GLOBAL DEFAULT 1 create_colors
18: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND init_color
19: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND init_pair
20: 00000000000003ac 286 FUNC GLOBAL DEFAULT 1 draw_volcano
21: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND stdscr
22: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND mvprintw
23: 00000000000004ca 565 FUNC GLOBAL DEFAULT 1 erupt
24: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND lrand48
25: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND create_particle
26: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND move_particle
27: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND destroy_particle
28: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND draw_particle
29: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND wrefresh
30: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND wgetch
31: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND __stack_chk_fail
32: 00000000000006ff 260 FUNC GLOBAL DEFAULT 1 main
33: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND initscr
34: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND has_colors
35: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND start_color
36: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND can_change_color
37: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND curs_set
38: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND halfdelay
No version information found in this file.
This would be even longer if the -g option were passed to the compiler to create debug. An ELF file consists of different sections and the two we are most interested in for now are the "Section Headers" and "Relocation" sectionz. We'll start with the latter first.
The section headers shows which sections are in the file. It can be viewed by itself with the -S option.
readelf -S main.o
There are 13 section headers, starting at offset 0x1910:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS 0000000000000000 00000040
0000000000000803 0000000000000000 AX 0 0 1
[ 2] .rela.text RELA 0000000000000000 00000e88
00000000000009a8 0000000000000018 I 10 1 8
[ 3] .data PROGBITS 0000000000000000 00000843
0000000000000000 0000000000000000 WA 0 0 1
[ 4] .bss NOBITS 0000000000000000 00000843
0000000000000000 0000000000000000 WA 0 0 1
[ 5] .rodata PROGBITS 0000000000000000 00000848
000000000000008b 0000000000000000 A 0 0 8
[ 6] .comment PROGBITS 0000000000000000 000008d3
000000000000001b 0000000000000001 MS 0 0 1
[ 7] .note.GNU-stack PROGBITS 0000000000000000 000008ee
0000000000000000 0000000000000000 0 0 1
[ 8] .eh_frame PROGBITS 0000000000000000 000008f0
00000000000000b8 0000000000000000 A 0 0 8
[ 9] .rela.eh_frame RELA 0000000000000000 00001830
0000000000000078 0000000000000018 I 10 8 8
[10] .symtab SYMTAB 0000000000000000 000009a8
00000000000003a8 0000000000000018 11 9 8
[11] .strtab STRTAB 0000000000000000 00000d50
0000000000000135 0000000000000000 0 0 1
[12] .shstrtab STRTAB 0000000000000000 000018a8
0000000000000061 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
l (large), p (processor specific)
This shows all the sections in the file and where to find them. You might think of this as an index to the file. Each section has several indicators of what they are and what must be done as indicated by the info, section title, and the flags. The code is contained in a section called ".text" and another section called "Relocation" (rela.text) which is a list of code relocations that must occur for the program to execute. They are architecture dependent and there are usually quite a few of them. You might think of them as a step-by-step list of instructions necessary to finding all the bits and pieces of code and data for the program to run, including allocations. The most common are X86_64_COPY which means "copy the code from the library to this address" and X86_64_JUMP_SLO (JUMP_SLOT) which is a standard way to link to a function that is shared by multiple programs (in the Global Offset Table -- GOT) so that only one copy needs to be resident in memory for all programs. The linker will use the Procedure Linkage Table to resolve and write in the addresses from the GOT (referred to as GOT/PLT) linking. This is probably more than we need to know, but we are here to learn. You won't see these in the object files (.o) because they are used for linking but are not linked themselves. You can see the GOT/PLT on the executable itself using the second command objdump.
objdump -d vesuvius
Here's a sample of the output
Disassembly of section .plt:
0000000000000ab0 <__cxa_finalize@plt-0x10>:
ab0: ff 35 3a 25 00 00 pushq 0x253a(%rip) # 2ff0 <_GLOBAL_OFFSET_TABLE_+0x8>
ab6: ff 25 3c 25 00 00 jmpq *0x253c(%rip) # 2ff8 <_GLOBAL_OFFSET_TABLE_+0x10>
abc: 90 nop
abd: 90 nop
abe: 90 nop
abf: 90 nop
0000000000000ac0 <__cxa_finalize@plt>:
ac0: ff 25 3a 25 00 00 jmpq *0x253a(%rip) # 3000 <__cxa_finalize@GLIBC_2.2.5>
ac6: 68 00 00 00 00 pushq $0x0
acb: e9 e0 ff ff ff jmpq ab0 <_init+0x20>
0000000000000ad0 <endwin@plt>:
ad0: ff 25 32 25 00 00 jmpq *0x2532(%rip) # 3008 <endwin>
ad6: 68 01 00 00 00 pushq $0x1
adb: e9 d0 ff ff ff jmpq ab0 <_init+0x20>
0000000000000ae0 <printf@plt>:
ae0: ff 25 2a 25 00 00 jmpq *0x252a(%rip) # 3010 <printf@GLIBC_2.2.5>
Scanning the far right hand column we see reference to the Global Offset Table, to external libraries (printf@GLIBC_2.2.5) and entries in the procedure linkage table.
What you also see is that objdump has disassembled the code which you can see both in the executable and the object files.
Now when we compile with optimization each file is examined for ways to speed up as mentioned earlier. But in this program, as with many others, the code is contained in more than one file. Here it is very simple to make it easy to follow, but image a program like ffmpeg that is made of over 300 object files. You might wonder is it possible to optimize all the code in all the files. The answer is yes.
Link Time Optimization
Newer versions of GCC support a type of optimization that examines all the object files of a project, and does further optimization of the complete program as a whole. While this has been supported from some time, it started becoming useful in GCC 7, was greatly improved in GCC 8, and finally became something fairly powerful in GCC 9.GCC breaks the source code in a token tree after parsing the code and then converts this into several other intermediate forms as it compiles the code. After the parsing tree is built it creates a representation called "generic" that represent entire functions in generic trees. In this way source code can be a mixture of the languages GCC supports (FORTRAN, C, C++, Objective C/++, D, Go, BRIG, and Ada.) Once in a generic tree format the code can be treated operationally the same, meaning the source no longer matters, making it generic. After Generic it creates tuples from the generic tree. These tuples are highly uniform in order to allow optimization. This is called GIMPLE after the original idea as proposed and implemented in the McCat compiler from McGill University which was called SIMPLE. GIMPLE is a representation of the code in tuples that allows easy examination of the code as simple steps that can be optimized.
There are several advantages to this, including being to optimize different source languages using the same optimizer since it originates from the generic pass. Now, as long as the source is first converted to generic then GIMPLE tuples it can optimized and any new technique that is added to the optimizer automatically applies to all source languages. The other major advantage is that multiple files of GIMPLE can be treated as a single entity and further optimization can occur allowing optimizations to occur at the program level instead of just the file level. To do this, the compiler needs to save the GIMPLE representation of each file as it compiles them so it can later look at them all together. That is called Link Time Optimization, or LTO for short which in enabled at compile time with -flto. With this switch, GCC will write the GIMPLE for each function (along with a few other necessities) to the object file. Of course, there is no advantage unless the program is linked from multiple object files that is why the vesuvius example program is awkwardly written over two files.
Compile it to object files with the LTO option:
gcc main.c -c -flto `pkg-config ncurses --libs --cflags
Then examine with *objdump* program
objdump -xd main.o
main.o: file format elf64-x86-64
main.o
architecture: i386:x86-64, flags 0x00000010:
HAS_SYMS
start address 0x0000000000000000
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000000 0000000000000000 0000000000000000 00000040 2**0
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .data 00000000 0000000000000000 0000000000000000 00000040 2**0
CONTENTS, ALLOC, LOAD, DATA
2 .bss 00000000 0000000000000000 0000000000000000 00000040 2**0
ALLOC
3 .gnu.lto_.inline.5caf6f8700863e25 000000c5 0000000000000000 0000000000000000 00000040 2**0
CONTENTS, READONLY, EXCLUDE
4 .gnu.lto_.jmpfuncs.5caf6f8700863e25 0000001e 0000000000000000 0000000000000000 00000105 2**0
CONTENTS, READONLY, EXCLUDE
5 .gnu.lto_error.5caf6f8700863e25 0000015d 0000000000000000 0000000000000000 00000123 2**0
CONTENTS, READONLY, EXCLUDE
6 .gnu.lto_create_colors.5caf6f8700863e25 0000050d 0000000000000000 0000000000000000 00000280 2**0
CONTENTS, READONLY, EXCLUDE
7 .gnu.lto_draw_volcano.5caf6f8700863e25 000004b0 0000000000000000 0000000000000000 0000078d 2**0
CONTENTS, READONLY, EXCLUDE
8 .gnu.lto_erupt.5caf6f8700863e25 0000076f 0000000000000000 0000000000000000 00000c3d 2**0
CONTENTS, READONLY, EXCLUDE
9 .gnu.lto_main.5caf6f8700863e25 000005cf 0000000000000000 0000000000000000 000013ac 2**0
CONTENTS, READONLY, EXCLUDE
10 .gnu.lto_.symbol_nodes.5caf6f8700863e25 0000017f 0000000000000000 0000000000000000 0000197b 2**0
CONTENTS, READONLY, EXCLUDE
11 .gnu.lto_.refs.5caf6f8700863e25 00000059 0000000000000000 0000000000000000 00001afa 2**0
CONTENTS, READONLY, EXCLUDE
12 .gnu.lto_.decls.5caf6f8700863e25 000016fd 0000000000000000 0000000000000000 00001b53 2**0
CONTENTS, READONLY, EXCLUDE
13 .gnu.lto_.symtab.5caf6f8700863e25 00000287 0000000000000000 0000000000000000 00003250 2**0
CONTENTS, READONLY, EXCLUDE
14 .gnu.lto_.opts 000003e5 0000000000000000 0000000000000000 000034d7 2**0
CONTENTS, READONLY, EXCLUDE
15 .comment 0000001b 0000000000000000 0000000000000000 000038bc 2**0
CONTENTS, READONLY
16 .note.GNU-stack 00000000 0000000000000000 0000000000000000 000038d7 2**0
CONTENTS, READONLY
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 main.c
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 l d .bss 0000000000000000 .bss
0000000000000000 l d .gnu.lto_.inline.5caf6f8700863e25 0000000000000000 .gnu.lto_.inline.5caf6f8700863e25
0000000000000000 l d .gnu.lto_.jmpfuncs.5caf6f8700863e25 0000000000000000 .gnu.lto_.jmpfuncs.5caf6f8700863e25
0000000000000000 l d .gnu.lto_error.5caf6f8700863e25 0000000000000000 .gnu.lto_error.5caf6f8700863e25
0000000000000000 l d .gnu.lto_create_colors.5caf6f8700863e25 0000000000000000 .gnu.lto_create_colors.5caf6f8700863e25
0000000000000000 l d .gnu.lto_draw_volcano.5caf6f8700863e25 0000000000000000 .gnu.lto_draw_volcano.5caf6f8700863e25
0000000000000000 l d .gnu.lto_erupt.5caf6f8700863e25 0000000000000000 .gnu.lto_erupt.5caf6f8700863e25
0000000000000000 l d .gnu.lto_main.5caf6f8700863e25 0000000000000000 .gnu.lto_main.5caf6f8700863e25
0000000000000000 l d .gnu.lto_.symbol_nodes.5caf6f8700863e25 0000000000000000 .gnu.lto_.symbol_nodes.5caf6f8700863e25
0000000000000000 l d .gnu.lto_.refs.5caf6f8700863e25 0000000000000000 .gnu.lto_.refs.5caf6f8700863e25
0000000000000000 l d .gnu.lto_.decls.5caf6f8700863e25 0000000000000000 .gnu.lto_.decls.5caf6f8700863e25
0000000000000000 l d .gnu.lto_.symtab.5caf6f8700863e25 0000000000000000 .gnu.lto_.symtab.5caf6f8700863e25
0000000000000000 l d .gnu.lto_.opts 0000000000000000 .gnu.lto_.opts
0000000000000000 l d .note.GNU-stack 0000000000000000 .note.GNU-stack
0000000000000000 l d .comment 0000000000000000 .comment
0000000000000001 O *COM* 0000000000000001 __gnu_lto_v1
0000000000000001 O *COM* 0000000000000001 __gnu_lto_slim
Notice the -d option was used with objdump which will dissemble all code yet there is no assembly. This is because there is no assembly yet. Instead, there are now many new entries it in symbol table starting with gnu.lto_* which contain the functions compiled into GIMPLE. It also maintains an internal symbol table, list of jump function utilities, inline functions the compiler created from normal functions, and errors. This proves the code is not yet compiled even to assembly which will now take place at link time instead of compile time.
Because the code must first be optimized the compiler will use a linker helper program. In order to trigger the link phase of the compile to examine the GIMPLE of each file, use the -fuse-linker-plugin for the compiler. (though it is a linker option it should also be used on the compile command line if objects are compiled separately since the generic plugin interface can be called from either pass.) This will cause GCC to first load all the GIMPLE entries from all the object files, create one large function table for all functions in the program, and further optimize them before creating the assembly, and finally passing assembly code to the assembler to create the binary objects.
gcc particle.c main.c -o vesuvius -flto -fuse-linker-plugin `pkg-config ncurses --libs --cflags
This does mean that compile time will increase significantly since this extra pass is required. It also means the intermediate object files will generally be bigger (though not by a lot.) The biggest difference will be the time it takes to link a large program. The linker (with the help of the plugin) now must make an extra pass to load examine all the GIMPLE from the files then decide how to further optimize the code (mostly in inlining external functions from other files) and then finally write the assembly, call the assembler, and at last, link and create the executable or library. As you can imagine this can take quite some time for large programs.
Developer Alert
It is possible to specify the optimization level for a given section of code with GCC. This means as a developer you can profile your program to determine where the program spends most of the time, test that code in the highest optimization level, and then specify in the code what optimization level to use for a given set of functions. In this manner it is possible to use -Ofast or -O3 for core code where speed is more important than size and while using a different code level of optimization for the remainder of the code. You can, of course, specify different levels of optimization. Since the command line option is considered the default, a programmer can override the default with a higher (or lower) level of optimization.If you want to set optimization for a single function use the attribute after the function type declaration but before the function declaration
void __attribute__((optimize("O3"))) make_this_fast(int var1, int var2) {
...
}
If you want to optimize larger chunks of code, put them in a separate file and use the optimize pragma
# This file is optimized for level 3 to make these functions fast
#pragma GCC push_options
#pragma GCC optimize ("O3")
/* Put your code here */
int function1(void) {
...
}
int fast_function2(int a, int b) {
...
}
For those that develop for lower memory systems or need to fit binaries into ROMs optimization is usually set for size. But using the above compiler tricks you can optimize the program for size and still choose higher levels of optimization for execution speed where it makes sense.
Gold Linker
Since 2008, there is a alternate linker to the GNU linker that does not use the binary file description library (BFD.) The BFD system adds a level of abstraction to the object file as a way to support different object file formats for different architectures in order to handle byte-ordering differences, word size differences for data, and, of course, the object file specification itself. This is an essential tool for embedded developers and makes cross-compiling much easier (which we used to kick start the original tool set used to later build the base system.) But any layer of abstraction adds complexity and often degrades performance. The BDF library now supports more than 50 file formats and 25 different instruction set architectures which is a lot of overhead if all the system ever needs is ELF on X86 or X86_64.To produce a much cleaner implementation, the gold linker was created. It doesn't use the BFD library and can only handle ELF object files. But it is much faster then the GNU BFD linker. It is not a version of the GNU linker but a complete rewrite from scratch with the intention of completely removing all traces of the BFD library.
To use the gold linker, set the LD environment variable to gold or specify which linker to use as compile time using the -fuse-ld parameter such as -fuse-ld=gold.
To compile out test program using LTO and the gold linker do the following
rm *.o vesuvius ; gcc particle.c main.c -o vesuvius -fuse-ld=gold -flto -fuse-linker-plugin `pkg-config ncurses --libs --cflags`
Be sure the -fuse-ld=gold comes before any linker flags including the libraries to link and search paths for object files.
Now we are set to highly optimize our system not only for the specific CPU we are using, but also for higher optimization levels, and faster linking. Exactly how we will do this is the topic of discussion next time.

No comments:
Post a Comment