
The ELF Object File Format by Dissection

May 01, 1995
By Eric Youngdale

The Executable and Linking Format has been a popular topic lately, as people ask why the kernel configuration script asks whether or not to configure loading ELF executables. Since ELF will eventually be the common object file format for Linux binaries, it is appropriate to document it a bit. Last month, Eric introduced us to ELF, and this month, he gives us a guided tour of real ELF files.

Last month, we reached a point where we were beginning to dissect some real ELF files. For this, I will use the readelf utility, which I wrote when I was first trying to understand the ELF format itself. Later on, it became a valuable tool for debugging the linker as I added support for ELF. The sources to readelf should be on tsx-11.mit.edu in pub/linux/packages/GCC/src or in pub/linux/BETA/ibcs2.

Let us start with a very simple program: the hello world program we used last month.

    largo% cat hello.c
    main()
    {
        printf("Hello World\n");
    }
    largo% gcc-elf -c hello.c

On my laptop, the gcc-elf command invokes the ELF version of gcc (once ELF becomes the default format, you will be able to use the regular gcc command), which produces the ELF file hello.o.
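
A quick way to convince yourself that gcc-elf really produced an ELF object is to look at the first four bytes of hello.o, which in any ELF file are 0x7f followed by the characters E, L, F. The following is only a minimal sketch and is not part of readelf; the function names has_elf_magic and file_is_elf are made up for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* The four bytes every ELF file starts with: 0x7f followed by "ELF". */
static const unsigned char ELF_MAGIC[4] = { 0x7f, 'E', 'L', 'F' };

/* Return 1 if the buffer begins with the ELF magic number, 0 otherwise.
 * (Hypothetical helper, named only for this example.) */
int has_elf_magic(const unsigned char *buf, size_t len)
{
    return len >= sizeof ELF_MAGIC &&
           memcmp(buf, ELF_MAGIC, sizeof ELF_MAGIC) == 0;
}

/* Return 1 if the named file starts with the ELF magic, 0 if it does not,
 * and -1 if the file cannot be opened. (Hypothetical helper.) */
int file_is_elf(const char *path)
{
    unsigned char buf[4];
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf, fp);   /* a short read leaves n < 4 */
    fclose(fp);
    return has_elf_magic(buf, n);
}
```

Called from a small main(), file_is_elf("hello.o") should return 1 for the object file built above and 0 for a plain text file such as hello.c.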

Each ELF file starts with a header (struct elfhdr in /usr/include/linux/elf.h), and the readelf utility can display the contents of all of the fields:

    largo% readelf -h hello.o
    ELF magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
    Type, machine, version = 1 3 1
    Entry, phoff, shoff, flags = 0 0 440 0
    ehsize, phentsize, phnum = 52 0 0
    shentsize, shnum, shstrndx = 40 11 8

The ELF magic field is just a way of unambiguously identifying this as an ELF file. If a file does not begin with the four bytes 7f 45 4c 46 (0x7f followed by the letters ELF), it is not an ELF file; the remaining bytes of the 16-byte field describe the file class, byte order, and ELF version. The type, machine, and version fields identify this as an ET_REL file (i.e., an object file) for the i386. The ehsize field is just the sizeof(struct elfhdr).

Each ELF file contains a table that describes the sections within the file. The shnum field indicates that there are 11 sections; the shoff field indicates that the section header table starts at byte offset 440 within the file. The shentsize field indicates that the entry for each section is 40 bytes long. All throughout ELF, the sizes of various structures are always explicitly stated. This allows for flexibility; the structures can be expanded as required for some hardware platforms, and the standard ELF tools do not have to know about this to be able to make sense of the binary. Also, it allows room for future expansion of the structures by newer versions of the standard.

    largo% readelf -S hello.o
    There are 11 section headers, starting at offset 1b8:
    0              NULL     00000000 00000 00000 00 / 0 0 0 0
    1 .text        PROGBITS 00000000 00040 00014 00 / 6 0 0 10
    2 .rel.text    REL      00000000 00370 00010 08 / 0 9 1 4
    3 .data        PROGBITS 00000000 00054 00000 00 / 3 0 0 4
    4 .bss         NOBITS   00000000 00054 00000 00 / 3 0 0 4
    5 .note        NOTE     00000000 00054 00014 00 / 0 0 0 1
    6 .rodata      PROGBITS 00000000 00068 0000d 00 / 2 0 0 1
    7 .comment     PROGBITS 00000000 00075 00012 00 / 0 0 0 1
    8 .shstrtab    STRTAB   00000000 00087 0004d 00 / 0 0 0 1
    9 .symtab      SYMTAB   00000000 000d4 000c0 10 / 0 a a 4
    a .strtab      STRTAB   00000000 00194 00024 00 / 0 0 0 1

Listing 1. Section Table for hello.o

Each section header is just a struct Elf32_Shdr. You may notice that
the name field is just a number; this is not a pointer, but an offset into the .shstrtab section (we can find the index of the .shstrtab section from the file header in the shstrndx field). Thus we find the name of each section at the specified offset within the .shstrtab section.

Let us dump the section table for this file; see Listing 1. You will notice sections for nearly everything which we have already discussed. Each section has an identifier which specifies what the section contains (in general, you should never have to actually know the name of a section or compare it to anything).

After the type, there is a series of numbers. The first of these is the address in virtual memory where this section should be loaded. Since this is a .o file, it is not intended to be loaded into virtual memory, and this field is not filled in. Next is the offset within the file of the section, and then
is the size of the section. After this comes a series of numbers; I won't parse these in detail for you, but they contain things like the required alignment of the section and a set of flags which indicate whether the section is read-only, writable, and/or executable.

The readelf program is capable of performing disassembly:

    largo% readelf -i 1 hello.o
    0x00000000 pushl %ebp
    0x00000001 movl %esp,%ebp
    0x00000003 pushl $0x0
    0x00000008 call 0x08007559
    0x0000000d addl $4,%esp
    0x00000010 movl %ebp,%esp
    0x00000012 popl %ebp
    0x00000013 ret

The .rel.text section contains the relocations for the .text section of the file, and we can display them as follows:

    largo% readelf -r hello.o
    Relocation section data:.rel.text (0x2 entries)
    Tag: 00004 Value 00301 R_386_32 (0 )
    Tag: 00009 Value 00b02 R_386_PC32 (0 printf)

This indicates that the .text section has two relocations. As expected, there is a relocation for printf indicating that we must patch the address of printf into offset 9 from the beginning of the .text section, which happens to be the operand of the call instruction. There is also a relocation at offset 4, the operand of the pushl instruction, so that we pass the correct address to printf.

Now let us see what happens when this file is linked into an executable. The section table now looks something like Listing 2.

    largo% readelf -S hello
    There are 22 section headers, starting at offset 6d4:
     0              NULL     00000000 00000 00000 00 / 0 0 0 0
     1 .interp      PROGBITS 080000d4 000d4 00017 00 / 2 0 0 1
     2 .hash        HASH     080000ec 000ec 00094 04 / 2 3 0 4
     3 .dynsym      DYNSYM   08000180 00180 00120 10 / 2 4 1 4
     4 .dynstr      STRTAB   080002a0 002a0 000b5 00 / 2 0 0 1
     5 .rel.bss     REL      08000358 00358 00010 08 / 2 3 11 4
     6 .rel.plt     REL      08000368 00368 00028 08 / 2 3 8 4
     7 .init        PROGBITS 08000390 00390 00008 00 / 6 0 0 10
     8 .plt         PROGBITS 08000398 00398 00060 04 / 6 0 0 4
     9 .text        PROGBITS 08000400 00400 000f4 00 / 6 0 0 10
     a .fini        PROGBITS 08000500 00500 00008 00 / 6 0 0 10
     b .rodata      PROGBITS 08000508 00508 0000d 00 / 2 0 0 1
     c .data        PROGBITS 08001518 00518 00004 00 / 3 0 0 4
     d .ctors       PROGBITS 0800151c 0051c 00008 00 / 3 0 0 4
     e .dtors       PROGBITS 08001524 00524 00008 00 / 3 0 0 4
     f .got         PROGBITS 0800152c 0052c 00020 04 / 3 0 0 4
    10 .dynamic     DYNAMIC  0800154c 0054c 00098 08 / 3 4 0 4
    11 .bss         NOBITS   080015e4 005e4 00008 00 / 3 0 0 4
    12 .comment     PROGBITS 00000000 005e4 00056 00 / 0 0 0 1
    13 .shstrtab    STRTAB   00000000 0063a 0009a 00 / 0 0 0 1
    14 .symtab      SYMTAB   00000000 00a44 003c0 10 / 0 15 28 4
    15 .strtab      STRTAB   00000000 00e04 001a7 00 / 0 0 0 1

Listing 2. Section Table for hello.o When Linked Into an Executable

The first thing you will notice is a lot more sections than were in the simple .o file. Much of this is because this file requires the ELF shared library libc.so.1.

At this point I should mention the mechanics of what happens when you run an ELF program. The kernel looks through the binary and loads it into the user's virtual memory. If the application is linked to a shared library, the application will also contain the name of the dynamic linker that should be used. The kernel then transfers control to the dynamic linker, not to the application. The dynamic loader is responsible for first initializing itself, loading the shared libraries into memory, resolving all remaining relocations, and then transferring control to the application.

Going back to our executable, the .interp section simply contains an ASCII string that is the name of the dynamic loader. Currently this will always be /lib/elf/ld-linux.so.1 (the dynamic loader itself is also an ELF shared library).

Next you will notice three sections, called .hash, .dynsym, and .dynstr. Together these form a minimal symbol table used by the dynamic linker when performing relocations. You will notice that these sections are mapped into virtual memory (the virtual address field is non-zero). At the very end of the image are the regular symbol and string tables, and these are not mapped into virtual memory by the loader. The .hash section is just a hash table that is used so that we can quickly locate a given symbol in the .dynsym section, thereby avoiding a linear search of the symbol table. A given symbol can typically be located in one or two tries through the use of the hash
table.

The next section I want to mention is the .plt section. This contains the jump table that is used when we call functions in the shared library. By default the .plt entries are all initialized by the linker not to point to the correct target functions, but instead to point to the dynamic loader itself. Thus, the first time you call any given function, the dynamic loader looks up the function and fixes the target of the .plt so that the next time this .plt slot is used we call the correct function. After making this change, the dynamic loader calls the function itself.

This feature is known as lazy symbol binding. The idea is that if you have lots of shared libraries, it could take the dynamic loader lots of time to look up all of the functions to initialize all of the .plt slots, so it would be preferable to defer binding addresses to the functions until we actually need them. This turns out to be a big win if you only end up using a small fraction of the functions in a shared library. It is possible to instruct the dynamic loader to bind addresses to all of the .plt slots before transferring control to the application; this is done by setting the environment variable LD_BIND_NOW=1 before running the program. This turns out to be useful in some cases when you are debugging a program, for example. Also, I should point out that the .plt is in read-only memory. Thus the addresses used for the target of the jump are actually stored in the .got section. The .got also contains a
set of pointers for all of the global variables that are used within a program that come from a shared library.

The .dynamic section contains some shorthand notes used by the dynamic loader. You will notice that the section table is not itself loaded into virtual memory, and in fact it would not be good for performance for the dynamic loader to have to try to parse it to figure out what needs to be done. The .dynamic section is essentially just a distilled version of the section header table that contains just what is needed for the dynamic loader to do its job.

You will notice that since the section header table is not loaded into memory, neither the kernel nor the dynamic loader will be able to use that table when loading files into memory. A shorthand table of program headers is added to provide a distilled version of the section table containing just the information required to load a file into memory. For the above file it looks something like:

    largo% readelf -l hello
    Elf file is Executable
    Entry point 0x8000400
    There are 5 program headers, starting at offset 34:
    PHDR    0x00034 0x08000034 0x000a0 0x000a0 R E
    Interp  0x000d4 0x080000d4 0x00017 0x00017 R
    Requesting program interpreter /lib/elf/ld-linux.so.1
    Load    0x00000 0x08000000 0x00515 0x00515 R E
    Load    0x00518 0x08001518 0x000cc 0x000d4 RW
    Dynamic 0x0054c 0x0800154c 0x00098 0x00098 RW
    Shared library: libc.so.4 1

As you can see, the program header contains a pointer to the name of the dynamic loader, instructions on what portions of the file are to be loaded into virtual memory (and the virtual addresses they should be loaded to), the permissions of the segments of memory, and finally a pointer to the .dynamic section that the dynamic loader will need. Note that the list of required shared libraries is stored in the .dynamic section.

I will not pick apart an ELF shared library for you here; libraries look quite similar to ELF executables. If you are interested, you can get the readelf utility and pick apart your own libraries.

At the start of this article, I said one reason we were switching to ELF was that it was easier to build
shared libraries with ELF. I will now demonstrate how. Consider two files:

    largo% cat hello1.c
    main()
    {
        greet();
    }
    largo% cat english.c
    greet()
    {
        printf("Hello World\n");
    }

The idea is that we want to build a shared library from english.c, and link hello1 against it. The commands to generate the shared library are:

    largo% gcc-elf -fPIC -c english.c
    largo% gcc-elf -shared -o libenglish.so english.o

That's all there is to it. Now we compile and link the hello1 program:

    largo% gcc-elf -c hello1.c
    largo% gcc-elf -o hello1 hello1.o -L. -lenglish

And finally we can run the program. Normally the dynamic loader only looks
in certain locations for shared libraries, and the current directory is not one of the places it normally looks. Thus to run the program, you can use a command like:

    largo% LD_LIBRARY_PATH=. ./hello1
    Hello World

The environment variable LD_LIBRARY_PATH tells the dynamic loader to look in additional places for the shared libraries (this feature is disabled for setuid programs for security reasons).

To avoid having to specify LD_LIBRARY_PATH, you have several options. You could copy your shared library to /lib/elf, but you can also link your program in the following way:

    largo% gcc-elf -o hello1 hello1.o /home/joe/libenglish.so
    largo% ./hello1
    Hello World

To build more complicated shared libraries, the procedure is not really that much different. Everything that you want to put into the shared library should be compiled with -fPIC; when you have compiled everything, you just link it all together with the gcc -shared command.

The procedure is so much simpler mainly because we bind addresses to functions at runtime. With a.out libraries, the addresses are bound at link time. This means that lots of special care must be taken to ensure that the .plt and .got have sufficient room for future expansion and that we keep the variables at the same addresses from one version of the library to the next. The tools for building a.out libraries help ensure all of this, but it makes the build procedure much more complicated.

ELF offers one further feature that is not easily available with a.out. The dlopen() function can be used to dynamically load a shared library into the user's memory, and you are then able to call the dynamic loader to find symbols within this shared library; in other words, you can call functions that are defined in these modules. In addition, the dynamic loader is used to resolve any und
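
The dlopen() usage described above might look roughly like the following. This is only a sketch under stated assumptions: it uses the standard dlopen(), dlsym(), dlerror(), and dlclose() calls from <dlfcn.h>, while the helper name call_from_library and the void_fn type are invented for this illustration and assume the target function takes no arguments, like greet() in english.c.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Assumed shape of the function we look up, e.g. greet() from english.c. */
typedef void (*void_fn)(void);

/*
 * Load the shared library at `path`, look up `name` in it, and call it.
 * Returns 0 on success and -1 if the library or the symbol cannot be found.
 * (Hypothetical helper, named only for this example.)
 */
int call_from_library(const char *path, const char *name)
{
    void_fn fn;
    const char *err;
    void *handle = dlopen(path, RTLD_LAZY);   /* bind functions lazily */

    if (handle == NULL) {
        err = dlerror();
        fprintf(stderr, "dlopen failed: %s\n", err ? err : "unknown error");
        return -1;
    }

    /* POSIX idiom for storing dlsym()'s void * result in a function pointer. */
    *(void **)(&fn) = dlsym(handle, name);
    if (fn == NULL) {
        err = dlerror();
        fprintf(stderr, "dlsym failed: %s\n", err ? err : "symbol is NULL");
        dlclose(handle);
        return -1;
    }

    fn();               /* call into the freshly loaded module */
    dlclose(handle);    /* drop our reference to the library */
    return 0;
}
```

With the libenglish.so built earlier, call_from_library("./libenglish.so", "greet") would be expected to print Hello World. On some systems the program has to be linked with -ldl for these calls to resolve.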
