Header information: overall information about the file, such as the size of the code, name of the source file it was translated from, and creation date. Object code: Binary instructions and data generated by a compiler or assembler. Relocation: A list of the places in the object code that have to be fixed up when the linker changes the addresses of the object code. Symbols: Global symbols defined in this module, symbols to be imported from other modules or defined by the linker. Debugging information: Other information about the object code not needed for linking but of use to a debugger. This includes source file and line number information, local symbols, descriptions of data structures used by the object code such as C structure definitions.
Header – here, information on the target architecture and different options of the further file contents interpretation are stored. Load commands — these commands inform how and where to load Mach-O parts: segments (see below), symbol tables, and also informs which libraries this file depends on to load them first. Segments — these describe regions of memory where to load sections with the code or data.
Everything begins from a magic value (0xFEEDFACE or vice versa, depending on the agreement concerning the order of bytes in machine words). Then, the processor architecture type, number and size of load commands, and flags that describe other specifics are defined.
The existing load commands are listed below: LC_SEGMENT — contains different information on a certain segment: size, number of sections, offset in the file and in memory (after the load) LC_SYMTAB — loads the table of symbols and strings LC_DYSYMTAB — creates an import table; data on symbols is taken from the symbol table LC_LOAD_DYLIB — defines the dependency from a certain third-party library For example (for 32- and 64-bit versions, correspondingly) The most important segments are the following: __TEXT — the executed code and other read-only data __DATA — data available for writing; including import tables that can be changed by the dynamic loader during lazy binding __OBJC — different information of the standard library of Objective-C language of execution time __IMPORT — import table only for 32-bit architecture (I managed to generate it only on Mac OS 10.5) __LINKEDIT — here, the dynamic loader places its data for already loaded modules (symbol tables, string tables, etc.) The most interesting sections in the listed segments are the following: __TEXT,__text — the code itself __TEXT,__cstring — constant strings (in double quotes) __TEXT,__const — different constants __DATA,__data — initialized variables (strings and arrays) __DATA,__la_symbol_ptr — table of pointers to imported functions __DATA,__bss — non-initialized static variables
Of course, it’s worth mentioning that executable files and libraries “have learned” to store several variants of the executable code at once. It is due to the repeated gradual change of target architectures by the Apple Company (Motorola -> IBM -> Intel). In the general case, such files are called fat binary. In fact, these are several Mach-O gathered in one file but the header of the last is special. It contains information on the number and type of supported architectures and the offsets to each of them. Simple Mach-O with the structure described above are located by such offset. Where magic means 0xCAFEBABE (or vice versa, we should remember about different order of bytes in machine words on different processors). And then, exactly nfat_arch (number) structures of the described below type follow
Welcome to __TEXT, __symbol_stub1. This table is a set of JMP instructions for each imported function. In our case, we have only one such instruction that is presented above.
Each such instruction performs a jump to the address that is defined in the corresponding cell of the __DATA, __la_symbol_ptr table. The last one is an import table for this Mach-O
We get into the __TEXT, __stub_helper section. Actually, it’s a PLT (Procedure Linkage Table) for Mach-O. By means of the first instruction (in our case, it’s LEA in the connective with R11 but it could also be a simple PUSH), the dynamic linker remembers, which symbol requires the relocation. The second instruction always leads to one and the same address – to the beginning of the function - __dyld_stub_binding_helper, which will perform linking
After the dynamic linker performs relocations for puts(), the corresponding cell in __DATA, __la_symbol_ptr will look like the following: And this is the address of the puts() function from the libSystem.B.dylib module. It means that we will receive the required effect of the call redirection by replacing the address with our own one.
Now let's get armed with Mach-O View and explore the files been generated.