zSeries ELF Application Binary Interface Supplement | ||
---|---|---|
<<< Previous | Low-level system information | Next >>> |
This section describes example code sequences for fundamental operations such as calling functions, accessing static objects, and transferring control from one part of a program to another. Previous sections discussed how a program may use the machine or the operating system, and they specified what a program may and may not assume about the execution environment. Unlike previous material, the information in this section illustrates how operations may be done, not how they must be done.
As before, examples use the ANSI C language. Other programming languages may use the same conventions displayed below, but failure to do so does not prevent a program from conforming to the ABI. Two main object code models are available:
Instructions can hold absolute addresses under this model. To execute properly, the program must be loaded at a specific virtual address, making the program's absolute addresses coincide with the process' virtual addresses.
Instructions under this model hold relative addresses, not absolute addresses. Consequently, the code is not tied to a specific load address, allowing it to execute properly at various positions in virtual memory.
The following sections describe the differences between these models. When different, code sequences for the models appear together for easier comparison.
The examples below show code fragments with various simplifications. They are intended to explain addressing modes, not to show optimal code sequences or to reproduce compiler output. |
When the system creates a process image, the executable file portion of the process has fixed addresses and the system chooses shared object library virtual addresses to avoid conflicts with other segments in the process. To maximize text sharing, shared objects conventionally use position-independent code, in which instructions contain no absolute addresses. Shared object text segments can be loaded at various virtual addresses without having to change the segment images. Thus multiple processes can share a single shared object text segment, even if the segment resides at a different virtual address in each process.
Position-independent code relies on two techniques:
Control transfer instructions hold addresses relative to the Current Instruction Address (CIA), or use registers that hold the transfer address. A CIA-relative branch computes its destination address in terms of the CIA, not relative to any absolute address.
When the program requires an absolute address, it computes the desired value. Instead of embedding absolute addresses in instructions (in the text segment), the compiler generates code to calculate an absolute address (in a register or in the stack or data segment) during execution.
Because z/Architecture provides CIA-relative branch instructions and also branch instructions using registers that hold the transfer address, compilers can satisfy the first condition easily.
A Global Offset Table (GOT), provides information for address calculation. Position-independent object files (executable and shared object files) have a table in their data segment that holds addresses. When the system creates the memory image for an object file, the table entries are relocated to reflect the absolute virtual address as assigned for an individual process. Because data segments are private for each process, the table entries can change – unlike text segments, which multiple processes share.
Two position-independent models give programs a choice between more efficient code with some size restrictions and less efficient code without those restrictions. Because of the processor architecture, a GOT with no more than 512 entries (4096 bytes) is more efficient than a larger one. Programs that need more entries must use the larger, more general code. In the following sections, the term "small model position-independent code" is used to refer to code that assumes the smaller GOT, and "large model position-independent code" is used to refer to the general code.
This section describes the prolog and epilog code of functions . A function's prolog establishes a stack frame, if necessary, and may save any nonvolatile registers it uses. A function's epilog generally restores registers that were saved in the prolog code, restores the previous stack frame, and returns to the caller.
The prolog of a function has to save the state of the calling function and set up the base register for the code of the function body. The following is in general done by the function prolog:
Save all registers used within the function which the calling function assumes to be non-volatile.
Set up the base register for the literal pool.
Allocate stack space by decrementing the stack pointer.
Set up the dynamic chain by storing the old stack pointer value at stack location zero if the "back chain" is implemented.
Set up the GOT pointer if the compiler is generating position independent code.
(A function that is position independent will probably want to load a pointer to the GOT into a nonvolatile register. This may be omitted if the function makes no external data references. If external data references are only made within conditional code, loading the GOT pointer may be deferred until it is known to be needed.)
Set up the frame pointer if the function allocates stack space dynamically (with alloca).
The compiler tries to do as little as possible of the above; the ideal case is to do nothing at all (for a leaf function without symbolic references).
The epilog of a function restores the registers saved in the prolog (which include the stack pointer) and branches to the return address.
.section .rodata
.align 2
.LC0:
.string "hello, world\n"
.text
.align 4
.globl main
.type main,@function
main:
# Prolog
STMG 11,15,88(15) # Save callers registers
LARL 13,.LT0_0 # Load literal pool pointer
.section .rodata # Switch for literal pool
.align 2 # to read-only data section
.LT0_0:
.LC2:
.quad 65536
.LTN0_0:
.text # Back to text section
LGR 1,15 # Load stack pointer in GPR 1
AGHI 15,-160 # Allocate stack space
STG 1,0(15) # Store backchain
# Prolog end
LARL 2,.LC0
LG 3,.LC2-.LT0_0(13)
BRASL 14,printf
LGHI 2,0
# Epilog
LG 4,272(15) # Load return address
LMG 11,15,248(15) # Restore registers
BR 4 # Branch back to caller
# Epilog end
Figure 24. Prolog and epilog example
This section shows a way of providing profiling (entry counting) on zSeries systems. An ABI-conforming system is not required to provide profiling; however if it does this is one possible (not required) implementation.
If a function is to be profiled it has to call the _mcount routine after the function prolog. This routine has a special linkage. It gets an address in register 1 and returns without having changed any register. The address is a pointer to a word-aligned one-word static data area, initialized to zero, in which the _mcount routine is to maintain a count of the number of times the function is called.
For example Figure 25 shows how the code after the function prolog may look.
STMG 7,15,56(15) #
Save callers registers
LGR 1,15 # Stack pointer
AGHI 15,-160 # Allocate new
STG 1,0(15) # Save backchain
LGR 11,15 # Local stack pointer
.data
.align 4
.LP0: .quad 0 # Profile counter
.text
# Function profiler
STG 14,8(15) # Preserve r14
LARL 1,.LPO # Load address of profile counter
BRASL 14,_mcount # Branch to _mcount
LG 14,8(15) # Restore r14
Figure 25. Code for profiling
This section describes only objects with static storage duration. It excludes stack-resident objects because programs always compute their virtual addresses relative to the stack or frame pointers.
Because zSeries instructions cannot hold 64-bit addresses directly, a program has to build an address in a register and access memory through that register. In order to do so a function normally has a literal pool that holds the addresses of data objects used by the function. Register 13 is set up in the function prolog to point to the start of this literal pool.
Position-independent code cannot contain absolute addresses. In order to access a local symbol the literal pool contains the (signed) offset of the symbol relative to the start of the pool. Combining the offset loaded from the literal pool with the address in register 13 gives the absolute address of the local symbol. In the case of a global symbol the address of the symbol has to be loaded from the Global Offset Table. The offset in the GOT can either be contained in the instruction itself or in the literal pool. See Figure 26 for an example.
Figure 26 through Figure 28 show sample assembly language equivalents to C language code for absolute and position-independent compilations. It is assumed that all shared objects are compiled as position-independent and only executable modules may have absolute addresses. The code in the figures contains many redundant operations as it is only intended to show how each C statement could have been compiled independently of its context. The function prolog is not shown, and it is assumed that it has loaded the address of the literal pool in register 13.
Figure 27. Small model position-independent addressing
Programs can use the z/Architecture BRASL instruction to make direct function calls. A BRASL instruction has a self-relative branch displacement that can reach 4 GBytes in either direction. To call functions beyond this limit (inter-module calls) load the address in a register and use the BASR instruction for the call. Register 14 is used as the first operand of BASR to hold the return address as shown in Figure 29.
The called function may be in the same module (executable or shared object) as the caller, or it may be in a different module. In the former case, if the called function is not in a shared object, the linkage editor resolves the symbol. In all other cases the linkage editor cannot directly resolve the symbol. Instead the linkage editor generates "glue" code and resolves the symbol to point to the glue code. The dynamic linker will provide the real address of the function in the Global Offset Table. The glue code loads this address and branches to the function itself. See the Section called Procedure Linkage Table in the Chapter called Program loading and dynamic linking for more details.
Programs use branch instructions to control their execution flow. z/Architecture has a variety of branch instructions. The most commonly used of these performs a self-relative jump with a 128-Kbyte range (up to 64 Kbytes in either direction). For large functions another self-relative jump is available with a range of 4 Gbytes (up to 2 Gbytes in either direction).
C language switch statements provide multi-way selection. When the case labels of a switch statement satisfy grouping constraints the compiler implements the selection with an address table. The following examples use several simplifying conventions to hide irrelevant details:
The selection expression resides in register 2.
The case label constants begin at zero.
The case labels, the default, and the address table use assembly names .Lcasei, .Ldef and .Ltab respectively.
The GNU C compiler, and most recent compilers, support dynamic stack space allocation via alloca.
Figure 35 shows the stack frame before and after dynamic stack allocation. The local variables area is used for storage of function data, such as local variables, whose sizes are known to the compiler. This area is allocated at function entry and does not change in size or position during the function's activation.
The parameter list area holds "overflow" arguments passed in calls to other functions. (See the OTHER label in the Section called Parameter passing.) Its size is also known to the compiler and can be allocated along with the fixed frame area at function entry. However, the standard calling sequence requires that the parameter list area begin at a fixed offset (160) from the stack pointer, so this area must move when dynamic stack allocation occurs.
Data in the parameter list area are naturally addressed at constant offsets from the stack pointer. However, in the presence of dynamic stack allocation, the offsets from the stack pointer to the data in the local variables area are not constant. To provide addressability a frame pointer is established to locate the local variables area consistently throughout the function's activation.
Dynamic stack allocation is accomplished by "opening" the stack just above the parameter list area. The following steps show the process in detail:
After a new stack frame is acquired, and before the first dynamic space allocation, a new register, the frame pointer or FP, is set to the value of the stack pointer. The frame pointer is used for references to the function's local, non-static variables. The frame pointer does not change during the execution of a function, even though the stack pointer may change as a result of dynamic allocation.
The amount of dynamic space to be allocated is rounded up to a multiple of 8 bytes, so that 8-byte stack alignment is maintained.
The stack pointer is decreased by the rounded byte count, and the address of the previous stack frame (the back chain) may be stored at the word addressed by the new stack pointer. The back chain is not necessary to restore from this allocation at the end of the function since the frame pointer can be used to restore the stack pointer.
Figure 35 is a snapshot of the stack layout after the prolog code has dynamically extended the stack frame.
The above process can be repeated as many times as desired within a single function activation. When it is time to return, the stack pointer is set to the value of the back chain, thereby removing all dynamically allocated stack space along with the rest of the stack frame. Naturally, a program must not reference the dynamically allocated stack area after it has been freed.
Even in the presence of signals, the above dynamic allocation scheme is "safe." If a signal interrupts allocation, one of three things can happen:
The signal handler can return. The process then resumes the dynamic allocation from the point of interruption.
The signal handler can execute a non-local goto or a jump. This resets the process to a new context in a previous stack frame, automatically discarding the dynamic allocation.
The process can terminate.
Regardless of when the signal arrives during dynamic allocation, the result is a consistent (though possibly dead) process.
<<< Previous | Home | Next >>> |
Process initialization | Up | DWARF definition |