Coding examples

This section describes example code sequences for fundamental operations such as calling functions, accessing static objects, and transferring control from one part of a program to another. Previous sections discussed how a program may use the machine or the operating system, and they specified what a program may and may not assume about the execution environment. Unlike previous material, the information in this section illustrates how operations may be done, not how they must be done.

As before, examples use the ANSI C language. Other programming languages may use the same conventions displayed below, but failure to do so does not prevent a program from conforming to the ABI. Two main object code models are available:

Absolute code

Instructions can hold absolute addresses under this model. To execute properly, the program must be loaded at a specific virtual address, making the program's absolute addresses coincide with the process' virtual addresses.

Position-independent code

Instructions under this model hold relative addresses, not absolute addresses. Consequently, the code is not tied to a specific load address, allowing it to execute properly at various positions in virtual memory.

The following sections describe the differences between these models. When different, code sequences for the models appear together for easier comparison.

Note

The examples below show code fragments with various simplifications. They are intended to explain addressing modes, not to show optimal code sequences or to reproduce compiler output.

Code model overview

When the system creates a process image, the executable file portion of the process has fixed addresses and the system chooses shared object library virtual addresses to avoid conflicts with other segments in the process. To maximize text sharing, shared objects conventionally use position-independent code, in which instructions contain no absolute addresses. Shared object text segments can be loaded at various virtual addresses without having to change the segment images. Thus multiple processes can share a single shared object text segment, even if the segment resides at a different virtual address in each process.

Position-independent code relies on two techniques:

  • Control transfer instructions hold addresses relative to the current instruction address (CIA), or use registers that hold the transfer address. Because z/Architecture provides both CIA-relative branch instructions and branch instructions that take the transfer address from a register, compilers can satisfy this condition easily.

  • A Global Offset Table (GOT) provides information for address calculation. Position-independent object files (executable and shared object files) have a table in their data segment that holds addresses. When the system creates the memory image for an object file, the table entries are relocated to reflect the absolute virtual addresses assigned for an individual process. Because data segments are private to each process, the table entries can change, unlike text segments, which multiple processes share.
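The relocation of GOT entries described above can be pictured with a small C model. This is only a sketch under simplifying assumptions: the names `got_entry_t` and `relocate_got` are hypothetical, and every entry is assumed to hold a link-time offset to which the loader adds one per-process load bias. A real dynamic linker works from relocation records rather than adjusting every entry uniformly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one 64-bit address per GOT entry. */
typedef uint64_t got_entry_t;

/* Add the per-process load bias to every entry, so that each entry
 * holds an absolute virtual address valid for this process only. */
static void relocate_got(got_entry_t *got, size_t count, uint64_t load_bias)
{
    for (size_t i = 0; i < count; i++)
        got[i] += load_bias;
}
```

Because the table lives in the private data segment, two processes can apply different biases to their own copies while sharing the same unmodified text.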

Two position-independent models give programs a choice between more efficient code with some size restrictions and less efficient code without those restrictions. Because of the processor architecture, a GOT with no more than 512 entries (4096 bytes) is more efficient than a larger one. Programs that need more entries must use the larger, more general code. In the following sections, the term "small model position-independent code" is used to refer to code that assumes the smaller GOT, and "large model position-independent code" is used to refer to the general code.
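The 512-entry figure follows from the entry size: 4096 bytes divided by 8 bytes per address. A minimal C sketch of that check is shown below. The names `GOT_ENTRY_SIZE`, `SMALL_GOT_MAX_BYTES` and `got_fits_small_model` are hypothetical, and the 8-byte entry size is an assumption implied by the numbers above.

```c
#include <stdbool.h>
#include <stddef.h>

enum {
    GOT_ENTRY_SIZE      = 8,    /* assumed: one 64-bit address per entry */
    SMALL_GOT_MAX_BYTES = 4096  /* size limit quoted for the small model */
};

/* True if a GOT with n_entries entries stays within the small model. */
static bool got_fits_small_model(size_t n_entries)
{
    return n_entries <= SMALL_GOT_MAX_BYTES / GOT_ENTRY_SIZE;
}
```

A toolchain could use such a test to decide whether the small model is still usable for a given module.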

Function prolog and epilog

This section describes the prolog and epilog code of functions. A function's prolog establishes a stack frame, if necessary, and may save any nonvolatile registers it uses. A function's epilog generally restores registers that were saved in the prolog code, restores the previous stack frame, and returns to the caller.

Prolog

The prolog of a function has to save the state of the calling function and set up the base register for the code of the function body. In general, the function prolog does the following:

  • Save all registers used within the function which the calling function assumes to be non-volatile.

  • Set up the base register for the literal pool.

  • Allocate stack space by decrementing the stack pointer.

  • Set up the dynamic chain by storing the old stack pointer value at stack location zero if the "back chain" is implemented.

  • Set up the GOT pointer if the compiler is generating position-independent code.

    (A position-independent function will usually load a pointer to the GOT into a nonvolatile register. This may be omitted if the function makes no external data references. If external data references are made only within conditional code, loading the GOT pointer may be deferred until it is known to be needed.)

  • Set up the frame pointer if the function allocates stack space dynamically (with alloca).

The compiler tries to do as little as possible of the above; the ideal case is to do nothing at all (for a leaf function without symbolic references).

Epilog

The epilog of a function restores the registers saved in the prolog (which include the stack pointer) and branches to the return address.

Prolog and epilog example

.section  .rodata
          .align 2
.LC0:
          .string "hello, world\n"
.text
          .align 4
.globl    main
          .type  main,@function
main:
                                       # Prolog
          STMG   11,15,88(15)          # Save caller's registers
          LARL   13,.LT0_0             # Load literal pool pointer
.section  .rodata                      # Switch for literal pool
          .align 2                     #  to read-only data section
.LT0_0:
.LC2:
          .quad  65536
.LTN0_0:
.text                                  # Back to text section
          LGR    1,15                  # Load stack pointer in GPR 1
          AGHI   15,-160               # Allocate stack space
          STG    1,0(15)               # Store backchain
                                       # Prolog end
          LARL   2,.LC0
          LG     3,.LC2-.LT0_0(13)
          BRASL  14,printf
          LGHI   2,0
                                       # Epilog
          LG     4,272(15)             # Load return address
          LMG    11,15,248(15)         # Restore registers
          BR     4                     # Branch back to caller
                                       # Epilog end

Figure 24. Prolog and epilog example
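The displacements used in the epilog follow from the prolog. The sketch below assumes, as the STMG operand in the figure suggests (88 for register 11), that general register n is saved at offset 8*n from the caller's stack pointer. Once the frame has been allocated, the same slot lies frame_size bytes higher relative to the new stack pointer. The function name `saved_gpr_offset` is hypothetical.

```c
#include <stdint.h>

/* Offset of the save slot for general register `reg`, relative to the
 * callee's stack pointer after it has allocated `frame_size` bytes.
 * Assumes the slot sits at 8 * reg from the caller's stack pointer. */
static int64_t saved_gpr_offset(unsigned reg, int64_t frame_size)
{
    return (int64_t)reg * 8 + frame_size;
}
```

With the 160-byte frame of the example, saved_gpr_offset(11, 160) gives 248 and saved_gpr_offset(14, 160) gives 272, which are the displacements used by LMG and LG in the epilog.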

Profiling

This section shows a way of providing profiling (entry counting) on zSeries systems. An ABI-conforming system is not required to provide profiling; however, if it does, this is one possible (not required) implementation.

If a function is to be profiled, it has to call the _mcount routine after the function prolog. This routine has a special linkage. It gets an address in register 1 and returns without having changed any register. The address is a pointer to a word-aligned one-word static data area, initialized to zero, in which the _mcount routine is to maintain a count of the number of times the function is called.

For example, Figure 25 shows how the code after the function prolog may look.

          STMG    7,15,56(15)          # Save caller's registers
          LGR     1,15                 # Stack pointer
          AGHI    15,-160              # Allocate new
          STG     1,0(15)              # Save backchain
          LGR     11,15                # Local stack pointer
          .data
          .align 4
.LP0:     .quad   0                    # Profile counter
          .text
                                       # Function profiler
          STG   14,8(15)               # Preserve r14
          LARL  1,.LP0                 # Load address of profile counter
          BRASL 14,_mcount             # Branch to _mcount
          LG    14,8(15)               # Restore r14

Figure 25. Code for profiling
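In C terms, the counting that _mcount is described as performing amounts to the following. This is a sketch only: `count_entry`, `profile_counter` and `profiled_function` are hypothetical stand-ins, the counter is modeled as a 64-bit value to match the .quad in the figure, and the special linkage (address in register 1, all registers preserved) cannot be expressed in portable C.

```c
#include <stdint.h>

/* Stand-in for the per-function profile counter, initialized to zero. */
static uint64_t profile_counter = 0;

/* Hypothetical stand-in for the counting step of _mcount. */
static void count_entry(uint64_t *counter)
{
    ++*counter;
}

/* A profiled function performs this step right after its prolog. */
static void profiled_function(void)
{
    count_entry(&profile_counter);
    /* ... function body ... */
}
```

Each profiled function would have its own counter, so the counts can be attributed per function.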

Data objects

This section describes only objects with static storage duration. It excludes stack-resident objects because programs always compute their virtual addresses relative to the stack or frame pointers.

Because zSeries instructions cannot hold 64-bit addresses directly, a program has to build an address in a register and access memory through that register. To do so, a function normally has a literal pool that holds the addresses of the data objects it uses. Register 13 is set up in the function prolog to point to the start of this literal pool.

Position-independent code cannot contain absolute addresses. To access a local symbol, the literal pool contains the (signed) offset of the symbol relative to the start of the pool. Adding the offset loaded from the literal pool to the address in register 13 gives the absolute address of the local symbol. In the case of a global symbol, the address of the symbol has to be loaded from the Global Offset Table. The offset in the GOT can be contained either in the instruction itself or in the literal pool. See Figure 26 for an example.

Figure 26 through Figure 28 show sample assembly language equivalents to C language code for absolute and position-independent compilations. It is assumed that all shared objects are compiled as position-independent and only executable modules may have absolute addresses. The code in the figures contains many redundant operations as it is only intended to show how each C statement could have been compiled independently of its context. The function prolog is not shown, and it is assumed that it has loaded the address of the literal pool in register 13.

Table 15.

C                    zSeries machine instructions (Assembler)

extern int src;
extern int dst;
extern int *ptr;

dst = src;                      LARL  1,dst
                                LARL  2,src
                                MVC   0(4,1),0(2)

ptr = &dst;                     LARL  1,ptr
                                LARL  2,dst
                                STG   2,0(1)

*ptr = src;                     LARL  2,ptr
                                LG    1,0(2)
                                LARL  2,src
                                MVC   0(4,1),0(2)

Figure 26. Absolute addressing

Table 16.

C                    zSeries machine instructions (Assembler)

extern int src;
extern int dst;
extern int *ptr;

dst = src;                      LARL  12,_GLOBAL_OFFSET_TABLE_
                                LG    1,dst@GOT12(12)
                                LG    2,src@GOT12(12)
                                LGF   3,0(2)
                                ST    3,0(1)

ptr = &dst;                     LARL  12,_GLOBAL_OFFSET_TABLE_
                                LG    1,ptr@GOT12(12)
                                LG    2,dst@GOT12(12)
                                STG   2,0(1)

*ptr = src;                     LARL  12,_GLOBAL_OFFSET_TABLE_
                                LG    2,ptr@GOT12(12)
                                LG    1,0(2)
                                LG    2,src@GOT12(12)
                                LGF   3,0(2)
                                ST    3,0(1)

Figure 27. Small model position-independent addressing

Table 17.

C                    zSeries Assembler

extern int src;
extern int dst;
extern int *ptr;

dst = src;                      LARL  2,dst@GOT
                                LG    2,0(2)
                                LARL  3,src@GOT
                                LG    3,0(3)
                                MVC   0(4,2),0(3)

ptr = &dst;                     LARL  2,ptr@GOT
                                LG    2,0(2)
                                LARL  3,dst@GOT
                                LG    3,0(3)
                                STG   3,0(2)

*ptr = src;                     LARL  2,ptr@GOT
                                LG    2,0(2)
                                LG    1,0(2)
                                LARL  3,src@GOT
                                LG    3,0(3)
                                MVC   0(4,1),0(3)

Figure 28. Large model position-independent addressing
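The extra level of indirection in the position-independent figures can be mirrored in C. The sketch below is a model, not compiler output: the GOT is represented as an array of pointers, and the index names (`GOT_SRC`, `GOT_DST`, `GOT_PTR`) and function names are hypothetical.

```c
/* Hypothetical GOT slots for the three external objects. */
enum { GOT_SRC, GOT_DST, GOT_PTR, GOT_COUNT };

/* dst = src;  both addresses are fetched from the GOT first. */
static void pic_copy(void *const got[])
{
    int *dst = got[GOT_DST];
    const int *src = got[GOT_SRC];
    *dst = *src;
}

/* ptr = &dst;  the address of dst is itself a GOT entry. */
static void pic_take_address(void *const got[])
{
    int **ptr = got[GOT_PTR];
    *ptr = got[GOT_DST];
}

/* *ptr = src;  one load for the GOT entry, one for ptr's value. */
static void pic_store_through(void *const got[])
{
    int **ptr = got[GOT_PTR];
    const int *src = got[GOT_SRC];
    **ptr = *src;
}
```

In the absolute model the first load in each function disappears, because the address of the object is formed directly by the instruction.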

Function calls

Programs can use the z/Architecture BRASL instruction to make direct function calls. A BRASL instruction has a self-relative branch displacement that can reach 4 Gbytes in either direction. To call functions beyond this limit (inter-module calls), load the address into a register and use the BASR instruction for the call. Register 14 is used as the first operand of BASR to hold the return address, as shown in Figure 29.

The called function may be in the same module (executable or shared object) as the caller, or it may be in a different module. In the former case, if the called function is not in a shared object, the linkage editor resolves the symbol. In all other cases the linkage editor cannot directly resolve the symbol. Instead the linkage editor generates "glue" code and resolves the symbol to point to the glue code. The dynamic linker will provide the real address of the function in the Global Offset Table. The glue code loads this address and branches to the function itself. See the Section called Procedure Linkage Table in the Chapter called Program loading and dynamic linking for more details.
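The role of the glue code can be sketched in C as an indirect call through a table slot. All names here are hypothetical. The variable `got_slot_func` stands in for the GOT entry that the dynamic linker fills in, and the sketch says nothing about the actual stub layout, which is covered in the Procedure Linkage Table section.

```c
typedef void (*func_ptr)(void);

static int call_count = 0;

/* Stand-in for the real function, which lives in another module. */
static void real_func(void)
{
    call_count++;
}

/* Stand-in for the GOT slot. It is initialized statically here, whereas
 * the dynamic linker would store the real address at run time. */
static func_ptr got_slot_func = real_func;

/* Glue: load the address from the slot and transfer control to it.
 * References to the function are resolved to this stub instead. */
static void func_glue(void)
{
    got_slot_func();
}
```

The caller's code never changes. Only the data slot differs from process to process, which keeps the text segment shareable.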

Table 18.

C                         zSeries machine instructions (Assembler)

extern void func();
extern void (*ptr)();

ptr = func;                     LARL  1,ptr
                                LARL  2,func
                                STG   2,0(1)

func();                         BRASL 14,func

(*ptr) ();                      LARL  1,ptr
                                LG    1,0(1)
                                BASR  14,1

Figure 29. Absolute function call

Table 19.

C                         zSeries machine instructions (Assembler)

extern void func();
extern void (*ptr)();

ptr = func;                     LARL  12,_GLOBAL_OFFSET_TABLE_
                                LG    1,ptr@GOT12(12)
                                LG    2,func@GOT12(12)
                                STG   2,0(1)

func();                         BRASL 14,func@PLT

(*ptr) ();                      LARL  12,_GLOBAL_OFFSET_TABLE_
                                LG    1,ptr@GOT12(12)
                                LG    1,0(1)
                                BASR  14,1

Figure 30. Small model position-independent function call

Table 20.

C                         zSeries machine instructions (Assembler)

extern void func();
extern void (*ptr)();

ptr = func;                     LARL  2,ptr@GOT
                                LG    2,0(2)
                                LARL  3,func@GOT
                                LG    3,0(3)
                                STG   3,0(2)

func();                         BRASL 14,func@PLT

(*ptr) ();                      LARL  2,ptr@GOT
                                LG    2,0(2)
                                LG    2,0(2)
                                BASR  14,2

Figure 31. Large model position-independent function call

Branching

Programs use branch instructions to control their execution flow. z/Architecture has a variety of branch instructions. The most commonly used of these performs a self-relative jump with a 128-Kbyte range (up to 64 Kbytes in either direction). For large functions, another self-relative jump is available whose 32-bit halfword displacement reaches up to 4 Gbytes in either direction, the same reach as BRASL.

Table 21.

C                         zSeries machine instructions (Assembler)

label:                    .L01:
        ...                          ...
        goto label;                  BRC 15,.L01
        ...                          ...
        ...                          ...
        ...                          ...
farlabel:                 .L02:
        ...                          ...
        ...                          ...
        ...                          ...
        goto farlabel;               BRCL 15,.L02

Figure 32. Branch instruction
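A compiler or assembler choosing between the two forms needs a range test. The sketch below assumes that the short form encodes its displacement as a signed 16-bit count of halfwords, which matches the 64-Kbyte reach stated above. The function name `fits_short_branch` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a byte displacement can be encoded in the short relative
 * branch: it must be even and, counted in halfwords, must fit in a
 * signed 16-bit field. */
static bool fits_short_branch(int64_t byte_offset)
{
    if (byte_offset % 2 != 0)
        return false;

    int64_t halfwords = byte_offset / 2;
    return halfwords >= INT16_MIN && halfwords <= INT16_MAX;
}
```

When the test fails, the long form (BRCL in the figure) is the fallback.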

C language switch statements provide multi-way selection. When the case labels of a switch statement satisfy grouping constraints, the compiler implements the selection with an address table. The following examples use several simplifying conventions to hide irrelevant details:

  1. The selection expression resides in register 2.

  2. The case label constants begin at zero.

  3. The case labels, the default, and the address table use assembly names .Lcasei, .Ldef and .Ltab respectively.

Table 22.

C                   zSeries machine instructions (Assembler)

switch(j)                      LGHI  1,3
{                              CLGR  2,1
case 0:                        BRC   2,.Ldef
        ...                    SLLG  2,2,3
case 1:                        LARL  1,.Ltab
        ...                    LG    3,0(1,2)
case 3:                        BR    3
        ...         .Ltab:     .quad .Lcase0
default:                       .quad .Lcase1
}                              .quad .Ldef
                               .quad .Lcase3

Figure 33. Absolute switch code

Table 23.

C                   zSeries machine instructions (Assembler)

switch(j)                                         # Literal pool
{                   .LT0:
case 0:
        ...                                       # Code
case 1:                         LGHI  1,3
        ...                     CLGR  2,1
case 3:                         BRC   2,.Ldef
        ...                     SLLG  2,2,3
default:                        LARL  1,.Ltab
}                               LG    3,0(1,2)
                                AGR   3,13
                                BR    3
                    .Ltab:      .quad .Lcase0-.LT0
                                .quad .Lcase1-.LT0
                                .quad .Ldef-.LT0
                                .quad .Lcase3-.LT0

Figure 34. Position-independent switch code, all models
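The table-driven selection in the switch figures can be mirrored in C with an array of function pointers. This is a sketch with hypothetical handler names and arbitrary return values. C function pointers hide the difference between the two variants: the absolute table stores addresses, while the position-independent table stores offsets that are added to a base register before the branch.

```c
typedef int (*case_handler)(void);

static int on_case0(void)   { return 10; }
static int on_case1(void)   { return 11; }
static int on_case3(void)   { return 13; }
static int on_default(void) { return -1; }

static int dispatch(long j)
{
    /* Index 2 has no case label, so its slot holds the default. */
    static const case_handler table[4] = {
        on_case0, on_case1, on_default, on_case3
    };

    /* Unsigned comparison, like CLGR: negative values compare high
     * and therefore go to the default. */
    if ((unsigned long)j > 3UL)
        return on_default();

    return table[j]();
}
```

The single unsigned comparison replaces separate lower and upper bound checks, which is why the assembler sequence needs only one compare and one conditional branch before indexing the table.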

Dynamic stack space allocation

The GNU C compiler, and most recent compilers, support dynamic stack space allocation via alloca.

Figure 35 shows the stack frame before and after dynamic stack allocation. The local variables area is used for storage of function data, such as local variables, whose sizes are known to the compiler. This area is allocated at function entry and does not change in size or position during the function's activation.

The parameter list area holds "overflow" arguments passed in calls to other functions. (See the OTHER label in the Section called Parameter passing.) Its size is also known to the compiler and can be allocated along with the fixed frame area at function entry. However, the standard calling sequence requires that the parameter list area begin at a fixed offset (160) from the stack pointer, so this area must move when dynamic stack allocation occurs.

Data in the parameter list area are naturally addressed at constant offsets from the stack pointer. However, in the presence of dynamic stack allocation, the offsets from the stack pointer to the data in the local variables area are not constant. To provide addressability a frame pointer is established to locate the local variables area consistently throughout the function's activation.

Dynamic stack allocation is accomplished by "opening" the stack just above the parameter list area. The following steps show the process in detail:

  1. After a new stack frame is acquired, and before the first dynamic space allocation, a new register, the frame pointer or FP, is set to the value of the stack pointer. The frame pointer is used for references to the function's local, non-static variables. The frame pointer does not change during the execution of a function, even though the stack pointer may change as a result of dynamic allocation.

  2. The amount of dynamic space to be allocated is rounded up to a multiple of 8 bytes, so that 8-byte stack alignment is maintained.

  3. The stack pointer is decreased by the rounded byte count, and the address of the previous stack frame (the back chain) may be stored at the word addressed by the new stack pointer. The back chain is not necessary to restore from this allocation at the end of the function since the frame pointer can be used to restore the stack pointer.
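The arithmetic in the steps above can be sketched in C. The function names are hypothetical, and addresses are modeled as plain 64-bit integers. The last helper assumes that the newly opened block starts directly above the area that must stay at a fixed offset from the stack pointer, that is, 160 bytes plus the size of the parameter list area.

```c
#include <stdint.h>

/* Round a byte count up to the next multiple of 8 (step 2). */
static uint64_t round_up_8(uint64_t n)
{
    return (n + 7u) & ~(uint64_t)7u;
}

/* New stack pointer after dynamically allocating `size` bytes (step 3). */
static uint64_t sp_after_alloca(uint64_t sp, uint64_t size)
{
    return sp - round_up_8(size);
}

/* Address of the newly opened block. `fixed_low_area` is assumed to be
 * 160 plus the size of the parameter list area, which must remain at a
 * fixed offset from the stack pointer. */
static uint64_t dynamic_block_address(uint64_t new_sp, uint64_t fixed_low_area)
{
    return new_sp + fixed_low_area;
}
```

For example, a request for 13 bytes is rounded to 16, so a stack pointer of 4096 becomes 4080. The frame pointer set in step 1 is untouched, which is what keeps local variables addressable.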

Figure 35 is a snapshot of the stack layout after the prolog code has dynamically extended the stack frame.

Figure 35. Dynamic Stack Space Allocation

The above process can be repeated as many times as desired within a single function activation. When it is time to return, the stack pointer is set to the value of the back chain, thereby removing all dynamically allocated stack space along with the rest of the stack frame. Naturally, a program must not reference the dynamically allocated stack area after it has been freed.

Even in the presence of signals, the above dynamic allocation scheme is "safe." If a signal interrupts allocation, one of three things can happen:

Regardless of when the signal arrives during dynamic allocation, the result is a consistent (though possibly dead) process.