Appendix J .gdb_index section format

This section documents the index section that is created by the save gdb-index command (see Index Files). The index section is DWARF-specific; some knowledge of DWARF is assumed in this description.

The mapped index file format is designed so that it can be mapped directly into memory with mmap on any architecture. In most cases, a datum is represented using a little-endian 32-bit integer value, called an offset_type. Big-endian hosts must byte-swap these values before using them. Exceptions to this rule are noted below. The data is laid out such that alignment is always respected.

A mapped index consists of several areas, laid out in order.

  1. The file header. This is a sequence of values, of offset_type unless otherwise noted:
    1. The version number, currently 6. Versions 1, 2 and 3 are obsolete. Version 4 uses a different hashing function from versions 5 and 6. Version 6 includes symbols for inlined functions, whereas versions 4 and 5 do not. gdb will only read version 4 and 5 indices if the --use-deprecated-index-sections option is used.
    2. The offset, from the start of the file, of the CU list.
    3. The offset, from the start of the file, of the types CU list. Note that this area can be empty, in which case this offset is equal to the offset of the address area.
    4. The offset, from the start of the file, of the address area.
    5. The offset, from the start of the file, of the symbol table.
    6. The offset, from the start of the file, of the constant pool.
  2. The CU list. This is a sequence of pairs of 64-bit little-endian values, sorted by the CU offset. The first element in each pair is the offset of a CU in the .debug_info section. The second element in each pair is the length of that CU. References to a CU elsewhere in the map are made using a CU index, which is simply the 0-based index into this table. Note that if there are type CUs, then conceptually CUs and type CUs form a single list for the purposes of CU indices.
  3. The types CU list. This is a sequence of triplets of 64-bit little-endian values. In a triplet, the first value is the CU offset, the second value is the type offset in the CU, and the third value is the type signature. The types CU list is not sorted.
  4. The address area. The address area consists of a sequence of address entries. Each address entry has three elements:
    1. The low address. This is a 64-bit little-endian value.
    2. The high address. This is a 64-bit little-endian value. Like DW_AT_high_pc, the value is one byte beyond the end.
    3. The CU index. This is an offset_type value.
  5. The symbol table. This is an open-addressed hash table. The size of the hash table is always a power of 2.

    Each slot in the hash table consists of a pair of offset_type values. The first value is the offset of the symbol's name in the constant pool. The second value is the offset of the CU vector in the constant pool.

    If both values are 0, then this slot in the hash table is empty. This is safe because, although 0 is a valid constant pool offset, it cannot be the offset of both a string and a CU vector.

    The hash value for a table entry is computed by applying an iterative hash function to the symbol's name. Starting with an initial value of r = 0, each (unsigned) character `c' in the string is incorporated into the hash using one of the following formulas, depending on the index version:

    Version 4
    The formula is r = r * 67 + c - 113.
    Versions 5 and 6
    The formula is r = r * 67 + tolower (c) - 113.

    The terminating `\0' is not incorporated into the hash.

    The step size used in the hash table is computed via ((hash * 17) & (size - 1)) | 1, where `hash' is the hash value, and `size' is the size of the hash table. The step size is used to find the next candidate slot when handling a hash collision.

    The names of C++ symbols in the hash table are canonicalized. We don't currently have a simple description of the canonicalization algorithm; if you intend to create new index sections, you must read the code.

  6. The constant pool. This is an unstructured block of bytes, organized so that alignment is correct: CU vectors are stored first, followed by strings.

    A CU vector in the constant pool is a sequence of offset_type values. The first value is the number of CU indices in the vector. Each subsequent value is the index of a CU in the CU list. A symbol's hash table slot points to a CU vector, which lists the CUs that define the symbol.

    A string in the constant pool is zero-terminated.
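The header can be decoded with byte-wise little-endian reads, which give the same result on any host and so make an explicit byte swap unnecessary. The following C sketch parses the six header fields, applies the version rules described above, and derives the number of entries in each area. It assumes that each area extends up to the offset of the next one (the areas are laid out in order) and uses the entry sizes given above: 16 bytes per CU, 24 per type CU, 20 per address entry, and 8 per hash table slot. The names parse_header and struct gdb_index_header, and the allow_deprecated flag (standing in for --use-deprecated-index-sections), are hypothetical and are not gdb's own.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a little-endian offset_type byte by byte; the result is the same
   on big- and little-endian hosts, so no explicit swap is needed. */
static uint32_t read_u32le(const unsigned char *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Hypothetical in-memory form of the six header fields. */
struct gdb_index_header
{
  uint32_t version;
  uint32_t cu_list;        /* offset of the CU list */
  uint32_t types_cu_list;  /* offset of the types CU list */
  uint32_t address_area;   /* offset of the address area */
  uint32_t symbol_table;   /* offset of the symbol table */
  uint32_t constant_pool;  /* offset of the constant pool */
};

/* Parse and sanity-check the header.  Returns 0 on success, -1 otherwise.
   ALLOW_DEPRECATED (hypothetical) plays the role of
   --use-deprecated-index-sections. */
static int parse_header(const unsigned char *buf, size_t len,
                        int allow_deprecated, struct gdb_index_header *h)
{
  uint32_t f[6];

  if (len < sizeof f)
    return -1;
  for (size_t i = 0; i < 6; i++)
    f[i] = read_u32le(buf + 4 * i);

  if (f[0] < 4 || f[0] > 6)            /* versions 1-3 are obsolete */
    return -1;
  if (f[0] < 6 && !allow_deprecated)   /* 4 and 5 need the option */
    return -1;
  if (f[1] < sizeof f)                 /* CU list cannot overlap header */
    return -1;
  for (size_t i = 1; i < 5; i++)       /* areas are laid out in order */
    if (f[i] > f[i + 1])
      return -1;
  if (f[5] > len)
    return -1;

  h->version = f[0];
  h->cu_list = f[1];
  h->types_cu_list = f[2];
  h->address_area = f[3];
  h->symbol_table = f[4];
  h->constant_pool = f[5];
  return 0;
}

/* Entry counts, assuming each area ends where the next one begins. */
static uint32_t num_cus(const struct gdb_index_header *h)
{
  return (h->types_cu_list - h->cu_list) / 16;      /* two 64-bit values */
}

static uint32_t num_type_cus(const struct gdb_index_header *h)
{
  return (h->address_area - h->types_cu_list) / 24; /* three 64-bit values */
}

static uint32_t num_address_entries(const struct gdb_index_header *h)
{
  return (h->symbol_table - h->address_area) / 20;  /* 8 + 8 + 4 bytes */
}

static uint32_t num_symbol_slots(const struct gdb_index_header *h)
{
  return (h->constant_pool - h->symbol_table) / 8;  /* two offset_type */
}
```

An empty types CU list shows up naturally here: its offset equals the address area offset, so num_type_cus returns 0.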
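Finding the CU that covers a given address is a range search over the address area. Because the high address is exclusive, like DW_AT_high_pc, an address matches an entry when low <= addr < high. The C sketch below scans the 20-byte entries linearly, since the format description does not say they are sorted; find_cu_for_address and its -1 "not found" result are hypothetical choices for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-wise little-endian reads, independent of host byte order and of
   the alignment of P. */
static uint64_t read_u64le(const unsigned char *p)
{
  uint64_t v = 0;
  for (int i = 7; i >= 0; i--)
    v = (v << 8) | p[i];
  return v;
}

static uint32_t read_u32le(const unsigned char *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Return the CU index whose range contains ADDR, or -1 if none does.
   Each entry is: low (8 bytes), high (8 bytes, exclusive), CU index (4). */
static int64_t find_cu_for_address(const unsigned char *area,
                                   size_t area_len, uint64_t addr)
{
  for (size_t off = 0; off + 20 <= area_len; off += 20)
    {
      uint64_t low = read_u64le(area + off);
      uint64_t high = read_u64le(area + off + 8);

      if (addr >= low && addr < high)
        return read_u32le(area + off + 16);
    }
  return -1;
}
```

A reader that performs many lookups could instead sort the entries once and binary-search them.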
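The hash function and the probe step translate directly into C. This sketch assumes that, like offset_type, the hash is a 32-bit unsigned value that wraps on overflow, so the arithmetic is done in uint32_t; tolower is applied only for versions 5 and 6, and the sketch assumes the "C" locale so that only ASCII letters are folded. The names gdb_index_hash and gdb_index_step are hypothetical.

```c
#include <ctype.h>
#include <stdint.h>

/* Hash NAME for an index of the given VERSION.  The terminating '\0' is
   not included.  Unsigned arithmetic wraps modulo 2^32. */
static uint32_t gdb_index_hash(const char *name, int version)
{
  uint32_t r = 0;

  for (const unsigned char *p = (const unsigned char *) name; *p != '\0'; p++)
    {
      uint32_t c = *p;
      if (version >= 5)
        c = (uint32_t) tolower((int) c);   /* versions 5 and 6 fold case */
      r = r * 67 + c - 113;
    }
  return r;
}

/* Probe step for a table of SIZE slots; SIZE must be a power of 2.
   The final "| 1" makes the step odd. */
static uint32_t gdb_index_step(uint32_t hash, uint32_t size)
{
  return ((hash * 17) & (size - 1)) | 1;
}
```

Because the step is always odd and the table size is a power of 2, the step is coprime with the size, so repeated probing visits every slot before returning to the starting one.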
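Putting the symbol table and constant pool together, a lookup hashes the name, probes the table with the odd step until it finds either a matching name or an empty slot, and then reads the CU vector from the constant pool. This C sketch assumes a version 5 or 6 index, a case-sensitive name comparison, and a starting slot of hash & (size - 1); the starting slot is an assumption, since the text above specifies only the step. The names lookup_symbol and name_hash are hypothetical, and the returned CU indices are left in little-endian form.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Little-endian offset_type read, independent of host byte order. */
static uint32_t read_u32le(const unsigned char *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Version 5/6 hash: r = r * 67 + tolower (c) - 113, wrapping at 32 bits. */
static uint32_t name_hash(const char *name)
{
  uint32_t r = 0;
  for (const unsigned char *p = (const unsigned char *) name; *p; p++)
    r = r * 67 + (uint32_t) tolower(*p) - 113;
  return r;
}

/* Look up NAME in a symbol table of NSLOTS slots (a power of 2).  On
   success, store the CU count in *N and return a pointer to the first CU
   index of the vector; return NULL if the name is absent. */
static const unsigned char *lookup_symbol(const unsigned char *symtab,
                                          uint32_t nslots,
                                          const unsigned char *pool,
                                          const char *name, uint32_t *n)
{
  uint32_t mask = nslots - 1;
  uint32_t hash = name_hash(name);
  uint32_t slot = hash & mask;               /* assumed starting slot */
  uint32_t step = ((hash * 17) & mask) | 1;

  for (uint32_t probes = 0; probes < nslots; probes++)
    {
      uint32_t name_off = read_u32le(symtab + 8 * slot);
      uint32_t vec_off = read_u32le(symtab + 8 * slot + 4);

      if (name_off == 0 && vec_off == 0)
        return NULL;                         /* empty slot: not present */
      if (strcmp((const char *) pool + name_off, name) == 0)
        {
          *n = read_u32le(pool + vec_off);   /* first value: CU count */
          return pool + vec_off + 4;         /* followed by CU indices */
        }
      slot = (slot + step) & mask;           /* collision: next candidate */
    }
  return NULL;                               /* every slot checked */
}
```

Stopping at the first empty slot is what makes the "both values are 0" convention necessary: without it, a lookup for an absent name could not terminate early.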