C/C++ Development Suite

Summary of changes from 5.57 to 5.60

  • Added support for switch on a long long expression.
  • Added complex and imaginary number support (including C99 Annex G).
  • _Bool (and _Complex/_Imaginary) now available in pcc and C90 modes.
  • MLA instruction used more often, and handled more intelligently.
  • CSE aliasing optimisations take account of "restrict" qualifier.
  • Optimised handling of narrow (<32 bit) data and computations.
  • Will use ARMv5E's SMULxy and SMLAxy instructions where appropriate.
  • Improved checking of printf/scanf format strings.
  • Inlines signbit().
  • Improved compilation of (int) longlong, (int) longlong >> 32, and longlong >> 1 or << 1. Also, long long multiplication and division by powers of two transformed into shifts.
  • Can transform integer division by constant into a 32x32->64 multiplication (if available on CPU, and -Otime selected).
  • Pointer subtraction optimised to use only multiply and/or shift.
  • Added inter-statement compile-time evaluation of long long arithmetic.
  • Improved CSE handling, especially for FP constants and comparisons.
  • IEEE 754 conformance improved; generally edging the compiler closer to C99 Annex F.
  • asm keyword recognised in C++ mode.
  • -arch command-line parameter added.
  • Some improvements to treatment of volatile objects.
  • Changes to handling of floating arguments for -apcs /fpregargs.
  • Numerous code generation improvements and bug fixes.
  • Banner and help now sent to stderr instead of stdout.
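
To illustrate the division-by-constant item in the list above, here is a hand-written sketch of the general technique: a division by 10 expressed as one 32x32->64 multiplication followed by a shift. The function name div10_by_multiply is made up for this example, and the constants are the well-known ones for unsigned division by 10; the notes above do not say which constants or instruction sequence the compiler itself chooses.

```c
#include <stdint.h>

/* Unsigned division by 10 without a divide instruction.
   0xCCCCCCCD is ceil(2^35 / 10); multiplying by it and shifting
   right by 35 gives x / 10 for every 32-bit unsigned x. */
uint32_t div10_by_multiply(uint32_t x)
{
    uint64_t product = (uint64_t)x * 0xCCCCCCCDu;  /* 32x32->64 multiply */
    return (uint32_t)(product >> 35);              /* quotient is in the top bits */
}
```

On a CPU with a long multiply instruction this costs a multiply and a shift, which is why the transformation is only worthwhile when such an instruction is available.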

Extra notes

restrict

Previously the restrict qualifier was recognised, but didn't affect code generation. As of version 5.59, restrict will lead to improved code in some circumstances. See the separate document for detailed examples.
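
As a rough illustration of the kind of code where restrict can matter, consider the pair of functions below. The names scale_plain and scale_restrict are invented for this sketch, and whether a particular compiler version improves this exact loop is an assumption, not something stated above.

```c
#include <stddef.h>

/* Without restrict the compiler must assume that a store to dst[i]
   might change *scale, so *scale is reloaded on every iteration. */
void scale_plain(int *dst, const int *src, const int *scale, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = src[i] * *scale;
}

/* With restrict the caller promises that the three pointers do not
   overlap, so the compiler is free to load *scale once, outside the loop. */
void scale_restrict(int *restrict dst, const int *restrict src,
                    const int *restrict scale, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = src[i] * *scale;
}
```

Both functions give the same results when the arguments really do not overlap; calling the restrict version with overlapping arguments is undefined behaviour.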

SMULxy/SMLAxy

These instructions, new in architecture 5TE, provide the ability to work on 16-bit signed numbers packed into 32-bit words, and are of particular use in certain signal processing applications (such as video decompressors).

The compiler can now generate these instructions for multiplications of narrow, signed values. (Previously they could only be accessed from inline assembler.)

For example,

      int mul1(short *a)
      {
          return a[1]*a[2] + a[3]*a[0];
      }

will compile as:

   mul1
        LDRH     a2,[a1,#2]
        LDRH     a3,[a1,#4]
        SMULBB   a2,a2,a3
        LDRH     a3,[a1,#6]
        LDRH     a1,[a1,#0]
        SMLABB   a1,a3,a1,a2
        MOV      pc,lr

As this example illustrates, the compiler actually loads shorts into separate registers, limiting its ability to take full advantage of the unpacking ability. To give the compiler a hint, you can either manually unpack the values from ints (using masks and shifts), or use bitfields.

For example:

    struct hl
    {
        signed int l:16,h:16;
    };

    int mul2(struct hl *a)
    {
        return a[0].h*a[1].l + a[1].h*a[0].l;
    }

will compile as:

   mul2
        LDR      a2,[a1,#0]
        LDR      a1,[a1,#4]
        SMULTB   a3,a2,a1
        SMLATB   a1,a1,a2,a3
        MOV      pc,lr

Such packing may result in reduced performance with other operations.
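
The other approach mentioned above, manual unpacking with masks and shifts, might be sketched as follows. The names lo16, hi16 and mul3 are invented for this example, and it assumes the low half of each word holds the value that the bitfield version calls l. These notes do not confirm which instructions a given compiler version generates for this form.

```c
#include <stdint.h>

/* Extract the signed 16-bit halves of a 32-bit word. The conversion to
   int16_t of a value above 32767 is implementation-defined in C, but it
   wraps to the expected negative value on common compilers. */
static int lo16(uint32_t w) { return (int16_t)(w & 0xFFFFu); }
static int hi16(uint32_t w) { return (int16_t)(w >> 16); }

/* Same arithmetic as mul2, on words that each hold two packed shorts. */
int mul3(const uint32_t *a)
{
    return hi16(a[0]) * lo16(a[1]) + hi16(a[1]) * lo16(a[0]);
}
```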

-apcs /fpregargs

The calling convention for /fpregargs has been changed to align better with ARM's later tools. Note that /fpregargs is not normally used under RISC OS. The changes are:
  1. FP arguments are never passed in FP registers for variadic functions. Instead they're passed in integer registers or on the stack. In -pcc mode, FP registers are not used unless the first argument to a function is floating. (See ARM's ATPCS documentation for an explanation of this heuristic.)
  2. Homogeneous floating-point structure arguments are now passed in floating-point registers. For example, a structure consisting of 3 floats (only) will be passed in 3 adjacent floating-point registers. This includes complex numbers, which will be passed in a pair of registers. If the appropriate number of registers is not available, then the entire structure is passed on the stack - a structure cannot be split between FP registers and the stack.
  3. Homogeneous floating-point structure returns, using __value_in_regs, are now returned in F0-F3, regardless of the APCS setting. Complex numbers are returned in F0 and F1.
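
The source-level shapes that points 1 and 2 refer to can be sketched as below. The names vec3, dot3 and sum_doubles are invented for this example. The code is ordinary portable C; the comments only restate the rules above and say nothing about the exact registers chosen.

```c
#include <stdarg.h>

/* A structure made up of floats only, as described in point 2. */
struct vec3 { float x, y, z; };

/* Under -apcs /fpregargs, each vec3 argument is a candidate for three
   adjacent FP registers; if not enough are free, the whole structure
   goes on the stack. */
float dot3(struct vec3 a, struct vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* A variadic function: per point 1, its floating-point arguments are
   passed in integer registers or on the stack, never in FP registers. */
double sum_doubles(int count, ...)
{
    va_list ap;
    double total = 0.0;
    int i;

    va_start(ap, count);
    for (i = 0; i < count; i++)
        total += va_arg(ap, double);
    va_end(ap);
    return total;
}
```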

-arch

The compiler now tunes its output to match processor timing characteristics, depending on the setting of -cpu or the new option -arch. In particular, the cycle counts of LDR and MUL are taken into account.

The new command-line argument -arch, when used on its own, is equivalent to -cpu, except that it only accepts architecture names (e.g. "-arch 5TE"). When both -cpu and -arch are used, the compiler will optimise for the processor/architecture given by -cpu, but generate code that will run on the architecture given by -arch.

This allows the user to request optimising for a new processor while not ruling out the code being run on an older one.

For example:

    default                code optimised for ARM7, runs on ARMv3
    -cpu ARM7TDMI          code optimised for ARM7TDMI, runs on ARMv4T
    -cpu 5                 code optimised for a typical ARMv5 processor, runs on ARMv5
    -arch 5                same as -cpu 5
    -arch 3 -cpu XScale    code optimised for XScale but runs on ARMv3
    -cpu ARM2 -arch 5TE    code optimised for ARM2, but requires ARMv5TE (nonsensical, but allowed)

-arch and -cpu can be specified in either order. Only the last -cpu and last -arch given on the command line are significant.

© 2003/2006 Castle Technology Ltd 32-bit RISC OS