armcc/tcc: Compiler optimizations
Applies to: Compilers, Software Development Toolkit (SDT)

The ARM compilers 'armcc' and 'tcc' are highly optimizing by default to ensure a small code size. The main optimizations carried out by the compilers are:

1. Common Subexpression Elimination (CSE)

The compiler identifies sub-expressions that occur more than once in the code and reuses the first result instead of re-evaluating the expression each time. For example, code may use the expression 'a+1' in several places. Provided 'a' has not changed in between, the compiler evaluates the expression once and reuses that value wherever it appears. The shared expressions can be very complex. This is one of the most effective optimizations in the compilers.
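
As a rough illustration (the function names here are hypothetical, and the "after" version only approximates what the compiler does internally), the two functions below compute the same result. A compiler performing CSE could treat the first as if it had been written like the second:

```c
/* Before: the sub-expression (a + 1) is written out twice. */
int scale_before(int a, int b, int c)
{
    return (a + 1) * b + (a + 1) * c;
}

/* After: a source-level picture of CSE. The common
   sub-expression is evaluated once and the value reused. */
int scale_after(int a, int b, int c)
{
    int t = a + 1;          /* evaluated once */
    return t * b + t * c;   /* reused twice */
}
```

Because the compiler does this automatically, there is usually no need to introduce such temporaries by hand for speed alone.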

2. Loop Invariant Motion (Expression Lifting)

This is the 'lifting of expressions' from loops. The compiler can identify that a particular expression inside a loop does not change while the loop is running. Re-evaluating the expression on every iteration would be costly, so the compiler evaluates it only once, outside the loop.
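
A minimal sketch of the idea, using hypothetical function names: 'x * y' does not depend on the loop counter, so it can be lifted out of the loop. The second function shows by hand roughly what the compiler might do to the first:

```c
/* Before: x * y is loop invariant but written inside the loop. */
void fill_before(int *dst, int n, int x, int y)
{
    int i;
    for (i = 0; i < n; i++)
        dst[i] = x * y + i;
}

/* After: the invariant expression is lifted and evaluated once. */
void fill_after(int *dst, int n, int x, int y)
{
    int i;
    int k = x * y;          /* evaluated once, before the loop */
    for (i = 0; i < n; i++)
        dst[i] = k + i;
}
```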

3. Live range splitting (for dynamic register allocation)

This is the identification of the 'live' state of variables within a program section. For example, a variable could be used in one situation as a counter for a loop, then later as a working variable within a calculation. If these two uses are completely unrelated, they can be allocated to different registers. Additionally, when a variable is dead (when its value will not be used later), the register to which it has been assigned can be reused.
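
The following sketch (with a hypothetical function name) shows a variable with two unrelated uses. The register assignments are the compiler's choice and cannot be seen in the C source, so the comments only mark where each live range begins:

```c
/* 't' has two independent live ranges. A compiler that splits
   live ranges may place each one in a different register, or
   reuse a register once the value it held is dead. */
int sum_then_scale(const int *v, int n, int k)
{
    int t;
    int total = 0;

    for (t = 0; t < n; t++)   /* first live range: loop counter */
        total += v[t];

    t = total * k;            /* second live range: working value */
    return t + 1;
}
```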

4. Constant Folding

Replacement of constant expressions with the value that the compiler evaluated for them.
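
For example, in the sketch below (the macro and function names are illustrative), the product in the macro consists only of constants, so a compiler can replace it with 86400 at compile time rather than generating two multiplications:

```c
/* A constant expression: no run-time values are involved. */
#define SECONDS_PER_DAY (60UL * 60UL * 24UL)

unsigned long days_to_seconds(unsigned long days)
{
    /* Can be compiled as days * 86400UL. */
    return days * SECONDS_PER_DAY;
}
```

This is why named constant expressions can be used freely for readability without a run-time cost.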

5. Tail Call Optimization and Tail Recursion

A tailcall is a function call made immediately before a return. Normally the called function returns to the caller, and the caller then immediately returns as well. Tailcall optimization avoids this double return: the caller restores its saved registers and then jumps to the tailcall instead of calling it. The called function now returns directly to the caller's caller, saving a return sequence.

The compiler also supports tail call recursion, which is possible when the tailcall is made to the same function. In this case it is possible to skip the entry and exit sequence altogether, converting the call into a loop.

Tailcall optimization is done by armcc only (as tcc has a limited branch range), tailcall recursion by both armcc and tcc.
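
The sketch below uses hypothetical functions to show both cases. 'adjust' ends in a tail call to a different function, and 'gcd' ends in a tail call to itself. 'gcd_loop' is a hand-written picture of what tail recursion elimination might turn 'gcd' into:

```c
static int clamp(int v)
{
    return v < 0 ? 0 : v;
}

/* Tail call: nothing remains to be done after clamp() returns,
   so the call can become a jump. */
int adjust(int v)
{
    return clamp(v - 10);
}

/* Tail recursion: the last action is a call to the same function. */
unsigned gcd(unsigned a, unsigned b)
{
    if (b == 0)
        return a;
    return gcd(b, a % b);
}

/* Equivalent loop: the recursive call is replaced by updating
   the arguments and branching back to the top. */
unsigned gcd_loop(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned r = a % b;
        a = b;
        b = r;
    }
    return a;
}
```

Note that a call is only a tail call if its result is returned unchanged. Something like 'return gcd(b, a % b) + 1;' would not qualify, because the addition happens after the call returns.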

6. Cross Jump Elimination

This is the combining of two or more instances of identical code. For example, multiple returns from a function often generate identical code, and will be optimized to a single return sequence. This optimization mainly saves space, and is disabled when optimizing for time.
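
The optimization works on generated code rather than on source, so it can only be approximated in C. In the sketch below (hypothetical function names), the first function has three return statements, each of which could generate its own copy of the exit sequence. The second shows the effect of merging them into one shared exit:

```c
/* Before: three separate returns. */
int sign_before(int x)
{
    if (x < 0)
        return -1;
    if (x == 0)
        return 0;
    return 1;
}

/* After: a source-level picture of the merged code, where all
   paths branch to a single return sequence. */
int sign_after(int x)
{
    int r;

    if (x < 0)
        r = -1;
    else if (x == 0)
        r = 0;
    else
        r = 1;

    return r;   /* single shared exit */
}
```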

7. Table Driven Peepholing

During the compiler's processing of source code into ARM or Thumb code, there is a point at which commonly found code sequences can be replaced with known optimal versions. This is achieved by viewing the code through a window (of some number of instructions) called a 'peephole', and then replacing identified instruction sequences with a 'hand crafted' version. The table of 'peepholes' is constantly growing as 'optimal' sequences are identified and added by ARM engineers.

8. Structure Splitting

Structure splitting is the action of dividing structures into their components. Once accomplished, the components may then be assigned to registers, for faster access. This is a particular advantage when returning a structure from a function, where the whole structure can be returned in registers rather than on the stack.
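
As an illustration only (the type and function names are hypothetical, and whether a given structure is actually split depends on the compiler and its options), a small structure like the one below is a natural candidate. Its two members can be treated as two independent integers and kept in registers:

```c
struct point {
    int x;
    int y;
};

/* Returns a small structure by value. */
struct point make_point(int x, int y)
{
    struct point p;
    p.x = x;
    p.y = y;
    return p;
}

/* 'p' never has its address taken, so its members can be
   handled as two separate values rather than as memory. */
int length_squared(int x, int y)
{
    struct point p = make_point(x, y);
    return p.x * p.x + p.y * p.y;
}
```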

9. Conditional Execution (or Branch Elimination)

The armcc compiler uses conditional execution to avoid branches. Conditional execution saves both space and execution time, as many branches can be removed. The Thumb instruction set does not support conditional execution, so tcc does not have this optimization.
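
A simple case where this applies is a short 'if' that guards a single assignment. The function name below is hypothetical, and the ARM instructions in the comment are an illustrative sketch of branch-free code, not actual compiler output:

```c
/* The conditional assignment can be done with a conditionally
   executed move instead of a branch around it, for example:

       CMP   r0, r1      ; compare a with b
       MOVLT r0, r1      ; if a < b then a = b
       MOV   pc, lr      ; return

   The MOVLT instruction only takes effect when the comparison
   found a < b. */
int max_of(int a, int b)
{
    if (a < b)
        a = b;
    return a;
}
```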

There is an Application Note which explains the practical uses of these optimizations in real code. See Application Note 34, Writing Efficient C for ARM, in the ARM Application Notes.





