This article is the first in a series of three articles looking at power-saving techniques in chip design. In this issue we look at implementation techniques using the ARM Power Management Kit (PMK); subsequent issues will look at SoC design and software design techniques.

ARM is well known for developing low-power IP technology: ARM processors, fabric, software, reference methodologies and collaborations. ARM Physical IP is low-power as well, and it is further enhanced by a collection of physical IP components: the ARM Power Management Kit (PMK).

ARM Physical IP & Low-Power

The ARM Artisan® Physical IP product portfolio includes embedded memory, standard cells, general-purpose and high-speed I/Os, and analog and mixed-signal components. ARM Physical IP products are available in several 65nm low-power processes, including TSMC CLN65LP and the Common Platform (CP) 10SF. These processes are especially important for mobile applications with low-leakage requirements. Products now available include several standard cell libraries and the PMK (see Table 1).
Table 1: Standard Cell Library Overview

One way to reduce power is to divide a design into separate power domains. These domains are simply segments of a chip that are electrically isolated from one another. This segmentation allows the voltage (and frequency) of each domain to be controlled independently, so that each domain can be powered off or run at its own voltage and/or frequency. Each domain can consist of standard cell blocks built from combinations of the above libraries, memories and hard macros – really any type of IP. The ARM Power Management Kit enables these domains and the associated low-power techniques.

ARM Power Management Kit

The Power Management Kit is a collection of standard cells specifically designed to allow implementation of various low-power techniques. The PMK is an add-on to the ARM Standard Cell Libraries and is available for all Metro, Advantage and Advantage-HS standard cell libraries at 130nm and below (note that not all three libraries are available at 130nm). The PMK contains the following general cell types: power gates, retention flip-flops, isolation cells, level shifters, always-on buffers, and back-bias (well-bias or well tap) cells (see Table 2). Each component offers a direct or indirect way to control dynamic and/or leakage power consumption, and each is discussed in more detail below. Chip designers can pick and choose which components to use to meet their power requirements.
Table 2: Power Management Techniques - PMK Component Usage

PMK Components in Detail

Power Gates

The power gates are high-threshold-voltage switches used to shut off power to a domain, resulting in significant leakage savings. These power gates are implemented as NMOS and PMOS transistors in the form of standard cells. They connect and disconnect local power from always-on power, making the local power “switch-able.” ARM provides coarse-grain power gates that “switch” the power or ground rails of multiple cells at the domain level. There are two primary types of power gates:
1. Header switch
   • Connects local, switch-able power to always-on power
   • Available with or without an integrated always-on buffer
2. Footer switch
   • Connects local, switch-able ground to always-on ground
   • Available with or without an integrated always-on buffer

Power gates are single-height standard cells with no abutment restrictions on placement. The power gates with a buffered output enable the creation of a chain of switches without the need for additional drivers. This built-in buffer is kept live, powered by the always-on power and ground. A given power gating cell is also available in different drive strengths (different cell widths), which have different resistances.

A domain that uses power gates will have one always-on power supply, one always-on ground supply, and one local (switch-able) power or ground supply. With headers, the switch-able supply is local power (i.e. VDD); with footers, it is local ground (VSS). There are several implementation methods for these power gates, as shown in Figures 1-3.
Figure 1: Power gates placed at the intersection of power stripes
Figure 2: Power gates placed in dedicated rows
Figure 3: Power gates placed as a "ring"

The power gates are designed for placement at the intersection of the always-on and local power and ground stripes (Figure 1). They could also be used at the ends of rows to switch an entire row of cells, or placed in dedicated rows as in Figure 2; for example, users could place a dedicated row of power gates every tenth row. Additionally, power gates could be used to switch a power ring around the block, for example to gate power from always-on rings to local rings or stripes. This method, shown in Figure 3, is similar to a memory power ring.

Only headers or only footers need to be used in any given domain. Only one type of power gate is needed to cut the leakage path between power and ground, so using both in the same domain is unnecessary: it adds gate leakage and area overhead for negligible additional leakage savings. Users should decide on the preferred method for each domain.

Calculating the number of power gates to place in a domain is fairly straightforward. First, choose a target voltage drop across the power gates. Then, calculate the current required by the block. Each power gate has a known current-carrying capability at a given voltage drop; dividing the block current by this per-gate current (and rounding up) gives the number of power gates required. Static IR drop analysis can then confirm that enough power gates have been placed.

Footers will typically (depending on the process) occupy less area than headers for the same current-carrying capability. So far, however, users have generally been more comfortable switching the power supply than the ground connection.

An issue of concern in any domain whose power is removed, whether gated on chip or powered down off chip, is the in-rush current when power is restored. One way to control the in-rush current is to subdivide the power gates into two sections. The first section will consist of about 10% of the power gates.
These will turn on first, with a slow slew rate. The second section, the remaining 90% of the power gates, could be daisy-chained together and turned on second with a faster slew rate. This is exactly why both headers and footers are provided with (and without) always-on buffers.

Retention Flip-Flops

The ARM retention flip-flops are provided to preserve data while power to a domain is gated. These special flops allow the last state to be recovered once power to the domain returns. The retention flops are designed for use in a domain power-gated by either a header or a footer.

All retention flops consist of two stages: the main stage and the storage stage. The main stage is connected to the local (switch-able) power rail. The storage stage is connected to the always-on power and ground so that it maintains the flip-flop state with minimal leakage current; it uses high-threshold-voltage transistors. The retention flip-flops have almost the same performance as normal flip-flops, but require more area (due to the storage stage) and slightly more dynamic power. During normal operation they behave exactly like their standard counterparts. Once the retention-enable signal is asserted, the flip-flop is in retention mode and power can be switched off. In retention mode, the clock and reset signals have no effect.

If a particular use case requires off-chip power to be disconnected, the scan chain can be used instead to shift data out to an external memory. This consumes additional energy, but can be worthwhile if power will be off for an extended time.

Measurements in silicon have shown that a domain with retention flops, put into sleep mode via power gates, needs several hundred nanoseconds (~300ns, depending on design details) to come out of sleep mode with the clock re-enabled. This is an extremely fast turn-on time compared with powering up an external supply and restoring data.
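Returning briefly to the power gates: the sizing rule described under Power Gates (block current divided by the current one gate can carry at the target voltage drop, rounded up) and the roughly 10%/90% in-rush split can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: each gate is modelled as a simple on-resistance, and the function names and the current, voltage-drop and resistance values in the example are hypothetical placeholders, not figures from any PMK datasheet. Static IR drop analysis is still what confirms the result.

```python
import math
from dataclasses import dataclass


@dataclass
class GatePlan:
    total: int         # power gates needed in the domain
    first_stage: int   # ~10%: turned on first, with a slow slew rate
    second_stage: int  # remainder: daisy-chained, turned on second, faster


def gates_required(block_current_a: float, target_drop_v: float,
                   gate_r_on_ohm: float) -> int:
    """Gates needed so the block current flows with at most target_drop_v across them.

    Assumption: each gate behaves like a fixed on-resistance, so one gate
    carries target_drop_v / gate_r_on_ohm amps at the target drop.
    """
    if min(block_current_a, target_drop_v, gate_r_on_ohm) <= 0:
        raise ValueError("all arguments must be positive")
    gate_current_a = target_drop_v / gate_r_on_ohm
    ratio = block_current_a / gate_current_a
    # Round up to whole cells; the tiny tolerance stops float noise such as
    # 240.00000000000003 from adding an extra gate. Always at least one gate.
    return max(1, math.ceil(ratio - 1e-9))


def plan_power_gates(block_current_a: float, target_drop_v: float,
                     gate_r_on_ohm: float, first_fraction: float = 0.10) -> GatePlan:
    """Size the gates, then split them into a small slow stage and a larger fast stage."""
    total = gates_required(block_current_a, target_drop_v, gate_r_on_ohm)
    first = max(1, round(total * first_fraction))  # at least one slow-start gate
    return GatePlan(total=total, first_stage=first, second_stage=total - first)


if __name__ == "__main__":
    # Hypothetical numbers: 120 mA block, 50 mV allowed drop, 100 ohm per gate.
    print(plan_power_gates(0.120, 0.050, 100.0))
```

Keeping the split as a fraction parameter reflects the article's "about 10%" guidance rather than a fixed rule; a real design would tune it against the measured in-rush current.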
Isolation Cells

The isolation cells are used between power-gated domains of the same nominal voltage. When a domain is power-gated, the signals within the domain become unknown. Isolation cells prevent these unknown values from propagating into a powered domain. Isolation cells are built from two basic cell functions: an AND gate and an OR gate. One input to the isolation cell is a signal from a power-gated domain; the other is the enable (control) signal. When the enable signal is asserted, the output of an OR-based isolation cell is forced high and the output of an AND-based cell is forced low.

Isolation cells should be used only between domains that run at the same nominal voltage whenever they are powered up. In other words, an isolation cell is used at an input to a domain if that input signal comes from a source domain that:
• may be powered down, and
• when powered up, runs at the same voltage as the receiving domain.

Level Shifters

Voltage level shifters are primarily used between domains running at different voltages. These cells “shift” the voltage of a signal from one domain, up or down, to the voltage of another domain. Some of the level shifters can also be used between power-gated domains running at different voltages, combining the level shifter and isolation cell functionality in a single cell. There are two main types of level shifters: up shifters and down shifters. Within these two types, three basic cell functions are used to implement the shifters: a buffer, an AND gate and an OR gate. The AND- and OR-based shifters include the isolation functionality; the buffers only shift voltages. All level shifters have multiple characterization models to account for the change in timing when input and output voltage levels differ.

Always-on Buffers

The always-on buffers are used to buffer signals in areas where the power rails can be switched off.
This family of buffers and inverters, powered by the always-on supplies, remains live within a power-gated domain. Always-on buffers can be used to buffer a signal that passes through a power-gated domain. Typical examples are control signals, such as those driving the power gates and the retention-enable input of the retention flops.

Back-bias Cells

These special fill cells have a basic function: to provide access to the wells and/or substrate. They serve two distinct purposes:
1. Provide power to the wells of always-on cells while power is gated; these cells include:
   • retention flip-flops
   • power gates with buffers
   • always-on buffers
2. Provide well access so that back (and/or forward) biasing can be implemented

Back-bias support can provide both performance and leakage optimization. Currently, the library characterizations assume no biasing. Forward biasing can increase performance; however, it is the least popular way to improve performance. Reverse biasing reduces leakage. For the highest leakage savings, a domain could be power-gated and then reverse biased. This would not require any additional characterization, because the cells would be inactive.
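Reverse biasing saves leakage because it raises the threshold voltage of the devices in the biased well. As a rough, textbook-level illustration, two standard relations make this quantitative: the long-channel body-effect equation, ΔVth = γ(√(2φF + VSB) − √(2φF)), and the exponential dependence of subthreshold leakage on threshold voltage, which falls by one decade for every S volts of threshold increase (S being the subthreshold swing). The Python sketch below applies them to NMOS devices whose pwell is pulled below VSS. The parameter values (γ, 2φF, S) are illustrative assumptions, not data for any ARM library or foundry process, and the body effect is generally weaker in deeply scaled devices than this simple model suggests.

```python
import math


def vth_shift(v_sb: float, gamma: float = 0.4, two_phi_f: float = 0.8) -> float:
    """Threshold-voltage increase (V) caused by a reverse body bias v_sb (V).

    Long-channel body-effect model; gamma in V**0.5, two_phi_f in V.
    The default values are illustrative assumptions only.
    """
    if v_sb < 0:
        raise ValueError("this sketch only models reverse bias (v_sb >= 0)")
    return gamma * (math.sqrt(two_phi_f + v_sb) - math.sqrt(two_phi_f))


def leakage_ratio(delta_vth: float, swing: float = 0.090) -> float:
    """Subthreshold leakage after biasing divided by leakage before biasing.

    Leakage falls one decade per `swing` volts of threshold increase
    (90 mV/decade is an assumed, typical-order value).
    """
    return 10 ** (-delta_vth / swing)


if __name__ == "__main__":
    # Pulling the pwell 0.5 V below VSS gives the NMOS devices V_SB = 0.5 V.
    dv = vth_shift(0.5)
    print(f"Vth increase: {dv * 1000:.0f} mV")
    print(f"leakage reduced to {leakage_ratio(dv):.1%} of the unbiased value")
```

The square-root term is why larger bias voltages give diminishing threshold gains, while the exponential term is why even a modest threshold increase has a large effect on leakage.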
Back-biasing effectively changes the threshold voltage of a cell, which is why the technique is often called Variable Threshold CMOS (VTCMOS). Traditionally, the nwell is tied to VDD and the pwell is tied to VSS. Reverse back-biasing ties the nwell to a voltage above VDD and/or the pwell to a voltage below VSS. If a well is reverse biased, the threshold voltage of the devices in that well increases, which decreases leakage.

ARM Memories with Power Management

ARM Physical IP offerings also include memory compilers with integrated power management. The Advantage and Metro memory compilers at 65nm and below (Metro also at 90nm) have a compile-time option to include power gates. When this option is selected, the memory instance is generated with integrated power gates, indicated by the HVt switches in Figure 4.

Figure 4: Compiled memory showing power gates

Note also that CORE VDD and LOGIC VDD (periphery) are two separate power supplies that can either remain separate or be tied together. If these supplies remain independent, the resulting memory instance has three possible power modes:
1. Standby Mode: traditional low-power mode
2. Retention Mode: enabled by the power gates
   • Power is supplied to the core array to retain state
   • Power is off for the periphery
3. Shutdown Mode: power off via the power gates

ARM I/Os with Power Management

ARM General Purpose I/Os are also architected for low power. Along with a Metro family of smaller, lower-power I/O pads, the 65nm (and below) I/Os are software programmable, allowing a single pad to be configured for a range of input types, slew rates, drive strengths, weak pull-up/pull-down and/or open-drain operation. Advanced power management features include built-in level shifters that support an extended core-to-I/O voltage range, as well as the ability to retain state when the core VDD power ring is powered down. These features complement the memories and the PMK.
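Before moving up to the system level, the boundary-cell rules from the Isolation Cells and Level Shifters sections can be condensed into a small decision per domain crossing. The Python sketch below only restates those rules; the `Domain` type, the `boundary_cell` function and the returned labels are hypothetical and are not part of any ARM or EDA tool interface.

```python
from dataclasses import dataclass
from enum import Enum


class BoundaryCell(Enum):
    NONE = "no special cell"
    ISOLATION = "isolation cell (AND/OR-based)"
    LEVEL_SHIFTER = "level shifter (buffer-based)"
    ISOLATING_LEVEL_SHIFTER = "level shifter with isolation (AND/OR-based)"


@dataclass(frozen=True)
class Domain:
    name: str
    voltage: float    # nominal supply voltage when powered up, in volts
    switchable: bool  # True if the domain can be power-gated or powered down


def boundary_cell(source: Domain, sink: Domain, tol: float = 1e-6) -> BoundaryCell:
    """Pick the PMK cell type for a signal driven by `source` into `sink`."""
    same_voltage = abs(source.voltage - sink.voltage) <= tol
    if same_voltage:
        # Same nominal voltage: only a source that may power down needs clamping.
        return BoundaryCell.ISOLATION if source.switchable else BoundaryCell.NONE
    # Different voltages always need shifting; an AND/OR-based shifter also
    # isolates, which is needed when the source may be powered down.
    if source.switchable:
        return BoundaryCell.ISOLATING_LEVEL_SHIFTER
    return BoundaryCell.LEVEL_SHIFTER


if __name__ == "__main__":
    cpu = Domain("cpu", voltage=1.0, switchable=True)
    dsp = Domain("dsp", voltage=1.0, switchable=True)
    aon = Domain("always_on", voltage=1.2, switchable=False)
    print(boundary_cell(cpu, dsp).value)
    print(boundary_cell(cpu, aon).value)
    print(boundary_cell(aon, cpu).value)
```

The decision keys only on the source domain's switchability and on whether the two nominal voltages match, which mirrors the two conditions the article lists for using an isolation cell.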
ARM Intelligent Energy Management

Moving up from the physical IP level, ARM offers a system-level power management scheme, Intelligent Energy Manager (IEM™). This is a combination of hardware and software technology that uses dynamic voltage and frequency scaling (DVFS) to reduce overall energy consumption. The basic idea of DVFS is to create power domains that can run at dynamically changing voltages and frequencies. Based on previous tasks and performance, the IEM software predicts when a specific task needs to complete. The software then tells the hardware to adjust the voltage and frequency so that the task runs at the lowest possible voltage and frequency while still finishing just on time. Running at a lower frequency is what allows the supply voltage to be lowered, and because dynamic energy scales with the square of the supply voltage, overall energy consumption is reduced. All of this is done with no discernible impact on the end-user experience.

None of this can be accomplished without the PMK. Because IEM is based on DVFS, which requires power domains, level shifters and/or isolation cells are required. Other PMK components (power gates, retention flops, etc.) can then be added to further reduce power and energy consumption. For the lowest possible power and energy, include power-gated memories and I/Os with power management.

ARM at the Heart of Low Power

ARM IEM technology is at work in several commercially available products. Battery life has been seen to increase by 25-30% in products using IEM technology. In designs implementing logic power gating with retention flops and memory power gating, CPU leakage power has been cut by ~96%. Even without power-gated memories, leakage savings of ~50% are achieved. To ease the implementation of these power-saving innovations, ARM has, through its ongoing close collaboration with major EDA vendors, released optimized reference flows for the majority of its processors, including those supporting IEM.
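The energy argument from the IEM discussion can be illustrated with a small DVFS sketch. Dynamic energy per clock cycle is roughly C·V², so a task of N cycles costs about N·C·V² whatever the clock frequency; what a lower frequency buys is the ability to run at a lower voltage. The Python below picks the slowest operating point that still meets a task deadline and compares its dynamic energy with running at full speed. The operating-point table, the effective capacitance and the function names are hypothetical, leakage energy is ignored, and this is not a description of IEM's actual prediction algorithm or hardware interface.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class OperatingPoint:
    freq_hz: float
    vdd: float  # supply voltage assumed sufficient to run reliably at freq_hz


# Hypothetical voltage/frequency table for one DVFS power domain.
OPPS = (
    OperatingPoint(100e6, 0.8),
    OperatingPoint(200e6, 0.9),
    OperatingPoint(300e6, 1.0),
    OperatingPoint(400e6, 1.2),
)


def dynamic_energy(cycles: int, vdd: float, c_eff: float = 1e-9) -> float:
    """Dynamic energy (J) for `cycles` cycles: cycles * C_eff * Vdd**2 (leakage ignored)."""
    return cycles * c_eff * vdd ** 2


def slowest_opp_meeting_deadline(cycles: int, deadline_s: float,
                                 opps: Sequence[OperatingPoint] = OPPS) -> OperatingPoint:
    """Lowest-frequency operating point that still finishes `cycles` within `deadline_s`."""
    for opp in sorted(opps, key=lambda p: p.freq_hz):
        if cycles / opp.freq_hz <= deadline_s:
            return opp
    raise ValueError("deadline cannot be met even at the fastest operating point")


if __name__ == "__main__":
    cycles, deadline = 2_000_000, 0.012  # 2M cycles that must finish within 12 ms
    chosen = slowest_opp_meeting_deadline(cycles, deadline)
    fastest = max(OPPS, key=lambda p: p.freq_hz)
    saving = 1 - dynamic_energy(cycles, chosen.vdd) / dynamic_energy(cycles, fastest.vdd)
    print(f"run at {chosen.freq_hz / 1e6:.0f} MHz, {chosen.vdd} V; "
          f"dynamic energy saving vs. full speed: {saving:.0%}")
```

Because the cycle count and capacitance cancel in the ratio, the saving depends only on the two supply voltages; in this made-up table, finishing "just on time" at 200 MHz instead of racing at 400 MHz is what unlocks the lower voltage.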
Today, ARM's reference flows also include the physical IP needed to implement an ARM core. More recently, ARM has released Processor Performance Packages for a number of ARM processors. These include physical IP specially designed for a specific processor, foundry, node and process variant, providing ARM Partners with a rapid and risk-free route to implementing high-performance ARM processors in silicon. The packages include implementation guidelines and library preparation, as well as pre-configured and optimized memory instances and the appropriate standard cells. Not only do these packages optimize the processor, they also reduce time to market and design risk, and thereby enable a new generation of ARM (low) Powered devices.

In the September issue of IQ, Micheal Rockenhauser will look at low-power SoC design techniques such as:
• how to make use of power and voltage domains
• processor power modes
• how to partition tasks between software running on the general processor and a task-specific data engine
• designing an efficient memory hierarchy