1262 IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, VOL. 62, NO. 2, FEBRUARY 2015

A Hardware Platform for Evaluating Low-Energy Multiprocessor Embedded Systems Based on COTS Devices

Mohammad Salehi and Alireza Ejlali

Abstract—Embedded systems are usually energy constrained. Moreover, in these systems, increased productivity and reduced time to market are essential for product success. To design complex embedded systems while reducing the development time and cost, there is a strong tendency to use commercial off-the-shelf (COTS) devices. At system level, dynamic voltage and frequency scaling (DVFS) is one of the most effective techniques for energy reduction. Nonetheless, many widely used COTS processors either do not have DVFS or apply DVFS only to processor cores. In this paper, an easy-to-implement COTS-based evaluation platform for low-energy embedded systems is presented. To achieve energy saving, DVFS is provided for the whole microcontroller (including core, phase-locked loop, memory, and I/O). In addition, facilities are provided for experimenting with fault-tolerance techniques. The platform is equipped with energy measurement and debugging equipment. Physical experiments show that applying DVFS on the whole microcontroller provides up to 47% and 12% energy saving compared with the sole use of dynamic power management and applying DVFS only on the core, respectively. Although the platform is designed for ARM-based embedded systems, our approach is general and can be applied to other types of systems.

Index Terms—Embedded systems, energy management, hardware platform.

I. INTRODUCTION

Embedded systems are ubiquitous, and demand for these systems is growing steadily. A wide range of embedded systems are battery operated. Because many of these systems cannot have their batteries charged or replaced frequently, they are highly energy constrained [1]–[3].
Therefore, for these systems, low energy consumption has become one of the major design objectives. Examples include mobile robots and handheld devices such as personal digital assistants, cell phones, and portable medical care devices. Furthermore, the complexity of embedded systems is increasing as the number of parts and the number and types of interactions among them are increasing [3], [4]. Therefore, embedded system designers are constantly asked to design complex embedded systems with several design objectives.

Manuscript received October 27, 2013; revised March 23, 2014 and June 15, 2014; accepted July 21, 2014. Date of publication August 26, 2014; date of current version January 7, 2015. The authors are with the Department of Computer Engineering, Sharif University of Technology, Tehran 11365-11155, Iran (e-mail:; Color versions of one or more of the figures in this paper are available online at Digital Object Identifier 10.1109/TIE.2014.2352215

In dealing with today’s highly competitive embedded systems markets and time-to-market pressure, and in order to deliver correct-the-first-time products with multiple system requirements, the use of commercial off-the-shelf (COTS) devices [3], [5]–[7] is very beneficial in designing embedded systems. Some vendors offer reconfigurable hardware solutions to accelerate the design process and provide a variety of programmable logic device (PLD)-based evaluation kits (e.g., Xilinx [8] and many others). However, instead of focusing on embedded systems, these platforms are meant for functionally testing the SoC or ASIC devices to be produced. Embedded systems usually consist of a microcontroller that contains a microprocessor integrated with memory elements and peripherals in a single chip [4]–[7]. Reference [5] has reported a laboratory activity on a microcontroller-based platform. Reference [25] has presented a prototyping platform for ARM-based embedded systems.
However, these platforms do not provide facilities to experiment with energy management techniques. Reference [23] has presented a platform for dynamic voltage and frequency scaling (DVFS) [11] in an ARM-based processor. However, this work exploits DVFS only for the processor (and not for the other parts, e.g., phase-locked loop (PLL), memory, and I/O).

In this paper, to meet the design requirements of multiobjective embedded systems, we propose a hardware platform for experimenting with energy management techniques (i.e., dynamic power management (DPM) [12] and DVFS) (see Section III) and fault-tolerance techniques (see Section VI). Compared with previous related works (that proposed platforms for embedded systems), our platform:

1) provides DVFS capability for the microcontrollers, including not only the processor cores but also PLL, memory, and I/O; it should be noted that many existing designs either do not have DVFS or apply DVFS only to processor cores [11], [13], [14], [23], whereas our study in this paper (see Section V) shows that applying DVFS to PLL, memory, and I/O is quite effective;
2) includes circuitry to accurately and separately measure energy/power consumption of different parts of the microcontroller, including the processor core, PLL, memory, and I/O; this provides the ability to determine the most energy-consuming part for a given application;
3) is general and based on an ARM-based COTS microcontroller; hence, it can be used for a wide range of existing microcontrollers (e.g., [13], [14], and [18]–[20]) and many other COTS devices.

0278-0046 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See for more information.
SALEHI AND EJLALI: HARDWARE PLATFORM TO EVALUATE EMBEDDED SYSTEMS BASED ON COTS DEVICES 1263

Another advantage of the proposed platform is that it is suitable for research into energy management techniques in parallel processing. Since the proposed platform is general and is capable of implementing various design techniques and since it has the capability of parallel processing (because of the use of two ARM7-based and one AVR-based processors that can operate in parallel), the proposed platform can be useful for analyzing many design techniques (e.g., [1], [2], [12], and [22]), which exploit parallelism in energy management.

Furthermore, we made new observations in our experiments that could provide useful information for embedded system designers. The five observations are as follows.

1) The high-to-low voltage scaling delay is greater than the low-to-high delay (about 45% for the processor core and PLL and about 110% for memory and I/O).
2) Voltage and frequency scaling is very effective in reducing power consumption not only for the processor core but also for the other parts of the microcontroller, including PLL, memory, and I/O.
3) Although PLL, memory, and I/O have less power consumption compared with the processor core, they have comparable energy consumption to that of the core.
4) Although PLL has a very small contribution in the total power consumption, as it is always operational, its energy consumption is comparable with that of the others.
5) Applying DVFS on the whole microcontroller results in considerable energy savings compared with the sole use of DPM or applying DVFS only on the processor core.

The remainder of this paper is organized as follows. In Section II, the architecture of the proposed hardware platform is described. The proposed energy management units and techniques are presented in Section III. In Section IV, the power measurement, debug, and test units are described. Experimental results are given in Section V.
In Section VI, we explain the capability of the proposed platform in experimenting with fault-tolerance techniques. Finally, we conclude this paper and describe future work in Section VII.

II. HARDWARE PLATFORM DESIGN

ARM7TDMI is the most widely used COTS processor in contemporary embedded systems because it is a low-cost, high-performance, and versatile processor [4], [6]. Many vendors (e.g., [9], [13], and [14]) combine the ARM7TDMI (hereafter ARM7) processor with internal memory devices and a wide range of peripherals on a single chip to obtain a microcontroller. It is noteworthy that the computational power of ARM7 is quite sufficient for the majority of embedded applications. For example, ARM7 can easily execute all benchmarks in the MiBench benchmark suite [1], [21]. ARM7 can also execute fairly complex operating systems (e.g., Real-Time Executive for Multiprocessor Systems (RTEMS) [1], [26] and Keil RTX [27]). Nevertheless, for highly computation-intensive applications, the performance of ARM7 might not be adequate. In this case, it should be noted that our proposed platform is not inherently dependent on ARM7. Indeed, any processor (e.g., i.MX27 [18] and PXA270 [20]) whose operating frequency can be changed and whose supply voltage can vary within an allowed range can be used similarly in our design.

Fig. 1. ARM7-based microcontroller architecture [9].

A. Architecture Overview

Our design of the ARM7-based platform is founded on a member of the AT91SAM7x series of microcontrollers [9]. The architecture of the microcontroller series is shown in Fig. 1. The microcontroller is composed of an ARM7 processor core, a system controller, memory elements, and peripheral devices. Most ARM7-based microcontrollers adopt a similar architecture, e.g., [18]–[20]. As shown in Fig.
1, the microcontroller consists of Flash, ROM, and SRAM internal memory devices connected via the memory controller, and a wide range of peripherals, including universal synchronous/asynchronous receiver–transmitter (USART), serial peripheral interface (SPI), analog-to-digital converter (ADC), universal serial bus (USB), Ethernet medium access control, controller area network (CAN), two-wire interface (TWI), synchronous serial controller (SSC), real-time timer (RTT), and pulsewidth-modulation controller (PWMC). Most I/O lines of the peripherals are multiplexed with the parallel I/O (PIO) controller. Each PIO line may be assigned to a peripheral or used as general-purpose I/O. These features provide flexibility to designers and assure effective use of the components.

B. Platform Architecture

The architecture and physical implementation of the hardware platform are shown in Fig. 2(a) and (b), respectively. The platform contains two AT91SAM7x256 microcontrollers connected via a bus. Based on the facilities provided by the AT91SAM7x series, this bus can be easily configured as SPI, UART, CAN, or a 16-bit parallel bus. AT91SAM7x256 contains an ARM7TDMI processor with in-circuit emulation (ICE), debug communication channel support, 64-KB internal SRAM, and 256-KB internal Flash memory. Two controllable power supplies are included in the board to provide power to the peripherals and the processor core of each of the microcontrollers. The power supplies receive commands from the processors and control the power applied to each part of the microcontrollers (see Section III-B). The use of separate supply voltages not only helps conduct experiments with various DVFS schemes (where different supply voltages can be applied to each processor separately) but also can be used to shut off one processor to switch into a single-processor configuration. We have also provided the flexibility to users in choosing arbitrary DVFS or DPM schemes.
The platform is also equipped with circuitry to measure the current drawn by the processor cores, PLLs,
Flash memory devices, and I/O peripherals. Using the measured current and the supply voltage of each part of the microcontrollers (the voltages are set by the controllable power supplies and reported to the measurement unit), the power consumption of each part is obtained. In addition, the execution time of applications run by the processors is reported to measure the consumed energy. The measurement data are sent to the host computer through the data logging port. Two debugging ports [RS232 and Joint Test Action Group (JTAG) ports] provide debug capabilities for each of the microcontrollers. JTAG is also used for ICE (see Section IV-B) and fault injection purposes (see Section VI). After designing and evaluating the target system, the platform can be customized for a specific application.

Fig. 2. Hardware platform. (a) Block diagram. (b) Implementation.

III. ENERGY MANAGEMENT UNITS

To manage the energy consumption, DVFS [11] and DPM [12] have been effectively used. DVFS varies the components’ voltage and, hence, frequency based on the system workload and other run-time factors. DPM selectively turns off the system components when they are idle. AT91SAM7x (like many microcontrollers such as [18]–[20]) only supports DPM (it only controls the processor and peripheral clocks) and cannot exploit DVFS (it does not provide variable supply voltage to its processor core and peripherals). In the following sections, we first explain how DPM can be employed (as an existing capability of most COTS microcontrollers), and then we introduce a methodology for adding DVFS capability to the microcontrollers that are not DVFS enabled.

A. DPM

The AT91SAM7x optimizes power consumption by controlling (enabling/disabling or scaling) the clock of the processor and peripherals. The block diagram of the power management controller is shown in Fig. 3(b). It uses the clock outputs [see Fig.
3(a)] to supply clocks to the processor, USB, peripherals, and master clock, which is the clock provided to the memory controller and all the peripherals. Table I summarizes the power management techniques, which can be used for different parts of the microcontroller. As shown in Fig. 3, the master clock can be generated through scaling one of the clocks provided by the clock generator. A low-frequency clock can be provided to the whole device by selecting the slow clock, or power consumption of the PLL can be saved by selecting the main clock. The processor power consumption can be reduced by switching off the processor clock when it enters idle mode while waiting for an interrupt. After a device reset or on any interrupt, the processor clock is automatically re-enabled. To reduce the power of each peripheral, the user can individually enable and disable the peripheral clock by controlling the master clock on each peripheral through the peripheral clock controller.

Fig. 3. Power management unit. (a) Clock generator. (b) Power management controller [9].

TABLE I POWER MANAGEMENT TECHNIQUES IN AT91SAM7X

B. DVFS

DPM usually has only two operational states for system components, namely active and idle. The active power consumption of a clock-enabled component, denoted by $P_{Active}$, is determined by its operating frequency and supply voltage as [1]

$$P_{Active} = I_{Leakage} V + C_{eff} V^{2} f \tag{1}$$

where $I_{Leakage} V$ is the static leakage power, and $C_{eff} V^{2} f$ is the dynamic power consumption ($C_{eff}$ is the effective switched capacitance). The dynamic power consumption can be efficiently eliminated by putting the component into the idle state by disabling the clock [12]. With special hardware support and under
software control, frequency scaling for system components can be used to exploit idle times for power saving. The active energy consumed by executing a task with $N$ cycles at frequency $f$ can be computed as $P_{Active} N / f$. As a result, although frequency scaling reduces the dynamic power consumption linearly, it has no effect on the static leakage power consumption. Furthermore, the consumed static energy for a given computation increases because the task execution time increases when the clock frequency is reduced. Hence, reduced energy consumption cannot be achieved by frequency scaling alone. Frequency scaling can be highly effective when employed in conjunction with voltage scaling [1], [11]. Voltage scaling techniques employ software-controlled adjustable voltage regulators to set the supply voltage of the processor core and clock-enabled components. Software-controlled clock generators and voltage regulators allow the system to use DVFS. The basic idea behind DVFS techniques is to determine the minimum frequency that satisfies all timing constraints and then to set the lowest possible voltage that allows this speed [1], [11].

TABLE II POWER REQUIREMENTS IN AT91SAM7X

Fig. 4. Power supply setup. (a) Typical power supply. (b) Proposed controllable power supply.

According to (1) and assuming a linear relationship between frequency and voltage [1], [11], the combined effects of voltage and frequency scaling result in decreasing the active power consumption proportional to $V^{3}$ and reducing the energy consumption proportional to $V^{2}$. Therefore, by scaling both the voltage and frequency, the energy can be significantly reduced. However, this achievement does not come for free because a tradeoff exists between speed and energy consumption [1].
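The argument above can be checked numerically. The following is a minimal sketch, not the authors' code: it evaluates (1) and the task energy $P_{Active} N / f$ for three cases (full speed, frequency scaling alone, and combined voltage and frequency scaling). The leakage current, effective capacitance, cycle count, and the idealized halving of the voltage are hypothetical values chosen only for illustration; they are not measurements from the platform, and the halved voltage lies outside the supply range the device actually allows.

```python
def active_power(v, f, i_leak, c_eff):
    """Eq. (1): static leakage power plus dynamic switching power, in watts."""
    return i_leak * v + c_eff * v**2 * f


def task_energy(n_cycles, v, f, i_leak, c_eff):
    """Energy of an N-cycle task run at frequency f: P_Active * N / f, in joules."""
    return active_power(v, f, i_leak, c_eff) * n_cycles / f


# Hypothetical parameters (assumptions for illustration only).
I_LEAK = 1e-3            # leakage current, A
C_EFF = 1e-9             # effective switched capacitance, F
N_CYCLES = 1_000_000     # task length in CPU cycles
V_MAX, F_MAX = 1.8, 60e6

# Case 1: full voltage, full frequency.
e_full = task_energy(N_CYCLES, V_MAX, F_MAX, I_LEAK, C_EFF)

# Case 2: frequency halved, voltage unchanged. The dynamic energy per
# cycle stays the same and the leakage energy doubles with execution time.
e_freq_only = task_energy(N_CYCLES, V_MAX, F_MAX / 2, I_LEAK, C_EFF)

# Case 3: idealized DVFS with voltage proportional to frequency.
e_dvfs = task_energy(N_CYCLES, V_MAX / 2, F_MAX / 2, I_LEAK, C_EFF)

print(f"full speed      : {e_full * 1e3:.3f} mJ")
print(f"frequency only  : {e_freq_only * 1e3:.3f} mJ")
print(f"voltage and freq: {e_dvfs * 1e3:.3f} mJ")
```

Under these assumed numbers, halving only the frequency slightly increases the task energy, whereas scaling the voltage together with the frequency reduces it substantially, which is the qualitative behavior the section describes.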
The AT91SAM7x microcontrollers have six power supply pins and a built-in (fixed output) voltage regulator, allowing the device to support a 3.3-V single-supply mode. Power specifications of the power supply pins are shown in Table II. Fig. 4(a) shows the schematic of a typical single-power-supply mode where the 3.3-V power is supplied via a dc/dc voltage converter to VFLASH, VIO, and VIN. The input of the built-in voltage regulator is connected to the 3.3-V voltage source (i.e., the VIN pin), and its output (i.e., the VOUT pin) supplies a fixed 1.8-V voltage for the VCORE and VPLL pins. As Table II shows, the USB transceiver, Flash memory, and I/O lines power supply can range from 3.0 to 3.6 V, and in addition, the processor core and PLL power supply can range from 1.65 to 1.95 V. This provides the possibility for the device to vary the supply voltage rather than using just a single fixed voltage.

Fig. 5. Proposed controllable power supply schematic.

To provide voltage scaling capability for this device, the dc/dc converter and embedded voltage regulator in Fig. 4(a) are replaced with a controllable power supply in Fig. 4(b) to feed the power pins with variable voltages. As shown in Fig. 4(b), variable supply voltage is provided for the power inputs of the microcontroller, except the embedded voltage regulator input, which remains unconnected, to disable the internal voltage regulator. The schematic of the proposed controllable power supply to provide dynamically scalable power supply is shown in Fig. 5. In this architecture, an adjustable version of a low-dropout linear voltage regulator (e.g., LM1117) is used. This regulator can provide an output voltage from 1.25 to 13.8 V using only two external resistors (i.e., $R_{ref}$ and $R_{adj}$ in Fig. 5). This device develops a 1.25-V reference voltage $V_{ref}$ between the output $V_{out}$ and the adjust pin. As shown in Fig.
5, this voltage is applied across the resistor $R_{ref}$ to produce a constant current that flows through the adjustment resistor $R_{adj}$ and fixes the output voltage $V_{out}$ to the desired level as

$$V_{out} = V_{ref}\left(1 + \frac{R_{adj}}{R_{ref}}\right) + I_{adj} R_{adj} \tag{2}$$

where $I_{adj}$ is the current flowing out of the adjust pin. Based on (2), to set $V_{out}$ to a new voltage level, we need to change the adjustment resistor $R_{adj}$. To provide the capability of dynamically adjusting the resistor, a digital potentiometer (e.g., AD8403) is used to provide a digitally controlled variable resistor that performs the same adjustment function as a potentiometer or variable resistor. As we aim at controlling the voltage of the four power pins of AT91SAM7x256 [see Fig. 4(a)], a digital potentiometer, which includes four independent variable resistors, is used. Each resistor can be set separately by a digital code transferred into the device. The code is loaded into the device via the standard three-wire SPI digital interface. The data bits clocked into the device are decoded to determine the resistor and its value.

In summary, to dynamically scale the supply voltage of a power pin of the microcontroller at run time, a digital code indicating the resistor and its desired value is loaded by the microcontroller into the digital potentiometer; after changing the adjustment resistor, the voltage regulator’s output is scaled and set to the desired voltage value. Therefore, with the proposed architecture, the microcontroller can dynamically set the voltage of the peripherals and the processor core power pins at run time. Generally, the proposed technique can be used
to provide scalable voltages for COTS devices whose supply voltage can vary within a range.

Fig. 6. Executing two tasks. (a) On a single-processor system. (b) On a dual-processor system.

C. Opportunities Offered by Parallel Processing

Since the proposed platform has multiple processing units (i.e., two ARM7-based and one AVR-based microcontrollers) and since it has the facilities for energy/power management (i.e., DVFS and DPM), one advantage of the platform is that it can be used to investigate the opportunities for energy management that may be offered by parallel processing. To give an insight into this issue, we provide an example illustrating that, when DVFS is used in executing parallel tasks, a two-processor system consumes less energy than a single-processor system.

Suppose that the slack time that is available to execute two tasks $T_1$ and $T_2$ (with $N_1$ and $N_2$ CPU cycles) is $S$. Fig. 6 shows how the tasks are executed on a single processor [see Fig. 6(a)] and on two processors [see Fig. 6(b)]. In Fig. 6, $N_1/f_{max}$ and $N_2/f_{max}$ are respectively the execution times of $T_1$ and $T_2$ at the maximum frequency $f_{max}$. For the single-processor system [see Fig. 6(a)], the minimum possible frequency that stretches the two tasks as long as possible and gives the minimum energy consumption can be calculated as

$$f_{SP} = \frac{N_1/f_{max} + N_2/f_{max}}{S} \tag{3}$$

Similarly, for the dual-processor system [see Fig. 6(b)], the minimum possible frequencies to execute $T_1$ and $T_2$ (i.e., $f_{DP,1}$ and $f_{DP,2}$, respectively) that give the minimum energy consumption can be calculated as

$$f_{DP,1} = \frac{N_1/f_{max}}{S}, \qquad f_{DP,2} = \frac{N_2/f_{max}}{S} \tag{4}$$

Using (1) (which gives the active power consumption $P_{Active}$) and considering that the energy consumed by executing a task with $N$ cycles at frequency $f$ can be computed as $P_{Active} N / f$, the minimum energy consumption of the single-processor system [see Fig.
6(a)] can be written as ($V_{SP}$ is the minimum voltage that allows $f_{SP}$)

$$E_{SP} = \left(\frac{I_{Leakage} V_{SP}}{f_{SP}} + C_{eff} V_{SP}^{2}\right) N_1 + \left(\frac{I_{Leakage} V_{SP}}{f_{SP}} + C_{eff} V_{SP}^{2}\right) N_2 \tag{5}$$

Similarly, the minimum energy consumption of the dual-processor system can be written as ($V_{DP,1}$ and $V_{DP,2}$ are the minimum voltages that allow $f_{DP,1}$ and $f_{DP,2}$, respectively)

$$E_{DP} = \left(\frac{I_{Leakage} V_{DP,1}}{f_{DP,1}} + C_{eff} V_{DP,1}^{2}\right) N_1 + \left(\frac{I_{Leakage} V_{DP,2}}{f_{DP,2}} + C_{eff} V_{DP,2}^{2}\right) N_2 \tag{6}$$

From (3) and (4), $f_{DP,1} < f_{SP}$ and $f_{DP,2} < f_{SP}$ (for $N_1, N_2 \neq 0$). Therefore, the minimum voltages that are used in the dual-processor system can be less than the minimum voltage that is used in the single-processor system, i.e., $V_{DP,1} < V_{SP}$ and $V_{DP,2} < V_{SP}$. In addition, assuming an almost linear relationship between the voltage and frequency [1], [3], [11], we can write $V_{SP}/f_{SP} \approx V_{DP,1}/f_{DP,1} \approx V_{DP,2}/f_{DP,2}$, so the leakage terms in (5) and (6) are approximately equal, whereas the dynamic terms are smaller in (6). Therefore, from (5) and (6), it can be concluded that $E_{DP} < E_{SP}$. This means that when DVFS is used in executing parallel tasks, a dual-processor system could provide more energy saving compared with a single-processor system.

IV. POWER MEASUREMENT, DEBUG, AND TEST UNITS

A. Power Measurement Unit

To provide power measurement equipment to the platform, a resistor is placed between each microcontroller power pin and the power supply line, and the voltage drop across the resistor is measured. The measured value gives the current drawn by the power pin. The power measurement setup is shown in Fig. 7. As the current drawn by the power pins of the microcontroller is less than 100 mA and the resulting voltage drop cannot be digitized directly by the ADC of the microcontrollers, the voltage value is amplified using an operational amplifier. The amplified value is digitized by a 10-bit ADC, and the data are sent to the host computer.

Fig. 7. Power measurement setup.

Fig. 8. Debug and test schematic.

B. Debug Units

The AT91SAM7x microcontrollers have a number of debug and test features, shown as a block diagram in Fig. 8.
The UART debug unit provides a two-pin (i.e., TXD and RXD) UART interface that can be employed for various purposes, e.g., debugging, tracing the running application, and uploading an application into internal SRAM. A general JTAG/ICE (see [9]) port is employed for commonly used operations, such as loading program code, and for standard debugging functions, such as single stepping through programs. IEEE 1149.1 JTAG Boundary Scan
allows pin-level access to the IEEE 1149.1 JTAG-compliant devices independent of the device packaging technology and is commonly used for test purposes. In a test environment for multiple on-board devices, a number of JTAG-compliant devices are connected to form a single scan chain, and test vectors are generated, transferred, and interpreted by a tester.

TABLE III POWER SUPPLY REQUIREMENTS FOR SOME WIDELY USED MICROCONTROLLERS

Fig. 9. Experimental setup and monitoring. (a) Setup. (b) Voltage of I/O. (c) Voltage of the processor. Coupling: ac.

V. EXPERIMENTAL RESULTS

A survey of some widely used ARM-based microcontrollers suggests that most of them permit the power supply pins to be fed by a wide range of voltages, as shown in Table III. This provides the opportunity of employing the proposed controllable power supply (see Section III-B) for them to achieve energy saving. In addition, all of the processors in Table III offer a number of modes to manage power in the system. These modes range widely in the level of power savings and the level of functionality. For instance, the LPC11U6x series [14] provides four power modes, namely, Sleep, Deep-sleep, Power-down, and Deep power-down modes, and PXA270 [20] provides Turbo mode (i.e., low latency operation), Run mode (i.e., normal full-function mode), Idle and Deep-idle modes (which allow stopping and resuming the CPU clock), Standby mode (all PLLs are disabled), Sleep mode (only keeps I/Os powered), and Deep-sleep mode (I/Os are powered down). To the best of our knowledge, most of the current embedded processors provide power management only through controlling the clock of the processor core and peripherals, and only a few of them (e.g., [13] and [14]) provide variable supply voltages.
As discussed in Section III-B, lowering the clock frequency alone is not effective for energy saving; simultaneous frequency and voltage scaling is required for this purpose.

Fig. 9(a) shows the experimental setup that includes an oscilloscope (for displaying voltages), a JTAG device (for programming), and two USB and RS232 connections (for data transfer). In this platform, we have four different voltages (see Fig. 4) for: 1) the processor; 2) PLL; 3) I/O peripheral; and 4) memory. These voltages can vary independently and can be determined regardless of the others. In the experiments, the processor and PLL voltages could be any value from the set {1.65, 1.7, 1.75, 1.8, 1.85, 1.9, and 1.95 V}, and I/O and memory voltages could be any value from the set {3.0, 3.1, 3.2, 3.3, 3.4, 3.5, and 3.6 V}. Like the works [1], [11], and [23], we have a set of voltage–frequency pairs to perform DVFS. Each voltage has a corresponding frequency level, and hence, there are seven levels, i.e., {36, 40, 45, 51, 55, 58, and 61 MHz}. The corresponding frequency for each voltage was empirically determined by measuring the highest frequency at which the processor still worked correctly and then subtracting a 5% safety margin (similar to [23]). It should be noted that this measurement is carried out only once (by the board development team); the end users just use the provided set and do not need to repeat such measurements (although they can do so if required). Although we provided only seven different levels of voltage, the platform can provide 256 voltage levels. As an example to show how the four voltages can independently vary, Fig. 9(b) shows the voltage of the processor and I/O when switching, respectively, between 1.75, 1.8, and 1.85 V, and 3.2, 3.3, and 3.4 V.

Fig. 10. High-to-low and low-to-high voltage scaling delays. (a) I/O. (b) Processor. Coupling: ac.

We conducted a set of experiments to analyze the voltage scaling delay in the proposed platform.
For example, Fig. 10 shows a timing diagram of voltage scaling between two consecutive voltage levels, i.e., 1.75 and 1.8 V for the processor core and 3.2 and 3.3 V for I/O. In Fig. 10, the high-to-low voltage scaling delay is 34 and 118 μs, and the low-to-high voltage scaling delay is 23 and 55 μs for the processor core and I/O, respectively. In our experiments, we obtained almost the same result for the other voltage levels. An interesting observation from these experiments is that the high-to-low voltage scaling delay is greater than the low-to-high voltage scaling delay (i.e., about 45% for the processor core and PLL and about 110% for memory and I/O).

To analyze the power consumption of different parts of the microcontroller (including the processor core, PLL, memory, and I/O) when working on different voltage levels, we executed a matrix multiplication task on the Keil RTX operating system [27]. This task multiplies two randomly generated matrices and sends the result to the host computer via USB. Based on the power consumption results that are shown in Fig. 11, for all the parts, lower supply voltage leads to lower power consumption. In addition, Fig. 11 shows that voltage scaling is very effective in reducing the power consumption of both the processor and the other parts of the microcontroller.

Another set of experiments has been performed on the MiBench benchmarks [21] (as real applications) to determine the contribution of each part of the microcontroller in the total power consumption, execution time, and energy consumption. The results are shown in Fig. 12. In this experiment, the 1.8-V voltage is used for the processor and PLL, and the 3.3-V
voltage is used for memory and I/O. As PLL is always operational during the application execution, it is not included in Fig. 12(b), and when we calculate energy consumption [in Fig. 12(c)], the applications’ execution time is used for PLL. From Fig. 12, we make two main observations: 1) although the power consumption of PLL, memory, and I/O is less than that of the processor, they have energy consumption comparable with that of the processor; 2) although PLL has a very small contribution in the total power, as it is always operational, its energy consumption is comparable in most cases with that of the others.

Fig. 11. Power consumption of AT91SAM7x. (a) Processor and PLL. (b) Memory and I/O.

Fig. 12. Contribution of parts of AT91SAM7x in: (a) power consumption, (b) execution time, and (c) energy consumption.

To evaluate the effectiveness of applying voltage scaling on the whole microcontroller, we measured and compared the energy consumption of the microcontroller when using three types of energy management techniques.

1) DPM: When there is an idle time, the microcontroller enters the low-power mode that it provides [9], in which memory is in standby (not accessed at all), the processor core is idle (its clock is switched off), main clock = 500 Hz, and all peripheral clocks are deactivated.
2) Core voltage and frequency scaling (CVFS): DVFS is used only for the processor core, and DPM is used for the other parts. In this case, the processor frequency is set to the slowest frequency (and its corresponding voltage) necessary to finish the application, selected from the set of available voltage–frequency pairs.
3) Microcontroller voltage and frequency scaling (MVFS): DVFS is used for the whole microcontroller, including the processor core, PLL, memory, and I/O.

In this experiment, we analyzed the MiBench benchmarks, and the results are shown in Table IV.
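The selection rule stated for CVFS in item 2 (pick the slowest available frequency that still finishes the application in time) can be sketched as follows. This is an illustrative sketch and not the authors' firmware. The voltage and frequency values are the two sets reported in Section V; pairing them in ascending order is an assumption, and the function name `select_pair` and its parameters are hypothetical.

```python
# Core voltage (V) and frequency (Hz) levels reported in Section V.
# Assumption: the i-th voltage is paired with the i-th frequency.
VF_PAIRS = [
    (1.65, 36e6), (1.70, 40e6), (1.75, 45e6), (1.80, 51e6),
    (1.85, 55e6), (1.90, 58e6), (1.95, 61e6),
]


def select_pair(n_cycles, time_budget_s, pairs=VF_PAIRS):
    """Return the slowest (voltage, frequency) pair that completes
    n_cycles within time_budget_s, or None if even the fastest pair
    cannot meet the budget."""
    for voltage, freq in sorted(pairs, key=lambda p: p[1]):
        if n_cycles / freq <= time_budget_s:
            return voltage, freq
    return None


# Hypothetical workload: 46 million cycles to be finished within 1 s.
choice = select_pair(46_000_000, 1.0)
print(choice)
```

A fuller model would also subtract the voltage scaling delays measured above from the time budget; that refinement is left out here to keep the sketch short.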
This experiment shows that, for the applications considered, using MVFS results in average energy savings of about 35% and 11% (at least 31% and 10%), as compared with the sole use of DPM and with the use of CVFS, respectively. In this experiment, we did not consider any fixed voltage–frequency pair for any benchmark. Rather, we executed each benchmark with all seven voltage–frequency pairs, and the average results are reported in Table IV.

TABLE IV ENERGY CONSUMPTION (IN MILLIJOULES) OF DPM, CVFS, AND MVFS

TABLE V ENERGY CONSUMPTION (IN MILLIJOULES) FOR EXECUTING THE DUPLICATION TECHNIQUE ON A SINGLE PROCESSOR OR ON TWO PROCESSORS

We conducted another set of experiments to show how the proposed platform can be used to exploit parallelism in energy management. As an example, consider the duplication technique [2], where each task is executed twice to detect possible errors. These two executions of each task can be performed on a single processor in series [see Fig. 6(a)] or on two processors in parallel [see Fig. 6(b)]. As Table V shows, for this example, parallel processing on two processors can provide an average energy saving of 25% (up to 29%), as compared with implementing the technique on a single processor (the reason is discussed in Section III-C). To implement this application, we used the RTX operating system [27] (other embedded operating systems, e.g., RTEMS [26], could also be used with the platform). Then, we developed the source code of the application, where we used the MailBox [26] feature of RTX for message passing and synchronization. MailBox can use commonly used communication protocols (e.g., SPI, UART, and CAN) that are supported by the platform (see Section II). For this experiment, we used UART for MailBox. Finally, we used Keil [26] (the compiler for RTX) to compile the source code and to load the object files into the platform through JTAG.

VI.
EXTENSIONS AND FUTURE WORK

The proposed platform can provide an experimental setup for different research projects. For example, as a direction for future work, the platform can be used to experiment with fault-tolerance techniques by means of the following facilities.

1) The two microcontrollers are connected such that they can interrupt, restart, and turn on/off each other.
2) Each of the microcontrollers can access the internal parts of the other via JTAG. This is helpful for implementing fault detection mechanisms that require comparing parts of one processor with their counterparts in the other.
3) There are interconnections to transfer data, internal states, and checkpoints between the microcontrollers.
4) A third, smaller microcontroller is placed in the platform and can be used (for example, as a voter [15]) to implement fault-tolerance techniques.

These facilities make it possible to implement fault-tolerance techniques such as standby sparing [1], duplication [2], and “2 out of 2” hardware redundancy with a voter [15]. The platform can also be used to implement software fault-tolerance techniques such as result checking [2] and N-version programming [10]. Such software mechanisms usually require communication and synchronization between the processors [28], which is supported by the proposed platform.

The debugging features (see Section IV-B) can also be used for fault injection purposes [16]. For example, JTAG lets us arbitrarily change processor registers, flags, and data memory at run time. This can be used for injecting soft errors that are caused by transient faults (e.g., single event upsets [17]) and cause one or more memory bits to change [1], [2], [17].

Another possible extension is to adopt a motherboard–daughterboard architecture for the design of the board so that it can be used for other microcontrollers with only slight changes.

VII. CONCLUSION

This paper has presented a hardware platform that consists of two ARM-based microcontrollers, each fed separately by variable voltages. This platform is very suitable for evaluating embedded systems with low energy consumption and fault-tolerance requirements. In this platform, we provide DVFS capability for the whole microcontroller (including the processor core, PLL, memory, and I/O).
Physical experiments show that applying DVFS on the whole microcontroller is considerably more efficient in reducing power/energy consumption compared with applying DVFS only on the processor core or using power-down policies that are currently used by most embedded processors. In addition, the platform is equipped with accurate energy/power measurement units, debugging ports, and facilities for evaluating fault-tolerance techniques. Although the platform is designed for ARM-based microcontrollers, it is general, and other COTS devices and embedded processors can be similarly used in the design of the platform.

REFERENCES

[1] A. Ejlali, B. M. Al-Hashimi, and P. Eles, “Low-energy standby-sparing for hard real-time systems,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 31, no. 3, pp. 329–342, Mar. 2012.
[2] S. Aminzadeh and A. Ejlali, “A comparative study of system-level energy management methods for fault-tolerant hard real-time systems,” IEEE Trans. Comput., vol. 60, no. 9, pp. 1288–1299, Sep. 2011.
[3] A. Malinowski and H. Yu, “Comparison of embedded system design for industrial applications,” IEEE Trans. Ind. Informat., vol. 7, no. 2, pp. 244–254, May 2011.
[4] J. Henkel and S. Parameswaran, Designing Embedded Processors: A Low Power Perspective. Berlin, Germany: Springer-Verlag, 2007.
[5] P. Marti, M. Velasco, J. M. Fuertes, A. Camacho, and G. Buttazzo, “Design of an embedded control system laboratory experiment,” IEEE Trans. Ind. Electron., vol. 57, no. 10, pp. 3297–3307, Oct. 2010.
[6] T. Yang, G. Zhang, and X. Hu, “System design of current transformer accuracy tester based on ARM,” in Proc. 8th IEEE Conf. Ind. Electron. Appl., Jun. 19–21, 2013, pp. 634–639.
[7] H. Guzman-Miranda, L. Sterpone, M. Violante, M. A. Aguirre, and M. Gutierrez-Rizo, “Coping with the obsolescence of safety- or mission-critical embedded systems using FPGAs,” IEEE Trans. Ind. Electron., vol. 58, no. 3, pp. 814–821, Mar. 2011.
[8] Virtex-6 FPGA ML605 Evaluation Kit, Xilinx, San Jose, CA, USA, 2012.
[9] ARM-based Flash MCU SAM7x Series, Atmel Corp., San Jose, CA, USA, Feb. 11, 2014.
[10] R.-T. Wang, “A dependent model for fault tolerant software systems during debugging,” IEEE Trans. Rel., vol. 61, no. 2, pp. 504–515, Jun. 2012.
[11] J. Pouwelse, K. Langendoen, and H. Sips, “Dynamic voltage scaling on a low-power microprocessor,” in Proc. 7th ACM Int. Conf. MobiCom Netw., 2001, pp. 251–259.
[12] Y. S. Hwang and K. S. Chung, “Dynamic power management technique for multicore based embedded mobile devices,” IEEE Trans. Ind. Informat., vol. 9, no. 3, pp. 1601–1612, Aug. 2013.
[13] STM32L15x: Ultra-Low-Power 32-Bit MCU ARM-Based Cortex-M3, STMicroelectronics, Geneva, Switzerland, Nov. 2013.
[14] LPC11U6x 32-Bit ARM Cortex-M0+ Microcontroller, NXP Semiconductors, Eindhoven, The Netherlands, Mar. 2014.
[15] M. Idirin, X. Aizpurua, A. Villaro, J. Legarda, and J. Melendez, “Implementation details and safety analysis of a microcontroller-based SIL-4 software voter,” IEEE Trans. Ind. Electron., vol. 58, no. 3, pp. 822–829, Mar. 2011.
[16] M. Portela-Garcia, C. Lopez-Ongil, M. Garcia-Valderas, and L. Entrena, “Fault injection in modern microprocessors using on-chip debugging infrastructures,” IEEE Trans. Dependable Secure Comput., vol. 8, no. 2, pp. 308–314, Mar./Apr. 2011.
[17] M. Grosso, H. Guzman-Miranda, and M. A. Aguirre, “Exploiting fault model correlations to accelerate SEU sensitivity assessment,” IEEE Trans. Ind. Informat., vol. 9, no. 1, pp. 142–148, Feb. 2013.
[18] i.MX27 and i.MX27L Multimedia Applications Processor, Freescale Semiconductor Inc., Austin, TX, USA, 2011.
[19] High-Performance, Low-Power System-on-Chip with SDRAM and Digital Audio, Cirrus Logic, Inc., Austin, TX, USA, 2011.
[20] Marvell PXA270 Processor: Electrical, Mechanical, Thermal Specification, Marvell, Santa Clara, CA, USA, 2009.
[21] M. R.
Guthaus et al., “MiBench: A free, commercially representative embedded benchmark suite,” in Proc. IEEE Int. Workshop Workload Characterization, Dec. 2001, pp. 3–14.
[22] J. Castrillon, R. Leupers, and G. Ascheid, “MAPS: Mapping concurrent dataflow applications to heterogeneous MPSoCs,” IEEE Trans. Ind. Informat., vol. 9, no. 1, pp. 527–545, Feb. 2013.
[23] T. Phatrapornnant and M. J. Pont, “Reducing jitter in embedded systems employing a time-triggered software architecture and dynamic voltage scaling,” IEEE Trans. Comput., vol. 55, no. 2, pp. 113–124, Feb. 2006.
[24] H. Guo, K.-S. Low, and H.-A. Nguyen, “Optimizing the localization of a wireless sensor network in real time based on a low-cost microcontroller,” IEEE Trans. Ind. Electron., vol. 58, no. 3, pp. 741–749, Mar. 2011.
[25] R. Wang and S. Yang, “The design of a rapid prototype platform for ARM based embedded system,” IEEE Trans. Consum. Electron., vol. 50, no. 2, pp. 746–751, May 2004.
[26] RTEMS Operating System, 2010. [Online]. Available: http://www.
[27] RTX Real-Time Operating System, 2013. [Online]. Available: http://
[28] Y. Jiang et al., “Bayesian-network-based reliability analysis of PLC systems,” IEEE Trans. Ind. Electron., vol. 60, no. 11, pp. 5325–5336, Nov. 2013.

Mohammad Salehi received the M.S. degree in computer engineering from Sharif University of Technology, Tehran, Iran, in 2010, where he is currently working toward the Ph.D. degree in computer engineering. His current research interests include embedded systems, low-power design, and the tradeoff between fault tolerance and energy efficiency in real-time systems.

Alireza Ejlali received the Ph.D. degree in computer engineering from Sharif University of Technology, Tehran, Iran, in 2006. He is currently an Associate Professor of computer engineering with Sharif University of Technology, where he is also the Director of the Computer Architecture Group and the Embedded Systems Research Laboratory, Department of Computer Engineering.
His current research interests include low-power design, real-time embedded systems, and fault-tolerant embedded systems.