ACM Transactions on Design Automation of Electronic Systems

Material type: Text
Series: ACM Transactions on Design Automation of Electronic Systems, Volume 22, Issue 1, 2017
Publication details: New York : Association for Computing Machinery, c2017
Description: various pagings : illustrations ; 25 cm
ISSN: 1084-4309
Subject(s): HARDWARE -- TEMPERATURE CONTROL | ALGORITHMIC MICROFLUIDICS | LOGIC OPTIMIZATION | COMPUTER SYSTEM ORGANIZATION | DESIGN AIDS -- PLACEMENT AND ROUTING | HARDWARE -- TEST DATA COMPRESSION | OPERATING SYSTEMS | SECURITY AND PRIVACY | DATA FLOW COMPUTATION | DYNAMIC MEMORY
Contents:
Hierarchical Dynamic Thermal Management Method for High-Performance Many-Core Microprocessors -- Error-Correcting Sample Preparation with Cyberphysical Digital Microfluidic Lab-on-Chip -- State Assignment and Optimization of Ultra-High-Speed FSMs Utilizing Tristate Buffers -- A Framework for Block Placement, Migration, and Fast Searching in Tiled-DNUCA Architecture -- Obstacle-Avoiding Wind Turbine Placement for Power Loss and Wake Effect Optimization -- Hardware Trojans: Lessons Learned after One Decade of Research -- Periodic Scan-In States to Reduce the Input Test Data Volume for Partially Functional Broadside Tests -- Efficient Security Monitoring with the Core Debug Interface in an Embedded Processor -- Improving PCM Endurance with a Constant-Cost Wear Leveling Design -- Ripple 2.0: Improved Movement of Cells in Routability-Driven Placement -- A Compact Implementation of Salsa20 and Its Power Analysis Vulnerabilities -- Partitioning and Data Mapping in Reconfigurable Cache and Scratchpad Memory--Based Architectures -- Genetic-Algorithm-Based FPGA Architectural Exploration Using Analytical Models -- Hybrid Power Management for Office Equipment -- Probabilistic Model Checking for Uncertain Scenario-Aware Data Flow -- DReAM: An Approach to Estimate per-Task DRAM Energy in Multicore Systems -- Non-enumerative Generation of Path Delay Distributions and Its Application to Critical Path Selection -- An Adaptive Demand-Based Caching Mechanism for NAND Flash Memory Storage Systems -- ERfair Scheduler with Processor Suspension for Real-Time Multiprocessor Embedded Systems.
Item type: Serials
Current library: LRC - Main
Home library: National University - Manila
Collection: Gen. Ed. - CCIT
Shelving location: Periodicals
Call number: ACM Transactions on Design Automation of Electronic Systems, Volume 22, Issue 1, 2017
Copy number: c.1
Status: Available
Barcode: PER000000529

Includes bibliographical references.

[Article Title: Hierarchical Dynamic Thermal Management Method for High-Performance Many-Core Microprocessors/ Hai Wang,Jian Ma,Sheldon X.-D. Tan,Chi Zhang,He Tang,Keheng Huang and Zhenghong Zhang, p. 1:1-1:21]

Abstract: It is challenging to manage the thermal behavior of many-core microprocessors while still keeping them running at high performance, since the control complexity grows with the number of cores. In this article, a novel hierarchical dynamic thermal management method is proposed to overcome this challenge. The new method employs model predictive control (MPC) with task migration and a DVFS scheme to ensure smooth control behavior and a negligible sacrifice in computing performance. To be scalable to many-core systems, the hierarchical control scheme is designed with two levels. At the lower level, the cores are spatially clustered into blocks, and local task migration is used to match the current power distribution with the optimal distribution calculated by MPC. At the upper level, global task migration is used to handle the unmatched powers from the lower level. A modified iterative minimum cut algorithm assists the task migration decisions when the number of unmatched powers at the upper level is large. Finally, DVFS is applied to regulate the remaining unmatched powers. Experiments show that the new method outperforms existing methods and scales well to many-core microprocessors with only small performance degradation.

https://doi.org/10.1145/2891409
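
As a concrete illustration of the two-level scheme described in this abstract, here is a minimal Python sketch; the power numbers, cluster layout, and MPC target are invented stand-ins for the authors' models, not their implementation:

    # Toy skeleton of the hierarchical loop: clusters absorb what local task
    # migration can fix, global migration handles inter-cluster residues, and
    # DVFS regulates whatever remains.
    def thermal_step(power, mpc_target, clusters):
        unmatched = []
        for cluster in clusters:
            # Lower level: migration inside a cluster only reshuffles power,
            # so just the cluster's total mismatch survives to the upper level.
            mismatch = sum(power[c] for c in cluster) - sum(mpc_target[c] for c in cluster)
            unmatched.append(mismatch)
        # Upper level: the leftover totals are what global migration must move
        # between clusters (the paper uses a modified iterative minimum cut
        # heuristic to decide how when this list is large).
        residue = sum(unmatched)
        # Finally, DVFS scales remaining power down (never up) toward the target.
        dvfs_scale = min(1.0, sum(mpc_target.values()) / max(sum(power.values()), 1e-9))
        return unmatched, residue, dvfs_scale

    power      = {0: 9.0, 1: 7.0, 2: 4.0, 3: 4.0}   # per-core watts (made up)
    mpc_target = {0: 6.0, 1: 6.0, 2: 6.0, 3: 6.0}
    print(thermal_step(power, mpc_target, clusters=[[0, 1], [2, 3]]))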

[Article Title: Error-Correcting Sample Preparation with Cyberphysical Digital Microfluidic Lab-on-Chip/ Sudip Poddar,Sarmishtha Ghoshal,Krishnendu Chakrabarty and Bhargab B. Bhattacharya, p. 2:1-2:29]

Abstract: Digital (droplet-based) microfluidic technology offers an attractive platform for implementing a wide variety of biochemical laboratory protocols, such as point-of-care diagnosis, DNA analysis, target detection, and drug discovery. A digital microfluidic biochip consists of a patterned array of electrodes on which tiny fluid droplets are manipulated by electrical actuation sequences to perform various fluidic operations, for example, dispense, transport, mix, or split. However, because of the inherent uncertainty of fluidic operations, the outcome of biochemical experiments performed on-chip can be erroneous even if the chip is tested a priori and deemed to be defect-free. In this article, we address an important error recoverability problem in the context of sample preparation. We assume a cyberphysical environment, in which physical errors, when detected online at selected checkpoints with integrated sensors, can be corrected through recovery techniques. However, almost all prior work on error recoverability used a checkpointing-based rollback approach, that is, re-execution of certain portions of the protocol starting from the previous checkpoint. Unfortunately, such techniques are expensive in terms of both assay completion time and reagent cost, and can never ensure full error recovery in a deterministic sense. We consider imprecise droplet mix-split operations and present a novel roll-forward approach in which the erroneous droplets thus produced are used in the error-recovery process instead of being discarded or remixed. All erroneous droplets participate in the dilution process, and they mutually cancel or reduce the concentration error by the time the target droplet is reached. We also present a rigorous analysis that reveals the role of volumetric error in the concentration of a sample to be prepared, and we describe the layout of a lab-on-chip that can execute the proposed cyberphysical dilution algorithm. Our analysis reveals that fluidic errors caused by unbalanced droplet splitting can be classified as either critical or non-critical, and only those of the former type require correction to achieve error-free sample dilution. Simulation experiments on various sample preparation test cases demonstrate the effectiveness of the proposed method.

https://doi.org/10.1145/2898999
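
To make the error mechanics concrete, the following toy Python model (a sketch, not the paper's dilution algorithm) shows volume-weighted mixing: an unbalanced split leaves concentration intact but perturbs volume, which skews the next 1:1 mix, while keeping both erroneous daughter droplets in play lets their errors offset:

    # Droplets are (concentration, volume) pairs; mixing is volume weighted.
    def mix(d1, d2):
        (c1, v1), (c2, v2) = d1, d2
        return ((c1 * v1 + c2 * v2) / (v1 + v2), v1 + v2)

    def split(d, err=0.0):
        """Unbalanced split: daughters get (1+err)/2 and (1-err)/2 of the volume."""
        c, v = d
        return (c, v * (1 + err) / 2), (c, v * (1 - err) / 2)

    buf = (0.0, 1.0)                                   # unit buffer droplet
    big, small = split(mix((1.0, 1.0), buf), err=0.1)  # both at 0.5; volumes 1.1/0.9
    print(mix(big, buf)[0], mix(small, buf)[0])        # ~0.262 and ~0.237, not 0.25
    print(mix(big, small)[0])                          # reusing the pair: exactly 0.5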

[Article Title: State Assignment and Optimization of Ultra-High-Speed FSMs Utilizing Tristate Buffers/ Robert Czerwinski, and Dariusz Kania, p. 3:1-3:25]

Abstract: The logic synthesis of ultra-high-speed FSMs is presented. The state assignment is based on a well-known method that uses output vectors. This technique is adjusted to include elements of two-level minimization and takes into account the limited number of terms contained in the programmable-AND/fixed-OR logic cell. The state assignment is based on a special form of the binary decision tree. The second phase of the FSM design is logic optimization. The optimization method is based on tristate buffers, thus making a one-logic-level FSM structure possible. The key point is the search for partition variables that control the tristate buffers. This technique can also be applied to combinational circuits, or to the output block of an FSM alone.

Algorithms for state assignment and optimization are presented and richly illustrated by examples. The method is dedicated to using specific features of complex programmable logic devices. Experimental results prove its effectiveness (e.g., the implementation of the 16-bit counter requires 136 logic cells and one logic-cell level instead of 213 cells and four levels). The optimization method using tristate buffers and a state assignment binary decision tree can be directly applied to FPGA-dedicated logic synthesis.

https://doi.org/10.1145/2905366

[Article Title: A Framework for Block Placement, Migration, and Fast Searching in Tiled-DNUCA Architecture/ Shirshendu Das and Hemangee K. Kapoor, p. 4:1-4:26]

Abstract: Multicore processors have proliferated across domains ranging from small-scale embedded systems to large data centers, making tiled CMPs (TCMPs) the essential next-generation scalable architecture. NUCA architectures help in managing the capacity and access time of such larger cache designs. NUCA divides the last-level cache (LLC) into multiple banks connected through an on-chip network. Static NUCA (SNUCA) has a fixed address mapping policy, whereas dynamic NUCA (DNUCA) allows blocks to relocate nearer to the processing cores at runtime. To allow this, DNUCA divides the banks into multiple banksets, and a block can be placed in any bank within a particular bankset. The entire bankset may need to be searched to access a block. Efficient bankset searching mechanisms are therefore essential for getting the benefits of DNUCA.

This article proposes a DNUCA-based TCMP architecture called TLD-NUCA. It reduces the LLC access time of the TCMP and also allows a heavily loaded bank to distribute its load among underused banks. Unlike other DNUCA designs, TLD-NUCA considers larger banksets. This relaxation results in a more uniform load distribution than the existing DNUCA-based TCMP (T-DNUCA). Considering larger banksets improves the utilization factor, but T-DNUCA cannot implement it because of its expensive searching mechanism. TLD-NUCA uses a centralized directory, called the TLD, to locate a block among all the banks. Also, the proposed block placement policy reduces the instances in which the central TLD needs to be contacted, and it does not require the expensive simultaneous search needed by T-DNUCA. Better cache utilization and a reduction in LLC access time improve the miss rate as well as the average memory access time (AMAT), which in turn improves cycles per instruction (CPI). Experimental analysis found that TLD-NUCA improves performance by 6.5% as compared to T-DNUCA, and by 13% as compared to the SNUCA-based TCMP design.

https://doi.org/10.1145/2907946
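
The directory idea reads naturally as a lookup table. Below is a minimal Python sketch; the dict-based TLD, the modulo home-bank rule, and the addresses are illustrative stand-ins, not the hardware design:

    class TLDirectory:
        """Centralized block directory: consulted only for migrated blocks."""
        def __init__(self, n_banks):
            self.n_banks = n_banks
            self.where = {}                    # block address -> current bank

        def home_bank(self, block):
            return block % self.n_banks        # default placement, no lookup needed

        def lookup(self, block):
            return self.where.get(block, self.home_bank(block))

        def migrate(self, block, bank):
            self.where[block] = bank           # one directory update, no broadcast search

    tld = TLDirectory(n_banks=16)
    tld.migrate(0x2A, bank=3)                  # hot block relocated nearer its core
    print(tld.lookup(0x2A), tld.lookup(0x2B))  # 3 (migrated), 11 (home bank)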

[Article Title: Obstacle-Avoiding Wind Turbine Placement for Power Loss and Wake Effect Optimization/ Yu-Wei Wu,Yiyu Shi,Sudip Roy and Tsung-Yi Ho, p. 5:1-5:24]

Abstract: As finite energy resources are being consumed at a faster rate than they can be replaced, renewable energy resources have drawn extensive attention. Wind power is one such example, growing significantly throughout the world. The main difficulty in wind power development is that wind turbines interfere with each other: the turbulence they produce, known as the wake effect, directly reduces power generation. In addition, the wirelength of the collection network among wind turbines is not merely an economic factor; it also determines power loss on the wind farm. Moreover, in reality, unavoidable obstacles (buildings, lakes, etc.) exist on the wind farm. Nevertheless, to the best of our knowledge, none of the existing works consider wake effect, wirelength, and avoidance of obstacles all together in the wind turbine placement problem. In this article, we propose an analytical method to obtain an obstacle-avoiding placement of wind turbines that minimizes both power loss and wake effect. We also propose a postprocessing method to fine-tune the solution obtained from the analytical method. Simulation results show that our tool is 12x faster than the state-of-the-art industrial tool AWS OpenWind and 203x faster than the state-of-the-art academic tool TDA, with almost the same produced power.

https://doi.org/10.1145/2905365
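
For readers unfamiliar with wake modeling, the sketch below scores a one-dimensional turbine layout using the classic Jensen wake model as a stand-in for the paper's formulation; the induction factor, decay constant, rotor radius, and positions are all illustrative:

    # Power grows with the cube of wind speed; under the Jensen model a
    # downwind turbine sees a speed deficit of 2a / (1 + k*dx/r)^2.
    def jensen_deficit(dx, a=1/3, k=0.075, r=40.0):
        if dx <= 0:
            return 0.0                         # no wake felt upwind
        return 2 * a / (1 + k * dx / r) ** 2

    def farm_power(xs, wind=12.0):
        total = 0.0
        for i, x in enumerate(xs):
            others = xs[:i] + xs[i+1:]
            deficit = max((jensen_deficit(x - xu) for xu in others), default=0.0)
            total += (wind * (1 - deficit)) ** 3
        return total

    # Spreading turbines apart weakens the wake penalty (positions in meters):
    print(farm_power([0.0, 200.0]), farm_power([0.0, 800.0]))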

[Article Title: Hardware Trojans: Lessons Learned after One Decade of Research/ K. Xiao,D. Forte,Y. Jin,R. Karri,S. Bhunia and M. Tehranipoor, p. 6:1-6:23]

Abstract: Given the increasing complexity of modern electronics and the cost of fabrication, entities from around the globe have become more heavily involved in all phases of the electronics supply chain. In this environment, hardware Trojans (i.e., malicious modifications or inclusions made by untrusted third parties) pose major security concerns, especially for those integrated circuits (ICs) and systems used in critical applications and cyber infrastructure. While hardware Trojans have been explored significantly in academia over the last decade, there remains room for improvement. In this article, we examine the research on hardware Trojans from the last decade and attempt to capture the lessons learned. A comprehensive adversarial model taxonomy is introduced and used to examine the current state of the art. Then the past countermeasures and publication trends are categorized based on the adversarial model and topic. Through this analysis, we identify what has been covered and the important problems that are underinvestigated. We also identify the most critical lessons for those new to the field and suggest a roadmap for future hardware Trojan research.

https://doi.org/10.1145/2906147

[Article Title: Periodic Scan-In States to Reduce the Input Test Data Volume for Partially Functional Broadside Tests/ Irith Pomeranz, p. 7:1-7:22]

Abstract: This article describes a procedure for test data compression targeting functional and partially functional broadside tests. The scan-in state of such a test is either a reachable state or has a known Hamming distance from a reachable state. Reachable states are fully specified, while popular LFSR-based test data compression methods require the use of incompletely specified test cubes. The test data compression approach considered in this article is based on the use of periodic scan-in states. Such states require the storage of a period that can be significantly shorter than a scan-in state, thus providing test data compression. The procedure computes a set of periods that is sufficient for detecting all the detectable target faults. Considering the scan-in states that the periods produce, the procedure ranks the periods based on the distances of the scan-in states from reachable states and the lengths of the periods. Functional and partially functional broadside tests are then generated, preferring shorter periods with smaller Hamming distances. The results are compared with those of an LFSR-based approach.

https://doi.org/10.1145/2911983
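
The storage saving is easy to see in code. A minimal sketch follows; the period, scan length, and bit strings are invented for illustration:

    def expand(period, scan_length):
        """Regenerate a full scan-in state by repeating a stored period."""
        reps = -(-scan_length // len(period))   # ceiling division
        return (period * reps)[:scan_length]

    period = "0110"                             # only 4 bits stored...
    state = expand(period, 128)                 # ...yield a 128-bit scan-in state
    print(state[:16], "compression:", 128 // len(period), "x")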

[Article Title: Efficient Security Monitoring with the Core Debug Interface in an Embedded Processor/ Jinyong Lee,Ingoo Heo,Yongje Lee and Yunheung Paek, p. 8:1-8:29]

Abstract: For decades, various concepts in security monitoring have been proposed. In principle, they all share one trait: they monitor the execution behavior of a program (e.g., its control flow or data flow) running on the machine to find symptoms of attacks. Among the proposed monitoring schemes, software-based ones are known for their adaptability to commercial products, but there have been concerns that they may suffer from nonnegligible runtime overhead. On the other hand, hardware-based solutions are recognized for their high performance. However, most of them have an inherent problem in that they usually mandate drastic changes to the internal processor architecture. More recent ones have strived to minimize such modifications by employing external hardware security monitors in the system. However, these approaches intrinsically suffer from the overhead caused by communication between the host and the external monitor. Our solution also relies on external hardware for security monitoring, but unlike the others, ours tackles the communication overhead by using the core debug interface (CDI), which is readily available in most commercial processors for debugging. We build our system simply by plugging our monitoring hardware into the processor via the CDI, precluding the need to alter the processor internals. To validate the effectiveness of our approach, we implement two well-known monitoring techniques on our proposed framework: dynamic information flow tracking and branch regulation. The experimental results on our FPGA prototype show that our external hardware monitors efficiently perform monitoring tasks with negligible performance overhead, mainly thanks to the support of the CDI, which helps us reduce communication costs substantially.

https://doi.org/10.1145/2907611

[Article Title: Improving PCM Endurance with a Constant-Cost Wear Leveling Design/ Yu-Ming Chang,Pi-Cheng Hsiu,Yuan-Hao Chang,Chi-Hao Chen,Tei-Wei Kuo and Cheng-Yuan Michael Wang, p. 9:1-9:27]

Abstract: Improving PCM endurance is a fundamental issue when PCM is considered as an alternative to DRAM for main memory. Memory-based wear leveling (WL) is an effective way to improve PCM endurance, but its major challenge is how to efficiently determine the appropriate memory pages for allocation or swapping. In this article, we present a constant-cost WL design that is compatible with existing memory management. Two implementations, namely bucket-based and array-based WL, with constant-time (or nearly zero) search cost are proposed for integration into the OS layer and the hardware layer, respectively, and to trade between time and space complexity. The results of experiments conducted on an implementation in Android, as well as simulations with popular benchmarks, are very encouraging.

https://doi.org/10.1145/2905364
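
As a sketch of how a constant-time search can work, the bucket idea can be modeled in a few lines of Python; the page count and the immediate-reuse simplification are illustrative, not the paper's OS/hardware design:

    from collections import defaultdict

    class BucketWL:
        """Pages grouped by wear count; a least-worn page is popped in O(1)."""
        def __init__(self, n_pages):
            self.wear = [0] * n_pages
            self.buckets = defaultdict(set)
            self.buckets[0] = set(range(n_pages))
            self.min_wear = 0

        def allocate(self):
            while not self.buckets[self.min_wear]:    # skip drained buckets
                self.min_wear += 1
            page = self.buckets[self.min_wear].pop()  # constant-cost pick
            self.wear[page] += 1
            self.buckets[self.wear[page]].add(page)   # reusable next round (simplified)
            return page

    wl = BucketWL(n_pages=4)
    print([wl.allocate() for _ in range(8)], wl.wear)  # wear stays balanced at 2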

[Article Title: Ripple 2.0: Improved Movement of Cells in Routability-Driven Placement/ Xu He,Yao Wang,Yang Guo and Evangeline F. Y. Young, p. 10:1-10:26]

Abstract: Routability is one of the most important problems in high-performance circuit designs. From the viewpoint of placement design, two major factors cause routing congestion: (i) interconnections between cells and (ii) connections on macro blockages. In this article, we present a routability-driven placer, Ripple 2.0, which addresses both kinds of routing congestion. Several techniques are presented, including (i) cell inflation with routing path consideration, (ii) congested cluster optimization, (iii) routability-driven cell spreading, and (iv) simultaneous routing and placement for routability refinement. Under the official evaluation protocol, Ripple 2.0 outperforms other published academic routability-driven placers. Compared with the top results in the ICCAD 2012 contest, Ripple 2.0 achieves a better detailed routing solution as obtained by a commercial router.

https://doi.org/10.1145/2925989

[Article Title: A Compact Implementation of Salsa20 and Its Power Analysis Vulnerabilities/ Bodhisatwa Mazumdar,Sk. Subidh Ali and Ozgur Sinanoglu, p. 11:1-11:26]

Abstract: In this article, we present a compact implementation of the Salsa20 stream cipher that is targeted towards lightweight cryptographic devices such as radio-frequency identification (RFID) tags. The Salsa20 stream cipher, an addition-rotation-XOR (ARX) cipher, is used for high-security cryptography in NEON instruction sets embedded in ARM Cortex A8 CPU core-based tablets and smartphones. The existing literature shows that although classical cryptanalysis has been effective on reduced rounds of Salsa20, the stream cipher is immune to software side-channel attacks such as branch timing and cache timing attacks. To the best of our knowledge, this work is the first to perform hardware power analysis attacks, where we evaluate the resistance of all eight keywords in the proposed compact implementation of Salsa20. Our technique targets the three subrounds of the first round of the implemented Salsa20. The correlation power analysis (CPA) attack has an attack complexity of 2^19. Based on extensive experiments on a compact implementation of Salsa20, we demonstrate that all these keywords can be recovered within 20,000 queries on Salsa20. The attacks show a varying resilience of the keywords against CPA that has not yet been observed in any stream or block cipher in the present literature. This makes the architecture of this stream cipher interesting from the side-channel analysis perspective. Also, we propose a lightweight countermeasure that mitigates the leakage in the power traces, as shown by the results of Welch’s t-test statistics. The proposed countermeasure is designed with the compact implementation in mind, and its hardware area overhead is only 14%.

https://doi.org/10.1145/2934677
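
For orientation, here is a generic CPA loop in Python. This is the textbook attack shape, not the paper's Salsa20-specific attack: the toy add-rotate intermediate, the Hamming-weight leakage model, and the simulated traces are all assumptions:

    import numpy as np

    rng = np.random.default_rng(1)
    HW = np.array([bin(x).count("1") for x in range(256)])  # Hamming weights

    def intermediate(key, nonce):
        s = (key + nonce) & 0xFF                  # toy add step
        return ((s << 3) | (s >> 5)) & 0xFF       # 8-bit rotate

    true_key = 0x5A
    nonces = rng.integers(0, 256, size=2000)
    traces = HW[intermediate(true_key, nonces)] + rng.normal(0, 1.0, 2000)

    # Rank all 256 guesses by correlation between predicted and observed leakage.
    corrs = [abs(np.corrcoef(HW[intermediate(g, nonces)], traces)[0, 1])
             for g in range(256)]
    print(hex(int(np.argmax(corrs))))             # recovers 0x5a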

[Article Title: Partitioning and Data Mapping in Reconfigurable Cache and Scratchpad Memory--Based Architectures/ Prasenjit Chakraborty,Preeti Ranjan Panda and Sandeep Sen, p. 12:1-12:25]

Abstract: Scratchpad memory (SPM) is considered a useful component in the memory hierarchy, alone or along with caches, for meeting power and energy constraints as performance ceases to be the sole criterion for processor design. Although the efficiency of SPM is well known, its use has been restricted owing to difficulties in programmability. Real applications usually have regions that are amenable to exploitation by either SPM or cache and hence can benefit if the two are used in conjunction. Dynamically adjusting the local memory resources to suit application demand can significantly improve the efficiency of the overall system. In this article, we propose a compiler technique to map application data objects to the SPM-cache and also to partition the local memory between the SPM and cache depending on the dynamic requirements of the application. First, we introduce a novel graph-based structure to tackle data allocation in an application. Second, we use this to present a data allocation heuristic that maps program objects for a fixed-size SPM-cache hybrid system and targets whole-program optimization. We finally extend this formulation to adapt the SPM and cache sizes, as well as the data allocation, to the requirements of different application regions. We study the applicability of the technique on various workloads targeted at both SPM-only and hardware-reconfigurable memory systems, observing an average 18% energy-delay improvement over state-of-the-art techniques.

https://doi.org/10.1145/2934680
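
A greedy knapsack-style pass gives the flavor of such mapping decisions; the sketch below is a simplified stand-in for the paper's graph-based heuristic, with invented object sizes and access counts:

    def map_objects(objects, spm_size):
        """objects: (name, size_bytes, access_count). Densest objects go to SPM."""
        spm, cache, free = [], [], spm_size
        for name, size, accesses in sorted(objects, key=lambda o: o[2] / o[1],
                                           reverse=True):
            if size <= free:
                spm.append(name)              # served from SPM, no tag checks
                free -= size
            else:
                cache.append(name)            # falls back to the cache path
        return spm, cache

    objs = [("lut", 4096, 90000), ("frame", 65536, 120000), ("tmp", 1024, 50000)]
    print(map_objects(objs, spm_size=8192))   # (['tmp', 'lut'], ['frame'])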

[Article Title: Genetic-Algorithm-Based FPGA Architectural Exploration Using Analytical Models/ Hossein Mehri and Bijan Alizadeh, p. 13:1-13:17]

Abstract: FPGA architectural optimization has emerged as one of the most important digital design challenges. In recent years, experimental methods have been replaced by analytical ones to find the optimized architecture; time is the main reason for this replacement. Conventional Geometric Programming (GP) is a routine framework for solving analytical models, including area, delay, and power models. In this article, we discuss the application of the Genetic Algorithm (GA) to the design of FPGA architectures. The performance model has been integrated into the Genetic Algorithm framework in order to investigate the impact of various architectural parameters on the performance efficiency of FPGAs. This way, we are able to rapidly analyze FPGA architectures and select the best one. The main advantages of using GA versus GP are concurrency and speed. The results show that concurrent optimization of high-level architecture parameters, including lookup table size (K) and cluster size (N), and low-level parameters, like the scaling of transistors, is possible with GA, whereas GP cannot optimize K and N concurrently and must exhaustively search all their possible combinations. The results also show a runtime improvement of more than two orders of magnitude in comparison with GP-based analysis.

https://doi.org/10.1145/2939372
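
The point about concurrency can be illustrated with a small GA in which the discrete parameters K and N evolve in the same chromosome as a continuous transistor scale s; the cost function below is a made-up analytical model, not the paper's:

    import random
    random.seed(7)

    def cost(k, n, s):                        # hypothetical area*delay model
        area  = n * 2 ** k * s ** 2
        delay = (4 + 0.5 * k + 6.0 / n) / s
        return area * delay

    def ga(pop_size=20, gens=40):
        pop = [(random.randint(3, 7), random.randint(4, 12),
                random.uniform(0.5, 1.5)) for _ in range(pop_size)]
        for _ in range(gens):
            pop.sort(key=lambda g: cost(*g))
            parents = pop[:pop_size // 2]                 # elitist selection
            children = []
            for _ in range(pop_size - len(parents)):
                a, b = random.sample(parents, 2)
                child = [random.choice(pair) for pair in zip(a, b)]  # crossover
                if random.random() < 0.3:                 # mutate the scale gene
                    child[2] = min(1.5, max(0.5, child[2] + random.gauss(0, 0.1)))
                children.append(tuple(child))
            pop = parents + children
        return min(pop, key=lambda g: cost(*g))

    print(ga())    # K, N, and s are optimized concurrently in one run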

[Article Title: Hybrid Power Management for Office Equipment/ Ganesh Gingade,Wenyi Chen,Yung-Hsiang Lu,Jan Allebach and Hernan Ildefonso Gutierrez-Vazquez, p. 14:1-14:22]

Abstract: Office machines (such as printers, scanners, facsimile machines, and copiers) can consume significant amounts of power. Most office machines have sleep modes to save power. Power management of these machines is usually timeout-based: a machine sleeps after being idle long enough. Setting the timeout duration can be difficult: if it is too long, the machine wastes power during idleness; if it is too short, the machine sleeps too soon and too often, and the wake-up delay can significantly degrade productivity. Thus, power management is a tradeoff between saving energy and keeping response time short. Many power management policies have been published, and one policy may outperform another in some scenarios; there is no definite conclusion regarding which policy is always better. This article describes two methods for office equipment power management. The first method adaptively reduces power based on a constraint on the wake-up delay. The second is a hybrid method with multiple candidate policies that selects the most appropriate power management policy. Using 6 months of request traces from 18 different printers, we demonstrate that the hybrid policy outperforms individual policies. We also discover that power management based on business hours does not produce consistent energy savings.

https://doi.org/10.1145/2910582
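
A minimal version of the hybrid selection step might look as follows in Python; the power figures, wake-up penalty, and candidate timeouts are invented, and real candidate policies would be richer than fixed timeouts:

    IDLE_W, SLEEP_W, WAKE_PENALTY = 30.0, 2.0, 60.0   # watts, watts, joules (made up)

    def policy_cost(timeout, idle_gaps):
        """Energy-like cost of one fixed-timeout policy over observed idle gaps."""
        cost = 0.0
        for gap in idle_gaps:                          # seconds between jobs
            if gap <= timeout:
                cost += IDLE_W * gap                   # never went to sleep
            else:
                cost += IDLE_W * timeout + SLEEP_W * (gap - timeout) + WAKE_PENALTY
        return cost

    def pick_policy(candidates, recent_gaps):
        return min(candidates, key=lambda t: policy_cost(t, recent_gaps))

    gaps = [5, 8, 400, 12, 900, 7, 30, 1200]           # toy printer trace
    print(pick_policy([15, 60, 300, 900], gaps))       # -> 15 for this trace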

[Article Title: Probabilistic Model Checking for Uncertain Scenario-Aware Data Flow/ Joost-Pieter Katoen and Hao Wu, p. 15:1-15:27]

Abstract: The Scenario-Aware Dataflow (SADF) model is based on concurrent actors that interact via channels. It combines streaming data and control to capture scenarios while incorporating hard and soft real-time aspects. To model data-flow computations that are subject to uncertainty, SADF models are equipped with random primitives. We propose to use probabilistic model checking to analyze uncertain SADF models. We show how measures such as expected time, long-run objectives like throughput, as well as timed reachability (can a given system configuration be reached within a deadline with high probability?) can be automatically determined. The crux of our method is a compositional semantics of SADF with exponential actor execution times, combined with automated abstraction techniques akin to partial-order reduction. We present the semantics in detail and show how it accommodates the incorporation of execution platforms, enabling the analysis of energy consumption. The feasibility of our approach is illustrated by analyzing several quantitative measures of an MPEG-4 decoder and an industrial face recognition application.

https://doi.org/10.1145/2914788

[Article Title: DReAM: An Approach to Estimate per-Task DRAM Energy in Multicore Systems/ Qixiao Liu,Miquel Moreto,Jaume Abella,Francisco J. Cazorla and Mateo Valero, p. 16:1-16:26]

Abstract: Accurate per-task energy estimation in multicore systems would allow performing per-task energy-aware task scheduling and energy-aware billing in data centers, among other applications. Per-task energy estimation is challenged by the interaction between tasks in shared resources, which impacts tasks’ energy consumption in uncontrolled ways. Some accurate mechanisms have been devised recently to estimate the per-task energy consumed on-chip in multicores, but there is a lack of such mechanisms for DRAM memories. This article makes the case for accurate per-task DRAM energy metering in multicores, which opens new paths to energy/performance optimizations. In particular, the contributions of this article are (i) an ideal per-task energy metering model for DRAM memories; (ii) DReAM, an accurate yet low-cost implementation of the ideal model (less than 5% accuracy error when 16 tasks share memory); and (iii) a comparison with standard methods (even distribution and access-count based), proving that DReAM is much more accurate than these other methods.

https://doi.org/10.1145/2939370
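
The two baseline methods named in the abstract are simple enough to state directly; the sketch below contrasts them with invented numbers (DReAM itself goes further by modeling DRAM state, which is not captured here):

    def even_split(total_energy, tasks):
        """Baseline 1: split DRAM energy evenly, blind to task behavior."""
        return {t: total_energy / len(tasks) for t in tasks}

    def access_based(total_energy, accesses):
        """Baseline 2: apportion energy by each task's DRAM access count."""
        total = sum(accesses.values())
        return {t: total_energy * a / total for t, a in accesses.items()}

    accesses = {"task_a": 900_000, "task_b": 50_000, "task_c": 50_000}
    print(even_split(10.0, accesses))     # 3.33 J each
    print(access_based(10.0, accesses))   # 9.0 / 0.5 / 0.5 J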

[Article Title: Non-enumerative Generation of Path Delay Distributions and Its Application to Critical Path Selection/ Ahish Mysore Somashekar,Spyros Tragoudas,Rathish Jayabharathi and Sreenivas Gangadhar, p. 17:1-17:21]

Abstract: A Monte Carlo-based approach is proposed that is capable of identifying, in a non-enumerative and scalable manner, the distributions that describe the delay of every path in a combinational circuit. Furthermore, a scalable approach to select critical paths from a potentially exponential number of path candidates is presented. Paths and their delay distributions are stored in Zero-Suppressed Binary Decision Diagrams. Experimental results on some of the largest ISCAS-89 and ITC-99 benchmarks show that the proposed method is highly scalable and effective.

https://doi.org/10.1145/2940327

[Article Title: An Adaptive Demand-Based Caching Mechanism for NAND Flash Memory Storage Systems/ Yi Wang,Zhiwei Qin,Renhai Chen,Zili Shao and Laurence T. Yang, p. 18:1-18:22]

Abstract: Over the past decades, the capacity of NAND flash memory has been increasing dramatically, leading to the use of nonvolatile flash in the system’s memory hierarchy. The increasing capacity of NAND flash memory introduces a large RAM footprint to store the logical-to-physical address mapping. The demand-based approach can effectively reduce and control the RAM footprint; however, it also introduces extra address translation overhead, which may degrade system performance.

https://doi.org/10.1145/2947658
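
The demand-based idea is essentially a cache over the mapping table. Here is a minimal Python sketch in that spirit (comparable to DFTL-style designs; the capacity, workload, and flash-read stand-in are invented):

    from collections import OrderedDict

    class MappingCache:
        """Keep only recently used logical-to-physical entries in RAM."""
        def __init__(self, capacity):
            self.capacity, self.cache = capacity, OrderedDict()
            self.misses = 0

        def translate(self, lpn):
            if lpn in self.cache:
                self.cache.move_to_end(lpn)         # LRU refresh
            else:
                self.misses += 1                    # extra translation overhead
                self.cache[lpn] = self.load_from_flash(lpn)
                if len(self.cache) > self.capacity:
                    self.cache.popitem(last=False)  # evict least recently used
            return self.cache[lpn]

        def load_from_flash(self, lpn):
            return lpn + 1000                       # stand-in for a mapping-page read

    mc = MappingCache(capacity=2)
    for lpn in [1, 2, 1, 3, 2]:
        mc.translate(lpn)
    print(mc.misses)                                # 4: page 2 was evicted and reloaded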

[Article Title: ERfair Scheduler with Processor Suspension for Real-Time Multiprocessor Embedded Systems/ Piyoosh Purushothaman Nair,Arnab Sarkar,N. M. Harsha,Megha Gandhi,P. P. Chakrabarti and Sujoy Ghose, p. 19:1-19:25]

Abstract: Proportional fair schedulers, with their ability to provide optimal schedulability along with hard timeliness and quality-of-service guarantees on multiprocessors, form an attractive alternative for real-time embedded systems that concurrently run a mix of independent applications with varying timeliness constraints. This article presents ERfair Scheduler with Suspension on Multiprocessors (ESSM), an efficient, optimal proportional fair scheduler that attempts to reduce system-wide energy consumption by locally maximizing processor suspension intervals while not sacrificing the ERfairness timing constraints of the system. The proposed technique takes advantage of the higher execution rates of tasks in underloaded ERfair systems and uses a procrastination scheme to search for time points within the schedule where suspension intervals are locally maximal. Evaluation results reveal that ESSM achieves good sleep efficiency and provides up to 50% higher effective total sleep durations as compared to the Basic-ERfair scheduler on systems consisting of 2 to 20 processors.

https://doi.org/10.1145/2948979
