ACM Transactions on Design Automation of Electronic Systems
Item type | Current library | Home library | Collection | Shelving location | Call number | Copy number | Status | Date due | Barcode |
---|---|---|---|---|---|---|---|---|---|
| | LRC - Main | National University - Manila | Gen. Ed. - CCIT | Periodicals | ACM Transactions on Design Automation of Electronic Systems, Volume 22, Issue 3, 2017 | c.1 | Available | | PER000000531 |
Includes bibliographical references.
HoPE: Hot-Cacheline Prediction for Dynamic Early Decompression in Compressed LLCs -- PeaPaw: Performance and Energy-Aware Partitioning of Workload on Heterogeneous Platforms -- CDTA: A Comprehensive Solution for Counterfeit Detection, Traceability, and Authentication in the IoT Supply Chain -- Generation of Transparent-Scan Sequences for Diagnosis of Scan Chain Faults -- Application-Specific Residential Microgrid Design Methodology -- Layer Assignment of Escape Buses with Consecutive Constraints in PCB Designs -- Leak Stopper: An Actively Revitalized Snoop Filter Architecture with Effective Generation Control -- Topological Approach to Automatic Symbolic Macromodel Generation for Analog Integrated Circuits -- Content-Aware Bit Shuffling for Maximizing PCM Endurance -- SSAGA: SMs Synthesized for Asymmetric GPGPU Applications -- Low-Power Clock Tree Synthesis for 3D-ICs -- TEI-power: Temperature Effect Inversion--Aware Dynamic Thermal Management -- Using CoreSight PTM to Integrate CRA Monitoring IPs in an ARM-Based SoC -- Fundamental Challenges Toward Making the IoT a Reachable Reality: A Model-Centric Investigation -- Obfuscation-Based Protection Framework against Printed Circuit Boards Unauthorized Operation and Reverse Engineering -- A Fast Hierarchical Adaptive Analog Routing Algorithm Based on Integer Linear Programming -- A Single-Tier Virtual Queuing Memory Controller Architecture for Heterogeneous MPSoCs -- Accelerated Soft-Error-Rate (SER) Estimation for Combinational and Sequential Circuits.
[Article Title: HoPE: Hot-Cacheline Prediction for Dynamic Early Decompression in Compressed LLCs/ Jaehyun Park, Seungcheol Baek, Hyung Gyu Lee, Chrysostomos Nicopoulos, Vinson Young, Junghee Lee and Jongman Kim, p. 40:1-40:25]
Abstract: Data compression plays a pivotal role in improving system performance and reducing energy consumption, because it increases the logical effective capacity of a compressed memory system without physically increasing the memory size. However, data compression techniques incur some cost, such as non-negligible compression and decompression overhead. This overhead becomes more severe if compression is used in the cache. In this article, we aim to minimize the read-hit decompression penalty in compressed Last-Level Caches (LLCs) by speculatively decompressing frequently used cachelines. To this end, we propose a Hot-cacheline Prediction and Early decompression (HoPE) mechanism that consists of three synergistic techniques: Hot-cacheline Prediction (HP), Early Decompression (ED), and Hit-history-based Insertion (HBI). HP and HBI efficiently identify the hot compressed cachelines, while ED selectively decompresses hot cachelines, based on their size information. Unlike previous approaches, the HoPE framework considers the performance balance/tradeoff between the increased effective cache capacity and the decompression penalty. To evaluate the effectiveness of the proposed HoPE mechanism, we run extensive simulations on memory traces obtained from multi-threaded benchmarks running on a full-system simulation framework. We observe significant performance improvements over compressed cache schemes employing the conventional Least-Recently Used (LRU) replacement policy, the Dynamic Re-Reference Interval Prediction (DRRIP) scheme, and the Effective Capacity Maximizer (ECM) compressed cache management mechanism. Specifically, HoPE exhibits system performance improvements of approximately 11%, on average, over LRU, 8% over DRRIP, and 7% over ECM by reducing the read-hit decompression penalty by around 65%, over a wide range of applications.
https://doi.org/10.1145/2999538
[Article Title: PeaPaw: Performance and Energy-Aware Partitioning of Workload on Heterogeneous Platforms/ Li Tang, Richard F. Barrett, Jeanine Cook and X. Sharon Hu, p. 41:1-41:26]
Abstract: Performance and energy are two major concerns for application development on heterogeneous platforms. It is challenging for application developers to fully exploit the performance/energy potential of heterogeneous platforms. One reason is the lack of reliable prediction of the system’s performance/energy before application implementation. Another reason is that a heterogeneous platform presents a large design space for workload partitioning between different processors. To reduce such development cost, this article proposes a framework, PeaPaw, to assist application developers to identify a workload partition (WP) that has high potential leading to high performance or energy efficiency before actual implementation. The PeaPaw framework includes both analytical performance/energy models and two sets of workload partitioning guidelines. Based on the design goal, application developers can obtain a workload partitioning guideline from PeaPaw for a given platform and follow it to design one or multiple WPs for a given workload. Then PeaPaw can be used to estimate the performance/energy of the designed WPs, and the WP with the best estimated performance/energy can be selected for actual implementation. To demonstrate the effectiveness of PeaPaw, we have conducted three case studies. Results from these case studies show that PeaPaw can faithfully estimate the performance/energy relationships of WPs and provide effective workload partitioning guidelines.
https://doi.org/10.1145/2999540
[Article Title: CDTA: A Comprehensive Solution for Counterfeit Detection, Traceability, and Authentication in the IoT Supply Chain/ Kun Yang, Domenic Forte and Mark M. Tehranipoor, p. 42:1-42:31]
Abstract: The Internet of Things (IoT) is transforming the way we live and work by increasing the connectedness of people and things on a scale that was once unimaginable. However, the vulnerabilities in the IoT supply chain have raised serious concerns about the security and trustworthiness of IoT devices and components within them. Testing for device provenance, detection of counterfeit integrated circuits (ICs) and systems, and traceability of IoT devices are challenging issues to address. In this article, we develop a novel radio-frequency identification (RFID)-based system suitable for counterfeit detection, traceability, and authentication in the IoT supply chain called CDTA. CDTA is composed of different types of on-chip sensors and in-system structures that collect necessary information to detect multiple counterfeit IC types (recycled, cloned, etc.), track and trace IoT devices, and verify the overall system authenticity. Central to CDTA is an RFID tag employed as storage and a channel to read the information from different types of chips on the printed circuit board (PCB) in both power-on and power-off scenarios. CDTA sensor data can also be sent to the remote server for authentication via an encrypted Ethernet channel when the IoT device is deployed in the field. A novel board ID generator is implemented by combining outputs of physical unclonable functions (PUFs) embedded in the RFID tag and different chips on the PCB. A light-weight RFID protocol is proposed to enable mutual authentication between RFID readers and tags. We also implement a secure interchip communication on the PCB. Simulations and experimental results using Spartan 3E FPGAs demonstrate the effectiveness of this system. The efficiency of the radio-frequency (RF) communication has also been verified via a PCB prototype with a printed slot antenna.
https://doi.org/10.1145/3005346
[Article Title: Generation of Transparent-Scan Sequences for Diagnosis of Scan Chain Faults/ Irith Pomeranz, p. 43:1-43:17]
Abstract: Diagnosis of scan chain faults is important for yield learning and improvement. Procedures that generate tests for diagnosis of scan chain faults produce scan-based tests with one or more functional capture cycles between a scan-in and a scan-out operation. The approach to test generation referred to as transparent-scan has several advantages in this context. (1) It allows functional capture cycles and scan shift cycles to be interleaved arbitrarily. This increases the flexibility to assign to the scan cells values that are needed for diagnosis. (2) Test generation under transparent-scan considers a circuit model where the scan logic is included explicitly. Consequently, the test generation procedure takes into consideration the full effect of a scan chain fault. It thus produces accurate tests. (3) For the same reason, it can also target faults inside the scan logic. (4) Transparent-scan results in compact test sequences. Compaction is important because of the large volumes of fail data that scan chain faults create. The cost of transparent-scan is that it requires simulation procedures for sequential circuits, and that arbitrary sequences would be applicable to the scan select input. Motivated by the advantages of transparent-scan, and the importance of diagnosing scan chain faults, this article describes a procedure for generating transparent-scan sequences for diagnosis of scan chain faults. The procedure is also applied to produce transparent-scan sequences for diagnosis of faults inside the scan logic.
https://doi.org/10.1145/3007207
[Article Title: Application-Specific Residential Microgrid Design Methodology/ Korosh Vatanparvar and Mohammad Abdullah Al Faruque, p. 44:1-44:21]
Abstract: In power systems, the traditional, non-interactive, and manually controlled power grid has been transformed to a cyber-dominated smart grid. This cyber-physical integration has provided the smart grid with communication, monitoring, computation, and controlling capabilities to improve its reliability, energy efficiency, and flexibility. A microgrid is a localized and semi-autonomous group of smart energy systems that utilizes the above-mentioned capabilities to drive modern technologies such as electric vehicle charging, home energy management, and smart appliances. Design, upgrading, test, and verification of these microgrids can get too complicated to handle manually. The complexity is due to the wide range of solutions and components that are intended to address the microgrid problems. This article presents a novel Model-Based Design (MBD) methodology to model, co-simulate, design, and optimize microgrid and its multi-level controllers. This methodology helps in the design, optimization, and validation of a microgrid for a specific application. The application rules, requirements, and design-time constraints are met in the designed/optimized microgrid while the implementation cost is minimized. Based on our novel methodology, a design automation, co-simulation, and analysis tool, called GridMAT, is implemented. Our experiments have illustrated that implementing a hierarchical controller reduces the average power consumption by 8% and shifts the peak load for cost saving. Moreover, optimizing the microgrid design using our MBD methodology considering smart controllers has decreased the total implementation cost. Compared to the conventional methodology, the cost decreases by 14% and compared to the MBD methodology where smart controllers are not considered, it decreases by 5%.
https://doi.org/10.1145/3007206
[Article Title: Layer Assignment of Escape Buses with Consecutive Constraints in PCB Designs/ Jin-Tai Yan, p. 45:1-45:25]
Abstract: Minimizing the number of layers used in a PCB design is important for cost and reliability. In this article, given a set of n circular escape buses with their escape directions between two adjacent components and a set of m consecutive constraints on the escape buses, the problem of assigning the given escape buses between two adjacent components onto a minimum number of layers is first formulated for bus-oriented escape routing. Furthermore, an efficient approach is proposed to minimize the number of layers used for the given escape buses with the consecutive constraints and to assign the escape buses onto the available layers. Compared with Yan's approach [Yan and Chen 2012] for the layer assignment of linear escape buses with no consecutive constraint and Ma's approach [Ma et al. 2011a] for the layer assignment of circular escape buses with consecutive constraints, the experimental results show that the proposed approach obtains the same optimal number of used layers and reduces CPU time by 43.6% and 90.5%, respectively, on average for the tested examples.
https://doi.org/10.1145/3012010
[Article Title: Leak Stopper: An Actively Revitalized Snoop Filter Architecture with Effective Generation Control/ Yin-Chi Peng, Chien-Chih Chen, Hsiang-Jen Tsai, Keng-Hao Yang, Pei-Zhe Huang, Shih-Chieh Chang, Wen-Ben Jone and Tien-Fu Chen, p. 46:1-46:27]
Abstract: To alleviate high energy dissipation of unnecessary snooping accesses, snoop filters have been designed to reduce snoop lookups. These filters have the problem of decreasing filtering efficiency, and thus usually rely on partial or whole filter reset by detecting block evictions. Unfortunately, the reset conditions occur infrequently or unevenly (called passive filter deletion). This work proposes the concept of revitalized snoop filter (RSF) design, which can actively renew the destination filter by employing a generation wrapping-around scheme for various reference behaviors. We further utilize a sampling mechanism for RSF to timely trigger precise filter revitalizations, so that unnecessary RSF flushing can be minimized. The proposed RSF can be integrated to various existent inclusive snoop filters with only a minor change to their designs. We evaluate our proposed design and demonstrate that RSF eliminates 58.6% of snoop energy compared to JETTY on average while inducing only 6.5% of revitalization energy overhead. In addition, RSF eliminates 45.5% of snoop energy compared to stream registers on average and only induces 2.5% of revitalization energy overhead. Overall, these RSFs reduce the total L2 cache energy consumption by 52.1% (58.6% -- 6.5%) as compared to JETTY and by 43% (45.5% -- 2.5%) as compared to stream registers. Furthermore, RSF improves the overall performance by 1% to 1.4% on average compared to JETTY and stream registers for various benchmark suites.
https://doi.org/10.1145/3015770
[Article Title: Topological Approach to Automatic Symbolic Macromodel Generation for Analog Integrated Circuits/ Guoyong Shi, Hanbin Hu and Shuwen Deng, p. 47:1-47:25]
Abstract: In the field of analog integrated circuit (IC) design, small-signal macromodels play indispensable roles for developing design insight and sizing reference. However, the subject of automatically generating symbolic low-order macromodels in human-readable circuit form has not been well studied. Traditionally, work has been published on reducing full-scale symbolic transfer functions to simpler forms but without the guarantee of interpretability. On the other hand, methodologies developed for interconnect circuits (mainly resistor-capacitor-inductor (RCL) networks) are not suitable for analog ICs. In this work, a topological reduction method is introduced that is able to automatically generate interpretable macromodel circuits in symbolic form; that is, the circuit elements in the compact model maintain analytical relations to the parameters of the original full circuit. This type of symbolic macromodel has several benefits that other traditional modeling methods do not offer: first, reusability, namely that a designer need not repeatedly generate macromodels for the same circuit even if it is re-sized or re-biased; second, interpretability, namely that a designer may directly identify circuit parameters (in the original circuit) that are closely related to the dominant frequency characteristics, such as dc gain, gain/phase margins, and dominant poles/zeros. The effectiveness and computational efficiency of the proposed method have been validated by several operational amplifier (opamp) circuit examples.
https://doi.org/10.1145/3015782
[Article Title: Content-Aware Bit Shuffling for Maximizing PCM Endurance/ Miseon Han, Youngsun Han, Seon Wook Kim, Hokyoon Lee and Il Park, p. 48:1-48:26]
Abstract: Recently, phase change memory (PCM) has been emerging as a strong replacement for DRAM owing to its many advantages such as nonvolatility, high capacity, low leakage power, and so on. However, PCM is still restricted for use as main memory because of its limited write endurance. There have been many methods introduced to resolve the problem by either reducing or spreading out bit flips. Although many previous studies have significantly contributed to reducing bit flips, they still have the drawback that lower bits are flipped more often than higher bits because the lower bits frequently change their bit values. Also, interblock wear-leveling schemes are commonly employed for spreading out bit flips by shifting input data, but they increase the number of bit flips per write. In this article, we propose a novel content-aware bit shuffling (CABS) technique that minimizes bit flips and evenly distributes them to maximize the lifetime of PCM at the bit level. We also introduce two additional optimizations, namely, addition of an inversion bit and use of an XOR key, to further reduce bit flips. Moreover, CABS is capable of recovering from stuck-at faults by restricting the change in values of stuck-at cells. Experimental results showed that CABS outperformed the existing state-of-the-art methods in terms of PCM lifetime extension with minimal overhead. CABS achieved up to 48.5% longer lifetime than the data comparison write (DCW) method with only a few metadata bits. Moreover, CABS obtained approximately 9.7% higher write throughput than DCW because it significantly reduced bit flips and evenly distributed them. Also, CABS reduced write dynamic energy by about 5.4% compared to DCW. Finally, we have also confirmed that CABS is fully applicable to BCH codes, as it was able to reduce the maximum number of bit flips in metadata cells by 32.1%.
https://doi.org/10.1145/3017445
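Illustrative note (not part of the catalog record or the article): the abstract above refers to data comparison write (DCW) and to an inversion bit. The sketch below is a minimal Python illustration of those two baseline ideas only; it is not the CABS algorithm, and the function names, the 8-bit word width, and the choice to count the flag cell as a flip are all assumptions made for the example.

```python
def bit_flips(old: int, new: int) -> int:
    """Cells that change when `new` overwrites `old` (data comparison write)."""
    return bin(old ^ new).count("1")


def write_with_inversion_bit(old: int, old_inv: int, new: int, width: int = 8):
    """Store `new` plain or inverted, whichever flips fewer cells.

    `old` is the word as currently stored and `old_inv` its stored flag.
    Returns (stored_word, inversion_bit, flips); the flag cell counts as a flip
    when it changes (an assumption of this sketch).
    """
    mask = (1 << width) - 1
    plain_word = new & mask
    inverted_word = ~new & mask
    plain_cost = bit_flips(old, plain_word) + (1 if old_inv != 0 else 0)
    inverted_cost = bit_flips(old, inverted_word) + (1 if old_inv != 1 else 0)
    if inverted_cost < plain_cost:
        return inverted_word, 1, inverted_cost
    return plain_word, 0, plain_cost


def read_word(stored: int, inv: int, width: int = 8) -> int:
    """Recover the logical value from the stored word and its inversion bit."""
    mask = (1 << width) - 1
    return (~stored & mask) if inv else stored


if __name__ == "__main__":
    stored, inv, flips = write_with_inversion_bit(0b00000000, 0, 0b11111110)
    print(stored, inv, flips, read_word(stored, inv))
```

In this example, writing 0b11111110 over an all-zero word would flip seven cells under plain DCW, whereas storing the inverted word plus the flag flips two.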
[Article Title: SSAGA: SMs Synthesized for Asymmetric GPGPU Applications/ Shamik Saha, Prabal Basu, Chidhambaranathan Rajamanikkam, Aatreyi Bal, Koushik Chakraborty and Sanghamitra Roy, p. 49:1-49:20]
Abstract: The emergence of GPGPU applications, bolstered by flexible GPU programming platforms, has created a tremendous challenge in maintaining high energy efficiency in modern GPUs. In this article, we demonstrate that customizing a Streaming Multiprocessor (SM) of a GPU at a lower frequency is significantly more energy efficient compared to employing DVFS on an SM designed for a high-frequency operation. Using a system-level CAD technique, we propose SSAGA—Streaming Multiprocessors Synthesized for Asymmetric GPGPU Applications—an energy-efficient GPU design paradigm. SSAGA creates architecturally identical SM cores, customized for different voltage-frequency domains. Our rigorous cross-layer methodology demonstrates an average of 20% improvement in energy efficiency over a spatially multitasking GPU across a range of GPGPU applications.
https://doi.org/10.1145/3014163
[Article Title: Low-Power Clock Tree Synthesis for 3D-ICs/ Tiantao Lu and Ankur Srivastava, p. 50:1-50:24]
Abstract: We propose efficient algorithms to construct a low-power clock tree for through-silicon-via (TSV)-based 3D-ICs. We use shutdown gates to save clock trees’ dynamic power, which selectively turn off certain clock tree branches to avoid unnecessary clock activities when the modules in these tree branches are inactive. While this clock gating technique has been extensively studied in 2D circuits, its application in 3D-ICs is unclear. In 3D-ICs, a shutdown gate is connected to a control signal unit through control TSVs, which may cause placement conflicts with existing clock TSVs in the layout due to TSV’s large physical dimension. We develop a two-phase clock tree synthesis design flow for 3D-ICs: (1) 3D abstract clock tree generation based on K-means clustering and (2) clock tree embedding with simultaneous shutdown gates’ insertion based on simulated annealing (SA) and a force-directed TSV placer. Experimental results indicate that (1) the K-means clustering heuristic significantly reduces the clock power by clustering modules with similar switching behavior and close proximity, and (2) the SA algorithm effectively inserts the shutdown gates to a 3D clock tree, while considering control TSV’s placement. Compared with previous 3D clock tree synthesis techniques, our K-means clustering-based approach achieves larger reduction in clock tree power consumption while ensuring zero clock skew.
https://doi.org/10.1145/3019610
[Article Title: TEI-power: Temperature Effect Inversion--Aware Dynamic Thermal Management/ Woojoo Lee, Kyuseung Han, Yanzhi Wang, Tiansong Cui, Shahin Nazarian and Massoud Pedram, p. 51:1-51:25]
Abstract: FinFETs have emerged as a promising replacement for planar CMOS devices in sub-20nm technology nodes. However, based on the temperature effect inversion (TEI) phenomenon observed in FinFET devices, the delay characteristics of FinFET circuits in sub-, near-, and superthreshold voltage regimes may be fundamentally different from those of CMOS circuits with nominal voltage operation. For example, FinFET circuits may run faster in higher temperatures. Therefore, the existing CMOS-based and TEI-unaware dynamic power and thermal management techniques would not be applicable. In this article, we present TEI-power, a dynamic voltage and frequency scaling--based dynamic thermal management technique that considers the TEI phenomenon and also the superlinear dependencies of power consumption components on the temperature and outlines a real-time trade-off between delay and power consumption as a function of the chip temperature to provide significant energy savings, with no performance penalty—namely, up to 42% energy savings for small circuits where the logic cell delay is dominant and up to 36% energy savings for larger circuits where the interconnect delay is considerable.
https://doi.org/10.1145/3019941
[Article Title: Using CoreSight PTM to Integrate CRA Monitoring IPs in an ARM-Based SoC/ Yongje Lee, Jinyong Lee, Ingoo Heo, Dongil Hwang and Yunheung Paek, p. 52:1-52:25]
Abstract: The ARM CoreSight Program Trace Macrocell (PTM) has been widely deployed in recent ARM processors for real-time debugging and tracing of software. Using PTM, the external debugger can extract execution behaviors of applications running on an ARM processor. Recently, some researchers have been using this feature for other purposes, such as fault-tolerant computation and security monitoring. This motivated us to develop an external security monitor that can detect control hijacking attacks, whose goal is to maliciously manipulate the control flow of victim applications at an attacker’s disposal. This article focuses on detecting a special type of attack called code reuse attacks (CRA), which use a recently introduced technique that allows attackers to perform arbitrary computation without injecting their code by reusing only existing code fragments. Our external monitor is attached to the outside of the host system via the system bus and ARM CoreSight PTM, and is fed with execution traces of a victim application running on the host. Because a majority of CRAs violate the normal execution behaviors of a program, our monitor constantly watches and analyzes the execution traces of the victim application and detects a symptom of attacks when the execution behaviors violate certain rules that normal applications are known to adhere to. We present two different implementations for this purpose: a hardware-based solution in which all CRA detection components are implemented in hardware, and a hardware/software mixed solution that can be employed in a more resource-constrained environment where the deployment of full hardware-level CRA detection is burdensome.
https://doi.org/10.1145/3035965
[Article Title: Fundamental Challenges Toward Making the IoT a Reachable Reality: A Model-Centric Investigation/ Yuankun Xue, Ji Li, Shahin Nazarian and Paul Bogdan, p. 53:1-53:25]
Abstract: Constantly advancing integration capability is paving the way for the construction of the extremely large scale continuum of the Internet where entities or things from vastly varied domains are uniquely addressable and interacting seamlessly to form a giant networked system of systems known as the Internet-of-Things (IoT). In contrast to this visionary networked system paradigm, prior research efforts on the IoT are still very fragmented and confined to disjoint explorations of different applications, architecture, security, services, protocol, and economical domains, thus preventing design exploration and optimization from a unified and global perspective. In this context, this survey article first proposes a mathematical modeling framework that is rich in expressivity to capture IoT characteristics from a global perspective. It also sets forward a set of fundamental challenges in sensing, decentralized computation, robustness, energy efficiency, and hardware security based on the proposed modeling framework. Possible solutions are discussed to shed light on future development of the IoT system paradigm.
https://doi.org/10.1145/3001934
[Article Title: Obfuscation-Based Protection Framework against Printed Circuit Boards Unauthorized Operation and Reverse Engineering/ Zimu Guo, Jia Di, Mark M. Tehranipoor and Domenic Forte, p. 54:1-54:31]
Abstract: Printed circuit boards (PCBs) are a basic necessity for all modern electronic systems but are becoming increasingly vulnerable to cloning, overproduction, tampering, and unauthorized operation. Most efforts to prevent such attacks have only focused on the chip level, leaving a void for PCBs and higher levels of abstraction. In this article, we propose the first ever obfuscation-based framework for the protection of PCBs. Central to our approach is a permutation block that hides the inter-chip connections between chips on the PCB and is controlled by a key. If the correct key is applied, then the correct connections between chips are made. Otherwise, the connections are incorrectly permuted, and the PCB/system fails to operate. We propose a permutation network added to the PCB based on a Benes network that can easily be implemented in a complex programmable logic device or field-programmable gate arrays. Based on this implementation, we analyze the security of our approach with respect to (i) brute-force attempts to reverse engineer the PCB, (ii) brute-force attempts at guessing the correct key, and (iii) physical and logistic attacks by a range of adversaries. Performance evaluation results on 12 reference designs show that brute force generally requires prohibitive time to break the obfuscation. We also provide detailed requirements for countermeasures that prevent reverse engineering, unauthorized operation, and so on, for different classes of attackers.
https://doi.org/10.1145/3035482
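Illustrative note (not part of the catalog record or the article): the abstract above mentions a key-controlled permutation block based on a Benes network and brute-force key guessing. The sketch below only computes the textbook size of a Benes network, which has 2*log2(n) - 1 stages of n/2 two-by-two switches for n inputs. Treating each switch as controlled by exactly one key bit is an assumption of this example, and the function names are hypothetical; the article's actual key mapping may differ.

```python
import math


def benes_switch_count(n: int) -> int:
    """Number of 2x2 switches in a Benes network with n inputs (n a power of two)."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two and at least 2")
    stages = 2 * (n.bit_length() - 1) - 1  # 2*log2(n) - 1 switch stages
    return stages * (n // 2)               # n/2 switches in every stage


def key_space(n: int) -> int:
    """Candidate keys, assuming one key bit per switch (illustrative assumption)."""
    return 2 ** benes_switch_count(n)


if __name__ == "__main__":
    for n in (8, 16, 32):
        # Compare the number of switch settings with the number of permutations.
        print(n, benes_switch_count(n), key_space(n), math.factorial(n))
```

Under this assumption, an 8-input network has 20 switches and 2^20 switch settings, more than the 8! = 40320 permutations it can realize, so several settings produce the same wiring.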
[Article Title: A Fast Hierarchical Adaptive Analog Routing Algorithm Based on Integer Linear Programming/ Mohammad Torabi and Lihong Zhang, p. 55:1-55:23]
Abstract: The shrinking design window and high parasitic sensitivity in advanced technologies have imposed special challenges on analog and radio frequency (RF) integrated circuit design. State-of-the-art analog routing research tends to favor linear programming to satisfy various analog constraints; although effective, it fails to offer high routing efficiency on its own. In this article, we propose a new methodology to address such a deficiency based on integer linear programming (ILP) but without compromising the capability of handling any special constraints for the analog routing problems. Our proposed method supports hierarchical routing, which can divide the entire routing area into multiple small heterogeneous regions where the ILP can efficiently derive routing solutions. Distinct from the conventional methods, our algorithm utilizes adaptive resolutions for various routing regions. For a more congested region, a routing grid with higher resolution is employed, whereas a lower-resolution grid is adopted for a less-crowded routing region. For a large empty space, routing efficiency can be boosted further by creating more routing hierarchy levels. This scheme is especially beneficial to the analog and RF layouts, which are far sparser than their digital counterparts. The experimental results show that our proposed adaptive ILP-based router is much faster than the conventional ones, since it spends much less time in the areas that need no accurate routing anyway. The higher efficiency is demonstrated for large circuits and especially sparse layouts along with promising routing quality in terms of analog constraints.
https://doi.org/10.1145/3035464
[Article Title: A Single-Tier Virtual Queuing Memory Controller Architecture for Heterogeneous MPSoCs/ Yang Song, Kambiz Samadi and Bill Lin, p. 56:1-56:23]
Abstract: Heterogeneous MPSoCs typically integrate diverse cores, including application CPUs, GPUs, and HD coders. These cores commonly share an off-chip memory to save cost and energy, but their memory accesses often interfere with each other, leading to undesirable consequences like a slowdown of application performance or a failure to sustain real-time performance. The memory controller plays a central role in meeting the QoS needs of real-time cores while maximizing CPU performance. Previous QoS-aware memory controllers are based on a classic two-tier queuing architecture that buffers memory transactions at the first tier, followed by a second tier that buffers translated DRAM commands. In these designs, QoS-aware policies are used to schedule competing transactions at the first stage, but the translated DRAM commands are served in FIFO order at the second stage. Unfortunately, once the scheduled transactions have been forwarded to the command stage, newly arriving transactions that may be more critical cannot be served ahead of those translated commands that are already queued at the second stage. To address this, we propose a scalable memory controller architecture based on single-tier virtual queuing (STVQ) that maintains a single tier of request queues and employs an efficacious scheduler that considers both QoS requirements and DRAM bank states. In comparison with previous QoS-aware memory controllers, the proposed STVQ memory controller reduces CPU slowdown by up to 13.9% while satisfying all frame rate requirements. We propose further optimizations that can significantly increase row-buffer hits by up to 66.2% and reduce memory latency by up to 19.8%.
https://doi.org/10.1145/3035481
[Article Title: Accelerated Soft-Error-Rate (SER) Estimation for Combinational and Sequential Circuits/ Ji Li and Jeffrey Draper, p. 57:1-57:21]
Abstract: Radiation-induced soft errors have posed an increasing reliability challenge to combinational and sequential circuits in advanced CMOS technologies. Therefore, it is imperative to devise fast, accurate and scalable soft error rate (SER) estimation methods as part of cost-effective robust circuit design. This paper presents an efficient SER estimation framework for combinational and sequential circuits, which considers single-event transients (SETs) in combinational logic and multiple cell upsets (MCUs) in sequential elements. A novel top-down memoization algorithm is proposed to accelerate the propagation of SETs, and a general schematic and layout co-simulation approach is proposed to model the MCUs for redundant sequential storage structures. The feedback in sequential logic is analyzed with an efficient time frame expansion method. Experimental results on various ISCAS85 combinational benchmark circuits demonstrate that the proposed approach achieves up to 560.2X speedup with less than 3% difference in terms of SER results compared with the baseline algorithm. The average runtime of the proposed framework on a variety of ISCAS89 benchmark circuits is 7.20s, and the runtime is 119.23s for the largest benchmark circuit with more than 3,000 flip-flops and 17,000 gates.