^ Architectures for Media-processing Applications S. Adve,* C. J. Hughes, R. Jain, P. Kaul, C. Park, P. Ranganathan, J. Srinivasan Alfred P. Sloan Research Foundation; National Science Foundation CCR-0096126; University of Illinois
This research seeks to develop general-purpose architectures for emerging media-processing applications. These applications require performance orders of magnitude higher than is available today, together with high predictability. The research team used detailed simulation to develop a quantitative understanding of these applications and is now using this understanding to develop novel architectures with higher performance and predictability. A key attribute of the designs is flexibility, which allows them to respond to changing application requirements and system constraints. Specifically, the reconfigurable cache architectures allow cache storage to serve purposes other than conventional caching, depending on the application. The next stage of the research will focus on more general adaptive architectures.
^ Code Transformations to Increase Memory Parallelism with Instruction-Level Parallel Processors S. Adve,* V. Pai Alfred P. Sloan Research Foundation; University of Illinois
Current microprocessors employ techniques to aggressively exploit instruction-level parallelism (ILP). Previous work has shown that these techniques are often ineffective in providing parallelism for the memory system, because compilers often generate code that does not exploit the hardware resources available for parallelism. In this project, researchers developed code transformations that enable multiple loads to overlap with each other in modern out-of-order processors, thereby increasing memory parallelism and hiding memory latency. This research has shown that these transformations also significantly benefit other latency-hiding techniques such as prefetching.
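The idea of overlapping loads can be illustrated with a toy model. The summary does not name the specific transformations, so the sketch below assumes one well-known candidate, unroll-and-jam, and compares the order of loads before and after it. The cache-line size, the window size, and all function names are illustrative assumptions, not parameters from the project.

```python
LINE_ELEMS = 8  # array elements per cache line (assumed)
WINDOW = 8      # consecutive loads an out-of-order core can overlap (assumed)

def baseline_order(rows, cols):
    """Load order of: for i: for j: use a[i][j], with a row-major array."""
    return [i * cols + j for i in range(rows) for j in range(cols)]

def unroll_and_jam_order(rows, cols, factor):
    """Outer loop unrolled by `factor`, with the copies fused into the inner loop."""
    trace = []
    for i0 in range(0, rows, factor):
        for j in range(cols):
            for i in range(i0, min(i0 + factor, rows)):
                trace.append(i * cols + j)
    return trace

def peak_overlapped_misses(trace):
    """Largest number of first-touch cache-line misses inside one load window."""
    seen = set()
    miss_positions = []
    for pos, addr in enumerate(trace):
        line = addr // LINE_ELEMS
        if line not in seen:  # first touch of a line counts as a miss
            seen.add(line)
            miss_positions.append(pos)
    best = 0
    for k, start in enumerate(miss_positions):
        in_window = sum(1 for p in miss_positions[k:] if p < start + WINDOW)
        best = max(best, in_window)
    return best

before = baseline_order(4, 64)
after = unroll_and_jam_order(4, 64, factor=4)
print(peak_overlapped_misses(before), peak_overlapped_misses(after))
```

In this model the original loop touches a new cache line only once every eight loads, so at most one miss falls in a window. After the transformation, four independent rows are touched back to back, so their misses can overlap. Both orders perform exactly the same set of loads.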
^ Simulation Techniques for Systems with ILP Processors S. Adve,* C. J. Hughes, R. Jain, P. Kaul, V. Pai, C. Park, P. Ranganathan, J. Srinivasan Alfred P. Sloan Research Foundation; IBM Corp.; National Science Foundation CCR-0096126; University of Illinois
Previous simulators for shared-memory multiprocessors have imposed a large tradeoff between simulation accuracy and speed. Most such simulators model simple processors that do not exploit common instruction-level parallelism (ILP) features, consequently exhibiting large errors when used to model current systems. A few newer simulators model current ILP processors in detail, but are much slower. This research team is exploring the use of execution sampling to alleviate this accuracy versus speed tradeoff. Work includes conducting a validation of the ILP-based simulator against real machines to quantify simulation error.
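As a rough illustration of execution sampling, the sketch below runs a (stand-in) detailed simulation on only every `period`-th interval of a workload and extrapolates the total cycle count from those samples. The function names, the periodic sampling scheme, and the synthetic cost function are assumptions made for illustration; the summary does not describe the team's actual sampling method.

```python
def estimate_total_cycles(num_intervals, period, simulate_detailed):
    """Estimate total cycles from detailed simulation of every period-th interval.

    Returns (estimated_total_cycles, number_of_detailed_intervals).
    """
    samples = [simulate_detailed(i) for i in range(0, num_intervals, period)]
    mean_cycles = sum(samples) / len(samples)
    return mean_cycles * num_intervals, len(samples)

def synthetic_interval_cycles(i):
    """Stand-in for a detailed ILP simulation of interval i (synthetic values)."""
    return 100 + (i % 7) * 10

# Full detailed simulation visits all 700 intervals; sampling visits far fewer.
exact = sum(synthetic_interval_cycles(i) for i in range(700))
estimate, detailed_runs = estimate_total_cycles(700, 10, synthetic_interval_cycles)
relative_error = abs(estimate - exact) / exact
```

The speed gain comes from `detailed_runs` being much smaller than the number of intervals. The accuracy cost is whatever `relative_error` turns out to be for a real workload, which is why validation against real machines matters.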
^ Tolerating Memory Latencies in Multiprocessors with Merged DRAM-Logic S. Adve,* C. J. Hughes Alfred P. Sloan Research Foundation; University of Illinois
Memory latency and bandwidth have long been impediments to high performance. Recent advances in integrating processor logic and DRAM onto the same chip, called a PIM (processor-in-memory), address this problem for applications whose memory requirements can be met by a single chip. However, important applications with more demanding memory requirements exist, and they will require additional techniques to hide the memory latency of remote data transfers. This research examines techniques that take advantage of PIMs to hide such latency.
^ Building Dynamic Interoperable Security Architecture for Active Networks R. Campbell,* M. D. Mickunas, Z. Liu, P. Naldburg, S. Yi Defense Advanced Research Projects Agency, F30602-98-1-0192
Current active network research efforts propose novel network architectures to enable fast protocol and service deployment. However, the dynamic and proactive nature of these active networks increases the risk of malicious use of the networks, and there has been little research into the security provisions they need. Researchers working on this project believe that the security architecture should be dynamic, reconfigurable, extensible, and interoperable. The active security architecture supports dynamic security policies, interoperability among different security domains, active capabilities providing application-specific security functions, and defense against distributed denial-of-service attacks.
^ Dynamic Security System for Distributed Objects R. Campbell,* M. D. Mickunas, A. Kapadia, J. Al-Muhtadi, S. Yi U.S. Department of Defense, MDA 904-98-C-A895
The Cherubim security architecture supports dynamic security systems for distributed object systems and supports the requirements of many emerging applications. In particular, it permits run-time adaptation in the face of unexpected security attacks. This is vital to mission-critical environments like those of military systems. The team will build upon the architecture to research and develop solutions to the problems of interoperability between security systems, cooperative security policies between independent administrative security domains, and integrity assurance for distributed objects running on commercial and research operating systems.
^ Architectural Support for Speculative Thread-Level Parallelization J. Torrellas,* M. Cintra, J. Martinez, M. Prvulovic, M. Garzaran National Science Foundation, EIA 99-75018, CCR 99-70488, EIA 00-81307, EIA 00-72102; IBM Corp.; Intel Corp.
Speculative thread-level parallelization is a technique to execute in parallel code that the compiler cannot fully analyze. In this technique, a program is dynamically divided into tasks and assigned to different threads. The threads execute in parallel, optimistically assuming that sequential semantics will not be violated. As the threads run, the data that they access are tracked. If a dependence violation is detected, the offending threads are stopped. Then a repair action re-executes the offending tasks, possibly after recovering some old, safe state. In this project, researchers examine the support for this technique in both chip multiprocessors and scalable multiprocessors.
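The execute, track, detect, and repair cycle described above can be modeled in software. The sketch below is a simplified sequential model, not the hardware support studied in the project: every task first runs optimistically against the same initial state with buffered writes, and tasks then commit in program order, with any task that read a location written by an earlier task being squashed and re-executed. All names here are hypothetical.

```python
def execute(task, state):
    """Run one task against `state`, tracking its reads and buffering its writes."""
    read_set, write_buffer = set(), {}

    def read(key):
        if key in write_buffer:  # the task sees its own earlier writes
            return write_buffer[key]
        read_set.add(key)
        return state[key]

    def write(key, value):
        write_buffer[key] = value

    task(read, write)
    return read_set, write_buffer

def run_speculatively(tasks, initial_memory):
    """Return (final_memory, number_of_squashed_tasks)."""
    memory = dict(initial_memory)
    snapshot = dict(initial_memory)
    # Phase 1: all tasks run "in parallel", each seeing only the initial state.
    results = [execute(task, snapshot) for task in tasks]
    written_by_earlier = set()
    squashed = 0
    # Phase 2: commit in sequential order, checking for dependence violations.
    for task, (read_set, write_buffer) in zip(tasks, results):
        if read_set & written_by_earlier:
            # The task consumed stale data: squash it and re-execute it
            # against the committed (safe) state.
            squashed += 1
            read_set, write_buffer = execute(task, memory)
        memory.update(write_buffer)
        written_by_earlier.update(write_buffer)
    return memory, squashed

def task0(read, write): write("c", read("a") + 1)
def task1(read, write): write("d", read("b") * 10)
def task2(read, write): write("a", read("c") + 5)  # depends on task0's write to c

final, squashed = run_speculatively([task0, task1, task2],
                                    {"a": 1, "b": 2, "c": 0, "d": 0})
```

Here `task1` is independent and commits its speculative result unchanged, while `task2` is squashed once because it read `c` before `task0`'s write was visible. The final state matches what a purely sequential run would produce.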
^ Architectural and Application Support to Crack the Protein Folding Problem J. Torrellas,* K. Ekanadham, G. Martyna, J. Nakano, D. Newns, M. Tuckerman National Science Foundation, EIA 00-81307; IBM Corp.
Solving the protein folding problem would have an enormous payoff: a true understanding of diseases and the discovery of more effective drugs. Advances in processor-memory integration now make it possible to build more highly integrated, faster, and cheaper computers that may be able to solve the problem. In this project, researchers in computer hardware, software, and computational biology team up with IBM researchers to design improved algorithms for protein folding and to design a Processor-In-Memory architecture that can speed up the solution of the problem. The work involves using the IBM Blue Gene prototype and will help design its next-generation system.
^ Automatically and Manually Programming a Server J. Torrellas,* D. Padua, B. Fraguela, J. Lee, Y. Solihin National Science Foundation, EIA 99-75018, CCR 99-70488, EIA 00-81307, EIA 00-72102; IBM Corp.; Intel Corp.
This project addresses the problem of how to program a server with an intelligent memory system, such as FlexRAM. Such a machine includes two classes of processors: one or several off-the-shelf powerful processors and a myriad of simple processors in the memory system. Researchers examine two approaches to program this machine. The first approach is to implement compiler algorithms so that the compiler can automatically map the code onto the architecture. The second approach is to design language constructs and a programming methodology that make it possible and easy for a programmer to directly write code for the machine.
^ Dynamically Managing Energy in Chips for Energy and Efficiency J. Torrellas,* M. Huang, J. Renau, S. Yoo National Science Foundation, EIA 99-75018, CCR 99-70488, EIA 00-81307, EIA 00-72102; IBM Corp.; Intel Corp.
While technology is delivering increasingly sophisticated and powerful chip designs, it is also imposing alarmingly high energy requirements on the chips. Many techniques have been proposed to manage the energy consumed, including voltage scaling and various forms of storage reconfiguration. In this project, researchers compare the different techniques and build a framework that applies them dynamically, in a fine-grained manner, and according to a given policy. The goals are to maximize energy savings without extending application execution time beyond a given tolerable limit, and to guarantee that the temperature remains below a given limit while minimizing any resulting slowdown.
^ FlexRAM: An Intelligent Memory Architecture System J. Torrellas,* D. Padua, D. Reed, M. Huang, J. Lee, J. Renau, Y. Solihin, S. Yoo National Science Foundation, EIA 99-75018, CCR 99-70488, EIA 00-81307, EIA 00-72102; IBM Corp.; Intel Corp.
Major advances in Merged Logic DRAM (MLD) technology coupled with the popularization of memory-intensive applications provide fertile ground for architectures based on Processors-in-Memory (PIM). In the FlexRAM project, researchers use PIM chips as the memory of an intelligent server. If such a server runs an application without recompiling it, the intelligent memory appears as plain memory. However, if the application is recompiled, the memory becomes a high-performance accelerator. In this project, the focus is on issues related to architecture design, energy dissipation, programming, compilation, run-time system, and applications for the FlexRAM intelligent memory system.
^ M3T: Morphable Multithreaded Memory Tiles J. Torrellas,* B. Abbott, T. Bapty, H. Franke, J. Moreira, C. Myers, J. Renau Defense Advanced Research Projects Agency, F30602-01-C-0078
The M3T system is a novel malleable computing system composed of polymorphous hardware and polymorphous software that can adapt to changing mission demands. The system is built by tiling M3T processor chips. Each processor chip is composed of many general-purpose RISC cores interleaved with memory blocks. Cores and memories are reconfigured at run-time, allowing the chip to morph into a superscalar, VLIW, systolic array, MIMD, SIMD, or fault-tolerant engine, or even a combination of these. The system has a compiler that reconfigures the platform (hardware and run-time system) for which it is generating code.
^ Using Architectural Support for Speculation to Provide Fault Tolerance J. Torrellas,* M. Garzaran, M. Prvulovic, Z. Zhang National Science Foundation, EIA 99-75018, CCR 99-70488, EIA 00-81307, EIA 00-72102; IBM Corp.; Intel Corp.
Microprocessors occasionally suffer transient faults. This research is aimed at finding ways to detect transient faults and recover from them. While past work has addressed this problem in different ways, speculative parallelization opens up a potential new approach to detect and recover from transient faults. Under speculative parallelization, speculative threads must keep their memory state buffered until they are proved to be correct. In addition, they must be able to roll back to a safe state after a violation is detected. These same architectural supports can be used to provide fault detection and fault recovery. This project extends existing architectural support for speculation to provide inexpensive support for fault tolerance.
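The way buffered state and rollback can double as fault recovery can be sketched in a small software model. The summary does not say how faults are detected, so the sketch assumes one simple scheme: run the task twice with buffered writes, commit only when the two runs agree, and otherwise discard the buffers and retry from the untouched safe state. The fault-injection hook and all names are hypothetical.

```python
def execute_buffered(task, memory, corrupt=False):
    """Run `task` on a copy of memory and return its buffered writes."""
    buffer = {}
    task(dict(memory), buffer)  # committed memory itself is never modified here
    if corrupt:
        # Hypothetical transient fault: flip the low bit of each buffered value.
        buffer = {key: value ^ 1 for key, value in buffer.items()}
    return buffer

def run_with_recovery(task, memory, faulty_attempts=(), max_attempts=4):
    """Commit the task's writes once two redundant runs agree.

    Returns the number of rollbacks that were needed.
    """
    for attempt in range(max_attempts):
        primary = execute_buffered(task, memory, corrupt=attempt in faulty_attempts)
        checker = execute_buffered(task, memory)
        if primary == checker:
            memory.update(primary)  # runs agree: make the buffered state visible
            return attempt
        # Mismatch: drop both buffers. `memory` still holds the safe state,
        # so rolling back costs nothing beyond re-execution.
    raise RuntimeError("fault persisted across all attempts")

def add_task(state, out):
    out["total"] = state["x"] + state["y"]

memory = {"x": 3, "y": 4, "total": 0}
rollbacks = run_with_recovery(add_task, memory, faulty_attempts={0})
```

Because writes stay buffered until they are validated, a corrupted run never reaches committed memory, which is the property that speculation hardware already provides.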