^ Compiler Optimizations for Multilevel Memory Hierarchies V. Adve,* Q. Yi (Rice Univ.), K. Kennedy (Rice Univ.) U.S. Department of Energy ASCI Academic Strategic Alliances Program, B347884
Managing performance on deep memory hierarchies is widely considered to be a critical open problem for high-performance systems. This research team is exploring a novel class of compiler transformations that provide improved locality at multiple levels of memory hierarchy simultaneously. The transformations exploit the property that recursive algorithms have identical reuse patterns at each level of recursion, providing a hierarchy of working sets. Researchers are developing compiler algorithms to transform existing loop-based codes into efficient recursive form automatically. Such a transformation has wide applicability, including automatic blocking for multiple levels of cache hierarchy and improving communication locality in shared memory codes.
^ Compiler Support for Performance Modeling of Parallel and Distributed Programs V. Adve* Defense Advanced Research Projects Agency, N66001-97-C-8533
Researchers are developing compiler techniques that enable fast, accurate, and automatic performance modeling of highly scalable applications. One focus of this effort is a compiler-generated program representation that allows one to automate a wide range of analytical, simulation, and hybrid models of parallel programs. A second focus is to use additional compiler analysis, together with this representation, to enable efficient simulation of highly scalable applications. One such compiler technique achieved 10-2000x reduction in memory usage and 2-10x reduction in simulation time for the simulation of large message-passing programs. This work is part of a broader collaboration with five other universities.
^ Error Recovery in Compilers M. D. Mickunas* University of Illinois
In this project, researchers devised a syntactic error recovery scheme for programming languages. The scheme is based on LR parsing and is driven by information that is directly and automatically obtainable from the LR tables. The method shows good promise for providing excellent error diagnostics. Continuing work will focus on improving the efficiency of the method and relating the mechanism to other parsing schemes.
^ Multiprocessor Compilation M. D. Mickunas* University of Illinois
This project investigates techniques for compiling in parallel. One phase was to devise an extension to standard shift-reduce parsing, called "piecewise" LR parsing, which permits multiple parsers to be situated at arbitrary places in an input string. This research with piecewise LR parsers demonstrated that it is not only possible to parse in a truly parallel fashion, but that significant speedup can often be achieved using such piecewise LR parsers. The continuing goal is to characterize the class of languages acceptable by such parsers and to devise unconventional programming language constructs which exploit the power of the technique.
^ Tools for Compiler Construction M. D. Mickunas* University of Illinois
This project is focused on developing a generalized shift-reduce parser generating system which allows interactive generation and modification of noncanonical LR-style parsers. This is coupled with a tree evaluator which provides interpretive execution of attribute-like semantics for the language. This will be used as a tool to produce parsers for unconventional languages, to experiment with error recovery techniques, and to investigate aspects of parallel parsing.
^ A MATLAB Just-in-Time Compiler (MaJIC) D. Padua,* G. Almasi, L. DeRose National Science Foundation, ACI98-70687
In cooperation with researchers at Cornell University, a University of Illinois team is developing MaJIC, a just-in-time compiler for MATLAB. The main goal of the project is to maintain a MATLAB-like working environment while achieving significant speedups. The analysis techniques under investigation are type analysis, dependence analysis (for generating SMP code), code signature generation, and array bounds check analysis (to eliminate superfluous array bounds checks). The compilation and execution phases in MaJIC overlap and can be executed in parallel on multiprocessor machines. Researchers also are considering stepwise refinement of compiled code to improve performance.
^ Analysis of Irregular Memory Accesses D. Padua,* Y. Lin Defense Advanced Research Projects Agency, DABT63-95-C-0097
This project aims to develop compiler techniques to extract precise information about irregular memory accesses. Traditional loop optimization methods require array subscripts to be closed-form expressions of loop indices. However, in sparse/irregular programs, array subscripts usually do not have this form, leaving many codes unoptimized as a result. Researchers have developed a demand-driven interprocedural array property analysis method for indirectly accessed arrays and a single-indexed array analysis method for irregular single-indexed arrays. The team has successfully improved the precision of data dependence and privatization tests, which has resulted in significant performance improvements for the collection of sparse codes.
^ Compile-Time Performance Prediction D. Padua,* D. Reed,* C. Cascaval Defense Advanced Research Projects Agency, N66001-97-C-8532
As part of the Delphi project, University of Illinois researchers have developed compiler techniques to predict the performance of scientific codes on different architectures. The compiler is both the primary tool for estimating the performance and the beneficiary of the results obtained from predicting the program behavior at compile time. The research team developed a simple compile-time performance prediction model that, when augmented with profiling data from very light instrumentation, can be accurate within an average of 20% of the measured performance for codes using both dense and sparse computational methods on uniprocessor machines. A next step is to extend this model to multiprocessors.
^ Compiler and Run-Time Environment for Numerical Java D. Padua,* L. Kale,* J. Decker, Z. Sura, D. Wong, P. Wu National Science Foundation, NSF DMS 98-78945
In this project, compiler and run-time techniques will be developed to achieve a performance comparable to that of Fortran programs for numerical Java applications. The system consists of a Java restructurer, a JVM that supports vector operations efficiently, an optimized native numerical library, and a source-to-source Fortran 95 to Java translator. The restructurer automatically identifies macro-operations (vector operations and BLAS routines) in the source program and replaces them with calls to library routines. The source-to-source translator provides a tool to convert legacy Fortran codes to Java and to produce a convincing testing benchmark suite for the system.
^ Compilers for Processors in Memory D. Padua,* J. Torrellas,* J. Lee IBM Partnership Award
Processors in memory (PIM) technology integrates processor logic and DRAM in the same chip. The goal of this research is to explore compiler techniques for FlexRam, a computer system that consists of a commodity host microprocessor and several PIM chips. The research team is exploring compiler techniques to identify macro-operations (such as vector operations) from the source code. The strategy is to partition the source code into several functions based on the profitability of running the functions on the PIM chips or the host processor. This research encompasses performance prediction, parallelism extraction, cache locality enhancement, synchronization overhead reduction, and load balancing.
^ Compiling for PC Clusters D. Padua,* J. Hoeflinger, Y. M. Kim, J. Zhu National Science Foundation, ACI 96-19019 COOP
In this project, researchers are retargeting the Polaris parallelizing compiler to a network of Sun workstations and a Windows NT-based cluster using the TreadMarks virtual shared memory system to provide a shared view of memory. Researchers ported TreadMarks to the NT cluster and made it compatible with FORTRAN77 codes. The basic functionality of the compiler for TreadMarks has been built. Some programs from the Spec95 and Perfect Benchmarks have been tested and valid results obtained. Next steps are to measure the performance of a group of benchmark codes and design optimizations to improve their performance.
^ Signal Processing Algorithm Compiling Engine (SPACE) D. Padua,* A. Huang, J. Xiong Defense Advanced Research Projects Agency, DABT63-98-1-0004
Fast signal processing algorithms have been obtained by properly factorizing the transformation matrix into several matrices. The Tensor Product Language (TPL) was designed to describe linear transformations. Using predefined parameterized matrices and their operations, TPL can represent several well-known signal processing algorithms. The goal of this project (which is part of a larger project involving researchers from Carnegie-Mellon, Drexel, and the University of Southern California) is to build a compiler to generate efficient programs from TPL formulas. Issues to be studied include techniques to predict the effect of loop unrolling/rerolling on performance and optimization techniques to very long basic blocks.