^ Compilation Technology for Scalable Parallel Computing G. Agha,* W. Kim National Science Foundation, CCR96-19522
This project involves developing compiler technology to provide efficient execution of actor programs on heterogeneous distributed networks. In particular, it focuses on efficient execution of fine-grain concurrent programs. Current implementations are on multicomputers. Issues in resource management, such as automatic reclamation of inaccessible potentially active objects, are being investigated.
^ Data Parallel Programming L. V. Kale* University of Illinois
Highly regular, array-oriented computations constitute a significant majority of computation-intensive scientific and engineering applications. Data parallel languages, such as High Performance Fortran, simplify and support the parallelization of such applications. This project explores novel techniques for making such languages more efficient and flexible. One of the techniques developed overlaps different parallel subcomputations, making performance more tolerant of communication latency. A rich set of intrinsics is being developed. The system provides flexible intermodule connectivity, which makes a high degree of reuse of parallel software possible. Also, data parallel components can be integrated with non-data-parallel components at will.
^ Debugging and Performance Feedback for Parallel Programs L. V. Kale* University of Illinois
Debugging parallel programs and improving their performance are daunting tasks. Researchers are developing debugging and performance feedback tools that can maintain and exploit refined application-specific data better than is possible with traditional techniques. Such tools can be significantly beneficial in developing and improving parallel programs. This work is carried out using the Charm/Charm++ object parallel programming system and is being extended to other languages in a multilingual framework. By exploiting "specificity" in language constructs, and by recording specific events, it becomes possible to provide visual feedback on performance and to suggest improvement paths via expert analysis.
^ Dynamic Load-balancing Strategies L. V. Kale* University of Illinois
A small grain of parallelism is essential and natural in executing irregular symbolic computations. The efficiency of a parallel processing system depends on how uniformly these granules of action are distributed to processors. Researchers have developed a dynamic load-balancing scheme that speedily distributes newly created work to the "needy" processors. It employs a corrective redistribution component and saturation control (not moving pieces of work around while everyone has sufficient work). Also, new strategies appropriate for the current generation of multicomputers are being designed. Some of these strategies support prioritized load balancing and control memory requirements simultaneously.
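The two ingredients named above, placing new work on "needy" processors and corrective redistribution with saturation control, can be illustrated with a minimal sequential sketch. It is not the project's scheme: the threshold SATURATION, the function names, and the use of queue length as the load measure are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical threshold: a processor holding at least this many work
# items is considered to have "sufficient work".
SATURATION = 2


def place_new_work(queues, item):
    """Send newly created work to the neediest (least-loaded) processor."""
    target = min(range(len(queues)), key=lambda p: len(queues[p]))
    queues[target].append(item)
    return target


def rebalance(queues):
    """Corrective redistribution with saturation control.

    Moves one item at a time from the most-loaded to the least-loaded
    processor, and stops as soon as every processor holds at least
    SATURATION items. Returns the number of items moved.
    """
    moved = 0
    while True:
        needy = min(range(len(queues)), key=lambda p: len(queues[p]))
        donor = max(range(len(queues)), key=lambda p: len(queues[p]))
        if len(queues[needy]) >= SATURATION:
            break  # everyone has sufficient work: leave it where it is
        if len(queues[donor]) - len(queues[needy]) < 2:
            break  # a move would not improve the balance
        queues[needy].append(queues[donor].pop())
        moved += 1
    return moved
```

A real multicomputer implementation would make these decisions from local or neighborhood load information rather than a global view of all queues; the global view here only keeps the sketch short.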
^ Highly Parallel Discrete Event Simulation L. V. Kale,* T. Wilmarth University of Illinois
Many complex systems are characterized by the presence of asynchronous events. A flexible manufacturing system, a digital circuit, and vehicular traffic in cities are examples of such systems. Neither analytical solutions nor time-marching simulations are appropriate for modeling such systems; they are modeled by discrete event simulations. The efficiency of many industrial processes is dependent on their speedy modeling. Researchers are developing a parallel computing approach to this challenging problem. The objective is to develop easy-to-use modeling tools and to enable simulation of the resulting models on highly parallel supercomputers and workstation clusters.
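For readers unfamiliar with the technique, the core of a discrete event simulation is a time-ordered event queue: the clock jumps from one event to the next instead of advancing in fixed steps. The sketch below is a sequential illustration only, not the project's parallel approach. It models a single machine in a manufacturing line, and the function name, event kinds, and fixed service time are all assumptions.

```python
import heapq


def simulate(arrivals, service_time):
    """Simulate one machine that processes jobs in arrival order.

    arrivals: list of (arrival_time, job_name) pairs.
    Returns a dict mapping job_name to its completion time.
    """
    # Each event is (time, sequence_number, kind, job). The unique
    # sequence number breaks ties so the heap never compares kind or job.
    events = [(t, i, "arrive", job) for i, (t, job) in enumerate(arrivals)]
    heapq.heapify(events)
    seq = len(events)
    busy_until = 0.0
    done = {}
    while events:
        time, _, kind, job = heapq.heappop(events)  # jump to the next event
        if kind == "arrive":
            start = max(time, busy_until)  # wait if the machine is busy
            busy_until = start + service_time
            heapq.heappush(events, (busy_until, seq, "finish", job))
            seq += 1
        else:  # "finish"
            done[job] = time
    return done
```

Parallelizing such a loop is the hard part, because events handled on different processors must still respect their timestamp order.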
^ Java Extensions for Parallel Computing L. V. Kale,* M. Bhandarkar University of Illinois
Java has emerged as a dominant object-oriented language that could replace or augment C++ for building large-scale parallel applications. Several studies show that Java boosts programmer productivity because of its well-thought-out design, garbage collection, and standard libraries. For this project, researchers implemented a prototype parallel extension to Java that provides dynamic creation of remote objects with load balancing and object groups. The language constructs are based on those of Charm++. The prototype is implemented using the Converse interoperability framework, which makes it possible to integrate, in a single application, parallel libraries written in Java with modules in other parallel languages.
^ Parallel Execution of Speculative Computations L. V. Kale* University of Illinois
A large class of interesting computational problems has the following property: if one attempts to solve them in parallel, one often ends up solving subproblems that are not needed or that are not solved in a sequential execution. This leads to anomalous behavior. The speedups may vary from sublinear to superlinear from run to run and may increase or decrease with the addition of processors. This research is aimed at obtaining consistent and monotonically increasing speedups for such computations. Such problems arise in state-space search, branch-and-bound, game-tree search, planning, and theorem proving. Each of them requires a different set of techniques. Having obtained satisfactory results for state-space search, the research team is now working on the other problems.
^ Parallel and Distributed Object-oriented Programming L. V. Kale,* M. Bhandarkar, R. Brunner University of Illinois
This project extends earlier research on Charm, a portable parallel programming language based on message-driven objects. Message-driven execution makes it possible to adaptively overlap computation and communication, even across multiple modules. Charm supports dynamic creation and load balancing of parallel objects and specific information-sharing abstractions. Its branched chare and chare array constructs facilitate interfacing parallel modules and implementation of distributed data structures. Charm is highly suitable for irregular parallel computations. Several applications and libraries are written using Charm. Current research includes support for heterogeneity and client-server environments.
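The idea of message-driven execution can be shown with a minimal single-process sketch: a scheduler repeatedly picks a pending message and invokes the named method on the target object, so no object ever blocks waiting for a reply. This is not the Charm API. The Scheduler, Worker, and Summer classes and their method names are hypothetical, chosen only to illustrate the execution model.

```python
from collections import deque


class Scheduler:
    """Delivers queued messages one at a time by invoking object methods."""

    def __init__(self):
        self.queue = deque()

    def send(self, target, method, *args):
        # Sending never blocks: the message is just queued for delivery.
        self.queue.append((target, method, args))

    def run(self):
        delivered = 0
        while self.queue:
            target, method, args = self.queue.popleft()
            getattr(target, method)(*args)  # the message drives execution
            delivered += 1
        return delivered


class Summer:
    """Hypothetical object that collects partial sums from workers."""

    def __init__(self, expected):
        self.expected = expected
        self.received = 0
        self.total = 0
        self.result = None

    def partial(self, value):
        self.total += value
        self.received += 1
        if self.received == self.expected:
            self.result = self.total


class Worker:
    """Hypothetical object that computes a partial sum and reports it."""

    def __init__(self, scheduler, summer):
        self.scheduler = scheduler
        self.summer = summer

    def work(self, data):
        # Reply with a message rather than a blocking call.
        self.scheduler.send(self.summer, "partial", sum(data))
```

Because each object reacts only when a message arrives, a scheduler on a real parallel machine can run whichever object has work ready while other messages are still in transit, which is where the overlap of computation and communication comes from.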
^ Run-Time Framework for Multilingual Interoperability among Parallel Languages L. V. Kale,* M. Bhandarkar, J. Yelon, R. Brunner University of Illinois
To tackle the difficult problem of parallel software development, many parallel languages are being developed, each with its own unique features and advantages. To benefit from this multitude of languages, one should be able to compose modules written in different languages into a single application program. This is difficult because of different scheduling models assumed by each language run time. This research is aimed at developing Converse, a framework for facilitating such interoperability. Converse also simplifies development of the run-time systems for new languages. Converse includes components supporting flexible threads, message passing, and processor scheduling. Several languages are being implemented using Converse.
^ CADRE: A National Facility for High-Performance I/O Characterization and Optimization D. A. Reed,* D. Israel, G. Wang, D. Wells, Y. Zhang National Science Foundation, EDA 99-75248
To catalyze research on I/O system design, analysis, and optimization for scalable, parallel, and distributed systems, this work will create a national facility to disseminate I/O characterization and optimization tools and data for quantitative study of I/O systems. The work will extend, document, and distribute a multilevel I/O characterization toolkit with logical and physical I/O instrumentation, statistical and visual data analysis tools, and documentation; instrumented I/O libraries, including MPI-IO and HDF; instrumented, I/O-intensive applications; documented I/O traces in a portable data metaformat; and I/O trace analyses based on statistical analyses, hidden Markov models, and time series analysis that can be used for constructing configurable I/O benchmarks.
^ Intelligent, Adaptive Parallel File Systems D. A. Reed,* N. Tran, H. Simitci, J. Mainzer, L. Taveva National Science Foundation, ASC 97-20202
This research explores the thesis that adaptive file systems based on real-time performance data streams, automatic access pattern classification using neural networks and hidden Markov models, and fuzzy logic file system policy selection can dramatically improve input/output performance for parallel scientific applications. The results of this exploration are being embodied in a parallel file system prototype.
^ Language-directed Performance Prediction and Analysis D. A. Reed,* D. A. Padua,* R. Aydt, K. Mahesh, Y. Zhang Defense Advanced Research Projects Agency, N66001-97-C-8532 (A multi-institution collaborative project)
Achieving a large fraction of peak performance on parallel and metacomputing systems has proven difficult. To address this challenge, researchers are developing an integrated performance modeling, measurement, analysis, and prediction environment that will allow application and system developers to explore the performance implications of software and hardware design choices for extant systems, hypothetical systems, and combinations of the two. This work is based on compiler-supported program annotation, symbolic performance scalability models, integrated performance instrumentation, and comparative analysis of multiple system configurations.
^ Parallel I/O Characterization and Optimization D. A. Reed,* D. Wells, Y. Zhang U.S. Department of Energy, CIT PC 228906 (A multi-institution collaborative project)
As part of the Caltech Computational Facility for Simulating the Dynamic Response of Materials, this research team is working closely with application and computing researchers at three national laboratories (Los Alamos National Laboratory, Sandia National Laboratories, and Lawrence Livermore National Laboratory) to instrument and analyze the input/output behavior of large-scale applications in the Accelerated Strategic Computing Initiative (ASCI). This effort focuses on instrumentation of input/output libraries, such as the NCSA Hierarchical Data Format (HDF) and the laboratory input/output libraries. In addition, the team is developing adaptive parallel input/output libraries and working closely with Caltech researchers to optimize their parallel application codes.
^ Performance Analysis and Adaptive Software D. A. Reed,* J. Oly DOE Center for Simulation of Advanced Rockets
As part of the Center for Simulation of Advanced Rockets, this research team is developing performance and analysis tools, input/output analysis and optimization software, and an adaptive run-time system for dynamic performance optimization. The focus of the work is on creation of a software environment for optimization of multidisciplinary computational simulations on massively parallel systems.
^ Scalable Performance Analysis Tools D. A. Reed,* R. A. Aydt, D. Israel, J. Karim, J. Mainzer, B. Schaeffer, J. Wendling National Computational Science Alliance (A multi-institution collaborative project)
As part of the national Partnerships for an Advanced Computational Infrastructure (PACI), this research team is focusing on three interrelated areas of performance analysis for scalable parallel systems: augmentation of current tools for performance analysis of parallel codes; analysis of the interactions among application I/O patterns, file systems, and I/O hardware configurations; and design and testing of an adaptive resource management infrastructure that uses real-time performance data to interactively and automatically choose and configure run-time resource management policies. The results of this work are being shared throughout the national PACI alliance.
^ High-Performance Input/Output for Parallel Computing M. S. Winslett,* Y. Cho, S. Kuo, J. Lee National Aeronautics and Space Administration, NAGW 4244, NCC5 106; DOE Center for Simulation of Advanced Rockets
I/O is often a bottleneck for applications running on parallel platforms. Researchers are developing algorithms to provide high-performance parallel I/O with an easy-to-use, portable interface. The current focus of this work is automatic selection of optimal plans for handling sequences of high-level I/O requests.