DATABASES AND INFORMATION SYSTEMS

Indexing Highly Multidimensional Data

S. Mehrotra, Principal Investigator; K. Chakrabarti
U.S. Army Research Laboratory, DAAL01-96-2-0003

Emerging database applications increasingly require support for storing and retrieving highly multidimensional data, where the dimensionality may be on the order of 100. Existing multidimensional index structures (e.g., grid files, R-trees) do not scale to such high dimensions: they either exhibit complexity exponential in the number of dimensions or degrade to a linear scan as the dimensionality increases. We are studying mechanisms to overcome this "curse of dimensionality." The methods under study range from the design of new multidimensional data structures with guaranteed good performance to the development of distance-preserving transforms that map high-dimensional data into a lower-dimensional space. The lower-dimensional data can then be indexed using existing multidimensional data structures.
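The transform-then-index idea can be sketched as a filter-and-refine range search. This is an illustration under stated assumptions, not the project's method: it uses an orthonormal Haar wavelet transform truncated to its first k coefficients, a well-known choice whose reduced distances never exceed the true distances (so the filter step cannot miss a true answer), and it assumes the dimensionality is a power of 2. The function names `haar`, `reduce_dim`, and `range_query` are hypothetical, and a linear scan stands in for the low-dimensional index.

```python
import math

def haar(v):
    """Orthonormal Haar wavelet transform; len(v) must be a power of 2.

    The transform is orthonormal, so it preserves Euclidean distances exactly.
    Output order is coarse to fine: overall average first, finest details last.
    """
    n = len(v)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    s = math.sqrt(2.0)
    cur, details = list(v), []
    while len(cur) > 1:
        pairs = list(zip(cur[0::2], cur[1::2]))
        details = [(a - b) / s for a, b in pairs] + details  # coarser levels go first
        cur = [(a + b) / s for a, b in pairs]
    return cur + details

def reduce_dim(v, k):
    """Keep the k coarsest Haar coefficients as a k-dimensional feature vector."""
    return haar(v)[:k]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def range_query(data, q, eps, k):
    """Filter-and-refine range search over the full vectors in `data`.

    Truncating an orthonormal transform can only shrink distances, so the filter
    step never drops a true answer; the refine step removes false positives.
    """
    q_red = reduce_dim(q, k)
    # Filter: a real system would probe a low-dimensional index (e.g., an R-tree)
    # built over the reduced vectors; a linear scan stands in for it here.
    candidates = [i for i, v in enumerate(data) if dist(reduce_dim(v, k), q_red) <= eps]
    # Refine: confirm each candidate with the exact high-dimensional distance.
    return [i for i in candidates if dist(data[i], q) <= eps]
```

Strictly speaking, the truncated transform is contractive rather than distance-preserving; that lower-bounding property is what makes the cheap filter safe, while the quality of the bound determines how many false positives the refine step must discard.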


Multimedia Retrieval System

S. Mehrotra, Principal Investigator; T. Huang, M. Ortega, Y. Rui
NSF/DARPA/NASA Digital Library Initiative, 94-11318

Recent advances in digital storage technology, image analysis and computer vision, and database management have made it possible to build powerful retrieval systems for complex multimedia data. Such data have traditionally been treated as raw, uninterpreted sequences of bits; these systems instead treat them as first-class objects that can be queried and retrieved based on their rich internal structure. Visual and aural data pose new challenges to data management. We are examining techniques for (1) effective modeling and description of visual objects, (2) support for content-based and similarity-based retrieval, (3) evaluation of composite queries that involve multiple approximate, similarity-based matches (e.g., a Boolean combination of such matches), and (4) integration of multimedia objects with traditional data.
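To make item (3) concrete, here is a minimal sketch of scoring a Boolean combination of approximate matches. It assumes each object already has a similarity score in [0, 1] per feature (the scores below are invented for illustration) and uses the common min/max/complement fuzzy-logic semantics for AND/OR/NOT; the project's actual scoring model may differ, and the helpers `sim`, `AND`, `OR`, `NOT`, and `top_k` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical per-feature similarity scores in [0, 1] for each object,
# e.g., produced by separate color, texture, and shape matchers.
Scores = Dict[str, float]
Query = Callable[[Scores], float]

def sim(feature: str) -> Query:
    """Atomic approximate match: the object's similarity on one feature."""
    return lambda s: s[feature]

def AND(*qs: Query) -> Query:
    return lambda s: min(q(s) for q in qs)   # fuzzy conjunction (minimum)

def OR(*qs: Query) -> Query:
    return lambda s: max(q(s) for q in qs)   # fuzzy disjunction (maximum)

def NOT(q: Query) -> Query:
    return lambda s: 1.0 - q(s)              # fuzzy negation (complement)

def top_k(objects: Dict[str, Scores], query: Query, k: int) -> List[Tuple[str, float]]:
    """Score every object and return the k best (ties broken by object id)."""
    ranked = sorted(((oid, query(s)) for oid, s in objects.items()),
                    key=lambda pair: (-pair[1], pair[0]))
    return ranked[:k]

objects = {
    "img1": {"color": 0.9, "texture": 0.4, "shape": 0.7},
    "img2": {"color": 0.6, "texture": 0.8, "shape": 0.2},
    "img3": {"color": 0.3, "texture": 0.9, "shape": 0.9},
}
# "similar color AND (similar texture OR similar shape)"
query = AND(sim("color"), OR(sim("texture"), sim("shape")))
print(top_k(objects, query, 2))  # [('img1', 0.7), ('img2', 0.6)]
```

Scoring every object, as done here, is the naive strategy; efficient evaluation of composite queries aims to find the top matches without computing every score.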


Concurrent Text Retrieval Systems

S. Mehrotra, Principal Investigator
University of Illinois

With the explosive growth of the Internet and the World Wide Web, and the increasing demand to retrieve documents over the network based on their contents, it has become imperative to develop effective text retrieval techniques. At the center of such retrieval systems is a full-text index that accelerates retrieval of documents based on the presence or absence of keywords and their proximity to one another. In this project, we are examining text retrieval techniques aimed primarily at search over the Internet. Specifically, we are developing effective techniques to support concurrent operations over the text index.
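A full-text index that answers both presence and proximity queries is typically a positional inverted index. The sketch below shows one, made safe for concurrent updates with lock striping: terms are hashed to shards, each with its own lock, and locks are always taken in ascending shard order to avoid deadlock. This is a simple, well-known scheme chosen for illustration; it is an assumption, not the concurrency technique developed in this project, and the class and method names are hypothetical.

```python
import re
import threading
from contextlib import ExitStack

class ConcurrentTextIndex:
    """Positional inverted index protected by lock striping (hypothetical sketch).

    Terms are hashed into shards; each shard has its own dict and lock, so
    operations touching disjoint shards can run concurrently.
    """

    def __init__(self, n_shards=8):
        self._locks = [threading.Lock() for _ in range(n_shards)]
        self._shards = [{} for _ in range(n_shards)]  # term -> {doc_id: [positions]}

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9]+", text.lower())

    def _shard(self, term):
        return hash(term) % len(self._shards)

    def _lock_shards(self, stack, terms):
        # Acquire each needed shard lock once, in ascending order, to avoid deadlock.
        for i in sorted({self._shard(t) for t in terms}):
            stack.enter_context(self._locks[i])

    def add_document(self, doc_id, text):
        positions = {}
        for pos, tok in enumerate(self.tokenize(text)):
            positions.setdefault(tok, []).append(pos)
        with ExitStack() as stack:
            self._lock_shards(stack, positions)
            # All affected shards are held, so the document becomes visible atomically.
            for tok, plist in positions.items():
                self._shards[self._shard(tok)].setdefault(tok, {})[doc_id] = plist

    def _snapshot(self, terms):
        """Copy the postings of `terms` while holding their shard locks."""
        with ExitStack() as stack:
            self._lock_shards(stack, terms)
            return {t: dict(self._shards[self._shard(t)].get(t, {})) for t in terms}

    def search_all(self, *terms):
        """Boolean AND: ids of documents containing every term."""
        terms = [t.lower() for t in terms]
        post = self._snapshot(terms)
        result = None
        for t in terms:
            ids = set(post[t])
            result = ids if result is None else result & ids
        return result or set()

    def search_near(self, t1, t2, window):
        """Ids of documents in which t1 and t2 occur within `window` positions."""
        t1, t2 = t1.lower(), t2.lower()
        post = self._snapshot([t1, t2])
        return {doc for doc in set(post[t1]) & set(post[t2])
                if any(abs(p - q) <= window
                       for p in post[t1][doc] for q in post[t2][doc])}

idx = ConcurrentTextIndex()
docs = {1: "remote backup of the database",
        2: "backup tapes stored remotely",
        3: "a remote site keeps a backup copy"}
workers = [threading.Thread(target=idx.add_document, args=item) for item in docs.items()]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(sorted(idx.search_all("remote", "backup")))      # [1, 3]
print(sorted(idx.search_near("remote", "backup", 1)))  # [1]
```

Striping trades memory for parallelism: more shards mean fewer conflicts between concurrent inserts, while a query still locks only the shards of the terms it reads.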


Remote Backup Systems

S. Mehrotra, Principal Investigator
University of Illinois

Business organizations increasingly demand systems that provide continuous service with zero downtime. The key to building such systems is replication. One viable approach is to maintain a remote backup system, in which two copies of the database are kept. Transactions are processed at the primary copy, and the log records they generate are propagated to the remote backup, which uses them to reconstruct a recent state of the primary database. We are examining efficient, scalable techniques for maintaining remote backups. Existing approaches either incur high overhead and low system throughput or risk losing transactions during failures, thereby sacrificing durability for throughput. Our approach overcomes these limitations of existing backup techniques.
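The trade-off described above is commonly framed as 2-safe versus 1-safe backup. The single-process sketch below illustrates both modes; it is not the project's approach. It assumes a redo-only log of `(lsn, key, value)` records and ignores commit records, the network, and failures during shipping. The `Primary` and `Backup` classes are hypothetical.

```python
class Backup:
    """Remote backup site: replays shipped log records in log sequence number (LSN) order."""

    def __init__(self):
        self.db = {}
        self.applied_lsn = 0

    def receive(self, records):
        for lsn, key, value in records:
            if lsn <= self.applied_lsn:
                continue                      # duplicate delivery; already applied
            self.db[key] = value              # redo the logged write
            self.applied_lsn = lsn


class Primary:
    """Primary site that processes transactions and ships redo log records."""

    def __init__(self, backup, mode="2-safe"):
        if mode not in ("1-safe", "2-safe"):
            raise ValueError("mode must be '1-safe' or '2-safe'")
        self.backup, self.mode = backup, mode
        self.db, self.next_lsn, self.unshipped = {}, 1, []

    def commit(self, writes):
        """Commit a transaction given as {key: new_value}; returns once the commit is acknowledged."""
        records = []
        for key, value in writes.items():
            records.append((self.next_lsn, key, value))
            self.next_lsn += 1
        self.db.update(writes)
        if self.mode == "2-safe":
            # Acknowledge only after the backup holds the records: nothing committed
            # is lost if the primary fails, but every commit pays a network round trip.
            self.backup.receive(records)
        else:
            # 1-safe: acknowledge now and ship later in batches; cheaper, but records
            # still buffered here are lost if the primary fails before shipping them.
            self.unshipped.extend(records)

    def ship(self):
        """Send buffered log records to the backup (1-safe mode)."""
        self.backup.receive(self.unshipped)
        self.unshipped = []
```

In 1-safe mode, a transaction committed after the last `ship()` exists only at the primary, which is exactly the window in which a failure loses committed work; 2-safe closes that window at the cost of throughput.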


Rule Processing in Distributed and Parallel Environments

S. Mehrotra, Principal Investigator
University of Illinois

We are examining how rule processing can be supported in distributed and parallel databases. In most existing database systems (both prototypes and commercial systems) that support production rules, the rules respond to operations on centralized data, and rule processing is performed in a centralized, sequential manner. With growing interest in parallel and distributed database systems, techniques for processing rules in these environments are gaining importance. We aim to develop both a theory of rule processing for such environments and, building on that theory, effective and efficient rule processing techniques for distributed and parallel database management systems.
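For reference, the centralized, sequential baseline described above can be sketched as a single loop over a queue of pending operations, firing event-condition-action rules until quiescence. The `Rule` and `process` names, the write-only operation model, and the `max_steps` guard against nonterminating rule sets are all assumptions made for illustration. In a distributed or parallel setting, the single `pending` queue and the single `db` would be partitioned across sites, which is where the questions raised above arise.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, str, object]  # (operation, key, value), e.g. ("update", "stock", 4)

@dataclass
class Rule:
    name: str
    on: str                                       # triggering operation, e.g. "update"
    condition: Callable[[Dict, Event], bool]
    action: Callable[[Dict, Event], List[Event]]  # returns further operations to perform

def process(db, rules, events, max_steps=1000):
    """Centralized, sequential rule processing: apply one operation at a time,
    fire every rule it triggers, and continue until no operations remain."""
    pending = list(events)
    steps = 0
    while pending:
        if steps == max_steps:
            raise RuntimeError("rule processing did not reach quiescence")
        steps += 1
        event = pending.pop(0)
        op, key, value = event
        db[key] = value  # in this sketch every operation is a write
        for rule in rules:
            if rule.on == op and rule.condition(db, event):
                pending.extend(rule.action(db, event))
    return db

reorder = Rule(
    "reorder", "update",
    condition=lambda db, ev: ev[1] == "stock" and ev[2] < 10,
    action=lambda db, ev: [("update", "order_qty", 50 - ev[2])],
)
print(process({}, [reorder], [("update", "stock", 4)]))  # {'stock': 4, 'order_qty': 46}
```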


Supporting Concurrent and Recoverable Operations on Multidimensional Data Structures

S. Mehrotra, Principal Investigator; K. Chakrabarti
U.S. Army Research Laboratory, DAAL01-96-2-0003

Even though research on multidimensional indexing has been ongoing for over a decade, almost no commercial database system supports any of the proposed multidimensional data structures as an access method. One of the primary impediments is the lack of algorithms that support concurrent and recoverable operations. Existing concurrency control and recovery schemes developed for one-dimensional data structures (e.g., B-trees) depend on a total ordering of the data and hence do not generalize easily to multidimensional data structures, whose data have no such natural order. To address these shortcomings, we are exploring protocols that guarantee the consistency of multidimensional data structures in the presence of concurrent access and failures, as well as techniques for protecting accesses from phantom insertions and deletions.
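The phantom problem, and the classic predicate-locking remedy for it, can be sketched with rectangular query regions. A search locks the region it scanned; an insert or delete of a point inside another transaction's locked region would change that transaction's result (a phantom), so it is refused. This is the textbook technique, shown as an assumption rather than as the protocol this project develops; checking every write against every active predicate is costly, which is why practical schemes often lock coarser granules such as index nodes or regions. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, ...]

@dataclass(frozen=True)
class Rect:
    lo: Point
    hi: Point

    def contains(self, p: Point) -> bool:
        return all(l <= x <= h for l, x, h in zip(self.lo, p, self.hi))


class PredicateLockManager:
    """Predicate locks on query rectangles to prevent phantoms (hypothetical sketch).

    A False return means the requester must wait or abort; waiting is not modeled.
    """

    def __init__(self):
        self._searches: Dict[str, List[Rect]] = {}  # txn -> regions it has searched
        self._writes: Dict[str, List[Point]] = {}   # txn -> points it inserted/deleted

    def request_search(self, txn: str, region: Rect) -> bool:
        """Grant a range search unless another txn has an uncommitted write inside it."""
        if any(region.contains(p)
               for t, pts in self._writes.items() if t != txn for p in pts):
            return False
        self._searches.setdefault(txn, []).append(region)
        return True

    def request_write(self, txn: str, point: Point) -> bool:
        """Grant an insert/delete unless the point lies in a region another txn searched,
        which would change that txn's search result (a phantom)."""
        if any(r.contains(point)
               for t, rs in self._searches.items() if t != txn for r in rs):
            return False
        self._writes.setdefault(txn, []).append(point)
        return True

    def release(self, txn: str) -> None:
        """Drop all of txn's locks at commit or abort (strict two-phase locking)."""
        self._searches.pop(txn, None)
        self._writes.pop(txn, None)
```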


Transaction Processing in Emerging Database Applications

S. Mehrotra, Principal Investigator; K. Hu, F. Ramsar
National Aeronautics and Space Administration, NAG 1-613

The objective of this project is to study the feasibility of designing transaction processing systems that adequately support the cooperative and long-duration computations found in emerging database applications such as concurrent engineering, cooperative design environments, and office workflow automation. We are interested in developing transaction processing systems with a programmable interface through which applications can specify their own desired computation model (and the protocols that support it). This research will pave the way for transaction processing systems suited to such long-duration, cooperative applications.


Transient Versioning in Distributed Database Systems

S. Mehrotra, Principal Investigator; D. Xu
University of Illinois

Transient versioning is used in database systems to reduce the data contention caused by long-duration read-only transactions. The system maintains older versions of data for read-only transactions to read (such transactions do not mind reading slightly old but still consistent data), which eliminates unnecessary interference between these read-only queries and short update transactions. We are studying how transient versioning can be efficiently supported in distributed databases and how the version control protocol used to implement versioning interacts with the various optimizations of atomic commit protocols in distributed databases.
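The mechanism described above can be sketched at a single site: update transactions append timestamped versions, a read-only transaction reads the newest version committed no later than its start, and versions are discarded ("transient") once no reader can see them. This is a single-threaded, single-site sketch under stated assumptions; it does not model consistent snapshots across sites or the interaction with atomic commit that the project studies, and the `VersionedStore` class is hypothetical.

```python
class VersionedStore:
    """Multiversion key-value store with transient versions (hypothetical sketch).

    Update transactions append new versions; read-only transactions read a
    snapshot and never block, or are blocked by, updaters.
    """

    def __init__(self):
        self._versions = {}   # key -> [(commit_ts, value), ...] in ascending commit_ts
        self._last_ts = 0

    def commit_update(self, writes):
        """Install all writes of one update transaction with a single commit timestamp."""
        self._last_ts += 1
        for key, value in writes.items():
            self._versions.setdefault(key, []).append((self._last_ts, value))
        return self._last_ts

    def begin_read_only(self):
        """A read-only transaction sees everything committed before it started."""
        return self._last_ts

    def read(self, key, snapshot_ts):
        """Newest version committed at or before snapshot_ts (None if none)."""
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None

    def garbage_collect(self, oldest_active_snapshot):
        """Discard versions that no active or future reader can see."""
        for key, versions in self._versions.items():
            keep_from = 0
            for i, (ts, _) in enumerate(versions):
                if ts <= oldest_active_snapshot:
                    keep_from = i
            self._versions[key] = versions[keep_from:]
```

A read-only transaction that starts before an update keeps reading the old value even after the update commits, so neither waits for the other.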


Database Analysis

M. E. Williams, Principal Investigator; A. Knackstedt, D. Du Vall
University of Illinois
(Conducted in the Coordinated Science Laboratory)

Analyses of the data in the database of databases are run annually. The analyses include the number and percentage of databases by field of science, type of database, storage medium, country, geographic area, and sector of the economy. Statistics are also generated on the numbers of records within databases by field, country, and sector of the economy. Various correlations between data items are generated and published each year.


An Assessment of Scientific and Technical Information in the United States

M. E. Williams, Principal Investigator
National Science Foundation via University of Tennessee

The objective of this project was to examine the status, trends, opportunities, and problems of scientific and technical information (STI) dissemination in the United States. In the first phase of the project, we focused on estimating and examining the size and characteristics of the demand for STI services by various user groups, and the magnitude, quality, and costs of STI supplied by various sources. Issues were identified and categorized as follows: information technology; policy, structure, and institutional; legal and ethical; economic, marketing, and financial; information content and access; attitudinal and behavioral; educational and training; and international. A book on this topic will be completed in 1997 by J. M. Griffiths, D. King, and M. E. Williams (permission to use material granted by NSF).


Database Support for Arrays in High-Performance Computing

M. S. Winslett, Principal Investigator; Y. Chen, Y. Cho, S. Kuo
National Aeronautics and Space Administration, NAGW 4244, NCC5 106

Scientific applications often make use of large multidimensional arrays, a data type not supported in current databases. We are examining how to support array handling on traditional and massively parallel platforms, with an emphasis on parallel I/O.
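A common starting point for parallel array I/O is to store an array as regular rectangular chunks spread across I/O nodes, so that reading a subarray touches only the chunks that overlap it. The sketch below maps element indices to chunks and assigns chunks to I/O nodes round-robin; the layout, the placement policy, and the function names are assumptions for illustration, not the scheme used in this project.

```python
from itertools import product

def chunk_grid(shape, chunk_shape):
    """Number of chunks along each dimension (ceiling division; edge chunks may be partial)."""
    return tuple(-(-n // c) for n, c in zip(shape, chunk_shape))

def row_major(coords, dims):
    """Linearize multidimensional coordinates in row-major order."""
    offset = 0
    for x, d in zip(coords, dims):
        offset = offset * d + x
    return offset

def locate(index, shape, chunk_shape, n_io_nodes):
    """Map an element index to (I/O node, chunk number, offset within the chunk)."""
    chunk = tuple(i // c for i, c in zip(index, chunk_shape))
    local = tuple(i % c for i, c in zip(index, chunk_shape))
    chunk_no = row_major(chunk, chunk_grid(shape, chunk_shape))
    return chunk_no % n_io_nodes, chunk_no, row_major(local, chunk_shape)  # round-robin placement

def chunks_for_region(lo, hi, shape, chunk_shape):
    """Chunk numbers overlapping the inclusive subarray [lo, hi]; only these need to be read."""
    grid = chunk_grid(shape, chunk_shape)
    ranges = [range(l // c, h // c + 1) for l, h, c in zip(lo, hi, chunk_shape)]
    return sorted(row_major(ch, grid) for ch in product(*ranges))

print(locate((30, 60), (100, 100), (25, 25), 4))                    # (2, 6, 135)
print(chunks_for_region((20, 20), (30, 30), (100, 100), (25, 25)))  # [0, 1, 4, 5]
```

Chunk shape is the main tuning knob: chunks matched to the common access pattern minimize the data read per request, while round-robin placement spreads concurrent requests across I/O nodes.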


Secure Access to Services in an Open Networked Environment

M. S. Winslett, Principal Investigator; V. Jones, N. Ching, I. Slepchin
Defense Advanced Research Projects Agency, DACA94-C-0029

With the growth and commercialization of the Internet and the popularity of new information services such as the World Wide Web, clients and servers increasingly need to interact without prior knowledge of one another. Often a server will require proof that a new client possesses certain properties, e.g., student status, local residency, or the ability to pay for the services to be rendered. The client may also want guarantees that it is interacting with a bona fide server, as well as guarantees of privacy for its interactions with the server. In this project, we are extending current-day authentication and authorization mechanisms to apply to such scenarios, with a focus on the needs of database applications.
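The server side of this scenario, granting access based on proven properties rather than on a prior account, can be sketched as checking attribute credentials issued by parties the server trusts. In this sketch an HMAC over the credential fields, keyed with an issuer secret, stands in for a digital signature; that is a loudly labeled simplification, since an open environment would need public-key certificates (a server cannot share secrets with every issuer in advance). The client-side guarantees mentioned above (verifying the server, privacy) are not modeled, and all names, including the `"registrar"` issuer, are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass

@dataclass(frozen=True)
class Credential:
    issuer: str      # e.g., a hypothetical "registrar" vouching for student status
    subject: str     # the client the credential describes
    attribute: str
    value: str
    tag: str         # hex HMAC-SHA256 over the fields above (stand-in for a signature)

def _payload(issuer, subject, attribute, value):
    # Unit-separator character keeps field boundaries unambiguous.
    return "\x1f".join((issuer, subject, attribute, value)).encode()

def issue(issuer, key, subject, attribute, value):
    tag = hmac.new(key, _payload(issuer, subject, attribute, value), hashlib.sha256).hexdigest()
    return Credential(issuer, subject, attribute, value, tag)

def verify(cred, trusted_keys):
    """True if cred was produced by an issuer whose key the server trusts."""
    key = trusted_keys.get(cred.issuer)
    if key is None:
        return False
    expected = hmac.new(key, _payload(cred.issuer, cred.subject, cred.attribute, cred.value),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, cred.tag)

def authorize(client, credentials, policy, trusted_keys):
    """Grant access only if every (attribute, value) pair required by `policy` is
    proven by a valid credential about this client from a trusted issuer."""
    proven = {(c.attribute, c.value) for c in credentials
              if c.subject == client and verify(c, trusted_keys)}
    return all(req in proven for req in policy)

keys = {"registrar": b"demo-secret"}
cred = issue("registrar", b"demo-secret", "alice", "status", "student")
print(authorize("alice", [cred], [("status", "student")], keys))  # True
print(authorize("bob", [cred], [("status", "student")], keys))    # False
```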