Jack Orenstein
 

home | education | industry | research | teaching | contact


 

Citations for the publications referred to below can be found here.

Erdös number: 3.

Scalable Data Structures (2003-present)

At Archivas, I was responsible for the design and implementation of the Metadata Manager, the component which maintained metadata on archived files. This metadata includes POSIX metadata, and locations of file copies within the archive. We initially considered one large database that would scale up as needed. But this database would have been a single point of failure. It would not scale as well as the rest of the system, and would become a bottlneck. As nodes are added to the cluster, throughput would increase, but only up to a limit imposed by the database.

For these reasons, we decided to distribute the database, using techniques from the field of scalable data structures. I conducted original research in this area. This has not been published, but patent applications have been filed. Some of the problems we had to deal with included:

  • How to avoid rehashing everything: When nodes fail and leave the cluster, or when nodes are added to the cluster, data needs to be load-balanced across the modified set of nodes. But rehashing everything is prohibitively expensive. The traditional approach to this problem is to use consistent hashing. We pursued an approach based on linear hashing instead.
  • Maintaining synchronous replicas: Each linear hash bucket has multiple copies. These copies have to be kept tightly synchronized so that, following a node failure, surviving copies can be used to continue servicing request.
  • Mapping hierarchical filesystem to hash-based data structure: POSIX semantics require directory metadata to be updated each time a file is created or deleted. There is no easy way to map a hierarchy to a set of hash buckets -- does a directory go to the same bucket as the child or the parent? I designed a mapping that maintains POSIX semantics correctly with correct transaction semantics and good performance. The implementation supports up to 200,000 files per directory.

Object/Relational Mapping (1995-2000)

At Object Design, I became interested in the problems of linking the object-oriented and relational data models, based on some problems faced by one of our customers, USWest. USWest liked the programming model supported by Object Design's product, ObjectStore, but wanted to be able to use that model in conjunction with traditional relational database systems. I wrote a series of internal papers describing technical approaches to these problems. This design work led to the establishment of a group whose goal was to create products mapping in both directions between object-oriented and relational databases. A customer application built on top of one of these tools was described in a 1995 conference publication.

In late 1995 I decided to build an object/relational mapping system for the Java language, which had just been released by Sun. Object Design was not interested in pursuing this work so I left to build the product on my own.

A few other object/relational mapping systems existed, mostly for C++. However, they all had serious performance problems. An object-oriented application spends much of its time navigating from one object to another. Turning each navigation into a retrieval from a relational database is an easy translation, but this is an inefficient way to access a relational database. I defined and solved a number of research problems in producing a system whose performance was comparable to what could be achieved by hand-coded relational database applications. A paper describing some of this research was published in 1999. Other original work was not published.

I sold my company to Novera Software, Inc. in 1996. My product became a central piece of the Novera product line.

Object-Oriented Database Systems (1989-1996)

Traditional database systems are based on the relational model of data. Virtually all relational database systems use a server-centric architecture. Many applications cannot use relational database systems because the data model is not adequate for modeling the concepts on which the application is based, and because the performance characteristics of the application call for a more client-centric architecture.

Object-oriented database systems offer an alternative. The data model of an object-oriented database system is much closer to that of a programming language, (an object-oriented one), and most of the persistence mechanism is implemented on the client side. These approaches combine to produce a database system that is well-suited to applications involving detailed analysis of complex data, e.g. geographic information systems, ECAD, MCAD systems.

At CCA, on the DARPA-funded PROBE research project, I started working on the problem of how to extend the capabilities of database systems with object-oriented features. My work on spatial data processing represented part of this work. Papers describing the PROBE project were published in several journals and conference proceedings.

PROBE was focused on adding object-oriented capabilities to traditional database systems. Another project I worked on, Persistent Ada, dealt with a new approach to persistence, adding persistence to a programming language. Persistent Ada was a brief, DARPA-funded research project, and produced no software.

I worked on ideas related to Persistent Ada at Object Design. The company's goal was to create a commercial product based on C++, so in addition to producing a database system based on an object-oriented model, we also had to develop a client-centric architecture. My contributions were in the area of data modeling, query languages and query processing. I surveyed the relevant research literature in these areas, selected a set of the more practical ideas, and implemented them. In a number of instances, the implementation work required solutions to new research problems. Conference and journal publications on the novel aspects of the system appeared between 1991 and 1993.

Spatial Data Processing (1982-1991)

Traditional database systems are designed to handle alphanumeric information, such as names and numbers. Many important applications need both the data management capabilities of traditional database systems, as well as the ability to handle spatial data, e.g. maps, solid models, and terrain models. Three approaches have been used to build these applications: 1) Ad hoc implementations; 2) Systems geared to applications within a domain, e.g. Geographic Information Systems; 3) Hybrid systems built by combining a traditional database system with extensions for geographic data (as an example).

My research in this area has pursued a fourth possibility: extending the capabilities of database systems to handle arbitrary spatial data types. This work has been carried out in the context of traditional relational database systems, as well as extended-relational and object-oriented database systems. The results have been described in conference and journal publications appearing between 1983 and 1991. The research has also led to the creation of a software product, a spatial index library named Geophile. The product has been licensed to two object database vendors (Object Design, Objectivity), and is in use in deployed applications by their customers.

There are at least two other implementations of these research ideas:
  • I consulted with IBMs Santa Teresa Labs on my research, and they have used it in conjunction with a medical imaging application.
  • Microsoft has adopted my algorithms for use in their TerraServer application, which is a showcase for their SQL Server database system.
Professional Activities
  • Member of the program committees for various database conferences, including SIGMOD and VLDB, (several times each).
  • Reviewer for several journals.
  • Reviewer for several M.Sc. dissertations.