An ODMG-compatible Testbed Architecture for Scalable Management and Analysis of Physics Data

Paper: 342
Session: C (talk)
Speaker: Malon, David, Argonne National Laboratory, Argonne
Keywords: ODBMS's, data management, hierarchical storage management, large systems, mass storage


David M. Malon and Edward N. May
Argonne National Laboratory
9700 South Cass Avenue, Building 900
Argonne, IL 60439 USA


Abstract


This paper describes a testbed architecture for the investigation and
development of scalable approaches to the management and analysis of
massive amounts of high energy physics data, and summarizes the lessons
we have learned from its implementation. The architecture has two
components: an interface layer that is compliant with a substantial
subset of the ODMG-93 Version 1.2 specification, and a lightweight
object persistence manager that provides flexible storage and retrieval
services on a variety of single- and multi-level storage architectures,
and on a range of parallel and distributed computing platforms.

Understanding scalability requires investigating approaches to data
organization and clustering, caching and migration, replication,
multiple data access paths, nonuniform data access and multilevel
storage, parallelism, and more. The roles of parallel file systems, mass
storage architectures, and concurrent use of a heterogeneous mix of
storage devices must also be understood. To undertake these studies, we
have developed a lightweight object persistence manager that meets the
following design criteria:
- access to every persistent object from every query node;
- support for efficient reorganization of data, including striping and
reclustering, without knowledge of object schemata;
- support for data replication;
- support for multiple access paths to data;
- extensible support for a variety of storage mechanisms, including
local and remote disk, raw RAID, Unitree file systems, raw device access
to DD2 and 8mm tape, parallel file systems such as (formerly) IBM's
Vesta and (currently) IBM's PIOFS, and Internet data access via standard
FTP and HTTP mechanisms or cgi-bin scripts;
- portability to heterogeneous distributed architectures.
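
The design criteria above can be illustrated with a minimal C++ sketch.
It is not the testbed's actual interface: every name in it
(StorageBackend, InMemoryBackend, PersistenceManager, and their member
functions) is hypothetical. It assumes that objects are handled as
opaque byte strings, so that copies can be moved or replicated without
schema knowledge, and that a catalog maps each object to one or more
replicas on interchangeable storage backends.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// All names here are hypothetical; none are taken from the testbed itself.
using ObjectId = std::string;
using Bytes = std::string;  // objects are opaque bytes: no schema knowledge needed

// One storage mechanism (local disk, tape, a parallel file system, FTP, ...).
class StorageBackend {
public:
    virtual ~StorageBackend() = default;
    virtual std::optional<Bytes> read(const std::string& location) = 0;
    virtual void write(const std::string& location, const Bytes& data) = 0;
};

// Stand-in backend that keeps data in memory so the sketch is runnable.
class InMemoryBackend : public StorageBackend {
public:
    std::optional<Bytes> read(const std::string& location) override {
        auto it = store_.find(location);
        if (it == store_.end()) return std::nullopt;
        return it->second;
    }
    void write(const std::string& location, const Bytes& data) override {
        store_[location] = data;
    }
    // Simulates loss of a copy, e.g. a file purged from a disk cache.
    void erase(const std::string& location) { store_.erase(location); }

private:
    std::map<std::string, Bytes> store_;
};

// Catalog of replicas: each object may live on several backends at once.
class PersistenceManager {
public:
    // Writes one copy of the object and records where it was placed.
    void store(const ObjectId& id, std::shared_ptr<StorageBackend> backend,
               const std::string& location, const Bytes& data) {
        assert(backend != nullptr);
        backend->write(location, data);
        replicas_[id].push_back({std::move(backend), location});
    }

    // Tries each known replica in registration order; returns the first hit.
    std::optional<Bytes> fetch(const ObjectId& id) const {
        auto it = replicas_.find(id);
        if (it == replicas_.end()) return std::nullopt;
        for (const auto& replica : it->second) {
            if (auto data = replica.backend->read(replica.location)) return data;
        }
        return std::nullopt;
    }

    std::size_t replica_count(const ObjectId& id) const {
        auto it = replicas_.find(id);
        return it == replicas_.end() ? 0 : it->second.size();
    }

private:
    struct Replica {
        std::shared_ptr<StorageBackend> backend;
        std::string location;
    };
    std::map<ObjectId, std::vector<Replica>> replicas_;
};
```

In this sketch, supporting a new storage mechanism means adding one
subclass of StorageBackend, and a fetch falls through to the next
replica when a copy is unavailable. Reclustering or striping would
amount to rewriting catalog entries, which is one plausible reason to
keep object contents opaque at this layer.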

We have tested this software on a range of platforms, including
Argonne's IBM SP PowerParallel system, and on a heterogeneous collection
of UNIX workstations. Experiments have been conducted using both
Fermilab D0 data and the output of ISAJET Monte Carlo simulations.

While the investigations outlined above could not have been undertaken
with commercial database software, we have tried nonetheless to provide
an interface that does not needlessly inhibit coexistence with (and,
perhaps, eventual migration to) commercial object-oriented databases. To
this end, we have defined an interface layer that is compliant with a
substantial subset of the ODMG-93 Version 1.2 specification. In the
process, we have learned a great deal about potential deficiencies and
scalability implications of the ODMG specification in general, and of
its C++ binding in particular. We have also sought to define a minimal
interface that any lightweight object persistence manager should support
in order that an ODMG-compliant system be buildable above it. In our
discussion of lessons learned from the testbed implementation, we
describe our analysis of the ODMG-93 specification as well.