D0 Run II Data Management and Access

Paper: 352
Session: C (talk)
Speaker: Lueking, Lee, Fermilab, Batavia
Keywords: configuration management, data bases, data management, hierarchical storage management, mass storage


D0 Run II Data Management and Access
------------------------------------


by
Lee Lueking

representing
The D0 Collaboration at Fermilab





Abstract


During the Run II data-taking period at Fermilab, scheduled to begin in 1999, D0
plans to accumulate raw and reconstructed data at a rate of nearly one-half
Terabyte per day. During the run we will accumulate more than 10**9 event
triggers. This quantity of information will challenge many of the current
models for data management and access. Several new approaches and technologies
will be required in order to provide efficient and timely access. Many areas
of off-line processing are considered, including: 1. the data model, 2. event
and object processing strategies, 3. the computing model, 4. the data storage
model, and 5. anticipated usage and access patterns.

We are in the process of developing object-oriented code for our reconstruction
and other processing tasks, and the design from this activity is providing a
data model which should strongly influence our data processing strategies. The
computing and data storage designs will be closely coupled and largely
centralized, avoiding network bottlenecks whenever possible. Analysis computing
may require as much as 30,000 MIPS to enable us to analyze the data in a time
frame similar to Run I. This central computing will be connected to mass
storage systems via high speed links. The mass storage will consist of several
hundred TB of robotic and operator assisted tape archives which will provide
access to data with latencies matched to the needs for various categories of
information.
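
As a rough back-of-the-envelope check on the figures quoted above, the
following Python sketch converts the daily data volume into a sustained average
rate and estimates how long an archive of a given size would take to fill. The
300 TB archive size is an assumed value standing in for "several hundred TB",
decimal units (1 TB = 10**12 bytes) are assumed, and the function names are
illustrative only.

```python
# Back-of-the-envelope arithmetic; decimal units assumed (1 TB = 10**12 bytes).
TB = 10**12
MB = 10**6
SECONDS_PER_DAY = 24 * 60 * 60

DAILY_VOLUME_TB = 0.5       # "nearly one-half Terabyte per day" (from the abstract)
ASSUMED_ARCHIVE_TB = 300.0  # assumption: stands in for "several hundred TB"


def sustained_rate_mb_per_s(daily_volume_tb):
    """Average rate in MB/s needed to store the given volume within one day."""
    return daily_volume_tb * TB / SECONDS_PER_DAY / MB


def days_to_fill(archive_tb, daily_volume_tb):
    """Days of continuous data taking needed to fill an archive of archive_tb."""
    return archive_tb / daily_volume_tb


if __name__ == "__main__":
    rate = sustained_rate_mb_per_s(DAILY_VOLUME_TB)
    days = days_to_fill(ASSUMED_ARCHIVE_TB, DAILY_VOLUME_TB)
    print(f"sustained average rate: {rate:.2f} MB/s")
    print(f"days to fill {ASSUMED_ARCHIVE_TB:.0f} TB: {days:.0f}")
```

Under these assumptions the average rate is a little under 6 MB/s and the
assumed archive fills in 600 days of continuous running. Peak rates and access
traffic from analysis jobs are not modelled here.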

The access patterns of Run I will be used as a first guess as to the needs and
flexibility required for Run II systems. For configuration management of these
systems, we are investigating the possibilities of maintaining meta-data
information for a file-based system and/or a completely object-based data
repository. These
options are being studied in the context of their ease of use and the speed of
access for various kinds of data analysis activities. The experience gained
from Run I, technology trends and new strategies will be included to provide a
system which we hope will be fast, reliable, easy to use and flexible.
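
To illustrate what meta-data for a file-based system might look like, the
Python sketch below keeps a small in-memory catalog that records, for each
file, its data tier and the range of runs it covers, and answers which files
would need to be fetched for a given run. Everything here (FileRecord,
FileCatalog, the tier labels and the file names) is hypothetical and is not
taken from the D0 design.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical meta-data entry describing one data file."""
    name: str
    tier: str        # illustrative labels, e.g. "raw" or "reco"
    first_run: int
    last_run: int
    n_events: int


class FileCatalog:
    """Minimal in-memory catalog; a real system would use a database."""

    def __init__(self) -> None:
        self._records: List[FileRecord] = []

    def add(self, record: FileRecord) -> None:
        if record.first_run > record.last_run:
            raise ValueError("first_run must not exceed last_run")
        self._records.append(record)

    def files_for_run(self, run: int, tier: str) -> List[str]:
        """Names of files in the given tier whose run range includes run."""
        return sorted(
            r.name
            for r in self._records
            if r.tier == tier and r.first_run <= run <= r.last_run
        )

    def total_events(self, tier: str) -> int:
        """Total number of events recorded in the given tier."""
        return sum(r.n_events for r in self._records if r.tier == tier)


if __name__ == "__main__":
    catalog = FileCatalog()
    catalog.add(FileRecord("raw_0001.dat", "raw", 100, 104, 50000))
    catalog.add(FileRecord("raw_0002.dat", "raw", 105, 109, 48000))
    catalog.add(FileRecord("reco_0001.dat", "reco", 100, 109, 98000))
    print(catalog.files_for_run(105, "raw"))
    print(catalog.total_events("raw"))
```

A query on the catalog touches only the meta-data, so the data files
themselves can remain on tape until a selection has been made. An object-based
repository would instead push this bookkeeping down to the level of individual
events or objects.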