Design and Implementation of the CLEO III Data Analysis Model

Paper: 380
Session: A (talk)
Speaker: Patton, Simon, University of Minnesota, Ithaca
Keywords: analysis, C++, data presentation, object-oriented methods, software tools



Paul Avery, Chris Jones, Martin Lohner, Simon Patton

Department of Physics, University of Florida, Gainesville, FL 32611 USA
Wilson Laboratory, Cornell University, Ithaca, NY 14853 USA

CLEO Collaboration

The CLEO III experiment will collect approximately 100 TB of data
during the first few years of its lifetime. The challenges facing CLEO
III are how to analyze such a large dataset efficiently and flexibly
while providing a smooth transition from the current CLEO II
environment to the new environment.

To cope with these challenges we have developed a new model for data
analysis, consisting of:


Heterogeneous, high-performance data servers.

Analysis frameworks, e.g. based on C++, Fortran or scripting
languages.

Interfaces between data servers and the analysis frameworks.


The requirements that the system be extensible, permit users to
incorporate their own data and allow for inevitable changes in data
formats led to the incorporation of the following ideas. First, the
model abstracts the idea of a data server to allow format changes,
replacement of one data server by another, or even the use of multiple
servers, without the user being aware of the changes.
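
The data server abstraction can be illustrated with a minimal C++
sketch. It is not the CLEO III code: the names Record, DataServer,
MemoryServer, ChainedServer and countRecords are hypothetical and were
chosen only for this example. The point is that user code sees a single
abstract interface, so one server, a replacement server, or several
servers chained together all look the same to it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record: a time stamp plus an opaque payload.
struct Record {
    long time;
    std::string payload;
};

// Hypothetical abstract interface; analysis code depends only on this.
class DataServer {
public:
    virtual ~DataServer() = default;
    // Fills 'out' and returns true if another record is available.
    virtual bool next(Record& out) = 0;
};

// One concrete server. Records held in memory stand in for any
// particular storage format.
class MemoryServer : public DataServer {
public:
    explicit MemoryServer(std::vector<Record> records)
        : records_(std::move(records)) {}

    bool next(Record& out) override {
        if (pos_ >= records_.size()) return false;
        out = records_[pos_++];
        return true;
    }

private:
    std::vector<Record> records_;
    std::size_t pos_ = 0;
};

// Several servers presented through the same interface, read one
// after another.
class ChainedServer : public DataServer {
public:
    void add(std::unique_ptr<DataServer> server) {
        assert(server != nullptr);
        servers_.push_back(std::move(server));
    }

    bool next(Record& out) override {
        while (current_ < servers_.size()) {
            if (servers_[current_]->next(out)) return true;
            ++current_;  // this server is exhausted, try the next one
        }
        return false;
    }

private:
    std::vector<std::unique_ptr<DataServer>> servers_;
    std::size_t current_ = 0;
};

// User code: unaware of which server, or how many, supplies the data.
std::size_t countRecords(DataServer& server) {
    Record record;
    std::size_t count = 0;
    while (server.next(record)) ++count;
    return count;
}
```

In this sketch, swapping a MemoryServer for a ChainedServer requires no
change to countRecords, which is the property the abstraction is meant
to provide.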

Second, data are presented as streams of records which are ordered in
time. At any point within this time sequence, a user can request all
the records that describe the state of CLEO at that instant. An
analysis of the data consists of user actions which are performed when
a new record appears in a stream. A particularly attractive
consequence of this feature is that correlations of data with time can
now be studied with the same programming ease as that used for event
analyses.
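
The stream model described above can be sketched in C++ as follows.
This is an illustration under stated assumptions, not the CLEO III
implementation: Record, Frame, Action, StreamProcessor and eventsPerRun
are hypothetical names, and the stream names "run" and "event" are
invented for the example. Records from all streams are visited in time
order, the most recent record of each stream is kept as the current
state, and user actions fire when a new record appears in the stream
they are bound to.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record: which stream it belongs to, when it was
// written, and a single integer payload.
struct Record {
    std::string stream;
    long time;
    long value;
};

// Hypothetical snapshot: the most recent record of every stream at
// the current position in the time sequence.
using Frame = std::map<std::string, Record>;

// A user action receives the new record and the current snapshot.
using Action = std::function<void(const Record&, const Frame&)>;

class StreamProcessor {
public:
    // Register an action to run whenever 'stream' yields a new record.
    void bind(const std::string& stream, Action action) {
        assert(action);
        actions_[stream].push_back(std::move(action));
    }

    // Visit all records in time order, updating the snapshot and
    // invoking the actions bound to each record's stream.
    void run(std::vector<Record> records) {
        std::stable_sort(records.begin(), records.end(),
                         [](const Record& a, const Record& b) {
                             return a.time < b.time;
                         });
        Frame frame;
        for (const Record& record : records) {
            frame[record.stream] = record;
            auto found = actions_.find(record.stream);
            if (found == actions_.end()) continue;
            for (const Action& action : found->second) {
                action(record, frame);
            }
        }
    }

private:
    std::map<std::string, std::vector<Action>> actions_;
};

// Example of a time correlation: count events per run, where the run
// number is the value of the latest "run" record at each event.
std::map<long, int> eventsPerRun(const std::vector<Record>& records) {
    std::map<long, int> tally;
    StreamProcessor processor;
    processor.bind("event", [&tally](const Record&, const Frame& frame) {
        auto run = frame.find("run");
        if (run != frame.end()) ++tally[run->second.value];
    });
    processor.run(records);
    return tally;
}
```

Binding an action to the "run" stream instead of the "event" stream
would use exactly the same call, which is the sense in which studying
slowly varying quantities costs no more programming effort than an
event analysis in this sketch.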

A prototype analysis system, built using object-oriented principles,
is being tested on CLEO II data using real analysis code. Benchmark
results are presented and analyzed.