A High Performance Data Server Optimized for HEP Data
Session: C (talk)
Speaker: Ogg, Michael, University of Texas, Austin
Keywords: data bases, data management, file systems
Title: A High Performance Data Server Optimized for HEP Data
Authors: Paul Avery, Karp Jeong, Ted Johnson, and Hankil Yoon
Affiliation: University of Florida at Gainesville
HEP event analysis poses a challenge for data management because each
analysis job can potentially access the entire data set. A system that
performs analyses within a reasonable time must pay careful attention to
ensuring high-performance data access. Fortunately, this application also
offers great potential for high-performance data access because:
* the user explicitly specifies all the input data, which consists of datasets,
index files (i.e., sets of pointers to events), or skims (i.e., sets of
* events are generally read either sequentially (datasets or skims) or
sparse-sequentially (index files).
* the access order of events is not important.
* data is read-only after creation.
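Because access order does not matter and the data is read-only, a reader is
free to visit the events named by an index file in ascending file order, which
is what makes the access sparse-sequential. The following is only a minimal
sketch of that idea, not the server's actual code: the (offset, length) pointer
format, the function name read_sparse_sequential, and the toy in-memory dataset
are all assumptions made for illustration.

```python
import io


def read_sparse_sequential(stream, pointers):
    """Yield (offset, payload) for each pointer, visiting offsets in ascending order.

    `pointers` is a hypothetical index-file layout: (byte_offset, byte_length) pairs.
    """
    for offset, length in sorted(pointers):
        stream.seek(offset)
        yield offset, stream.read(length)


# Toy dataset: four fixed-size 4-byte "events" stored back to back.
dataset = io.BytesIO(b"ev00ev01ev02ev03")

# Hypothetical index file selecting the events at offsets 12 and 4, listed out of order.
index = [(12, 4), (4, 4)]

selected = list(read_sparse_sequential(dataset, index))
```

Sorting the pointers first means the file position only ever moves forward,
so the reads skip over unwanted events without seeking backwards.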
We have been developing a high-performance data server whose design is
optimized for the HEP application. Its major features aimed at
high-performance data access are:
* Aggressively prefetch events based on the given information about input
* Improve disk access performance by asynchronously reading data from
* Reduce disk I/O by vertically partitioning events and reading only
the components of an event necessary for the analysis job. A
``merge-style'' method is used for efficient event reconstruction on
* Provide an efficient and convenient interface for parallel processing.
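Vertical partitioning with merge-style reconstruction can be illustrated as
follows. This is a sketch under stated assumptions, not the method used by the
server, whose details the abstract does not give: it assumes each component is
stored as its own partition of (event_id, value) records sorted by event_id,
and that an event is rebuilt by merging only the partitions the job asks for.
The component names (tracks, calorimeter, muon) and the helpers tagged and
reconstruct are hypothetical.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def tagged(name, rows):
    """Attach the component name to every (event_id, value) record of one partition."""
    for event_id, value in rows:
        yield event_id, name, value


def reconstruct(partitions, needed):
    """Merge the needed partitions and yield (event_id, {component: value}).

    Each partition must already be sorted by event_id, so a single forward
    pass over every needed partition is enough.
    """
    streams = [tagged(name, partitions[name]) for name in needed]
    merged = heapq.merge(*streams, key=itemgetter(0))
    for event_id, group in groupby(merged, key=itemgetter(0)):
        yield event_id, {name: value for _, name, value in group}


# Toy vertically partitioned store with three components per event.
partitions = {
    "tracks": [(1, "t1"), (2, "t2"), (3, "t3")],
    "calorimeter": [(1, "c1"), (2, "c2"), (3, "c3")],
    "muon": [(1, "m1"), (2, "m2"), (3, "m3")],
}

# The analysis job needs only two of the three components.
events = list(reconstruct(partitions, ["tracks", "muon"]))
```

In this sketch the calorimeter partition is never touched, which is the source
of the I/O saving: the cost of a job scales with the components it reads
rather than with the full event size.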
In addition, our data server provides mechanisms that facilitate data management:
* Allow re-partitioning events vertically and horizontally.
* Move data (e.g., a collection of events) to another server as a single
We have taken several performance measurements of a prototype server,
covering raw I/O, "interesting" event throughput, and sequential and random
access. We have also used this prototype for several "standard" physics analyses.