Simulation meeting, CERN, November 15/16
========================================

The goal of this meeting was to start a coherent effort within the
linear collider community to move the simulation software to a common
standard, with as many common system parts as possible.

The meeting was split into two parts:

Thursday: a review of what exists in the linear collider community, and
a summary of the experiences gained in the LHC community.

Friday: an intense discussion session about how to start, where to
start, and what to do.

This text is not an attempt to summarize every talk or presentation,
but rather to document the main points of discussion and the decisions
taken.

Status of software in the LC community
--------------------------------------

The ECFA/DESY study has so far largely relied on a GEANT3 based
simulation, with reconstruction code written in Fortran. There has been
a significant effort in France to build a GEANT4 based simulation
environment, which at the moment is primarily used for the calorimeter.

BRAHMS / SIMDET: a suite of full and fast simulation programs, each
with simulation and reconstruction in one package. Fortran is the main
language. Both packages read standard HEPEVT generator files. In
principle output can be written to files of an agreed structure (ASCII
files), though the majority of users so far have preferred to do the
analysis on the fly and to store PAW n-tuples or histogram files.

MOKKA: a GEANT4 based full simulation program, used for the simulation
of the calorimeter. It has the following features:

- kinematical input through standard HEPEVT files;

- geometries are defined in external databases (MySQL), which are read
  by a dedicated wrapper that translates the information into GEANT4
  geometrical objects. Each translator is specific to a particular
  detector part, and can therefore do dedicated construction of
  complicated geometries from a few numbers in the database. The
  database is used to maintain a version of the geometry together with
  the version of the driver needed to build that geometry;

- after the simulation, events are stored in plain ASCII files. The
  UNIX directory structure is used to organize the event output:
  directories are created for each run, with subdirectories for each
  event in a run and for each subdetector in an event. Standard UNIX
  tools like tar and zip are used to pack these into files which can be
  stored on mass storage media.

US LCsim: the American groups have worked on an object oriented
simulation framework for quite some time now. They have developed a
system based primarily on JAS, a Java based analysis tool, though a
ROOT based version exists as well. They are at the moment switching to
GEANT4 as their main simulation engine. Geometries for the simulation
are defined through an XML file, which defines the objects, the
materials, the sizes and the positions. Thus only a very general
interface is needed to translate this into a GEANT4 geometry
description. The output of the simulation is stored in simple SIO
(binary) files. This output can be read by the reconstruction, which
runs within a JAS framework and is largely written in Java. In addition
to the full simulation route the system also allows for a fast
parametrised simulation route.
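To illustrate how general such an interface can be, here is a minimal
C++ sketch (not actual LCsim or MOKKA code) of a wrapper that turns one
parsed XML volume record into GEANT4 objects. The VolumeSpec structure
and all names in it are hypothetical stand-ins for whatever the XML
parser would produce.

  // Sketch only: one generic rule turns any volume record into a
  // solid, a logical volume and a placement -- no detector-specific
  // driver code is needed (unlike the MOKKA drivers described above).
  // GEANT4 uses mm as its default length unit.
  #include "G4Box.hh"
  #include "G4LogicalVolume.hh"
  #include "G4Material.hh"
  #include "G4PVPlacement.hh"
  #include "G4ThreeVector.hh"

  struct VolumeSpec {      // one <volume> element parsed from the XML file
    G4String name;         // volume name
    double   dx, dy, dz;   // full widths (mm)
    double   x, y, z;      // centre position inside the mother volume (mm)
  };

  G4VPhysicalVolume* placeBox(const VolumeSpec& v, G4Material* material,
                              G4LogicalVolume* mother, int copyNo)
  {
    G4Box*           solid = new G4Box(v.name, v.dx/2., v.dy/2., v.dz/2.);
    G4LogicalVolume* logic = new G4LogicalVolume(solid, material, v.name);
    return new G4PVPlacement(0,                            // no rotation
                             G4ThreeVector(v.x, v.y, v.z), // translation
                             logic, v.name, mother,
                             false, copyNo);
  }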
At the moment three types of fast simulation packages are available in
the LC community:

- SIMDET is a fast parametrised Monte Carlo which tries to reproduce
  the distributions for the TESLA TDR detector as obtained with the
  full simulation.

- SGV is another fast simulation package; it relies less on
  parametrised distributions and instead tries to reproduce the main
  features of, say, the tracker or the calorimeter from a simplified
  detector model and a simplified tracking approach.

- The American fast simulation is similar in concept to SIMDET, though
  an attempt is made to provide tools which automate the creation of
  the parametrisations.

Reports from the GEANT4 user community
--------------------------------------

The afternoon of the first day was dedicated to reports from both the
authors of GEANT4 and its users. For details please see the
transparencies, available on the ECFA/DESY web page.

GEANT4 as a tool is reaching a stable form, which makes it usable for
production runs. A number of groups reported on large runs of events
produced in GEANT4 frameworks, generally with few problems. A new major
release of GEANT4 is expected at the end of the year.

Reports were then given by Babar, CMS, Atlas, HARP and Alice on their
use of GEANT4 as the main simulation engine. Babar has produced many
millions of events in GEANT4 and is now using it as their main MC
production tool.

In the discussions a number of topics on which all experiments are
working crystallised:

- Geometry definition: all experiments are faced with the problem of
  defining the geometry in a simple and transparent way. One system
  which has gained considerable support is based on an external
  geometry definition in some combination of a scripting language and a
  database (e.g. XML is used), which is then read by an appropriate
  wrapper and translated into GEANT4 geometry objects. Alice uses a C++
  like interface, interpreted at run time, to define the geometry. In
  general there was broad agreement that one essential tool is a check
  system which validates the geometries. Overlapping volumes are a
  particular problem. Different approaches exist to find these
  semi-automatically, but none seems to be widely available at this
  moment (one possible approach is sketched at the end of this
  section).

- Persistency: a much debated subject is which persistency model is
  used. The official CERN policy until recently was Objectivity. This
  is still the primary choice for CMS and Atlas, as well as for Babar.
  Alice is based on a ROOT model.

- Physics validation: a lot of effort is going into the validation of
  the GEANT4 physics processes. While some problems remain in some
  regions of phase space, in particular in the simulation of hadronic
  showers, there is in general rather good agreement between GEANT3 and
  GEANT4. Furthermore, extensive checks have been made against test
  beam data from different detector components, and overall rather good
  agreement between the GEANT4 prediction and the data is found.
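As a concrete illustration of the semi-automatic overlap finding
mentioned above, the following C++ sketch shows one possible
point-sampling approach: sample random points on the surface of a
volume and flag any that fall inside a sibling. This is not code from
any of the experiments; the Volume interface is a hypothetical stand-in
for a real solid representation.

  // Sketch only: a surface point of a correctly placed volume must not
  // lie inside any of its sibling volumes; sampled points that do are
  // evidence of an overlap.
  #include <cstddef>
  #include <vector>

  struct Point { double x, y, z; };

  struct Volume {                       // hypothetical solid interface
    virtual ~Volume() {}
    virtual Point randomSurfacePoint() const = 0;  // uniform on the surface
    virtual bool  contains(const Point& p) const = 0;
  };

  // Returns the number of sampled surface points of 'probe' that lie
  // inside one of its siblings; a non-zero count signals an overlap.
  int countOverlaps(const Volume& probe,
                    const std::vector<const Volume*>& siblings,
                    int nSamples = 10000)
  {
    int inside = 0;
    for (int i = 0; i < nSamples; ++i) {
      Point p = probe.randomSurfacePoint();
      for (std::size_t j = 0; j < siblings.size(); ++j) {
        if (siblings[j]->contains(p)) { ++inside; break; }
      }
    }
    return inside;
  }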
General Purpose Software Libraries
----------------------------------

For many years now discussions have been ongoing about a replacement
for the old CERN tools (CERNLIB, PAW, etc.). Reports were given on
three systems with different functionality and support.

ROOT: ROOT was developed largely by the same people who developed PAW
and many of the traditional CERN applications. ROOT has by now matured
and offers a stable and large system. In addition to the conventional
presentation functionality (histogram presenter, n-tuple engine), it
also offers a local and distributed event store. This is used by a
number of experiments (Alice, H1, CDF, D0, ...) to store and manage
their events. For a fully fledged system the only missing ingredient is
a relational database to help in managing the events; developments are
under way to define this. Since version 3, ROOT offers "self describing
data sets", which make the export and use of ROOT files on other
systems simple, and which relieve the user of the task of managing the
versions of ROOT with which the data were written.

Anaphe: a software tool kit developed by CERN as the eventual
replacement of CERNLIB and other tools. It is based on object oriented
technology and a careful definition of abstract interfaces for each
module, which makes the integration of this package into other systems
quite simple. At the moment Anaphe uses Objectivity as its persistency
model and NAG C as the main mathematical library; both are commercial
products with license fees. To provide a free version, a persistency
model based on the old CERNLIB systems HBOOK/PAW is offered as well,
together with the math (in particular fitting) libraries from there.

JAS: an American development based on Java and a Windows like GUI. In
its most recent implementation JAS is isolated from the outside world
through the definition of a Data Interface Module, which makes the
adaptation to different input formats very simple. JAS supports a
client server model for CPU intensive computing.

Discussion
----------

The discussion was structured in three parts:
- interfaces
- software tools
- reconstruction and analysis tools

Interfaces:
-----------

There was broad agreement that one of the most difficult and important
tasks is the definition of the interfaces between the simulation engine
(e.g. GEANT4) and the outside world.

Geometries:
===========

Essentially two slightly different approaches are being followed. Both
agree that the definition of the geometries has to happen outside of
GEANT4. In this way the same database / data can be used to define the
geometries in both the simulation and the reconstruction. It also makes
checking and visualization of the geometries simpler, since these
become less dependent on the simulation engine. No agreement was
reached on how this external geometry definition should look. A few
statements, however, were not disputed:

- the geometries need to be managed by some sort of database, so that
  clear versioning is possible;

- the relationships between geometrical volumes have to be present in
  the definition file already;

- it is desirable to maintain all properties of a geometrical object in
  one place: the geometry definition might also contain information
  relevant for the reconstruction (alignment parameters are one
  example), and it clearly includes material properties as needed in
  both simulation and reconstruction.

A general consensus was reached that for the moment the GEANT4 geometry
engine -- with all its limitations -- will be used. It was decided that
volunteers will be identified to look in more detail into the systems
adopted by the LHC experiments and to see how far these can be adapted
to our needs. The system introduced by the MOKKA developers will be
compared to this, and a decision on which to use will be made at the
next meeting.

Persistency:
============

All LC simulations at the moment use plain ASCII files to store the
events. A clear majority favoured replacing this with a ROOT based IO
system, provided a way can be found to cleanly separate the IO from the
rest of the simulation. This will be followed up further, and a report
is expected at the next meeting. The advantages offered by this system
were generally deemed worth the overhead and the price to pay when
going to ROOT. Questions such as compatibility with the American effort
still need to be sorted out.

A very important aspect is the definition of the data objects to be
stored. Groups of three people each will be asked to make proposals for
the definition of a hit structure for the tracking, for the
calorimetry, and for the final event storage format.
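To make these two points concrete -- ROOT based IO hidden behind a
clean interface, and a hit structure still to be proposed -- here is a
minimal C++ sketch. The Hit structure, the HitWriter interface and all
names are illustrative assumptions, not the proposals the groups will
make.

  // Sketch only: the simulation code sees just the abstract HitWriter;
  // the ROOT implementation behind it can be replaced (or the ASCII
  // output kept) without touching the simulation itself.
  #include "TFile.h"
  #include "TTree.h"

  struct Hit { double x, y, z, edep; };  // trivial stand-in hit structure

  class HitWriter {                      // all the simulation links against
  public:
    virtual ~HitWriter() {}
    virtual void write(const Hit& h) = 0;
    virtual void close() = 0;
  };

  class RootHitWriter : public HitWriter {
  public:
    explicit RootHitWriter(const char* fname) {
      file_ = new TFile(fname, "RECREATE");
      tree_ = new TTree("hits", "simulated hits");
      tree_->Branch("x",    &x_, "x/D");   // plain branches: the file is
      tree_->Branch("y",    &y_, "y/D");   // readable with ROOT alone,
      tree_->Branch("z",    &z_, "z/D");   // without our class library
      tree_->Branch("edep", &e_, "edep/D");
    }
    void write(const Hit& h) {
      x_ = h.x; y_ = h.y; z_ = h.z; e_ = h.edep;
      tree_->Fill();
    }
    void close() {
      file_->Write();                    // the file owns and writes the tree
      file_->Close();
      delete file_;
    }
  private:
    TFile* file_;
    TTree* tree_;
    double x_, y_, z_, e_;
  };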
Code Management:
================

There was general agreement that all code will be managed under CVS. A
central repository will be installed at Zeuthen; a repository already
exists in Lyon. The old programs will also be made available under CVS
in the near future. A release strategy still has to be decided; it will
probably be closely modeled on the H1 strategy, in which a librarian
defines a released version in CVS, identified by a unique tag. Every
developer is allowed to access the repository to check in new versions
of the code. A proposal was made and accepted to install a hypernews
system to announce releases and to provide a forum for bug reports etc.

Software Tools
==============

A discussion was started on whether the central software development
should be done using specific software development tools. Examples of
such systems are KDevelop, SNiFF+ and Source-Navigator. While SNiFF+ is
proprietary and requires a license, the others are open source. An
attempt will be made to gain some experience with these and to report
back at one of the next meetings.

Reconstruction and Analysis tools:
==================================

To maintain and encourage the involvement of a wide community it is
essential that as few constraints as possible are imposed on these
tools. Multi-language support is therefore essential. A clean
definition of the interfaces was stressed again. In particular, a
proposal for a hit interface is urgently needed, to provide a place
where simulation and reconstruction can merge.

At the moment there is no major effort to develop a new version of the
fast simulation tools. SIMDET and SGV will both remain available for
physics analysis. It is important to make sure that they remain
compatible with the new simulation scheme, so that people can easily
switch between the full and the fast simulation. No clear preference
was visible for either SIMDET or SGV.

SUMMARY
=======

Milestones:

- release and freezing of SIMDET and BRAHMS: towards the end of the
  year, though BRAHMS might require more time depending on the
  complexity of separating simulation and reconstruction

- definition of a data model: in part by the end of the year, overall
  for discussion at the next meeting

- study of the implications of using ROOT as a persistency model:
  report at the next meeting

- study of the geometry implementations: progress report at the next
  meeting

Meetings:

There was general agreement that one more intermediate meeting is
needed before the St Malo workshop, and that there should be a
dedicated simulation day directly before St Malo.

Next meeting: end of February, either at CERN or at DESY.
April, day before St Malo: Paris.