Abnormal Job Termination

Termination because of CPU limit exceeded
- Jobs receive a SIGXCPU signal, which the job can catch. In that case the job can run its termination procedures before the SIGKILL signal is sent.
- SIGKILL is sent a few minutes after SIGXCPU. It cannot be caught.
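
A job script could catch the signal roughly as follows. This is only a minimal sketch: the handler name on_xcpu and the CHECKPOINT file are hypothetical, and the kill line merely stands in for the batch system so that the sketch can be tried outside a job.

```shell
#!/bin/sh
# CHECKPOINT is a hypothetical file name used only for this sketch.
CHECKPOINT="${TMPDIR:-/tmp}/job_checkpoint.$$"
GOT_XCPU=0

# Handler: record that the CPU limit was hit and save some state.
on_xcpu() {
    GOT_XCPU=1
    echo "CPU limit reached, saving state" >&2
    date > "$CHECKPOINT"
}
trap on_xcpu XCPU

# Stand-in for the batch system sending the signal; remove in a real job.
kill -XCPU $$

if [ "$GOT_XCPU" -eq 1 ]; then
    echo "cleanup finished, exiting before SIGKILL arrives"
fi
```

Note that a shell runs a trap handler only after the current foreground command has returned, so a long-running program may need to be started in the background and waited for with wait.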

Restart after the execution host has crashed
- If a host crashes while a job is running on it, the job is restarted. In that case the variable RESTARTED is set to 1.
- The job is re-executed from the beginning on any free host. If the job can resume from the results achieved so far, check the variable RESTARTED and force the job to run on the same host by inserting
  qalter -q $QUEUE $JOB_ID
  in your job script.
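
The check might look like the sketch below. It assumes that RESTARTED, QUEUE and JOB_ID are set by the batch system as described above; the function run_mode and the echoed messages are hypothetical, and the qalter call is skipped when the command is not available.

```shell
#!/bin/sh
# Report whether this run is a fresh start or a restart after a host crash.
run_mode() {
    if [ "${RESTARTED:-0}" = "1" ]; then
        echo resume
    else
        echo fresh
    fi
}

# Bind the job to its current queue, as described in the text above.
# Skipped outside the batch system, where qalter or JOB_ID is missing.
if command -v qalter >/dev/null 2>&1 && [ -n "${JOB_ID:-}" ]; then
    qalter -q "$QUEUE" "$JOB_ID"
fi

case "$(run_mode)" in
    resume) echo "restarted: continuing from saved results" ;;
    fresh)  echo "first run: starting from the beginning" ;;
esac
```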

Signalling the end of the job
- With the qsub option -notify, a SIGUSR1 or SIGUSR2 signal is sent to the job one minute before it is suspended or killed, respectively (the interval is configurable through the queue attribute notify).
  (see: http://www-zeuthen.desy.de/www_users/rz/maillists/linux/msg00005.html)
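
A job submitted with -notify could react to the two signals as sketched here. The script name job.sh, the handler names and the NOTICE variable are hypothetical and serve only as an illustration.

```shell
#!/bin/sh
# Submit with: qsub -notify job.sh   (job.sh is a hypothetical script name)
NOTICE=""

# SIGUSR1 announces an upcoming suspend, SIGUSR2 an upcoming kill.
on_suspend() { NOTICE="suspend"; echo "USR1: job is about to be suspended" >&2; }
on_kill()    { NOTICE="kill";    echo "USR2: job is about to be killed" >&2; }

trap on_suspend USR1
trap on_kill USR2

# ... actual workload goes here ...
```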
