Abnormal Termination Processing (ATP) - Cray

Abnormal Termination Processing (ATP) monitors user applications. When the atp module is loaded and ATP is enabled, ATP is launched when a job starts and, if the application crashes, delivers a heuristically determined set of core files. If an application takes a system trap, ATP analyzes the dying application: the stack backtraces of all application processes are gathered into a merged stack backtrace tree and written to disk as the file atpMergedBT.dot.

The stack backtrace tree for the first process to die is sent to stderr, as is the number of the signal that caused the application to fail. The atpMergedBT.dot file can be viewed with statview (the Stack Trace Analysis Tool viewer). The merged stack backtrace tree provides a concise yet comprehensive view of what the application was doing at the time of its termination.
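As an illustration, the viewing step can be wrapped in a small shell function that checks for the file and for statview before opening it. This is only a sketch: the function name view_atp_tree is hypothetical, and it assumes that statview accepts the .dot file as its argument and that the stat module has already been loaded.

```shell
# Hypothetical helper (not part of ATP or STAT): open the merged
# backtrace tree if both the file and the viewer are available.
view_atp_tree() {
    dotfile="${1:-atpMergedBT.dot}"   # default to the file name ATP writes

    if [ ! -f "$dotfile" ]; then
        echo "not found: $dotfile" >&2
        return 1
    fi

    if ! command -v statview >/dev/null 2>&1; then
        echo "statview not found; try 'module load stat' first" >&2
        return 2
    fi

    # Assumption: statview takes the .dot file as its argument.
    statview "$dotfile"
}
```

Run view_atp_tree in the directory where the job wrote atpMergedBT.dot, or pass the path to the file explicitly.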

Available Version: 2.1.1

To load the module

module load atp/2.1.1

Note: The statview command is available only when the stat module is loaded.

module load stat

To load the module in debugging mode

module load atp/2.1.1_debug

ATP is designed to analyze failing applications; it plays no role for ordinary commands. That is, an application must use a supported parallel programming model, such as MPI, SHMEM, OpenMP, CAF, or UPC, in order to benefit from ATP analysis. When the atp module is loaded, ATP sets the MPICH_ABORT_ON_ERROR, SHMEM_ABORT_ON_ERROR, and DMAPP_ABORT_ON_ERROR environment variables. These variables make MPI, SHMEM, and DMAPP applications raise a signal when they detect usage errors, rather than only printing to stderr and exiting, so that ATP can notice the problem and perform its analysis.
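To confirm that these variables are present in the current environment after loading the module, a short check such as the following may help. It is a sketch only: the function name show_atp_abort_vars is hypothetical, and it relies on the printenv utility being available.

```shell
# Hypothetical helper: report whether each abort-on-error variable
# named above is set in the current environment.
show_atp_abort_vars() {
    for var in MPICH_ABORT_ON_ERROR SHMEM_ABORT_ON_ERROR DMAPP_ABORT_ON_ERROR; do
        # printenv exits non-zero when the variable is not exported
        if value=$(printenv "$var"); then
            echo "$var=$value"
        else
            echo "$var is not set"
        fi
    done
}

show_atp_abort_vars
```

If any variable is reported as not set, the atp module is probably not loaded in the current shell.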

Note: ATP is disabled by default. To use it, set ATP_ENABLED=1 in your batch script.
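The relevant lines of a batch script might look like the sketch below. Scheduler directives are omitted because they depend on the site configuration, and the launch line is a hypothetical placeholder (the launcher, process count, and the binary name my_app are assumptions, not taken from this page).

```shell
# Enable ATP for this job; it is disabled by default.
export ATP_ENABLED=1

# Load the atp module when the module command is available in this shell.
if command -v module >/dev/null 2>&1; then
    module load atp/2.1.1 || echo "could not load atp/2.1.1" >&2
fi

# Hypothetical launch line; replace with your own launcher and binary.
# aprun -n 24 ./my_app
```

ATP_ENABLED must be exported before the application is launched so that the setting is visible to the job.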

Report Problems to:

If you encounter any problem using this software, please report it to the SERC helpdesk at helpdesk.serc@auto.iisc.ac.in or contact the System Administrators in #103, SERC.