Torque Configuration on Delta Cluster

Introduction

Torque version 3.0.2 is installed on the delta cluster. The cluster consists of seven compute nodes, each with 16 cores. Torque is configured to allow users to run compute jobs within and across nodes.

How to use Torque

Users can log on to the delta cluster and use the qsub command to submit their jobs.

Environmental setup

For C shell users:

Add the following lines to your .cshrc file:

set path=(/opt/torque-3.0.2/bin $path)
set path=(/opt/torque-3.0.2/sbin $path)

Then run the command source ~/.cshrc

For bash shell users:

Add the following lines to your .bashrc file:

export PATH=/opt/torque-3.0.2/bin:$PATH
export PATH=/opt/torque-3.0.2/sbin:$PATH

Then run the command source ~/.bashrc
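Sourcing the file repeatedly prepends the same directories again each time. A minimal sketch for bash/sh users that adds the Torque directories only when they are missing is shown below; the helper name prepend_path and the variable TORQUE_HOME are illustrative and not part of the cluster setup.

```shell
# Sketch: add the Torque directories to PATH only if they are not there yet.
# TORQUE_HOME is the install prefix given in this document.
TORQUE_HOME=/opt/torque-3.0.2

prepend_path() {
    # $1 = directory to put at the front of PATH
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, leave PATH unchanged
        *) PATH="$1:$PATH" ;;
    esac
}

prepend_path "$TORQUE_HOME/bin"
prepend_path "$TORQUE_HOME/sbin"
export PATH
```

After this runs, which qsub should resolve to a path under /opt/torque-3.0.2/bin when run on the cluster.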

Important note, especially for first-time users: check that passwordless ssh login to the other compute nodes works. Run the command csh /admin/pass_delta from the head node before running multi-node jobs.

Queue Configuration

There are four job queues configured on the delta cluster, plus the default queue batch, which routes jobs to them:

idqueue: This queue is meant for testing codes. Use it when the job needs 16 processors. The walltime limit for this queue is 2 hours. To submit a job to this queue, put the line

#PBS -l nodes=1:ppn=16:debug

in your script file.

qp16: Use this queue when the job needs 16 processors. The walltime limit for this queue is 24 hours. To submit a job to this queue, put the line

#PBS -l nodes=1:ppn=16:typical

in your script file.

qp32: Users can request thirty-two CPUs through this queue. To submit a job to this queue, put the line

#PBS -l nodes=1:ppn=32:large

in your script file.

qp64: Users can request sixty-four CPUs through this queue. To submit a job to this queue, put the line

#PBS -l nodes=4:ppn=16:regular

in your script file.

For qp16, qp32 and qp64, the maximum walltime limit is 24 hours.

batch: This is the default queue in which all jobs are placed when submitted. Its purpose is to route each job to the appropriate queue based on the parameters specified in the job script.

Note: Users cannot submit jobs directly to a particular queue. All jobs are routed through batch.
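Because routing depends on the resource request, it can help to keep the queue-to-request mapping in one place. The sketch below only restates the #PBS -l values listed above; the function name pbs_resources_for is made up for illustration and is not a Torque command.

```shell
# Sketch: look up the "#PBS -l" resource request that this document
# associates with each queue. Nothing here talks to Torque itself.
pbs_resources_for() {
    # $1 = queue name as used in this document
    case "$1" in
        idqueue) echo "nodes=1:ppn=16:debug" ;;
        qp16)    echo "nodes=1:ppn=16:typical" ;;
        qp32)    echo "nodes=1:ppn=32:large" ;;
        qp64)    echo "nodes=4:ppn=16:regular" ;;
        *)       echo "unknown queue: $1" >&2; return 1 ;;
    esac
}

# Example: print the directive to paste into a script for a 64-CPU job.
echo "#PBS -l $(pbs_resources_for qp64)"
# prints: #PBS -l nodes=4:ppn=16:regular
```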

How to submit jobs

Torque is configured on delta to allow users to submit compute-intensive parallel jobs using MVAPICH2. In Torque, jobs are submitted through queues.

To submit a job using Torque:

qsub scriptfile

A sample script file looks like this:

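#!/bin/bash
# The variable assignments below use sh/bash syntax, so ask Torque to run
# the script with bash instead of the login shell.
#PBS -S /bin/bash
#PBS -N jobname
# Use the resource request for the target queue (see Queue Configuration):
# nodes=1:ppn=16:typical, nodes=1:ppn=32:large or nodes=4:ppn=16:regular.
#PBS -l nodes=4:ppn=16:regular
#PBS -l walltime=24:00:00
#PBS -e /path_of_executable/error.log
cd /path_of_executable
NPROCS=`wc -l < $PBS_NODEFILE`
HOSTS=`cat $PBS_NODEFILE | uniq | tr '\n' "," | sed 's|,$||'`
mpirun -np $NPROCS --host $HOSTS ./name_of_executable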

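Here, NPROCS is the total number of processor slots allocated to the job (one line per slot in $PBS_NODEFILE), and HOSTS is the comma-separated list of the distinct nodes named in that file.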
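The NPROCS and HOSTS lines of the sample script can be tried outside a job with a stand-in node file. In the sketch below, the node names node1 and node2 and the four slots per node are invented for illustration. It also assumes that lines for the same node are adjacent in the file, which is what uniq needs.

```shell
# Build a stand-in for $PBS_NODEFILE: one line per allocated slot.
# node1/node2 and 4 slots each are made-up values for this illustration.
nodefile=$(mktemp)
for node in node1 node2; do
    for slot in 1 2 3 4; do
        echo "$node"
    done
done > "$nodefile"

# Same logic as the sample script; $(( )) strips any padding from wc.
NPROCS=$(( $(wc -l < "$nodefile") ))
HOSTS=$(uniq "$nodefile" | tr '\n' ',' | sed 's|,$||')

echo "NPROCS=$NPROCS"   # prints: NPROCS=8
echo "HOSTS=$HOSTS"     # prints: HOSTS=node1,node2
rm -f "$nodefile"
```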

Note: Local scratch space, /localscratch/<loginid>, is available for use while a job is running. Users must access this space through job scripts only. Files older than 10 days in this area will be deleted. Please do not install any software in this area.
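One possible way to use the scratch area from a job script is to run in a per-job subdirectory and copy the results back when the command finishes. This is only a sketch: the helper name run_in_scratch, the per-job subdirectory layout and the use of the login name from id -un are assumptions, not site policy. PBS_JOBID is set by Torque inside a job.

```shell
# Sketch: run a command in node-local scratch, then copy results back.
# SCRATCH_ROOT defaults to the path given in this document; the
# subdirectory per job is an assumption made for this example.
SCRATCH_ROOT=${SCRATCH_ROOT:-/localscratch/$(id -un)}

run_in_scratch() {
    # $1 = directory to copy results into; remaining args = command to run
    results_dir=$1; shift
    workdir="$SCRATCH_ROOT/${PBS_JOBID:-manual.$$}"
    mkdir -p "$workdir" || return 1
    ( cd "$workdir" && "$@" )            # run the command inside scratch
    status=$?
    cp -r "$workdir"/. "$results_dir"/   # copy everything it produced back
    rm -rf "$workdir"                    # free the scratch space
    return $status
}
```

Inside a job script this could be called as run_in_scratch "$PBS_O_WORKDIR" followed by the command to run, where PBS_O_WORKDIR is the directory from which qsub was invoked.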

Commonly used Torque commands

1. To check the status of the job

qstat -a
This gives details of each job, such as the job number and the queue through which it was submitted.

2. To remove a job from the queue

qdel <job_id>

3. To list the available queues

qstat -q
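qsub prints the ID of the new job on standard output, and that ID is what qdel expects. A small sketch that captures it is given below; submit_job is a hypothetical helper name, and the commented lines show how the commands above fit together.

```shell
# Sketch: submit a script and keep the job ID that qsub prints.
submit_job() {
    # $1 = job script to submit
    jobid=$(qsub "$1") || { echo "qsub failed for $1" >&2; return 1; }
    echo "$jobid"
}

# Typical session (commented out so nothing is submitted by accident):
# jobid=$(submit_job scriptfile)
# qstat -a          # check the status of the job
# qdel "$jobid"     # remove the job from the queue if needed
```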

Report problems to:

For any problems in using this software, please contact helpdesk.serc@auto.iisc.ac.in by e-mail, or contact the System Administrators in 103, SERC.