Torque Configuration on Tyrone Cluster

Introduction

Torque version 3.0.2 is installed on the tyrone cluster.
The tyrone cluster consists of nine compute nodes, each with 32 cores. It is configured to allow users to run
compute-based jobs within and across nodes.

How to use Torque

Users can log on to the tyrone cluster and use the qsub command to submit their jobs.

Environmental setup

For C shell users:

Add the following lines to your .cshrc file

set path=(/opt/torque-305/bin $path)
set path=(/opt/torque-305/sbin $path)

then run the command source .cshrc

For bash shell users:

Add the following lines to your .bashrc file

export PATH=$PATH:/opt/torque-305/bin
export PATH=$PATH:/opt/torque-305/sbin

then run the command source .bashrc

Important Note: First-time users in particular should check that passwordless ssh login to the other compute
nodes works. Run the command csh /admin/pass_tyrone from the head node before running multi-node jobs.
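
As a quick check of the setup, the commands below are an illustrative sketch (the compute node name is a placeholder) to confirm that the Torque commands are on your PATH and that passwordless ssh works:

which qsub                          # should print the path to qsub
qstat -q                            # lists the configured queues
ssh <compute_node_name> hostname    # should return the node name without asking for a password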

Queue Configuration

There are five queues configured on the tyrone cluster:

idqueue: This queue is meant for testing codes. Users can use this queue if the number of processors is between
16 and 32. For this queue the walltime limit is 2 hrs. To submit a job to this queue, give the line

#PBS -l nodes=1:ppn=x:debug

in your script file, where x can be 16 to 32 processors. A sample test script is sketched below.
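
A minimal test script for idqueue could look like the following sketch. The job name, walltime (within the 2-hour limit), path and executable name are placeholders, and the mpirun invocation follows the sample scriptfile shown under "How to submit jobs" below:

#!/bin/sh
#PBS -N testjob
#PBS -l nodes=1:ppn=16:debug
#PBS -l walltime=01:00:00
cd /path_of_executable
NPROCS=`wc -l < $PBS_NODEFILE`
HOSTS=`cat $PBS_NODEFILE | uniq | tr '\n' "," | sed 's|,$||'`
mpirun -np $NPROCS --host $HOSTS /name_of_executable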

qp32: Users can request thirty-two CPUs through this queue. To submit a job to this queue, give

#PBS -l nodes=1:ppn=32:regular

in your script file.

qp64: Users can request sixty-four CPUs through this queue. To submit
a job to this queue, give

#PBS -l nodes=2:ppn=32:regular

in your script file.

A sample MPI job script for a 32-core Fluent run is given below:

#!/bin/sh
#Sample mpi job script for a 32-core fluent run
#PBS -N jobname
#PBS -l nodes=1:ppn=32:regular
#PBS -l walltime=24:00:00
#The localscratch directory on different execution nodes of the tyrone cluster
#is hosted on different storage servers. Hence, files required for execution
#need to be copied to /localscratch from the user home area in the script.
cd /localscratch/${USER}
cp $HOME/<program_directory>/* .
NPROCS=`wc -l < $PBS_NODEFILE`
/home/pkg/lic/ansys/13/linux/64/v130/fluent/bin/fluent -g 2ddp -t${NPROCS} -ssh \
 -i /localscratch/${USER}/inputfile > /localscratch/${USER}/outputfile
mv ./* $HOME/<program_directory>

qp128: Users can request one hundred and twenty-eight CPUs through this
queue. To submit a job to this queue, give

#PBS -l nodes=4:ppn=32:regular

in your script file.
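
A qp128 script follows the same pattern as the sample scriptfile under "How to submit jobs" below; only the resource request lines change, for example:

#PBS -l nodes=4:ppn=32:regular
#PBS -l walltime=24:00:00
#With nodes=4:ppn=32, $PBS_NODEFILE contains 128 entries, so NPROCS evaluates to 128.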

For qp32, qp64 and qp128, the maximum walltime limit is 24 hours.

batch: This is the default queue into which all jobs are placed when submitted. The purpose of this queue is to route each job to the appropriate queue based on the parameters specified in the job script.

Note: Users cannot directly submit jobs to a particular queue. All jobs are routed through batch.
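
In practice this means a job is submitted without naming a queue, and its placement can be checked afterwards (the script file name below is a placeholder):

qsub scriptfile
qstat -a        # the Queue column shows the queue (idqueue, qp32, qp64 or qp128) the job was routed to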

How to submit jobs

Torque is configured on tyrone to allow users to submit compute-intensive parallel jobs using MVAPICH2. In Torque, jobs are submitted through queues.

 

To submit a job using Torque:

qsub scriptfile

A sample scriptfile can be like this:
#!/bin/sh
#PBS -N jobname
#PBS -l nodes=x:ppn=32:regular
#PBS -l walltime=24:00:00
cd /path_of_executable
#Number of processors allotted to the job (one line per processor in $PBS_NODEFILE)
NPROCS=`wc -l < $PBS_NODEFILE`
#Comma-separated list of the allotted nodes
HOSTS=`cat $PBS_NODEFILE | uniq | tr '\n' "," | sed 's|,$||'` 
mpirun -np $NPROCS --host $HOSTS /name_of_executable

Here x is the number of nodes requested (1, 2 or 4, depending on the queue), and jobname, /path_of_executable and /name_of_executable should be replaced with your own job name, working directory and executable.

Note:
Local scratch space /localscratch/<loginid> is available for job runtime use. Users must access this space through job scripts only. Files older than 10 days in this area will be deleted. Please do not install any software in this area.
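
The usual pattern, as in the Fluent sample above, is to stage files into /localscratch at the start of the job script and copy results back at the end; a minimal sketch (directory names are placeholders):

cd /localscratch/${USER}
cp $HOME/<program_directory>/* .       # copy input files from the home area
# ... run the executable here ...
mv ./* $HOME/<program_directory>       # copy results back to the home area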

Commonly used Torque commands

1. To check the status of the job

qstat -a

Gives details of the job, such as the job number, the queue through which
it was fired, etc.

2. To remove a job from the queue

qdel <jobid>

3. To know about the available queues

qstat -q
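
For example, a typical sequence looks like this (the job id shown is illustrative; use the id printed by qsub):

qsub scriptfile        # prints the job id, e.g. 1234.tyrone
qstat -a               # check the status of the job
qdel 1234              # remove the job if required
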
Report Problems to:

If you encounter any problem in using Torque, please report it to the SERC helpdesk at the email address helpdesk.serc@auto.iisc.ac.in or contact the System Administrators in #103, SERC.