PBS Tesla Cluster – SUPERCOMPUTER EDUCATION AND RESEARCH CENTRE

Introduction

PBS Pro version 10.0 is installed on the Tesla cluster. The cluster consists of three compute nodes, tesla1, tesla2 and tesla3, each with 16 CPU cores and 4 Nvidia Tesla GPUs. PBS Pro is configured to let users run both compute-based (CPU) jobs and GPU-based jobs. To support this, each node is divided into two virtual nodes (vnodes), a cpu-node and a gpu-node:

tesla1 – cpu-node-1 & gpu-node-1
tesla2 – cpu-node-2 & gpu-node-2
tesla3 – cpu-node-3 & gpu-node-3

The cluster therefore has six vnodes. The PBS scheduler runs compute-based jobs on the cpu-nodes and GPU-based CUDA jobs on the gpu-nodes.

How to use PBS Pro
Users log on to tesla1 and use the PBS commands described below to submit and manage their jobs.

Environment set-up

For C shell users
Add the following lines to your .cshrc file:

set path=($path /usr/pbs/default/bin)
set path=($path /usr/pbs/default/sbin)

Then run source ~/.cshrc so that the change takes effect in the current shell.

For bash users
Add the following lines to your .bashrc file:

export PATH=$PATH:/usr/pbs/default/bin
export PATH=$PATH:/usr/pbs/default/sbin

Then run source ~/.bashrc so that the change takes effect in the current shell.
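The two export lines above append the PBS directories every time .bashrc is sourced, so running source ~/.bashrc in a shell that has already read it leaves duplicate entries in PATH. A minimal bash sketch of an idempotent alternative follows; it assumes the same /usr/pbs/default install prefix, and the function name add_pbs_to_path is hypothetical, not part of PBS Pro.

```bash
#!/bin/bash
# Hypothetical helper: add the PBS Pro bin and sbin directories to PATH,
# skipping any directory that is already present, so re-sourcing ~/.bashrc
# does not create duplicate entries.
add_pbs_to_path() {
  local pbs_dir
  for pbs_dir in /usr/pbs/default/bin /usr/pbs/default/sbin; do
    case ":$PATH:" in
      *":$pbs_dir:"*) ;;              # already on PATH: leave it alone
      *) PATH="$PATH:$pbs_dir" ;;     # not yet on PATH: append it
    esac
  done
  export PATH
}

add_pbs_to_path

# Quick sanity check: qsub should now resolve on tesla1.
if command -v qsub >/dev/null 2>&1; then
  echo "qsub found at $(command -v qsub)"
else
  echo "qsub not found; is PBS Pro installed under /usr/pbs/default?" >&2
fi
```

The function can replace the two export lines in .bashrc; calling it any number of times leaves exactly one copy of each directory in PATH.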

PBS Queue Configuration

Four queues are configured on the Tesla cluster.

routeq: the default queue, into which every job is placed on submission. It routes each job to one of the execution queues below according to the resources requested in the job script or on the qsub command line.
Note: users cannot submit jobs directly to a particular execution queue; all jobs are routed through routeq.

Behind routeq there are three execution queues: two for compute-intensive parallel (CPU) jobs and one for GPU jobs.

q14cpu: for jobs that request 14 CPUs. To have a job routed to this queue, request ncpus=14 in your script or on the qsub command line. The CPU-time limit for this queue is 1400 hours.

e.g. add #PBS -l ncpus=14 to your script.

q08cpu: for jobs that request 8 CPUs. To have a job routed to this queue, request ncpus=8 in your script or on the qsub command line. The CPU-time limit for this queue is 512 hours.

e.g. add #PBS -l ncpus=8 to your script.
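The CPU-time limits of the two CPU queues can be turned into a rough wall-clock budget. A minimal bash sketch, assuming the limit is the PBS cput resource, which counts CPU time summed over all of a job's processes: a job that keeps every requested core busy reaches the limit after (limit / cores) hours. The function name walltime_hours is hypothetical.

```bash
#!/bin/bash
# Rough wall-clock budget, in whole hours, for a job that keeps all of its
# cores busy, assuming the queue limit is total CPU time summed over cores.
walltime_hours() {
  local cput_hours=$1 ncpus=$2
  echo $(( cput_hours / ncpus ))   # integer division
}

echo "q14cpu: $(walltime_hours 1400 14) h"   # prints q14cpu: 100 h
echo "q08cpu: $(walltime_hours 512 8) h"     # prints q08cpu: 64 h
```

Both budgets divide exactly (100 hours and 64 hours). A job that leaves some of its cores idle accumulates CPU time more slowly, so it can run longer in wall-clock terms before reaching the limit.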

To use the highly parallel Nvidia Tesla GPUs, submit jobs that will be routed to the qgpu queue. The host side of each GPU computation runs as a thread on one CPU core, so a GPU-based CUDA job must request both the number of GPUs it needs and the corresponding number of CPU cores.

qgpu: for jobs that request 1 to 4 GPUs. To have a job routed to this queue, request gpu=1, 2, 3 or 4 in your script or on the qsub command line, and set ncpus to the same value: the number of CPUs requested must equal the number of GPUs.

For example, the job script for a two-GPU job must contain the following lines:

#PBS -l gpu=2
#PBS -l ncpus=2
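The queue rules described in this section can be summarised as a small decision function, useful for sanity-checking a request before calling qsub. This is a hypothetical bash sketch of the documented rules only (ncpus=14 or ncpus=8 for CPU jobs; 1 to 4 GPUs with an equal number of CPUs for GPU jobs). The real routing is done by the administrators' routeq configuration, which may check more than this.

```bash
#!/bin/bash
# Hypothetical helper mirroring the documented routing rules:
#   ncpus=14, no GPUs          -> q14cpu
#   ncpus=8,  no GPUs          -> q08cpu
#   1-4 GPUs with ncpus == gpu -> qgpu
# Any other request is reported on stderr and returns a non-zero status.
expected_queue() {
  local ncpus=$1 gpu=${2:-0}
  if (( gpu == 0 )); then
    case $ncpus in
      14) echo q14cpu ;;
      8)  echo q08cpu ;;
      *)  echo "no queue: CPU jobs must request ncpus=8 or ncpus=14" >&2; return 1 ;;
    esac
  elif (( gpu >= 1 && gpu <= 4 && ncpus == gpu )); then
    echo qgpu
  else
    echo "no queue: GPU jobs need 1-4 GPUs and exactly as many CPUs" >&2
    return 1
  fi
}

expected_queue 14      # prints q14cpu
expected_queue 2 2     # prints qgpu
```

The first argument is the ncpus value and the optional second argument the gpu value (0 when omitted).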

How to submit jobs
PBS on tesla is configured for compute-intensive parallel jobs that use MPICH2 or OpenMP, and for GPGPU jobs written with CUDA.
In PBS, jobs are submitted through queues.

Sample PBS Scripts
A PBS job is a shell script containing PBS directives (lines beginning with #PBS) that request resources.

Note: local scratch space, /home/localscratch/<loginid>, is available for use while jobs run. Files older than 10 days in this area are deleted. Please do not install any software there.
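To see which of your scratch files are already past the 10-day limit, a bash sketch using find can list them. It assumes age is judged by modification time, which the note above does not specify, and that your login ID matches $USER. The function name list_expiring is hypothetical; the directory defaults to /home/localscratch/$USER and can be overridden with SCRATCH_DIR or an argument.

```bash
#!/bin/bash
# Hypothetical helper: list regular files last modified more than 10 days ago
# (assumed clean-up criterion) in the given directory, or in local scratch.
list_expiring() {
  local dir=${1:-${SCRATCH_DIR:-/home/localscratch/$USER}}
  if [ ! -d "$dir" ]; then
    echo "no such directory: $dir" >&2
    return 1
  fi
  # -mmin +N: modified more than N minutes ago; 10 days = 10 * 24 * 60 minutes
  find "$dir" -type f -mmin +$((10 * 24 * 60)) -print
}
```

Run list_expiring with no argument to check your own scratch area, or pass a directory to check a different one; copy anything you still need elsewhere before it is removed.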

To submit MPICH2 based jobs
PBS on tesla runs MPICH2 jobs that use at most 14 CPU cores, all on the same node. MPICH2 jobs may not span nodes.

To run the MPICH2 program job1 on 14 CPU cores, create a script named script1 as follows:

#!/bin/csh
# Number of CPUs (and MPI processes) required for job1
#PBS -l select=1:ncpus=14:mpiprocs=14
# Allocate the CPUs in exclusive mode
#PBS -l place=excl
# Output file for job1
#PBS -o /home/hpcscratch/loginid/job1.out
# Program to be run
mpirun -n 14 /home/hpcscratch/loginid/a.out

The job submission command is:
qsub script1
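If you prepare MPICH2 jobs of different sizes often, the script above can be generated rather than edited by hand. The bash sketch below is a hypothetical helper, make_mpi_script, not part of PBS. It writes a csh job script in the same form as script1 and, as an assumption based on the queue configuration, accepts only 8 or 14 cores, the two CPU request sizes for which queues exist.

```bash
#!/bin/bash
# Hypothetical helper: write an MPICH2 job script in the same form as script1.
# Arguments: script name, core count (8 or 14), MPI executable, PBS output file.
make_mpi_script() {
  local script=$1 ncores=$2 exe=$3 outfile=$4
  case $ncores in
    8|14) ;;
    *) echo "request 8 or 14 cores (one node, at most 14)" >&2; return 1 ;;
  esac
  # Unquoted EOF: $ncores, $exe and $outfile are expanded into the script.
  cat > "$script" <<EOF
#!/bin/csh
# $ncores CPU cores and $ncores MPI processes on a single node
#PBS -l select=1:ncpus=$ncores:mpiprocs=$ncores
# Allocate the CPUs in exclusive mode
#PBS -l place=excl
# Output file
#PBS -o $outfile
mpirun -n $ncores $exe
EOF
}

# Example (paths are placeholders):
# make_mpi_script script1 14 /home/hpcscratch/loginid/a.out /home/hpcscratch/loginid/job1.out
# qsub script1
```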

To submit CUDA based GPU jobs

A CUDA job needs one CPU thread to drive each GPU it uses. Each GPU on a tesla node has 240 cores. All four GPUs on each node are in compute-exclusive mode, so a GPU runs only one job at a time. To run the CUDA job cuda-job1, which needs one GPU, create a script named cuda-script1 as follows:

#!/bin/csh
# Number of CPUs and GPUs required for cuda-job1
#PBS -l select=1:ncpus=1:gpu=1
# Output file
#PBS -o /home/hpcscratch/loginid/cuda-job1.out
# CUDA program to be run
/home/hpcscratch/loginid/cuda-job1

The job submission command is:
qsub cuda-script1
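The same approach extends to multi-GPU jobs. This hypothetical bash helper, make_cuda_script, writes a script in the form of cuda-script1 for 1 to 4 GPUs and always requests as many CPU cores as GPUs, following the qgpu rules. It assumes the select syntax used in cuda-script1 scales to more than one GPU in the same way; confirm this with the System Administrators before relying on it.

```bash
#!/bin/bash
# Hypothetical helper: write a CUDA job script in the same form as cuda-script1,
# requesting N GPUs and N CPU cores (one host thread per GPU), N = 1..4.
# Arguments: script name, GPU count, CUDA executable, PBS output file.
make_cuda_script() {
  local script=$1 ngpu=$2 exe=$3 outfile=$4
  case $ngpu in
    1|2|3|4) ;;
    *) echo "qgpu jobs may request 1 to 4 GPUs" >&2; return 1 ;;
  esac
  # Unquoted EOF: $ngpu, $exe and $outfile are expanded into the script.
  cat > "$script" <<EOF
#!/bin/csh
# $ngpu CPU core(s) and $ngpu GPU(s); the two counts must match
#PBS -l select=1:ncpus=$ngpu:gpu=$ngpu
# Output file
#PBS -o $outfile
$exe
EOF
}

# Example (names and paths are placeholders):
# make_cuda_script cuda-script2 2 /home/hpcscratch/loginid/cuda-job2 /home/hpcscratch/loginid/cuda-job2.out
# qsub cuda-script2
```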

To submit Gaussian 09 jobs

Frequently Used PBS Commands

1. To submit a job to the queue
qsub <script-name>
e.g. qsub script2

2. To check the status of the job
qstat -a
Shows details of each job, such as the job ID, the queue it is in and its state.

3. To defer the execution of a job
qsub -a <date_time> <script-name>
e.g. qsub -a 0401102230 script1
Here date_time has the form YYMMDDhhmm, so the job is queued at once but is not eligible to run until 22:30 on 10 January 2004. The same request inside the script is:
#PBS -a 0401102230

4. To give the job a name
qsub -N <job-name> <script-name>
e.g. qsub -N job1 script1
This can also be put in the script as
#PBS -N <job-name>

5. To remove a job from the queue
qdel <jobid>

6. To list the available queues
qstat -Q or qstat -q
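The date_time value in item 3 above is written as YYMMDDhhmm. Rather than composing it by hand, a small bash sketch can derive it with GNU date (standard on Linux). The function name defer_stamp is hypothetical, and the result is expressed in the local time zone of the machine where it runs.

```bash
#!/bin/bash
# Hypothetical helper: turn a human-readable time into the YYMMDDhhmm form
# used with qsub -a. Relies on GNU date's -d option.
defer_stamp() {
  date -d "$1" +%y%m%d%H%M
}

defer_stamp '2004-01-10 22:30'   # prints 0401102230, the value used in item 3

# Typical use: defer script1 until 22:30 tomorrow
# qsub -a "$(defer_stamp 'tomorrow 22:30')" script1
```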

For any help with writing job scripts, e-mail helpdesk.serc@iisc.ac.in or contact the System Administrators in #109@SERC.