Introduction

PBS Pro version 10.0 is installed on the tesla cluster. The tesla cluster consists of three compute nodes, tesla1, tesla2 and tesla3, each with 16 CPU cores and 4 Nvidia Tesla GPUs. PBS Pro on the tesla cluster is configured to allow users to run both compute-based and GPU-based jobs. Each node is divided into two virtual nodes (vnodes), a cpu-node and a gpu-node.

tesla1: cpu-node-1 and gpu-node-1

The cluster consists of six vnodes. The PBS scheduler is configured so that compute-based jobs run on the cpu-nodes and GPU-based CUDA jobs run on the gpu-nodes.

How to use PBS Pro

Environmental set up

For C shell users, add the following line to your .cshrc file:

    set path=($path /usr/pbs/default/bin)

then run the command

    source .cshrc

For bash shell users, add the following line to your .bashrc file:

    export PATH=$PATH:/usr/pbs/default/bin

then run the command

    source .bashrc

PBS Queue Configuration

There are four queues configured on the tesla cluster.

routeq: This is the default queue in which all jobs are placed when submitted. Its purpose is to route jobs to specific queues based on the parameters specified in the job script.

There are three queues configured for running compute-intensive parallel jobs.

q14cpu: Users can request 14 CPUs through this queue. To submit a job to this queue, specify ncpus=14 in your script or on the qsub command line. The CPU time for this queue is 1400 hrs. In your script, the line is:

    #PBS -l ncpus=14

q08cpu: Users can request 8 CPUs through this queue. To submit a job to this queue, specify ncpus=8 in your script or on the qsub command line. The CPU time for this queue is 512 hrs. In your script, the line is:

    #PBS -l ncpus=8

To make use of the highly parallel Nvidia Tesla GPUs, users can submit jobs to the qgpu queue. Each job that uses a GPU runs as a thread on a CPU core. To run a GPU-based CUDA job, the user has to specify the number of GPUs as well as the corresponding number of CPU cores.

qgpu: Users can request 1 to 4 CPUs and 1 to 4 GPUs through this queue.
To submit a job to this queue, specify ncpus=1, 2, 3 or 4 and gpu=1, 2, 3 or 4 in your script or on the qsub command line. The number of CPUs requested must equal the number of GPUs.
The job script for submitting a two-GPU job must have the following lines:

    #PBS -l gpu=2
    #PBS -l ncpus=2
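
The qgpu rules above (1 to 4 GPUs, with the same number of CPUs) can be checked before a script is written. The following is a minimal sketch only: qgpu_directives is a hypothetical helper, not part of PBS Pro, and it simply prints the directive lines for a valid request.

```shell
#!/bin/sh
# Hypothetical helper (not part of PBS Pro): checks a qgpu request
# against the rules stated above and prints the matching directives.
qgpu_directives() {
    gpus=$1
    ncpus=$2
    # Both counts must be a single digit between 1 and 4.
    case "$gpus" in
        [1-4]) ;;
        *) echo "gpu must be 1 to 4" >&2; return 1 ;;
    esac
    case "$ncpus" in
        [1-4]) ;;
        *) echo "ncpus must be 1 to 4" >&2; return 1 ;;
    esac
    # The number of CPUs requested must equal the number of GPUs.
    if [ "$gpus" -ne "$ncpus" ]; then
        echo "ncpus ($ncpus) must equal gpu ($gpus)" >&2
        return 1
    fi
    printf '#PBS -l gpu=%s\n#PBS -l ncpus=%s\n' "$gpus" "$ncpus"
}

# Example: the two-GPU request described above.
qgpu_directives 2 2
```

A mismatched request such as qgpu_directives 2 3 prints an error to standard error and returns a non-zero status, so the helper can guard a script that builds job files.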
Sample PBS Scripts

Note: Local scratch /home/localscratch/<loginid> is available for job runtime use. Files older than 10 days in this area will be deleted. Please do not install any software in this area.

To submit MPICH2 based jobs

To run the job1 MPICH2 program on 14 CPU cores, create a script script1 as follows:

#!/bin/csh

The job submission command

To submit CUDA based GPU jobs

Each CUDA-based job requires one CPU thread to run one GPU. Each GPU on the tesla node consists of 240 cores. All four GPUs on each node are in compute-exclusive mode, where a GPU can run only one job at any time.

To run the CUDA job cuda-job1 that requires one GPU, create a script cuda-script1 as follows:

#!/bin/csh

The job submission command

Frequently Used PBS Commands

1. To submit a job to the queue
2. To check the status of the job
3. To defer the execution of a job
4. To give the job a name
5. To remove a job from the queue
6. To know about the available queues
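
As an illustration of the sample scripts and the six tasks listed above, the sketch below writes one possible script1 for the 14-core MPICH2 job and prints the standard PBS command that matches each task. It is not the site's own script. The job name, the use of $PBS_O_WORKDIR, the mpiexec launcher line, the 22:30 start time and the placeholder <job-id> are all assumptions made for the example. Nothing is submitted: the commands are only printed.

```shell
#!/bin/sh
# Write a sketch of the job script for the 14-core MPICH2 job.
# Assumptions (not from this page): the job name, the use of
# $PBS_O_WORKDIR, and launching through "mpiexec -n".
write_script1() {
    cat > "$1" <<'EOF'
#!/bin/csh
#PBS -N job1
#PBS -l ncpus=14
# Start in the directory the job was submitted from.
cd $PBS_O_WORKDIR
# Launch job1 on the 14 requested cores (launcher is an assumption).
mpiexec -n 14 ./job1
EOF
}

# Print the standard PBS command for each task in the list above.
# Nothing is executed; "script1" and "<job-id>" are placeholders.
pbs_commands() {
    cat <<'EOF'
qsub script1                  # 1. submit a job to the queue
qstat <job-id>                # 2. check the status of the job
qsub -a 2230 script1          # 3. defer execution until 22:30
qsub -N myjob script1         # 4. give the job a name
qdel <job-id>                 # 5. remove a job from the queue
qstat -q                      # 6. list the available queues
EOF
}

write_script1 script1
pbs_commands
```

A GPU job script would follow the same pattern, with the resource lines replaced by the gpu and ncpus directives described in the queue configuration section. On the cluster, the generated file would then be submitted with qsub script1.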