Introduction

Torque version 3.0.2 is installed on the delta cluster. The delta cluster consists of seven compute nodes, each with 16 cores. It is configured to allow users to run compute-based jobs within and across nodes.

How to use Torque

Users can log on to delta-cluster and use the qsub command to submit their jobs.

Environmental setup

For C shell users: Add the following lines to your .cshrc file:
set path=(/opt/torque-3.0.2/bin $path)
set path=(/opt/torque-3.0.2/sbin $path)
Then run the command: source .cshrc

For bash shell users: Add the following lines to your .bashrc file:
export PATH=/opt/torque-3.0.2/bin:$PATH
export PATH=/opt/torque-3.0.2/sbin:$PATH
Then run the command: source .bashrc

Important note: First-time users in particular should check that passwordless SSH login to the other compute nodes works. Run the command csh /admin/pass_delta from the head node before running multinode jobs.

Queue Configuration

The queue configuration on the delta cluster:

qp16: Use this queue when the job needs 16 processors. The maximum walltime for this queue is 24 hours. To have a job placed in this queue, include the following line in your script file:
#PBS -l nodes=1:ppn=16:typical

batch: This is the default queue in which all jobs are placed when submitted. Its purpose is to route each job to a queue based on the parameters specified in the job script.

Note: Users cannot submit jobs directly to a particular queue. All jobs are routed through batch.

How to submit jobs

Torque is configured on delta to allow users to submit compute-intensive parallel jobs using MVAPICH2. In Torque, jobs are submitted through queues.

To submit a job using Torque:
qsub scriptfile

A sample scriptfile can look like this:
#!/bin/csh
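#PBS -N jobname
# Replace x with the node count; the node property is typical or debug.
#PBS -l nodes=x:ppn=16:typical
#PBS -l walltime=24:00:00
#PBS -e /path_of_executable/error.log
cd /path_of_executable
# csh needs "set" for variable assignment.
# NPROCS: one process per line of the node file.
set NPROCS = `wc -l < $PBS_NODEFILE`
# HOSTS: unique node names joined with commas.
set HOSTS = `cat $PBS_NODEFILE | uniq | tr '\n' "," | sed 's|,$||'`
mpirun -np $NPROCS --host $HOSTS /name_of_executable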
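
The two backtick lines in the sample script can be tried outside a job to see what they produce. The following is a minimal bash sketch, not part of the delta setup: the function name summarize_nodefile, the node names node1 and node2, and the two-lines-per-node demo file are all made up for illustration. It relies only on the fact that the file named by $PBS_NODEFILE holds one line per allocated core.

```shell
#!/bin/bash
# Hypothetical helper: print the process count and the comma-separated
# host list for a node file, using the same pipeline as the sample script.
summarize_nodefile() {
    local nodefile="$1"
    local nprocs hosts

    # One MPI process per line of the node file.
    nprocs=$(wc -l < "$nodefile" | tr -d ' ')

    # Collapse repeated node names, join with commas, drop the last comma.
    hosts=$(uniq "$nodefile" | tr '\n' ',' | sed 's|,$||')

    echo "NPROCS=$nprocs"
    echo "HOSTS=$hosts"
}

# Stand-in for $PBS_NODEFILE; real files on delta would list each node
# name 16 times (ppn=16). Names and counts here are invented.
demo_file=$(mktemp)
printf '%s\n' node1 node1 node2 node2 > "$demo_file"
summarize_nodefile "$demo_file"
rm -f "$demo_file"
```

Inside a real job the node file has ppn entries per node, so the process count comes out as 16 times the node count x requested with nodes=x:ppn=16.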
In the nodes=x:ppn=16 request, x is the number of nodes: 1 (with the :typical property) for 16 CPUs, 2 for 32 CPUs, and 4 for 64 CPUs.

Sample Job Scripts:

Note: Local scratch space, /localscratch/<loginid>, is available for use while a job runs. Users must access this space through job scripts only. Files older than 10 days in this area will be deleted. Please do not install any software in this area.

Commonly used Torque commands

1. To check the status of a job:
qstat -a
This gives details of the job, such as the job number and the queue through which it was submitted.
2. To remove a job from the queue:
qdel <job_id>
3. To list the available queues:
qstat -q
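
The commands above can be combined into a small wrapper. This is only a sketch under stated assumptions: the function name submit_and_show is hypothetical and not something provided on delta, and it relies on qsub printing the new job's identifier to standard output.

```shell
#!/bin/bash
# Hypothetical helper (not part of the delta setup): submit a job script,
# report the job ID that qsub prints, then list jobs with qstat -a.
submit_and_show() {
    local script="$1"
    local job_id

    # Fail early if the Torque bin directory is missing from PATH.
    if ! command -v qsub >/dev/null 2>&1; then
        echo "qsub not found; check the PATH settings" >&2
        return 1
    fi

    # qsub writes the identifier of the new job to standard output.
    job_id=$(qsub "$script") || return 1
    echo "Submitted job: $job_id"
    echo "To cancel it: qdel $job_id"

    # Show the status of queued and running jobs.
    qstat -a
}
```

After sourcing the file that defines it, the function would be called as submit_and_show scriptfile.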

Report problems to:

For any problems in using this software, please contact helpdesk.serc@auto.iisc.ac.in by e-mail or contact the System Administrators in 103, SERC.