Queues on Param Pravega

NAME                 Priority  Min cores/GPUs       Max cores/GPUs         Max Walltime  Max Queued      Max Running     Overall Running
                                                                           (HH:MM:SS)    jobs per user   jobs per user   jobs
DEBUG                4000      96                   528                    01:00:00      3               1               20
SMALL                3400      96                   1056                   24:00:00      9               4               70
SMALL72              3400      96                   1056                   72:00:00      4               2               –
MEDIUM               3700      1104                 8160                   24:00:00      2               1               15
LARGE                4000      8160                 20400                  24:00:00      2               1               1
GPUSINGLENODE        1000      gres/gpu=2, node=1   –                      96:00:00      8               3               20
GPUSINGLENODE2DAYS   1000      gres/gpu=2, node=1   –                      48:00:00      8               3               20
GPUMULTINODE         2000      gres/gpu=3           cpu=800, gres/gpu=40   48:00:00      6               2               20
HIGHMEMORY           3700      240                  –                      24:00:00      9               4               35
Note:
  • The node limit for SMALL is 250 and for SMALL72 it is 100.
  • The node limit for gpusinglenode and gpusinglenode2days is 20 nodes each.
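These limits are enforced on the scheduler side, so it is worth checking them from a login node before submitting. The commands below are standard Slurm utilities; the exact output depends on how the limits are configured on Param Pravega:

Ex: sinfo -o "%P %l %D %c"                           # partition, time limit, node count, CPUs per node
Ex: sacctmgr show qos format=Name,Priority,MaxWall   # per-QOS priority and walltime limits, if the limits are QOS-based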

 
Param Pravega Queue Policy and Parameters that Impact Waiting Times

Parameter values set in Slurm for all partitions:

Partition            PriorityJobFactor   PriorityTier
debug                4000                3400
small                1500                3400
small72              1500                3400
medium               3000                3400
large                5000                3400
gpusinglenode        1000                1
gpusinglenode2days   1000                1
gpumultinode         2000                1
highmemory           3000                3400

Cluster-wide priority weights:

PriorityWeightAge         750000
PriorityWeightFairshare   10000
PriorityWeightJobSize     1000
PriorityWeightPartition   200000
PriorityWeightQOS         600000
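These weights feed Slurm's multifactor priority plugin. In simplified form (Slurm's full formula also includes site, association, and TRES terms), each factor is normalized to a value between 0.0 and 1.0, multiplied by its weight, and summed:

Job_priority = PriorityWeightAge       * age_factor
             + PriorityWeightFairshare * fairshare_factor
             + PriorityWeightJobSize   * job_size_factor
             + PriorityWeightPartition * partition_factor
             + PriorityWeightQOS       * qos_factor

Ex: with the weights above, a job that has reached the maximum age (age_factor = 1.0), has fairshare_factor = 0.5, and runs in a partition with partition_factor = 1.0 collects 750000*1.0 + 10000*0.5 + 200000*1.0 = 955000 priority points from those three factors (an illustrative calculation, not measured cluster data).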

Age Factor: The age factor represents the length of time a job has been sitting in the queue and eligible to run. In general, the longer a job waits in the queue, the larger its age factor grows. However, the age factor for a dependent job will not change while it waits for the job it depends on to complete. Also, the age factor will not change when scheduling is withheld for a job whose node or time limits exceed the cluster’s current limits.
PriorityWeightAge: An unsigned integer that scales the contribution of the age factor.

Fair-share Factor: The fair-share component of a job’s priority influences the order in which a user’s queued jobs are scheduled to run, based on the portion of the computing resources they have been allocated and the resources their jobs have already consumed. The fair-share factor does not involve a fixed allotment whereby a user’s access to a machine is cut off once that allotment is reached.
PriorityWeightFairshare: An unsigned integer that scales the contribution of the fair-share factor.
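Your current standing for this factor can be checked with the standard Slurm sshare utility (the accounting hierarchy it prints is site-specific):

Ex: sshare -u $USER   # shows your share allocation, raw usage, and resulting fair-share value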

Job size Factor: The job size factor correlates to the number of nodes or CPUs the job has requested. This factor can be configured to favor larger jobs or smaller jobs based on the state of the PriorityFavorSmall boolean in the slurm.conf file.
PriorityWeightJobSize: An unsigned integer that scales the contribution of the job size factor.

Partition Factor: Each node partition can be assigned an integer priority. The larger the number, the greater the job priority will be for jobs that request to run in this partition. This priority value is then normalized to the highest priority of all the partitions to become the partition factor.
PriorityWeightPartition: An unsigned integer that scales the contribution of the partition factor.
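For reference, PriorityJobFactor and PriorityTier are per-partition settings in slurm.conf; the entries in the table above correspond to lines of roughly the following shape (node lists and other values here are illustrative placeholders, not Param Pravega’s actual configuration):

PartitionName=debug Nodes=cn[001-020] MaxTime=01:00:00 PriorityJobFactor=4000 PriorityTier=3400
PartitionName=small Nodes=cn[001-250] MaxTime=24:00:00 PriorityJobFactor=1500 PriorityTier=3400
PriorityFavorSmall=NO   # global setting; YES would make the job size factor favor small jobs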

QOS Factor: Each QOS can be assigned an integer priority. The larger the number, the greater the job priority will be for jobs that request this QOS. This priority value is then normalized to the highest priority of all the QOSs to become the QOS factor.
PriorityWeightQOS: An unsigned integer that scales the contribution of the quality of service factor.
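Once a job is pending, the weighted contribution of each of these factors can be inspected with the standard Slurm sprio command:

Ex: sprio -l -u $USER   # lists the age, fair-share, job size, partition, and QOS components per pending job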

For Submitting a Job

Launches a parallel job on Param Pravega: srun
Ex: srun -Nx -p debug --pty /bin/bash
Launches an Open MPI job on Param Pravega: mpirun
Ex: mpirun -np 1056 ./main.out > output.txt
Launches an MPI job using the Hydra process manager: mpiexec.hydra
Ex: mpiexec.hydra -np 1040 ./main.out > output.txt

Partition commands (x = number of nodes):

DEBUG
  srun:   srun -Nx -p debug --pty /bin/bash
  sbatch: sbatch -Nx -p debug samplescript.sh
  script: #SBATCH --partition=debug

SMALL
  srun:   srun -Nx -p small --pty /bin/bash
  sbatch: sbatch -Nx -p small samplescript.sh
  script: #SBATCH --partition=small

MEDIUM
  srun:   srun -Nx -p medium --pty /bin/bash
  sbatch: sbatch -Nx -p medium samplescript.sh
  script: #SBATCH --partition=medium

LARGE
  srun:   srun -Nx -p large --pty /bin/bash
  sbatch: sbatch -Nx -p large samplescript.sh
  script: #SBATCH --partition=large

GPUSINGLENODE
  srun:   srun -Nx --gres=gpu:2 -p gpusinglenode --pty /bin/bash
  sbatch: sbatch -Nx --gres=gpu:2 -p gpusinglenode samplescript.sh
  script: #SBATCH --partition=gpusinglenode
          #SBATCH --gres=gpu:<number of GPUs>

GPUMULTINODE
  srun:   srun -Nx --gres=gpu:2 -p gpumultinode --pty /bin/bash
  sbatch: sbatch -Nx --gres=gpu:2 -p gpumultinode samplescript.sh
  script: #SBATCH --partition=gpumultinode
          #SBATCH --gres=gpu:<number of GPUs>

NOTE: For GPU partitions, request GPUs with --gres=gpu:<number of GPUs>
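The samplescript.sh referenced above can be any ordinary Slurm batch script. A minimal sketch for an MPI job on the small partition is shown below; the job name, node and task counts, and module name are illustrative placeholders, not site-mandated values:

#!/bin/bash
#SBATCH --job-name=mysim              # placeholder job name
#SBATCH --partition=small
#SBATCH --nodes=2                     # x, the number of nodes
#SBATCH --ntasks-per-node=48          # assumption: match the cores per node you need
#SBATCH --time=24:00:00               # must stay within the partition's max walltime
#SBATCH --output=job_%j.out           # %j expands to the Slurm job ID

# Load the MPI environment; module names are site-specific placeholders.
module load openmpi

# $SLURM_NTASKS is set by Slurm to nodes * ntasks-per-node.
mpirun -np $SLURM_NTASKS ./main.out > output.txt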

For Submitting a Job with High Priority

To submit a job with high priority, the user needs to add “#SBATCH -A <high priority account name>” to the job script.
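For example, at the top of samplescript.sh (the account name below is a placeholder; use the high-priority account allocated to you):

#SBATCH -A <high priority account name>   # placeholder: replace with your allocated account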