NAME | Priority | Min Cores/GPUs | Max Cores/GPUs | Max Walltime (HH:MM:SS) | Max Queued Jobs per User | Max Running Jobs per User | Overall Running Jobs |
DEBUG | 4000 | 96 | 528 | 01:00:00 | 3 | 1 | 20 |
SMALL | 3400 | 96 | 1056 | 24:00:00 | 9 | 4 | 70 |
SMALL72 | 3400 | 96 | 1056 | 72:00:00 | 4 | 2 | |
MEDIUM | 3700 | 1104 | 8160 | 24:00:00 | 2 | 1 | 15 |
LARGE | 4000 | 8160 | 20400 | 24:00:00 | 2 | 1 | 1 |
GPUSINGLENODE | 1000 | gres/gpu=2, node=1 | | 96:00:00 | 8 | 3 | 20 |
GPUSINGLENODE2DAYS | 1000 | gres/gpu=2, node=1 | | 48:00:00 | 8 | 3 | 20 |
GPUMULTINODE | 2000 | gres/gpu=3 | cpu=800, gres/gpu=40 | 48:00:00 | 6 | 2 | 20 |
HIGHMEMORY | 3700 | 240 | | 24:00:00 | 9 | 4 | 35 |
- The node limit for SMALL is 250 and for SMALL72 it is 100.
- The node limit for gpusinglenode and gpusinglenode2days is 20 nodes each.
Param Pravega Queue Policy and Parameters that Impact Waiting Times
Partitions | PriorityJobFactor | PriorityTier |
Debug | 4000 | 3400 |
Small | 1500 | 3400 |
Small72 | 1500 | 3400 |
Medium | 3000 | 3400 |
Large | 5000 | 3400 |
Gpusinglenode | 1000 | 1 |
Gpusinglenode2days | 1000 | 1 |
Gpumultinode | 2000 | 1 |
Highmemory | 3000 | 3400 |
Parameter | Value |
PriorityWeightAge | 750000 |
PriorityWeightFairshare | 10000 |
PriorityWeightJobSize | 1000 |
PriorityWeightPartition | 200000 |
PriorityWeightQOS | 600000 |
Age Factor: The age factor represents the length of time a job has been sitting in the queue and eligible to run. In general, the longer a job waits in the queue, the larger its age factor grows. However, the age factor for a dependent job will not change while it waits for the job it depends on to complete. Also, the age factor will not change when scheduling is withheld for a job whose node or time limits exceed the cluster’s current limits.
PriorityWeightAge: An unsigned integer that scales the contribution of the age factor.
Fair-share Factor: The fair-share component of a job’s priority influences the order in which a user’s queued jobs are scheduled to run, based on the portion of the computing resources they have been allocated and the resources their jobs have already consumed. The fair-share factor does not involve a fixed allotment, whereby a user’s access to a machine is cut off once that allotment is reached.
PriorityWeightFairshare: An unsigned integer that scales the contribution of the fair-share factor.
Job Size Factor: The job size factor correlates to the number of nodes or CPUs the job has requested. This factor can be configured to favor larger jobs or smaller jobs based on the state of the PriorityFavorSmall boolean in the slurm.conf file.
PriorityWeightJobSize: An unsigned integer that scales the contribution of the job size factor.
Partition Factor: Each node partition can be assigned an integer priority. The larger the number, the greater the job priority will be for jobs that request to run in this partition. This priority value is then normalized to the highest priority of all the partitions to become the partition factor.
PriorityWeightPartition: An unsigned integer that scales the contribution of the partition factor.
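As a sketch of the normalization described above (assuming the standard Slurm multifactor behavior of dividing by the largest configured partition priority, which this document does not spell out): with Large carrying the highest PriorityJobFactor (5000), the partition factor for medium (3000) works out as follows.

```shell
# Partition factor = partition priority / highest partition priority.
# Values are taken from the PriorityJobFactor table above; the
# normalization scheme is assumed from Slurm's multifactor plugin.
factor=$(awk 'BEGIN { printf "%.2f", 3000 / 5000 }')
echo "$factor"
```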
QOS Factor: Each QOS can be assigned an integer priority. The larger the number, the greater the job priority will be for jobs that request this QOS. This priority value is then normalized to the highest priority of all the QOS’s to become the QOS factor.
PriorityWeightQOS: An unsigned integer that scales the contribution of the quality of service factor.
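Taken together, a job's priority is the weighted sum of these normalized factors. The sketch below plugs the site's configured PriorityWeight* values into Slurm's multifactor formula; the per-factor values (age, fairshare, and so on) are illustrative assumptions, not measurements from the cluster.

```shell
# Each factor is normalized to the range [0, 1]; the values below
# are illustrative placeholders, not real job data.
age=0.5; fairshare=0.8; jobsize=0.1; partition=1.0; qos=0.0

# Weighted sum using the PriorityWeight* values configured above.
priority=$(awk -v a="$age" -v f="$fairshare" -v j="$jobsize" \
               -v p="$partition" -v q="$qos" \
  'BEGIN { printf "%d", 750000*a + 10000*f + 1000*j + 200000*p + 600000*q }')
echo "$priority"
```

On the cluster itself, the standard Slurm command sprio (e.g. sprio -l) shows the actual weighted components for each pending job.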
Submitting Jobs
To launch a parallel job on Param Pravega, use srun:
Ex: srun -Nx -p debug --pty /bin/bash
To launch an Open MPI job, use mpirun:
Ex: mpirun -np 1056 ./main.out > output.txt
To launch an MPI job using the Hydra process manager, use mpiexec.hydra:
Ex: mpiexec.hydra -np 1040 ./main.out > output.txt
Partition Name | srun command (x = number of nodes) | sbatch command (x = number of nodes) | Script |
DEBUG | srun -Nx -p debug --pty /bin/bash | sbatch -Nx -p debug samplescript.sh | #SBATCH --partition=debug |
SMALL | srun -Nx -p small --pty /bin/bash | sbatch -Nx -p small samplescript.sh | #SBATCH --partition=small |
MEDIUM | srun -Nx -p medium --pty /bin/bash | sbatch -Nx -p medium samplescript.sh | #SBATCH --partition=medium |
LARGE | srun -Nx -p large --pty /bin/bash | sbatch -Nx -p large samplescript.sh | #SBATCH --partition=large |
GPUSINGLENODE | srun -Nx --gres=gpu:2 -p gpusinglenode --pty /bin/bash | sbatch -Nx --gres=gpu:2 -p gpusinglenode samplescript.sh | #SBATCH --partition=gpusinglenode #SBATCH --gres=gpu:<number of GPUs> |
GPUMULTINODE | srun -Nx --gres=gpu:2 -p gpumultinode --pty /bin/bash | sbatch -Nx --gres=gpu:2 -p gpumultinode samplescript.sh | #SBATCH --partition=gpumultinode #SBATCH --gres=gpu:<number of GPUs> |
NOTE: For GPU partitions, use --gres=gpu:<number of GPUs>.
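A samplescript.sh like the one referenced in the table might look as follows. This is a minimal sketch for the gpusinglenode partition; the job name and the application line are placeholders, not site-mandated values.

```shell
#!/bin/bash
#SBATCH --partition=gpusinglenode   # GPU partition from the table above
#SBATCH -N 1                        # single node
#SBATCH --gres=gpu:2                # request 2 GPUs
#SBATCH --time=48:00:00             # within the partition walltime limit
#SBATCH --job-name=gpu_sample      # placeholder job name

# Placeholder application launch; replace with your actual binary,
# e.g. srun ./main.out > output.txt
APP="./main.out"
echo "Would launch: $APP"
```

Submit it with, e.g., sbatch samplescript.sh.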
Submitting a Job with High Priority
To submit a job with high priority, add "#SBATCH -A <high priority account name>" to the job script.