For Traditional HPC Simulations: Param Pravega

 

Introduction

PARAM Pravega is a National Supercomputing Mission (NSM) supercomputer that was added to SERC’s High-Performance Computing (HPC) class of systems in January 2022. It is a heterogeneous system: the CPU nodes are built on Intel Xeon Cascade Lake processors and the GPU nodes carry NVIDIA Tesla V100 cards. The hardware is an ATOS BullSequana XH2000 series system with a total peak compute power of 3.3 PetaFlops. The software stack on top of the hardware is provided and supported by C-DAC. The HPC Technologies team of C-DAC, Pune, was instrumental in designing and implementing the solution for end use, along with ATOS and BULL.

The PARAM Pravega supercomputer consists of 11 DCLC racks of compute nodes, 2 service racks of master/service nodes, and 4 storage racks of DDN storage. The node configuration includes 2 master nodes, 11 login nodes, 2 firewall nodes, 4 management nodes, 1 NIS slave node, and 624 compute nodes (CPU and GPU). The compute nodes fall into three categories: regular CPU, high-memory CPU, and GPU nodes. All nodes in the system are connected by a Mellanox HDR InfiniBand high-speed interconnect in a fat-tree topology with a 1:1 subscription ratio. The system is also backed by 4 Petabytes of parallel storage from DDN for parallel filesystem access.

Regular CPU nodes: Each regular CPU node has Intel Xeon Cascade Lake 8268 2.9 GHz processors in a 2-socket configuration with 48 cores, 192 GB RAM (4 GB per core), and 480 GB of local SSD storage. PARAM Pravega has 428 such nodes, giving 20,544 CPU-only cores and a peak capability of 1.9 PF.

High-Memory CPU nodes: The system also hosts high-memory CPU-only nodes, which match the regular CPU nodes except that each has 768 GB RAM (16 GB per core). There are 156 such nodes, yielding 7,488 cores for high-memory computations and 0.694 PF of peak computing capability.

GPU nodes: PARAM Pravega also hosts 40 GPU nodes. Each node has Intel Xeon G-6248 2.5 GHz processors in a 2-socket configuration with 40 cores, 192 GB RAM, and 480 GB of local SSD storage, plus two NVIDIA Tesla V100 GPU cards with 16 GB of HBM2 device memory each. The 40 GPU nodes therefore provide a total of 1,600 host CPU cores and 80 NVIDIA V100 cards. These accelerator nodes contribute 0.688 PF of peak computational capability (0.128 PF from the host CPUs and 0.560 PF from the GPUs).
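The peak figures quoted for the three node categories can be reproduced with simple arithmetic. The sketch below is a rough check, not an official derivation. It assumes 32 double-precision floating-point operations per core per clock cycle at the nominal clock speed, and 7 TFLOPS of double-precision peak per V100 card. Neither value is stated in the text; they are assumptions chosen because they reproduce the quoted numbers.

```python
# Rough peak-performance check for the PARAM Pravega node categories.
# ASSUMPTIONS (not stated in the source text):
#   - 32 double-precision FLOPs per core per cycle at the nominal clock
#   - 7 TFLOPS double-precision peak per V100 card
FLOPS_PER_CYCLE = 32
V100_DP_TFLOPS = 7.0


def cpu_peak_pflops(nodes, cores_per_node, clock_ghz):
    """Peak PFLOPS = cores x clock (GHz) x FLOPs per cycle, converted from GFLOPS."""
    gflops = nodes * cores_per_node * clock_ghz * FLOPS_PER_CYCLE
    return gflops / 1e6


regular = cpu_peak_pflops(nodes=428, cores_per_node=48, clock_ghz=2.9)
high_mem = cpu_peak_pflops(nodes=156, cores_per_node=48, clock_ghz=2.9)
gpu_host = cpu_peak_pflops(nodes=40, cores_per_node=40, clock_ghz=2.5)
gpu_cards = 40 * 2 * V100_DP_TFLOPS / 1e3  # two cards per GPU node

total = regular + high_mem + gpu_host + gpu_cards

print(f"Regular CPU nodes     : {regular:.4f} PF")
print(f"High-memory CPU nodes : {high_mem:.4f} PF")
print(f"GPU node host CPUs    : {gpu_host:.4f} PF")
print(f"GPU cards             : {gpu_cards:.4f} PF")
print(f"Total                 : {total:.4f} PF")
```

Under these assumptions the four contributions add up to about 3.29 PF, in line with the 3.3 PetaFlops quoted for the whole system. The high-memory figure comes out at roughly 0.6949 PF, which the text gives as 0.694 PF.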

High-Speed Parallel Storage: A high-speed parallel Lustre filesystem provides 4 Petabytes of usable space with a throughput of 100 GB per second. The storage subsystem is connected to the machine over the InfiniBand interconnect.
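To give a feel for what a throughput of 100 GB per second means in practice, the sketch below estimates best-case transfer times. It is an illustration only: it assumes decimal units (1 PB = 1,000,000 GB), treats the quoted throughput as sustained with no contention, and the function name transfer_seconds is hypothetical. The node counts and RAM sizes are the ones given in the node descriptions above.

```python
# Best-case transfer-time estimates at the quoted filesystem throughput.
# ASSUMPTIONS: decimal units, sustained peak throughput, no contention.
THROUGHPUT_GB_PER_S = 100
CAPACITY_GB = 4 * 1_000_000  # 4 PB of usable space


def transfer_seconds(size_gb, throughput_gb_per_s=THROUGHPUT_GB_PER_S):
    """Lower bound on the time needed to move size_gb through the filesystem."""
    return size_gb / throughput_gb_per_s


# Aggregate RAM of all compute nodes: regular, high-memory, and GPU nodes.
aggregate_ram_gb = 428 * 192 + 156 * 768 + 40 * 192

full_fs_hours = transfer_seconds(CAPACITY_GB) / 3600
ram_dump_minutes = transfer_seconds(aggregate_ram_gb) / 60

print(f"Aggregate compute-node RAM : {aggregate_ram_gb} GB")
print(f"Write all RAM once         : {ram_dump_minutes:.1f} minutes")
print(f"Stream the full 4 PB       : {full_fs_hours:.1f} hours")
```

Under these idealized assumptions, writing the entire compute-node memory (209,664 GB) once takes about 35 minutes, and streaming the full 4 PB takes about 11 hours. Real jobs share the filesystem, so actual times will be longer.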

High-Speed Interconnect: The system uses the BullSequana XH2000 Mellanox HDR InfiniBand interconnect in a fat-tree topology for MPI communications. The line speed of the interconnect is 100 Gbps. In addition, the system has a secondary 10 Gbps Ethernet network for login and storage communication.

Software environment: The entire system runs a Linux OS based on the CentOS 7.x distribution. The machine hosts an array of program development tools, utilities, and libraries that ease the development and execution of HPC applications on its heterogeneous hardware. Common compilers from GNU and Intel are available, with support for MPI and OpenMP parallel programming. For the GPU nodes, the CUDA and OpenACC SDKs are installed. Widely used parallel mathematical, scientific, and application libraries are also installed, including Intel MKL, the GNU Scientific Library, HDF5, NetCDF, and a range of Python-based mathematical and data manipulation libraries. In addition, the machine hosts system monitoring and management tools developed by the C-DAC team.

The system is accessible to users through the login nodes at the domain name parampravega.iisc.ac.in, and jobs are launched in batch execution mode. Users are expected to ssh into the login nodes, create appropriate job scripts, and submit these scripts to the SLURM batch scheduler. The parallel file system is accessible to users through job scripts and is meant for use during job execution.
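As an illustration of the batch workflow, the sketch below generates the text of a minimal SLURM job script. It is only a starting point under stated assumptions: the helper make_job_script, the job name, the executable ./my_app, and the srun launch line are all hypothetical, and no partition is set because partition names are not given here. The #SBATCH options used (--job-name, --nodes, --ntasks-per-node, --time, --output, --partition) are standard SLURM options; the user guide provided by CDAC remains the reference for site-specific settings.

```python
# Build the text of a minimal SLURM batch script.
# All names and values here are hypothetical placeholders.
def make_job_script(job_name, nodes, tasks_per_node, walltime, command,
                    partition=None):
    """Return a SLURM job script as a string."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"#SBATCH --time={walltime}",
        "#SBATCH --output=%x-%j.out",  # %x = job name, %j = job ID
    ]
    if partition is not None:
        # Partition names are site specific; none is assumed by default.
        lines.append(f"#SBATCH --partition={partition}")
    lines += ["", command, ""]
    return "\n".join(lines)


# Example: 2 regular CPU nodes with one task per core (48 cores per node).
script = make_job_script(
    job_name="demo",
    nodes=2,
    tasks_per_node=48,
    walltime="01:00:00",
    command="srun ./my_app",  # hypothetical application and launcher
)
print(script)
```

The printed text can be saved to a file such as job.sh on a login node and submitted with sbatch job.sh. Job status can then be checked with squeue.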

Hardware Vendor: ATOS

Software stack provided and supported by: C-DAC

PARAM Pravega has login nodes through which users access the machine and submit jobs. Login is by ssh from inside the IISc network (ssh computational_id@parampravega.iisc.ac.in).

The machine can be accessed after applying for basic HPC access. To apply:

  • Fill in the online computational account form and submit it by email to nisadmin.serc@iisc.ac.in.
  • The HPC application form must be duly signed by your Advisor/Research Supervisor.
  • Once the computational account is created, fill in the Param Pravega Access Form.

 

Steps for execution on Param Pravega.

 

Documents Provided by C-DAC

  • Param Pravega User guide – pdf
  • Spack User Guide – pdf

 

Helpdesk

For any queries, raise a ticket on the helpdesk portal at https://parampravega.iisc.ac.in/support (powered by osTicket).