TESLA CLUSTER

Introduction :-

The Tesla cluster at SERC consists of three compute nodes. Each compute node is an SMP node with 16 AMD Opteron cores housed in four quad-core CPUs, and each is connected to an NVIDIA Tesla S1070 GPGPU node. Each Tesla node contains 4 GPUs, and each GPU has 240 processor cores.

The cluster is managed by the PBSPro workload manager, which distinguishes between CPU-only (compute) jobs and GPU-based jobs. Because multi-node jobs are disabled, each compute job can use a maximum of 14 CPUs. For GPU-based jobs, each GPU needs a CPU-bound thread to drive the computation on it, so the CPU resources of each compute node are divided into two PBS virtual nodes (vnodes): the cpu-node and the gpu-node. Jobs are assigned to these vnodes according to the PBS job script variables described under the workload manager section. Users must set the appropriate PBS variables to indicate whether a job is GPU-based or CPU-only; based on these variables, PBSPro automatically routes the job to an execution queue that schedules it on the appropriate vnode.

Each GPU is configured in exclusive mode, so it serves only one job at a time, and a job can use between one and four GPUs. Compute jobs can use MPI or OpenMP based codes, and GPU jobs are built using the NVIDIA CUDA libraries.
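
The job limits described above can be summarised as a small check. This is only an illustrative sketch in Python: the function check_request and its parameter names are hypothetical and are not part of PBSPro or any SERC tool. The rule that a GPU job needs at least one CPU per GPU is an assumption drawn from the statement that each GPU needs a CPU-bound thread.

```python
# Illustrative only: check_request is a hypothetical helper, not a PBSPro
# or SERC command. The limits are the ones stated in the introduction.
MAX_CPUS_PER_COMPUTE_JOB = 14  # compute jobs are confined to a single node
MAX_GPUS_PER_JOB = 4           # one Tesla S1070 (4 GPUs) per compute node


def check_request(ncpus, ngpus=0):
    """Return a list of problems with a request; an empty list means it fits."""
    problems = []
    if ngpus == 0:
        # CPU-only (compute) job.
        if not 1 <= ncpus <= MAX_CPUS_PER_COMPUTE_JOB:
            problems.append(
                f"compute jobs may use 1 to {MAX_CPUS_PER_COMPUTE_JOB} CPUs"
            )
    else:
        # GPU-based job.
        if not 1 <= ngpus <= MAX_GPUS_PER_JOB:
            problems.append(f"GPU jobs may use 1 to {MAX_GPUS_PER_JOB} GPUs")
        # Assumption: one CPU thread per GPU, so at least ngpus CPUs.
        if ncpus < ngpus:
            problems.append("each GPU needs a CPU thread to drive it")
    return problems


if __name__ == "__main__":
    for request in [(14, 0), (16, 0), (4, 4), (2, 4)]:
        print(request, check_request(*request) or "ok")
```

The actual PBS variables that select CPU-only or GPU-based scheduling are the ones given in the workload manager section; this sketch only restates the numeric limits.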

Vendor :-

1. OEM – SuperMicro
Authorised Seller – Netweb Technologies, Bangalore, India.
2. OEM – NVIDIA Tesla
Authorised Seller – M/s. INT Infosolution GmbH, Hamburg, Germany.

Hardware Overview :-
Each node of the cluster consists of:

  • Four AMD Quad-Core Opteron 8378 processors with 2.4 GHz clock speed
  • 64 GB Main Memory
  • 500 GB of Disk Space with 250 GB local scratch
  • NVIDIA Tesla S1070 1U server with 4 GPUs operating at 1.296 GHz
  • Gigabit Ethernet Connectivity
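
As a quick worked example, the per-node figures on this page can be multiplied out to give the aggregate capacity of the three-node cluster. The sketch below is plain arithmetic in Python; the variable names are illustrative, and the 240 cores per GPU figure is the one quoted in the introduction.

```python
# Per-node figures from this page; the variable names are illustrative.
NODES = 3
CPU_CORES_PER_NODE = 4 * 4    # four quad-core Opteron 8378 processors
MEMORY_GB_PER_NODE = 64
GPUS_PER_NODE = 4             # one Tesla S1070 per node
CORES_PER_GPU = 240

totals = {
    "cpu_cores": NODES * CPU_CORES_PER_NODE,
    "memory_gb": NODES * MEMORY_GB_PER_NODE,
    "gpus": NODES * GPUS_PER_NODE,
    "gpu_cores": NODES * GPUS_PER_NODE * CORES_PER_GPU,
}

for name, value in totals.items():
    print(f"{name}: {value}")
```

Because multi-node jobs are disabled, a single job can only draw on one node's share of these totals.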

System Software/Libraries :-

  • Fedora 10 (Cambridge) Operating System, Linux x86_64 Platform

Application Software/Libraries :-

  • Intel Software Suites

      Intel C++ Compiler Professional Edition for Linux
      Intel Fortran Professional Edition for Linux

  • Workload management

      Portable Batch System Professional (version 10.0)

  • CUDA Programming Tips for MPI Programmer

On Using Multiple CPU Threads to Manage Multiple GPUs under CUDA

Recent Activities on CUDA Programming

Location of Tesla Cluster

  • CPU Room, Ground Floor, SERC.

Hostname of the machine

  • tesla1.serc.iisc.ernet.in

Accessing the system

The Tesla cluster has one login node, tesla1, through which users can access the cluster and submit jobs. The machine is accessible for login using ssh from inside the IISc network (ssh computational_userid@tesla1.serc.iisc.ernet.in). Logging in requires basic HPC access, which can be applied for as follows:

  • Fill in the online HPC application form and submit it at Room 117, SERC.
  • The HPC application form must be duly signed by your Advisor/Research Supervisor.

Helpdesk

For any queries, raise a ticket in the helpdesk or contact the System Administrator, #103, SERC.