The NVIDIA DGX-1 is a deep learning system, architected for high throughput and high interconnect bandwidth to maximize neural network training performance. The core of the system is a complex of eight Tesla V100 GPUs connected in a hybrid cube-mesh NVLink network topology. In addition to the eight GPUs, DGX-1 includes two CPUs for boot, storage management, and deep learning framework coordination. DGX-1 is built into a three-rack-unit (3U) enclosure that provides power, cooling, network, multi-system interconnect, and SSD file system cache, balanced to optimize throughput and deep learning training time.
NVLink is an energy-efficient, high-bandwidth interconnect that enables NVIDIA GPUs to connect to peer GPUs or other devices within a node at an aggregate bidirectional bandwidth of up to 300 GB/s per GPU, over nine times that of current PCIe Gen3 x16 interconnections. Together, the NVLink interconnect and the DGX-1 architecture’s hybrid cube-mesh GPU network topology enable the highest achievable data-exchange bandwidth between a group of eight Tesla V100 GPUs.
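The "over nine times" comparison can be checked with a short calculation. The sketch below assumes figures not stated in the text above: six NVLink links per V100 at 25 GB/s per direction each, and PCIe Gen3 at 8 GT/s per lane with 128b/130b encoding. These are theoretical peaks, not measured throughput.

```python
# Assumed link parameters (not given in the text above).
NVLINK_LINKS_PER_GPU = 6              # NVLink links on one Tesla V100
NVLINK_GB_S_PER_LINK_PER_DIR = 25.0   # GB/s, one direction, one link

PCIE_GEN3_GT_S_PER_LANE = 8.0         # gigatransfers/s per lane
PCIE_GEN3_ENCODING = 128 / 130        # 128b/130b line-code efficiency
PCIE_LANES = 16


def nvlink_bidirectional_gb_s() -> float:
    """Aggregate bidirectional NVLink bandwidth for one GPU, in GB/s."""
    return NVLINK_LINKS_PER_GPU * NVLINK_GB_S_PER_LINK_PER_DIR * 2


def pcie_gen3_x16_bidirectional_gb_s() -> float:
    """Theoretical bidirectional bandwidth of a PCIe Gen3 x16 link, in GB/s."""
    # Divide by 8 to convert gigabits to gigabytes.
    per_lane_per_dir = PCIE_GEN3_GT_S_PER_LANE * PCIE_GEN3_ENCODING / 8
    return per_lane_per_dir * PCIE_LANES * 2


def nvlink_to_pcie_ratio() -> float:
    """How many times faster the NVLink aggregate is than PCIe Gen3 x16."""
    return nvlink_bidirectional_gb_s() / pcie_gen3_x16_bidirectional_gb_s()


if __name__ == "__main__":
    print(f"NVLink per GPU : {nvlink_bidirectional_gb_s():.1f} GB/s")
    print(f"PCIe Gen3 x16  : {pcie_gen3_x16_bidirectional_gb_s():.1f} GB/s")
    print(f"Ratio          : {nvlink_to_pcie_ratio():.2f}x")
```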
OEM – NVIDIA Corporation
Authorized seller – LOCUZ Enterprise Solutions Ltd
GPUs – 8 x Tesla V100
GPU Memory – 256 GB total system
CPU – Dual 20-core Intel Xeon E5-2698 v4 2.2 GHz
NVIDIA CUDA cores – 40,960
NVIDIA Tensor cores (on V100 based systems) – 5,120
System Memory – 512 GB 2,133 MHz DDR4 RDIMM
Storage – 4 x 1.92 TB SSD RAID-0
Network – Dual 10 GbE
Performance – 1 PFLOPS (mixed precision)
|Tesla V100 GPU (NVLink) Performance|Single V100 GPU|Total (8 x V100 GPUs)|
|---|---|---|
|Single Precision|Up to 15.7 TFLOPS|Up to 125.6 TFLOPS|
|Double Precision|Up to 7.8 TFLOPS|Up to 62.4 TFLOPS|
|Deep Learning (Mixed Precision)|Up to 125 TFLOPS|Up to 1 PFLOPS|
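The system-level figures are the per-GPU peaks multiplied by eight. The sketch below reproduces that arithmetic. The per-GPU values are NVIDIA's published peaks for the NVLink Tesla V100 (15.7 TFLOPS single precision, 7.8 TFLOPS double precision, 125 TFLOPS mixed precision), and the per-GPU core counts (5,120 CUDA cores, 640 Tensor cores) are assumptions taken from the V100 specification rather than from this page. All values are theoretical peaks, not measured results.

```python
GPUS_PER_SYSTEM = 8

# Per-GPU peak figures for the NVLink Tesla V100 (assumed from NVIDIA's
# published specification; core counts are not listed per GPU on this page).
PER_GPU = {
    "single_precision_tflops": 15.7,
    "double_precision_tflops": 7.8,
    "mixed_precision_tflops": 125.0,
    "cuda_cores": 5120,
    "tensor_cores": 640,
}


def system_totals(per_gpu=PER_GPU, gpus=GPUS_PER_SYSTEM):
    """Scale every per-GPU figure by the number of GPUs in the system."""
    return {name: value * gpus for name, value in per_gpu.items()}


if __name__ == "__main__":
    for name, value in system_totals().items():
        print(f"{name}: {value:,.1f}")
```

The mixed-precision total of 1,000 TFLOPS is the 1 PFLOPS headline figure, and the core totals match the 40,960 CUDA cores and 5,120 Tensor cores listed in the specifications.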
Ubuntu 16.04 Linux OS – Linux x86_64 Platform
“Running jobs without SLURM will lead to blocking of the computational account”
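As an illustration of submitting work through SLURM rather than running it directly, the sketch below builds a batch script and hands it to `sbatch`. It is a minimal example under stated assumptions, not the site's official template: the helper names `render_sbatch_script` and `submit` are hypothetical, and it assumes GPUs are requested with `--gres=gpu:N` and that no partition needs to be named. Check the actual partition and resource names with the SERC documentation or `sinfo` before use.

```python
import subprocess
from pathlib import Path

MAX_GPUS = 8  # the DGX-1 has eight Tesla V100 GPUs


def render_sbatch_script(job_name, command, gpus=1, time_limit="01:00:00",
                         partition=None, output="%x-%j.out"):
    """Return the text of a SLURM batch script (hypothetical helper).

    %x and %j in the output pattern expand to the job name and job ID.
    """
    if not 1 <= gpus <= MAX_GPUS:
        raise ValueError(f"gpus must be between 1 and {MAX_GPUS}")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --gres=gpu:{gpus}",   # assumes GPUs are exposed as GRES 'gpu'
        f"#SBATCH --time={time_limit}",
        f"#SBATCH --output={output}",
    ]
    if partition:                        # partition names are site-specific
        lines.append(f"#SBATCH --partition={partition}")
    lines += ["", command, ""]
    return "\n".join(lines)


def submit(script_text, path="job.sh"):
    """Write the script to disk and submit it with sbatch (login node only)."""
    script = Path(path)
    script.write_text(script_text)
    result = subprocess.run(["sbatch", str(script)], check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # 'python train.py' is a placeholder for your own workload.
    print(render_sbatch_script("train", "python train.py", gpus=2))
```

Calling `submit(render_sbatch_script(...))` on the login node queues the job. The script text can also be saved and passed to `sbatch` by hand.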
Please note that the “/localscratch/” space is intended only for temporary storage of your job outputs. Data in localscratch older than 14 days (2 weeks) will be deleted.
SERC does not maintain any backups of localscratch data and will not be responsible for any data lost through this deletion.
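To avoid losing results to the 14-day cleanup, it can help to list files that are approaching or past that age and copy them elsewhere. The sketch below is one way to do so. It assumes file age is judged by modification time, which this page does not specify, and the directory passed in (for example a per-user folder under `/localscratch/`) is hypothetical.

```python
import os
import time
from pathlib import Path

RETENTION_DAYS = 14        # retention period stated for /localscratch/
SECONDS_PER_DAY = 86400


def files_older_than(root, days=RETENTION_DAYS, now=None):
    """Return files under root whose modification time is older than `days`.

    Assumes age is measured by mtime; the site's cleanup may use another rule.
    """
    now = time.time() if now is None else now
    cutoff = now - days * SECONDS_PER_DAY
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                mtime = path.lstat().st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime < cutoff:
                stale.append(path)
    return sorted(stale)


if __name__ == "__main__":
    import sys
    # Usage: python scratch_age.py <directory> [days]
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    days = float(sys.argv[2]) if len(sys.argv) > 2 else RETENTION_DAYS
    for path in files_older_than(root, days):
        print(path)
```

Running it with a smaller threshold, such as 10 days, lists files that will be removed soon and leaves time to copy them to permanent storage.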
How to Use DGX1:
Accessing the system:
The NVIDIA-DGX1 cluster has one login node, nvidia-dgx, through which users access the cluster and submit jobs.
The machine is accessible via ssh from inside the IISc network.
Access is granted after applying for basic HPC access. To apply:
- Fill out the online HPC application form and submit it at Room 103, SERC.
- The HPC application form must be duly signed by your Advisor/Research Supervisor.
Location of DGX-1 Cluster:
CPU Room – Ground Floor, SERC, IISc