National Energy Research Scientific Computing (NERSC)#


Perlmutter, a HPE Cray supercomputer at NERSC, is a heterogeneous system with both GPU-accelerated and CPU-only nodes. Phase 1 of the installation is made up of 12 GPU-accelerated cabinets housing over 1,500 nodes. Phase 2 adds 12 CPU cabinets with more than 3,000 nodes. Each GPU node of Perlmutter has 4x NVIDIA A100 GPUs.

Conda environment#

For connecting to Perlmutter, check documentation. One can also configure SSH according to the instructions. To connect to Perlmutter via terminal, use:

$ ssh <username>

Load the pre-installed modules available on Perlmutter:

$ module load PrgEnv-nvidia cudatoolkit python
$ module load cudnn/8.2.0

Then, create a conda environment:

$ conda create -n dh python=3.9 -y
$ conda activate dh
$ conda install gxx_linux-64 gcc_linux-64

Now a crucial step is to install CUDA aware mpi4py, following the instructions given in the mpi4py documentation:

$ MPICC="cc -target-accel=nvidia80 -shared" CC=nvc CFLAGS="-noswitcherror" pip install --force --no-cache-dir --no-binary=mpi4py mpi4py

Perlmutter (as of November 2022) lacks support for a few mpi4py functionalities such as the MPI_Mprobe. More details are provided on the NERSC issues page. The workaround is to disable using matched probes to receive objects as follows:

$ export MPI4PY_RC_RECV_MPROBE='False'

Then we install some of the other dependencies:

$ pip install tensorflow==2.9.2
$ pip install kiwisolver
$ pip install cycler
$ pip install matplotlib
$ pip install progressbar2
$ pip install networkx[default]

Finally we install deephyper:

$ pip install deephyper

Finally you can verify the version of the installed deephyper package:

$ python
>>> import deephyper
>>> deephyper.__version__

Do not forget to reload the installed dependencies each time you want to use DeepHyper:

module load PrgEnv-nvidia cudatoolkit python
module load cudnn/8.2.0
source /global/common/software/nersc/pm-2022q3/sw/python/3.9-anaconda-2021.11/etc/profile.d/
export MPI4PY_RC_RECV_MPROBE='False'
conda activate dh